postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-04-22 14:47:29 -04:00

Author	SHA1	Message	Date
Michael Paquier	3c5ec35dea	oid2name: Add relation path to the information provided by -x/--extended This affects two command patterns, showing information about relations: * oid2name -x -d DBNAME, applying to all relations on a database. * oid2name -x -d DBNAME -t TABNAME [-t ..], applying to a subset of defined relations on a database. The relative path of a relation is added to the information provided, using pg_relation_filepath(). Author: David Bidoc <dcbidoc@gmail.com> Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Guillaume Lelarge <guillaume.lelarge@dalibo.com> Reviewed-by: Euler Taveira <euler@eulerto.com> Reviewed-by: Mark Wong <markwkm@gmail.com> Discussion: https://postgr.es/m/CABour1v2CU1wjjoM86wAFyezJQ3-+ncH43zY1f1uXeVojVN8Ow@mail.gmail.com	2026-02-05 09:02:12 +09:00
John Naylor	176dffdf7d	Fix various instances of undefined behavior Mostly this involves checking for NULL pointer before doing operations that add a non-zero offset. The exception is an overflow warning in heap_fetch_toast_slice(). This was caused by unneeded parentheses forcing an expression to be evaluated to a negative integer, which then got cast to size_t. Per clang 21 undefined behavior sanitizer. Backpatch to all supported versions. Co-authored-by: Alexander Lakhin <exclusion@gmail.com> Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/777bd201-6e3a-4da0-a922-4ea9de46a3ee@gmail.com Backpatch-through: 14	2026-02-04 18:09:35 +07:00
Peter Eisentraut	955e507668	Change StaticAssertVariableIsOfType to be a declaration This allows moving the uses to more natural and useful positions. Also, a declaration is the more native use of static assertions in C. Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/2273bc2a-045d-4a75-8584-7cd9396e5534%40eisentraut.org	2026-02-03 08:46:02 +01:00
Peter Eisentraut	137d05df2f	Rename AssertVariableIsOfType to StaticAssertVariableIsOfType This keeps run-time assertions and static assertions clearly separate. Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/2273bc2a-045d-4a75-8584-7cd9396e5534%40eisentraut.org	2026-02-03 08:45:24 +01:00
Melanie Plageman	4a99ef1a0d	Fix flakiness in the pg_visibility VM-only vacuum test by using a temporary table. The test relies on VACUUM being able to mark a page all-visible, but this can fail when autovacuum in other sessions prevents the visibility horizon from advancing. Making the test table temporary isolates its horizon from other sessions, including catalog table vacuums, ensuring reliable test behavior. Reported-by: Alexander Lakhin <exclusion@gmail.com> Author: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/2b09fba6-6b71-497a-96ef-a6947fcc39f6%40gmail.com	2026-02-02 17:45:27 -05:00
Tom Lane	da7a1dc0d6	Refactor att_align_nominal() to improve performance. Separate att_align_nominal() into two macros, similarly to what was already done with att_align_datum() and att_align_pointer(). The inner macro att_nominal_alignby() is really just TYPEALIGN(), while att_align_nominal() retains its previous API by mapping TYPALIGN_xxx values to numbers of bytes to align to and then calling att_nominal_alignby(). In support of this, split out tupdesc.c's logic to do that mapping into a publicly visible function typalign_to_alignby(). Having done that, we can replace performance-critical uses of att_align_nominal() with att_nominal_alignby(), where the typalign_to_alignby() mapping is done just once outside the loop. In most places I settled for doing typalign_to_alignby() once per function. We could in many places pass the alignby value in from the caller if we wanted to change function APIs for this purpose; but I'm a bit loath to do that, especially for exported APIs that extensions might call. Replacing a char typalign argument by a uint8 typalignby argument would be an API change that compilers would fail to warn about, thus silently breaking code in hard-to-debug ways. I did revise the APIs of array_iter_setup and array_iter_next, moving the element type attribute arguments to the former; if any external code uses those, the argument-count change will cause visible compile failures. Performance testing shows that ExecEvalScalarArrayOp is sped up by about 10% by this change, when using a simple per-element function such as int8eq. I did not check any of the other loops optimized here, but it's reasonable to expect similar gains. Although the motivation for creating this patch was to avoid a performance loss if we add some more typalign values, it evidently is worth doing whether that patch lands or not. Discussion: https://postgr.es/m/1127261.1769649624@sss.pgh.pa.us	2026-02-02 14:39:50 -05:00
Masahiko Sawada	1fdbca159e	Standardize replication origin naming to use "ReplOrigin". The replication origin code was using inconsistent naming conventions. Functions were typically prefixed with 'replorigin', while typedefs and constants used "RepOrigin". This commit unifies the naming convention by renaming RepOriginId to ReplOriginId. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAD21AoBDgm3hDqUZ+nqu=ViHmkCnJBuJyaxG_yvv27BAi2zBmQ@mail.gmail.com	2026-01-28 11:03:29 -08:00
Tomas Vondra	09c37015d4	pgindent fix for `3fccbd94cb` Backpatch-through: 18	2026-01-27 00:26:36 +01:00
Tomas Vondra	3fccbd94cb	Handle ENOENT status when querying NUMA node We've assumed that touching the memory is sufficient for a page to be located on one of the NUMA nodes. But a page may be moved to a swap after we touch it, due to memory pressure. We touch the memory before querying the status, but there is no guarantee it won't be moved to the swap in the meantime. The touching happens only on the first call, so later calls are more likely to be affected. And the batching increases the window too. It's up to the kernel if/when pages get moved to swap. We have to accept ENOENT (-2) as a valid result, and handle it without failing. This patch simply treats it as an unknown node, and returns NULL in the two affected views (pg_shmem_allocations_numa and pg_buffercache_numa). Hugepages cannot be swapped out, so this affects only regular pages. Reported by Christoph Berg, investigation and fix by me. Backpatch to 18, where the two views were introduced. Reported-by: Christoph Berg <myon@debian.org> Discussion: 18 Backpatch-through: https://postgr.es/m/aTq5Gt_n-oS_QSpL@msg.df7cb.de	2026-01-27 00:21:40 +01:00
Melanie Plageman	21796c267d	Combine visibilitymap_set() cases in lazy_scan_prune() lazy_scan_prune() previously had two separate cases that called visibilitymap_set() after pruning and freezing. These branches were nearly identical except that one attempted to avoid dirtying the heap buffer. However, that situation can never occur — the heap buffer cannot be clean at that point (and we would hit an assertion if it were). In lazy_scan_prune(), when we change a previously all-visible page to all-frozen and the page was recorded as all-visible in the visibility map by find_next_unskippable_block(), the heap buffer will always be dirty. Either we have just frozen a tuple and already dirtied the buffer, or the buffer was modified between find_next_unskippable_block() and heap_page_prune_and_freeze() and then pruned in heap_page_prune_and_freeze(). Additionally, XLogRegisterBuffer() asserts that the buffer is dirty, so attempting to add a clean heap buffer to the WAL chain would assert out anyway. Since the “clean heap buffer with already set VM” case is impossible, the two visibilitymap_set() branches in lazy_scan_prune() can be merged. Doing so makes the intent clearer and emphasizes that the heap buffer must always be marked dirty before being added to the WAL chain. This commit also adds a test case for vacuuming when no heap modifications are required. Currently this ensures that the heap buffer is marked dirty before it is added to the WAL chain, but if we later remove the heap buffer from the VM-set WAL chain or pass it with the REGBUF_NO_CHANGES flag, this test would guard that behavior. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru> Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com Discussion: https://postgr.es/m/flat/CAAKRu_ZWx5gCbeCf7PWCv8p5%3D%3Db7EEws0VD2wksDxpXCvCyHvQ%40mail.gmail.com	2026-01-26 16:03:32 -05:00
Peter Eisentraut	5ca5f12c2c	Fix accidentally cast away qualifiers This fixes cases where a qualifier (const, in all cases here) was dropped by a cast, but the cast was otherwise necessary or desirable, so the straightforward fix is to add the qualifier into the cast. Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/b04f4d3a-5e70-4e73-9ef2-87f777ca4aac%40eisentraut.org	2026-01-26 16:02:31 +01:00
Michael Paquier	72e3abd082	pg_stat_statements: Fix test instability with cache-clobbering builds Builds with CLOBBER_CACHE_ALWAYS enabled are failing the new test introduced in `1572ea96e6`, checking the nesting level calculation in the planner hook. The inner query of the function called twice is registered as normalized, as such builds would register a PGSS entry in the post-parse-analyse hook due to the cached plans requiring revalidation. A trick based on debug_discard_caches cannot work as far as I can, a normalized query still being registered. This commit takes a different approach with the addition of a DISCARD PLANS before the first function call. This forces the use of a normalized query in the PGSS entry for the inner query of the function with and without CLOBBER_CACHE_ALWAYS, which should be enough to stabilize the test. Note that the test is still checking what it should: when removing the nesting level calculation in the planner hook of PGSS, one still gets a failure for the PGSS entry of the inner query in the function, with "toplevel" being flipped to true instead of false (it should be false, as a non-top-level entry). Per buildfarm members avocet and trilobite, at least. Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/82dd02bb-4e0f-40ad-a60b-baa1763ff0bd@gmail.com	2026-01-25 19:01:23 +09:00
Amit Langote	f9a468c664	Fix bogus ctid requirement for dummy-root partitioned targets ExecInitModifyTable() unconditionally required a ctid junk column even when the target was a partitioned table. This led to spurious "could not find junk ctid column" errors when all children were excluded and only the dummy root result relation remained. A partitioned table only appears in the result relations list when all leaf partitions have been pruned, leaving the dummy root as the sole entry. Assert this invariant (nrels == 1) and skip the ctid requirement. Also adjust ExecModifyTable() to tolerate invalid ri_RowIdAttNo for partitioned tables, which is safe since no rows will be processed in this case. Bug: #19099 Reported-by: Alexander Lakhin <exclusion@gmail.com> Author: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/19099-e05dcfa022fe553d%40postgresql.org Backpatch-through: 14	2026-01-23 10:23:30 +09:00
Peter Eisentraut	a5b40d156e	Mark commented out code as unused There were many PG_GETARG_* calls, mostly around gin, gist, spgist code, that were commented out, presumably to indicate that the argument was unused and to indicate that it wasn't forgotten or miscounted. But keeping commented-out code updated with refactorings and style changes is annoying. So this commit changes them to #ifdef NOT_USED blocks, which is a style already in use. That way, at least the indentation and syntax highlighting works correctly, making some of these blocks much easier to read. An alternative would be to just delete that code, but there is some value in making unused arguments explicit, and some of this arguably serves as example code for index AM APIs. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: David Geier <geidav.pg@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/328e4371-9a4c-4196-9df9-1f23afc900df%40eisentraut.org	2026-01-22 12:44:07 +01:00
Fujii Masao	26cb14aea1	file_fdw: Support multi-line HEADER option. Commit `bc2f348` introduced multi-line HEADER support for COPY. This commit extends this capability to file_fdw, allowing multiple header lines to be skipped. Because CREATE/ALTER FOREIGN TABLE requires option values to be single-quoted, this commit also updates defGetCopyHeaderOption() to accept integer values specified as strings for HEADER option. Author: Shinya Kato <shinya11.kato@gmail.com> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: songjinzhou <tsinghualucky912@foxmail.com> Reviewed-by: Japin Li <japinli@hotmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAOzEurT+iwC47VHPMS+uJ4WSzvOLPsZ2F2_wopm8M7O+CZa3Xw@mail.gmail.com	2026-01-22 10:14:12 +09:00
Tom Lane	4576208454	Force standard_conforming_strings to always be ON. Continuing to support this backwards-compatibility feature has nontrivial costs; in particular it is potentially a security hazard if an application somehow gets confused about which setting the server is using. We changed the default to ON fifteen years ago, which seems like enough time for applications to have adapted. Let's remove support for the legacy string syntax. We should not remove the GUC altogether, since client-side code will still test it, pg_dump scripts will attempt to set it to ON, etc. Instead, just prevent it from being set to OFF. There is precedent for this approach (see commit `de66987ad`). This patch does remove the related GUC escape_string_warning, however. That setting does nothing when standard_conforming_strings is on, so it's now useless. We could leave it in place as a do-nothing setting to avoid breaking clients that still set it, if there are any. But it seems likely that any such client is also trying to turn off standard_conforming_strings, so it'll need work anyway. The client-side changes in this patch are pretty minimal, because even though we are dropping the server's support, most of our clients still need to be able to talk to older server versions. We could remove dead client code only once we disclaim compatibility with pre-v19 servers, which is surely years away. One change of note is that pg_dump/pg_dumpall now set standard_conforming_strings = on in their source session, rather than accepting the source server's default. This ensures that literals in view definitions and such will be printed in a way that's acceptable to v19+. In particular, pg_upgrade will work transparently even if the source installation has standard_conforming_strings = off. (However, pg_restore will behave the same as before if given an archive file containing standard_conforming_strings = off. Such an archive will not be safely restorable into v19+, but we shouldn't break the ability to extract valid data from it for use with an older server.) Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/3279216.1767072538@sss.pgh.pa.us	2026-01-21 15:08:38 -05:00
Álvaro Herrera	1f28982e40	amcheck: Fix snapshot usage in bt_index_parent_check We were using SnapshotAny to do some index checks, but that's wrong and causes spurious errors when used on indexes created by CREATE INDEX CONCURRENTLY. Fix it to use an MVCC snapshot, and add a test for it. Backpatch of `6bd469d26a` to branches 14-16. I previously misidentified the bug's origin: it came in with commit `7f563c09f8` (pg11-era, not `5ae2087202` as claimed previously), so all live branches are affected. Also take the opportunity to fix some comments that we failed to update in the original commits and apply pgperltidy. In branch 14, remove the unnecessary test plan specification (which would have need to have been changed anyway; c.f. commit 549ec201d613.) Diagnosed-by: Donghang Lin <donghanglin@gmail.com> Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com> Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru> Backpatch-through: 17 Discussion: https://postgr.es/m/CANtu0ojmVd27fEhfpST7RG2KZvwkX=dMyKUqg0KM87FkOSdz8Q@mail.gmail.com	2026-01-21 18:55:43 +01:00
Michael Paquier	1572ea96e6	pg_stat_statements: Add more tests for level tracking This commit adds tests to verify the computation of the nesting level for two code paths: the planner hook and the ExecutorFinish() hook. The nesting level is essential to save a correct "toplevel" status for the added PGSS entries. The author has noticed that removing the manipulations of nesting_level in these two code paths did not cause the tests to complain, meaning that we never had coverage for the assumptions taken by the code. Author: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/CAA5RZ0uK1PSrgf52bWCtDpzaqbWt04o6ZA7zBm6UQyv7vyvf9w@mail.gmail.com	2026-01-21 18:18:15 +09:00
Michael Paquier	905ef401d5	pg_stat_statements: Clean up REGRESS list in Makefile The "wal" and "entry_timestamp" items were still on the same line, which was not intentional. Thinko in `f9afd56218`. Reported-by: Man Zeng <zengman@halodbtech.com> Discussion: https://postgr.es/m/aW6_Xc8auuu5iAPi@paquier.xyz	2026-01-21 11:29:34 +09:00
Michael Paquier	f9afd56218	pg_stat_statements: Rework test order The test "squashing" was the last item of the REGRESS list, but "cleanup" should be the second to last, dropping the extension. "oldextversions" is the last item. In passing, the REGRESS list is cleaned up to include one item per line, so as diffs are minimized when adding new test files. Noticed while playing with this area of the code. Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Man Zeng <zengman@halodbtech.com> Discussion: https://postgr.es/m/aW6_Xc8auuu5iAPi@paquier.xyz	2026-01-21 07:47:38 +09:00
Michael Paquier	5d95219faa	pg_stat_statements: Fix crash in list squashing with Vars When IN/ANY clauses contain both constants and variable expressions, the optimizer transforms them into separate structures: constants become an array expression while variables become individual OR conditions. This transformation was creating an overlap with the token locations, causing pg_stat_statements query normalization to crash because it could not calculate the amount of bytes remaining to write for the normalized query. This commit disables squashing for mixed IN list expressions when constructing a scalar array op, by setting list_start and list_end to -1 when both variables and non-variables are present. Some regression tests are added to PGSS to verify these patterns. Author: Sami Imseih <samimseih@gmail.com> Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com> Discussion: https://postgr.es/m/CAA5RZ0ts9qiONnHjjHxPxtePs22GBo4d3jZ_s2BQC59AN7XbAA@mail.gmail.com Backpatch-through: 18	2026-01-20 08:11:12 +09:00
Andres Freund	dac328c8a6	bufmgr: Change BufferDesc.state to be a 64-bit atomic This is motivated by wanting to merge buffer content locks into BufferDesc.state in a future commit, rather than having a separate lwlock (see commit `c75ebc657f` for more details). As this change is rather mechanical, it seems to make sense to split it out into a separate commit, for easier review. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2026-01-15 14:20:41 -05:00
Álvaro Herrera	35e3fae738	Remove #include <math.h> where not needed Liujinyang reported the one in binaryheap.c, I then found and analyzed the rest. For future patches, we require git archaelogical analysis before we accept patches of this nature. Co-authored-by: liujinyang <21043272@qq.com> Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/tencent_6B302BFCAF6F010E00AB5C2C0ECB7AA3F205@qq.com	2026-01-15 19:09:47 +01:00
Michael Paquier	6dcfac9696	Use more consistent *GetDatum() macros for some unsigned numbers This patch switches some code paths to use GetDatum() macros more in line with the data types of the variables they manipulate. This set of changes does not fix a problem, but it is always nice to be more consistent across the board. Author: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Roman Khapov <rkhapov@yandex-team.ru> Reviewed-by: Yuan Li <carol.li2025@outlook.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Man Zeng <zengman@halodbtech.com> Discussion: https://postgr.es/m/CALdSSPidtC7j3MwhkqRj0K2hyp36ztnnjSt6qzGxQtiePR1dzw@mail.gmail.com	2026-01-14 17:07:49 +09:00
Amit Kapila	e385a4e2fd	Prevent unintended dropping of active replication origins. Commit `5b148706c5` exposed functionality that allows multiple processes to use the same replication origin, enabling non-builtin logical replication solutions to implement parallel apply for large transactions. With this functionality, if two backends acquire the same replication origin and one of them resets it first, the acquired_by flag is cleared without acknowledging that another backend is still actively using the origin. This can lead to the origin being unintentionally dropped. If the shared memory for that dropped origin is later reused for a newly created origin, the remaining backend that still holds a pointer to the old memory may inadvertently advance the LSN of a completely different origin, causing unpredictable behavior. Although the underlying issue predates commit `5b148706c5`, it did not surface earlier because the internal parallel apply worker mechanism correctly coordinated origin resets and drops. This commit resolves the problem by introducing a reference counter for replication origins. The reference count increases when a backend sets the origin and decreases when it resets it. Additionally, the backend that first acquires the origin will not release it until all other backends using the origin have released it as well. The patch also prevents dropping a replication origin when acquired_by is zero but the reference counter is nonzero, covering the scenario where the first session exits without properly releasing the origin. Author: Hou Zhijie <houzj.fnst@fujitsu.com> Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Shveta Malik <shveta.malik@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/TY4PR01MB169077EE72ABE9E55BAF162D494B5A@TY4PR01MB16907.jpnprd01.prod.outlook.com Discussion: https://postgr.es/m/CAMPB6wfe4zLjJL8jiZV5kjjpwBM2=rTRme0UCL7Ra4L8MTVdOg@mail.gmail.com	2026-01-14 07:15:46 +00:00
Michael Paquier	e217dc7484	Fix query jumbling with GROUP BY clauses RangeTblEntry.groupexprs was marked with the node attribute query_jumble_ignore, causing a list of GROUP BY expressions to be ignored during the query jumbling. For example, these two queries could be grouped together within the same query ID: SELECT count() FROM t GROUP BY a; SELECT count() FROM t GROUP BY b; However, as such queries use different GROUP BY clauses, they should be split across multiple entries. This fixes an oversight in `247dea89f7`, that has introduced an RTE for GROUP BY clauses. Query IDs are documented as being stable across minor releases, but as this is a regression new to v18 and that we are still early in its support cycle, a backpatch is exceptionally done as this has broken a behavior that exists since query jumbling is supported in core, since its introduction in pg_stat_statements. The tests of pg_stat_statements are expanded to cover this area, with patterns involving GROUP BY and GROUPING clauses. Author: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/CACJufxEy2W+tCqC7XuJ94r3ivWsM=onKJp94kRFx3hoARjBeFQ@mail.gmail.com Backpatch-through: 18	2026-01-14 08:44:12 +09:00
Álvaro Herrera	225d1df1d2	Stop including {brin,gin}_tuple.h in tuplesort.h Doing this meant that those two headers, which are supposed to be internal to their corresponding index AMs, were being included pretty much universally, because tuplesort.h is included by execnodes.h which is very widely used. Stop that, and fix fallout. We also change indexing.h to no longer include execnodes.h (tuptable.h is sufficient), and relscan.h to no longer include buf.h (pointless since `c2fe139c20`). Author: Mario González <gonzalemario@gmail.com> Discussion: https://postgr.es/m/CAFsReFUcBFup=Ohv_xd7SNQ=e73TXi8YNEkTsFEE2BW7jS1noQ@mail.gmail.com	2026-01-12 18:09:49 +01:00
Jeff Davis	b96a9fd76f	fuzzystrmatch: use pg_ascii_toupper(). fuzzystrmatch is designed for ASCII, so no need to rely on the global LC_CTYPE setting. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/dd0cdd1f-e786-426e-b336-1ffa9b2f1fc6%40eisentraut.org	2026-01-12 08:54:04 -08:00
Álvaro Herrera	2defd00062	Move instrumentation-related structs to instrument_node.h Some structs and enums related to parallel query instrumentation had organically grown scattered across various files, and were causing header pollution especially through execnodes.h. Create a single file where they can live together. This only moves the structs to the new file; cleaning up the pollution by removing no-longer-necessary cross-header inclusion will be done in future commits. Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de> Co-authored-by: Mario González <gonzalemario@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/202510051642.wwmn4mj77wch@alvherre.pgsql Discussion: https://postgr.es/m/CAFsReFUr4KrQ60z+ck9cRM4WuUw1TCghN7EFwvV0KvuncTRc2w@mail.gmail.com	2026-01-12 16:59:28 +01:00
Peter Eisentraut	e39ece0343	Make dmetaphone collation-aware The dmetaphone() SQL function internally upper-cases the argument string. It did this using the toupper() function. That way, it has a dependency on the global LC_CTYPE locale setting, which we want to get rid of. The "double metaphone" algorithm specifically supports the "C with cedilla" letter, so just using ASCII case conversion wouldn't work. To fix that, use the passed-in collation and use the str_toupper() function, which has full awareness of collations and collation providers. Note that this does not change the fact that this function only works correctly with single-byte encodings. The change to str_toupper() makes the case conversion multibyte-enabled, but the rest of the function is still not ready. Reviewed-by: Jeff Davis <pgsql@j-davis.com> Discussion: https://www.postgresql.org/message-id/108e07a2-0632-4f00-984d-fe0e0d0ec726%40eisentraut.org	2026-01-12 08:35:48 +01:00
Andres Freund	e5a5e0a907	instrumentation: Keep time fields as instrtime, convert in callers Previously the instrumentation logic always converted to seconds, only for many of the callers to do unnecessary division to get to milliseconds. As an upcoming refactoring will split the Instrumentation struct, utilize instrtime always to keep things simpler. It's also a bit faster to not have to first convert to a double in functions like InstrEndLoop(), InstrAggNode(). Author: Lukas Fittl <lukas@fittl.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAP53PkzZ3UotnRrrnXWAv=F4avRq9MQ8zU+bxoN9tpovEu6fGQ@mail.gmail.com	2026-01-09 13:38:00 -05:00
Tom Lane	b8ccd29152	Remove now-useless btree_gist--1.2.sql script. In the wake of the previous commit, this script will fail if executed via CREATE EXTENSION, so it's useless. Remove it, but keep the delta scripts, allowing old (even very old) versions of the btree_gist SQL objects to be upgraded to 1.9 via ALTER EXTENSION UPDATE after a pg_upgrade. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/2483812.1754072263@sss.pgh.pa.us	2026-01-08 14:09:58 -05:00
Tom Lane	b3b0b45717	Create btree_gist v1.9, in which inet/cidr opclasses aren't default. btree_gist's gist_inet_ops and gist_cidr_ops opclasses are fundamentally broken: they rely on an approximate representation of the inet values and hence sometimes miss rows they should return. We want to eventually get rid of them altogether, but as the first step on that journey, we should mark them not-opcdefault. To do that, roll up the preceding deltas since 1.2 into a new base script btree_gist--1.9.sql. This will allow installing 1.9 without going through a transient situation where gist_inet_ops and gist_cidr_ops are marked as opcdefault; trying to create them that way will fail if there's already a matching default opclass in the core system. Additionally provide btree_gist--1.8--1.9.sql, so that a database that's been pg_upgraded from an older version can be migrated to 1.9. I noted along the way that commit `57e3c5160` had missed marking the gist_bool_ops support functions as PARALLEL SAFE. While that probably has little harmful effect (since AFAIK we don't check that when calling index support functions), this seems like a good time to make things consistent. Readers will also note that I removed the former habit of installing some opclass operators/functions with ALTER OPERATOR FAMILY, instead just rolling them all into the CREATE OPERATOR CLASS steps. The comment in btree_gist--1.2.sql that it's necessary to use ALTER for pg_upgrade reproducibility has been obsolete since we invented the amadjustmembers infrastructure. Nowadays, gistadjustmembers will force all operators and non-required support functions to have "soft" opfamily dependencies, regardless of whether they are installed by CREATE or ALTER. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/2483812.1754072263@sss.pgh.pa.us	2026-01-08 13:56:08 -05:00
Michael Paquier	8b9b93e39b	Use relation_close() more consistently in contrib/ All the code paths updated here have been using index_close() to close a relation that was opened with relation_open(), in pgstattuple and pageinspect. index_close() does the same thing as relation_close(), so there is no harm, but being inconsistent could lead to issues if the internals of these close() functions begin to introduce some specific logic in the future. In passing, this commit adds some comments explaining why we are using relation_open() instead of index_open() in a few places, which is due to the fact that partitioned indexes are not allowed in these functions. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Japin Li <japinli@hotmail.com> Discussion: https://postgr.es/m/aUKamYGiDKO6byp5@ip-10-97-1-34.eu-west-3.compute.internal	2026-01-06 16:17:59 +09:00
Masahiko Sawada	e212a0f8e6	pg_visibility: Fix incorrect buffer lock description in comment. Although the comment in collect_corrupt_items() stated that the buffer is locked in exclusive mode, it is actually locked in shared mode. Author: Chao Li <lic@highgo.com> Discussion: https://postgr.es/m/CAEoWx2kkhxgfp=kinPMetnwHaa0JjR6YBkO_0gg0oiy6mu7Zjw@mail.gmail.com	2026-01-05 15:49:43 -08:00
Michael Paquier	b8cfcb9e00	Fix typos and inconsistencies in code and comments This change is a cocktail of harmonization of function argument names, grammar typos, renames for better consistency and unused code (see ltree). All of these have been spotted by the author. Author: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/b2c0d0b7-3944-487d-a03d-d155851958ff@gmail.com	2026-01-05 09:19:15 +09:00
David Rowley	4f49e4b55e	Fix selectivity estimation integer overflow in contrib/intarray This fixes a poorly written integer comparison function which was performing subtraction in an attempt to return a negative value when a < b and a positive value when a > b, and 0 when the values were equal. Unfortunately that didn't always work correctly due to two's complement having the INT_MIN 1 further from zero than INT_MAX. This could result in an overflow and cause the comparison function to return an incorrect result, which would result in the binary search failing to find the value being searched for. This could cause poor selectivity estimates when the statistics stored the value of INT_MAX (2147483647) and the value being searched for was large enough to result in the binary search doing a comparison with that INT_MAX value. Author: Chao Li <li.evan.chao@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAEoWx2ng1Ot5LoKbVU-Dh---dFTUZWJRH8wv2chBu29fnNDMaQ@mail.gmail.com Backpatch-through: 14	2026-01-04 20:32:40 +13:00
Bruce Momjian	451c43974f	Update copyright for 2026 Backpatch-through: 14	2026-01-01 13:24:10 -05:00
Tom Lane	bc6374cd76	Change IndexAmRoutines to be statically-allocated structs. Up to now, index amhandlers were expected to produce a new, palloc'd struct on each call. That requires palloc/pfree overhead, and creates a risk of memory leaks if the caller fails to pfree, and the time taken to fill such a large structure isn't nil. Moreover, we were storing these things in the relcache, eating several hundred bytes for each cached index. There is not anything in these structs that needs to vary at runtime, so let's change the definition so that an amhandler can return a pointer to a "static const" struct of which there's only one copy per index AM. Mark all the core code's IndexAmRoutine pointers const so that we catch anyplace that might still try to change or pfree one. (This is similar to the way we were already handling TableAmRoutine structs. This commit does fix one comment that was infelicitously copied-and-pasted into tableamapi.c.) This commit needs to be called out in the v19 release notes as an API change for extension index AMs. An un-updated AM will still work (as of now, anyway) but it risks memory leaks and will be slower than necessary. Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAEoWx2=vApYk2LRu8R0DdahsPNEhWUxGBZ=rbZo1EXE=uA+opQ@mail.gmail.com	2025-12-30 18:26:23 -05:00
Tom Lane	cb77bc0442	Further stabilize a postgres_fdw test case. This patch causes one postgres_fdw test case to revert to the plan it used before `aa86129e1`, i.e., using a remote sort in preference to local sort. That decision is actually a coin-flip because cost_sort() will give the same answer on both sides, so that the plan choice comes down to little more than roundoff error. In consequence, the test output can change as a result of even minor changes in nearby costs, as we saw in `aa86129e1` (compare also `b690e5fac` and `4b14e1871`). b690e5fac's solution to stabilizing the adjacent test case was to disable sorting locally, and here we extend that to the currently- problematic case. Without this, the following patch would cause this plan choice to change back in this same way, for even less apparent reason. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/2551253.1766952956@sss.pgh.pa.us	2025-12-29 12:53:49 -05:00
Michael Paquier	9adf32da6b	Split some long Makefile lists This change makes more readable code diffs when adding new items or removing old items, while ensuring that lines do not get excessively long. Some SUBDIRS, PROGRAMS and REGRESS lists are split. Note that there are a few more REGRESS lists that could be split, particularly in contrib/. Author: Jelte Fennema-Nio <postgres@jeltef.nl> Co-Authored-By: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Japin Li <japinli@hotmail.com> Reviewed-by: Man Zeng <zengman@halodbtech.com> Discussion: https://postgr.es/m/DF6HDGB559U5.3MPRFCWPONEAE@jeltef.nl	2025-12-28 09:17:42 +09:00
Masahiko Sawada	55c46bbf3a	pg_visibility: Use visibilitymap_count instead of loop. This commit updates pg_visibility_map_summary() to use the visibilitymap_count() API, replacing its own counting mechanism. This simplifies the function and improves performance by leveraging the vectorized implementation introduced in commit `41c51f0c68`. Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAEze2WgPu-EYYuYQimy=AHQHGa7w8EvLVve5DM5eGMR6zh-7sw@mail.gmail.com	2025-12-23 10:33:06 -08:00
Michael Paquier	5cf03552fb	btree_gist: Fix memory allocation formula This change has been suggested by the two authors listed in this commit, both of them providing an incomplete solution (David's formula relied on a "bytea *", while Bertrand's did not use palloc_array()). The solution provided in this commit uses GBT_VARKEY instead of the inconsistent bytea for the allocation size, with a palloc_array(). The change related to Vsrt is one I am flipping to a more consistent style, in passing. Author: David Geier <geidav.pg@gmail.com> Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com Discussion: https://postgr.es/m/aTrG3Fi4APtfiCvQ@ip-10-97-1-34.eu-west-3.compute.internal	2025-12-18 11:01:43 +09:00
Jeff Davis	7f007e4a04	ltree: fix case-insensitive matching. Previously, ltree_prefix_eq_ci() used lowercasing with the default collation; while ltree_crc32_sz() used tolower() directly. These were equivalent only if the default collation provider was libc and the encoding was single-byte. Change both to use casefolding with the default collation. Backpatch through 18, where the casefolding APIs were introduced. The bug exists in earlier versions, but would require some adaptation. A REINDEX is required for ltree indexes where the database default collation is not libc. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Backpatch-through: 18 Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com Discussion: https://postgr.es/m/01fc00fd66f641b9693d4f9f1af0ccf44cbdfbdf.camel@j-davis.com	2025-12-16 12:57:00 -08:00
Jeff Davis	84d5efa7e3	Fix multibyte issue in ltree_strncasecmp(). Previously, the API for ltree_strncasecmp() took two inputs but only one length (that of the smaller input). It truncated the larger input to that length, but that could break a multibyte sequence. Change the API to be a check for prefix equality (possibly case-insensitive) instead, which is all that's needed by the callers. Also, provide the lengths of both inputs. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/5f65b85740197ba6249ea507cddf609f84a6188b.camel%40j-davis.com Backpatch-through: 14	2025-12-16 10:35:40 -08:00
Nathan Bossart	48d4a1423d	Allow passing a pointer to GetNamedDSMSegment()'s init callback. This commit adds a new "void *arg" parameter to GetNamedDSMSegment() that is passed to the initialization callback function. This is useful for reusing an initialization callback function for multiple DSM segments. Author: Zsolt Parragi <zsolt.parragi@percona.com> Reviewed-by: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/CAN4CZFMjh8TrT9ZhWgjVTzBDkYZi2a84BnZ8bM%2BfLPuq7Cirzg%40mail.gmail.com	2025-12-15 14:27:16 -06:00
Michael Paquier	171198ff2a	pageinspect: use index_close() for GiST index relation gist_page_items() opens its target relation with index_open(), but closed it using relation_close() instead of index_close(). This was harmless because index_close() and relation_close() do the exact same work, still inconsistent with the rest of the code tree as routines opening and closing a relation based on a relkind are expected to match, at least in name. Author: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAEoWx2=bL41WWcD-4Fxx-buS2Y2G5=9PjkxZbHeFMR6Uy2WNvw@mail.gmail.com	2025-12-15 10:28:28 +09:00
Peter Eisentraut	493eb0da31	Replace most StaticAssertStmt() with StaticAssertDecl() Similar to commit `75f49221c2`, it is preferable to use StaticAssertDecl() instead of StaticAssertStmt() when possible. Discussion: https://www.postgresql.org/message-id/flat/CA%2BhUKGKvr0x_oGmQTUkx%3DODgSksT2EtgCA6LmGx_jQFG%3DsDUpg%40mail.gmail.com	2025-12-12 10:06:40 +01:00
Michael Paquier	4f7dacc5b8	Use palloc_object() and palloc_array(), the last change This is the last batch of changes that have been suggested by the author, this part covering the non-trivial changes. Some of the changes suggested have been discarded as they seem to lead to more instructions generated, leaving the parts that can be qualified as in-place replacements. Similar work has been done in `1b105f9472`, `0c3c5c3b06` and `31d3847a37`. Author: David Geier <geidav.pg@gmail.com> Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com	2025-12-11 14:29:12 +09:00
Michael Paquier	3f83de20ba	pg_buffercache: Fix memory allocation formula The code over-allocated the memory required for os_page_status, relying on uint64 for its element size instead of an int, hence doubling what was required. This could mean quite a lot of memory if dealing with a lot of NUMA pages. Oversight in `ba2a3c2302`. Author: David Geier <geidav.pg@gmail.com> Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com Backpatch-through: 18	2025-12-11 14:11:06 +09:00
Peter Eisentraut	2268f2b91b	Remove useless casts in format arguments There were a number of useless casts in format arguments, either where the input to the cast was already in the right type, or seemingly uselessly casting between types instead of just using the right format placeholder to begin with. Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/07fa29f9-42d7-4aac-8834-197918cbbab6%40eisentraut.org	2025-12-09 07:33:08 +01:00
Peter Eisentraut	907caf5c39	Clean up int64-related format strings Remove some gratuitous uses of INT64_FORMAT. Make use of PRIu64/PRId64 were appropriate, remove unnecessary casts. Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/07fa29f9-42d7-4aac-8834-197918cbbab6%40eisentraut.org	2025-12-09 07:33:08 +01:00
Tom Lane	0986e95161	Revise APIs for pushJsonbValue() and associated routines. Instead of passing "JsonbParseState **" to pushJsonbValue(), pass a pointer to a JsonbInState, which will contain the parseState stack pointer as well as other useful fields. Also, instead of returning a JsonbValue pointer that is often meaningless/ignored, return the top-level JsonbValue pointer in the "result" field of the JsonbInState. This involves a lot of (mostly mechanical) edits, but I think the results are notationally cleaner and easier to understand. Certainly the business with sometimes capturing the result of pushJsonbValue() and sometimes not was bug-prone and incapable of mechanical verification. In the new arrangement, JsonbInState.result remains null until we've completed a valid sequence of pushes, so that an incorrect sequence will result in a null-pointer dereference, not mistaken use of a partial result. However, this isn't simply an exercise in prettier notation. The real reason for doing it is to provide a mechanism whereby pushJsonbValue() can be told to construct the JsonbValue tree in a context that is not CurrentMemoryContext. That happens when a non-null "outcontext" is specified in the JsonbInState. No callers exercise that option in this patch, but the next patch in the series will make use of it. I tried to improve the comments in this area too. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: jian he <jian.universality@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/1060917.1753202222@sss.pgh.pa.us	2025-12-07 11:51:33 -05:00
Michael Paquier	31d3847a37	Use more palloc_object() and palloc_array() in contrib/ The idea is to encourage more the use of these new routines across the tree, as these offer stronger type safety guarantees than palloc(). In an ideal world, palloc() would then act as an internal routine of these flavors, whose footprint in the tree is minimal. The patch sent by the author is very large, and this chunk of changes represents something like 10% of the overall patch submitted. The code compiled is the same before and after this commit, using objdump to do some validation with a difference taken in-between. There are some diffs, which are caused by changes in line numbers because some of the new allocation formulas are shorter, for the following files: trgm_regexp.c, xpath.c and pg_walinspect.c. Author: David Geier <geidav.pg@gmail.com> Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com	2025-12-05 16:40:26 +09:00
Amit Kapila	5db6a344ab	Rename column slotsync_skip_at to slotsync_last_skip. Commit `76b78721ca` introduced two new columns in pg_stat_replication_slots to improve monitoring of slot synchronization. One of these columns was named slotsync_skip_at, which is inconsistent with the naming convention used for similar columns in other system views. Columns that store timestamps of the most recent event typically use the 'last_' in the column name (e.g., last_autovacuum, checksum_last_failure). Renaming slotsync_skip_at to slotsync_last_skip aligns with this pattern, making the purpose of the column clearer and improving overall consistency across the views. Author: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: Michael Banck <mbanck@gmx.net> Discussion: https://postgr.es/m/20251128091552.GB13635@p46.dedyn.io;lightning.p46.dedyn.io Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com	2025-12-05 04:12:55 +00:00
Peter Eisentraut	e158fd4d68	Remove no longer needed casts from Pointer These casts used to be required when Pointer was char , but now it's void (commit `1b2bb5077e`), so they are not needed anymore. Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org	2025-12-04 20:44:52 +01:00
Peter Eisentraut	c6be3daa05	Remove no longer needed casts to Pointer These casts used to be required when Pointer was char , but now it's void (commit `1b2bb5077e`), so they are not needed anymore. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org	2025-12-04 19:40:08 +01:00
Álvaro Herrera	6bd469d26a	amcheck: Fix snapshot usage in bt_index_parent_check We were using SnapshotAny to do some index checks, but that's wrong and causes spurious errors when used on indexes created by CREATE INDEX CONCURRENTLY. Fix it to use an MVCC snapshot, and add a test for it. This problem came in with commit `5ae2087202`, which introduced uniqueness check. Backpatch to 17. Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com> Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru> Backpatch-through: 17 Discussion: https://postgr.es/m/CANtu0ojmVd27fEhfpST7RG2KZvwkX=dMyKUqg0KM87FkOSdz8Q@mail.gmail.com	2025-12-04 18:12:08 +01:00
Peter Eisentraut	9790affcce	Fix stray references to SubscriptRef This type never existed. SubscriptingRef was meant instead. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/2eaa45e3-efc5-4d75-b082-f8159f51445f%40eisentraut.org	2025-12-03 14:44:14 +01:00
Peter Eisentraut	756a436893	Don't rely on pointer arithmetic with Pointer type The comment for the Pointer type says 'XXX Pointer arithmetic is done with this, so it can't be void * under "true" ANSI compilers.'. This fixes that. Change from Pointer to use char * explicitly where pointer arithmetic is needed. This makes the meaning of the code clearer locally and removes a dependency on the actual definition of the Pointer type. (The definition of the Pointer type is not changed in this commit.) Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org	2025-12-03 09:54:15 +01:00
Peter Eisentraut	623801b3bd	Remove useless casts to Pointer in arguments of memcpy() and memmove() calls Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org	2025-12-03 08:40:33 +01:00
Heikki Linnakangas	cbe04e5d72	Fix amcheck's handling of half-dead B-tree pages amcheck incorrectly reported the following error if there were any half-dead pages in the index: ERROR: mismatch between parent key and child high key in index "amchecktest_id_idx" It's expected that a half-dead page does not have a downlink in the parent level, so skip the test. Reported-by: Konstantin Knizhnik <knizhnik@garret.ru> Reviewed-by: Peter Geoghegan <pg@bowt.ie> Reviewed-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com> Discussion: https://www.postgresql.org/message-id/33e39552-6a2a-46f3-8b34-3f9f8004451f@garret.ru Backpatch-through: 14	2025-12-02 21:11:15 +02:00
Heikki Linnakangas	6c05ef5729	Fix amcheck's handling of incomplete root splits in B-tree When the root page is being split, it's normal that root page according to the metapage is not marked BTP_ROOT. Fix bogus error in amcheck about that case. Reviewed-by: Peter Geoghegan <pg@bowt.ie> Discussion: https://www.postgresql.org/message-id/abd65090-5336-42cc-b768-2bdd66738404@iki.fi Backpatch-through: 14	2025-12-02 21:10:51 +02:00
Peter Eisentraut	4f941d432b	Remove useless casting to same type This removes some casts where the input already has the same type as the type specified by the cast. Their presence could cause risks of hiding actual type mismatches in the future or silently discarding qualifiers. It also improves readability. Same kind of idea as `7f798aca1d` and `ef8fe69360`. (This does not change all such instances, but only those hand-picked by the author.) Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://www.postgresql.org/message-id/flat/aSQy2JawavlVlEB0%40ip-10-97-1-34.eu-west-3.compute.internal	2025-12-02 10:09:32 +01:00
Michael Paquier	713d9a847e	Update some timestamp[tz] functions to use soft-error reporting This commit updates two functions that convert "timestamptz" to "timestamp", and vice-versa, to use the soft error reporting rather than a their own logic to do the same. These are now named as follows: - timestamp2timestamptz_safe() - timestamptz2timestamp_safe() These functions were suffixed with "_opt_overflow", previously. This shaves some code, as it is possible to detect how a timestamp[tz] overflowed based on the returned value rather than a custom state. It is optionally possible for the callers of these functions to rely on the error generated internally by these functions, depending on the error context. Similar work has been done in `d03668ea05` and `4246a977ba`. Reviewed-by: Amul Sul <sulamul@gmail.com> Discussion: https://postgr.es/m/aS09YF2GmVXjAxbJ@paquier.xyz	2025-12-02 09:30:23 +09:00
Michael Paquier	d03668ea05	Switch some date/timestamp functions to use the soft error reporting This commit changes some functions related to the data types date and timestamp to use the soft error reporting rather than a custom boolean flag called "overflow", used to let the callers of these functions know if an overflow happens. This results in the removal of some boilerplate code, as it is possible to rely on an error context rather than a custom state, with the possibility to use the error generated inside the functions updated here, if necessary. These functions were suffixed with "_opt_overflow". They are now renamed to use "_safe" as suffix. This work is similar to `4246a977ba`. Author: Amul Sul <sulamul@gmail.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAAJ_b95HEmFyzHZfsdPquSHeswcopk8MCG1Q_vn4tVkZ+xxofw@mail.gmail.com	2025-12-01 15:22:20 +09:00
Michael Paquier	9ccc049dfe	pg_buffercache: Add pg_buffercache_mark_dirty{,_relation,_all}() This commit introduces three new functions for marking shared buffers as dirty by using the functions introduced in `9660906dbd`: * pg_buffercache_mark_dirty() for one shared buffer. - pg_buffercache_mark_dirt_relation() for all the shared buffers in a relation. * pg_buffercache_mark_dirty_all() for all the shared buffers in pool. The "_all" and "_relation" flavors are designed to address the inefficiency of repeatedly calling pg_buffercache_mark_dirty() for each individual buffer, which can be time-consuming when dealing with with large shared buffers pool. These functions are intended as developer tools and are available only to superusers. There is no need to bump the version of pg_buffercache, `4b203d499c` having done this job in this release cycle. Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Aidar Imamov <a.imamov@postgrespro.ru> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Joseph Koshakow <koshy44@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Yuhang Qiu <iamqyh@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Discussion: https://postgr.es/m/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw@mail.gmail.com	2025-11-28 09:04:04 +09:00
David Rowley	42473b3b31	Have the planner replace COUNT(ANY) with COUNT(), when possible This adds SupportRequestSimplifyAggref to allow pg_proc.prosupport functions to receive an Aggref and allow them to determine if there is a way that the Aggref call can be optimized. Also added is a support function to allow transformation of COUNT(ANY) into COUNT(). This is possible to do when the given "ANY" cannot be NULL and also that there are no ORDER BY / DISTINCT clauses within the Aggref. This is a useful transformation to do as it is common that people write COUNT(1), which until now has added unneeded overhead. When counting a NOT NULL column. The overheads can be worse as that might mean deforming more of the tuple, which for large fact tables may be many columns in. It may be possible to add prosupport functions for other aggregates. We could consider if ORDER BY could be dropped for some calls, e.g. the ORDER BY is quite useless in MAX(c ORDER BY c). There is a little bit of passing fallout from adjusting expr_is_nonnullable() to handle Const which results in a plan change in the aggregates.out regression test. Previously, nothing was able to determine that "One-Time Filter: (100 IS NOT NULL)" was always true, therefore useless to include in the plan. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Discussion: https://postgr.es/m/CAApHDvqGcPTagXpKfH=CrmHBqALpziThJEDs_MrPqjKVeDF9wA@mail.gmail.com	2025-11-27 10:43:28 +13:00
Amit Kapila	76b78721ca	Add slotsync skip statistics. This patch adds two new columns to the pg_stat_replication_slots view: slotsync_skip_count - the total number of times a slotsync operation was skipped. slotsync_skip_at - the timestamp of the most recent skip. These additions provide better visibility into replication slot synchronization behavior. A future patch will introduce the slotsync_skip_reason column in pg_replication_slots to capture the reason for skip. Author: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com	2025-11-25 07:06:02 +00:00
Michael Paquier	4b203d499c	pg_buffercache: Add pg_buffercache_os_pages `ba2a3c2302` has added a way to check if a buffer is spread across multiple pages with some NUMA information, via a new view pg_buffercache_numa that depends on pg_buffercache_numa_pages(), a SQL function. These can only be queried when support for libnuma exists, generating an error if not. However, it can be useful to know how shared buffers and OS pages map when NUMA is not supported or not available. This commit expands the capabilities around pg_buffercache_numa: - pg_buffercache_numa_pages() is refactored as an internal function, able to optionally process NUMA. Its SQL definition prior to this commit is still around to ensure backward-compatibility with v1.6. - A SQL function called pg_buffercache_os_pages() is added, able to work with or without NUMA. - The view pg_buffercache_numa is redefined to use pg_buffercache_os_pages(). - A new view is added, called pg_buffercache_os_pages. This ignores NUMA for its result processing, for a better efficiency. The implementation is done so as there is no code duplication between the NUMA and non-NUMA views/functions, relying on one internal function that does the job for all of them. The module is bumped to v1.7. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/Z/fFA2heH6lpSLlt@ip-10-97-1-34.eu-west-3.compute.internal	2025-11-24 14:29:15 +09:00
Michael Paquier	7d9043aee8	pg_buffercache: Remove unused fields from BufferCacheNumaRec These fields have been added in commit `ba2a3c2302`, and have never been used. While on it, this commit moves a comment that was out of place, improving it. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/aSBOKX6pLJzumbmF@ip-10-97-1-34.eu-west-3.compute.internal	2025-11-23 13:37:42 +09:00
Michael Paquier	694b4ab33b	pg_buffercache: Fix incorrect result cast for relforknumber pg_buffercache_pages.relforknumber is defined as an int2, but its value was stored with ObjectIdGetDatum() rather than Int16GetDatum() in the result record. Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://postgr.es/m/CAExHW5s2_qwSdhKpVnUzjRMf0cf1PvmhUHQDLaFM3QzKbP1OyQ@mail.gmail.com	2025-11-18 15:46:43 +09:00
Andres Freund	c75ebc657f	bufmgr: Allow some buffer state modifications while holding header lock Until now BufferDesc.state was not allowed to be modified while the buffer header spinlock was held. This meant that operations like unpinning buffers needed to use a CAS loop, waiting for the buffer header spinlock to be released before updating. The benefit of that restriction is that it allowed us to unlock the buffer header spinlock with just a write barrier and an unlocked write (instead of a full atomic operation). That was important to avoid regressions in `48354581a4`. However, since then the hottest buffer header spinlock uses have been replaced with atomic operations (in particular, the most common use of PinBuffer_Locked(), in GetVictimBuffer() (formerly in BufferAlloc()), has been removed in `5e89985928`). This change will allow, in a subsequent commit, to release buffer pins with a single atomic-sub operation. This previously was not possible while such operations were not allowed while the buffer header spinlock was held, as an atomic-sub would not have allowed a race-free check for the buffer header lock being held. Using atomic-sub to unpin buffers is a nice scalability win, however it is not the primary motivation for this change (although it would be sufficient). The primary motivation is that we would like to merge the buffer content lock into BufferDesc.state, which will result in more frequent changes of the state variable, which in some situations can cause a performance regression, due to an increased CAS failure rate when unpinning buffers. The regression entirely vanishes when using atomic-sub. Naively implementing this would require putting CAS loops in every place modifying the buffer state while holding the buffer header lock. To avoid that, introduce UnlockBufHdrExt(), which can set/add flags as well as the refcount, together with releasing the lock. Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2025-11-06 16:42:10 -05:00
Álvaro Herrera	a2b02293bc	Use XLogRecPtrIsValid() in various places Now that commit `06edbed478` has introduced XLogRecPtrIsValid(), we can use that instead of: - XLogRecPtrIsInvalid() - direct comparisons with InvalidXLogRecPtr - direct comparisons with literal 0 This makes the code more consistent. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/aQB7EvGqrbZXrMlg@ip-10-97-1-34.eu-west-3.compute.internal	2025-11-06 20:33:57 +01:00
Etsuro Fujita	c3eec94fc1	postgres_fdw: Add more test coverage for EvalPlanQual testing. postgres_fdw supports EvalPlanQual testing by using the infrastructure provided by the core with the RecheckForeignScan callback routine (cf. commits `5fc4c26db` and `385f337c9`), but there has been no test coverage for that, except that recent commit `12609fbac`, which fixed an issue in commit `385f337c9`, added a test case to exercise only a code path added by that commit to the core infrastructure. So let's add test cases to exercise other code paths as well at this time. Like commit `12609fbac`, back-patch to all supported branches. Reported-by: Masahiko Sawada <sawada.mshk@gmail.com> Author: Etsuro Fujita <etsuro.fujita@gmail.com> Discussion: https://postgr.es/m/CAPmGK15%2B6H%3DkDA%3D-y3Y28OAPY7fbAdyMosVofZZ%2BNc769epVTQ%40mail.gmail.com Backpatch-through: 13	2025-11-06 12:15:00 +09:00
David Rowley	6d0eba6627	Use stack allocated StringInfoDatas, where possible Various places that were using StringInfo but didn't need that StringInfo to exist beyond the scope of the function were using makeStringInfo(), which allocates both a StringInfoData and the buffer it uses as two separate allocations. It's more efficient for these cases to use a StringInfoData on the stack and initialize it with initStringInfo(), which only allocates the string buffer. This also simplifies the cleanup, in a few cases. Author: Mats Kindahl <mats.kindahl@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/4379aac8-26f1-42f2-a356-ff0e886228d3@gmail.com	2025-11-06 14:59:48 +13:00
Tom Lane	ff8aba65d4	Fix contrib/ltree's subpath() with negative offset. subpath(ltree,offset,len) now correctly errors when given an offset less than -n, where n is the number of labels in the given ltree. There was a duplicate block of code that allowed an offset as low as -2n. The documentation says no such thing, so this must have been a copy-and-paste error in the original ltree patch. While here, avoid redundant calculation of "end" and write LTREE_MAX_LEVELS rather than its hard-coded value. Author: Marcus Gartner <m.a.gartner@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAAUGV_SvBO9gWYbaejb9nhe-mS9FkNP4QADNTdM3wdRhvLobwA@mail.gmail.com	2025-11-01 13:25:42 -04:00
Peter Eisentraut	8ce795fcb7	Fix some confusing uses of const There are a few places where we have typedef struct FooData { ... } FooData; typedef FooData Foo; and then function declarations with bar(const Foo x) which isn't incorrect but probably meant bar(const FooData x) meaning that the thing x points to is immutable, not x itself. This patch makes those changes where appropriate. In one case (execGrouping.c), the thing being pointed to was not immutable, so in that case remove the const altogether, to avoid further confusion. Co-authored-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/CAEoWx2m2E0xE8Kvbkv31ULh_E%2B5zph-WA_bEdv3UR9CLhw%2B3vg%40mail.gmail.com Discussion: https://www.postgresql.org/message-id/CAEoWx2kTDz%3Db6T2xHX78vy_B_osDeCC5dcTCi9eG0vXHp5QpdQ%40mail.gmail.com	2025-10-30 11:20:04 +01:00
Álvaro Herrera	16edc1b94f	pg_stat_statements: Fix handling of duplicate constant locations Two or more constants can have the same location. We handled this correctly for non squashed constants, but failed to do it if squashed (resulting in out-of-bounds memory access), because the code structure became broken by commit `0f65f3eec4`: we failed to update 'last_loc' correctly when skipping these squashed constants. The simplest fix seems to be to get rid of 'last_loc' altogether -- in hindsight, it's quite pointless. Also, when ignoring a constant because of this, make sure to fulfill fill_in_constant_lengths's duty of setting its length to -1. Lastly, we can use == instead of <= because the locations have been sorted beforehand, so the < case cannot arise. Co-authored-by: Sami Imseih <samimseih@gmail.com> Co-authored-by: Dmitry Dolgov <9erthalion6@gmail.com> Reported-by: Konstantin Knizhnik <knizhnik@garret.ru> Backpatch-through: 18 Discussion: https://www.postgresql.org/message-id/2b91e358-0d99-43f7-be44-d2d4dbce37b3%40garret.ru	2025-10-29 12:35:02 +01:00
Tom Lane	0758111f5d	Update expected output for contrib/sepgsql's regression tests. Commit `65281391a` caused some additional error context lines to appear in the output of one test case. That's fine, but we missed updating the expected output. Do it now. While here, add some missing test-output subdirectories to contrib/sepgsql/.gitignore, so that we don't get git warnings after running the tests. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/1613232.1761255361@sss.pgh.pa.us Backpatch-through: 18	2025-10-23 17:47:10 -04:00
David Rowley	2470ca435c	Use CompactAttribute more often, when possible `5983a4cff` added CompactAttribute for storing commonly used fields from FormData_pg_attribute. `5983a4cff` didn't go to the trouble of adjusting every location where we can use CompactAttribute rather than FormData_pg_attribute, so here we change the remaining ones. There are some locations where I've left the code using FormData_pg_attribute. These are mostly in the ALTER TABLE code. Using CompactAttribute here seems more risky as often the TupleDesc is being changed and those changes may not have been flushed to the CompactAttribute yet. I've also left record_recv(), record_send(), record_cmp(), record_eq() and record_image_eq() alone as it's not clear to me that accessing the CompactAttribute is a win here due to the FormData_pg_attribute still having to be accessed for most cases. Switching the relevant parts to use CompactAttribute would result in having to access both for common cases. Careful benchmarking may reveal that something can be done to make this better, but in absence of that, the safer option is to leave these alone. In ReorderBufferToastReplace(), there was a check to skip attnums < 0 while looping over the TupleDesc. Doing this is redundant since TupleDescs don't store < 0 attnums. Removing that code allows us to move to using CompactAttribute. The change in validateDomainCheckConstraint() just moves fetching the FormData_pg_attribute into the ERROR path, which is cold due to calling errstart_cold() and results in code being moved out of the common path. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAApHDvrMy90o1Lgkt31F82tcSuwRFHq3vyGewSRN=-QuSEEvyQ@mail.gmail.com	2025-10-22 11:36:26 +13:00
Masahiko Sawada	4bea91f21f	Support COPY TO for partitioned tables. Previously, COPY TO command didn't support directly specifying partitioned tables so users had to use COPY (SELECT ...) TO variant. This commit adds direct COPY TO support for partitioned tables, improving both usability and performance. Performance tests show it's faster than the COPY (SELECT ...) TO variant as it avoids the overheads of query processing and sending results to the COPY TO command. When used with partitioned tables, COPY TO copies the same rows as SELECT * FROM table. Row-level security policies of the partitioned table are applied in the same way as when executing COPY TO on a plain table. Author: jian he <jian.universality@gmail.com> Reviewed-by: vignesh C <vignesh21@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Melih Mutlu <m.melihmutlu@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CACJufxEZt%2BG19Ors3bQUq-42-61__C%3Dy5k2wk%3DsHEFRusu7%3DiQ%40mail.gmail.com	2025-10-20 10:38:52 -07:00
Tom Lane	da44d71e79	Allow role created by new test to log in on Windows. We must tell init about each role name we plan to connect as, else SSPI auth fails. Similar to previous patches such as `14793f471`, `973542866`. Oversight in `208927e65`, per buildfarm member drongo. (Although that was back-patched to v13, the test script only exists in v16 and up.)	2025-10-18 18:36:21 -04:00
Nathan Bossart	208927e656	Fix privilege checks for pg_prewarm() on indexes. pg_prewarm() currently checks for SELECT privileges on the target relation. However, indexes do not have access rights of their own, so a role may be denied permission to prewarm an index despite having the SELECT privilege on its parent table. This commit fixes this by locking the parent table before the index (to avoid deadlocks) and checking for SELECT on the parent table. Note that the code is largely borrowed from amcheck_lock_relation_and_check(). An obvious downside of this change is the extra AccessShareLock on the parent table during prewarming, but that isn't expected to cause too much trouble in practice. Author: Ayush Vatsa <ayushvatsa1810@gmail.com> Co-authored-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Jeff Davis <pgsql@j-davis.com> Discussion: https://postgr.es/m/CACX%2BKaMz2ZoOojh0nQ6QNBYx8Ak1Dkoko%3DD4FSb80BYW%2Bo8CHQ%40mail.gmail.com Backpatch-through: 13	2025-10-17 11:36:50 -05:00
Michael Paquier	fabb33b351	Improve TAP tests by replacing ok() with better Test::More functions The TAP tests whose ok() calls are changed in this commit were relying on perl operators, rather than equivalents available in Test::More. For example, rather than the following: ok($data =~ qr/expr/m, "expr matching"); ok($data !~ qr/expr/m, "expr not matching"); The new test code uses this equivalent: like($data, qr/expr/m, "expr matching"); unlike($data, qr/expr/m, "expr not matching"); A huge benefit of the new formulation is that it is possible to know about the values we are checking if a failure happens, making debugging easier, should the test runs happen in the buildfarm, in the CI or locally. This change leads to more test code overall as perltidy likes to make the code pretty the way it is in this commit. Author: Sadhuprasad Patro <b.sadhu@gmail.com> Discussion: https://postgr.es/m/CAFF0-CHhwNx_Cv2uy7tKjODUbeOgPrJpW4Rpf1jqB16_1bU2sg@mail.gmail.com	2025-10-17 14:39:09 +09:00
Etsuro Fujita	12609fbacb	Fix EvalPlanQual handling of foreign/custom joins in ExecScanFetch. If inside an EPQ recheck, ExecScanFetch would run the recheck method function for foreign/custom joins even if they aren't descendant nodes in the EPQ recheck plan tree, which is problematic at least in the foreign-join case, because such a foreign join isn't guaranteed to have an alternative local-join plan required for running the recheck method function; in the postgres_fdw case this could lead to a segmentation fault or an assert failure in an assert-enabled build when running the recheck method function. Even if inside an EPQ recheck, any scan nodes that aren't descendant ones in the EPQ recheck plan tree should be normally processed by using the access method function; fix by modifying ExecScanFetch so that if inside an EPQ recheck, it runs the recheck method function for foreign/custom joins that are descendant nodes in the EPQ recheck plan tree as before and runs the access method function for foreign/custom joins that aren't. This fix also adds to postgres_fdw an isolation test for an EPQ recheck that caused issues stated above. Oversight in commit `385f337c9`. Reported-by: Kristian Lejao <kristianlejao@gmail.com> Author: Masahiko Sawada <sawada.mshk@gmail.com> Co-authored-by: Etsuro Fujita <etsuro.fujita@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Etsuro Fujita <etsuro.fujita@gmail.com> Discussion: https://postgr.es/m/CAD21AoBpo6Gx55FBOW+9s5X=nUw3Xpq64v35fpDEKsTERnc4TQ@mail.gmail.com Backpatch-through: 13	2025-10-15 17:15:00 +09:00
Nathan Bossart	c9b299f6df	dblink: Avoid locking relation before privilege check. The present coding of dblink's get_rel_from_relname() predates the introduction of RangeVarGetRelidExtended(), which provides a way to check permissions before locking the relation. This commit adjusts get_rel_from_relname() to use that function. Reviewed-by: Jeff Davis <pgsql@j-davis.com> Discussion: https://postgr.es/m/aOgmi6avE6qMw_6t%40nathan	2025-10-14 12:20:48 -05:00
Masahiko Sawada	d3b6183dd9	Add mem_exceeded_count column to pg_stat_replication_slots. This commit introduces a new column mem_exceeded_count to the pg_stat_replication_slots view. This counter tracks how often the memory used by logical decoding exceeds the logical_decoding_work_mem limit. The new statistic helps users determine whether exceeding the logical_decoding_work_mem limit is a rare occurrences or a frequent issue, information that wasn't available through existing statistics. Bumps catversion. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/978D21E8-9D3B-40EA-A4B1-F87BABE7868C@yesql.se	2025-10-08 10:05:04 -07:00
Robert Haas	c83ac02ec7	Add ExplainState argument to pg_plan_query() and planner(). This allows extensions to have access to any data they've stored in the ExplainState during planning. Unfortunately, it won't help with EXPLAIN EXECUTE is used, but since that case is less common, this still seems like an improvement. Since planner() has quite a few arguments now, also add some documentation of those arguments and the return value. Author: Robert Haas <rhaas@postgresql.org> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com	2025-10-08 08:33:29 -04:00
Richard Guo	8e11859102	Implement Eager Aggregation Eager aggregation is a query optimization technique that partially pushes aggregation past a join, and finalizes it once all the relations are joined. Eager aggregation may reduce the number of input rows to the join and thus could result in a better overall plan. In the current planner architecture, the separation between the scan/join planning phase and the post-scan/join phase means that aggregation steps are not visible when constructing the join tree, limiting the planner's ability to exploit aggregation-aware optimizations. To implement eager aggregation, we collect information about aggregate functions in the targetlist and HAVING clause, along with grouping expressions from the GROUP BY clause, and store it in the PlannerInfo node. During the scan/join planning phase, this information is used to evaluate each base or join relation to determine whether eager aggregation can be applied. If applicable, we create a separate RelOptInfo, referred to as a grouped relation, to represent the partially-aggregated version of the relation and generate grouped paths for it. Grouped relation paths can be generated in two ways. The first method involves adding sorted and hashed partial aggregation paths on top of the non-grouped paths. To limit planning time, we only consider the cheapest or suitably-sorted non-grouped paths in this step. Alternatively, grouped paths can be generated by joining a grouped relation with a non-grouped relation. Joining two grouped relations is currently not supported. To further limit planning time, we currently adopt a strategy where partial aggregation is pushed only to the lowest feasible level in the join tree where it provides a significant reduction in row count. This strategy also helps ensure that all grouped paths for the same grouped relation produce the same set of rows, which is important to support a fundamental assumption of the planner. For the partial aggregation that is pushed down to a non-aggregated relation, we need to consider all expressions from this relation that are involved in upper join clauses and include them in the grouping keys, using compatible operators. This is essential to ensure that an aggregated row from the partial aggregation matches the other side of the join if and only if each row in the partial group does. This ensures that all rows within the same partial group share the same "destiny", which is crucial for maintaining correctness. One restriction is that we cannot push partial aggregation down to a relation that is in the nullable side of an outer join, because the NULL-extended rows produced by the outer join would not be available when we perform the partial aggregation, while with a non-eager-aggregation plan these rows are available for the top-level aggregation. Pushing partial aggregation in this case may result in the rows being grouped differently than expected, or produce incorrect values from the aggregate functions. If we have generated a grouped relation for the topmost join relation, we finalize its paths at the end. The final paths will compete in the usual way with paths built from regular planning. The patch was originally proposed by Antonin Houska in 2017. This commit reworks various important aspects and rewrites most of the current code. However, the original patch and reviews were very useful. Author: Richard Guo <guofenglinux@gmail.com> Author: Antonin Houska <ah@cybertec.at> (in an older version) Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> (in an older version) Reviewed-by: Andy Fan <zhihuifan1213@163.com> (in an older version) Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> (in an older version) Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com	2025-10-08 17:04:23 +09:00
Robert Haas	8c49a484e8	Assign each subquery a unique name prior to planning it. Previously, subqueries were given names only after they were planned, which makes it difficult to use information from a previous execution of the query to guide future planning. If, for example, you knew something about how you want "InitPlan 2" to be planned, you won't know whether the subquery you're currently planning will end up being "InitPlan 2" until after you've finished planning it, by which point it's too late to use the information that you had. To fix this, assign each subplan a unique name before we begin planning it. To improve consistency, use textual names for all subplans, rather than, as we did previously, a mix of numbers (such as "InitPlan 1") and names (such as "CTE foo"), and make sure that the same name is never assigned more than once. We adopt the somewhat arbitrary convention of using the type of sublink to set the plan name; for example, a query that previously had two expression sublinks shown as InitPlan 2 and InitPlan 1 will now end up named expr_1 and expr_2. Because names are assigned before rather than after planning, some of the regression test outputs show the numerical part of the name switching positions: what was previously SubPlan 2 was actually the first one encountered, but we finished planning it later. We assign names even to subqueries that aren't shown as such within the EXPLAIN output. These include subqueries that are a FROM clause item or a branch of a set operation, rather than something that will be turned into an InitPlan or SubPlan. The purpose of this is to make sure that, below the topmost query level, there's always a name for each subquery that is stable from one planning cycle to the next (assuming no changes to the query or the database schema). Author: Robert Haas <rhaas@postgresql.org> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: http://postgr.es/m/3641043.1758751399@sss.pgh.pa.us	2025-10-07 09:18:54 -04:00
Álvaro Herrera	1a8b5b11e4	Don't include access/htup_details.h in executor/tuptable.h This is not at all needed; I suspect it was a simple mistake in commit `5408e233f0`. It causes htup_details.h to bleed into a huge number of places via execnodes.h. Remove it and fix fallout. Discussion: https://postgr.es/m/202510021240.ptc2zl5cvwen@alvherre.pgsql	2025-10-05 18:00:38 +02:00
Michael Paquier	684a745f55	pgstattuple: Improve reports generated for indexes (hash, gist, btree) pgstattuple checks the state of the pages retrieved for gist and hash using some check functions from each index AM, respectively gistcheckpage() and _hash_checkpage(). When these are called, they would fail when bumping on data that is found as incorrect (like opaque area size not matching, or empty pages), contrary to btree that simply discards these cases and continues to aggregate data. Zero pages can happen after a crash, with these AMs being able to do an internal cleanup when these are seen. Also, sporadic failures are annoying when doing for example a large-scale diagnostic query based on pgstattuple with a join of pg_class, as it forces one to use tricks like quals to discard hash or gist indexes, or use a PL wrapper able to catch errors. This commit changes the reports generated for btree, gist and hash to be more user-friendly; - When seeing an empty page, report it as free space. This new rule applies to gist and hash, and already applied to btree. - For btree, a check based on the size of BTPageOpaqueData is added. - For gist indexes, gistcheckpage() is not called anymore, replaced by a check based on the size of GISTPageOpaqueData. - For hash indexes, instead of _hash_getbuf_with_strategy(), use a direct call to ReadBufferExtended(), coupled with a check based on HashPageOpaqueData. The opaque area size check was already used. - Pages that do not match these criterias are discarded from the stats reports generated. There have been a couple of bug reports over the years that complained about the current behavior for hash and gist, as being not that useful, with nothing being done about it. Hence this change is backpatched down to v13. Reported-by: Noah Misch <noah@leadboat.com> Author: Nitin Motiani <nitinmotiani@google.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Discussion: https://postgr.es/m/CAH5HC95gT1J3dRYK4qEnaywG8RqjbwDdt04wuj8p39R=HukayA@mail.gmail.com Backpatch-through: 13	2025-10-02 11:07:30 +09:00
Peter Eisentraut	f5aabe6d58	Revert "Make some use of anonymous unions [pgcrypto]" This reverts commit `efcd5199d8`. I rebased my patch series incorrectly. This patch contained unrelated parts from another patch, which made the overall build fail. Revert for now and reconsider.	2025-09-30 13:12:16 +02:00
Peter Eisentraut	efcd5199d8	Make some use of anonymous unions [pgcrypto] Make some use of anonymous unions, which are allowed as of C11, as examples and encouragement for future code, and to test compilers. This commit changes some structures in pgcrypto. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/f00a9968-388e-4f8c-b5ef-5102e962d997%40eisentraut.org	2025-09-30 12:35:50 +02:00
Peter Eisentraut	57d46dff9b	Make some use of anonymous unions [reorderbuffer xact_time] Make some use of anonymous unions, which are allowed as of C11, as examples and encouragement for future code, and to test compilers. This commit changes the ReorderBufferTXN struct. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/f00a9968-388e-4f8c-b5ef-5102e962d997%40eisentraut.org	2025-09-30 12:35:50 +02:00
Álvaro Herrera	3bf31dd243	Do a tiny bit of header file maintenance Stop including utils/relcache.h in access/genam.h, and stop including htup_details.h in nodes/tidbitmap.h. Both these files (genam.h and tidbitmap.h) are widely used in other header files, so it's in our best interest that they remain as lean as reasonable. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/202509291356.o5t6ny2hoa3q@alvherre.pgsql	2025-09-30 12:28:29 +02:00
Robert Haas	f2bae51dfd	Keep track of what RTIs a Result node is scanning. Result nodes now include an RTI set, which is only non-NULL when they have no subplan, and is taken from the relid set of the RelOptInfo that the Result is generating. ExplainPreScanNode now takes notice of these RTIs, which means that a few things get schema-qualified in the regression tests that previously did not. This makes the output more consistent between cases where some part of the plan tree is replaced by a Result node and those where this does not happen. Likewise, pg_overexplain's EXPLAIN (RANGE_TABLE) now displays the RTIs stored in a Result node just as it already does for other RTI-bearing node types. Result nodes also now include a result_reason, which tells us something about why the Result node was inserted. Using that information, EXPLAIN now emits, where relevant, a "Replaces" line describing the origin of a Result node. The purpose of these changes is to allow code that inspects a Plan tree to understand the origin of Result nodes that appear therein. Discussion: http://postgr.es/m/CA+TgmoYeUZePZWLsSO+1FAN7UPePT_RMEZBKkqYBJVCF1s60=w@mail.gmail.com Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>	2025-09-23 09:07:55 -04:00
David Rowley	9fc7f6ab72	Fix various incorrect filename references Author: Chao Li <li.evan.chao@gmail.com> Author: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAEoWx2=hOBCPm-Z=F15twr_23XjHeoXSbifP5GdEdtWona97wQ@mail.gmail.com	2025-09-22 13:33:17 +12:00
Tom Lane	261f89a976	Track the maximum possible frequency of non-MCE array elements. The lossy-counting algorithm that ANALYZE uses to identify most-common array elements has a notion of cutoff frequency: elements with frequency greater than that are guaranteed to be collected, elements with smaller frequencies are not. In cases where we find fewer MCEs than the stats target would permit us to store, the cutoff frequency provides valuable additional information, to wit that there are no non-MCEs with frequency greater than that. What the selectivity estimation functions actually use the "minfreq" entry for is as a ceiling on the possible frequency of non-MCEs, so using the cutoff rather than the lowest stored MCE frequency provides a tighter bound and more accurate estimates. Therefore, instead of redundantly storing the minimum observed MCE frequency, store the cutoff frequency when there are fewer tracked values than we want. (When there are more, then of course we cannot assert that no non-stored elements are above the cutoff frequency, since we're throwing away some that are; so we still use the minimum stored frequency in that case.) Notably, this works even when none of the values are common enough to be called MCEs. In such cases we previously stored nothing in the STATISTIC_KIND_MCELEM pg_statistic slot, which resulted in the selectivity functions falling back to default estimates. So in that case we want to construct a STATISTIC_KIND_MCELEM entry that contains no "values" but does have "numbers", to wit the three extra numbers that the MCELEM entry type defines. A small obstacle is that update_attstats() has traditionally stored a null, not an empty array, when passed zero "values" for a slot. That gives rise to an MCELEM entry that get_attstatsslot() will spit up on. The least risky solution seems to be to adjust update_attstats() so that it will emit a non-null (but possibly empty) array when the passed stavalues array pointer isn't NULL, rather than conditioning that on numvalues > 0. In other existing cases I don't believe that that changes anything. For consistency, handle the stanumbers array the same way. In passing, improve the comments in routines that use STATISTIC_KIND_MCELEM data. Particularly, explain why we use minfreq / 2 not minfreq as the estimate for non-MCE values. Thanks to Matt Long for the suggestion that we could apply this idea even when there are more than zero MCEs. Reported-by: Mark Frost <FROSTMAR@uk.ibm.com> Reported-by: Matt Long <matt@mattlong.org> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/PH3PPF1C905D6E6F24A5C1A1A1D8345B593E16FA@PH3PPF1C905D6E6.namprd15.prod.outlook.com	2025-09-20 14:48:16 -04:00

1 2 3 4 5 ...

5148 commits