Commit graph

5148 commits

Author SHA1 Message Date
Michael Paquier
3c5ec35dea oid2name: Add relation path to the information provided by -x/--extended
This affects two command patterns, showing information about relations:
* oid2name -x -d DBNAME, applying to all relations on a database.
* oid2name -x -d DBNAME -t TABNAME [-t ..], applying to a subset of
defined relations on a database.

The relative path of a relation is added to the information provided,
using pg_relation_filepath().

Author: David Bidoc <dcbidoc@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Guillaume Lelarge <guillaume.lelarge@dalibo.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Mark Wong <markwkm@gmail.com>
Discussion: https://postgr.es/m/CABour1v2CU1wjjoM86wAFyezJQ3-+ncH43zY1f1uXeVojVN8Ow@mail.gmail.com
2026-02-05 09:02:12 +09:00
John Naylor
176dffdf7d Fix various instances of undefined behavior
Mostly this involves checking for NULL pointer before doing operations
that add a non-zero offset.

The exception is an overflow warning in heap_fetch_toast_slice(). This
was caused by unneeded parentheses forcing an expression to be
evaluated to a negative integer, which then got cast to size_t.

Per clang 21 undefined behavior sanitizer.

Backpatch to all supported versions.

Co-authored-by: Alexander Lakhin <exclusion@gmail.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/777bd201-6e3a-4da0-a922-4ea9de46a3ee@gmail.com
Backpatch-through: 14
2026-02-04 18:09:35 +07:00
Peter Eisentraut
955e507668 Change StaticAssertVariableIsOfType to be a declaration
This allows moving the uses to more natural and useful positions.
Also, a declaration is the more native use of static assertions in C.

Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/2273bc2a-045d-4a75-8584-7cd9396e5534%40eisentraut.org
2026-02-03 08:46:02 +01:00
Peter Eisentraut
137d05df2f Rename AssertVariableIsOfType to StaticAssertVariableIsOfType
This keeps run-time assertions and static assertions clearly separate.

Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/2273bc2a-045d-4a75-8584-7cd9396e5534%40eisentraut.org
2026-02-03 08:45:24 +01:00
Melanie Plageman
4a99ef1a0d Fix flakiness in the pg_visibility VM-only vacuum test by using a temporary table.
The test relies on VACUUM being able to mark a page all-visible, but
this can fail when autovacuum in other sessions prevents the visibility
horizon from advancing. Making the test table temporary isolates its
horizon from other sessions, including catalog table vacuums, ensuring
reliable test behavior.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/2b09fba6-6b71-497a-96ef-a6947fcc39f6%40gmail.com
2026-02-02 17:45:27 -05:00
Tom Lane
da7a1dc0d6 Refactor att_align_nominal() to improve performance.
Separate att_align_nominal() into two macros, similarly to what
was already done with att_align_datum() and att_align_pointer().
The inner macro att_nominal_alignby() is really just TYPEALIGN(),
while att_align_nominal() retains its previous API by mapping
TYPALIGN_xxx values to numbers of bytes to align to and then
calling att_nominal_alignby().  In support of this, split out
tupdesc.c's logic to do that mapping into a publicly visible
function typalign_to_alignby().

Having done that, we can replace performance-critical uses of
att_align_nominal() with att_nominal_alignby(), where the
typalign_to_alignby() mapping is done just once outside the loop.

In most places I settled for doing typalign_to_alignby() once
per function.  We could in many places pass the alignby value
in from the caller if we wanted to change function APIs for this
purpose; but I'm a bit loath to do that, especially for exported
APIs that extensions might call.  Replacing a char typalign
argument by a uint8 typalignby argument would be an API change
that compilers would fail to warn about, thus silently breaking
code in hard-to-debug ways.  I did revise the APIs of array_iter_setup
and array_iter_next, moving the element type attribute arguments to
the former; if any external code uses those, the argument-count
change will cause visible compile failures.

Performance testing shows that ExecEvalScalarArrayOp is sped up by
about 10% by this change, when using a simple per-element function
such as int8eq.  I did not check any of the other loops optimized
here, but it's reasonable to expect similar gains.

Although the motivation for creating this patch was to avoid a
performance loss if we add some more typalign values, it evidently
is worth doing whether that patch lands or not.

Discussion: https://postgr.es/m/1127261.1769649624@sss.pgh.pa.us
2026-02-02 14:39:50 -05:00
Masahiko Sawada
1fdbca159e Standardize replication origin naming to use "ReplOrigin".
The replication origin code was using inconsistent naming
conventions. Functions were typically prefixed with 'replorigin',
while typedefs and constants used "RepOrigin".

This commit unifies the naming convention by renaming RepOriginId to
ReplOriginId.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAD21AoBDgm3hDqUZ+nqu=ViHmkCnJBuJyaxG_yvv27BAi2zBmQ@mail.gmail.com
2026-01-28 11:03:29 -08:00
Tomas Vondra
09c37015d4 pgindent fix for 3fccbd94cb
Backpatch-through: 18
2026-01-27 00:26:36 +01:00
Tomas Vondra
3fccbd94cb Handle ENOENT status when querying NUMA node
We've assumed that touching the memory is sufficient for a page to be
located on one of the NUMA nodes. But a page may be moved to a swap
after we touch it, due to memory pressure.

We touch the memory before querying the status, but there is no
guarantee it won't be moved to the swap in the meantime. The touching
happens only on the first call, so later calls are more likely to be
affected. And the batching increases the window too.

It's up to the kernel if/when pages get moved to swap. We have to accept
ENOENT (-2) as a valid result, and handle it without failing. This patch
simply treats it as an unknown node, and returns NULL in the two
affected views (pg_shmem_allocations_numa and pg_buffercache_numa).

Hugepages cannot be swapped out, so this affects only regular pages.

Reported by Christoph Berg, investigation and fix by me. Backpatch to
18, where the two views were introduced.

Reported-by: Christoph Berg <myon@debian.org>
Discussion: 18
Backpatch-through: https://postgr.es/m/aTq5Gt_n-oS_QSpL@msg.df7cb.de
2026-01-27 00:21:40 +01:00
Melanie Plageman
21796c267d Combine visibilitymap_set() cases in lazy_scan_prune()
lazy_scan_prune() previously had two separate cases that called
visibilitymap_set() after pruning and freezing. These branches were
nearly identical except that one attempted to avoid dirtying the heap
buffer. However, that situation can never occur — the heap buffer cannot
be clean at that point (and we would hit an assertion if it were).

In lazy_scan_prune(), when we change a previously all-visible page to
all-frozen and the page was recorded as all-visible in the visibility
map by find_next_unskippable_block(), the heap buffer will always be
dirty. Either we have just frozen a tuple and already dirtied the
buffer, or the buffer was modified between find_next_unskippable_block()
and heap_page_prune_and_freeze() and then pruned in
heap_page_prune_and_freeze().

Additionally, XLogRegisterBuffer() asserts that the buffer is dirty, so
attempting to add a clean heap buffer to the WAL chain would assert out
anyway.

Since the “clean heap buffer with already set VM” case is impossible,
the two visibilitymap_set() branches in lazy_scan_prune() can be merged.
Doing so makes the intent clearer and emphasizes that the heap buffer
must always be marked dirty before being added to the WAL chain.

This commit also adds a test case for vacuuming when no heap
modifications are required. Currently this ensures that the heap buffer
is marked dirty before it is added to the WAL chain, but if we later
remove the heap buffer from the VM-set WAL chain or pass it with the
REGBUF_NO_CHANGES flag, this test would guard that behavior.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
Discussion: https://postgr.es/m/flat/CAAKRu_ZWx5gCbeCf7PWCv8p5%3D%3Db7EEws0VD2wksDxpXCvCyHvQ%40mail.gmail.com
2026-01-26 16:03:32 -05:00
Peter Eisentraut
5ca5f12c2c Fix accidentally cast away qualifiers
This fixes cases where a qualifier (const, in all cases here) was
dropped by a cast, but the cast was otherwise necessary or desirable,
so the straightforward fix is to add the qualifier into the cast.

Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/b04f4d3a-5e70-4e73-9ef2-87f777ca4aac%40eisentraut.org
2026-01-26 16:02:31 +01:00
Michael Paquier
72e3abd082 pg_stat_statements: Fix test instability with cache-clobbering builds
Builds with CLOBBER_CACHE_ALWAYS enabled are failing the new test
introduced in 1572ea96e6, checking the nesting level calculation in
the planner hook.  The inner query of the function called twice is
registered as normalized, as such builds would register a PGSS entry in
the post-parse-analyse hook due to the cached plans requiring
revalidation.

A trick based on debug_discard_caches cannot work as far as I can, a
normalized query still being registered.  This commit takes a different
approach with the addition of a DISCARD PLANS before the first function
call.  This forces the use of a normalized query in the PGSS entry for
the inner query of the function with and without CLOBBER_CACHE_ALWAYS,
which should be enough to stabilize the test.  Note that the test is
still checking what it should: when removing the nesting level
calculation in the planner hook of PGSS, one still gets a failure for
the PGSS entry of the inner query in the function, with "toplevel" being
flipped to true instead of false (it should be false, as a non-top-level
entry).

Per buildfarm members avocet and trilobite, at least.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/82dd02bb-4e0f-40ad-a60b-baa1763ff0bd@gmail.com
2026-01-25 19:01:23 +09:00
Amit Langote
f9a468c664 Fix bogus ctid requirement for dummy-root partitioned targets
ExecInitModifyTable() unconditionally required a ctid junk column even
when the target was a partitioned table. This led to spurious "could
not find junk ctid column" errors when all children were excluded and
only the dummy root result relation remained.

A partitioned table only appears in the result relations list when all
leaf partitions have been pruned, leaving the dummy root as the sole
entry. Assert this invariant (nrels == 1) and skip the ctid requirement.
Also adjust ExecModifyTable() to tolerate invalid ri_RowIdAttNo for
partitioned tables, which is safe since no rows will be processed in
this case.

Bug: #19099
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19099-e05dcfa022fe553d%40postgresql.org
Backpatch-through: 14
2026-01-23 10:23:30 +09:00
Peter Eisentraut
a5b40d156e Mark commented out code as unused
There were many PG_GETARG_* calls, mostly around gin, gist, spgist
code, that were commented out, presumably to indicate that the
argument was unused and to indicate that it wasn't forgotten or
miscounted.  But keeping commented-out code updated with refactorings
and style changes is annoying.  So this commit changes them to

    #ifdef NOT_USED

blocks, which is a style already in use.  That way, at least the
indentation and syntax highlighting works correctly, making some of
these blocks much easier to read.

An alternative would be to just delete that code, but there is some
value in making unused arguments explicit, and some of this arguably
serves as example code for index AM APIs.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: David Geier <geidav.pg@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/328e4371-9a4c-4196-9df9-1f23afc900df%40eisentraut.org
2026-01-22 12:44:07 +01:00
Fujii Masao
26cb14aea1 file_fdw: Support multi-line HEADER option.
Commit bc2f348 introduced multi-line HEADER support for COPY. This commit
extends this capability to file_fdw, allowing multiple header lines to be
skipped.

Because CREATE/ALTER FOREIGN TABLE requires option values to be single-quoted,
this commit also updates defGetCopyHeaderOption() to accept integer values
specified as strings for HEADER option.

Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: songjinzhou <tsinghualucky912@foxmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAOzEurT+iwC47VHPMS+uJ4WSzvOLPsZ2F2_wopm8M7O+CZa3Xw@mail.gmail.com
2026-01-22 10:14:12 +09:00
Tom Lane
4576208454 Force standard_conforming_strings to always be ON.
Continuing to support this backwards-compatibility feature has
nontrivial costs; in particular it is potentially a security hazard
if an application somehow gets confused about which setting the
server is using.  We changed the default to ON fifteen years ago,
which seems like enough time for applications to have adapted.
Let's remove support for the legacy string syntax.

We should not remove the GUC altogether, since client-side code will
still test it, pg_dump scripts will attempt to set it to ON, etc.
Instead, just prevent it from being set to OFF.  There is precedent
for this approach (see commit de66987ad).

This patch does remove the related GUC escape_string_warning, however.
That setting does nothing when standard_conforming_strings is on,
so it's now useless.  We could leave it in place as a do-nothing
setting to avoid breaking clients that still set it, if there are any.
But it seems likely that any such client is also trying to turn off
standard_conforming_strings, so it'll need work anyway.

The client-side changes in this patch are pretty minimal, because even
though we are dropping the server's support, most of our clients still
need to be able to talk to older server versions.  We could remove
dead client code only once we disclaim compatibility with pre-v19
servers, which is surely years away.  One change of note is that
pg_dump/pg_dumpall now set standard_conforming_strings = on in their
source session, rather than accepting the source server's default.
This ensures that literals in view definitions and such will be
printed in a way that's acceptable to v19+.  In particular,
pg_upgrade will work transparently even if the source installation has
standard_conforming_strings = off.  (However, pg_restore will behave
the same as before if given an archive file containing
standard_conforming_strings = off.  Such an archive will not be safely
restorable into v19+, but we shouldn't break the ability to extract
valid data from it for use with an older server.)

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3279216.1767072538@sss.pgh.pa.us
2026-01-21 15:08:38 -05:00
Álvaro Herrera
1f28982e40
amcheck: Fix snapshot usage in bt_index_parent_check
We were using SnapshotAny to do some index checks, but that's wrong and
causes spurious errors when used on indexes created by CREATE INDEX
CONCURRENTLY.  Fix it to use an MVCC snapshot, and add a test for it.

Backpatch of 6bd469d26a to branches 14-16.  I previously misidentified
the bug's origin: it came in with commit 7f563c09f8 (pg11-era, not
5ae2087202 as claimed previously), so all live branches are affected.

Also take the opportunity to fix some comments that we failed to update
in the original commits and apply pgperltidy.  In branch 14, remove the
unnecessary test plan specification (which would have need to have been
changed anyway; c.f. commit 549ec201d613.)

Diagnosed-by: Donghang Lin <donghanglin@gmail.com>
Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Backpatch-through: 17
Discussion: https://postgr.es/m/CANtu0ojmVd27fEhfpST7RG2KZvwkX=dMyKUqg0KM87FkOSdz8Q@mail.gmail.com
2026-01-21 18:55:43 +01:00
Michael Paquier
1572ea96e6 pg_stat_statements: Add more tests for level tracking
This commit adds tests to verify the computation of the nesting level
for two code paths: the planner hook and the ExecutorFinish() hook.  The
nesting level is essential to save a correct "toplevel" status for the
added PGSS entries.

The author has noticed that removing the manipulations of nesting_level
in these two code paths did not cause the tests to complain, meaning
that we never had coverage for the assumptions taken by the code.

Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0uK1PSrgf52bWCtDpzaqbWt04o6ZA7zBm6UQyv7vyvf9w@mail.gmail.com
2026-01-21 18:18:15 +09:00
Michael Paquier
905ef401d5 pg_stat_statements: Clean up REGRESS list in Makefile
The "wal" and "entry_timestamp" items were still on the same line, which
was not intentional.

Thinko in f9afd56218.

Reported-by: Man Zeng <zengman@halodbtech.com>
Discussion: https://postgr.es/m/aW6_Xc8auuu5iAPi@paquier.xyz
2026-01-21 11:29:34 +09:00
Michael Paquier
f9afd56218 pg_stat_statements: Rework test order
The test "squashing" was the last item of the REGRESS list, but
"cleanup" should be the second to last, dropping the extension.
"oldextversions" is the last item.

In passing, the REGRESS list is cleaned up to include one item per line,
so as diffs are minimized when adding new test files.

Noticed while playing with this area of the code.

Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Man Zeng <zengman@halodbtech.com>
Discussion: https://postgr.es/m/aW6_Xc8auuu5iAPi@paquier.xyz
2026-01-21 07:47:38 +09:00
Michael Paquier
5d95219faa pg_stat_statements: Fix crash in list squashing with Vars
When IN/ANY clauses contain both constants and variable expressions, the
optimizer transforms them into separate structures: constants become
an array expression while variables become individual OR conditions.

This transformation was creating an overlap with the token locations,
causing pg_stat_statements query normalization to crash because it
could not calculate the amount of bytes remaining to write for the
normalized query.

This commit disables squashing for mixed IN list expressions when
constructing a scalar array op, by setting list_start and list_end
to -1 when both variables and non-variables are present.  Some
regression tests are added to PGSS to verify these patterns.

Author: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0ts9qiONnHjjHxPxtePs22GBo4d3jZ_s2BQC59AN7XbAA@mail.gmail.com
Backpatch-through: 18
2026-01-20 08:11:12 +09:00
Andres Freund
dac328c8a6 bufmgr: Change BufferDesc.state to be a 64-bit atomic
This is motivated by wanting to merge buffer content locks into
BufferDesc.state in a future commit, rather than having a separate lwlock (see
commit c75ebc657f for more details). As this change is rather mechanical, it
seems to make sense to split it out into a separate commit, for easier review.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
2026-01-15 14:20:41 -05:00
Álvaro Herrera
35e3fae738
Remove #include <math.h> where not needed
Liujinyang reported the one in binaryheap.c, I then found and analyzed
the rest.

For future patches, we require git archaelogical analysis before we
accept patches of this nature.

Co-authored-by: liujinyang <21043272@qq.com>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/tencent_6B302BFCAF6F010E00AB5C2C0ECB7AA3F205@qq.com
2026-01-15 19:09:47 +01:00
Michael Paquier
6dcfac9696 Use more consistent *GetDatum() macros for some unsigned numbers
This patch switches some code paths to use GetDatum() macros more in
line with the data types of the variables they manipulate.  This set of
changes does not fix a problem, but it is always nice to be more
consistent across the board.

Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Roman Khapov <rkhapov@yandex-team.ru>
Reviewed-by: Yuan Li <carol.li2025@outlook.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Man Zeng <zengman@halodbtech.com>
Discussion: https://postgr.es/m/CALdSSPidtC7j3MwhkqRj0K2hyp36ztnnjSt6qzGxQtiePR1dzw@mail.gmail.com
2026-01-14 17:07:49 +09:00
Amit Kapila
e385a4e2fd Prevent unintended dropping of active replication origins.
Commit 5b148706c5 exposed functionality that allows multiple processes to
use the same replication origin, enabling non-builtin logical replication
solutions to implement parallel apply for large transactions.

With this functionality, if two backends acquire the same replication
origin and one of them resets it first, the acquired_by flag is cleared
without acknowledging that another backend is still actively using the
origin. This can lead to the origin being unintentionally dropped. If the
shared memory for that dropped origin is later reused for a newly created
origin, the remaining backend that still holds a pointer to the old memory
may inadvertently advance the LSN of a completely different origin,
causing unpredictable behavior.

Although the underlying issue predates commit 5b148706c5, it did not
surface earlier because the internal parallel apply worker mechanism
correctly coordinated origin resets and drops.

This commit resolves the problem by introducing a reference counter for
replication origins. The reference count increases when a backend sets the
origin and decreases when it resets it. Additionally, the backend that
first acquires the origin will not release it until all other backends
using the origin have released it as well.

The patch also prevents dropping a replication origin when acquired_by is
zero but the reference counter is nonzero, covering the scenario where the
first session exits without properly releasing the origin.

Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/TY4PR01MB169077EE72ABE9E55BAF162D494B5A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Discussion: https://postgr.es/m/CAMPB6wfe4zLjJL8jiZV5kjjpwBM2=rTRme0UCL7Ra4L8MTVdOg@mail.gmail.com
2026-01-14 07:15:46 +00:00
Michael Paquier
e217dc7484 Fix query jumbling with GROUP BY clauses
RangeTblEntry.groupexprs was marked with the node attribute
query_jumble_ignore, causing a list of GROUP BY expressions to be
ignored during the query jumbling.  For example, these two queries could
be grouped together within the same query ID:
SELECT count(*) FROM t GROUP BY a;
SELECT count(*) FROM t GROUP BY b;

However, as such queries use different GROUP BY clauses, they should be
split across multiple entries.

This fixes an oversight in 247dea89f7, that has introduced an RTE for
GROUP BY clauses.  Query IDs are documented as being stable across minor
releases, but as this is a regression new to v18 and that we are still
early in its support cycle, a backpatch is exceptionally done as this
has broken a behavior that exists since query jumbling is supported in
core, since its introduction in pg_stat_statements.

The tests of pg_stat_statements are expanded to cover this area, with
patterns involving GROUP BY and GROUPING clauses.

Author: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxEy2W+tCqC7XuJ94r3ivWsM=onKJp94kRFx3hoARjBeFQ@mail.gmail.com
Backpatch-through: 18
2026-01-14 08:44:12 +09:00
Álvaro Herrera
225d1df1d2
Stop including {brin,gin}_tuple.h in tuplesort.h
Doing this meant that those two headers, which are supposed to be
internal to their corresponding index AMs, were being included pretty
much universally, because tuplesort.h is included by execnodes.h which
is very widely used.  Stop that, and fix fallout.

We also change indexing.h to no longer include execnodes.h (tuptable.h
is sufficient), and relscan.h to no longer include buf.h (pointless
since c2fe139c20).

Author: Mario González <gonzalemario@gmail.com>
Discussion: https://postgr.es/m/CAFsReFUcBFup=Ohv_xd7SNQ=e73TXi8YNEkTsFEE2BW7jS1noQ@mail.gmail.com
2026-01-12 18:09:49 +01:00
Jeff Davis
b96a9fd76f fuzzystrmatch: use pg_ascii_toupper().
fuzzystrmatch is designed for ASCII, so no need to rely on the global
LC_CTYPE setting.

Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/dd0cdd1f-e786-426e-b336-1ffa9b2f1fc6%40eisentraut.org
2026-01-12 08:54:04 -08:00
Álvaro Herrera
2defd00062
Move instrumentation-related structs to instrument_node.h
Some structs and enums related to parallel query instrumentation had
organically grown scattered across various files, and were causing
header pollution especially through execnodes.h.  Create a single file
where they can live together.

This only moves the structs to the new file; cleaning up the pollution
by removing no-longer-necessary cross-header inclusion will be done in
future commits.

Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Co-authored-by: Mario González <gonzalemario@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/202510051642.wwmn4mj77wch@alvherre.pgsql
Discussion: https://postgr.es/m/CAFsReFUr4KrQ60z+ck9cRM4WuUw1TCghN7EFwvV0KvuncTRc2w@mail.gmail.com
2026-01-12 16:59:28 +01:00
Peter Eisentraut
e39ece0343 Make dmetaphone collation-aware
The dmetaphone() SQL function internally upper-cases the argument
string.  It did this using the toupper() function.  That way, it has a
dependency on the global LC_CTYPE locale setting, which we want to get
rid of.

The "double metaphone" algorithm specifically supports the "C with
cedilla" letter, so just using ASCII case conversion wouldn't work.

To fix that, use the passed-in collation and use the str_toupper()
function, which has full awareness of collations and collation
providers.

Note that this does not change the fact that this function only works
correctly with single-byte encodings.  The change to str_toupper()
makes the case conversion multibyte-enabled, but the rest of the
function is still not ready.

Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://www.postgresql.org/message-id/108e07a2-0632-4f00-984d-fe0e0d0ec726%40eisentraut.org
2026-01-12 08:35:48 +01:00
Andres Freund
e5a5e0a907 instrumentation: Keep time fields as instrtime, convert in callers
Previously the instrumentation logic always converted to seconds, only for
many of the callers to do unnecessary division to get to milliseconds. As an
upcoming refactoring will split the Instrumentation struct, utilize instrtime
always to keep things simpler. It's also a bit faster to not have to first
convert to a double in functions like InstrEndLoop(), InstrAggNode().

Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAP53PkzZ3UotnRrrnXWAv=F4avRq9MQ8zU+bxoN9tpovEu6fGQ@mail.gmail.com
2026-01-09 13:38:00 -05:00
Tom Lane
b8ccd29152 Remove now-useless btree_gist--1.2.sql script.
In the wake of the previous commit, this script will fail
if executed via CREATE EXTENSION, so it's useless.  Remove it,
but keep the delta scripts, allowing old (even very old)
versions of the btree_gist SQL objects to be upgraded to 1.9
via ALTER EXTENSION UPDATE after a pg_upgrade.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/2483812.1754072263@sss.pgh.pa.us
2026-01-08 14:09:58 -05:00
Tom Lane
b3b0b45717 Create btree_gist v1.9, in which inet/cidr opclasses aren't default.
btree_gist's gist_inet_ops and gist_cidr_ops opclasses are
fundamentally broken: they rely on an approximate representation of
the inet values and hence sometimes miss rows they should return.
We want to eventually get rid of them altogether, but as the first
step on that journey, we should mark them not-opcdefault.

To do that, roll up the preceding deltas since 1.2 into a new base
script btree_gist--1.9.sql.  This will allow installing 1.9 without
going through a transient situation where gist_inet_ops and
gist_cidr_ops are marked as opcdefault; trying to create them that
way will fail if there's already a matching default opclass in the
core system.  Additionally provide btree_gist--1.8--1.9.sql, so
that a database that's been pg_upgraded from an older version can
be migrated to 1.9.

I noted along the way that commit 57e3c5160 had missed marking the
gist_bool_ops support functions as PARALLEL SAFE.  While that probably
has little harmful effect (since AFAIK we don't check that when
calling index support functions), this seems like a good time to make
things consistent.

Readers will also note that I removed the former habit of installing
some opclass operators/functions with ALTER OPERATOR FAMILY, instead
just rolling them all into the CREATE OPERATOR CLASS steps.  The
comment in btree_gist--1.2.sql that it's necessary to use ALTER for
pg_upgrade reproducibility has been obsolete since we invented the
amadjustmembers infrastructure.  Nowadays, gistadjustmembers will
force all operators and non-required support functions to have "soft"
opfamily dependencies, regardless of whether they are installed by
CREATE or ALTER.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/2483812.1754072263@sss.pgh.pa.us
2026-01-08 13:56:08 -05:00
Michael Paquier
8b9b93e39b Use relation_close() more consistently in contrib/
All the code paths updated here have been using index_close() to
close a relation that was opened with relation_open(), in pgstattuple
and pageinspect.  index_close() does the same thing as relation_close(),
so there is no harm, but being inconsistent could lead to issues if the
internals of these close() functions begin to introduce some specific
logic in the future.

In passing, this commit adds some comments explaining why we are using
relation_open() instead of index_open() in a few places, which is due to
the fact that partitioned indexes are not allowed in these functions.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Discussion: https://postgr.es/m/aUKamYGiDKO6byp5@ip-10-97-1-34.eu-west-3.compute.internal
2026-01-06 16:17:59 +09:00
Masahiko Sawada
e212a0f8e6 pg_visibility: Fix incorrect buffer lock description in comment.
Although the comment in collect_corrupt_items() stated that the buffer
is locked in exclusive mode, it is actually locked in shared mode.

Author: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/CAEoWx2kkhxgfp=kinPMetnwHaa0JjR6YBkO_0gg0oiy6mu7Zjw@mail.gmail.com
2026-01-05 15:49:43 -08:00
Michael Paquier
b8cfcb9e00 Fix typos and inconsistencies in code and comments
This change is a cocktail of harmonization of function argument names,
grammar typos, renames for better consistency and unused code (see
ltree).  All of these have been spotted by the author.

Author: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/b2c0d0b7-3944-487d-a03d-d155851958ff@gmail.com
2026-01-05 09:19:15 +09:00
David Rowley
4f49e4b55e Fix selectivity estimation integer overflow in contrib/intarray
This fixes a poorly written integer comparison function which was
performing subtraction in an attempt to return a negative value when
a < b and a positive value when a > b, and 0 when the values were equal.
Unfortunately that didn't always work correctly due to two's complement
having the INT_MIN 1 further from zero than INT_MAX.  This could result
in an overflow and cause the comparison function to return an incorrect
result, which would result in the binary search failing to find the
value being searched for.

This could cause poor selectivity estimates when the statistics stored
the value of INT_MAX (2147483647) and the value being searched for was
large enough to result in the binary search doing a comparison with that
INT_MAX value.

Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2ng1Ot5LoKbVU-Dh---dFTUZWJRH8wv2chBu29fnNDMaQ@mail.gmail.com
Backpatch-through: 14
2026-01-04 20:32:40 +13:00
Bruce Momjian
451c43974f Update copyright for 2026
Backpatch-through: 14
2026-01-01 13:24:10 -05:00
Tom Lane
bc6374cd76 Change IndexAmRoutines to be statically-allocated structs.
Up to now, index amhandlers were expected to produce a new, palloc'd
struct on each call.  That requires palloc/pfree overhead, and creates
a risk of memory leaks if the caller fails to pfree, and the time
taken to fill such a large structure isn't nil.  Moreover, we were
storing these things in the relcache, eating several hundred bytes for
each cached index.  There is not anything in these structs that needs
to vary at runtime, so let's change the definition so that an
amhandler can return a pointer to a "static const" struct of which
there's only one copy per index AM.  Mark all the core code's
IndexAmRoutine pointers const so that we catch anyplace that might
still try to change or pfree one.

(This is similar to the way we were already handling TableAmRoutine
structs.  This commit does fix one comment that was infelicitously
copied-and-pasted into tableamapi.c.)

This commit needs to be called out in the v19 release notes as an API
change for extension index AMs.  An un-updated AM will still work
(as of now, anyway) but it risks memory leaks and will be slower than
necessary.

Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAEoWx2=vApYk2LRu8R0DdahsPNEhWUxGBZ=rbZo1EXE=uA+opQ@mail.gmail.com
2025-12-30 18:26:23 -05:00
Tom Lane
cb77bc0442 Further stabilize a postgres_fdw test case.
This patch causes one postgres_fdw test case to revert to the plan
it used before aa86129e1, i.e., using a remote sort in preference to
local sort.  That decision is actually a coin-flip because cost_sort()
will give the same answer on both sides, so that the plan choice comes
down to little more than roundoff error.  In consequence, the test
output can change as a result of even minor changes in nearby costs,
as we saw in aa86129e1 (compare also b690e5fac and 4b14e1871).

b690e5fac's solution to stabilizing the adjacent test case was to
disable sorting locally, and here we extend that to the currently-
problematic case.  Without this, the following patch would cause this
plan choice to change back in this same way, for even less apparent
reason.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/2551253.1766952956@sss.pgh.pa.us
2025-12-29 12:53:49 -05:00
Michael Paquier
9adf32da6b Split some long Makefile lists
This change makes more readable code diffs when adding new items or
removing old items, while ensuring that lines do not get excessively
long.  Some SUBDIRS, PROGRAMS and REGRESS lists are split.

Note that there are a few more REGRESS lists that could be split,
particularly in contrib/.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Co-Authored-By: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Man Zeng <zengman@halodbtech.com>
Discussion: https://postgr.es/m/DF6HDGB559U5.3MPRFCWPONEAE@jeltef.nl
2025-12-28 09:17:42 +09:00
Masahiko Sawada
55c46bbf3a pg_visibility: Use visibilitymap_count instead of loop.
This commit updates pg_visibility_map_summary() to use the
visibilitymap_count() API, replacing its own counting mechanism. This
simplifies the function and improves performance by leveraging the
vectorized implementation introduced in commit 41c51f0c68.

Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAEze2WgPu-EYYuYQimy=AHQHGa7w8EvLVve5DM5eGMR6zh-7sw@mail.gmail.com
2025-12-23 10:33:06 -08:00
Michael Paquier
5cf03552fb btree_gist: Fix memory allocation formula
This change has been suggested by the two authors listed in this commit,
both of them providing an incomplete solution (David's formula relied on
a "bytea *", while Bertrand's did not use palloc_array()).  The solution
provided in this commit uses GBT_VARKEY instead of the inconsistent
bytea for the allocation size, with a palloc_array().

The change related to Vsrt is one I am flipping to a more consistent
style, in passing.

Author: David Geier <geidav.pg@gmail.com>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
Discussion: https://postgr.es/m/aTrG3Fi4APtfiCvQ@ip-10-97-1-34.eu-west-3.compute.internal
2025-12-18 11:01:43 +09:00
Jeff Davis
7f007e4a04 ltree: fix case-insensitive matching.
Previously, ltree_prefix_eq_ci() used lowercasing with the default
collation; while ltree_crc32_sz() used tolower() directly. These were
equivalent only if the default collation provider was libc and the
encoding was single-byte.

Change both to use casefolding with the default collation.

Backpatch through 18, where the casefolding APIs were introduced. The
bug exists in earlier versions, but would require some adaptation.

A REINDEX is required for ltree indexes where the database default
collation is not libc.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Backpatch-through: 18
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
Discussion: https://postgr.es/m/01fc00fd66f641b9693d4f9f1af0ccf44cbdfbdf.camel@j-davis.com
2025-12-16 12:57:00 -08:00
Jeff Davis
84d5efa7e3 Fix multibyte issue in ltree_strncasecmp().
Previously, the API for ltree_strncasecmp() took two inputs but only
one length (that of the smaller input). It truncated the larger input
to that length, but that could break a multibyte sequence.

Change the API to be a check for prefix equality (possibly
case-insensitive) instead, which is all that's needed by the
callers. Also, provide the lengths of both inputs.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/5f65b85740197ba6249ea507cddf609f84a6188b.camel%40j-davis.com
Backpatch-through: 14
2025-12-16 10:35:40 -08:00
Nathan Bossart
48d4a1423d Allow passing a pointer to GetNamedDSMSegment()'s init callback.
This commit adds a new "void *arg" parameter to
GetNamedDSMSegment() that is passed to the initialization callback
function.  This is useful for reusing an initialization callback
function for multiple DSM segments.

Author: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAN4CZFMjh8TrT9ZhWgjVTzBDkYZi2a84BnZ8bM%2BfLPuq7Cirzg%40mail.gmail.com
2025-12-15 14:27:16 -06:00
Michael Paquier
171198ff2a pageinspect: use index_close() for GiST index relation
gist_page_items() opens its target relation with index_open(), but
closed it using relation_close() instead of index_close().  This was
harmless because index_close() and relation_close() do the exact same
work, still inconsistent with the rest of the code tree as routines
opening and closing a relation based on a relkind are expected to match,
at least in name.

Author: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2=bL41WWcD-4Fxx-buS2Y2G5=9PjkxZbHeFMR6Uy2WNvw@mail.gmail.com
2025-12-15 10:28:28 +09:00
Peter Eisentraut
493eb0da31 Replace most StaticAssertStmt() with StaticAssertDecl()
Similar to commit 75f49221c2, it is preferable to use
StaticAssertDecl() instead of StaticAssertStmt() when possible.

Discussion: https://www.postgresql.org/message-id/flat/CA%2BhUKGKvr0x_oGmQTUkx%3DODgSksT2EtgCA6LmGx_jQFG%3DsDUpg%40mail.gmail.com
2025-12-12 10:06:40 +01:00
Michael Paquier
4f7dacc5b8 Use palloc_object() and palloc_array(), the last change
This is the last batch of changes that have been suggested by the
author, this part covering the non-trivial changes.  Some of the changes
suggested have been discarded as they seem to lead to more instructions
generated, leaving the parts that can be qualified as in-place
replacements.

Similar work has been done in 1b105f9472, 0c3c5c3b06 and
31d3847a37.

Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
2025-12-11 14:29:12 +09:00
Michael Paquier
3f83de20ba pg_buffercache: Fix memory allocation formula
The code over-allocated the memory required for os_page_status, relying
on uint64 for its element size instead of an int, hence doubling what
was required.  This could mean quite a lot of memory if dealing with a
lot of NUMA pages.

Oversight in ba2a3c2302.

Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
Backpatch-through: 18
2025-12-11 14:11:06 +09:00
Peter Eisentraut
2268f2b91b Remove useless casts in format arguments
There were a number of useless casts in format arguments, either
where the input to the cast was already in the right type, or
seemingly uselessly casting between types instead of just using the
right format placeholder to begin with.

Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/07fa29f9-42d7-4aac-8834-197918cbbab6%40eisentraut.org
2025-12-09 07:33:08 +01:00
Peter Eisentraut
907caf5c39 Clean up int64-related format strings
Remove some gratuitous uses of INT64_FORMAT.  Make use of
PRIu64/PRId64 were appropriate, remove unnecessary casts.

Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/07fa29f9-42d7-4aac-8834-197918cbbab6%40eisentraut.org
2025-12-09 07:33:08 +01:00
Tom Lane
0986e95161 Revise APIs for pushJsonbValue() and associated routines.
Instead of passing "JsonbParseState **" to pushJsonbValue(),
pass a pointer to a JsonbInState, which will contain the
parseState stack pointer as well as other useful fields.
Also, instead of returning a JsonbValue pointer that is often
meaningless/ignored, return the top-level JsonbValue pointer
in the "result" field of the JsonbInState.

This involves a lot of (mostly mechanical) edits, but I think
the results are notationally cleaner and easier to understand.
Certainly the business with sometimes capturing the result of
pushJsonbValue() and sometimes not was bug-prone and incapable of
mechanical verification.  In the new arrangement, JsonbInState.result
remains null until we've completed a valid sequence of pushes, so
that an incorrect sequence will result in a null-pointer dereference,
not mistaken use of a partial result.

However, this isn't simply an exercise in prettier notation.
The real reason for doing it is to provide a mechanism whereby
pushJsonbValue() can be told to construct the JsonbValue tree
in a context that is not CurrentMemoryContext.  That happens
when a non-null "outcontext" is specified in the JsonbInState.
No callers exercise that option in this patch, but the next
patch in the series will make use of it.

I tried to improve the comments in this area too.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/1060917.1753202222@sss.pgh.pa.us
2025-12-07 11:51:33 -05:00
Michael Paquier
31d3847a37 Use more palloc_object() and palloc_array() in contrib/
The idea is to encourage more the use of these new routines across the
tree, as these offer stronger type safety guarantees than palloc().  In
an ideal world, palloc() would then act as an internal routine of these
flavors, whose footprint in the tree is minimal.

The patch sent by the author is very large, and this chunk of changes
represents something like 10% of the overall patch submitted.

The code compiled is the same before and after this commit, using
objdump to do some validation with a difference taken in-between.  There
are some diffs, which are caused by changes in line numbers because some
of the new allocation formulas are shorter, for the following files:
trgm_regexp.c, xpath.c and pg_walinspect.c.

Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
2025-12-05 16:40:26 +09:00
Amit Kapila
5db6a344ab Rename column slotsync_skip_at to slotsync_last_skip.
Commit 76b78721ca introduced two new columns in pg_stat_replication_slots
to improve monitoring of slot synchronization. One of these columns was
named slotsync_skip_at, which is inconsistent with the naming convention
used for similar columns in other system views.

Columns that store timestamps of the most recent event typically use the
'last_' in the column name (e.g., last_autovacuum, checksum_last_failure).
Renaming slotsync_skip_at to slotsync_last_skip aligns with this pattern,
making the purpose of the column clearer and improving overall consistency
across the views.

Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Michael Banck <mbanck@gmx.net>
Discussion: https://postgr.es/m/20251128091552.GB13635@p46.dedyn.io;lightning.p46.dedyn.io
Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com
2025-12-05 04:12:55 +00:00
Peter Eisentraut
e158fd4d68 Remove no longer needed casts from Pointer
These casts used to be required when Pointer was char *, but now it's
void * (commit 1b2bb5077e), so they are not needed anymore.

Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org
2025-12-04 20:44:52 +01:00
Peter Eisentraut
c6be3daa05 Remove no longer needed casts to Pointer
These casts used to be required when Pointer was char *, but now it's
void * (commit 1b2bb5077e), so they are not needed anymore.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org
2025-12-04 19:40:08 +01:00
Álvaro Herrera
6bd469d26a
amcheck: Fix snapshot usage in bt_index_parent_check
We were using SnapshotAny to do some index checks, but that's wrong and
causes spurious errors when used on indexes created by CREATE INDEX
CONCURRENTLY.  Fix it to use an MVCC snapshot, and add a test for it.

This problem came in with commit 5ae2087202, which introduced
uniqueness check.  Backpatch to 17.

Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Backpatch-through: 17
Discussion: https://postgr.es/m/CANtu0ojmVd27fEhfpST7RG2KZvwkX=dMyKUqg0KM87FkOSdz8Q@mail.gmail.com
2025-12-04 18:12:08 +01:00
Peter Eisentraut
9790affcce Fix stray references to SubscriptRef
This type never existed.  SubscriptingRef was meant instead.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/2eaa45e3-efc5-4d75-b082-f8159f51445f%40eisentraut.org
2025-12-03 14:44:14 +01:00
Peter Eisentraut
756a436893 Don't rely on pointer arithmetic with Pointer type
The comment for the Pointer type says 'XXX Pointer arithmetic is done
with this, so it can't be void * under "true" ANSI compilers.'.  This
fixes that.  Change from Pointer to use char * explicitly where
pointer arithmetic is needed.  This makes the meaning of the code
clearer locally and removes a dependency on the actual definition of
the Pointer type.  (The definition of the Pointer type is not changed
in this commit.)

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org
2025-12-03 09:54:15 +01:00
Peter Eisentraut
623801b3bd Remove useless casts to Pointer
in arguments of memcpy() and memmove() calls

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org
2025-12-03 08:40:33 +01:00
Heikki Linnakangas
cbe04e5d72 Fix amcheck's handling of half-dead B-tree pages
amcheck incorrectly reported the following error if there were any
half-dead pages in the index:

ERROR:  mismatch between parent key and child high key in index
"amchecktest_id_idx"

It's expected that a half-dead page does not have a downlink in the
parent level, so skip the test.

Reported-by: Konstantin Knizhnik <knizhnik@garret.ru>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Discussion: https://www.postgresql.org/message-id/33e39552-6a2a-46f3-8b34-3f9f8004451f@garret.ru
Backpatch-through: 14
2025-12-02 21:11:15 +02:00
Heikki Linnakangas
6c05ef5729 Fix amcheck's handling of incomplete root splits in B-tree
When the root page is being split, it's normal that root page
according to the metapage is not marked BTP_ROOT. Fix bogus error in
amcheck about that case.

Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/abd65090-5336-42cc-b768-2bdd66738404@iki.fi
Backpatch-through: 14
2025-12-02 21:10:51 +02:00
Peter Eisentraut
4f941d432b Remove useless casting to same type
This removes some casts where the input already has the same type as
the type specified by the cast.  Their presence could cause risks of
hiding actual type mismatches in the future or silently discarding
qualifiers.  It also improves readability.  Same kind of idea as
7f798aca1d and ef8fe69360.  (This does not change all such
instances, but only those hand-picked by the author.)

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/aSQy2JawavlVlEB0%40ip-10-97-1-34.eu-west-3.compute.internal
2025-12-02 10:09:32 +01:00
Michael Paquier
713d9a847e Update some timestamp[tz] functions to use soft-error reporting
This commit updates two functions that convert "timestamptz" to
"timestamp", and vice-versa, to use the soft error reporting rather than
a their own logic to do the same.  These are now named as follows:
- timestamp2timestamptz_safe()
- timestamptz2timestamp_safe()

These functions were suffixed with "_opt_overflow", previously.

This shaves some code, as it is possible to detect how a timestamp[tz]
overflowed based on the returned value rather than a custom state.  It
is optionally possible for the callers of these functions to rely on the
error generated internally by these functions, depending on the error
context.

Similar work has been done in d03668ea05 and 4246a977ba.

Reviewed-by: Amul Sul <sulamul@gmail.com>
Discussion: https://postgr.es/m/aS09YF2GmVXjAxbJ@paquier.xyz
2025-12-02 09:30:23 +09:00
Michael Paquier
d03668ea05 Switch some date/timestamp functions to use the soft error reporting
This commit changes some functions related to the data types date and
timestamp to use the soft error reporting rather than a custom boolean
flag called "overflow", used to let the callers of these functions know
if an overflow happens.

This results in the removal of some boilerplate code, as it is possible
to rely on an error context rather than a custom state, with the
possibility to use the error generated inside the functions updated
here, if necessary.

These functions were suffixed with "_opt_overflow".  They are now
renamed to use "_safe" as suffix.

This work is similar to 4246a977ba.

Author: Amul Sul <sulamul@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAAJ_b95HEmFyzHZfsdPquSHeswcopk8MCG1Q_vn4tVkZ+xxofw@mail.gmail.com
2025-12-01 15:22:20 +09:00
Michael Paquier
9ccc049dfe pg_buffercache: Add pg_buffercache_mark_dirty{,_relation,_all}()
This commit introduces three new functions for marking shared buffers as
dirty by using the functions introduced in 9660906dbd:
* pg_buffercache_mark_dirty() for one shared buffer.
- pg_buffercache_mark_dirt_relation() for all the shared buffers in a
relation.
* pg_buffercache_mark_dirty_all() for all the shared buffers in pool.

The "_all" and "_relation" flavors are designed to address the
inefficiency of repeatedly calling pg_buffercache_mark_dirty() for each
individual buffer, which can be time-consuming when dealing with with
large shared buffers pool.

These functions are intended as developer tools and are available only
to superusers.  There is no need to bump the version of pg_buffercache,
4b203d499c having done this job in this release cycle.

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Aidar Imamov <a.imamov@postgrespro.ru>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Joseph Koshakow <koshy44@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Yuhang Qiu <iamqyh@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw@mail.gmail.com
2025-11-28 09:04:04 +09:00
David Rowley
42473b3b31 Have the planner replace COUNT(ANY) with COUNT(*), when possible
This adds SupportRequestSimplifyAggref to allow pg_proc.prosupport
functions to receive an Aggref and allow them to determine if there is a
way that the Aggref call can be optimized.

Also added is a support function to allow transformation of COUNT(ANY)
into COUNT(*).  This is possible to do when the given "ANY" cannot be
NULL and also that there are no ORDER BY / DISTINCT clauses within the
Aggref.  This is a useful transformation to do as it is common that
people write COUNT(1), which until now has added unneeded overhead.
When counting a NOT NULL column.  The overheads can be worse as that
might mean deforming more of the tuple, which for large fact tables may
be many columns in.

It may be possible to add prosupport functions for other aggregates.  We
could consider if ORDER BY could be dropped for some calls, e.g. the
ORDER BY is quite useless in MAX(c ORDER BY c).

There is a little bit of passing fallout from adjusting
expr_is_nonnullable() to handle Const which results in a plan change in
the aggregates.out regression test.  Previously, nothing was able to
determine that "One-Time Filter: (100 IS NOT NULL)" was always true,
therefore useless to include in the plan.

Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/CAApHDvqGcPTagXpKfH=CrmHBqALpziThJEDs_MrPqjKVeDF9wA@mail.gmail.com
2025-11-27 10:43:28 +13:00
Amit Kapila
76b78721ca Add slotsync skip statistics.
This patch adds two new columns to the pg_stat_replication_slots view:
slotsync_skip_count - the total number of times a slotsync operation was
skipped.
slotsync_skip_at - the timestamp of the most recent skip.

These additions provide better visibility into replication slot
synchronization behavior.

A future patch will introduce the slotsync_skip_reason column in
pg_replication_slots to capture the reason for skip.

Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com
2025-11-25 07:06:02 +00:00
Michael Paquier
4b203d499c pg_buffercache: Add pg_buffercache_os_pages
ba2a3c2302 has added a way to check if a buffer is spread across
multiple pages with some NUMA information, via a new view
pg_buffercache_numa that depends on pg_buffercache_numa_pages(), a SQL
function.  These can only be queried when support for libnuma exists,
generating an error if not.

However, it can be useful to know how shared buffers and OS pages map
when NUMA is not supported or not available.  This commit expands the
capabilities around pg_buffercache_numa:
- pg_buffercache_numa_pages() is refactored as an internal function,
able to optionally process NUMA.  Its SQL definition prior to this
commit is still around to ensure backward-compatibility with v1.6.
- A SQL function called pg_buffercache_os_pages() is added, able to work
with or without NUMA.
- The view pg_buffercache_numa is redefined to use
pg_buffercache_os_pages().
- A new view is added, called pg_buffercache_os_pages.  This ignores
NUMA for its result processing, for a better efficiency.

The implementation is done so as there is no code duplication between
the NUMA and non-NUMA views/functions, relying on one internal function
that does the job for all of them.  The module is bumped to v1.7.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/Z/fFA2heH6lpSLlt@ip-10-97-1-34.eu-west-3.compute.internal
2025-11-24 14:29:15 +09:00
Michael Paquier
7d9043aee8 pg_buffercache: Remove unused fields from BufferCacheNumaRec
These fields have been added in commit ba2a3c2302, and have never been
used.  While on it, this commit moves a comment that was out of place,
improving it.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aSBOKX6pLJzumbmF@ip-10-97-1-34.eu-west-3.compute.internal
2025-11-23 13:37:42 +09:00
Michael Paquier
694b4ab33b pg_buffercache: Fix incorrect result cast for relforknumber
pg_buffercache_pages.relforknumber is defined as an int2, but its value
was stored with ObjectIdGetDatum() rather than Int16GetDatum() in the
result record.

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAExHW5s2_qwSdhKpVnUzjRMf0cf1PvmhUHQDLaFM3QzKbP1OyQ@mail.gmail.com
2025-11-18 15:46:43 +09:00
Andres Freund
c75ebc657f bufmgr: Allow some buffer state modifications while holding header lock
Until now BufferDesc.state was not allowed to be modified while the buffer
header spinlock was held. This meant that operations like unpinning buffers
needed to use a CAS loop, waiting for the buffer header spinlock to be
released before updating.

The benefit of that restriction is that it allowed us to unlock the buffer
header spinlock with just a write barrier and an unlocked write (instead of a
full atomic operation). That was important to avoid regressions in
48354581a4. However, since then the hottest buffer header spinlock uses have
been replaced with atomic operations (in particular, the most common use of
PinBuffer_Locked(), in GetVictimBuffer() (formerly in BufferAlloc()), has been
removed in 5e89985928).

This change will allow, in a subsequent commit, to release buffer pins with a
single atomic-sub operation. This previously was not possible while such
operations were not allowed while the buffer header spinlock was held, as an
atomic-sub would not have allowed a race-free check for the buffer header lock
being held.

Using atomic-sub to unpin buffers is a nice scalability win, however it is not
the primary motivation for this change (although it would be sufficient). The
primary motivation is that we would like to merge the buffer content lock into
BufferDesc.state, which will result in more frequent changes of the state
variable, which in some situations can cause a performance regression, due to
an increased CAS failure rate when unpinning buffers.  The regression entirely
vanishes when using atomic-sub.

Naively implementing this would require putting CAS loops in every place
modifying the buffer state while holding the buffer header lock. To avoid
that, introduce UnlockBufHdrExt(), which can set/add flags as well as the
refcount, together with releasing the lock.

Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
2025-11-06 16:42:10 -05:00
Álvaro Herrera
a2b02293bc
Use XLogRecPtrIsValid() in various places
Now that commit 06edbed478 has introduced XLogRecPtrIsValid(), we can
use that instead of:

- XLogRecPtrIsInvalid()
- direct comparisons with InvalidXLogRecPtr
- direct comparisons with literal 0

This makes the code more consistent.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aQB7EvGqrbZXrMlg@ip-10-97-1-34.eu-west-3.compute.internal
2025-11-06 20:33:57 +01:00
Etsuro Fujita
c3eec94fc1 postgres_fdw: Add more test coverage for EvalPlanQual testing.
postgres_fdw supports EvalPlanQual testing by using the infrastructure
provided by the core with the RecheckForeignScan callback routine (cf.
commits 5fc4c26db and 385f337c9), but there has been no test coverage
for that, except that recent commit 12609fbac, which fixed an issue in
commit 385f337c9, added a test case to exercise only a code path added
by that commit to the core infrastructure.  So let's add test cases to
exercise other code paths as well at this time.

Like commit 12609fbac, back-patch to all supported branches.

Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
Author: Etsuro Fujita <etsuro.fujita@gmail.com>
Discussion: https://postgr.es/m/CAPmGK15%2B6H%3DkDA%3D-y3Y28OAPY7fbAdyMosVofZZ%2BNc769epVTQ%40mail.gmail.com
Backpatch-through: 13
2025-11-06 12:15:00 +09:00
David Rowley
6d0eba6627 Use stack allocated StringInfoDatas, where possible
Various places that were using StringInfo but didn't need that
StringInfo to exist beyond the scope of the function were using
makeStringInfo(), which allocates both a StringInfoData and the buffer it
uses as two separate allocations.  It's more efficient for these cases to
use a StringInfoData on the stack and initialize it with initStringInfo(),
which only allocates the string buffer.  This also simplifies the cleanup,
in a few cases.

Author: Mats Kindahl <mats.kindahl@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/4379aac8-26f1-42f2-a356-ff0e886228d3@gmail.com
2025-11-06 14:59:48 +13:00
Tom Lane
ff8aba65d4 Fix contrib/ltree's subpath() with negative offset.
subpath(ltree,offset,len) now correctly errors when given an offset
less than -n, where n is the number of labels in the given ltree.
There was a duplicate block of code that allowed an offset as low
as -2n.  The documentation says no such thing, so this must have
been a copy-and-paste error in the original ltree patch.

While here, avoid redundant calculation of "end" and write
LTREE_MAX_LEVELS rather than its hard-coded value.

Author: Marcus Gartner <m.a.gartner@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAAUGV_SvBO9gWYbaejb9nhe-mS9FkNP4QADNTdM3wdRhvLobwA@mail.gmail.com
2025-11-01 13:25:42 -04:00
Peter Eisentraut
8ce795fcb7 Fix some confusing uses of const
There are a few places where we have

    typedef struct FooData { ... } FooData;
    typedef FooData *Foo;

and then function declarations with

    bar(const Foo x)

which isn't incorrect but probably meant

    bar(const FooData *x)

meaning that the thing x points to is immutable, not x itself.

This patch makes those changes where appropriate.  In one
case (execGrouping.c), the thing being pointed to was not immutable,
so in that case remove the const altogether, to avoid further
confusion.

Co-authored-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAEoWx2m2E0xE8Kvbkv31ULh_E%2B5zph-WA_bEdv3UR9CLhw%2B3vg%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAEoWx2kTDz%3Db6T2xHX78vy_B_osDeCC5dcTCi9eG0vXHp5QpdQ%40mail.gmail.com
2025-10-30 11:20:04 +01:00
Álvaro Herrera
16edc1b94f
pg_stat_statements: Fix handling of duplicate constant locations
Two or more constants can have the same location.  We handled this
correctly for non squashed constants, but failed to do it if squashed
(resulting in out-of-bounds memory access), because the code structure
became broken by commit 0f65f3eec4: we failed to update 'last_loc'
correctly when skipping these squashed constants.

The simplest fix seems to be to get rid of 'last_loc' altogether -- in
hindsight, it's quite pointless.  Also, when ignoring a constant because
of this, make sure to fulfill fill_in_constant_lengths's duty of setting
its length to -1.

Lastly, we can use == instead of <= because the locations have been
sorted beforehand, so the < case cannot arise.

Co-authored-by: Sami Imseih <samimseih@gmail.com>
Co-authored-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reported-by: Konstantin Knizhnik <knizhnik@garret.ru>
Backpatch-through: 18
Discussion: https://www.postgresql.org/message-id/2b91e358-0d99-43f7-be44-d2d4dbce37b3%40garret.ru
2025-10-29 12:35:02 +01:00
Tom Lane
0758111f5d Update expected output for contrib/sepgsql's regression tests.
Commit 65281391a caused some additional error context lines to
appear in the output of one test case.  That's fine, but we missed
updating the expected output.  Do it now.

While here, add some missing test-output subdirectories to
contrib/sepgsql/.gitignore, so that we don't get git warnings
after running the tests.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1613232.1761255361@sss.pgh.pa.us
Backpatch-through: 18
2025-10-23 17:47:10 -04:00
David Rowley
2470ca435c Use CompactAttribute more often, when possible
5983a4cff added CompactAttribute for storing commonly used fields from
FormData_pg_attribute.  5983a4cff didn't go to the trouble of adjusting
every location where we can use CompactAttribute rather than
FormData_pg_attribute, so here we change the remaining ones.

There are some locations where I've left the code using
FormData_pg_attribute.  These are mostly in the ALTER TABLE code.  Using
CompactAttribute here seems more risky as often the TupleDesc is being
changed and those changes may not have been flushed to the
CompactAttribute yet.

I've also left record_recv(), record_send(), record_cmp(), record_eq()
and record_image_eq() alone as it's not clear to me that accessing the
CompactAttribute is a win here due to the FormData_pg_attribute still
having to be accessed for most cases.  Switching the relevant parts to
use CompactAttribute would result in having to access both for common
cases.  Careful benchmarking may reveal that something can be done to
make this better, but in absence of that, the safer option is to leave
these alone.

In ReorderBufferToastReplace(), there was a check to skip attnums < 0
while looping over the TupleDesc.  Doing this is redundant since
TupleDescs don't store < 0 attnums.  Removing that code allows us to
move to using CompactAttribute.

The change in validateDomainCheckConstraint() just moves fetching the
FormData_pg_attribute into the ERROR path, which is cold due to calling
errstart_cold() and results in code being moved out of the common path.

Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAApHDvrMy90o1Lgkt31F82tcSuwRFHq3vyGewSRN=-QuSEEvyQ@mail.gmail.com
2025-10-22 11:36:26 +13:00
Masahiko Sawada
4bea91f21f Support COPY TO for partitioned tables.
Previously, COPY TO command didn't support directly specifying
partitioned tables so users had to use COPY (SELECT ...) TO variant.

This commit adds direct COPY TO support for partitioned
tables, improving both usability and performance. Performance tests
show it's faster than the COPY (SELECT ...) TO variant as it avoids
the overheads of query processing and sending results to the COPY TO
command.

When used with partitioned tables, COPY TO copies the same rows as
SELECT * FROM table. Row-level security policies of the partitioned
table are applied in the same way as when executing COPY TO on a plain
table.

Author: jian he <jian.universality@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Melih Mutlu <m.melihmutlu@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CACJufxEZt%2BG19Ors3bQUq-42-61__C%3Dy5k2wk%3DsHEFRusu7%3DiQ%40mail.gmail.com
2025-10-20 10:38:52 -07:00
Tom Lane
da44d71e79 Allow role created by new test to log in on Windows.
We must tell init about each role name we plan to connect as,
else SSPI auth fails.  Similar to previous patches such as
14793f471, 973542866.

Oversight in 208927e65, per buildfarm member drongo.
(Although that was back-patched to v13, the test script
only exists in v16 and up.)
2025-10-18 18:36:21 -04:00
Nathan Bossart
208927e656 Fix privilege checks for pg_prewarm() on indexes.
pg_prewarm() currently checks for SELECT privileges on the target
relation.  However, indexes do not have access rights of their own,
so a role may be denied permission to prewarm an index despite
having the SELECT privilege on its parent table.  This commit fixes
this by locking the parent table before the index (to avoid
deadlocks) and checking for SELECT on the parent table.  Note that
the code is largely borrowed from
amcheck_lock_relation_and_check().

An obvious downside of this change is the extra AccessShareLock on
the parent table during prewarming, but that isn't expected to
cause too much trouble in practice.

Author: Ayush Vatsa <ayushvatsa1810@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/CACX%2BKaMz2ZoOojh0nQ6QNBYx8Ak1Dkoko%3DD4FSb80BYW%2Bo8CHQ%40mail.gmail.com
Backpatch-through: 13
2025-10-17 11:36:50 -05:00
Michael Paquier
fabb33b351 Improve TAP tests by replacing ok() with better Test::More functions
The TAP tests whose ok() calls are changed in this commit were relying
on perl operators, rather than equivalents available in Test::More.  For
example, rather than the following:
ok($data =~ qr/expr/m, "expr matching");
ok($data !~ qr/expr/m, "expr not matching");
The new test code uses this equivalent:
like($data, qr/expr/m, "expr matching");
unlike($data, qr/expr/m, "expr not matching");

A huge benefit of the new formulation is that it is possible to know
about the values we are checking if a failure happens, making debugging
easier, should the test runs happen in the buildfarm, in the CI or
locally.

This change leads to more test code overall as perltidy likes to make
the code pretty the way it is in this commit.

Author: Sadhuprasad Patro <b.sadhu@gmail.com>
Discussion: https://postgr.es/m/CAFF0-CHhwNx_Cv2uy7tKjODUbeOgPrJpW4Rpf1jqB16_1bU2sg@mail.gmail.com
2025-10-17 14:39:09 +09:00
Etsuro Fujita
12609fbacb Fix EvalPlanQual handling of foreign/custom joins in ExecScanFetch.
If inside an EPQ recheck, ExecScanFetch would run the recheck method
function for foreign/custom joins even if they aren't descendant nodes
in the EPQ recheck plan tree, which is problematic at least in the
foreign-join case, because such a foreign join isn't guaranteed to have
an alternative local-join plan required for running the recheck method
function; in the postgres_fdw case this could lead to a segmentation
fault or an assert failure in an assert-enabled build when running the
recheck method function.

Even if inside an EPQ recheck, any scan nodes that aren't descendant
ones in the EPQ recheck plan tree should be normally processed by using
the access method function; fix by modifying ExecScanFetch so that if
inside an EPQ recheck, it runs the recheck method function for
foreign/custom joins that are descendant nodes in the EPQ recheck plan
tree as before and runs the access method function for foreign/custom
joins that aren't.

This fix also adds to postgres_fdw an isolation test for an EPQ recheck
that caused issues stated above.

Oversight in commit 385f337c9.

Reported-by: Kristian Lejao <kristianlejao@gmail.com>
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Co-authored-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Discussion: https://postgr.es/m/CAD21AoBpo6Gx55FBOW+9s5X=nUw3Xpq64v35fpDEKsTERnc4TQ@mail.gmail.com
Backpatch-through: 13
2025-10-15 17:15:00 +09:00
Nathan Bossart
c9b299f6df dblink: Avoid locking relation before privilege check.
The present coding of dblink's get_rel_from_relname() predates the
introduction of RangeVarGetRelidExtended(), which provides a way to
check permissions before locking the relation.  This commit adjusts
get_rel_from_relname() to use that function.

Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/aOgmi6avE6qMw_6t%40nathan
2025-10-14 12:20:48 -05:00
Masahiko Sawada
d3b6183dd9 Add mem_exceeded_count column to pg_stat_replication_slots.
This commit introduces a new column mem_exceeded_count to the
pg_stat_replication_slots view. This counter tracks how often the
memory used by logical decoding exceeds the logical_decoding_work_mem
limit. The new statistic helps users determine whether exceeding the
logical_decoding_work_mem limit is a rare occurrences or a frequent
issue, information that wasn't available through existing statistics.

Bumps catversion.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/978D21E8-9D3B-40EA-A4B1-F87BABE7868C@yesql.se
2025-10-08 10:05:04 -07:00
Robert Haas
c83ac02ec7 Add ExplainState argument to pg_plan_query() and planner().
This allows extensions to have access to any data they've stored
in the ExplainState during planning. Unfortunately, it won't help
with EXPLAIN EXECUTE is used, but since that case is less common,
this still seems like an improvement.

Since planner() has quite a few arguments now, also add some
documentation of those arguments and the return value.

Author: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
2025-10-08 08:33:29 -04:00
Richard Guo
8e11859102 Implement Eager Aggregation
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined.  Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.

In the current planner architecture, the separation between the
scan/join planning phase and the post-scan/join phase means that
aggregation steps are not visible when constructing the join tree,
limiting the planner's ability to exploit aggregation-aware
optimizations.  To implement eager aggregation, we collect information
about aggregate functions in the targetlist and HAVING clause, along
with grouping expressions from the GROUP BY clause, and store it in
the PlannerInfo node.  During the scan/join planning phase, this
information is used to evaluate each base or join relation to
determine whether eager aggregation can be applied.  If applicable, we
create a separate RelOptInfo, referred to as a grouped relation, to
represent the partially-aggregated version of the relation and
generate grouped paths for it.

Grouped relation paths can be generated in two ways.  The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths.  To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths in this step.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation.  Joining two grouped relations
is currently not supported.

To further limit planning time, we currently adopt a strategy where
partial aggregation is pushed only to the lowest feasible level in the
join tree where it provides a significant reduction in row count.
This strategy also helps ensure that all grouped paths for the same
grouped relation produce the same set of rows, which is important to
support a fundamental assumption of the planner.

For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators.  This is essential to ensure that an
aggregated row from the partial aggregation matches the other side of
the join if and only if each row in the partial group does.  This
ensures that all rows within the same partial group share the same
"destiny", which is crucial for maintaining correctness.

One restriction is that we cannot push partial aggregation down to a
relation that is in the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation.  Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.

If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end.  The final paths will compete in the
usual way with paths built from regular planning.

The patch was originally proposed by Antonin Houska in 2017.  This
commit reworks various important aspects and rewrites most of the
current code.  However, the original patch and reviews were very
useful.

Author: Richard Guo <guofenglinux@gmail.com>
Author: Antonin Houska <ah@cybertec.at> (in an older version)
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me> (in an older version)
Reviewed-by: Andy Fan <zhihuifan1213@163.com> (in an older version)
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> (in an older version)
Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
2025-10-08 17:04:23 +09:00
Robert Haas
8c49a484e8 Assign each subquery a unique name prior to planning it.
Previously, subqueries were given names only after they were planned,
which makes it difficult to use information from a previous execution of
the query to guide future planning. If, for example, you knew something
about how you want "InitPlan 2" to be planned, you won't know whether
the subquery you're currently planning will end up being "InitPlan 2"
until after you've finished planning it, by which point it's too late to
use the information that you had.

To fix this, assign each subplan a unique name before we begin planning
it. To improve consistency, use textual names for all subplans, rather
than, as we did previously, a mix of numbers (such as "InitPlan 1") and
names (such as "CTE foo"), and make sure that the same name is never
assigned more than once.

We adopt the somewhat arbitrary convention of using the type of sublink
to set the plan name; for example, a query that previously had two
expression sublinks shown as InitPlan 2 and InitPlan 1 will now end up
named expr_1 and expr_2. Because names are assigned before rather than
after planning, some of the regression test outputs show the numerical
part of the name switching positions: what was previously SubPlan 2 was
actually the first one encountered, but we finished planning it later.

We assign names even to subqueries that aren't shown as such within the
EXPLAIN output. These include subqueries that are a FROM clause item or
a branch of a set operation, rather than something that will be turned
into an InitPlan or SubPlan. The purpose of this is to make sure that,
below the topmost query level, there's always a name for each subquery
that is stable from one planning cycle to the next (assuming no changes
to the query or the database schema).

Author: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: http://postgr.es/m/3641043.1758751399@sss.pgh.pa.us
2025-10-07 09:18:54 -04:00
Álvaro Herrera
1a8b5b11e4
Don't include access/htup_details.h in executor/tuptable.h
This is not at all needed; I suspect it was a simple mistake in commit
5408e233f0.  It causes htup_details.h to bleed into a huge number of
places via execnodes.h.  Remove it and fix fallout.

Discussion: https://postgr.es/m/202510021240.ptc2zl5cvwen@alvherre.pgsql
2025-10-05 18:00:38 +02:00
Michael Paquier
684a745f55 pgstattuple: Improve reports generated for indexes (hash, gist, btree)
pgstattuple checks the state of the pages retrieved for gist and hash
using some check functions from each index AM, respectively
gistcheckpage() and _hash_checkpage().  When these are called, they
would fail when bumping on data that is found as incorrect (like opaque
area size not matching, or empty pages), contrary to btree that simply
discards these cases and continues to aggregate data.

Zero pages can happen after a crash, with these AMs being able to do an
internal cleanup when these are seen.  Also, sporadic failures are
annoying when doing for example a large-scale diagnostic query based on
pgstattuple with a join of pg_class, as it forces one to use tricks like
quals to discard hash or gist indexes, or use a PL wrapper able to catch
errors.

This commit changes the reports generated for btree, gist and hash to
be more user-friendly;
- When seeing an empty page, report it as free space.  This new rule
applies to gist and hash, and already applied to btree.
- For btree, a check based on the size of BTPageOpaqueData is added.
- For gist indexes, gistcheckpage() is not called anymore, replaced by a
check based on the size of GISTPageOpaqueData.
- For hash indexes, instead of _hash_getbuf_with_strategy(), use a
direct call to ReadBufferExtended(), coupled with a check based on
HashPageOpaqueData.  The opaque area size check was already used.
- Pages that do not match these criterias are discarded from the stats
reports generated.

There have been a couple of bug reports over the years that complained
about the current behavior for hash and gist, as being not that useful,
with nothing being done about it.  Hence this change is backpatched down
to v13.

Reported-by: Noah Misch <noah@leadboat.com>
Author: Nitin Motiani <nitinmotiani@google.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Discussion: https://postgr.es/m/CAH5HC95gT1J3dRYK4qEnaywG8RqjbwDdt04wuj8p39R=HukayA@mail.gmail.com
Backpatch-through: 13
2025-10-02 11:07:30 +09:00
Peter Eisentraut
f5aabe6d58 Revert "Make some use of anonymous unions [pgcrypto]"
This reverts commit efcd5199d8.

I rebased my patch series incorrectly.  This patch contained unrelated
parts from another patch, which made the overall build fail.  Revert
for now and reconsider.
2025-09-30 13:12:16 +02:00
Peter Eisentraut
efcd5199d8 Make some use of anonymous unions [pgcrypto]
Make some use of anonymous unions, which are allowed as of C11, as
examples and encouragement for future code, and to test compilers.

This commit changes some structures in pgcrypto.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/f00a9968-388e-4f8c-b5ef-5102e962d997%40eisentraut.org
2025-09-30 12:35:50 +02:00
Peter Eisentraut
57d46dff9b Make some use of anonymous unions [reorderbuffer xact_time]
Make some use of anonymous unions, which are allowed as of C11, as
examples and encouragement for future code, and to test compilers.

This commit changes the ReorderBufferTXN struct.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/f00a9968-388e-4f8c-b5ef-5102e962d997%40eisentraut.org
2025-09-30 12:35:50 +02:00
Álvaro Herrera
3bf31dd243
Do a tiny bit of header file maintenance
Stop including utils/relcache.h in access/genam.h, and stop including
htup_details.h in nodes/tidbitmap.h.  Both these files (genam.h and
tidbitmap.h) are widely used in other header files, so it's in our best
interest that they remain as lean as reasonable.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/202509291356.o5t6ny2hoa3q@alvherre.pgsql
2025-09-30 12:28:29 +02:00
Robert Haas
f2bae51dfd Keep track of what RTIs a Result node is scanning.
Result nodes now include an RTI set, which is only non-NULL when they
have no subplan, and is taken from the relid set of the RelOptInfo that
the Result is generating. ExplainPreScanNode now takes notice of these
RTIs, which means that a few things get schema-qualified in the
regression tests that previously did not. This makes the output more
consistent between cases where some part of the plan tree is replaced by
a Result node and those where this does not happen.

Likewise, pg_overexplain's EXPLAIN (RANGE_TABLE) now displays the RTIs
stored in a Result node just as it already does for other RTI-bearing
node types.

Result nodes also now include a result_reason, which tells us something
about why the Result node was inserted.  Using that information, EXPLAIN
now emits, where relevant, a "Replaces" line describing the origin of
a Result node.

The purpose of these changes is to allow code that inspects a Plan
tree to understand the origin of Result nodes that appear therein.

Discussion: http://postgr.es/m/CA+TgmoYeUZePZWLsSO+1FAN7UPePT_RMEZBKkqYBJVCF1s60=w@mail.gmail.com
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
2025-09-23 09:07:55 -04:00
David Rowley
9fc7f6ab72 Fix various incorrect filename references
Author: Chao Li <li.evan.chao@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2=hOBCPm-Z=F15twr_23XjHeoXSbifP5GdEdtWona97wQ@mail.gmail.com
2025-09-22 13:33:17 +12:00
Tom Lane
261f89a976 Track the maximum possible frequency of non-MCE array elements.
The lossy-counting algorithm that ANALYZE uses to identify most-common
array elements has a notion of cutoff frequency: elements with
frequency greater than that are guaranteed to be collected, elements
with smaller frequencies are not.  In cases where we find fewer MCEs
than the stats target would permit us to store, the cutoff frequency
provides valuable additional information, to wit that there are no
non-MCEs with frequency greater than that.  What the selectivity
estimation functions actually use the "minfreq" entry for is as a
ceiling on the possible frequency of non-MCEs, so using the cutoff
rather than the lowest stored MCE frequency provides a tighter bound
and more accurate estimates.

Therefore, instead of redundantly storing the minimum observed MCE
frequency, store the cutoff frequency when there are fewer tracked
values than we want.  (When there are more, then of course we cannot
assert that no non-stored elements are above the cutoff frequency,
since we're throwing away some that are; so we still use the
minimum stored frequency in that case.)

Notably, this works even when none of the values are common enough
to be called MCEs.  In such cases we previously stored nothing in
the STATISTIC_KIND_MCELEM pg_statistic slot, which resulted in the
selectivity functions falling back to default estimates.  So in that
case we want to construct a STATISTIC_KIND_MCELEM entry that contains
no "values" but does have "numbers", to wit the three extra numbers
that the MCELEM entry type defines.  A small obstacle is that
update_attstats() has traditionally stored a null, not an empty array,
when passed zero "values" for a slot.  That gives rise to an MCELEM
entry that get_attstatsslot() will spit up on.  The least risky
solution seems to be to adjust update_attstats() so that it will emit
a non-null (but possibly empty) array when the passed stavalues array
pointer isn't NULL, rather than conditioning that on numvalues > 0.
In other existing cases I don't believe that that changes anything.
For consistency, handle the stanumbers array the same way.

In passing, improve the comments in routines that use
STATISTIC_KIND_MCELEM data.  Particularly, explain why we use
minfreq / 2 not minfreq as the estimate for non-MCE values.

Thanks to Matt Long for the suggestion that we could apply this
idea even when there are more than zero MCEs.

Reported-by: Mark Frost <FROSTMAR@uk.ibm.com>
Reported-by: Matt Long <matt@mattlong.org>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/PH3PPF1C905D6E6F24A5C1A1A1D8345B593E16FA@PH3PPF1C905D6E6.namprd15.prod.outlook.com
2025-09-20 14:48:16 -04:00