Commit graph

28616 commits

Author SHA1 Message Date
Daniel Gustafsson
bf25e5571b Improve handling of concurrent checksum requests
When pg_{enable|disable}_data_checksums is called while checksums are
being enabled or disabled, the already running launcher is detected
and the new desired state is recorded.  Processing will then pick up
the new state and change its operation to fulfill the new request.
If the same state is requested but with different cost values, the
new cost values will take effect on the next relation processed.

The previous coding had a complex logic of starting a new launcher
for this, which is now avoided with the shared mem structure instead
used to signal current processing.

This makes the logic more robust, and fixes a bug where the launcher
would erroneously revert back to the "off" state.

Access to the shared memory is also protected with LWLocks in all
cases.  Since the shmem structure is used for signalling between
the worker and the launcher, and there can be only one of each,
there were no concurrency issues detected but it's better to stick
to proper locking protocol should this ever be updated to handle
multiple workers.

Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se
2026-04-30 13:41:53 +02:00
Daniel Gustafsson
381d19da15 Typo and spelling fixups for online checksums
A collection of spelling, wording and punctuation fixups for the code
documentation from postcommit review.

Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se
2026-04-30 13:41:50 +02:00
Daniel Gustafsson
25b922ec58 Fix invalid checksum state transition in checkpoints
Commit 78e950cb8 added checksum state handling to all XLOG_CHECKPOINT
records which caused unnecessary state transitions and emission of
procsignal barriers.  Remove as only the _REDO record need to handle
checksum state.  Barrier emission is also consistently made after
controlfile updates to avoid race conditions.

Additionally, interrupts are held between calling ProcSignalInit and
InitLocalDataChecksumState to remove a window where otherwise invalid
state transitions can happen.

Also remove a pointless assertion on Controlfile which will never hit.

Author: Tomas Vondra <tomas@vondra.me>
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se
2026-04-30 13:41:48 +02:00
Daniel Gustafsson
8fb8ded889 Handle data_checksum state changes during launcher_exit
When erroring out from the datachecksums launcher during data checksum
enabling, before state has transitioned to "on", we revert back to the
"off" state.  Since checksums weren't enabled, there is no use staying
in an inprogress state since the checksum launcher currently doesn't
support restarting from where it left off.  Should restartability get
added in the future, this would need to be revisited.  This state
transition was however missing from the allowed transitions in the
statemachine causing an error.

Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se
2026-04-30 13:41:46 +02:00
Daniel Gustafsson
b120358c61 Prevent pg_enable/disable_data_checksums() on standby
These functions missed a RecoveryInProgress() check, allowing them to
be called on a hot standby.  Enabling, or disabling, checksums on the
standby only would cause the cluster to get out of sync and replaying
checksum transitions to fail.

Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAHg+QDfRk4-S7DMmdbXJnQ-xF=sUpMAKuh8b83ObLqYVKx5QLA@mail.gmail.com
2026-04-30 13:41:41 +02:00
Amit Kapila
2bf6c9ff71 Fix double table_close of sequence_rel in copy_sequences().
sequence_rel was declared at batch scope, so when a row is skipped due to
concurrent drop or insufficient privileges, the end-of-row cleanup closes
the stale pointer from the previous row, tripping the relcache refcount
assertion.

Move sequence_rel inside the per-row loop.

Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/CAJTYsWWOuw-yfmzotV4jCJ6LLxEsb=STLcGtDYXOxRcU9Te3Pw@mail.gmail.com
2026-04-30 16:39:39 +05:30
Michael Paquier
5941e7f092 Fix errno check based on EINTR in pg_flush_data()
Upon a failure of sync_file_range(), EINTR was checked based on the
returned result of the routine rather than its errno.  sync_file_range()
returns -1 on failure, making the check a no-op, invalidating the retry
attempt in this case.

Oversight in 0d369ac650.

Author: DaeMyung Kang <charsyam@gmail.com>
Discussion: https://postgr.es/m/20260429151811.1810874-1-charsyam@gmail.com
Backpatch-through: 16
2026-04-30 18:44:38 +09:00
Michael Paquier
ac59a90bef Adjust some incorrect *GetDatum() macros
This reverts portions of commit 6dcfac9696, which is wrong in trying
to use a *GetDatum() that matches with the C types of the values read.
*GetDatum() should match with the output argument types of the SQL
functions.

The portions of 6dcfac9696 that are right regarding this rule are:
- gistget.c, where the GiST support functions use DatumGetUInt16() to
retrieve the strategy number.
- The BRIN code for strategynum, used in syscache lookups.

The adjustments done in this commit are for pageinspect, pg_buffercache
and pg_lock_status().

While double-checking the whole state of the tree regarding non-matching
pairs of DatumGet*() and *GetDatum(), I have found much more code paths
that are incorrect, unrelated to 6dcfac9696.  These may be adjusted in
the future, in a different patch (perhaps not for v19, as we are already
past feature freeze).

Reported-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/97f9375a-be61-4272-a44d-408337fe8fa6@eisentraut.org
Discussion: https://postgr.es/m/CAJ7c6TMcGu8qmRe1gZfJ-gOzVnZq-t=fwn-UuyStx1w6ZyydMw@mail.gmail.com
2026-04-30 13:10:19 +09:00
Michael Paquier
4bfd0f1b76 Fix error of pg_stat_reset_shared()
"lock" a values is supported since 4019f725f5, but the error message
of the function used when specifying an incorrect value forgot about it.

Author: Maksim Logvinenko <logvinenko-ms@yandex.ru>
Discussion: https://postgr.es/m/433431777389005@mail.yandex.ru
2026-04-30 11:12:56 +09:00
Peter Geoghegan
748d871b7c Fix nbtree skip array parallel alloc accounting.
btestimateparallelscan neglected to add btps_arrElems[] space overhead
for skip array scan keys that were later output by nbtree preprocessing.
Skip arrays don't actually need to use this space, but a scan with a
subsequent SAOP array will need to subscript btps_arrElems[] using a
simple so->arrayKeys[]-wise offset.  so->arrayKeys[] has entries for
both kinds of arrays.

As a result of this oversight, it was possible for an index scan with a
skip array and a lower-order SAOP array to write past the allocated
shared memory boundary when storing the SAOP array's cur_elem.  In
practice the problem seems to be limited to scans with many skipped
index columns, since our general approach to estimating the amount of
shared memory that will be required is fairly conservative.

To fix, have btestimateparallelscan request an extra sizeof(int) space
for key columns that might require a skip array later on.

Oversight in commit 92fe23d9, which added the nbtree skip scan
optimization.

Author: Siddharth Kothari <sidkot@google.com>
Discussion: https://postgr.es/m/CAGCUe0Lwk3C0qdkBa+OLpYc7yXwW=pbaz8Sju4xMXEQAmyp+5g@mail.gmail.com
Backpatch-through: 18
2026-04-29 11:22:23 -04:00
John Naylor
ca9807dfec Cosmetic fixes for radix sort
Do minor comment fixes and remove implicit cast to Datum.

While here, let's prefer crashing instead of entering an infinite
loop in case of future programming mistakes when computing next_level,
suggested by ChangAo Chen.

Discussion: https://postgr.es/m/tencent_49E3F11E74D8A584A2144ED532A490CBC40A@qq.com
2026-04-29 16:14:25 +07:00
John Naylor
a0302eac78 Remove unused ByteaSortSupport.abbreviate field
Oversight in commit 9303d62c6.

Author: Aleksander Alekseev <aleksander@tigerdata.com>
Discussion: https://postgr.es/m/CAJ7c6TOsKmmgyA6EwxKVsNeHFHrWXYdgZivgjo_ujf890BpeeA@mail.gmail.com
2026-04-29 13:57:07 +07:00
Amit Kapila
c210647aeb Fix xid_advance_interval when max_retention_duration is 0.
When a subscription has retain_dead_tuples enabled and maxretention is
zero (unlimited), adjust_xid_advance_interval() mistakenly caps
xid_advance_interval to zero.

This zero interval forces get_candidate_xid() to evaluate
TimestampDifferenceExceeds() as always true, causing the apply worker to
call GetOldestActiveTransactionId() for every WAL message. This
leads to unnecessary ProcArrayLock acquisitions.

Fix this by only capping the interval when maxretention > 0, allowing
the exponential back-off to function properly.

Author: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDdKVnCLHot=AcoPpEiSyDzGz7wGYjAFHVOw57oDtmUDWQ@mail.gmail.com
2026-04-28 14:51:38 +05:30
Amit Kapila
7424aac088 Fix wrong datum conversion for subretentionactive in CreateSubscription.
Use BoolGetDatum() instead of Int32GetDatum() when storing the boolean
subretentionactive column in pg_subscription. This was an oversight in
a850be2fe6.

Author: Lakshmi N <lakshmin.jhs@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Discussion: https://postgr.es/m/CA+3i_M98-XjE-_fw0p+8xOnw64y2_YLtJfcwvCfsVMn-z2ZjGg@mail.gmail.com
2026-04-28 13:13:47 +05:30
Álvaro Herrera
832e220d99
REPACK CONCURRENTLY: Don't use deferrable primary keys
Similarly to logical replication, REPACK CONCURRENTLY needs to ability
to reliably locate a tuple based on an identity.  A replica identity
index is okay.  Primary keys normally also are, except when they are
deferrable, because a tuple being modified might not yet be indexed,
causing REPACK to fail.

Change the REPACK CONCURRENTLY code to use GetRelationIdentityOrPK(),
similar to what the logical replication code does.  (Though we don't yet
support locating tuples based on arbitrary indexes for replica identity
FULL.)

While at it, add a few more test cases for situations that aren't
supported by REPACK, to improve coverage.

Author: Chao Li <lic@highgo.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Antonin Houska <ah@cybertec.at>
Reviewed-by: Yuchen Li <liyuchen_xyz@163.com>
Discussion: https://postgr.es/m/10DD5E13-B45D-44F1-BE08-C63E00ABCAC0@gmail.com
2026-04-27 18:22:03 +02:00
Peter Eisentraut
33db6c4baf Fix DELETE/UPDATE FOR PORTION OF with rules
Previously, these test cases would give internal errors or crash.  The
fix is to add some missing fields of ForPortionOfExpr to
expression_tree_walker.

Author: jian he <jian.universality@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://postgr.es/m/CACJufxHs1Hs00EqsZ4NbuAjmYzMzjJyP1sAj12Ne=cBsEVmQOA@mail.gmail.com
2026-04-27 10:34:06 +02:00
Richard Guo
c66d6d19eb Fix bogus calls in remove_self_join_rel()
remove_self_join_rel() called adjust_relid_set() on all_result_relids
and leaf_result_relids but threw away the return value.  Since
adjust_relid_set() returns a freshly-built Relids and does not modify
the input in place, the calls did nothing.  This has been the case
since the SJE feature went in (commit fc069a3a6).

There has been no observable misbehavior, because the relid being
passed is guaranteed not to be a member of either set.  At the point
remove_self_join_rel() runs, those sets contain only resultRelation;
inheritance children have not been added yet, as that happens later in
query_planner(), in expand_single_inheritance_child() called from
add_other_rels_to_query().  And remove_self_joins_recurse() rejects
parse->resultRelation as an SJE candidate to preserve the EvalPlanQual
mechanism.  Even with the result assigned, the calls would be no-ops
in practice.

Rather than make the calls do the cleanup they pretend to do, replace
them with assertions of the invariant.  Any future loosening of the
SJE candidate filter -- for instance to allow eliminating a result
relation under provable conditions -- will trip the assertion and
force whoever does it to revisit this code.

Additionally, decorate adjust_relid_set() with pg_nodiscard so that
any future accidental discard of its return value is caught at compile
time.

Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49fYQcqJfJ_Gtn8r1GFNoYtb1=2AUab4ieuqY4Zid9ocQ@mail.gmail.com
2026-04-27 10:40:37 +09:00
Michael Paquier
b801d5eef1 Fix some memory leaks in the WAL receiver
These are old leaks, that can pile up if a WAL receiver stays alive,
waiting for new WAL data after the sender has switched to a new
timeline.

While this is technically a bug, the impact is minimal and would only
become noticeable if the WAL sender handles a lot of timeline switches,
so no backpatch is done.  Note that in most cases, primary_conninfo
would be updated in a standby to point to a new sender, meaning a
restart of the WAL receiver.  Let's be clean on HEAD, though.

Author: DaeMyung Kang <charsyam@gmail.com>
Discussion: https://postgr.es/m/20260426170100.847923-1-charsyam@gmail.com
Discussion: https://postgr.es/m/20260426170219.849330-1-charsyam@gmail.com
2026-04-27 10:32:45 +09:00
Peter Eisentraut
9d2979dd68 pg_get_viewdef() and lateral references in COLUMNS of GRAPH_TABLE
Expressions in GRAPH_TABLE COLUMNS list may have lateral references.
get_rule_expr() requires lateral namespaces to deparse such
references.  get_from_clause_item() does not pass them when processing
the expressions in COLUMNS list causing ERROR "bogus varlevelsup: 0
offset 0".  Fix get_from_clause_item() to pass input deparse_context
containing lateral namespaces to get_rule_expr() instead of the dummy
context.

Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAHg%2BQDcLVa2iBnggkHxY4itZbXtDMfsYHEjnCUYe9hNbnxDi-w%40mail.gmail.com
2026-04-24 09:12:03 +02:00
Peter Eisentraut
ac3bcc041c Fix collation of expressions in GRAPH_TABLE COLUMNS clause
GRAPH_TABLE clause is converted into a rangetable entry, which is
ignored by assign_query_collations().  Hence we assign collations
while transforming its parts.  But expressions in COLUMNS clause
missed that treatment, so fix that.

While at it, also add comments about collation assignment to the parts
of GRAPH_TABLE clause, and also fix a small grammar issue.

Reported-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAHg+QDc4aaiufYSgrwMMPMMRTPtQ66SghcrPFbWJFZMqNaG+BA@mail.gmail.com
2026-04-24 08:43:26 +02:00
Peter Eisentraut
9082680c34 Fix typos and grammar in graph table rewrite code
Reported-by: Lakshmi N <lakshmin.jhs@gmail.com>
Author: Lakshmi N <lakshmin.jhs@gmail.com>
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/CA+3i_M9gpUGjH-BkJk=UFjK16jq9fEQHpmZ1cxpJO+xM4hWC+A@mail.gmail.com
2026-04-24 08:27:04 +02:00
Peter Eisentraut
2ff289d039 Check for stack overflow when rewriting graph queries
generate_queries_for_path_pattern_recurse() and
generate_setop_from_pathqueries() are recursive functions.  For a
property graph with hundreds of tables, a graph pattern with a handful
element patterns can cause stack overflow.  Fix it by calling
check_stack_depth() at the beginning of these functions.

Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAHg+QDfgK0xddH8f3eAb+UVn7sBDOnv8RvM6OkP4HtHAt6aD7w@mail.gmail.com
2026-04-24 08:18:21 +02:00
David Rowley
94219a73f7 Fix incorrect logic for hashed IN / NOT IN with non-strict operators
ExecEvalHashedScalarArrayOp(), when using a strict equality function,
performs a short-circuit when looking up NULL values.  When the function
is non-strict, the code incorrectly looked up the hash table for a
zero-valued Datum, which could have resulted in an accidental true
return if the hash table contained zero valued Datum, or could result
in a crash for non-byval types.

Here we fix this by adding an extra step when we build the hash table to
check what the result of a NULL lookup would be.  This requires looping
over the array and checking what the non-hashed version of the code
would do.  We cache the results of that in the expression so that we can
reuse the result any time we're asked to search for a NULL value.

It's important to note that non-strict equality functions are free to
treat any NULL value as equal to any non-NULL value.  For example,
someone may wish to design a type that treats an empty string and NULL
as equal.

All built-in types have strict equality functions, so this could affect
custom / user-defined types.

Author: Chengpeng Yan <chengpeng_yan@outlook.com>
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: ChangAo Chen <cca5507@qq.com>
Discussion: https://postgr.es/m/A16187AE-2359-4265-9F5E-71D015EC2B2D@outlook.com
Backpatch-through: 14
2026-04-24 14:03:12 +12:00
Heikki Linnakangas
713bce9484 Don't call CheckAttributeType() with InvalidOid on dropped cols
If CheckAttributeType() is called with InvalidOid, it performs a bunch
of pointless, futile syscache lookups with InvalidOid, but ultimately
tolerates it and has no effect. We were calling it with InvalidOid on
dropped columns, but it seems accidental that it works, so let's stop
doing it.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/93ce56cd-02a6-4db1-8224-c8999372facc@iki.fi
Backpatch-through: 14
2026-04-23 21:28:26 +03:00
Heikki Linnakangas
dd40691976 Don't allow composite type to be member of itself via multirange
CheckAttributeType() checks that a composite type is not made a member
of itself with ALTER TABLE ADD COLUMN or ALTER TYPE ADD ATTRIBUTE,
even indirectly via a domain, array, another composite type or a range
type. But it missed checking for multiranges. That was a simple
oversight when multiranges were added.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/93ce56cd-02a6-4db1-8224-c8999372facc@iki.fi
Backpatch-through: 14
2026-04-23 21:28:11 +03:00
David Rowley
4f0cbc6fb5 Fix new-to-v19 -Wshadow warnings
There's some talk about upgrading our current -Wshadow=compatible-local
up to -Wshadow.  There's some pending questions as to whether the churn
and extra backpatching pain are worthwhile for doing all of them.  We
can't use the latter argument for ones that are new to v19, providing we
fix them now.  So let's fix those ones so that the problem is not any
worse for if we decide to fix the remainder for v20.

Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Yuchen Li <liyuchen_xyz@163.com>
Discussion: https://postgr.es/m/CAApHDvp=rx5GxM=yW8QhFF3noXtYt7LkOxJ7zkaPOzpti4Gm8w@mail.gmail.com
2026-04-23 16:49:29 +12:00
Jeff Davis
dbf217c1c7 catcache.c: use C_COLLATION_OID for texteqfast/texthashfast.
The problem report was about setting GUCs in the startup packet for a
physical replication connection. Setting the GUC required an ACL
check, which performed a lookup on pg_parameter_acl.parname. The
catalog cache was hardwired to use DEFAULT_COLLATION_OID for
texteqfast() and texthashfast(), but the database default collation
was uninitialized because it's a physical walsender and never connects
to a database. In versions 18 and later, this resulted in a NULL
pointer dereference, while in version 17 it resulted in an ERROR.

As the comments stated, using DEFAULT_COLLATION_OID was arbitrary
anyway: if the collation actually mattered, it should have used the
column's actual collation. (In the catalog, some text columns are the
default collation and some are "C".)

Fix by using C_COLLATION_OID, which doesn't require any initialization
and is always available. When any deterministic collation will do,
it's best to consistently use the simplest and fastest one, so this is
a good idea anyway.

Another problem was raised in the thread, which this commit doesn't
fix (see second discussion link).

Reported-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/D18AD72A-5004-4EF8-AF80-10732AF677FA@yandex-team.ru
Discussion: https://postgr.es/m/4524ed61a015d3496fc008644dcb999bb31916a7.camel%40j-davis.com
Backpatch-through: 17
2026-04-22 10:22:44 -07:00
Peter Geoghegan
d14f69a32a Harmonize function parameter names for Postgres 19.
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in a few places.  Most of
these inconsistencies were introduced during Postgres 19 development.

This commit was written with help from clang-tidy, by mechanically
applying the same rules as similar clean-up commits (the earliest such
commit was commit 035ce1fe).
2026-04-22 12:47:19 -04:00
Tom Lane
a50777680f Guard against overly-long numeric formatting symbols from locale.
to_char() allocates its output buffer with 8 bytes per formatting
code in the pattern.  If the locale's currency symbol, thousands
separator, or decimal or sign symbol is more than 8 bytes long,
in principle we could overrun the output buffer.  No such locales
exist in the real world, so it seems sufficient to truncate the
symbol if we do see it's too long.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/638232.1776790821@sss.pgh.pa.us
Backpatch-through: 14
2026-04-22 12:41:00 -04:00
Tom Lane
d7970e7e95 Prevent some buffer overruns in spell.c's parsing of affix files.
parse_affentry() and addCompoundAffixFlagValue() each collect fields
from an affix file into working buffers of size BUFSIZ.  They failed
to defend against overlength fields, so that a malicious affix file
could cause a stack smash.  BUFSIZ (typically 8K) is certainly way
longer than any reasonable affix field, but let's fix this while
we're closing holes in this area.

I chose to do this by silently truncating the input before it can
overrun the buffer, using logic comparable to the existing logic in
get_nextfield().  Certainly there's at least as good an argument for
raising an error, but for now let's follow the existing precedent.

Reported-by: Igor Stepansky <igor.stepansky@orca.security>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/864123.1776810909@sss.pgh.pa.us
Backpatch-through: 14
2026-04-22 12:02:15 -04:00
Tom Lane
844bb90d49 Prevent buffer overrun in spell.c's CheckAffix().
This function writes into a caller-supplied buffer of length
2 * MAXNORMLEN, which should be plenty in real-world cases.
However a malicious affix file could supply an affix long
enough to overrun that.  Defend by just rejecting the match
if it would overrun the buffer.  I also inserted a check of
the input word length against Affix->replen, just to be sure
we won't index off the buffer, though it would be caller error
for that not to be true.

Also make the actual copying steps a bit more readable, and remove
an unnecessary requirement for the whole input word to fit into the
output buffer (even though it always will with the current caller).

The lack of documentation in this code makes my head hurt, so
I also reverse-engineered a basic header comment for CheckAffix.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/641711.1776792744@sss.pgh.pa.us
Backpatch-through: 14
2026-04-22 10:47:56 -04:00
Alexander Korotkov
713e553e32 Preserve extension dependencies on indexes during partition merge/split
When using ALTER TABLE ... MERGE PARTITIONS or ALTER TABLE ... SPLIT
PARTITION, extension dependencies on partition indexes were being lost.
This happened because the new partition indexes are created fresh from
the parent partitioned table's indexes, while the old partition indexes
(with their extension dependencies) are dropped.

Fix this by collecting extension dependencies from source partition
indexes before detaching them, then applying those dependencies to the
corresponding new partition indexes after they're created.  The mapping
between old and new indexes is done via their common parent partitioned
index.

For MERGE operations, all source partition indexes sharing a parent
partitioned index must have the same extension dependencies; if they
differ, an error naming both conflicting partition indexes is raised.
The check is implemented by collecting one entry per partition index,
sorting by parent index OID, and comparing adjacent entries in a single
pass.  This is order-independent: the same set of partitions produces
the same decision regardless of the order they are listed in the MERGE
command, and subset mismatches are caught in both directions.

For SPLIT operations, the new partition indexes simply inherit all
extension dependencies from the source partition's index.

The regression tests exercising this feature live under
src/test/modules/test_extensions, where the test_ext3 and test_ext5
extensions are available; core regression tests cannot assume any
particular extension is installed.

Author: Matheus Alcantara <matheusssilv97@gmail.com>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Reported-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Dmitry Koval <d.koval@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/CALdSSPjXtzGM7Uk4fWRwRMXcCczge5uNirPQcYCHKPAWPkp9iQ%40mail.gmail.com
2026-04-22 14:34:20 +03:00
Dean Rasheed
5548a969b6 Fix UPDATE/DELETE ... WHERE CURRENT OF on a table with virtual columns.
Formerly, attempting to use WHERE CURRENT OF to update or delete from
a table with virtual generated columns would fail with the error
"WHERE CURRENT OF on a view is not implemented".

The reason was that the check preventing WHERE CURRENT OF from being
used on a view was in replace_rte_variables_mutator(), which presumed
that the only way it could get there was as part of rewriting a query
on a view. That is no longer the case, since replace_rte_variables()
is now also used to expand the virtual generated columns of a table.

Fix by doing the check for WHERE CURRENT OF on a view at parse time.
This is safe, since it is no longer possible for the relkind to change
after the query is parsed (as of b23cd185f).

Reported-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDc_TwzSgb=B_QgNLt3mvZdmRK23rLb+RkanSQkDF40GjA@mail.gmail.com
Backpatch-through: 18
2026-04-22 11:50:17 +01:00
Dean Rasheed
7834251758 Fix expansion of EXCLUDED virtual generated columns.
If the SET or WHERE clause of an INSERT ... ON CONFLICT command
references EXCLUDED.col, where col is a virtual generated column, the
column was not properly expanded, leading to an "unexpected virtual
generated column reference" error, or incorrect results.

The problem was that expand_virtual_generated_columns() would expand
virtual generated columns in both the SET and WHERE clauses and in the
targetlist of the EXCLUDED pseudo-relation (exclRelTlist). Then
fix_join_expr() from set_plan_refs() would turn the expanded
expressions in the SET and WHERE clauses back into Vars, because they
would be found to match the expression entries in the indexed tlist
produced from exclRelTlist.

To fix this, arrange for expand_virtual_generated_columns() to not
expand virtual generated columns in exclRelTlist. This forces
set_plan_refs() to resolve generation expressions in the query using
non-virtual columns, as required by the executor.

In addition, exclRelTlist now always contains only Vars. That was
something already claimed in a couple of existing comments in the
planner, which relied on that fact to skip some processing, though
those did not appear to constitute active bugs.

Reported-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDf7wTLz_vqb1wi1EJ_4Uh+Vxm75+b4c-Ky=6P+yOAHjbQ@mail.gmail.com
Backpatch-through: 18
2026-04-22 09:03:44 +01:00
Amit Langote
1b9dc2cb75 Fix some const qualifier use in ri_triggers.c
The ri_FetchConstraintInfo() and ri_LoadConstraintInfo() functions
were declared to return const RI_ConstraintInfo *, but callers
sometimes need to modify the struct, requiring casts to drop the
const.  Remove the misapplied const qualifiers and the casts that
worked around them.

Reported-by: Peter Eisentraut <peter@eisentraut.org>
Author: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/548600ed-8bbb-4e50-8fc3-65091b122276@eisentraut.org
2026-04-22 11:36:54 +09:00
Michael Paquier
9d3e094f12 Allow ALTER INDEX .. ATTACH PARTITION to validate a parent index
This commit tweaks ALTER INDEX .. ATTACH PARTITION to attempt a
validation of a parent index in the case where an index is already
attached but the parent is not yet valid.  This occurs in cases where a
parent index was created invalid such as with CREATE INDEX ONLY, but was
left invalid after an invalid child index was attached (partitioned
indexes set indisvalid to false if at least one partition is
!indisvalid, indisvalid is true in a partitioned table iff all
partitions are indisvalid).  This could leave a partition tree in a
situation where a user could not bring the parent index back to valid
after fixing the child index, as there is no built-in mechanism to do
so.  This commit relies on the fact that repeated ATTACH PARTITION
commands on the same index silently succeed.

An invalid parent index is more than just a passive issue.  It causes
for example ON CONFLICT on a partitioned table if the invalid parent
index is used to enforce a unique constraint.

Some test cases are added to track some of problematic patterns, using a
set of partition trees with combinations of invalid indexes and ATTACH
PARTITION.

Reported-by: Mohamed Ali <moali.pg@gmail.com>
Author: Sami Imseih <sanmimseih@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Discussion: http://postgr.es/m/CAGnOmWqi1D9ycBgUeOGf6mOCd2Dcf=6sKhbf4sHLs5xAcKVCMQ@mail.gmail.com
Backpatch-through: 14
2026-04-22 10:32:10 +09:00
Melanie Plageman
31b0544b32 bufmgr: use I/O stats arguments in FlushUnlockedBuffer()
FlushUnlockedBuffer() accepted io_object and io_context arguments but
hardcoded IOOBJECT_RELATION and IOCONTEXT_NORMAL when calling
FlushBuffer(). Pass them through instead. Also fix FlushBuffer() to use
its io_object parameter for I/O timing stats rather than hardcoding
IOOBJECT_RELATION.

Not actively broken since all current callers pass IOOBJECT_RELATION and
IOCONTEXT_NORMAL, so not backpatched.

Author: Chao Li <lic@highgo.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/BC97546F-5C15-42F2-AD57-CFACDB9657D0@gmail.com
2026-04-21 17:47:50 -04:00
Melanie Plageman
da6874635d Make local buffers pin limit more conservative
GetLocalPinLimit() and GetAdditionalLocalPinLimit(), currently in use
only by the read stream, previously allowed a backend to pin all
num_temp_buffers local buffers. This meant that the read stream could
use every available local buffer for read-ahead, leaving none for other
concurrent pin-holders like other read streams and related buffers like
the visibility map buffer needed during on-access pruning.

This became more noticeable since b46e1e54d0, which allows on-access
pruning to set the visibility map, which meant that some scans also
needed to pin a page of the VM. It caused a test in
src/test/regress/sql/temp.sql to fail in some cases.

Cap the local pin limit to num_temp_buffers / 4, providing some
headroom. This doesn't guarantee that all needed pins will be available
— for example, a backend can still open more cursors than there are
buffers — but it makes it less likely that read-ahead will exhaust the
pool.

Note that these functions are not limited by definition to use in the
read stream; however, this cap should be appropriate in other contexts.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/97529f5a-ec10-46b1-ab50-4653126c6889%40gmail.com
2026-04-21 11:03:05 -04:00
Tom Lane
1cd3cd372a Remove gen_node_support.pl's ad-hoc ABI stability check.
We installed this in commit eea9fa9b2 to protect against foreseeable
mistakes that would break ABI in stable branches by renumbering
NodeTag enum entries.  However, we now have much more thorough
ABI stability checks thanks to buildfarm members using libabigail
(see the .abi-compliance-history mechanism).  So this incomplete,
single-purpose check seems like an anachronism.  I wouldn't object
to keeping it were it not that it requires an additional manual step
when making a new stable git branch.  That seems like something easy
to screw up, so let's get rid of it.

This patch just removes the logic that checks for changes in the last
auto-assigned NodeTag value.  We still need eea9fa9b2's cross-check
on the supplied list of header files, to prevent divergence between
the makefile and meson build systems.  We'll also sometimes need the
nodetag_number() infrastructure for hand-assigning new NodeTags in
stable branches.

Discussion: https://postgr.es/m/1458883.1776143073@sss.pgh.pa.us
2026-04-21 10:58:00 -04:00
Michael Paquier
d3bba04154 Fix a set of typos and grammar issues across the tree
This batch is similar to 462fe0ff62 and addresses a variety of code
style issues, including grammar mistakes, typos, inconsistent variable
names in function declarations, and incorrect function names in comments
and documentation.  These fixes have accumulated on the community
mailing lists since the commit mentioned above.

Notably, Alexander Lakhin previously submitted a patch identifying many
of the trivial typos and grammar issues that had been reported on
pgsql-hackers.  His patch covered a somewhat large portion of the issues
addressed here, though not all of them.

The documentation changes only affect HEAD.
2026-04-21 14:46:22 +09:00
Richard Guo
c6a79be3f3 Fix incorrect NEW references to generated columns in rule rewriting
When a rule action or rule qualification references NEW.col where col
is a generated column (stored or virtual), the rewriter produces
incorrect results.

rewriteTargetListIU removes generated columns from the query's target
list, since stored generated columns are recomputed by the executor
and virtual ones store nothing.  However, ReplaceVarsFromTargetList
then cannot find these columns when resolving NEW references during
rule rewriting.  For UPDATE, the REPLACEVARS_CHANGE_VARNO fallback
redirects NEW.col to the original target relation, making it read the
pre-update value (same as OLD.col).  For INSERT,
REPLACEVARS_SUBSTITUTE_NULL replaces it with NULL.  Both are wrong
when the generated column depends on columns being modified.

Fix by building target list entries for generated columns from their
generation expressions, pre-resolving the NEW.attribute references
within those expressions against the query's targetlist, and passing
them together with the query's targetlist to ReplaceVarsFromTargetList.

Back-patch to all supported branches.  Virtual generated columns were
added in v18, so the back-patches in pre-v18 branches only handle
stored generated columns.

Reported-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDexGTmCZzx=73gXkY2ZADS6LRhpnU+-8Y_QmrdTS6yUhA@mail.gmail.com
Backpatch-through: 14
2026-04-21 14:28:26 +09:00
Michael Paquier
9b43e6793b Fix orphaned processes when startup process fails during PM_STARTUP
When the startup process exists with a FATAL error during PM_STARTUP,
the postmaster called ExitPostmaster() directly, assuming that no other
processes are running at this stage.  Since 7ff23c6d27, this
assumption is not true, as the checkpointer, the background writer, the
IO workers and bgworkers kicking in early would be around.

This commit removes the startup-specific shortcut happening in
process_pm_child_exit() for a failing startup process during PM_STARTUP,
falling down to the existing exit() flow to signal all the started
children with SIGQUIT, so as we have no risk of creating orphaned
processes.

This required an extra change in HandleFatalError() for v18 and newer
versions, as an assertion could be triggered for PM_STARTUP.  It is now
incorrect.  In v17 and older versions, HandleChildCrash() needs to be
changed to handle PM_STARTUP so as children can be waited on.

While on it, fix a comment at the top of postmaster.c.  It was claiming
that the checkpointer and the background writer were started after
PM_RECOVERY.  That is not the case.

Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/CAJTYsWVoD3V9yhhqSae1_wqcnTdpFY-hDT7dPm5005ZFsL_bpA@mail.gmail.com
Backpatch-through: 15
2026-04-21 09:39:59 +09:00
Tom Lane
f0ac6d494b Fix relid-set clobber during join removal.
Commit cfcd57111 et al fell over under Valgrind testing.
(It seems to be enough to #define USE_VALGRIND, you don't actually
need to run it under Valgrind to see failures.)  The cause is that
remove_rel_from_eclass updates each EquivalenceMember's em_relids,
and those can be aliases of the left_relids or right_relids of some
RestrictInfo in ec_sources.  If the update made em_relids empty then
bms_del_member will have pfree'd the relid set, so that the subsequent
attempt to clean up ec_sources accesses already-freed memory.

We missed seeing ill effects before cfcd57111 because (a) if the
pfree happens then we will remove the EquivalenceMember altogether,
making the source RestrictInfo no longer of use, and (b) the
cleanup of ec_sources didn't touch left/right_relids before that.

I'm unclear though on how cfcd57111 managed to pass non-USE_VALGRIND
testing.  Apparently we managed to store another Bitmapset into the
freed space before trying to access it, but you'd not think that would
happen 100% of the time.  I think what USE_VALGRIND changes is that it
makes list.c much more memory-hungry, so that the freed space gets
claimed by some List node before a Bitmapset can be put there.

This failure can be seen in v16, v17, and master, but oddly enough not
v18.  That's because the SJE patch replaced the simple bms_del_members
calls used here with adjust_relid_set, which is careful not to
scribble on its input.  But commit 20efbdffe just recently put back
the old coding and thus resurrected the problem.

Discussion: https://postgr.es/m/458729.1776724816@sss.pgh.pa.us
Backpatch-through: 16, 17, master
2026-04-20 19:24:52 -04:00
Jeff Davis
bdcb85b56a Fix callers of unicode_strtitle() using srclen == -1.
Currently, only called that way in tests, which failed to fail.

Discussion: https://postgr.es/m/581a72ff452bb045ba83bbe3c6cf4467702d4f0f.camel@j-davis.com
Backpatch-through: 18
2026-04-20 14:44:08 -07:00
Tom Lane
cfcd571116 Clean up all relid fields of RestrictInfos during join removal.
The original implementation of remove_rel_from_restrictinfo()
thought it could skate by with removing no-longer-valid relid
bits from only the clause_relids and required_relids fields.
This is quite bogus, although somehow we had not run across a
counterexample before now.  At minimum, the left_relids and
right_relids fields need to be fixed because they will be
examined later by clause_sides_match_join().  But it seems
pretty foolish not to fix all the relid fields, so do that.

This needs to be back-patched as far as v16, because the
bug report shows a planner failure that does not occur
before v16.  I'm a little nervous about back-patching,
because this could cause unexpected plan changes due to
opening up join possibilities that were rejected before.
But it's hard to argue that this isn't a regression.  Also,
the fact that this changes no existing regression test results
suggests that the scope of changes may be fairly narrow.
I'll refrain from back-patching further though, since no
adverse effects have been demonstrated in older branches.

Bug: #19460
Reported-by: François Jehl <francois.jehl@pigment.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/19460-5625143cef66012f@postgresql.org
Backpatch-through: 16
2026-04-20 14:48:23 -04:00
Tom Lane
207cb2abcb Make ExecForPortionOfLeftovers() obey SRF protocol.
Before each call to the SRF, initialize isnull and isDone, as per the
comments for struct ReturnSetInfo.  This fixes a Coverity warning
about rsi.isDone not being initialized.  The built-in
{multi,}range_minus_multi functions don't return without setting it,
but a user-supplied function might not be as accommodating.

We also add statistics tracking around the function call, which
will be expected once user-defined withoutPortionProcs functions
are supported, and a cross-check on rsi.returnMode just for
paranoia's sake.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Co-authored-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://postgr.es/m/4126231.1776622202@sss.pgh.pa.us
2026-04-20 10:21:52 -04:00
Álvaro Herrera
5dbb63fc82
REPACK: do not require REPLICATION or LOGIN
Although REPACK (CONCURRENTLY) uses replication slots, there is no
concern that the slot will leak data of other users, because the
MAINTAIN privilege on the table is required anyway; requiring
REPLICATION is user-unfriendly without providing any actual protection.

A related aspect is that the REPLICATION attribute is not needed to
prevent REPACK from stealing slots from logical replication, since
commit e76d8c749c made REPACK use a separate pool of replication
slots.

Similarly, there's no reason to require that the table owner has the
LOGIN privilege.  Bypass the default behavior in the background worker
launch sequence.

Because there are now successful concurrent repack runs in the
regression tests, we're forced to run test_plan_advice under
wal_level=replica, so add that.  Also, move the cluster.sql test to a
different parallel group in parallel_schedule: apparently the use of the
repack worker causes it to exceed the maximum limit of processes in some
runs (the actual limit reached is the number of XIDs in a snapshot's xip
array).

Author: Antonin Houska <ah@cybertec.at>
Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/aeJHPNmL4vVy3oPw@pryzbyj2023
2026-04-20 15:44:23 +02:00
Richard Guo
20efbdffeb Clean up remove_rel_from_query() after self-join elimination commit
The self-join elimination (SJE) commit grafted self-join removal onto
remove_rel_from_query(), which was originally written for left-join
removal only.  This resulted in several issues:

- Comments throughout remove_rel_from_query() still assumed only
  left-join removal, making the code misleading.

- ChangeVarNodesExtended was called on phv->phexpr with subst=-1
  during left-join removal, which is pointless and confusing since any
  surviving PHV shouldn't reference the removed rel.

- phinfo->ph_lateral was adjusted for left-join removal, which is
  unnecessary since the removed relid cannot appear in ph_lateral for
  outer joins.

- The comment about attr_needed reconstruction was in
  remove_rel_from_query(), but the actual rebuild is performed by the
  callers.

- EquivalenceClass processing in remove_rel_from_query() is redundant
  for self-join removal, since the caller (remove_self_join_rel)
  already handles ECs via update_eclasses().

- In remove_self_join_rel(), ChangeVarNodesExtended was called on
  root->processed_groupClause, which contains SortGroupClause nodes
  that have no Var nodes to rewrite.  The accompanying comment
  incorrectly mentioned "HAVING clause".

This patch fixes all these issues, clarifying the separation between
left-join removal and self-join elimination code paths within
remove_rel_from_query().  The resulting code is also better structured
for adding new types of join removal (such as inner-join removal) in
the future.

Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Discussion: https://postgr.es/m/CAMbWs48JC4OVqE=3gMB6se2WmRNNfMyFyYxm-09vgpm+Vwe8Hg@mail.gmail.com
2026-04-20 17:00:22 +09:00
Peter Eisentraut
04f9ea372a Add missing Datum conversions
Similar to commit ff89e182d4, for new code added since.
2026-04-20 07:22:16 +02:00
Peter Eisentraut
5936afe1ee Fix incorrect format placeholders 2026-04-20 07:09:13 +02:00
Amit Kapila
090c4297e4 Flush statistics during idle periods in parallel apply worker.
Parallel apply workers previously failed to report statistics while
waiting for new work in the main loop. This resulted in the stats from the
most recent transaction remaining unbuffered, leading to arbitrary
reporting delays—particularly when streamed transactions were infrequent.

This commit ensures that statistics are explicitly flushed when the worker
is idle, providing timely visibility into accumulated worker activity.

Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 16, where it was introduced
Discussion: https://postgr.es/m/TYRPR01MB1419579F217CC4332B615589594202@TYRPR01MB14195.jpnprd01.prod.outlook.com
2026-04-20 10:31:11 +05:30
Peter Eisentraut
9018c7d37b Fix 64-bit shifting in dynahash.c
The switch from long to int64 in commit 13b935cd52 was incomplete.
It was shifting the constant 1L, which is not always 64 bit.  Fix by
using an explicit int64 constant.

MSVC warning:

../src/backend/utils/hash/dynahash.c(1767): warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)

Also add the corresponding warning to the standard warning set on
MSVC, to help catch similar issues in the future.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/1142ad86-e475-41b3-aeee-c6ad913064fa%40eisentraut.org
2026-04-19 13:27:54 +02:00
Amit Langote
cda0c4c5d6 Reject invalid databases in pg_get_database_ddl()
An invalid database has datconnlimit set to -2.  pg_get_database_ddl()
emits this verbatim as CONNECTION LIMIT = -2, which ALTER DATABASE
rejects.  Error out early instead.

Reported-by: Lakshmi N <lakshmin.jhs@gmail.com>
Author: Lakshmi N <lakshmin.jhs@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Hu Xunqi <huxunqi.08@gmail.com>
Discussion: https://postgr.es/m/CA+3i_M8m1k2gFch+tU0JmAQh9FRV+pFrfTXDrJo+BqmwsTmOhg@mail.gmail.com
2026-04-17 13:19:56 +09:00
Álvaro Herrera
05c401d578
Add missing initialization
The backend running REPACK can check DecodingWorkerShared->initialized
before the worker could have the chance to initialize it, possibly
leading to wrong behavior.

While at it, remove DecodingWorkerShared->worker_dsm_segment, because
that doesn't actually need to be in shared memory; a simple local-memory
global variable is enough.

Oversights in commit 28d534e2ae.

Author: Antonin Houska <ah@cybertec.at>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/18181295-8375-4789-ad32-269d78d6001e@gmail.com
2026-04-16 22:27:04 +02:00
Melanie Plageman
b4c1b2be30 Update FSM during prune/freeze replay even if freespace is zero
add323da40 started updating the visibility map in the same WAL record
as pruning and freezing. This included updating the freespace map during
replay of a record setting the VM, which we've done since ab7dbd681.

add323da40, however, conditioned doing so on there being > 0 freespace
on the page, which differed from the previous state for records updating
the VM.

The FSM is not WAL-logged and is instead updated heuristically on
standbys. In rare cases, this heuristic could lead to pages with 0
freespace having outdated entries in the FSM. If the standby is later
promoted and vacuum skips these pages because they are marked
all-visible/all-frozen, overly optimistic values would be propagated up
the FSM tree, causing slowness when searching for freespace for new
tuples.

Fix it by always updating the FSM during replay when setting VM bits.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reported-by: Alexey Makhmutov <a.makhmutov@postgrespro.ru>
Discussion: https://postgr.es/m/ead2f110-c736-48f5-99e1-023dc9acbf0b%40postgrespro.ru
2026-04-16 12:10:47 -04:00
Fujii Masao
2fd84e2226 Use XLogRecPtrIsValid() consistently for WAL position checks
Commit a2b02293bc switched various checks to use XLogRecPtrIsValid(),
but later changes reintroduced XLogRecPtrIsInvalid() and direct comparisons
with InvalidXLogRecPtr.

This commit replaces those uses with XLogRecPtrIsValid() for better
readability and consistency.

Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Xiaopeng Wang <wxp_728@163.com>
Reviewed-by: Amul Sul <sulamul@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CALDaNm16knMFtcqyAG3XYSkyagmVXfhaR0T=hau8UTAU0+eLQQ@mail.gmail.com
2026-04-16 23:02:34 +09:00
Peter Eisentraut
c86d2ccdb3 Add missing include
"utils/pg_locale.h" is needed when under MSVC for wchar2char(),
introduced by commit 65707ed9af.  Surprisingly, MSVC doesn't warn by
default about calling undeclared functions.  This will be addressed in
a separate commit.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/1142ad86-e475-41b3-aeee-c6ad913064fa%40eisentraut.org
2026-04-16 09:35:05 +02:00
Amit Langote
b5062a4e57 Fix incorrect comment in JsonTablePlanJoinNextRow()
The comment on the return-false path when both UNION siblings are
exhausted said "there are more rows," which is the opposite of what
the code does. The code itself is correct, returning false to signal
no more rows, but the misleading comment could tempt a reader into
"fixing" the return value, which would cause UNION plans to loop
indefinitely.

Back-patch to 17, where JSON_TABLE was introduced.

Author: Chuanwen Hu <463945512@qq.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/tencent_4CC6316F02DECA61ACCF22F933FEA5C12806@qq.com
Backpatch-through: 17
2026-04-16 13:45:33 +09:00
Fujii Masao
ee550254a2 Use proc_exit() for walreceiver exit in WalRcvWaitForStartPosition()
Previously, when the walreceiver exited from WalRcvWaitForStartPosition()
at the startup process's request, it called exit(1) directly. This could
skip cleanup performed by the callback functions.

This commit makes the walreceiver to use proc_exit() instead, ensuring
normal cleanup is executed on exit.

Also this commit updates comments describing walreceiver termination.

Apply to master only, as this has not caused practical issues so far.

Author: Chao Li <lic@highgo.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/74381238-4E8A-4621-B794-57025DCCE0BA@gmail.com
2026-04-16 12:33:17 +09:00
Andrew Dunstan
f30d0c720f Fix COPY TO FORMAT JSON to exclude generated columns.
COPY TO with FORMAT json was including generated columns in the
output, unlike TEXT and CSV formats.  Virtual generated columns
appeared as null, and stored ones showed their computed values.

The JSON code path only built a restricted TupleDesc when an explicit
column list was given (attnamelist != NIL), but CopyGetAttnums()
also excludes generated columns from the default list.  Fix by
checking whether the attnumlist is shorter than the full TupleDesc
instead.

Bug introduced in 7dadd38cda.

Author: Satya Narlapuram <satya.narlapuram@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDcfpGDoPL3fvfjXRtfn=fny6DdJR6BAy6TpS1Xj2EZfXA@mail.gmail.com
2026-04-15 07:58:17 -04:00
Andrew Dunstan
3e2a1496ba Rework signal handler infrastructure to pass sender info as argument.
Commit 095c9d4cf06 added errdetail() reporting of the PID and UID of
the process that sent a termination signal.  However, as noted by
Andres Freund, the implementation had architectural problems:

1. wrapper_handler() in pqsignal.c contained SIGTERM-specific logic
   (setting ProcDieSenderPid/Uid), violating its role as a generic
   signal dispatch wrapper.

2. Using globals to pass sender info between wrapper_handler and the
   real handler is unsafe when signals nest on some platforms.

3. The syncrep.c errdetail used psprintf() to conditionally embed
   text via %s, breaking translatability.

Adopt the approach proposed by Andres Freund: introduce a
pg_signal_info struct that is passed as an argument to all signal
handlers via the SIGNAL_ARGS macro.  wrapper_handler populates it
from siginfo_t when SA_SIGINFO is available, or with zeros otherwise.
This keeps wrapper_handler fully generic and avoids any globals for
passing signal metadata.

Since pqsigfunc now has a different signature from the system's
signal handler type, SIG_IGN and SIG_DFL can no longer be passed
directly to pqsignal().  Introduce PG_SIG_IGN and PG_SIG_DFL macros
that cast to the new pqsigfunc type, and update all call sites.
The legacy pqsignal() in libpq retains its original signature via
a local typedef.

Only die() reads pg_siginfo today, copying the sender PID/UID into
ProcDieSenderPid/Uid for later use by ProcessInterrupts().  Only the
first SIGTERM's sender info is recorded.

Also fix the syncrep.c translatability issue by using separate ereport
calls with complete, independently translatable errdetail strings.

Also make the psql TAP test require the DETAIL line on platforms with
SA_SIGINFO, rather than making it unconditionally optional.

On Windows, pg_signal_info uses uint32_t for pid and uid fields
since pid_t/uid_t are not available early enough in the include
chain.  The Windows signal dispatch in pgwin32_dispatch_queued_signals()
passes a zeroed pg_signal_info to handlers.

Author: Andres Freund <andres@anarazel.de>
Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/cwyyryh2veejuxbj5ifzyaejw7jhhqc5mrdeq56xckknsdecn2@6hzfcxde2nm5
Discussion: https://postgr.es/m/jygesyr7mwg7ovdbxpmjvvbi3hccptpkcreqb645h7f56puwbz@hmkkwi3melfe
2026-04-15 07:30:34 -04:00
Richard Guo
363af93bdd Fix var_is_nonnullable() to handle invalid NOT NULL constraints
The NOTNULL_SOURCE_SYSCACHE code path in var_is_nonnullable() used
get_attnotnull() to check pg_attribute.attnotnull, which is true for
both valid and invalid (NOT VALID) NOT NULL constraints.  An invalid
constraint does not guarantee the absence of NULLs, so this could lead
to incorrect results.  For example, query_outputs_are_not_nullable()
could wrongly conclude that a subquery's output is non-nullable,
causing NOT IN to be incorrectly converted to an anti-join.

Fix by checking the attnullability field in the relation's tuple
descriptor instead, which correctly distinguishes valid from invalid
constraints, consistent with what the NOTNULL_SOURCE_HASHTABLE code
path already does.

While at it, rename NOTNULL_SOURCE_SYSCACHE to NOTNULL_SOURCE_CATALOG
to reflect that this code path no longer uses a syscache lookup, and
remove the now-unused get_attnotnull() function.

Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Discussion: https://postgr.es/m/CAMbWs48ALW=mR0ydQ62dGS-Q+3D7WdDSh=EWDezcKp19xi=TUA@mail.gmail.com
2026-04-15 09:38:56 +09:00
Andrew Dunstan
1f108fc02e Fix pfree crash in pg_get_role_ddl() and pg_get_database_ddl().
DatumGetArrayTypeP() can return a pointer into the tuple when the
datum is stored as a short varlena, so pfree() on the result crashes.
Use DatumGetArrayTypePCopy() to always get a palloc'd copy.

Bug introduced in 76e514ebb4 and a4f774cf1c.

Reported-by: Jeff Davis <pgsql@j-davis.com>
Author: Satya Narlapuram <satya.narlapuram@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDdWtv9PKtPZEokwGCNtbv4MVnfYw5wMZrsEj4xizSNe5Q@mail.gmail.com
2026-04-14 18:29:46 -04:00
Heikki Linnakangas
66ad764c8d Replace deprecated StaticAssertStmt() with StaticAssertDecl()
Commit 6f5ad00ab7 added another use of StaticAssertStmt(), but it
was marked as deprecated in commit d50c86e743.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/adeNWH5pDawDvvR2@ip-10-97-1-34.eu-west-3.compute.internal
2026-04-14 12:03:30 +03:00
Amit Kapila
fce3f7d267 Add missing period to HINT messages.
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Discussion: https://postgr.es/m/CAHut+PvikGr4AtoFSs=jq=hmTybVF2NCMEZ57-sjwbGudfuqsQ@mail.gmail.com
2026-04-14 09:37:18 +05:30
Jeff Davis
06ce97b999 Fix overrun when comparing with unterminated ICU language string.
The overrun was introduced in commit c4ff35f10.

Author: Andreas Karlsson <andreas@proxel.se>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/96d80a47-f17f-42fa-82b1-2908efbd6541@gmail.com
Backpatch-through: 18
2026-04-13 11:19:04 -07:00
Alexander Korotkov
a8b61c23c5 Explicitly forbid non-top-level WAIT FOR execution
Previously we were relying on a snapshot-based check to detect invalid
execution contexts.  However, when WAIT FOR is wrapped into a stored
procedure or a DO block, it could pass this check, causing an error
elsewhere.

This commit implements an explicit isTopLevel check to reject WAIT FOR
when called from within a function, procedure, or DO block.  The
isTopLevel check catches these cases early with a clear error message,
matching the pattern used by other utility commands like VACUUM and
REINDEX.  The snapshot check is retained for the remaining case:
top-level execution within a transaction block using an isolation level
higher than READ COMMITTED.

Also adds tests for WAIT FOR LSN wrapped in a procedure and DO block,
complementing the existing test that uses a function wrapper.  Relevant
documentation paragraph is also added.

Reported-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Discussion: https://postgr.es/m/CAHg%2BQDcN-n3NUqgRtj%3DBQb9fFQmH8-DeEROCr%3DPDbw_BBRKOYA%40mail.gmail.com
Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
2026-04-13 14:04:52 +03:00
Amit Kapila
85c17f612a Fix excessive logging in idle slotsync worker.
The slotsync worker was incorrectly identifying no-op states as successful
updates, triggering a busy loop to sync slots that logged messages every
200ms. This patch corrects the logic to properly classify these states,
enabling the worker to respect normal sleep intervals when no work is
performed.

Reported-by: Fujii Masao <masao.fujii@gmail.com>
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/CAHGQGwF6zG9Z8ws1yb3hY1VqV-WT7hR0qyXCn2HdbjvZQKufDw@mail.gmail.com
2026-04-13 10:06:50 +05:30
David Rowley
49ce41810f Improve various new-to-v19 appendStringInfo calls
Similar to 928394b66 and 8461424fd, here we adjust a few new locations
which were not using the most suitable appendStringInfo* or
appendPQExpBuffer* function for the intended purpose.

Author: David Rowley <drowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvohYOdrvhVxXzCJNX_GYMSWBfjTTtB6hgDauEtZ8Nar2A@mail.gmail.com
2026-04-13 13:16:48 +12:00
David Rowley
e3e26d04bd Fix unlikely overflow bug in bms_next_member()
... and bms_prev_member().

Both of these functions won't work correctly when given a prevbit of
INT_MAX and would crash when operating on a Bitmapset that happened to
have a member with that value.

Here we fix that by using an unsigned int to calculate which member to
look for next.

I've also adjusted bms_prev_member() to check for < 0 rather than == -1
for starting the loop.  This was done as it's safer and comes at zero
extra cost.

With our current use cases, it's likely impossible to have a Bitmapset
with an INT_MAX member, so no backpatch here.  I only noticed this issue
when working on a bms function to bitshift a Bitmapset.

Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAApHDvr1B2gbf6JF69QmueM2QNRvbQeeKLxDnF=w9f9--022uA@mail.gmail.com
2026-04-13 11:39:15 +12:00
David Rowley
a63bbc811d Use stack-allocated StringInfoDatas, where possible
6d0eba662 already did most of the changes, but some new ones snuck in
just prior to that commit, so these got missed.

Having these short-lived StringInfoDatas on the stack rather than having
them get palloc'd by makeStringInfo() is simply for performance as it
saves doing a 2nd palloc.

Since this code is new to v19, it makes sense to improve it now rather
than wait until we branch as having v19 and v20 differ here just makes it
harder to backpatch fixes in this area.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/adt4wpj4FZwR+S7I@ip-10-97-1-34.eu-west-3.compute.internal
2026-04-13 10:43:19 +12:00
Michael Paquier
80156cee06 Honor passed-in database OIDs in pgstat_database.c
Three routines in pgstat_database.c incorrectly ignore the database OID
provided by their caller, using MyDatabaseId instead:
- pgstat_report_connect()
- pgstat_report_disconnect()
- pgstat_reset_database_timestamp()

The first two functions, for connection and disconnection, each have a
single caller that already passes MyDatabaseId.  This was harmless,
still incorrect.

The timestamp reset function also has a single caller, but in this case
the issue has a real impact: it fails to reset the timestamp for the
shared-database entry (datid=0) when operating on shared objects.  This
situation can occur, for example, when resetting counters for shared
relations via pg_stat_reset_single_table_counters().

There is currently one test in the tree that checks the reset of a
shared relation, for pg_shdescription, we rely on it to check what is
stored in pg_stat_database.  As stats_reset may be NULL, two resets are
done to provide a baseline for comparison.

Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Dapeng Wang <wangdp20191008@gmail.com>
Discussion: https://postgr.es/m/ABBD5026-506F-4006-A569-28F72C188693@gmail.com
Backpatch-through: 15
2026-04-11 17:02:52 +09:00
Richard Guo
77d0e82e58 Fix estimate_array_length error with set-operation array coercions
When a nested set operation's output type doesn't match the parent's
expected type, recurse_set_operations builds a projection target list
using generate_setop_tlist with varno 0.  If the required type
coercion involves an ArrayCoerceExpr, estimate_array_length could be
called on such a Var, and would pass it to examine_variable, which
errors in find_base_rel because varno 0 has no valid relation entry.

Fix by skipping the statistics lookup for Vars with varno 0.

Bug introduced by commit 9391f7152.  Back-patch to v17, where
estimate_array_length was taught to use statistics.

Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Author: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/adjW8rfPDkplC7lF@pryzbyj2023
Backpatch-through: 17
2026-04-11 16:38:47 +09:00
Thomas Munro
b2a17ba7a5 read_stream: Remove obsolete comment.
This comment was describing the v17 implementation (or io_method=sync).

Backpatch-through: 18
2026-04-11 11:25:25 +12:00
Fujii Masao
de74d1e9a5 Adjust log level of logical decoding messages by context
Commit 21b018e7ea lowered some logical decoding messages from LOG to DEBUG1.
However, per discussion on pgsql-hackers, messages from background activity
(e.g., walsender or slotsync worker) should remain at LOG, as they are less
frequent and more likely to indicate issues that DBAs should notice.

For foreground SQL functions (e.g., pg_logical_slot_peek_binary_changes()),
keep these messages at DEBUG1 to avoid excessive log noise. They can still be
enabled by lowering client_min_messages or log_min_messages for the session.

This commit updates logical decoding to log these messages at LOG for
background activity and at DEBUG1 for foreground execution.

Suggested-by: Robert Haas <robertmhaas@gmail.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CA+TgmoYsu2+YAo9eLGkDp5VP-pfQ-jOoX382vS4THKHeRTNgew@mail.gmail.com
2026-04-10 22:59:34 +09:00
Amit Langote
d6e96bacd3 Move afterTriggerFiringDepth into AfterTriggersData
The static variable afterTriggerFiringDepth introduced by commit
5c54c3ed1b is logically part of the after-trigger state and in
hindsight should have been a field in AfterTriggersData alongside
query_depth and the other per-transaction after-trigger state.
Move it there as firing_depth.  Also update its comment to
accurately reflect its sole remaining purpose: signaling to
AfterTriggerIsActive() that after-trigger firing is active.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFt4NGTNk7BinOsHHM48E9zGAa852vCfGoSe1bbL=JNFQ@mail.gmail.com
2026-04-10 16:17:58 +09:00
Richard Guo
f6936bf9da Fix var_is_nonnullable() to account for varreturningtype
var_is_nonnullable() failed to consider varreturningtype, which meant
it could incorrectly claim a Var is non-nullable based on a column's
NOT NULL constraint even when the Var refers to a non-existent row.
Specifically, OLD.col is NULL for INSERT (no old row exists) and
NEW.col is NULL for DELETE (no new row exists), regardless of any NOT
NULL constraint on the column.

This caused the planner's constant folding in eval_const_expressions
to incorrectly simplify IS NULL / IS NOT NULL tests on such Vars.  For
example, "old.a IS NULL" in an INSERT's RETURNING clause would be
folded to false when column "a" has a NOT NULL constraint, even though
the correct result is true.

Fix by returning false from var_is_nonnullable() when varreturningtype
is not VAR_RETURNING_DEFAULT, since such Vars can be NULL regardless
of table constraints.

Author: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDfaAipL6YzOq2H=gAhKBbcUTYmfbAv+W1zueOfRKH43FQ@mail.gmail.com
2026-04-10 15:51:00 +09:00
Amit Langote
155c03ee9d Assert index_attnos[0] == 1 in ri_FastPathFlushArray()
ri_FastPathFlushArray() handles single-column FKs only, so
index_attnos[0] is always 1.  Add an Assert to make this invariant
explicit, as a followup to 980c1a85d8.

Suggested-by: Junwang Zhao <zhjwpku@gmail.com> (offlist)
Discussion: https://www.postgresql.org/message-id/CADfhSr-pCkbDxmiOVYSAGE5QGjsQ48KKH_W424SPk%2BpwzKZFaQ%40mail.gmail.com
2026-04-10 15:24:38 +09:00
Amit Langote
980c1a85d8 Fix FK fast-path scan key ordering for mismatched column order
The fast-path foreign key check introduced in 2da86c1ef9 assumed that
constraint key positions directly correspond to index column positions.
This is not always true as a FK constraint can reference PK columns in a
different order than they appear in the PK's unique index.

For example, if the PK is (a, b, c) and the FK references them as
(a, c, b), the constraint stores keys in the FK-specified order, but
the index has columns in PK order. The buggy code used the constraint
key index to access rd_opfamily[i], which retrieved the wrong operator
family when columns were reordered, causing "operator X is not a member
of opfamily Y" errors.

After fixing the opfamily lookup, a second issue started to happen:
btree index scans require scan keys to be ordered by attribute number.
The code was placing scan keys at array position i with attribute number
idx_attno, producing out-of-order keys when columns were swapped. This
caused "btree index keys must be ordered by attribute" errors.

The fix adds an index_attnos array to FastPathMeta that maps each
constraint key position to its corresponding index column position.
In ri_populate_fastpath_metadata(), we search indkey to find the actual
index column for each pk_attnums[i] and use that position for the
opfamily lookup. In build_index_scankeys(), we place each scan key at
the array position corresponding to its index column
(skeys[idx_attno-1]) rather than at the constraint key position,
ensuring scan keys are properly ordered by attribute number as btree
requires.

Reported-by: Fredrik Widlert <fredrik.widlert@digpro.se>
Author: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://www.postgresql.org/message-id/CADfhSr-pCkbDxmiOVYSAGE5QGjsQ48KKH_W424SPk%2BpwzKZFaQ%40mail.gmail.com
2026-04-10 13:33:55 +09:00
Amit Langote
03029409b4 Fix typo left by 34a3078629
Reported-by: jie wang <jugierwang@gmail.com>
Reported-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAJnZyeDyaS=X-eYN=9rDYqK=6ma1gMLa0qDgfNbZKK0e0+q99Q@mail.gmail.com
2026-04-10 13:32:38 +09:00
Amit Langote
34a3078629 Fix RI fast-path crash under nested C-level SPI
When a C-language function uses SPI_connect/SPI_execute/SPI_finish to
INSERT into a table with FK constraints, the FK AFTER triggers fire
and schedule ri_FastPathEndBatch via
RegisterAfterTriggerBatchCallback(), opening PK relations under
CurrentResourceOwner at the time of the SPI call.  The query_depth > 0
guard in FireAfterTriggerBatchCallbacks suppresses the callback at
that nesting level, deferring teardown to the outer query's
AfterTriggerEndQuery. By then the resource owner active during the SPI
call may have been released, decrementing the cached relations'
refcounts to zero. ri_FastPathTeardown, running under the outer
query's resource owner, then crashes in assert builds when it attempts
to close relations whose refcounts are already zero:

  TRAP: failed Assert("rel->rd_refcnt > 0")

Fix by storing batch callbacks at the level where they should fire:
in AfterTriggersQueryData.batch_callbacks for immediate constraints
(fired by AfterTriggerEndQuery) and in AfterTriggersData.batch_callbacks
for deferred constraints (fired by AfterTriggerFireDeferred and
AfterTriggerSetState).  RegisterAfterTriggerBatchCallback() routes the
callback to the current query-level list when query_depth >= 0, and to
the top-level list otherwise.  FireAfterTriggerBatchCallbacks() takes a
list parameter and simply iterates and invokes it; memory cleanup is
handled by the caller.  This replaces the query_depth > 0 guard with
list-level scoping.  Note that deferred constraints are unaffected by
this bug: their callbacks fire at commit via AfterTriggerFireDeferred,
under the outer transaction's resource owner, which remains valid
throughout.

Also add firing_batch_callbacks to AfterTriggersData to enforce that
callbacks do not register new callbacks during
FireAfterTriggerBatchCallbacks(), which would be unsafe as it could
modify the list being iterated.  An Assert in
RegisterAfterTriggerBatchCallback() enforces this discipline for
future callers.  The flag is reset at transaction and subtransaction
boundaries to handle cases where an error thrown by a callback is
caught and the subtransaction is rolled back.

While at it, ensure callbacks are properly accounted for at all
transaction boundaries, as cleanup of b7b27eb41a: discard any
remaining top-level callbacks on both commit and abort in
AfterTriggerEndXact(), and clean up query-level callbacks in
AfterTriggerFreeQuery().

Note that ri_PerformCheck() calls SPI with fire_triggers=false, which
skips AfterTriggerBeginQuery/EndQuery for that SPI command.  Any
triggers queued during that SPI command are not fired immediately but
deferred to the outer query level.  Since the fast-path check for
those triggers runs under the outer query's resource owner rather than
a nested SPI resource owner, and ri_PerformCheck() does not create
a dedicated child resource owner, the bug described above does not
apply.

Reported-by: Evan Montgomery-Recht <montge@mianetworks.net>
Reported-by: Sandro Santilli <strk@kbt.io>
Analyzed-by: Evan Montgomery-Recht <montge@mianetworks.net>
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAEg7pwcKf01FmDqFAf-Hzu_pYnMYScY_Otid-pe9uw3BJ6gq9g@mail.gmail.com
2026-04-10 12:41:34 +09:00
Michael Paquier
5b5bf51e43 Zero-fill private_data when attaching an injection point
InjectionPointAttach() did not initialize the private_data buffer of the
shared memory entry before (perhaps partially) overwriting it.  When the
private data is set to NULL by the caler, the buffer was left
uninitialized.  If set, it could have stale contents.

The buffer is initialized to zero, so as the contents recorded when a
point is attached are deterministic.

Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0tsGHu2h6YLnVu4HiK05q+gTE_9WVUAqihW2LSscAYS-g@mail.gmail.com
Backpatch-through: 17
2026-04-10 11:17:09 +09:00
Nathan Bossart
71ff232a5b Fix double-free in pg_stat_autovacuum_scores.
Presently, relation_needs_vacanalyze() unconditionally frees the
pgstat entry returned by pgstat_fetch_stat_tabentry_ext().  This
behavior was first added by commit 02502c1bca to avoid memory
leakage in autovacuum.  While this is fine for autovacuum since it
forces stats_fetch_consistency to "none", it is not okay for other
callers that use "cache" or "snapshot".  This manifests as a
double-free when pg_stat_autovacuum_scores is called multiple times
in the same transaction.

To fix, add a "bool *may_free" parameter to
pgstat_fetch_stat_tabentry_ext() that returns whether it is safe
for the caller to explicitly pfree() the result.  If a caller would
rather leave it to the memory context machinery to free the result,
it can pass NULL as the "may_free" argument (or just ignore its
value).

Oversight in commit 87f61f0c82.

Reported-by: Tender Wang <tndrwang@gmail.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAHewXNkJKdwb3D5OnksrdOqzqUnXUEMpDam1TPW0vfUkW%3D7jUw%40mail.gmail.com
Discussion: https://postgr.es/m/5684f479-858e-4c5d-b8f5-bcf05de1f909%40gmail.com
2026-04-09 13:07:06 -05:00
Nathan Bossart
60165db6e1 Add LOG_NEVER error level code.
This logging level means not to emit the log, which is useful for
functions like relation_needs_vacanalyze().  This function accepts
a log level argument but not all callers want it to emit logs.

Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3101163.1775676098%40sss.pgh.pa.us
2026-04-09 10:18:15 -05:00
Richard Guo
8b6c89e377 Fix integer overflow in nodeWindowAgg.c
In nodeWindowAgg.c, the calculations for frame start and end positions
in ROWS and GROUPS modes were performed using simple integer addition.
If a user-supplied offset was sufficiently large (close to INT64_MAX),
adding it to the current row or group index could cause a signed
integer overflow, wrapping the result to a negative number.

This led to incorrect behavior where frame boundaries that should have
extended indefinitely (or beyond the partition end) were treated as
falling at the first row, or where valid rows were incorrectly marked
as out-of-frame.  Depending on the specific query and data, these
overflows can result in incorrect query results, execution errors, or
assertion failures.

To fix, use overflow-aware integer addition (ie, pg_add_s64_overflow)
to check for overflows during these additions.  If an overflow is
detected, the boundary is now clamped to INT64_MAX.  This ensures the
logic correctly treats the boundary as extending to the end of the
partition.

Bug: #19405
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/19405-1ecf025dda171555@postgresql.org
Backpatch-through: 14
2026-04-09 19:28:33 +09:00
Richard Guo
c1408956e3 Strip PlaceHolderVars from partition pruning operands
When pulling up a subquery, its targetlist items may be wrapped in
PlaceHolderVars to enforce separate identity or as a result of outer
joins.  This causes any upper-level WHERE clauses referencing these
outputs to contain PlaceHolderVars, which prevents partprune.c from
recognizing that they match partition key columns, defeating partition
pruning.

To fix, strip PlaceHolderVars from operands before comparing them to
partition keys.  A PlaceHolderVar with empty phnullingrels appearing
in a relation-scan-level expression is effectively a no-op, so
stripping it is safe.  This parallels the existing treatment in
indxpath.c for index matching.

In passing, rename strip_phvs_in_index_operand() to strip_noop_phvs()
and move it from indxpath.c to placeholder.c, since it is now a
general-purpose utility used by both index matching and partition
pruning code.

Back-patch to v18.  Although this issue exists before that, changes in
that version made it common enough to notice.  Given the lack of field
reports for older versions, I am not back-patching further.  In the
v18 back-patch, strip_phvs_in_index_operand() is retained as a thin
wrapper around the new strip_noop_phvs() to avoid breaking third-party
extensions that may reference it.

Reported-by: Cándido Antonio Martínez Descalzo <candido@ninehq.com>
Diagnosed-by: David Rowley <dgrowleyml@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAH5YaUwVUWETTyVECTnhs7C=CVwi+uMSQH=cOkwAUqMdvXdwWA@mail.gmail.com
Backpatch-through: 18
2026-04-09 16:41:31 +09:00
Amit Langote
e1cc57fabd Add nkeys parameter to recheck_matched_pk_tuple()
The function looped over ii_NumIndexKeyAttrs elements of the skeys
array, but one caller (ri_FastPathFlushArray) passes a one-element
array since it only handles single-column FKs.  The function
signature did not communicate this constraint, which static analysis
flags as a potential out-of-bounds read.

Add an nkeys parameter and assert that it matches
ii_NumIndexKeyAttrs, then use it in the loop.  The call sites
already know the key count.

Reported-by: Evan Montgomery-Recht <montge@mianetworks.net>
Discussion: https://postgr.es/m/CAEg7pwcKf01FmDqFAf-Hzu_pYnMYScY_Otid-pe9uw3BJ6gq9g@mail.gmail.com
2026-04-09 14:45:31 +09:00
Michael Paquier
e0fa5bd146 Reduce presence of syscache.h in src/include/
ee642cccc4 has added syscache.h in inval.h and objectaddress.h,
enlarging by a lot the footprint of this header, particularly via
objectaddress.h.  A change in syscache.h would cause a lot more files to
be recompiled.

This commit reduces the presence of syscache.h by switching to a direct
use of syscache_ids.h in inval.h and objectaddress.h, where the enum
SysCacheIdentifier is defined.  genbki.pl gains an #ifndef block for
this header, so as its inclusion is more controlled.

Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/vlcexdcimsmvu3aplt2yxpfndkgtuvjsrms2fdl46rbw3k2kug@drspkoxlaije
2026-04-09 08:49:36 +09:00
Álvaro Herrera
2cff363715
Simplify declaration of memcpy target
The existing one is understandable failing on (some?) 32-bit platforms.

Reported-by: Tomas Vondra <tomas@vondra.me>
Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1c197f2d-49a2-4830-8dde-55867218b62d@vondra.me
2026-04-08 22:58:56 +02:00
Thomas Munro
a1643d40b3 Remove RADIUS support.
Our RADIUS implementation supported only the deprecated RADIUS/UDP
variant, without the recommended Message-Authenticator attribute to
mitigate against the Blast-RADIUS vulnerability.  By now, popular RADIUS
servers are expected to generate loud warnings or reject our
authentication attempts outright.

Since there have been no user reports about this, it seems unlikely that
there are users.

Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Michael Banck <mbanck@gmx.net>
Discussion: https://postgr.es/m/CA%2BhUKG%2BSH309V8KECU5%3DxuLP9Dks0v9f9UVS2W74fPAE5O21dg%40mail.gmail.com
2026-04-08 22:38:43 +12:00
Etsuro Fujita
28972b6fc3 Add support for importing statistics from remote servers.
Add a new FDW callback routine that allows importing remote statistics
for a foreign table directly to the local server, instead of collecting
statistics locally.  The new callback routine is called at the beginning
of the ANALYZE operation on the table, and if the FDW failed to import
the statistics, the existing callback routine is called on the table to
collect statistics locally.

Also implement this for postgres_fdw.  It is enabled by "restore_stats"
option both at the server and table level.  Currently, it is the user's
responsibility to ensure remote statistics to import are up-to-date, so
the default is false.

Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Discussion: https://postgr.es/m/CADkLM%3DchrYAx%3DX2KUcDRST4RLaRLivYDohZrkW4LLBa0iBhb5w%40mail.gmail.com
2026-04-08 19:15:00 +09:00
Thomas Munro
d1c01b79d4 aio: Adjust I/O worker pool automatically.
The size of the I/O worker pool used to implement io_method=worker was
previously controlled by the io_workers setting, defaulting to 3.  It
was hard to know how to tune it effectively.  That is replaced with:

  io_min_workers=2
  io_max_workers=8 (up to 32)
  io_worker_idle_timeout=60s
  io_worker_launch_interval=100ms

The pool is automatically sized within the configured range according to
recent variation in demand.  It grows when existing workers detect that
latency might be introduced by queuing, and shrinks when the
highest-numbered worker is idle for too long.  Work was already
concentrated into low-numbered workers in anticipation of this logic.

The logic for waking extra workers now also tries to measure and reduce
the number of spurious wakeups, though they are not entirely eliminated.

Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2Bm4xV0LMoH2c%3DoRAdEXuCnh%2BtGBTWa7uFeFMGgTLAw%2BQ%40mail.gmail.com
2026-04-08 19:08:32 +12:00
Thomas Munro
77645d44e3 Remove MULE_INTERNAL encoding.
This was useful before widespread Unicode adoption, and was based on the
internal encoding Emacs used to mix multiple sub-encodings.  Emacs
itself has stopped using it, and our implementation hadn't been updated
with modern underlying standards.  It is thought to be very unlikely
that anyone is still using it in the field.  Since such a complex
encoding comes with costs and risks, we agreed to drop support.

Any existing database using this encoding would need to be dumped and
restored with a new encoding to upgrade to PostgreSQL 19, most likely
UTF8, since pg_upgrade would fail.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tatsuo Ishii <ishii@postgresql.org>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/CA%2BhUKGKXDXh-FdU0orjfv%2BF08f%3DD91BhV3Ra-4zL-q%2BJmGYqTA%40mail.gmail.com
2026-04-08 17:40:06 +12:00
Andres Freund
2c16deee2f instrumentation: Allocate query level instrumentation in ExecutorStart
Until now extensions that wanted to measure overall query execution could
create QueryDesc->totaltime, which the core executor would then start and
stop.  That's a bit odd and composes badly, e.g. extensions always had to use
INSTRUMENT_ALL, because otherwise another extension might not get what they
need.

Instead this introduces a new field, QueryDesc->query_instr_options, that
extensions can use to indicate whether they need query level instrumentation
populated, and with which instrumentation options. Extensions should take care
to only add options they need, instead of replacing the options of others.

The prior name of the field, totaltime, sounded like it would only measure
time, but these days the instrumentation infrastructure can track more
resources.  The secondary benefit is that this will make it obvious to
extensions that they may not create the Instrumentation struct themselves
anymore (often extensions build only against a postgres build without
assertions).

Adjust pg_stat_statements and auto_explain to match, and lower the
requested instrumentation level for auto_explain to INSTRUMENT_TIMER,
since the summary instrumentation it needs is only runtime.

The reason to push this now, rather in the PG 20 cycle, is that 5a79e78501
already required extensions using query level instrumentations to adjust their
code, and it seemed undesirable to require them to do so again for 20.

Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAP53Pkyqsht+exJQYRsjhSWYKu+vFGHhPub7m6PmFD6Or0=p1g@mail.gmail.com
2026-04-08 00:06:45 -04:00
Fujii Masao
db93032a7c Fix slotsync worker blocking promotion when stuck in wait
Previously, on standby promotion, the startup process sent SIGUSR1 to
the slotsync worker (or a backend performing slot synchronization) and
waited for it to exit. This worked in most cases, but if the process was
blocked waiting for a response from the primary (e.g., due to a network
failure), SIGUSR1 would not interrupt the wait. As a result, the process
could remain stuck, causing the startup process to wait for a long time
and delaying promotion.

This commit fixes the issue by introducing a new procsignal reason,
PROCSIG_SLOTSYNC_MESSAGE. On promotion, the startup process
sends this signal, and the handler sets interrupt flags so the process
exits (or errors out) promptly at CHECK_FOR_INTERRUPTS(), allowing
promotion to complete without delay.

Backpatch to v17, where slotsync was introduced.

Author: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwFzNYroAxSoyJhqTU-pH=t4Ej6RyvhVmBZ91Exj_TPMMQ@mail.gmail.com
Backpatch-through: 17
2026-04-08 11:22:21 +09:00
Andres Freund
544000288e instrumentation: Move ExecProcNodeInstr to allow inlining
This moves the implementation of ExecProcNodeInstr, the ExecProcNode variant
that gets used when instrumentation is on, to be defined in instrument.c
instead of execProcNode.c, and marks functions it uses as inline.

This allows compilers to generate an optimized implementation, and shows a 4
to 12% reduction in instrumentation overhead for queries that move lots of
rows.

Author: Lukas Fittl <lukas@fittl.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAP53PkzdBK8VJ1fS4AZ481LgMN8f9mJiC39ZRHqkFUSYq6KWmg@mail.gmail.com
2026-04-07 21:36:49 -04:00
Tomas Vondra
e157fe6f76 Add EXPLAIN (IO) instrumentation for TidRangeScan
Adds support for EXPLAIN (IO) instrumentation for TidRange scans. This
requires adding shared instrumentation for parallel scans, using the
separate DSM approach introduced by dd78e69cfc.

Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me
2026-04-07 23:25:05 +02:00
Tomas Vondra
3b1117d6e2 Add EXPLAIN (IO) instrumentation for SeqScan
Adds support for EXPLAIN (IO) instrumentation for sequential scans. This
requires adding shared instrumentation, using the separate DSM approach
introduced by dd78e69cfc.

Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me
2026-04-07 23:07:03 +02:00
Tomas Vondra
681daed931 Add EXPLAIN (IO) infrastructure with BitmapHeapScan support
Allows collecting details about AIO / prefetch for scan nodes backed by
a ReadStream. This may be enabled by a new "IO" option in EXPLAIN, and
it shows information about the prefetch distance and I/O requests.

As of this commit this applies only to BitmapHeapScan, because that's
the only scan node using a ReadStream and collecting instrumentation
from workers in a parallel query. Support for SeqScan and TidRangeScan,
the other scan nodes using ReadStream, will be added in subsequent
commits.

The stats are collected only when required by EXPLAIN ANALYZE, with the
IO option (disabled by default). The amount of collected statistics is
very limited, but we don't want to clutter EXPLAIN with too much data.

The IOStats struct is stored in the scan descriptor as a field, next to
other fields used by table AMs. A pointer to the field is passed to the
ReadStream, and updated directly.

It's the responsibility of the table AM to allocate the struct (e.g. in
ambeginscan) whenever the flag SO_SCAN_INSTRUMENT flag is passed to the
scan, so that the executor and ReadStream has access to it.

The collected stats are designed for ReadStream, but are meant to be
reasonably generic in case a TAM manages I/Os in different ways.

Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me
2026-04-07 22:33:34 +02:00
Tom Lane
4edd6036d6 Fix WITHOUT OVERLAPS' interaction with domains.
UNIQUE/PRIMARY KEY ... WITHOUT OVERLAPS requires the no-overlap
column to be a range or multirange, but it should allow a domain
over such a type too.  This requires minor adjustments in both
the parser and executor.

In passing, fix a nearby break-instead-of-continue thinko in
transformIndexConstraint.  This had the effect of disabling
parse-time validation of the no-overlap column's type in the context
of ALTER TABLE ADD CONSTRAINT, if it follows a dropped column.
We'd still complain appropriately at runtime though.

Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxGoAmN_0iJ=hjTG0vGpOSOyy-vYyfE+-q0AWxrq2_p5XQ@mail.gmail.com
Backpatch-through: 18
2026-04-07 14:45:37 -04:00
Andres Freund
294520c444 instrumentation: Use Time-Stamp Counter on x86-64 to lower overhead
This allows the direct use of the Time-Stamp Counter (TSC) value retrieved
from the CPU using RDTSC/RDTSCP instructions, instead of APIs like
clock_gettime() on POSIX systems.

This reduces the overhead of EXPLAIN with ANALYZE and TIMING ON. Tests showed
that the overhead on top of actual runtime when instrumenting queries moving
lots of rows through the plan can be reduced from 2x as slow to 1.2x as slow
compared to the actual runtime. More complex workloads such as TPCH queries
have also shown ~20% gains when instrumented compared to before.

To control use of the TSC, the new "timing_clock_source" GUC is introduced,
whose default ("auto") automatically uses the TSC when reliable, for example
when running on modern Intel CPUs, or when running on Linux and the system
clocksource is reported as "tsc". The use of the operating system clock source
can be enforced by setting "system", or on x86-64 architectures the use of TSC
can be enforced by explicitly setting "tsc".

In order to use the TSC the frequency is first determined by use of CPUID, and
if not available, by running a short calibration loop at program start,
falling back to the system clock source if TSC values are not stable.

Note, that we split TSC usage into the RDTSC CPU instruction which does not
wait for out-of-order execution (faster, less precise) and the RDTSCP
instruction, which waits for outstanding instructions to retire. RDTSCP is
deemed to have little benefit in the typical InstrStartNode() /
InstrStopNode() use case of EXPLAIN, and can be up to twice as slow. To
separate these use cases, the new macro INSTR_TIME_SET_CURRENT_FAST() is
introduced, which uses RDTSC.

The original macro INSTR_TIME_SET_CURRENT() uses RDTSCP and is supposed to be
used when precision is more important than performance. When the system timing
clock source is used both of these macros instead utilize the system
APIs (clock_gettime / QueryPerformanceCounter) like before.

Additional users of interval timing, such as track_io_timing and
track_wal_io_timing could also benefit from being converted to use
INSTR_TIME_SET_CURRENT_FAST() but are left for future changes.

Author: Lukas Fittl <lukas@fittl.com>
Author: Andres Freund <andres@anarazel.de>
Author: David Geier <geidav.pg@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: David Geier <geidav.pg@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com> (in an earlier version)
Reviewed-by: Maciek Sakrejda <m.sakrejda@gmail.com> (in an earlier version)
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (in an earlier version)
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> (in an earlier version)
Discussion: https://postgr.es/m/20200612232810.f46nbqkdhbutzqdg@alap3.anarazel.de
2026-04-07 13:00:24 -04:00
Andres Freund
0022622c93 instrumentation: Standardize ticks to nanosecond conversion method
The timing infrastructure (INSTR_* macros) measures time elapsed using
clock_gettime() on POSIX systems, which returns the time as nanoseconds,
and QueryPerformanceCounter() on Windows, which is a specialized timing
clock source that returns a tick counter that needs to be converted to
nanoseconds using the result of QueryPerformanceFrequency().

This conversion currently happens ad-hoc on Windows, e.g. when calling
INSTR_TIME_GET_NANOSEC, which calls QueryPerformanceFrequency() on every
invocation, despite the frequency being stable after program start,
incurring unnecessary overhead. It also causes a fractured implementation
where macros are defined differently between platforms.

To ease code readability, and prepare for a future change that intends
to use a ticks-to-nanosecond conversion on x86-64 for TSC use, introduce
new pg_ticks_to_ns() / pg_ns_to_ticks() functions that get called from
INSTR_* macros on all platforms.

These functions rely on a separately initialized ticks_per_ns_scaled
value, that represents the conversion ratio. This value is initialized
from QueryPerformanceFrequency() on Windows, and set to zero on x86-64
POSIX systems, which results in the ticks being treated as nanoseconds.
Other architectures always directly return the original ticks.

To support this, pg_initialize_timing() is introduced, and is now
mandatory for both the backend and any frontend programs to call before
utilizing INSTR_* macros.

In passing, fix variable names in comment documenting INSTR_TIME_ADD_NANOSEC().

Author: Lukas Fittl <lukas@fittl.com>
Author: David Geier <geidav.pg@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: David Geier <geidav.pg@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://www.postgresql.org/message-id/flat/20200612232810.f46nbqkdhbutzqdg%40alap3.anarazel.de
2026-04-07 13:00:24 -04:00
Jacob Champion
b977bd308a oauth: Allow validators to register custom HBA options
OAuth validators can already use custom GUCs to configure behavior
globally. But we currently provide no ability to adjust settings for
individual HBA entries, because the original design focused on a world
where a provider covered a "single audience" of users for one database
cluster. This assumption does not apply to multitenant use cases, where
a single validator may be controlling access for wildly different user
groups.

To improve this use case, add two new API calls for use by validator
callbacks: RegisterOAuthHBAOptions() and GetOAuthHBAOption().
Registering options "foo" and "bar" allows a user to set "validator.foo"
and "validator.bar" in an oauth HBA entry. These options are stringly
typed (syntax validation is solely the responsibility of the defining
module), and names are restricted to a subset of ASCII to avoid tying
our hands with future HBA syntax improvements.

Unfortunately, we can't check the custom option names during a reload of
the configuration, like we do with standard HBA options, without
requiring all validators to be loaded via shared_preload_libraries.
(I consider this to be a nonstarter: most validators should probably use
session_preload_libraries at most, since requiring a full restart just
to update authentication behavior will be unacceptable to many users.)
Instead, the new validator.* options are checked against the registered
list at connection time.

Multiple alternatives were proposed and/or prototyped, including
extending the GUC system to allow per-HBA overrides, joining forces with
recent refactoring work on the reloptions subsystem, and giving the
ability to customize HBA options to all PostgreSQL extensions. I
personally believe per-HBA GUC overrides are the best option, because
several existing GUCs like authentication_timeout and pre_auth_delay
would fit there usefully. But the recent addition of SNI per-host
settings in 4f433025f indicates that a more general solution is needed,
and I expect that to take multiple releases' worth of discussion.

This compromise patch, then, is intentionally designed to be an
architectural dead end: simple to describe, cheap to maintain, and
providing just enough functionality to let validators move forward for
PG19. The hope is that it will be replaced in the future by a solution
that can handle per-host, per-HBA, and other per-context configuration
with the same functionality that GUCs provide today. In the meantime,
the bulk of the code in this patch consists of strict guardrails on the
simple API, to try to ensure that we don't have any reason to regret its
existence during its unknown lifespan.

I owe particular thanks here to Zsolt Parragi, who prototyped several
approaches that guided the final design.

Suggested-by: Zsolt Parragi <zsolt.parragi@percona.com>
Suggested-by: VASUKI M <vasukianand0119@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CAN4CZFM3b8u5uNNNsY6XCya257u%2BDofms3su9f11iMCxvCacag%40mail.gmail.com
2026-04-07 08:15:19 -07:00
Álvaro Herrera
e76d8c749c
Reserve replication slots specifically for REPACK
Add a new GUC max_repack_replication_slots, which lets the user reserve
some additional replication slots for concurrent repack (and only
concurrent repack).  With this, the user doesn't have to worry about
changing the max_replication_slots in order to cater for use of
concurrent repack.

(We still use the same pool of bgworkers though, but that's less
commonly a problem than slots.)

Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Discussion: https://postgr.es/m/202604012148.nnnmyxxrr6nh@alvherre.pgsql
2026-04-07 16:55:29 +02:00
Heikki Linnakangas
979387f188 Fix harmless leftover in _hash_kill_items()
Checking for 'havePin' is sufficient here. An earlier version of the
patch didn't have the 'havePin' variable and used
'so->hashso_bucket_buf == so->currPos.buf' as the condition when both
locking and unlocking the page. The havePin variable was added later
during development, but the unlocking condition wasn't fully
updated. Tidy it up.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/b9de8d05-3b02-4a27-9b0b-03972fa4bfd3@iki.fi
2026-04-07 17:38:11 +03:00
Andrew Dunstan
55890a9194 Add errdetail() with PID and UID about source of termination signal.
When a backend is terminated via pg_terminate_backend() or an external
SIGTERM, the error message now includes the sender's PID and UID as
errdetail, making it easier to identify the source of unexpected
terminations in multi-user environments.

On platforms that support SA_SIGINFO (Linux, FreeBSD, and most modern
Unix systems), the signal handler captures si_pid and si_uid from the
siginfo_t structure.  On platforms without SA_SIGINFO, the detail is
simply omitted.

Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Chao Li <1356863904@qq.com>
Discussion: https://postgr.es/m/CAKZiRmyrOWovZSdixpLd3PGMQXuQL_zw2Ght5XhHCkQ1uDsxjw@mail.gmail.com
2026-04-07 10:22:33 -04:00
Andres Freund
29e7dbf5e4 Minimal fix for WAIT FOR ... MODE 'standby_flush'
The investigation into the negative test performance impact of 7e8aeb9e48
lead to discovering that there are a few issues with WAIT FOR.

This commit is just a minimal fix to prevent hangs in standby_flush mode, due
to WAIT FOR ... 'standby_flush' seeing a 0 LSN if a newly started walreceiver
does not receive any writes, because the stanby is already caught up.

There are several other issues and this is isn't necessarily the best fix. But
this way we get the hangs out of the way.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/zqbppucpmkeqecfy4s5kscnru4tbk6khp3ozqz6ad2zijz354k@w4bdf4z3wqoz
2026-04-07 09:48:09 -04:00
Heikki Linnakangas
9480c585df Tidy up #ifdef USE_INJECTION_POINTS guards
Remove unnecessary #ifdef guard around the function prototypes; they
are already inside a larger #ifdef block. Move #include "subsystems.h"
inside the USE_INJECTION_POINTS guard; it's needed for
InjectionPointShmemCallbacks, which is a also inside the guard.

Reported-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/87y0iz2c1v.fsf@wibble.ilmari.org
2026-04-07 16:18:31 +03:00
Tomas Vondra
884f9b3c76 Use add_size/mul_size for index instrumentation size calculations
Use overflow-safe size arithmetic in the Index[Only]Scan and parallel
instrumentation functions, consistent with other executor nodes (Hash,
Sort, Agg, Memoize). This was an oversight in dd78e69cfc.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me
2026-04-07 12:47:28 +02:00
Tomas Vondra
9c18b47e61 Fix BitmapHeapScan non-parallel-aware EXPLAIN ANALYZE
Allocates shared bitmap table scan instrumentation for all parallel
scans. Previously, the instrumentation was only allocated for
parallel-aware scans, other bitmap heap scans in the parallel query had
no shared instrumentation and EXPLAIN didn't report exact/lossy pages.
This affected cases like scans on the outside of a parallel join or
queries run with debug_parallel_query=regress.

Fixed by allocating a separate DSM chunk for shared instrumentation and
doing so regardless of parallel-awareness. The instrumentation is
allocated in its own DSM chunk, separate from ParallelBitmapHeapState.

Report an initial patch by me. The approach with a separate DSM was
proposed and implemented by Melanie.

Not backpatched. The issue affects Postgres 18 (since 5a1e6df3b8), but
having multiple DSM chunks is possible only since dd78e69cfc. If we
decide to fix this in backbranches too, it will need to be done in a
less invasive way.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me
2026-04-07 12:47:13 +02:00
Álvaro Herrera
0d3dba38c7
Allow logical replication snapshots to be database-specific
By default, the logical decoding assumes access to shared catalogs, so
the snapshot builder needs to consider cluster-wide XIDs during startup.
That in turn means that, if any transaction is already running (and has
XID assigned), the snapshot builder needs to wait for its completion, as
it does not know if that transaction performed catalog changes earlier.

A possible problem with this concept is that if REPACK (CONCURRENTLY) is
running in some database, backends running the same command in other
databases get stuck until the first one has committed. Thus only a
single backend in the cluster can run REPACK (CONCURRENTLY) at any time.
Likewise, REPACK (CONCURRENTLY) can block walsenders starting on behalf
of subscriptions throughout the cluster.

This patch adds a new option to logical replication output plugin, to
declare that it does not use shared catalogs (i.e. catalogs that can be
changed by transactions running in other databases in the cluster). In
that case, no snapshot the backend will use during the decoding needs to
contain information about transactions running in other databases. Thus
the snapshot builder only needs to wait for completion of transactions
in the current database.

Currently we only use this option in the REPACK background worker. It
could possibly be used in the plugin for logical replication too,
however that would need thorough analysis of that plugin.

Bump WAL version number, due to a new field in xl_running_xacts.

Author: Antonin Houska <ah@cybertec.at>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/90475.1775218118@localhost
2026-04-07 12:31:18 +02:00
Álvaro Herrera
a3b069ef90
Avoid different-size pointer-to-integer cast
Buildfarm member mamba is unhappy that I wrote "(Datum) NULL" in commit
28d534e2ae:
  https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2026-04-07%2005%3A08%3A08
Use "(Datum) 0" which is what we do everywhere else.

Discussion: https://postgr.es/m/CANWCAZaOs_+WPH13ow33Q==+FwBwVZkqzm4vND=WEB4_NBmv1Q@mail.gmail.com
2026-04-07 12:28:05 +02:00
Heikki Linnakangas
6f5ad00ab7 Optimize sort and deduplication in ginExtractEntries()
Remove NULLs from the array first, and use qsort to deduplicate only
the non-NULL items. This simplifies the comparison function. Also
replace qsort_arg() with a templated version so that the comparison
function can be inlined. These changes make ginExtractEntries() a
little faster especially for simple datatypes like integers.

Author: David Geier <geidav.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/6d16b6bd-a1ff-4469-aefb-a1c8274e561a@iki.fi
2026-04-07 13:26:39 +03:00
Peter Eisentraut
b6ccd30d8f Add isolation tests for UPDATE/DELETE FOR PORTION OF
Add documentation about concurrency issues related to UPDATE/DELETE
FOR PORTION OF as well as supporting isolation tests.

Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/ec498c3d-5f2b-48ec-b989-5561c8aa2024%40illuminatedcomputing.com
2026-04-07 11:22:11 +02:00
Álvaro Herrera
5bcc3fbd19
Fix valgrind failure
Buildfarm member skink reports that the new REPACK code is trying to
write uninitialized bytes to disk, which correspond to padding space in
the SerializedSnapshotData struct.  Silence that by initializing the
memory in SerializeSnapshot() to all zeroes.

Co-authored-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/1976915.1775537087@sss.pgh.pa.us
2026-04-07 11:13:50 +02:00
John Naylor
8c3e22a8f8 Use .h for the file containing the page checksum code fragment
Commit 5e13b0f24 used a .c file for a file containing a code fragment,
to avoid adding an exception to headerscheck. That turned out to be
too clever, since it meant installation didn't happen by the usual
mechanism. Make it look like a normal header and add the requisite
exception.

Bug: #19450
Reported-by: RekGRpth <rekgrpth@gmail.com>
Discussion: https://postgr.es/m/19450-bb0612c50c6786e5@postgresql.org
2026-04-07 15:52:55 +07:00
John Naylor
30229be755 Simplify SortSupport for the macaddr data type
As of commit 6aebedc38 Datums are 64-bit values. Since MAC addresses
have only 6 bytes, the abbreviated key always contains the entire
MAC address and is thus authoritative (for practical purposes -- the
tuple sort machinery has no way of knowing that). Abbreviating this
datatype is cheap, and aborting abbreviation prevents optimizations
like radix sort, so remove cardinality estimation.

Author: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Suggested-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CAJ7c6TMk10rF_LiMz6j9rRy1rqk-5s+wBPuBefLix4cY+-4s1w@mail.gmail.com
2026-04-07 13:29:27 +07:00
Michael Paquier
49cc0d4148 Mark JumbleState as a const in the post_parse_analyze hook
This commit changes the post_parse_analyze_hook_type() hook to take a
const JumbleState, to tell external modules that they are not allowed to
touch the JumbleState that has been compiled by the core code.  This
fixes a pretty old problem with pg_stat_statements, that had always the
idea of modifying the lengths of the constants stored in the
JumbleState.  The previous state could confuse extensions that need to
look at a JumbleState depending on the loading order, if
pg_stat_statements is part of the stack loaded.

Another piece included in this commit is the move of the routine
fill_in_constant_lengths() to queryjumblefuncs.c, to give an option to
extensions to compile the lengths of the constants, if necessary.  I was
surprised by the number of external code that carries a copy of this
routine (see the thread for details).  Previously, this routine modified
JumbleState.  It now copies the set of LocationLens from JumbleState,
and fills the constant lengths for separate use.

pg_stat_statements is updated to use the new ComputeConstantLengths().
JumbleState is now marked with a const in the module, where relevant.

Author: Sami Imseih <samimseih@gmail.com>
Co-authored-by: Lukas Fittl <lukas@fittl.com>
Discussion: https://postgr.es/m/CAA5RZ0tZp5qU0ikZEEqJnxvdSNGh1DWv80sb-k4QAUmiMoOp_Q@mail.gmail.com
2026-04-07 15:22:49 +09:00
John Naylor
51098839cf Split CREATE STATISTICS error reasons out into errdetails
Some errmsgs in statscmds.c were phrased as "...cannot be used
because...". Put the reasons into errdetails. While at it, switch
from passive voice to "cannot create..." for the errmsg.

Author: Yugo Nagata <nagata@sraoss.co.jp>
Suggested-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CANWCAZaZeX0omWNh_ZbD_JVujzYQdRUW8UZOQ4dWh9Sg7OcAow@mail.gmail.com
2026-04-07 11:37:48 +07:00
Michael Paquier
17132f55c5 Fix shmem allocation of fixed-sized custom stats kind
StatsShmemSize(), that computes the shmem size needed for pgstats,
includes the amount of shared memory wanted by all the custom stats
kinds registered.  However, the shared memory allocation was done by
ShmemAlloc() in StatsShmemInit(), meaning that the space reserved was
not used, wasting some memory.

These extra allocations would show up under "<anonymous>" in
pg_shmem_allocations, as the allocations done by ShmemAlloc() are not
tracked by ShmemIndexEnt.

Issue introduced by 7949d95945.

Author: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/04b04387-92f5-476c-90b0-4064e71c5f37@iki.fi
Backpatch-through: 18
2026-04-07 11:59:49 +09:00
Amit Langote
5c54c3ed1b Fix deferred FK check batching introduced by commit b7b27eb41a
That commit introduced AfterTriggerIsActive() to detect whether
we are inside the after-trigger firing machinery, so that RI trigger
functions can take the batched fast path.  It was implemented using
query_depth >= 0, which correctly identified immediate trigger firing
but missed the deferred case where query_depth is -1 at COMMIT via
AfterTriggerFireDeferred().  This caused deferred FK checks to fall
back to the per-row fast path instead of the batched path.

The correct check is whether we are inside an after-trigger firing
loop specifically.  Introduce afterTriggerFiringDepth, a counter
incremented around the trigger-firing loops in AfterTriggerEndQuery,
AfterTriggerFireDeferred, and AfterTriggerSetState, and decremented
after FireAfterTriggerBatchCallbacks() returns.  AfterTriggerIsActive()
now returns afterTriggerFiringDepth > 0.

Reported-by: Chao Li <li.evan.chao@gmail.com>
Author: Chao Li <li.evan.chao@gmail.com>
Co-authored-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/C2133B47-79CD-40FF-B088-02D20D654806@gmail.com
2026-04-07 10:45:59 +09:00
Melanie Plageman
dd78e69cfc Allocate separate DSM chunk for parallel Index[Only]Scan instrumentation
Previously, parallel index and index-only scans packed the parallel scan
descriptor and shared instrumentation (for EXPLAIN ANALYZE) into a
single DSM allocation. Since scans may be instrumented without being
parallel-aware, and vice versa, using separate DSM chunks -- each with
its own TOC key -- is cleaner. A future commit will extend this pattern
to other scan node types.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me
2026-04-06 19:10:19 -04:00
Melanie Plageman
43222b8e53 Assert no duplicate keys in shm_toc_insert()
shm_toc_insert() silently accepts duplicate keys. Since shm_toc_lookup()
returns the first matching entry, any later entry with the same key
would be unreachable. Add an assertion to catch this.

Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me
2026-04-06 18:41:47 -04:00
Nathan Bossart
87f61f0c82 Add pg_stat_autovacuum_scores system view.
This view contains one row for each table in the current database,
showing the current autovacuum scores for that specific table.  It
also shows whether autovacuum would vacuum or analyze the table.

Bumps catversion.

Author: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Discussion: https://postgr.es/m/CAA5RZ0s4xjMrB-VAnLccC7kY8d0-4806-Lsac-czJsdA1LXtAw%40mail.gmail.com
2026-04-06 16:56:33 -05:00
Daniel Gustafsson
b3a37ffbc5 Use PG_DATA_CHECKSUM_OFF instead of hardcoded value
For a long time, the online checksums patchset kept the "off" state as
literal zero without a label to be consistent with the previous coding
which only had a label for the "on" state.  Later, when an "off" label
was made not all uses in the code got the memo.  Fix by setting these
to PG_DATA_CHECKSUM_OFF.

While there, fix a duplicate word in a comment introduced by the same
commit.

Author: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAJ7c6TPRTnQFXXX1CRcYoTLXw2swtDH==uSz1MYoMKdLrKZHjA@mail.gmail.com
2026-04-06 22:11:53 +02:00
Álvaro Herrera
28d534e2ae
Add CONCURRENTLY option to REPACK
When this flag is specified, REPACK no longer acquires access-exclusive
lock while the new copy of the table is being created; instead, it
creates the initial copy under share-update-exclusive lock only (same as
vacuum, etc), and it follows an MVCC snapshot; it sets up a replication
slot starting at that snapshot, and uses a concurrent background worker
to do logical decoding starting at the snapshot to populate a stash of
concurrent data changes.  Those changes can then be re-applied to the
new copy of the table just before swapping the relfilenodes.
Applications can continue to access the original copy of the table
normally until just before the swap, which is the only point at which
the access-exclusive lock is needed.

There are some loose ends in this commit:
1. concurrent repack needs its own replication slot in order to apply
   logical decoding, which are a scarce resource and easy to run out of.
2. due to the way the historic snapshot is initially set up, only one
   REPACK process can be running at any one time on the whole system.
3. there's a danger of deadlocking (and thus abort) due to the lock
   upgrade required at the final phase.

These issues will be addressed in upcoming commits.

The design and most of the code are by Antonin Houska, heavily based on
his own pg_squeeze third-party implementation.

Author: Antonin Houska <ah@cybertec.at>
Co-authored-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Noriyoshi Shinoda <noriyoshi.shinoda@hpe.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/5186.1706694913@antos
Discussion: https://postgr.es/m/202507262156.sb455angijk6@alvherre.pgsql
2026-04-06 21:55:08 +02:00
Alexander Korotkov
834038c1f8 Avoid syscache lookup while building a WAIT FOR tuple descriptor
Use TupleDescInitBuiltinEntry instead of TupleDescInitEntry when building
the result tuple descriptor for the WAIT FOR command. This avoids a syscache
access that could re-establish a catalog snapshot after we've explicitly
released all snapshots before the wait.

Discussion: https://postgr.es/m/CABPTF7U%2BSUnJX_woQYGe%3D%3DR9Oz%2B-V6X0VO2stBLPGfJmH_LEhw%40mail.gmail.com
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2026-04-06 22:47:26 +03:00
Nathan Bossart
775fe51daa Remove recheck_relation_needs_vacanalyze().
This function is a thin wrapper around relation_needs_vacanalyze()
that handles fetching and freeing the pgstat entry for the table.
Since all callers of relation_needs_vacanalyze() do that anyway, we
can teach that function to fetch/free the pgstat entry and use it
instead.

Suggested-by: Álvaro Herrera <alvherre@kurilemu.de>
Author: Sami Imseih <samimseih@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0s4xjMrB-VAnLccC7kY8d0-4806-Lsac-czJsdA1LXtAw%40mail.gmail.com
2026-04-06 14:30:52 -05:00
Tom Lane
d516974840 Support more object types within CREATE SCHEMA.
Having rejected the principle that we should know how to re-order
the sub-commands of CREATE SCHEMA, there is not really anything
except a little coding to stop us from supporting more object types.
This patch adds support for creating functions (including procedures
and aggregates), operators, types (including domains), collations,
and text search objects.

SQL:2021 specifies that we should allow functions, procedures,
types, domains, and collations, so this moves us a great deal
closer to full SQL compatibility of CREATE SCHEMA.  What remains
missing from their list are casts, transforms, roles, and some
object types we don't support yet (e.g. CREATE CHARACTER SET).
Supporting casts or transforms would be problematic because
they don't have names at all, let alone schema-qualified names,
so it'd be quite a stretch to say that they belong to a schema.
Roles likewise are not schema-qualified, plus they are global
to a cluster, making it even less reasonable to consider them
as belonging to a schema.  So I don't see us trying to complete
the list.

User-defined aggregates and operators are outside the spec's ken,
as are text search objects, so adding them does not do anything for
spec compatibility.  But they go along with these other object types,
plus it takes no additional code to support them since they are
represented as DefineStmts like some variants of CREATE TYPE.
It would indeed take some effort to reject them.

Author: Kirill Reshke <reshkekirill@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CALdSSPh4jUSDsWu3K58hjO60wnTRR0DuO4CKRcwa8EVuOSfXxg@mail.gmail.com
2026-04-06 15:16:25 -04:00
Tom Lane
404db8f9ed Execute foreign key constraints in CREATE SCHEMA at the end.
The previous patch simplified CREATE SCHEMA's behavior to "execute all
subcommands in the order they are written".  However, that's a bit too
simple, as the spec clearly requires forward references in foreign key
constraint clauses to work, see feature F311-01.  (Most other SQL
implementations seem to read more into the spec than that, but it's
not clear that there's justification for more in the text, and this is
the only case that doesn't introduce unresolvable issues.)  We never
implemented that before, but let's do so now.

To fix it, transform FOREIGN KEY clauses into ALTER TABLE ... ADD
FOREIGN KEY commands and append them to the end of the CREATE SCHEMA's
subcommand list.  This works because the foreign key constraints are
independent and don't affect any other DDL that might be in CREATE
SCHEMA.  For simplicity, we do this for all FOREIGN KEY clauses even
if they would have worked where they were.

Author: Jian He <jian.universality@gmail.com>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1075425.1732993688@sss.pgh.pa.us
2026-04-06 15:16:25 -04:00
Tom Lane
a9c350d9ee Don't try to re-order the subcommands of CREATE SCHEMA.
transformCreateSchemaStmtElements has always believed that it is
supposed to re-order the subcommands of CREATE SCHEMA into a safe
execution order.  However, it is nowhere near being capable of doing
that correctly.  Nor is there reason to think that it ever will be,
or that that is a well-defined requirement.  (The SQL standard does
say that it should be possible to do foreign-key forward references
within CREATE SCHEMA, but it's not clear that the text requires
anything more than that.)  Moreover, the problem will get worse as
we add more subcommand types.  Let's just drop the whole idea and
execute the commands in the order given, which seems like a much
less astonishment-prone definition anyway.  The foreign-key issue
will be handled in a follow-up patch.

This will result in a release-note-worthy incompatibility,
which is that forward references like
	CREATE SCHEMA myschema
	    CREATE VIEW myview AS SELECT * FROM mytable
	    CREATE TABLE mytable (...);
used to work and no longer will.  Considering how many closely
related variants never worked, this isn't much of a loss.

Along the way, pass down a ParseState so that we can provide an
error cursor for "wrong schema name" and related errors, and fix
transformCreateSchemaStmtElements so that it doesn't scribble
on the parsetree passed to it.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/1075425.1732993688@sss.pgh.pa.us
2026-04-06 15:16:25 -04:00
Masahiko Sawada
1ff3180ca0 Allow autovacuum to use parallel vacuum workers.
Previously, autovacuum always disabled parallel vacuum regardless of
the table's index count or configuration. This commit enables
autovacuum workers to use parallel index vacuuming and index cleanup,
using the same parallel vacuum infrastructure as manual VACUUM.

Two new configuration options control the feature. The GUC
autovacuum_max_parallel_workers sets the maximum number of parallel
workers a single autovacuum worker may launch; it defaults to 0,
preserving existing behavior unless explicitly enabled. The per-table
storage parameter autovacuum_parallel_workers provides per-table
limits. A value of 0 disables parallel vacuum for the table, a
positive value caps the worker count (still bounded by the GUC), and
-1 (the default) defers to the GUC.

To handle cases where autovacuum workers receive a SIGHUP and update
their cost-based vacuum delay parameters mid-operation, a new
propagation mechanism is added to vacuumparallel.c. The leader stores
its effective cost parameters in a DSM segment. Parallel vacuum
workers poll for changes in vacuum_delay_point(); if an update is
detected, they apply the new values locally via VacuumUpdateCosts().

A new test module, src/test/modules/test_autovacuum, is added to
verify that parallel autovacuum workers are correctly launched and
that cost-parameter updates are propagated as expected.

The patch was originally proposed by Maxim Orlov, but the
implementation has undergone significant architectural changes
since then during the review process.

Author: Daniil Davydov <3danissimo@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: zengman <zengman@halodbtech.com>
Discussion: https://postgr.es/m/CACG=ezZOrNsuLoETLD1gAswZMuH2nGGq7Ogcc0QOE5hhWaw=cw@mail.gmail.com
2026-04-06 11:48:29 -07:00
Álvaro Herrera
c0b53ec063
Rename cluster.c to repack.c (and corresponding .h)
CLUSTER is no longer the favored way to invoke this functionality, and
the code is about to shift its focus to the REPACK more ambitiously.
Rename the file to avoid leaving an unnecessary historical artifact
around.

Author: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/202603271635.owyhm7btgoic@alvherre.pgsql
2026-04-06 20:11:01 +02:00
Tom Lane
21c69dc73f Disallow system columns in COPY FROM WHERE conditions.
These columns haven't been computed yet when the filtering happens
(since we've not written the candidate tuple into the table); so
any check on them is wrong or useless.  Worse, since aa606b931 such a
reference results in an access off the end of a TupleDesc, potentially
causing a phony "generated columns are not supported in COPY FROM
WHERE conditions" error; and since c98ad086a it throws an Assert
instead.

Actually we could allow tableoid, which has been set to the OID of the
table named as the COPY target.  However, plausible uses for tests of
tableoid would involve a partitioned target table, and the user would
wish it to read as the OID of the destination partition.  There has
been some discussion of changing things to make it work like that,
but pending that happening we should just disallow tableoid along
with other system columns.

It seems best though to install this prohibition only in HEAD.
In the back branches we'll just guard the unsafe TupleDesc access,
and people will keep getting whatever semantics they got before.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/6f435023-8ab6-47c2-ba07-035d0c4212f9@gmail.com
2026-04-06 14:05:01 -04:00
Tom Lane
6582010c80 Fix null-bitmap combining in array_agg_array_combine().
This code missed the need to update the combined state's
nullbitmap if state1 already had a bitmap but state2 didn't.
We need to extend the existing bitmap with 1's but didn't.
This could result in wrong output from a parallelized
array_agg(anyarray) calculation, if the input has a mix of
null and non-null elements.  The errors depended on timing
of the parallel workers, and therefore would vary from one
run to another.

Also install guards against integer overflow when calculating
the combined object's sizes, and make some trivial cosmetic
improvements.

Author: Dmytro Astapov <dastapov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAFQUnFj2pQ1HbGp69+w2fKqARSfGhAi9UOb+JjyExp7kx3gsqA@mail.gmail.com
Backpatch-through: 16
2026-04-06 13:14:53 -04:00
Robert Haas
0442f1c9ef Add a guc_check_handler to the EXPLAIN extension mechanism.
It would be useful to be able to tell auto_explain to set a custom
EXPLAIN option, but it would be bad if it tried to do so and the
option name or value wasn't valid, because then every query would fail
with a complaint about the EXPLAIN option. So add a guc_check_handler
that auto_explain will be able to use to only try to set option
name/value/type combinations that have been determined to be legal,
and to emit useful messages about ones that aren't.

Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Discussion: http://postgr.es/m/CA+Tgmob-0W8306mvrJX5Urtqt1AAasu8pi4yLrZ1XfwZU-Uj1w@mail.gmail.com
2026-04-06 12:31:47 -04:00
Nathan Bossart
e3481edfd1 Remove autoanalyze corner case.
The restructuring in commit 53b8ca6881 revealed an interesting
corner case: if a table needs vacuuming for wraparound prevention
and autovacuum is disabled for it, we might still choose to analyze
it.  Research seems to indicate this was an accidental addition by
commit 48188e1621, and further discussion indicates there is
consensus that it is unnecessary and can be removed.

Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Shinya Kato <shinya11.kato@gmail.com>
Discussion: https://postgr.es/m/adB9nSsm_S0D9708%40nathan
2026-04-06 11:28:46 -05:00
Robert Haas
e0e819cc08 Expose helper functions scan_quoted_identifier and scan_identifier.
Previously, this logic was embedded within SplitIdentifierString,
SplitDirectoriesString, and SplitGUCList. Factoring it out saves
a bit of duplicated code, and also makes it available to extensions
that might want to do similar things without necessarily wanting to
do exactly the same thing.

Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Discussion: http://postgr.es/m/CA+Tgmob-0W8306mvrJX5Urtqt1AAasu8pi4yLrZ1XfwZU-Uj1w@mail.gmail.com
2026-04-06 11:13:25 -04:00
Fujii Masao
93dc1ace20 Release postmaster working memory context in slotsync worker
Child processes do not need the postmaster's working memory context and
normally release it at the start of their main entry point. However,
the slotsync worker forgot to do so.

This commit makes the slotsync worker release the postmaster's working
memory context at startup, preventing unintended use.

Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tiancheng Ge <getiancheng_2012@163.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwHO05JaUpgKF8FBDmPdBUJsK22axRRcgmAUc2Jyi8OK8g@mail.gmail.com
2026-04-06 23:04:18 +09:00
Heikki Linnakangas
ed71d7356e Fix memory leaks introduced by commit 283e823f9d
When freeing pending_shmem_requests we should also free the ->options.

Author: Aleksander Alekseev <aleksander@tigerdata.com>
Discussion: https://www.postgresql.org/message-id/CAJ7c6TN9tp8MTc0WXM0zfSWqjfBqU8gpe+o5KqHB1-cQ7409Kw@mail.gmail.com
2026-04-06 15:46:03 +03:00
Heikki Linnakangas
2670a0fcc6 Fix compilation without injection points with some compilers
Some compilers didn't like the empty initializer when compiled without
USE_INJECTION_POINTS. Per buildfarm member 'drongo', using Visual
Studio 2019.

Author: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/adNHcBVJO5gIOp1l@paquier.xyz
2026-04-06 15:46:00 +03:00
Michael Paquier
404a17c155 Use single LWLock for lock statistics in pgstats
Previously, one LWLock was used for each lock type, adding complexity
without an observable performance benefit as data is gathered only for
paths involving lock waits, at least currently.  This commit replaces
the per-type set of LWLocks with a single LWLock protecting the stats
data of all the lock types, like the stats kinds for SLRU or WAL.  A
good chunk of the callbacks get simpler thanks to this change.

The previous approach also had one bug in the flush callback when nowait
was called with "true": a backend iterating over all entries could
successfully flush some entries while skipping others due to contention,
then unconditionally reset the pending data.  This would cause some
stats data loss.

Oversight in 4019f725f5.

Reported-by: Tomas Vondra <tomas@vondra.me>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/1af63e6d-16d5-4d5b-9b03-11472ef1adf9@vondra.me
2026-04-06 14:01:04 +09:00
Richard Guo
3a08a2a8b4 Fix volatile function evaluation in eager aggregation
Pushing aggregates containing volatile functions below a join can
violate volatility semantics by changing the number of times the
function is executed.

Here we check the Aggref nodes in the targetlist and havingQual for
volatile functions and disable eager aggregation when such functions
are present.

Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/CAMbWs48A53PY1Y4zoj7YhxPww9fO1hfnbdntKfA855zpXfVFRA@mail.gmail.com
2026-04-06 11:54:08 +09:00
Richard Guo
bd94845e8c Fix collation handling for grouping keys in eager aggregation
When determining if it is safe to use an expression as a grouping key
for partial aggregation, eager aggregation relies on the B-tree
equalimage support function to ensure that equality implies image
equality.

Previously, the code incorrectly passed the default collation of the
expression's data type to the equalimage procedure, rather than the
expression's actual collation.  As a result, if a column used a
non-deterministic collation but the base type's default collation was
deterministic, eager aggregation would incorrectly assume that the
column was safe for byte-level grouping.  This could cause rows to be
prematurely grouped and subsequently discarded by strict join
conditions, resulting in incorrect query results.

This patch fixes the issue by passing the expression's actual
collation to the equalimage procedure.

Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/CAMbWs48A53PY1Y4zoj7YhxPww9fO1hfnbdntKfA855zpXfVFRA@mail.gmail.com
2026-04-06 11:52:33 +09:00
Fujii Masao
a8f45dee91 Add wal_sender_shutdown_timeout GUC to limit shutdown wait for replication
Previously, during shutdown, walsenders always waited until all pending data
was replicated to receivers. This ensures sender and receiver stay in sync
after shutdown, which is important for physical replication switchovers,
but it can significantly delay shutdown. For example, in logical replication,
if apply workers are blocked on locks, walsenders may wait until those locks
are released, preventing shutdown from completing for a long time.

This commit introduces a new GUC, wal_sender_shutdown_timeout,
which specifies the maximum time a walsender waits during shutdown for all
pending data to be replicated. When set, shutdown completes once all data is
replicated or the timeout expires. A value of -1 (the default) disables
the timeout.

This can reduce shutdown time when replication is slow or stalled. However,
if the timeout is reached, the sender and receiver may be left out of sync,
which can be problematic for physical replication switchovers.

Author: Andrey Silitskiy <a.silitskiy@postgrespro.ru>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Takamichi Osumi <osumi.takamichi@fujitsu.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Vitaly Davydov <v.davydov@postgrespro.ru>
Reviewed-by: Ronan Dunklau <ronan@dunklau.fr>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/TYAPR01MB586668E50FC2447AD7F92491F5E89@TYAPR01MB5866.jpnprd01.prod.outlook.com
2026-04-06 11:35:03 +09:00
Daniel Gustafsson
07009121c2 Test stabilization for online checksums
Postcommit review and buildfarm/CI failures revealed a few issues in
the test code which this commit attempts to resolve.  These failures
are verified using synthetic means.

  * Wait for launcher exit in enable/disable checksum tests

    When enabling or disabling data checksums in a test with waiting
    for an end state (on or off), the test typically want to perform
    more test against the cluster immediately. Make sure to wait for
    the launcher to exit in these cases before returning in order to
    know it can immediately be acted on.  This is a more generic way
    of implementating 0036232ba8.

  * Refactor injection point tests to use the injection_points test
    extension. Two injection points added for online checksums were
    better expressed using the injection_points extension with the
    test code embedded in datachecksum_state.c.

  * Make tests less timing dependent and allow transitions to "on"
    and not just "inprogress-on" in case a test manages to finish
    before it's checked for state.

  * When waiting on a blocking background psql keeping a temporary
    table open, the test first closed the background session abd
    then the server.  This could cause data checksums to manage to
    get enabled in the brief window between dropping the temporary
    table and closing the server.  Fix by closing the server first
    before the background session.

  * Remove a few superfluous duplicate checks and general cleanup
    of comments as well as making LSN logging consistent.

These issues were reported by Andres as well as spotted in the
buildfarm and on CI.

Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/92F25C14-801E-4198-994D-D83E31FEB0D8@yesql.se
2026-04-06 02:03:10 +02:00
Daniel Gustafsson
d771b0a907 Handle checksumworker startup wait race
If the background worker for processing databases manages to finish
before the launcher starts waiting for it, the launcher would treat
it erroneously as an error.  Fix by ensureing to check result state
in this case.  Identified on CI and synthetically reproduced during
local testing.

Also while, make sure to properly lock the shared memory structure
before updating tje result state.

Author: Daniel Gustafsson <daniel@yesql.seA
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/4fxw37ge47v5baeozla5phymi233hxbcjbwwsfwv3mpg3kyl2z@6jk4nkf6jp4
2026-04-06 01:55:06 +02:00
Michael Paquier
557a9f1e3e Add tests for lock statistics, take two
Commit 7c64d56fd9 has removed the isolation test providing coverage
for lock statistics due to some instability in the CI, where the
deadlock timeout may not have enough time to process, preventing the
stats data to be updated.  These also relied on a set of hardcoded
sleeps.

This commit switches the test suite to TAP, instead, that uses an
injection point with a wait to avoid the sleeps.  The injection point is
added in ProcSleep(), once we know that the deadlock timeout has fired
and that the stats have been updated.

Multiple lock patterns are checked, all rely on the same workflow, with
two sessions:
- session 1 holds a given lock type.
- session 2 attaches to the new injection point with the wait action.
- session 2 attempts to acquire a lock conflicting with the lock of
session 1, waiting for the injection point to be reached.
- session 1 releases its lock, session 2 commits.
- pg_stat_lock is polled until the counters are updated for the lock
type.

Bertrand's version of the patch introduced a new routine to
BackgroundPsql() to detect the blocked background sessions.  I have
tweaked the test so as we use the same method as some of the other tests
instead, based on some \echo commands.  This test has been run multiple
times in the CI, all passing, so I'd like to think that this is more
stable than the first version attempted.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/acNTR1lLHwQJ0o+P@ip-10-97-1-34.eu-west-3.compute.internal
2026-04-06 08:51:30 +09:00
Heikki Linnakangas
9b5acad3f4 Convert all remaining subsystems to use the new shmem allocation API
This removes all remaining uses of ShmemInitStruct() and
ShmemInitHash() from built-in code.

Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
2026-04-06 02:13:10 +03:00
Heikki Linnakangas
a4b6139dcc Convert buffer manager to use the new shmem allocation functions
This rectifies the initialization functions a little, making the
"buffer strategy" stuff in freelist.c and buffer mapping hash table in
buf_init.c top-level "subsystems" of their own, registered directly in
subsystemlist.h. Previously they were called indirectly from
BufferManagerShmemInit() and BufferManagerShmemSize()

Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
2026-04-06 02:13:08 +03:00