Commit graph

28616 commits

Author SHA1 Message Date
Michael Paquier
6d6348f032 Fix MCV input array checks in statistics restore functions
The SQL functions for the restore of attribute and expression statistics
accept "most_common_vals" and "most_common_freqs" as independent arrays.
The planner assumes these have the same number of elements, but it was
possible to insert in the catalogs data that would cause an over-read
when the catalog data is loaded in the planner.

There were two holes in the stats restore logic:
- Both arrays should match in size.
- The input array must be one-dimensional, and it should match with what
is delivered by pg_dump when scanning the pg_stats catalogs.

The multivariate extended statistics MCV path (import_mcv) already
validated these inputs via check_mcvlist_array(), and is not affected.
These problems exist in v18 and newer versions for the restore of
attribute statistics.  These problems affect only HEAD for the restore
of the expression statistics.

Reported-by: Jeroen Gui <jeroen.gui1@proton.me>
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Security: CVE-2026-6575
Backpatch-through: 18
2026-05-11 05:13:46 -07:00
Tom Lane
76ab76f875 Avoid passing unintended format codes to snprintf().
timeofday() assumed that the output of pg_strftime() could not contain
% signs, other than the one it explicitly asks for with %%.  However,
we don't have that guarantee with respect to the time zone name (%Z).
A crafted time zone setting could abuse the subsequent snprintf()
call, resulting in crashes or disclosure of server memory.

To fix, split the pg_strftime() call into two and then treat the
outputs as literal strings, not a snprintf format string.  The
extra pg_strftime() call doesn't really cost anything, since the
bulk of the conversion work was done by pg_localtime().

Also, adjust buffer widths so that we're not risking string truncation
during the snprintf() step, as that would create a hazard of producing
mis-encoded output.

This also fixes a latent portability issue: the format string expects
an int, but tp.tv_usec is long int on many platforms.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6474
2026-05-11 05:13:46 -07:00
Noah Misch
46b4f5c11b Fix SQL injection in logical replication origin checks.
ALTER SUBSCRIPTION ... REFRESH PUBLICATION interpolates schema and
relation names into SQL without quoting them.  A crafted subscriber
relation name can inject arbitrary SQL on the publisher.  Test such a
name.  Back-patch to v16, where commit
8756930190 first appeared.

Reported-by: Pavel Kohout <pavel.kohout@aisle.com>
Author: Pavel Kohout <pavel.kohout@aisle.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Backpatch-through: 16
Security: CVE-2026-6638
2026-05-11 05:13:46 -07:00
Michael Paquier
5924e256c4 Apply timingsafe_bcmp() in authentication paths
This commit applies timingsafe_bcmp() to authentication paths that
handle attributes or data previously compared with memcpy() or strcmp(),
which are sensitive to timing attacks.

The following data is concerned by this change, some being in the
backend and some in the frontend:
- For a SCRAM or MD5 password, the computed key or the MD5 hash compared
with a password during a plain authentication.
- For a SCRAM exchange, the stored key, the client's final nonce and the
server nonce.
- RADIUS (up to v18), the encrypted password.
- For MD5 authentication, the MD5(MD5()) hash.

Reported-by: Joe Conway <mail@joeconway.com>
Security: CVE-2026-6478
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Backpatch-through: 14
2026-05-11 05:13:46 -07:00
Michael Paquier
b63f25bddf Fix unbounded recursive handling of SSL/GSS in ProcessStartupPacket()
The handling of SSL and GSS negotiation messages in
ProcessStartupPacket() could cause a recursion of the backend,
ultimately crashing the server as the negotiation attempts were not
tracked across multiple calls processing startup packets.

A malicious client could therefore alternate rejected SSL and GSS
requests indefinitely, each adding a stack frame, until the backend
crashed with a stack overflow, taking down a server.

This commit addresses this issue by modifying ProcessStartupPacket() so
as processed negotiation attempts are tracked, preventing infinite
recursive attempts.  A TAP test is added to check this problem, where
multiple SSL and GSS negotiated attempts are stacked.

Reported-by: Calif.io in collaboration with Claude and Anthropic
Research
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Security: CVE-2026-6479
Backpatch-through: 14
2026-05-11 05:13:46 -07:00
Tom Lane
c55cea5290 Fix assorted places that need to use palloc_array().
multirange_recv and BlockRefTableReaderNextRelation were incautious
about multiplying a possibly-large integer by a factor more than 1
and then using it as an allocation size.  This is harmless on 64-bit
systems where we'd compute a size exceeding MaxAllocSize and then
fail, but on 32-bit systems we could overflow size_t leading to an
undersized allocation and buffer overrun.

Fix these places by using palloc_array() instead of a handwritten
multiplication.  (In HEAD, some of them were fixed already, but
none of that work got back-patched at the time.)

In addition, BlockRefTableReaderNextRelation passes the same value
to BlockRefTableRead's "int length" parameter.  If built for
64-bit frontend code, palloc_array() allows a larger array size
than it otherwise would, potentially allowing that parameter to
overflow.  Add an explicit check to forestall that and keep the
behavior the same cross-platform.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 14
Security: CVE-2026-6473
2026-05-11 05:13:46 -07:00
Tom Lane
066b7b144f Prevent buffer overrun in unicode_normalize().
Some UTF8 characters decompose to more than a dozen codepoints.
It is possible for an input string that fits into well under
1GB to produce more than 4G decomposed codepoints, causing
unicode_normalize()'s decomp_size variable to wrap around to a
small positive value.  This results in a small output buffer
allocation and subsequent buffer overrun.

To fix, test after each addition to see if we've overrun MaxAllocSize,
and break out of the loop early if so.  In frontend code we want to
just return NULL for this failure (treating it like OOM).  In the
backend, we can rely on the following palloc() call to throw error.

I also tightened things up in the calling functions in varlena.c,
using size_t rather than int and allocating the input workspace
with palloc_array().  These changes are probably unnecessary
given the knowledge that the original input and the normalized
output_chars array must fit into 1GB, but it's a lot easier to
believe the code is safe with these changes.

Reported-by: Xint Code
Reported-by: Bruce Dang <bruce@calif.io>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Co-authored-by: Heikki Linnakangas <hlinnaka@iki.fi>
Backpatch-through: 14
Security: CVE-2026-6473
2026-05-11 05:13:46 -07:00
Tom Lane
0dc1fdc75e Harden our regex engine against integer overflow in size calculations.
The number of NFA states, number of NFA arcs, and number of colors
are all bounded to reasonably small values.  However, there are
places where we try to allocate arrays sized by products of those
quantities, and those calculations could overflow, enabling
buffer-overrun attacks.  In practice there's no problem on 64-bit
machines, but there are some live scenarios on 32-bit machines.

A related problem is that citerdissect() and creviterdissect()
allocate arrays based on the length of the input string, which
potentially could overflow.

To fix, invent MALLOC_ARRAY and REALLOC_ARRAY macros that rely on
palloc_array_extended and repalloc_array_extended with the NO_OOM
option, similarly to the existing MALLOC and REALLOC macros.
(Like those, they'll throw an error not return a NULL result for
oversize requests.  This doesn't really fit into the regex code's
view of error handling, but it'll do for now.  We can consider
whether to change that behavior in a non-security follow-up patch.)

I installed similar defenses in the colormap construction code.
It's not entirely clear whether integer overflow is possible
there, but analyzing the behavior in detail seems not worth
the trouble, as the risky spots are not in hot code paths.

I left a bunch of calls as-is after verifying that they can't
overflow given reasonable limits on nstates and narcs.  Those
limits were enforced already via REG_MAX_COMPILE_SPACE, but
add commentary to document the interactions.

In passing, also fix a related edge case, which is that the
special color numbers used in LACON carcs could overflow the
"color" data type, if ncolors is close to MAX_COLOR.

In v14 and v15, the regex engine calls malloc() directly instead
of using palloc(), so MALLOC_ARRAY and REALLOC_ARRAY do likewise.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6473
2026-05-11 05:13:46 -07:00
Tom Lane
46593aea0a Make palloc_array() and friends safe against integer overflow.
Sufficiently large "count" arguments could result in undetected
overflow, causing the allocated memory chunk to be much smaller
than what the caller will subsequently write into it.  This is
unlikely to be a hazard with 64-bit size_t but can sometimes
happen on 32-bit builds, primarily where a function allocates
workspace that's significantly larger than its input data.
Rather than trying to patch the at-risk callers piecemeal,
let's just redefine these macros so that they always check.

To do that, move the longstanding add_size() and mul_size() functions
into palloc.h and mcxt.c, and adjust them to not be specific to
shared-memory allocation.  Then invent palloc_mul(), palloc0_mul(),
palloc_mul_extended() to use these functions.  Actually, the latter
use inlined copies to save one function call.  repalloc_array() gets
similar treatment.  I didn't bother trying to inline the calls for
repalloc0_array() though.

In v14 and v15, this also adds repalloc_extended(), which previously
was only available in v16 and up.

We need copies of all this in fe_memutils.[hc] as well, since that
module also provides palloc_array() etc.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6473
2026-05-11 05:13:46 -07:00
Michael Paquier
d388e1d7f0 Fix overflows with ts_headline()
The options "StartSel", "StopSel" and "FragmentDelimiter" given by a
caller of the SQL function ts_headline() have their lengths stored as
int16.  When providing values larger than PG_INT16_MAX, it was possible
to overflow the length values stored, leading to incorrect behaviors in
generateHeadline(), in most cases translating to a crash.

Attempting to use values for these options larger than PG_INT16_MAX is
now blocked.  Some test cases are added to cover our tracks.

Reported-by: Xint Code
Author: Michael Paquier <michael@paquier.xyz>
Backpatch-through: 14
Security: CVE-2026-6473
2026-05-11 05:13:46 -07:00
Richard Guo
9d124a14b3 Enforce RETURNING typmod for empty-set JSON_ARRAY(query)
Commit 8d829f5a0 introduced a COALESCE wrapper around the
JSON_ARRAYAGG subquery so that JSON_ARRAY(query) returns '[]' rather
than NULL when the subquery yields no rows, per the SQL/JSON standard.

The empty-array Const used as the COALESCE fallback was, however,
built with typmod -1 and the type input function was likewise invoked
with typmod -1.  As a result, any length restriction from the
RETURNING clause was silently bypassed on the empty-set path, while
the non-empty path enforced it via the JSON_ARRAYAGG coercion.

Build the empty-array Const using the typmod of the COALESCE's
non-empty argument, and pass that typmod to OidInputFunctionCall as
well so the value is length-checked at parse time.  This makes the
empty-set and non-empty-set paths behave consistently.

Reported-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAJTYsWXPYqa58YXrU+SQMVonsAhjLS46HNUMU=wO5zm9MgY3_g@mail.gmail.com
2026-05-08 17:21:48 +09:00
Amit Kapila
a49b9cfd72 Use schema-qualified names in EXCEPT clause error messages.
Error messages in check_publication_add_relation() previously reported
only the relation name when a table in an EXCEPT clause could not be
processed, which is ambiguous when the same name exists in multiple
schemas. Use schema-qualified names instead, consistent with other error
messages that reference relation names.

Author: Dilip Kumar <dilipbalaut@gmail.com>
Author: vignesh C <vignesh21@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/CAFiTN-scG7b11Jsp+VoDRT8ZFE84eSKLcDsSB18dZ8AaP=R-mw@mail.gmail.com
2026-05-08 10:00:26 +05:30
Richard Guo
a1b754558a Consider opfamily and collation when removing redundant GROUP BY columns
remove_useless_groupby_columns() uses a relation's unique indexes to
prove that some GROUP BY columns are functionally dependent on others,
and so can be dropped from the GROUP BY clause.  The match between
index columns and GROUP BY columns was done by attno alone, ignoring
two equality-relation issues.

A type may belong to multiple btree opfamilies whose notions of
equality differ.  The record type, for instance, has record_ops
(per-field equality) and record_image_ops (bytewise equality).  A
unique index under one opfamily does not prove uniqueness under the
equality used by GROUP BY when the SortGroupClause's eqop comes from a
different opfamily.

Likewise, since nondeterministic collations were introduced in PG 12,
two collations may disagree on equality, and a unique index under one
collation does not prove uniqueness under another.

In either case, rows that the index considers distinct can collapse
into a single GROUP BY group, taking ungrouped columns of differing
values with them, so the planner drops a column that is not in fact
functionally dependent and produces wrong results.

Fix by requiring, for each unique-index key column, that some GROUP BY
item on the same column has an eqop in the index's opfamily and a
collation that agrees on equality with the index's collation.  This
mirrors the combined check relation_has_unique_index_for() applies to
join clauses.

This is a v18 regression: commit bd10ec529 extended
remove_useless_groupby_columns() from primary-key constraints to
arbitrary unique indexes.  Before that, the function consulted only
primary keys, whose enforcement index is required by parse_utilcmd.c
to use the default opclass and the column's declared collation, so
neither mismatch could arise.  Back-patch to v18 only.

Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49t6uArWoTT-cHY+nhsi23nJJKcF9Xb9cYGzaZ9kNJ98g@mail.gmail.com
Backpatch-through: 18
2026-05-08 12:45:51 +09:00
Richard Guo
ba82de48e6 Fix HAVING-to-WHERE pushdown for simple-CASE form
Commit f76686ce7 added a walker that detects when a HAVING clause uses
a collation that conflicts with the GROUP BY's nondeterministic
collation, keeping such clauses in HAVING.  The walker uses
exprInputCollation() to identify each ancestor's comparison collation,
but missed the simple-CASE case: parse analysis builds each WHEN as
OpExpr(CaseTestExpr op val), where CaseTestExpr is a placeholder for
the arg, while the actual arg expression sits at cexpr->arg, outside
the OpExpr that carries the comparison's inputcollid.  A GROUP Var at
cexpr->arg was therefore visited with the WHEN's inputcollid absent
from the ancestor stack, the conflict went undetected, and the clause
was wrongly pushed to WHERE.

Fix by handling simple CASE explicitly: before walking cexpr->arg,
push every WHEN's inputcollid onto the ancestor stack so a GROUP Var
at the arg is checked against the same collations the WHEN comparisons
would apply.  Then walk the WHEN bodies and defresult under the
unchanged stack, where their own collation contexts are picked up by
the default path.

Back-patch to v18 only; this fix extends the walker added by commit
f76686ce7 and inherits its dependency on the v18 RTE_GROUP mechanism.

Author: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDcqPdd=2V0PQ_oNYj50OUeqSqznqFaYtP3RdokLBDXBqw@mail.gmail.com
Backpatch-through: 18
2026-05-08 10:57:50 +09:00
Amit Langote
4b1b2be22f Fix use-after-free of qs in AfterTriggerEndQuery.
afterTriggerInvokeEvents() may repalloc afterTriggers.query_stack
while firing trigger events, leaving any precomputed entry pointer
dangling.  The loop body in AfterTriggerEndQuery() recomputes qs
after each afterTriggerInvokeEvents() call for that reason, but the
"all fired" break path exits without the recompute, and the
subsequent FireAfterTriggerBatchCallbacks(qs->batch_callbacks)
dereferences the freed pointer.

Fix by recomputing qs immediately before
FireAfterTriggerBatchCallbacks(), as the loop body already does
after each afterTriggerInvokeEvents() call.

The hazard was introduced in 34a3078629, which added the
qs->batch_callbacks dereference at this site.

Reported-by: Amul Sul <sulamul@gmail.com>
Author: Amul Sul <sulamul@gmail.com>
Reviewed-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CAAJ_b95p6-qiVpE2Gpr=bUsNAqTcejD_rPgLnfjx9m=fo3Rf3Q@mail.gmail.com
2026-05-08 09:42:42 +09:00
Masahiko Sawada
b384cdb274 Fix race condition in XLogLogicalInfo and ProcSignal initialization.
Previously, InitializeProcessXLogLogicalInfo() was called before
ProcSignalInit(). This created a window where a process could miss a
signal barrier if it was issued between these two calls. As a result,
the process could fail to update its local XLogLogicalInfo cache,
leading to an inconsistent logical decoding state.

This commit fixes this by moving InitializeProcessXLogLogicalInfo()
after ProcSignalInit(). This ensures that the process is registered to
participate in signal barriers before its state is initialized,
preventing it from missing any state changes propagated during the
startup sequence.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAD21AoBzdeSyLSSPM5E6ysN1r8qzp8u_BRmnLvuAp_S8QxS_fQ@mail.gmail.com
Discussion: https://postgr.es/m/CAD21AoBj+zKvgw_Q8gjr4YbKccW_uMe3OFQ5+KT246FHUuNXSQ@mail.gmail.com
2026-05-07 10:09:42 -07:00
John Naylor
52e629be95 Message corrections for partition split/merge commands
Fix spelling and grammar, turn an accidental duplicate errmsg into
errdetail, and remove an errposition that was not pointing at anything
relevant to the error.

Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Yuchen Li <liyuchen_xyz@163.com> (earlier version)
Discussion: https://postgr.es/m/CAJTYsWUvMT5uKOasPnm6-o9CrdXbRONiAYHTKJb7wx66LB8S1A@mail.gmail.com
2026-05-07 19:10:35 +07:00
Peter Eisentraut
5778fb3eaf Fix typo in error message
Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAJTYsWXFy1j_T82%2BM_S9kFxU414tQYnZQD-b82%3DoL_LbG_5fPQ%40mail.gmail.com
2026-05-07 10:36:59 +02:00
Michael Paquier
6827de95ee Simplify code in objectaddress.c for some property graph objects
Property graph element labels and label properties relied on a direct
systable scan when retrieving their object descriptions.  These can be
simplified with get_catalog_object_by_oid().  This offers the benefit to
do a direct syscache lookup, if available.

The same logic will be used in a follow-up patch when retrieving the
object identity parts, applying the same rule across the board for these
object types.

Extracted from a larger patch by the author.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Alex Guo <guo.alex.hengchen@gmail.com>
Discussion: https://postgr.es/m/aej1DkLwhyZWmtxJ@bdtpg
2026-05-07 10:18:49 +09:00
Alexander Korotkov
5cdec42319 Fix WAIT FOR LSN cleanup on subtransaction abort
WAIT FOR LSN registers the current backend in shared memory before entering an
interruptible wait loop.  Top-level abort and backend exit already call
WaitLSNCleanup(), but subtransaction abort did not.  If an interrupt, such as
statement_timeout, occurred while waiting inside a savepoint, rolling back to
the savepoint left the backend marked as present in the WAIT FOR LSN heap.

Clean up WAIT FOR LSN state from AbortSubTransaction() as well, and add
a TAP test covering reuse of WAIT FOR LSN after a savepoint rollback.

Reported-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/CAJTYsWXDRwo-RVRaQgwxVcXgURVFeX8BKnijQrPiPcSCkDDX9A%40mail.gmail.com
Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2026-05-06 13:56:38 +03:00
Daniel Gustafsson
9a39056c41 Apply data-checksum worker throttling parameters
The DataChecksumsWorker accepts cost_delay and cost_limit parameters
from pg_enable_data_checksums() so users can throttle the I/O caused
by enabling checksums.  Due to the API for setting the cost parameters
changing between when the code was written, and when it was committed
the new cost update function call was omitted and thus the parameters
were silently ignored.

Fix by calling VacuumUpdateCosts() after assigning the parameters
(both during worker startup and on the runtime cost-update path), and
by leaving the page-cost weights at their GUC-controlled defaults.

Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDeevH6aTyWdXYBJW0wOmfoZy66gDi5TfinK_dXeCrHQLg@mail.gmail.com
2026-05-06 12:38:12 +02:00
Daniel Gustafsson
2018bd6167 Skip WAL for unlogged main fork during online checksum enable
ProcessSingleRelationFork() unconditionally generated an FPI WAL
record for every page of every relation when enabling checksums.
Unlogged relations, which by definition never generate WAL for
data changes, were not exempt which generated excessive WAL to
be emitted.

Fix by guarding the FPI WAL record call with RelationNeedsWAL()
to avoid emitting WAL for unlogged main forks.  Unlogged pages
are still dirtied to ensure the checksum is written to disk at
the next checkpoint.  The init fork remains WAL-logged even for
unlogged relations, as it's needed on the standby to materialize
the relation after promotion (see ResetUnloggedRelations()).
Skipping init-fork WAL would leave the standby with a stale init
fork that, once copied to the main fork on promotion, would fail
checksum verification on every read of the unlogged relation.

A test which creates an unlogged table with an index, enables
checksums, promotes the standby, and verifies that the unlogged
relation and its indexes are still readable post-promotion has
been added.

Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDeGrpZbNZdLjd_T4b43xKEEXZN0HGhkFm-1bkBdyzK7AQ@mail.gmail.com
2026-05-06 12:38:01 +02:00
Álvaro Herrera
a0a0c0c20e
Skip other sessions' temp tables in REPACK, CLUSTER, and VACUUM FULL
get_tables_to_repack() and get_all_vacuum_rels() were including other
sessions' temporary tables in their output work list, causing REPACK,
CLUSTER and VACUUM FULL (when executed without a table list) to attempt
to acquire AccessExclusiveLock on them, potentially blocking for an
extended time.  Fix by skipping other-session temp tables early, before
they are added to the list.

This issue is ancient, but there have been no complaints about it that I
know of, so I'm opting for not backpatching at present.

Author: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/0b555318-2bf2-46df-9377-09629a2a59db@uni-muenster.de
2026-05-05 16:20:26 +02:00
Peter Eisentraut
22f9207aaa Message style improvements (oauth related) 2026-05-05 10:39:13 +02:00
Álvaro Herrera
eb2e2eb4d4
Don't lose column values on REPACK
Commit 28d534e2ae introduced reform_tuple() with a fast path that
returns the source tuple verbatim when no dropped columns require fixing
up.  I (Álvaro) failed to realize that this broke handling of columns
with a 'missingval' defined: after a VACUUM FULL, CLUSTER, or REPACK
operation, the catalogued missingval is thrown away, so the tuples are
no longer correct.

Fix by forcing the rewrite when the tuple is shorter than the tuple
descriptor.

Author: Satya Narlapuram <satyanarlapuram@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDeoccU5CudrJpmSKZfKZ1gRMNY=5BxSC=JpHgkonzgcOw@mail.gmail.com
2026-05-05 10:24:49 +02:00
Peter Eisentraut
d0eac3cafb Make spelling consistent
"vertexes" -> "vertices"

Reported-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAJTYsWXFy1j_T82%2BM_S9kFxU414tQYnZQD-b82%3DoL_LbG_5fPQ%40mail.gmail.com
2026-05-05 09:36:54 +02:00
Richard Guo
574581b50a Consider collation when proving subquery uniqueness
rel_is_distinct_for()'s RTE_SUBQUERY branch passed only the equality
operator from each join clause to query_is_distinct_for(), discarding
the operator's input collation.  query_is_distinct_for() then verified
opfamily compatibility but never checked collations, so a DISTINCT /
GROUP BY / set-op operating under one collation was trusted to prove
uniqueness for a comparison performed under an unrelated collation.
As with the recent fix in relation_has_unique_index_for(), this is
unsound for nondeterministic collations and yields wrong query results
in any optimization that consumes the proof.

Fix by carrying each clause's operator input collation into
query_is_distinct_for() and validating it at every check-site against
the subquery target expression's collation.

Back-patch to all supported branches.  query_is_distinct_for() is
declared in an installed header, so on stable branches the existing
two-list signature is retained as a thin wrapper that forwards to a
new collation-aware entry point; external callers continue to receive
the historical collation-blind answer.

Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAMbWs4_XUUSTyzCaRjUeeahWNqi=8ZOA5Q4coi8zUVEDSBkM6A@mail.gmail.com
Backpatch-through: 14
2026-05-05 10:23:31 +09:00
Richard Guo
5a55ea507a Consider collation when proving uniqueness from unique indexes
relation_has_unique_index_for() has long had an XXX noting that it
doesn't check collations when matching a unique index's columns
against equality clauses.  This was benign as long as all collations
in play reduced to the same notion of equality, but has been incorrect
since nondeterministic collations were introduced in PG 12: a unique
index under a deterministic collation does not prove uniqueness under
a nondeterministic collation, nor vice versa.

The consequence is wrong query results for any planner optimization
that consumes the faulty proof, including inner-unique join execution
(which stops the inner search after the first match per outer row),
useless-left-join removal, semijoin-to-innerjoin reduction, and
self-join elimination.

Fix by requiring the index's collation to agree on equality with the
clause's input collation.  Two collations agree on equality if either
is InvalidOid (denoting a non-collation-sensitive operation, which
cannot conflict with the other side), if they have the same OID, or if
both are deterministic: by definition a deterministic collation treats
two strings as equal iff they are byte-wise equal (see CREATE
COLLATION), so any two deterministic collations share the same
equality relation and the uniqueness proof carries over.  Any mismatch
involving a nondeterministic collation is rejected.

Back-patch to all supported branches; the bug has existed since
nondeterministic collations were introduced in PG 12.

Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAMbWs4_XUUSTyzCaRjUeeahWNqi=8ZOA5Q4coi8zUVEDSBkM6A@mail.gmail.com
Backpatch-through: 14
2026-05-05 10:22:53 +09:00
Tom Lane
93da297366 Declare load_hosts() as returning HostsFileLoadResult.
This function returns some value of enum HostsFileLoadResult,
but for reasons lost in the development process was declared to
return "int".  Fix that, for clarity and so that our typedefs
collection tooling sees the typedef as used.  Also fix the
variable that the sole call assigns into.  Move the typedef
to the header file that declares load_hosts() to avoid creating
header dependency problems.

Discussion: https://postgr.es/m/359138.1777922557@sss.pgh.pa.us
2026-05-04 18:33:06 -04:00
Álvaro Herrera
b5f92b8eb4
Fix off-by-one in repack index loop
A blunder of mine (Álvaro) in commit 28d534e2ae.

Author: Lakshmi N <lakshmin.jhs@gmail.com>
Reviewed-by: Xiaopeng Wang <wxp_728@163.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CA+3i_M9ytFufvD8Tm0rhpfxuC4XrpgQDBHxM7NJQYxv488JW7w@mail.gmail.com
2026-05-04 20:01:19 +02:00
Peter Eisentraut
dc9e7c9ed9 Handle nodes that may appear in GraphPattern expression trees
expression_tree_mutator_impl() did not handle T_GraphPattern,
T_GraphElementPattern, and T_GraphPropertyRef.  The corresponding
expression_tree_walker_impl() already handles all three node types.
This causes an "unrecognized node type" error whenever a GRAPH_TABLE
appeared in an expression tree.

While at it, also update raw_expression_tree_walker() and
expression_tree_walker() to handle missing nodes that may appear in
GraphPattern expression trees.  When raw_expression_tree_walker() is
called, GraphElementPattern::labelexpr contains ColumnRefs instead of
GraphLabelRefs.  Hence those are not handled in
raw_expression_tree_walker().

Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAHg%2BQDc97WFTSkXg%3Dg_ZAH8GnY2gJrvq72cs%2BYjqEAuZgXnkAQ%40mail.gmail.com
2026-05-04 17:34:32 +02:00
Peter Eisentraut
891a57c739 Do not define type for a property graph
Even though a property graph is defined in pg_class it does not
contain any rows by itself and need not have a type defined. Avoid
creating a type for it.

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAExHW5ucu7ZTgYkO6rB_1ShJP3e%3DGAT2T3CP4XWN8rUVEsiJoA%40mail.gmail.com
2026-05-04 15:45:56 +02:00
Peter Eisentraut
b83a94a73b Add missing serial commas 2026-05-04 11:53:04 +02:00
Amit Kapila
bf3ead6075 Simplify translatable messages for tuple value details in conflict.c.
append_tuple_value_detail() constructed user-visible messages using
separately translated fragments such as ": ", ", ", and ".",. This
makes correct translation difficult or impossible in some languages.

Refactor append_tuple_value_detail() to move all punctuation and
sentence construction to the callers, which now use a single
translatable string with a %s placeholder for the tuple data.

Reported-by: David Rowley <dgrowleyml@gmail.com>
Author: vignesh C <vignesh21@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/227279.1775956328%40sss.pgh.pa.us#8f3a5f50543556c60cc5a13270cb7ba4
Discussion: https://postgr.es/m/CAApHDvohYOdrvhVxXzCJNX_GYMSWBfjTTtB6hgDauEtZ8Nar2A@mail.gmail.com
2026-05-04 12:06:41 +05:30
Alexander Korotkov
c06d1a4ba6 Mark modified the FSM buffer as dirty during recovery
The XLogRecordPageWithFreeSpace function updates the freespace map (FSM) data
while replaying data-level WAL records during the recovery. If the FSM block
is updated, it needs to be marked as modified. Currently, this is done with
the MarkBufferDirtyHint call (as in all other cases for modifying FSM data).
However, in the recovery context, this function will actually do nothing if
checksums are enabled. It's assumed that the page should not be dirtied
during recovery while modifying hints to protect against torn pages, since no
new WAL data can be generated at this point to store FPI.

Such logic does not seem fully aligned with the FSM case, as its blocks could
be simply zeroed if a checksum mismatch is detected. Currently, changes to an
FSM block could be lost if each change to that block occurs infrequently
enough to allow it to be evicted from the cache. To persist the change, the
modification needs to be performed while the FSM block is still kept in
buffers and marked as dirty after receiving its FPI. If the block has already
been cleaned, the change won't be persisted, so stored FSM blocks may remain
in an obsolete state.

If a large number of discrepancies between the data in leaf FSM blocks and the
actual data blocks accumulate on the replica server, this could cause
significant delays in insert operations after switchover. Such an insert
operation may need to visit many data blocks marked as having sufficient
space in the FSM, only to discover that the information is incorrect and the
FSM records need to be corrected. In a heavily trafficked insert-only table
with many concurrent clients performing inserts, this has been observed to
cause several-second stalls, causing visible application malfunction. The
desire to avoid such cases was the reason behind the commit ab7dbd681, which
introduced an update of FSM data during the heap_xlog_visible invocation.
However, an update to the FSM data on the standby side could be lost due to a
missing 'dirty' flag, so there is still a possibility that a large number of
FSM records will contain incorrect data. Note that having a zeroed FSM page
in such a case (due to a checksum mismatch) is preferable, as a zero value
will be interpreted as an indication of full data blocks, and the inserter
will be routed to the next FSM block or to the end of the table.

Given that FSM is ready to handle torn page writes and
XLogRecordPageWithFreeSpace is called only during the recovery, there seems
to be no reason to use MarkBufferDirtyHint here instead of a regular
MarkBufferDirty call.

Discussion: https://postgr.es/m/596c4f1c-f966-4512-b9c9-dd8fbcaf0928%40postgrespro.ru
Author: Alexey Makhmutov <a.makhmutov@postgrespro.ru>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2026-05-03 20:23:50 +03:00
Alexander Korotkov
e7cd592174 Wake standby_write/standby_flush waiters from the WAL replay loop
The startup process only woke STANDBY_REPLAY waiters after replaying
each WAL record. STANDBY_WRITE and STANDBY_FLUSH waiters depended only
on walreceiver write/flush callbacks. As a result, replay progress alone
did not wake those waiters, and in pure archive recovery (where no
walreceiver exists) they could sleep until timeout.

Fix by also calling WaitLSNWakeup() for STANDBY_WRITE and
STANDBY_FLUSH after each replay. For the replay-floor semantics used by
GetCurrentLSNForWaitType(), replay progress is a valid lower bound for
both modes: WAL cannot be replayed unless it has already been written
and flushed locally.

This works together with the replay-position floor in
GetCurrentLSNForWaitType(). The getter ensures that a waiter woken by
replay can recheck successfully; the replay-side wakeups ensure that a
waiter already asleep is notified when replay reaches its target.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1957514.1775526774%40sss.pgh.pa.us
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2026-05-03 16:22:02 +03:00
Alexander Korotkov
cba67b5b87 Use replay position as floor for WAIT FOR LSN standby_(write|flush)
GetCurrentLSNForWaitType() for standby_write and standby_flush modes
returned only the walreceiver position, which may lag behind WAL
already present on the standby from a base backup, archive restore,
or prior streaming.  This could cause unnecessary blocking if the
target LSN falls between the walreceiver's tracked position and the
replay position.

Fix by returning the maximum of the walreceiver position and the
replay position.  WAL up to the replay point is physically on disk
regardless of its origin, so there is no reason to wait for the
walreceiver to re-receive it.

This complements 29e7dbf5e4, which seeded writtenUpto to
receiveStart in RequestXLogStreaming() to fix the most common
hang scenario.  The getter-level floor handles the remaining edge
cases: targets between receiveStart and the replay position, and
standbys running with archive recovery only (no walreceiver).

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1957514.1775526774%40sss.pgh.pa.us
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2026-05-03 16:22:02 +03:00
Alexander Korotkov
df9f938ca2 Remove redundant WAIT FOR LSN caller-side pre-checks
All five wakeup call sites duplicate WaitLSNWakeup()'s internal
fast-path minWaitedLSN check and add an unnecessary NULL check
on waitLSNState.

Remove the inline pre-checks and call WaitLSNWakeup() directly.
The fast-path check inside WaitLSNWakeup() already returns early
when no waiter's target has been reached, so there is no
performance difference.

The waitLSNState NULL checks are also unnecessary: shared memory
is fully initialized before any backend or auxiliary process
starts, so waitLSNState is always non-NULL at these call sites.

Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/jzq5shdewncpxc35r3s2mcfsmo4bjovkza5mnqf5bdfumhfi3g%40bglckf7dxmw5
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2026-05-03 16:22:02 +03:00
Alexander Korotkov
a80a593ab6 Fix memory ordering in WAIT FOR LSN wakeup mechanism
WAIT FOR LSN uses a Dekker-style handshake: the waker stores an LSN
position then reads minWaitedLSN; the waiter stores its target into
minWaitedLSN then reads the position.  Without a barrier between each
side's store and load, a CPU may satisfy the load before the store
becomes globally visible, causing either side to miss a concurrent
update.  The result is a missed wakeup: the waiter sleeps indefinitely
until the next unrelated event.

Fix by embedding the required barriers into the atomic operations on
minWaitedLSN:

- In updateMinWaitedLSN(), use pg_atomic_write_membarrier_u64() so the
  waiter's preceding heap update is visible before the new minWaitedLSN
  value is published.

- In WaitLSNWakeup(), use pg_atomic_read_membarrier_u64() in the
  fast-path check so the waker's preceding position store is globally
  visible before minWaitedLSN is read.

The waiter side is also covered by the barrier semantics already present
in GetCurrentLSNForWaitType(): GetWalRcvWriteRecPtr() uses an explicit
read barrier (from patch 0001), while the remaining getters acquire a
spinlock, which implies the same ordering.

Also call ResetLatch() unconditionally after WaitLatch(), following the
standard latch loop pattern.  WaitLatch() does not guarantee that all
simultaneously true wake conditions are reported in one return, so a
timeout can race with SetLatch().  If we skip ResetLatch() on a timeout
return, the code performs further asynchronous-state checks before
consuming the latch, violating the latch API's required wait/reset
pattern.  That can leave the latch set across loop exit and cause a
later unrelated WaitLatch() in the same backend to return immediately.

Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/zqbppucpmkeqecfy4s5kscnru4tbk6khp3ozqz6ad2zijz354k%40w4bdf4z3wqoz
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2026-05-03 16:22:02 +03:00
Alexander Korotkov
dfb690dd52 Use barrier semantics when reading/writing writtenUpto
The walreceiver publishes its write position lock-free via writtenUpto.
On weakly-ordered architectures (ARM, PowerPC), both sides of this
handshake need explicit barriers so that the lock-less reader sees a
consistent state.

Use pg_atomic_write_membarrier_u64() at both write sites and
pg_atomic_read_membarrier_u64() in GetWalRcvWriteRecPtr().  This matches
the barrier semantics that GetWalRcvFlushRecPtr() and other LSN-position
functions get implicitly from their spinlock acquire/release, and
protects from bugs caused by expectations of similar barrier guarantees
from different LSN-position functions.

Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/zqbppucpmkeqecfy4s5kscnru4tbk6khp3ozqz6ad2zijz354k%40w4bdf4z3wqoz
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2026-05-03 16:22:02 +03:00
Andrew Dunstan
b772f3fcad Only show signal-sender PID/UID detail in server log
The errdetail() added in 55890a9194 (and reworked in 3e2a1496ba)
exposed the operating-system PID and UID of whoever sent the
termination signal directly to the affected client.

Discussion suggested this should not be sent to the client, but only
recorded in the server log where the admin can use it for diagnosis.

Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Discussion: https://postgr.es/m/E5CA274C-74BD-4067-8B73-A3AD8C080EFA@gmail.com
2026-05-01 13:20:08 -04:00
Fujii Masao
c0b24b32b0 Avoid blocking indefinitely while finishing walsender shutdown
When walsender finishes streaming during shutdown, it sends a
CommandComplete message to tell the receiver that WAL streaming is done.
Previously, that path used EndCommand() followed by pq_flush().

Those functions can block indefinitely waiting for the socket to become
writeable. As a result, even when wal_sender_shutdown_timeout is set,
walsender could remain stuck while sending the final completion message,
and the shutdown timeout would not be enforced.

Fix this by introducing EndCommandExtended(), which allows
CommandComplete to be queued with pq_putmessage_noblock(), and by
using the walsender nonblocking flush path instead of pq_flush(), so
the shutdown timeout continues to be checked while pending output is
flushed.

Per CI testing on FreeBSD.

Reported-by: Andres Freund <andres@anarazel.de>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/vwlugmsogfn36jhm56zwrgd7m6xe6ircltvfh3kzt6kldvbtht@f45dgow5uhnx
2026-05-01 12:12:44 +09:00
Richard Guo
f76686ce7f Fix HAVING-to-WHERE pushdown with nondeterministic collations
When GROUP BY uses a nondeterministic collation, the planner's
optimization of moving HAVING clauses to WHERE can produce incorrect
query results.  The HAVING clause may apply a stricter collation that
distinguishes values the GROUP BY considers equal.  Pushing such a
clause to WHERE causes it to filter individual rows before grouping,
potentially eliminating group members and changing aggregate results.

Fix this by detecting collation conflicts before flatten_group_exprs,
while the HAVING clause still contains GROUP Vars (Vars referencing
RTE_GROUP).  At that point, each GROUP Var directly carries the GROUP
BY collation as its varcollid, making it straightforward to compare
against the operator's inputcollid.  A mismatch where the GROUP BY
collation is nondeterministic means the clause is unsafe to push down.
RowCompareExpr is treated specially, since it carries per-column
inputcollids[] rather than a single inputcollid.

The conflicting clause indices are recorded in a Bitmapset and
consulted during the existing HAVING-to-WHERE loop, so that only
affected clauses are kept in HAVING; other safe clauses in the same
query are still pushed.

Back-patch to v18 only.  The fix relies on the RTE_GROUP mechanism
introduced in v18 (commit 247dea89f), which is what lets us identify
grouping expressions and their resolved collations via GROUP Vars on
pre-flatten havingQual.  Pre-v18 branches lack that machinery, so a
back-patch there would need a different approach.  Given the absence
of field reports of this bug on back branches, the risk of carrying a
different fix on stable branches is not justified.

Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Discussion: https://postgr.es/m/CAMbWs48Dn2wW6XM94GZsoyMiH42=KgMo+WcobPKuWvGYnWaPOQ@mail.gmail.com
Backpatch-through: 18
2026-05-01 11:13:50 +09:00
Amit Langote
410013d2a5 Use "concurrent delete" in serialization error for TM_Deleted cases
In ExecLockRows() and ri_LockPKTuple(), the TM_Deleted code path was
using the same "could not serialize access due to concurrent update"
message as the TM_Updated path.  Use "concurrent delete" instead, since
the tuple was deleted, not updated.  The ExecLockRows() instance was
likely a copy-paste error per Andres; the ri_LockPKTuple() instance
was carried over from the same pattern in commit 2da86c1ef9.

Update affected isolation test expected files accordingly and add
a new test to fk-concurrent-pk-upd.spec with concurrent delete of the
PK row.

The ExecLockRows() change is master-only for lack of user complaints
and to avoid breaking anything that might match on the error text.

Reported-by: Jian He <jian.universality@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CACJufxEG1JTCq4A1gnNAu-bGAq9Xn=Xkf7kC3TRWFz6iuUOuRA@mail.gmail.com
2026-05-01 10:00:29 +09:00
Richard Guo
8d829f5a02 Fix JSON_ARRAY(query) empty set handling and view deparsing
According to the SQL/JSON standard, JSON_ARRAY(query) must return an
empty JSON array ('[]') when the subquery returns zero rows.

Previously, the parser rewrote JSON_ARRAY(query) into a JSON_ARRAYAGG
aggregate function.  Because this aggregate evaluates to NULL over an
empty set without a GROUP BY clause, the constructor erroneously
returned NULL.  Additionally, this premature rewrite baked physical
implementation details into the catalog, preventing ruleutils.c from
deparsing the original syntax for views.

This patch resolves both issues by introducing a new
JSCTOR_JSON_ARRAY_QUERY constructor type.  The parser builds the
executable form --- a COALESCE-wrapped JSON_ARRAYAGG subquery --- from
raw parse nodes via transformExprRecurse, and stores it in the func
field.  The original transformed Query is kept in a new orig_query
field so that ruleutils.c can deparse the original syntax for views.
During planning, eval_const_expressions replaces the node with the
pre-built func expression.

The deparsing issue was reported by Tom Lane.

Bump catalog version.

Bug: #19418
Reported-by: Lukas Eder <lukas.eder@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/19418-591ba1f29862ef5b@postgresql.org
2026-05-01 09:42:00 +09:00
Álvaro Herrera
6ca631b990
REPACK CONCURRENTLY: fix processing of toasted tuples
In order to process tuples inserted or updated while REPACK executes, we
write those tuples to disk and later restore them; however, some forms
of toasted tuples were not being processed correctly.  Fix that.

Also expand the tests a bit for better coverage.

Author: Satya Narlapuram <satyanarlapuram@gmail.com>
Author: Antonin Houska <ah@cybertec.at>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDeXb9HM2VGKXQedyCp52GzajJK5KOUdNi6oLjsS0nerQw@mail.gmail.com
2026-04-30 23:32:57 +02:00
Andrew Dunstan
6cf49e804c Fix attnum remapping in generateClonedExtStatsStmt()
When cloning extended statistics via CREATE TABLE ... LIKE ... INCLUDING
STATISTICS, stxkeys holds attribute numbers from the source (parent)
table, but get_attname() was being called with the child relation's
OID.  If the parent has dropped columns, the child's attribute numbers
are renumbered sequentially and no longer match, so the lookup either
returns the wrong column name (silent corruption) or errors out when
the attnum does not exist in the child.

Fix it by remapping the parent attnum through attmap before the lookup,
consistent with how expression statistics are already handled a few
lines below.

Add a regression test covering both manifestations: a 3-column parent
where the stale attnum refers to no child column (cache-lookup error),
and a 4-column parent where the stale attnum silently refers to the
wrong child column.

Author: Julien Tachoires <julmon@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Discussion: https://postgr.es/m/20260415105718.tomuncfbmlt67oel@poseidon.home.virt
Backpatch-through: 14
2026-04-30 11:04:57 -04:00
Andrew Dunstan
5642a0367c Avoid SIGSEGV in pg_get_database_ddl() on NULL tablespace
There is a narrow race in which a concurrent ALTER DATABASE ... SET
TABLESPACE moves the database off the tablespace and a DROP TABLESPACE
removes it between the syscache lookup and the catalog scan. If that
happens, output an error.

Author: Chao Li <lic@highgo.com>
Reviewed-by: Jack Bonatakis <jack@bonatak.is>
Reviewed-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Discussion: https://postgr.es/m/573E45C1-31A4-4885-A00C-1A2171159A2A@gmail.com
2026-04-30 10:14:52 -04:00
Daniel Gustafsson
75152c5dc5 Fix data_checksum GUC show_hook
Commit f19c0eccae erroneously omitted the show_hook for the
data_checksum GUC.

Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se
2026-04-30 13:41:57 +02:00
Daniel Gustafsson
1df361e3d8 Improve database detection logic in datachecksumsworker
The worker need to know whether a database which failed checksum
processing still exists, or has been dropped.  This improves the
detection logic by checking for being partially dropped.

Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se
2026-04-30 13:41:55 +02:00
Daniel Gustafsson
bf25e5571b Improve handling of concurrent checksum requests
When pg_{enable|disable}_data_checksums is called while checksums are
being enabled or disabled, the already running launcher is detected
and the new desired state is recorded.  Processing will then pick up
the new state and change its operation to fulfill the new request.
If the same state is requested but with different cost values, the
new cost values will take effect on the next relation processed.

The previous coding had a complex logic of starting a new launcher
for this, which is now avoided with the shared mem structure instead
used to signal current processing.

This makes the logic more robust, and fixes a bug where the launcher
would erroneously revert back to the "off" state.

Access to the shared memory is also protected with LWLocks in all
cases.  Since the shmem structure is used for signalling between
the worker and the launcher, and there can be only one of each,
there were no concurrency issues detected but it's better to stick
to proper locking protocol should this ever be updated to handle
multiple workers.

Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se
2026-04-30 13:41:53 +02:00
Daniel Gustafsson
381d19da15 Typo and spelling fixups for online checksums
A collection of spelling, wording and punctuation fixups for the code
documentation from postcommit review.

Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se
2026-04-30 13:41:50 +02:00
Daniel Gustafsson
25b922ec58 Fix invalid checksum state transition in checkpoints
Commit 78e950cb8 added checksum state handling to all XLOG_CHECKPOINT
records which caused unnecessary state transitions and emission of
procsignal barriers.  Remove as only the _REDO record need to handle
checksum state.  Barrier emission is also consistently made after
controlfile updates to avoid race conditions.

Additionally, interrupts are held between calling ProcSignalInit and
InitLocalDataChecksumState to remove a window where otherwise invalid
state transitions can happen.

Also remove a pointless assertion on Controlfile which will never hit.

Author: Tomas Vondra <tomas@vondra.me>
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se
2026-04-30 13:41:48 +02:00
Daniel Gustafsson
8fb8ded889 Handle data_checksum state changes during launcher_exit
When erroring out from the datachecksums launcher during data checksum
enabling, before state has transitioned to "on", we revert back to the
"off" state.  Since checksums weren't enabled, there is no use staying
in an inprogress state since the checksum launcher currently doesn't
support restarting from where it left off.  Should restartability get
added in the future, this would need to be revisited.  This state
transition was however missing from the allowed transitions in the
statemachine causing an error.

Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se
2026-04-30 13:41:46 +02:00
Daniel Gustafsson
b120358c61 Prevent pg_enable/disable_data_checksums() on standby
These functions missed a RecoveryInProgress() check, allowing them to
be called on a hot standby.  Enabling, or disabling, checksums on the
standby only would cause the cluster to get out of sync and replaying
checksum transitions to fail.

Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAHg+QDfRk4-S7DMmdbXJnQ-xF=sUpMAKuh8b83ObLqYVKx5QLA@mail.gmail.com
2026-04-30 13:41:41 +02:00
Amit Kapila
2bf6c9ff71 Fix double table_close of sequence_rel in copy_sequences().
sequence_rel was declared at batch scope, so when a row is skipped due to
concurrent drop or insufficient privileges, the end-of-row cleanup closes
the stale pointer from the previous row, tripping the relcache refcount
assertion.

Move sequence_rel inside the per-row loop.

Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/CAJTYsWWOuw-yfmzotV4jCJ6LLxEsb=STLcGtDYXOxRcU9Te3Pw@mail.gmail.com
2026-04-30 16:39:39 +05:30
Michael Paquier
5941e7f092 Fix errno check based on EINTR in pg_flush_data()
Upon a failure of sync_file_range(), EINTR was checked based on the
returned result of the routine rather than its errno.  sync_file_range()
returns -1 on failure, making the check a no-op, invalidating the retry
attempt in this case.

Oversight in 0d369ac650.

Author: DaeMyung Kang <charsyam@gmail.com>
Discussion: https://postgr.es/m/20260429151811.1810874-1-charsyam@gmail.com
Backpatch-through: 16
2026-04-30 18:44:38 +09:00
Michael Paquier
ac59a90bef Adjust some incorrect *GetDatum() macros
This reverts portions of commit 6dcfac9696, which is wrong in trying
to use a *GetDatum() that matches with the C types of the values read.
*GetDatum() should match with the output argument types of the SQL
functions.

The portions of 6dcfac9696 that are right regarding this rule are:
- gistget.c, where the GiST support functions use DatumGetUInt16() to
retrieve the strategy number.
- The BRIN code for strategynum, used in syscache lookups.

The adjustments done in this commit are for pageinspect, pg_buffercache
and pg_lock_status().

While double-checking the whole state of the tree regarding non-matching
pairs of DatumGet*() and *GetDatum(), I have found much more code paths
that are incorrect, unrelated to 6dcfac9696.  These may be adjusted in
the future, in a different patch (perhaps not for v19, as we are already
past feature freeze).

Reported-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/97f9375a-be61-4272-a44d-408337fe8fa6@eisentraut.org
Discussion: https://postgr.es/m/CAJ7c6TMcGu8qmRe1gZfJ-gOzVnZq-t=fwn-UuyStx1w6ZyydMw@mail.gmail.com
2026-04-30 13:10:19 +09:00
Michael Paquier
4bfd0f1b76 Fix error of pg_stat_reset_shared()
"lock" a values is supported since 4019f725f5, but the error message
of the function used when specifying an incorrect value forgot about it.

Author: Maksim Logvinenko <logvinenko-ms@yandex.ru>
Discussion: https://postgr.es/m/433431777389005@mail.yandex.ru
2026-04-30 11:12:56 +09:00
Peter Geoghegan
748d871b7c Fix nbtree skip array parallel alloc accounting.
btestimateparallelscan neglected to add btps_arrElems[] space overhead
for skip array scan keys that were later output by nbtree preprocessing.
Skip arrays don't actually need to use this space, but a scan with a
subsequent SAOP array will need to subscript btps_arrElems[] using a
simple so->arrayKeys[]-wise offset.  so->arrayKeys[] has entries for
both kinds of arrays.

As a result of this oversight, it was possible for an index scan with a
skip array and a lower-order SAOP array to write past the allocated
shared memory boundary when storing the SAOP array's cur_elem.  In
practice the problem seems to be limited to scans with many skipped
index columns, since our general approach to estimating the amount of
shared memory that will be required is fairly conservative.

To fix, have btestimateparallelscan request an extra sizeof(int) space
for key columns that might require a skip array later on.

Oversight in commit 92fe23d9, which added the nbtree skip scan
optimization.

Author: Siddharth Kothari <sidkot@google.com>
Discussion: https://postgr.es/m/CAGCUe0Lwk3C0qdkBa+OLpYc7yXwW=pbaz8Sju4xMXEQAmyp+5g@mail.gmail.com
Backpatch-through: 18
2026-04-29 11:22:23 -04:00
John Naylor
ca9807dfec Cosmetic fixes for radix sort
Do minor comment fixes and remove implicit cast to Datum.

While here, let's prefer crashing instead of entering an infinite
loop in case of future programming mistakes when computing next_level,
suggested by ChangAo Chen.

Discussion: https://postgr.es/m/tencent_49E3F11E74D8A584A2144ED532A490CBC40A@qq.com
2026-04-29 16:14:25 +07:00
John Naylor
a0302eac78 Remove unused ByteaSortSupport.abbreviate field
Oversight in commit 9303d62c6.

Author: Aleksander Alekseev <aleksander@tigerdata.com>
Discussion: https://postgr.es/m/CAJ7c6TOsKmmgyA6EwxKVsNeHFHrWXYdgZivgjo_ujf890BpeeA@mail.gmail.com
2026-04-29 13:57:07 +07:00
Amit Kapila
c210647aeb Fix xid_advance_interval when max_retention_duration is 0.
When a subscription has retain_dead_tuples enabled and maxretention is
zero (unlimited), adjust_xid_advance_interval() mistakenly caps
xid_advance_interval to zero.

This zero interval forces get_candidate_xid() to evaluate
TimestampDifferenceExceeds() as always true, causing the apply worker to
call GetOldestActiveTransactionId() for every WAL message. This
leads to unnecessary ProcArrayLock acquisitions.

Fix this by only capping the interval when maxretention > 0, allowing
the exponential back-off to function properly.

Author: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDdKVnCLHot=AcoPpEiSyDzGz7wGYjAFHVOw57oDtmUDWQ@mail.gmail.com
2026-04-28 14:51:38 +05:30
Amit Kapila
7424aac088 Fix wrong datum conversion for subretentionactive in CreateSubscription.
Use BoolGetDatum() instead of Int32GetDatum() when storing the boolean
subretentionactive column in pg_subscription. This was an oversight in
a850be2fe6.

Author: Lakshmi N <lakshmin.jhs@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Discussion: https://postgr.es/m/CA+3i_M98-XjE-_fw0p+8xOnw64y2_YLtJfcwvCfsVMn-z2ZjGg@mail.gmail.com
2026-04-28 13:13:47 +05:30
Álvaro Herrera
832e220d99
REPACK CONCURRENTLY: Don't use deferrable primary keys
Similarly to logical replication, REPACK CONCURRENTLY needs to ability
to reliably locate a tuple based on an identity.  A replica identity
index is okay.  Primary keys normally also are, except when they are
deferrable, because a tuple being modified might not yet be indexed,
causing REPACK to fail.

Change the REPACK CONCURRENTLY code to use GetRelationIdentityOrPK(),
similar to what the logical replication code does.  (Though we don't yet
support locating tuples based on arbitrary indexes for replica identity
FULL.)

While at it, add a few more test cases for situations that aren't
supported by REPACK, to improve coverage.

Author: Chao Li <lic@highgo.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Antonin Houska <ah@cybertec.at>
Reviewed-by: Yuchen Li <liyuchen_xyz@163.com>
Discussion: https://postgr.es/m/10DD5E13-B45D-44F1-BE08-C63E00ABCAC0@gmail.com
2026-04-27 18:22:03 +02:00
Peter Eisentraut
33db6c4baf Fix DELETE/UPDATE FOR PORTION OF with rules
Previously, these test cases would give internal errors or crash.  The
fix is to add some missing fields of ForPortionOfExpr to
expression_tree_walker.

Author: jian he <jian.universality@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://postgr.es/m/CACJufxHs1Hs00EqsZ4NbuAjmYzMzjJyP1sAj12Ne=cBsEVmQOA@mail.gmail.com
2026-04-27 10:34:06 +02:00
Richard Guo
c66d6d19eb Fix bogus calls in remove_self_join_rel()
remove_self_join_rel() called adjust_relid_set() on all_result_relids
and leaf_result_relids but threw away the return value.  Since
adjust_relid_set() returns a freshly-built Relids and does not modify
the input in place, the calls did nothing.  This has been the case
since the SJE feature went in (commit fc069a3a6).

There has been no observable misbehavior, because the relid being
passed is guaranteed not to be a member of either set.  At the point
remove_self_join_rel() runs, those sets contain only resultRelation;
inheritance children have not been added yet, as that happens later in
query_planner(), in expand_single_inheritance_child() called from
add_other_rels_to_query().  And remove_self_joins_recurse() rejects
parse->resultRelation as an SJE candidate to preserve the EvalPlanQual
mechanism.  Even with the result assigned, the calls would be no-ops
in practice.

Rather than make the calls do the cleanup they pretend to do, replace
them with assertions of the invariant.  Any future loosening of the
SJE candidate filter -- for instance to allow eliminating a result
relation under provable conditions -- will trip the assertion and
force whoever does it to revisit this code.

Additionally, decorate adjust_relid_set() with pg_nodiscard so that
any future accidental discard of its return value is caught at compile
time.

Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49fYQcqJfJ_Gtn8r1GFNoYtb1=2AUab4ieuqY4Zid9ocQ@mail.gmail.com
2026-04-27 10:40:37 +09:00
Michael Paquier
b801d5eef1 Fix some memory leaks in the WAL receiver
These are old leaks, that can pile up if a WAL receiver stays alive,
waiting for new WAL data after the sender has switched to a new
timeline.

While this is technically a bug, the impact is minimal and would only
become noticeable if the WAL sender handles a lot of timeline switches,
so no backpatch is done.  Note that in most cases, primary_conninfo
would be updated in a standby to point to a new sender, meaning a
restart of the WAL receiver.  Let's be clean on HEAD, though.

Author: DaeMyung Kang <charsyam@gmail.com>
Discussion: https://postgr.es/m/20260426170100.847923-1-charsyam@gmail.com
Discussion: https://postgr.es/m/20260426170219.849330-1-charsyam@gmail.com
2026-04-27 10:32:45 +09:00
Peter Eisentraut
9d2979dd68 pg_get_viewdef() and lateral references in COLUMNS of GRAPH_TABLE
Expressions in GRAPH_TABLE COLUMNS list may have lateral references.
get_rule_expr() requires lateral namespaces to deparse such
references.  get_from_clause_item() does not pass them when processing
the expressions in COLUMNS list causing ERROR "bogus varlevelsup: 0
offset 0".  Fix get_from_clause_item() to pass input deparse_context
containing lateral namespaces to get_rule_expr() instead of the dummy
context.

Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAHg%2BQDcLVa2iBnggkHxY4itZbXtDMfsYHEjnCUYe9hNbnxDi-w%40mail.gmail.com
2026-04-24 09:12:03 +02:00
Peter Eisentraut
ac3bcc041c Fix collation of expressions in GRAPH_TABLE COLUMNS clause
GRAPH_TABLE clause is converted into a rangetable entry, which is
ignored by assign_query_collations().  Hence we assign collations
while transforming its parts.  But expressions in COLUMNS clause
missed that treatment, so fix that.

While at it, also add comments about collation assignment to the parts
of GRAPH_TABLE clause, and also fix a small grammar issue.

Reported-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAHg+QDc4aaiufYSgrwMMPMMRTPtQ66SghcrPFbWJFZMqNaG+BA@mail.gmail.com
2026-04-24 08:43:26 +02:00
Peter Eisentraut
9082680c34 Fix typos and grammar in graph table rewrite code
Reported-by: Lakshmi N <lakshmin.jhs@gmail.com>
Author: Lakshmi N <lakshmin.jhs@gmail.com>
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/CA+3i_M9gpUGjH-BkJk=UFjK16jq9fEQHpmZ1cxpJO+xM4hWC+A@mail.gmail.com
2026-04-24 08:27:04 +02:00
Peter Eisentraut
2ff289d039 Check for stack overflow when rewriting graph queries
generate_queries_for_path_pattern_recurse() and
generate_setop_from_pathqueries() are recursive functions.  For a
property graph with hundreds of tables, a graph pattern with a handful
element patterns can cause stack overflow.  Fix it by calling
check_stack_depth() at the beginning of these functions.

Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAHg+QDfgK0xddH8f3eAb+UVn7sBDOnv8RvM6OkP4HtHAt6aD7w@mail.gmail.com
2026-04-24 08:18:21 +02:00
David Rowley
94219a73f7 Fix incorrect logic for hashed IN / NOT IN with non-strict operators
ExecEvalHashedScalarArrayOp(), when using a strict equality function,
performs a short-circuit when looking up NULL values.  When the function
is non-strict, the code incorrectly looked up the hash table for a
zero-valued Datum, which could have resulted in an accidental true
return if the hash table contained zero valued Datum, or could result
in a crash for non-byval types.

Here we fix this by adding an extra step when we build the hash table to
check what the result of a NULL lookup would be.  This requires looping
over the array and checking what the non-hashed version of the code
would do.  We cache the results of that in the expression so that we can
reuse the result any time we're asked to search for a NULL value.

It's important to note that non-strict equality functions are free to
treat any NULL value as equal to any non-NULL value.  For example,
someone may wish to design a type that treats an empty string and NULL
as equal.

All built-in types have strict equality functions, so this could affect
custom / user-defined types.

Author: Chengpeng Yan <chengpeng_yan@outlook.com>
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: ChangAo Chen <cca5507@qq.com>
Discussion: https://postgr.es/m/A16187AE-2359-4265-9F5E-71D015EC2B2D@outlook.com
Backpatch-through: 14
2026-04-24 14:03:12 +12:00
Heikki Linnakangas
713bce9484 Don't call CheckAttributeType() with InvalidOid on dropped cols
If CheckAttributeType() is called with InvalidOid, it performs a bunch
of pointless, futile syscache lookups with InvalidOid, but ultimately
tolerates it and has no effect. We were calling it with InvalidOid on
dropped columns, but it seems accidental that it works, so let's stop
doing it.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/93ce56cd-02a6-4db1-8224-c8999372facc@iki.fi
Backpatch-through: 14
2026-04-23 21:28:26 +03:00
Heikki Linnakangas
dd40691976 Don't allow composite type to be member of itself via multirange
CheckAttributeType() checks that a composite type is not made a member
of itself with ALTER TABLE ADD COLUMN or ALTER TYPE ADD ATTRIBUTE,
even indirectly via a domain, array, another composite type or a range
type. But it missed checking for multiranges. That was a simple
oversight when multiranges were added.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/93ce56cd-02a6-4db1-8224-c8999372facc@iki.fi
Backpatch-through: 14
2026-04-23 21:28:11 +03:00
David Rowley
4f0cbc6fb5 Fix new-to-v19 -Wshadow warnings
There's some talk about upgrading our current -Wshadow=compatible-local
up to -Wshadow.  There's some pending questions as to whether the churn
and extra backpatching pain are worthwhile for doing all of them.  We
can't use the latter argument for ones that are new to v19, providing we
fix them now.  So let's fix those ones so that the problem is not any
worse for if we decide to fix the remainder for v20.

Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Yuchen Li <liyuchen_xyz@163.com>
Discussion: https://postgr.es/m/CAApHDvp=rx5GxM=yW8QhFF3noXtYt7LkOxJ7zkaPOzpti4Gm8w@mail.gmail.com
2026-04-23 16:49:29 +12:00
Jeff Davis
dbf217c1c7 catcache.c: use C_COLLATION_OID for texteqfast/texthashfast.
The problem report was about setting GUCs in the startup packet for a
physical replication connection. Setting the GUC required an ACL
check, which performed a lookup on pg_parameter_acl.parname. The
catalog cache was hardwired to use DEFAULT_COLLATION_OID for
texteqfast() and texthashfast(), but the database default collation
was uninitialized because it's a physical walsender and never connects
to a database. In versions 18 and later, this resulted in a NULL
pointer dereference, while in version 17 it resulted in an ERROR.

As the comments stated, using DEFAULT_COLLATION_OID was arbitrary
anyway: if the collation actually mattered, it should have used the
column's actual collation. (In the catalog, some text columns are the
default collation and some are "C".)

Fix by using C_COLLATION_OID, which doesn't require any initialization
and is always available. When any deterministic collation will do,
it's best to consistently use the simplest and fastest one, so this is
a good idea anyway.

Another problem was raised in the thread, which this commit doesn't
fix (see second discussion link).

Reported-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/D18AD72A-5004-4EF8-AF80-10732AF677FA@yandex-team.ru
Discussion: https://postgr.es/m/4524ed61a015d3496fc008644dcb999bb31916a7.camel%40j-davis.com
Backpatch-through: 17
2026-04-22 10:22:44 -07:00
Peter Geoghegan
d14f69a32a Harmonize function parameter names for Postgres 19.
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in a few places.  Most of
these inconsistencies were introduced during Postgres 19 development.

This commit was written with help from clang-tidy, by mechanically
applying the same rules as similar clean-up commits (the earliest such
commit was commit 035ce1fe).
2026-04-22 12:47:19 -04:00
Tom Lane
a50777680f Guard against overly-long numeric formatting symbols from locale.
to_char() allocates its output buffer with 8 bytes per formatting
code in the pattern.  If the locale's currency symbol, thousands
separator, or decimal or sign symbol is more than 8 bytes long,
in principle we could overrun the output buffer.  No such locales
exist in the real world, so it seems sufficient to truncate the
symbol if we do see it's too long.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/638232.1776790821@sss.pgh.pa.us
Backpatch-through: 14
2026-04-22 12:41:00 -04:00
Tom Lane
d7970e7e95 Prevent some buffer overruns in spell.c's parsing of affix files.
parse_affentry() and addCompoundAffixFlagValue() each collect fields
from an affix file into working buffers of size BUFSIZ.  They failed
to defend against overlength fields, so that a malicious affix file
could cause a stack smash.  BUFSIZ (typically 8K) is certainly way
longer than any reasonable affix field, but let's fix this while
we're closing holes in this area.

I chose to do this by silently truncating the input before it can
overrun the buffer, using logic comparable to the existing logic in
get_nextfield().  Certainly there's at least as good an argument for
raising an error, but for now let's follow the existing precedent.

Reported-by: Igor Stepansky <igor.stepansky@orca.security>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/864123.1776810909@sss.pgh.pa.us
Backpatch-through: 14
2026-04-22 12:02:15 -04:00
Tom Lane
844bb90d49 Prevent buffer overrun in spell.c's CheckAffix().
This function writes into a caller-supplied buffer of length
2 * MAXNORMLEN, which should be plenty in real-world cases.
However a malicious affix file could supply an affix long
enough to overrun that.  Defend by just rejecting the match
if it would overrun the buffer.  I also inserted a check of
the input word length against Affix->replen, just to be sure
we won't index off the buffer, though it would be caller error
for that not to be true.

Also make the actual copying steps a bit more readable, and remove
an unnecessary requirement for the whole input word to fit into the
output buffer (even though it always will with the current caller).

The lack of documentation in this code makes my head hurt, so
I also reverse-engineered a basic header comment for CheckAffix.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/641711.1776792744@sss.pgh.pa.us
Backpatch-through: 14
2026-04-22 10:47:56 -04:00
Alexander Korotkov
713e553e32 Preserve extension dependencies on indexes during partition merge/split
When using ALTER TABLE ... MERGE PARTITIONS or ALTER TABLE ... SPLIT
PARTITION, extension dependencies on partition indexes were being lost.
This happened because the new partition indexes are created fresh from
the parent partitioned table's indexes, while the old partition indexes
(with their extension dependencies) are dropped.

Fix this by collecting extension dependencies from source partition
indexes before detaching them, then applying those dependencies to the
corresponding new partition indexes after they're created.  The mapping
between old and new indexes is done via their common parent partitioned
index.

For MERGE operations, all source partition indexes sharing a parent
partitioned index must have the same extension dependencies; if they
differ, an error naming both conflicting partition indexes is raised.
The check is implemented by collecting one entry per partition index,
sorting by parent index OID, and comparing adjacent entries in a single
pass.  This is order-independent: the same set of partitions produces
the same decision regardless of the order they are listed in the MERGE
command, and subset mismatches are caught in both directions.

For SPLIT operations, the new partition indexes simply inherit all
extension dependencies from the source partition's index.

The regression tests exercising this feature live under
src/test/modules/test_extensions, where the test_ext3 and test_ext5
extensions are available; core regression tests cannot assume any
particular extension is installed.

Author: Matheus Alcantara <matheusssilv97@gmail.com>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Reported-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Dmitry Koval <d.koval@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/CALdSSPjXtzGM7Uk4fWRwRMXcCczge5uNirPQcYCHKPAWPkp9iQ%40mail.gmail.com
2026-04-22 14:34:20 +03:00
Dean Rasheed
5548a969b6 Fix UPDATE/DELETE ... WHERE CURRENT OF on a table with virtual columns.
Formerly, attempting to use WHERE CURRENT OF to update or delete from
a table with virtual generated columns would fail with the error
"WHERE CURRENT OF on a view is not implemented".

The reason was that the check preventing WHERE CURRENT OF from being
used on a view was in replace_rte_variables_mutator(), which presumed
that the only way it could get there was as part of rewriting a query
on a view. That is no longer the case, since replace_rte_variables()
is now also used to expand the virtual generated columns of a table.

Fix by doing the check for WHERE CURRENT OF on a view at parse time.
This is safe, since it is no longer possible for the relkind to change
after the query is parsed (as of b23cd185f).

Reported-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDc_TwzSgb=B_QgNLt3mvZdmRK23rLb+RkanSQkDF40GjA@mail.gmail.com
Backpatch-through: 18
2026-04-22 11:50:17 +01:00
Dean Rasheed
7834251758 Fix expansion of EXCLUDED virtual generated columns.
If the SET or WHERE clause of an INSERT ... ON CONFLICT command
references EXCLUDED.col, where col is a virtual generated column, the
column was not properly expanded, leading to an "unexpected virtual
generated column reference" error, or incorrect results.

The problem was that expand_virtual_generated_columns() would expand
virtual generated columns in both the SET and WHERE clauses and in the
targetlist of the EXCLUDED pseudo-relation (exclRelTlist). Then
fix_join_expr() from set_plan_refs() would turn the expanded
expressions in the SET and WHERE clauses back into Vars, because they
would be found to match the expression entries in the indexed tlist
produced from exclRelTlist.

To fix this, arrange for expand_virtual_generated_columns() to not
expand virtual generated columns in exclRelTlist. This forces
set_plan_refs() to resolve generation expressions in the query using
non-virtual columns, as required by the executor.

In addition, exclRelTlist now always contains only Vars. That was
something already claimed in a couple of existing comments in the
planner, which relied on that fact to skip some processing, though
those did not appear to constitute active bugs.

Reported-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDf7wTLz_vqb1wi1EJ_4Uh+Vxm75+b4c-Ky=6P+yOAHjbQ@mail.gmail.com
Backpatch-through: 18
2026-04-22 09:03:44 +01:00
Amit Langote
1b9dc2cb75 Fix some const qualifier use in ri_triggers.c
The ri_FetchConstraintInfo() and ri_LoadConstraintInfo() functions
were declared to return const RI_ConstraintInfo *, but callers
sometimes need to modify the struct, requiring casts to drop the
const.  Remove the misapplied const qualifiers and the casts that
worked around them.

Reported-by: Peter Eisentraut <peter@eisentraut.org>
Author: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/548600ed-8bbb-4e50-8fc3-65091b122276@eisentraut.org
2026-04-22 11:36:54 +09:00
Michael Paquier
9d3e094f12 Allow ALTER INDEX .. ATTACH PARTITION to validate a parent index
This commit tweaks ALTER INDEX .. ATTACH PARTITION to attempt a
validation of a parent index in the case where an index is already
attached but the parent is not yet valid.  This occurs in cases where a
parent index was created invalid such as with CREATE INDEX ONLY, but was
left invalid after an invalid child index was attached (partitioned
indexes set indisvalid to false if at least one partition is
!indisvalid, indisvalid is true in a partitioned table iff all
partitions are indisvalid).  This could leave a partition tree in a
situation where a user could not bring the parent index back to valid
after fixing the child index, as there is no built-in mechanism to do
so.  This commit relies on the fact that repeated ATTACH PARTITION
commands on the same index silently succeed.

An invalid parent index is more than just a passive issue.  It causes
for example ON CONFLICT on a partitioned table if the invalid parent
index is used to enforce a unique constraint.

Some test cases are added to track some of problematic patterns, using a
set of partition trees with combinations of invalid indexes and ATTACH
PARTITION.

Reported-by: Mohamed Ali <moali.pg@gmail.com>
Author: Sami Imseih <sanmimseih@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Discussion: http://postgr.es/m/CAGnOmWqi1D9ycBgUeOGf6mOCd2Dcf=6sKhbf4sHLs5xAcKVCMQ@mail.gmail.com
Backpatch-through: 14
2026-04-22 10:32:10 +09:00
Melanie Plageman
31b0544b32 bufmgr: use I/O stats arguments in FlushUnlockedBuffer()
FlushUnlockedBuffer() accepted io_object and io_context arguments but
hardcoded IOOBJECT_RELATION and IOCONTEXT_NORMAL when calling
FlushBuffer(). Pass them through instead. Also fix FlushBuffer() to use
its io_object parameter for I/O timing stats rather than hardcoding
IOOBJECT_RELATION.

Not actively broken since all current callers pass IOOBJECT_RELATION and
IOCONTEXT_NORMAL, so not backpatched.

Author: Chao Li <lic@highgo.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/BC97546F-5C15-42F2-AD57-CFACDB9657D0@gmail.com
2026-04-21 17:47:50 -04:00
Melanie Plageman
da6874635d Make local buffers pin limit more conservative
GetLocalPinLimit() and GetAdditionalLocalPinLimit(), currently in use
only by the read stream, previously allowed a backend to pin all
num_temp_buffers local buffers. This meant that the read stream could
use every available local buffer for read-ahead, leaving none for other
concurrent pin-holders like other read streams and related buffers like
the visibility map buffer needed during on-access pruning.

This became more noticeable since b46e1e54d0, which allows on-access
pruning to set the visibility map, which meant that some scans also
needed to pin a page of the VM. It caused a test in
src/test/regress/sql/temp.sql to fail in some cases.

Cap the local pin limit to num_temp_buffers / 4, providing some
headroom. This doesn't guarantee that all needed pins will be available
— for example, a backend can still open more cursors than there are
buffers — but it makes it less likely that read-ahead will exhaust the
pool.

Note that these functions are not limited by definition to use in the
read stream; however, this cap should be appropriate in other contexts.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/97529f5a-ec10-46b1-ab50-4653126c6889%40gmail.com
2026-04-21 11:03:05 -04:00
Tom Lane
1cd3cd372a Remove gen_node_support.pl's ad-hoc ABI stability check.
We installed this in commit eea9fa9b2 to protect against foreseeable
mistakes that would break ABI in stable branches by renumbering
NodeTag enum entries.  However, we now have much more thorough
ABI stability checks thanks to buildfarm members using libabigail
(see the .abi-compliance-history mechanism).  So this incomplete,
single-purpose check seems like an anachronism.  I wouldn't object
to keeping it were it not that it requires an additional manual step
when making a new stable git branch.  That seems like something easy
to screw up, so let's get rid of it.

This patch just removes the logic that checks for changes in the last
auto-assigned NodeTag value.  We still need eea9fa9b2's cross-check
on the supplied list of header files, to prevent divergence between
the makefile and meson build systems.  We'll also sometimes need the
nodetag_number() infrastructure for hand-assigning new NodeTags in
stable branches.

Discussion: https://postgr.es/m/1458883.1776143073@sss.pgh.pa.us
2026-04-21 10:58:00 -04:00
Michael Paquier
d3bba04154 Fix a set of typos and grammar issues across the tree
This batch is similar to 462fe0ff62 and addresses a variety of code
style issues, including grammar mistakes, typos, inconsistent variable
names in function declarations, and incorrect function names in comments
and documentation.  These fixes have accumulated on the community
mailing lists since the commit mentioned above.

Notably, Alexander Lakhin previously submitted a patch identifying many
of the trivial typos and grammar issues that had been reported on
pgsql-hackers.  His patch covered a somewhat large portion of the issues
addressed here, though not all of them.

The documentation changes only affect HEAD.
2026-04-21 14:46:22 +09:00
Richard Guo
c6a79be3f3 Fix incorrect NEW references to generated columns in rule rewriting
When a rule action or rule qualification references NEW.col where col
is a generated column (stored or virtual), the rewriter produces
incorrect results.

rewriteTargetListIU removes generated columns from the query's target
list, since stored generated columns are recomputed by the executor
and virtual ones store nothing.  However, ReplaceVarsFromTargetList
then cannot find these columns when resolving NEW references during
rule rewriting.  For UPDATE, the REPLACEVARS_CHANGE_VARNO fallback
redirects NEW.col to the original target relation, making it read the
pre-update value (same as OLD.col).  For INSERT,
REPLACEVARS_SUBSTITUTE_NULL replaces it with NULL.  Both are wrong
when the generated column depends on columns being modified.

Fix by building target list entries for generated columns from their
generation expressions, pre-resolving the NEW.attribute references
within those expressions against the query's targetlist, and passing
them together with the query's targetlist to ReplaceVarsFromTargetList.

Back-patch to all supported branches.  Virtual generated columns were
added in v18, so the back-patches in pre-v18 branches only handle
stored generated columns.

Reported-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDexGTmCZzx=73gXkY2ZADS6LRhpnU+-8Y_QmrdTS6yUhA@mail.gmail.com
Backpatch-through: 14
2026-04-21 14:28:26 +09:00
Michael Paquier
9b43e6793b Fix orphaned processes when startup process fails during PM_STARTUP
When the startup process exists with a FATAL error during PM_STARTUP,
the postmaster called ExitPostmaster() directly, assuming that no other
processes are running at this stage.  Since 7ff23c6d27, this
assumption is not true, as the checkpointer, the background writer, the
IO workers and bgworkers kicking in early would be around.

This commit removes the startup-specific shortcut happening in
process_pm_child_exit() for a failing startup process during PM_STARTUP,
falling down to the existing exit() flow to signal all the started
children with SIGQUIT, so as we have no risk of creating orphaned
processes.

This required an extra change in HandleFatalError() for v18 and newer
versions, as an assertion could be triggered for PM_STARTUP.  It is now
incorrect.  In v17 and older versions, HandleChildCrash() needs to be
changed to handle PM_STARTUP so as children can be waited on.

While on it, fix a comment at the top of postmaster.c.  It was claiming
that the checkpointer and the background writer were started after
PM_RECOVERY.  That is not the case.

Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/CAJTYsWVoD3V9yhhqSae1_wqcnTdpFY-hDT7dPm5005ZFsL_bpA@mail.gmail.com
Backpatch-through: 15
2026-04-21 09:39:59 +09:00
Tom Lane
f0ac6d494b Fix relid-set clobber during join removal.
Commit cfcd57111 et al fell over under Valgrind testing.
(It seems to be enough to #define USE_VALGRIND, you don't actually
need to run it under Valgrind to see failures.)  The cause is that
remove_rel_from_eclass updates each EquivalenceMember's em_relids,
and those can be aliases of the left_relids or right_relids of some
RestrictInfo in ec_sources.  If the update made em_relids empty then
bms_del_member will have pfree'd the relid set, so that the subsequent
attempt to clean up ec_sources accesses already-freed memory.

We missed seeing ill effects before cfcd57111 because (a) if the
pfree happens then we will remove the EquivalenceMember altogether,
making the source RestrictInfo no longer of use, and (b) the
cleanup of ec_sources didn't touch left/right_relids before that.

I'm unclear though on how cfcd57111 managed to pass non-USE_VALGRIND
testing.  Apparently we managed to store another Bitmapset into the
freed space before trying to access it, but you'd not think that would
happen 100% of the time.  I think what USE_VALGRIND changes is that it
makes list.c much more memory-hungry, so that the freed space gets
claimed by some List node before a Bitmapset can be put there.

This failure can be seen in v16, v17, and master, but oddly enough not
v18.  That's because the SJE patch replaced the simple bms_del_members
calls used here with adjust_relid_set, which is careful not to
scribble on its input.  But commit 20efbdffe just recently put back
the old coding and thus resurrected the problem.

Discussion: https://postgr.es/m/458729.1776724816@sss.pgh.pa.us
Backpatch-through: 16, 17, master
2026-04-20 19:24:52 -04:00
Jeff Davis
bdcb85b56a Fix callers of unicode_strtitle() using srclen == -1.
Currently, only called that way in tests, which failed to fail.

Discussion: https://postgr.es/m/581a72ff452bb045ba83bbe3c6cf4467702d4f0f.camel@j-davis.com
Backpatch-through: 18
2026-04-20 14:44:08 -07:00
Tom Lane
cfcd571116 Clean up all relid fields of RestrictInfos during join removal.
The original implementation of remove_rel_from_restrictinfo()
thought it could skate by with removing no-longer-valid relid
bits from only the clause_relids and required_relids fields.
This is quite bogus, although somehow we had not run across a
counterexample before now.  At minimum, the left_relids and
right_relids fields need to be fixed because they will be
examined later by clause_sides_match_join().  But it seems
pretty foolish not to fix all the relid fields, so do that.

This needs to be back-patched as far as v16, because the
bug report shows a planner failure that does not occur
before v16.  I'm a little nervous about back-patching,
because this could cause unexpected plan changes due to
opening up join possibilities that were rejected before.
But it's hard to argue that this isn't a regression.  Also,
the fact that this changes no existing regression test results
suggests that the scope of changes may be fairly narrow.
I'll refrain from back-patching further though, since no
adverse effects have been demonstrated in older branches.

Bug: #19460
Reported-by: François Jehl <francois.jehl@pigment.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/19460-5625143cef66012f@postgresql.org
Backpatch-through: 16
2026-04-20 14:48:23 -04:00
Tom Lane
207cb2abcb Make ExecForPortionOfLeftovers() obey SRF protocol.
Before each call to the SRF, initialize isnull and isDone, as per the
comments for struct ReturnSetInfo.  This fixes a Coverity warning
about rsi.isDone not being initialized.  The built-in
{multi,}range_minus_multi functions don't return without setting it,
but a user-supplied function might not be as accommodating.

We also add statistics tracking around the function call, which
will be expected once user-defined withoutPortionProcs functions
are supported, and a cross-check on rsi.returnMode just for
paranoia's sake.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Co-authored-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://postgr.es/m/4126231.1776622202@sss.pgh.pa.us
2026-04-20 10:21:52 -04:00
Álvaro Herrera
5dbb63fc82
REPACK: do not require REPLICATION or LOGIN
Although REPACK (CONCURRENTLY) uses replication slots, there is no
concern that the slot will leak data of other users, because the
MAINTAIN privilege on the table is required anyway; requiring
REPLICATION is user-unfriendly without providing any actual protection.

A related aspect is that the REPLICATION attribute is not needed to
prevent REPACK from stealing slots from logical replication, since
commit e76d8c749c made REPACK use a separate pool of replication
slots.

Similarly, there's no reason to require that the table owner has the
LOGIN privilege.  Bypass the default behavior in the background worker
launch sequence.

Because there are now successful concurrent repack runs in the
regression tests, we're forced to run test_plan_advice under
wal_level=replica, so add that.  Also, move the cluster.sql test to a
different parallel group in parallel_schedule: apparently the use of the
repack worker causes it to exceed the maximum limit of processes in some
runs (the actual limit reached is the number of XIDs in a snapshot's xip
array).

Author: Antonin Houska <ah@cybertec.at>
Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/aeJHPNmL4vVy3oPw@pryzbyj2023
2026-04-20 15:44:23 +02:00
Richard Guo
20efbdffeb Clean up remove_rel_from_query() after self-join elimination commit
The self-join elimination (SJE) commit grafted self-join removal onto
remove_rel_from_query(), which was originally written for left-join
removal only.  This resulted in several issues:

- Comments throughout remove_rel_from_query() still assumed only
  left-join removal, making the code misleading.

- ChangeVarNodesExtended was called on phv->phexpr with subst=-1
  during left-join removal, which is pointless and confusing since any
  surviving PHV shouldn't reference the removed rel.

- phinfo->ph_lateral was adjusted for left-join removal, which is
  unnecessary since the removed relid cannot appear in ph_lateral for
  outer joins.

- The comment about attr_needed reconstruction was in
  remove_rel_from_query(), but the actual rebuild is performed by the
  callers.

- EquivalenceClass processing in remove_rel_from_query() is redundant
  for self-join removal, since the caller (remove_self_join_rel)
  already handles ECs via update_eclasses().

- In remove_self_join_rel(), ChangeVarNodesExtended was called on
  root->processed_groupClause, which contains SortGroupClause nodes
  that have no Var nodes to rewrite.  The accompanying comment
  incorrectly mentioned "HAVING clause".

This patch fixes all these issues, clarifying the separation between
left-join removal and self-join elimination code paths within
remove_rel_from_query().  The resulting code is also better structured
for adding new types of join removal (such as inner-join removal) in
the future.

Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Discussion: https://postgr.es/m/CAMbWs48JC4OVqE=3gMB6se2WmRNNfMyFyYxm-09vgpm+Vwe8Hg@mail.gmail.com
2026-04-20 17:00:22 +09:00
Peter Eisentraut
04f9ea372a Add missing Datum conversions
Similar to commit ff89e182d4, for new code added since.
2026-04-20 07:22:16 +02:00
Peter Eisentraut
5936afe1ee Fix incorrect format placeholders 2026-04-20 07:09:13 +02:00