Commit graph

62289 commits

Author SHA1 Message Date
Jeff Davis
ca938ec213 Check for unterminated strings when calling uloc_getLanguage().
Missed by commit 1671f990dd.

Author: Andreas Karlsson <andreas@proxel.se>
Discussion: https://postgr.es/m/118ca69e-47eb-42e1-83e9-72ccf40dd6fd@proxel.se
Backpatch-through: 16
2026-04-14 14:46:06 -07:00
Michael Paquier
42473d9009 Add tests for low-level PGLZ [de]compression routines
The goal of this module is to provide an entry point for the coverage of
the low-level compression and decompression PGLZ routines.  The new test
is moved to a new parallel group, with all the existing
compression-related tests added to it.

This includes tests for the cases detected by fuzzing that emulate
corrupted compressed data, as fixed by 2b5ba2a0a1:
- Set control bit with read of a match tag, where no data follows.
- Set control bit with read of a match tag, where 1 byte follows.
- Set control bit with match tag where length nibble is 3 bytes
(extended case).

While on it, some tests are added for compress/decompress roundtrips,
and for check_complete=false/true.  Like 2b5ba2a0a1, backpatch to all
the stable branches.

Discussion: https://postgr.es/m/adw647wuGjh1oU6p@paquier.xyz
Backpatch-through: 14
2026-04-15 05:09:08 +09:00
Jeff Davis
6393259bd4 Fix overrun when comparing with unterminated ICU language string.
The overrun was introduced in commit c4ff35f10.

Author: Andreas Karlsson <andreas@proxel.se>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/96d80a47-f17f-42fa-82b1-2908efbd6541@gmail.com
Backpatch-through: 18
2026-04-13 11:19:21 -07:00
Amit Kapila
540fe8fb5c Fix excessive logging in idle slotsync worker.
The slotsync worker was incorrectly identifying no-op states as successful
updates, triggering a busy loop to sync slots that logged messages every
200ms. This patch corrects the logic to properly classify these states,
enabling the worker to respect normal sleep intervals when no work is
performed.

Reported-by: Fujii Masao <masao.fujii@gmail.com>
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/CAHGQGwF6zG9Z8ws1yb3hY1VqV-WT7hR0qyXCn2HdbjvZQKufDw@mail.gmail.com
2026-04-13 09:42:51 +05:30
Michael Paquier
b081c5b073 Honor passed-in database OIDs in pgstat_database.c
Three routines in pgstat_database.c incorrectly ignore the database OID
provided by their caller, using MyDatabaseId instead:
- pgstat_report_connect()
- pgstat_report_disconnect()
- pgstat_reset_database_timestamp()

The first two functions, for connection and disconnection, each have a
single caller that already passes MyDatabaseId.  This was harmless,
but still incorrect.

The timestamp reset function also has a single caller, but in this case
the issue has a real impact: it fails to reset the timestamp for the
shared-database entry (datid=0) when operating on shared objects.  This
situation can occur, for example, when resetting counters for shared
relations via pg_stat_reset_single_table_counters().

There is currently one test in the tree that checks the reset of a
shared relation, for pg_shdescription; we rely on it to check what is
stored in pg_stat_database.  As stats_reset may be NULL, two resets are
done to provide a baseline for comparison.

Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Dapeng Wang <wangdp20191008@gmail.com>
Discussion: https://postgr.es/m/ABBD5026-506F-4006-A569-28F72C188693@gmail.com
Backpatch-through: 15
2026-04-11 17:03:04 +09:00
Richard Guo
13e20d1c9d Fix estimate_array_length error with set-operation array coercions
When a nested set operation's output type doesn't match the parent's
expected type, recurse_set_operations builds a projection target list
using generate_setop_tlist with varno 0.  If the required type
coercion involves an ArrayCoerceExpr, estimate_array_length could be
called on such a Var, and would pass it to examine_variable, which
errors in find_base_rel because varno 0 has no valid relation entry.

Fix by skipping the statistics lookup for Vars with varno 0.

Bug introduced by commit 9391f7152.  Back-patch to v17, where
estimate_array_length was taught to use statistics.

Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Author: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/adjW8rfPDkplC7lF@pryzbyj2023
Backpatch-through: 17
2026-04-11 16:40:34 +09:00
Thomas Munro
36e7efbfdd read_stream: Remove obsolete comment.
This comment was describing the v17 implementation (or io_method=sync).

Backpatch-through: 18
2026-04-11 11:26:48 +12:00
Andrew Dunstan
c3e436b1cb Fix heap-buffer-overflow in pglz_decompress() on corrupt input.
When decoding a match tag, pglz_decompress() reads 2 bytes (or 3
for extended-length matches) from the source buffer before checking
whether enough data remains.  The existing bounds check (sp > srcend)
occurs after the reads, so truncated compressed data that ends
mid-tag causes a read past the allocated buffer.

Fix by validating that sufficient source bytes are available before
reading each part of the match tag.  The post-read sp > srcend
check is no longer needed and is removed.
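The "check before reading" order can be sketched as below. This is a hypothetical helper, not the actual pglz_decompress() code. It assumes the match-tag layout documented in pg_lzcompress.c: the low nibble of the first byte is the length minus 3, the high nibble and the second byte form a 12-bit offset, and a length of 18 means one extra length byte follows. Control-bit handling and the rest of the decompression loop are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode one PGLZ match tag starting at sp without
 * ever reading at or beyond srcend.  Returns the number of tag bytes
 * consumed (2 or 3), or -1 if the input ends in the middle of the tag.
 */
static int
decode_match_tag(const unsigned char *sp, const unsigned char *srcend,
                 int32_t *len, int32_t *off)
{
    /* Both bytes of the basic tag must be available before reading either. */
    if (srcend - sp < 2)
        return -1;

    *len = (sp[0] & 0x0f) + 3;
    *off = ((sp[0] & 0xf0) << 4) | sp[1];

    if (*len == 18)
    {
        /* Extended length: a third byte must also be available. */
        if (srcend - sp < 3)
            return -1;
        *len += sp[2];
        return 3;
    }
    return 2;
}
```

With this ordering, truncated input that ends mid-tag is reported as corrupt instead of being read past the end of the buffer, so a separate post-read `sp > srcend` test is unnecessary.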

Found by fuzz testing with libFuzzer and AddressSanitizer.

Backpatch-through: 14
2026-04-10 10:28:00 -04:00
Andrew Dunstan
3e49556302 Fix incremental JSON parser numeric token reassembly across chunks.
When the incremental JSON parser splits a numeric token across chunk
boundaries, it accumulates continuation characters into the partial
token buffer.  The accumulator's switch statement unconditionally
accepted '+', '-', '.', 'e', and 'E' as valid numeric continuations
regardless of position, which violated JSON number grammar
(-? int [frac] [exp]).  For example, input "4-" fed in single-byte
chunks would accumulate the '-' into the numeric token, producing an
invalid token that later triggered an assertion failure during
re-lexing.

Fix by tracking parser state (seen_dot, seen_exp, prev character)
across the existing partial token and incoming bytes, so that each
character class is accepted only in its grammatically valid position.
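A minimal sketch of position-aware continuation, following the grammar quoted above (`-? int [frac] [exp]`). The struct and function names are hypothetical and not taken from the parser. The state is rebuilt from the already-buffered partial token, then each incoming byte is checked against it. The sketch does not enforce the leading-zero rule; it assumes the final re-lexing pass still validates the complete token.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical state carried across chunks for a partial numeric token. */
typedef struct
{
    bool        seen_dot;
    bool        seen_exp;
    char        prev;           /* last character accepted into the token */
} NumTokState;

/* Rebuild the state from the token prefix that is already buffered. */
static void
numtok_init(NumTokState *st, const char *partial)
{
    st->seen_dot = st->seen_exp = false;
    st->prev = '\0';
    for (; *partial; partial++)
    {
        if (*partial == '.')
            st->seen_dot = true;
        else if (*partial == 'e' || *partial == 'E')
            st->seen_exp = true;
        st->prev = *partial;
    }
}

/* Return true (and update st) if c may continue the number; false ends it. */
static bool
numtok_accept(NumTokState *st, char c)
{
    bool        prev_digit = isdigit((unsigned char) st->prev) != 0;
    bool        ok;

    if (isdigit((unsigned char) c))
        ok = true;
    else if (c == '.')
        ok = prev_digit && !st->seen_dot && !st->seen_exp;
    else if (c == 'e' || c == 'E')
        ok = prev_digit && !st->seen_exp;
    else if (c == '+' || c == '-')
        ok = (st->prev == 'e' || st->prev == 'E');  /* only as exponent sign */
    else
        ok = false;

    if (ok)
    {
        if (c == '.')
            st->seen_dot = true;
        else if (c == 'e' || c == 'E')
            st->seen_exp = true;
        st->prev = c;
    }
    return ok;
}
```

For the "4-" example, the '-' arrives with prev = '4', so it is rejected and ends the numeric token rather than being absorbed into it.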

Backpatch-through: 17
2026-04-10 10:21:38 -04:00
Michael Paquier
35f41b29ff Zero-fill private_data when attaching an injection point
InjectionPointAttach() did not initialize the private_data buffer of the
shared memory entry before (perhaps partially) overwriting it.  When the
private data is set to NULL by the caller, the buffer was left
uninitialized.  If set, it could have stale contents.

The buffer is initialized to zero, so that the contents recorded when a
point is attached are deterministic.
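The pattern is simply "clear first, then copy". In the sketch below, the entry type and buffer size are hypothetical stand-ins for the injection point's shared memory entry, not its real definition.

```c
#include <assert.h>
#include <string.h>

#define PRIVATE_DATA_SIZE 32    /* hypothetical buffer size */

/* Hypothetical stand-in for a shared memory injection point entry. */
typedef struct
{
    char        private_data[PRIVATE_DATA_SIZE];
} PointEntry;

static void
attach_private_data(PointEntry *entry, const void *data, size_t len)
{
    /* Clear the whole buffer first, so no stale bytes survive reuse. */
    memset(entry->private_data, 0, sizeof(entry->private_data));

    /* A NULL pointer now leaves a zeroed buffer rather than garbage. */
    if (data != NULL)
    {
        assert(len <= sizeof(entry->private_data));
        memcpy(entry->private_data, data, len);
    }
}
```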

Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0tsGHu2h6YLnVu4HiK05q+gTE_9WVUAqihW2LSscAYS-g@mail.gmail.com
Backpatch-through: 17
2026-04-10 11:17:30 +09:00
Richard Guo
bfc7dff26d Fix integer overflow in nodeWindowAgg.c
In nodeWindowAgg.c, the calculations for frame start and end positions
in ROWS and GROUPS modes were performed using simple integer addition.
If a user-supplied offset was sufficiently large (close to INT64_MAX),
adding it to the current row or group index could cause a signed
integer overflow, wrapping the result to a negative number.

This led to incorrect behavior where frame boundaries that should have
extended indefinitely (or beyond the partition end) were treated as
falling at the first row, or where valid rows were incorrectly marked
as out-of-frame.  Depending on the specific query and data, these
overflows can result in incorrect query results, execution errors, or
assertion failures.

To fix, use overflow-aware integer addition (ie, pg_add_s64_overflow)
to check for overflows during these additions.  If an overflow is
detected, the boundary is now clamped to INT64_MAX.  This ensures the
logic correctly treats the boundary as extending to the end of the
partition.
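The clamping approach can be sketched as follows. `__builtin_add_overflow` (a GCC/Clang builtin) stands in here for PostgreSQL's pg_add_s64_overflow(), which likewise returns true on overflow. `frame_end_pos` is a hypothetical helper, and the sketch assumes the offset is non-negative, as window frame offsets must be, so only overflow toward INT64_MAX can occur.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for pg_add_s64_overflow(): true on overflow (GCC/Clang builtin). */
static inline bool
add_s64_overflow(int64_t a, int64_t b, int64_t *result)
{
    return __builtin_add_overflow(a, b, result);
}

/* Hypothetical helper: frame boundary = current position + offset, clamped. */
static int64_t
frame_end_pos(int64_t currentpos, int64_t offset)
{
    int64_t     endpos;

    /* On overflow, treat the boundary as lying beyond the partition end. */
    if (add_s64_overflow(currentpos, offset, &endpos))
        endpos = INT64_MAX;
    return endpos;
}
```

Clamping rather than erroring keeps the query semantics intact: any offset large enough to overflow already reaches past the last row of any partition.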

Bug: #19405
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/19405-1ecf025dda171555@postgresql.org
Backpatch-through: 14
2026-04-09 19:30:37 +09:00
Richard Guo
8e8b2bef78 Strip PlaceHolderVars from partition pruning operands
When pulling up a subquery, its targetlist items may be wrapped in
PlaceHolderVars to enforce separate identity or as a result of outer
joins.  This causes any upper-level WHERE clauses referencing these
outputs to contain PlaceHolderVars, which prevents partprune.c from
recognizing that they match partition key columns, defeating partition
pruning.

To fix, strip PlaceHolderVars from operands before comparing them to
partition keys.  A PlaceHolderVar with empty phnullingrels appearing
in a relation-scan-level expression is effectively a no-op, so
stripping it is safe.  This parallels the existing treatment in
indxpath.c for index matching.

In passing, rename strip_phvs_in_index_operand() to strip_noop_phvs()
and move it from indxpath.c to placeholder.c, since it is now a
general-purpose utility used by both index matching and partition
pruning code.

Back-patch to v18.  Although this issue exists before that, changes in
that version made it common enough to notice.  Given the lack of field
reports for older versions, I am not back-patching further.  In the
v18 back-patch, strip_phvs_in_index_operand() is retained as a thin
wrapper around the new strip_noop_phvs() to avoid breaking third-party
extensions that may reference it.

Reported-by: Cándido Antonio Martínez Descalzo <candido@ninehq.com>
Diagnosed-by: David Rowley <dgrowleyml@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAH5YaUwVUWETTyVECTnhs7C=CVwi+uMSQH=cOkwAUqMdvXdwWA@mail.gmail.com
Backpatch-through: 18
2026-04-09 16:43:28 +09:00
Fujii Masao
acf49bfede Fix ABI break by moving PROCSIG_SLOTSYNC_MESSAGE in ProcSignalReason
Commit 58c1188a3e added PROCSIG_SLOTSYNC_MESSAGE in the middle of
enum ProcSignalReason, breaking the ABI.

Fix this by moving PROCSIG_SLOTSYNC_MESSAGE to the end of the enum,
so that the existing members keep their original values.

Per buildfarm member crake.

Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwH_AAbtsiYDJt65N7_4PJ0CgOJmBEaCq68e5_tcuG_vXw@mail.gmail.com
Backpatch-through: 18 only
2026-04-09 15:25:40 +09:00
Fujii Masao
58c1188a3e Fix slotsync worker blocking promotion when stuck in wait
Previously, on standby promotion, the startup process sent SIGUSR1 to
the slotsync worker (or a backend performing slot synchronization) and
waited for it to exit. This worked in most cases, but if the process was
blocked waiting for a response from the primary (e.g., due to a network
failure), SIGUSR1 would not interrupt the wait. As a result, the process
could remain stuck, causing the startup process to wait for a long time
and delaying promotion.

This commit fixes the issue by introducing a new procsignal reason,
PROCSIG_SLOTSYNC_MESSAGE. On promotion, the startup process
sends this signal, and the handler sets interrupt flags so the process
exits (or errors out) promptly at CHECK_FOR_INTERRUPTS(), allowing
promotion to complete without delay.

Backpatch to v17, where slotsync was introduced.

Author: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwFzNYroAxSoyJhqTU-pH=t4Ej6RyvhVmBZ91Exj_TPMMQ@mail.gmail.com
Backpatch-through: 17
2026-04-08 11:23:13 +09:00
Amit Kapila
94efd308bc Enhance slot synchronization API to respect promotion signal.
Previously, during a promotion, only the slot synchronization worker was
signaled to shut down. The backend executing slot synchronization via the
pg_sync_replication_slots() SQL function was not signaled, allowing it to
complete its synchronization cycle before exiting.

An upcoming patch improves pg_sync_replication_slots() to wait until
replication slots are fully persisted before finishing. This behaviour
requires the backend to exit promptly if a promotion occurs.

This patch ensures that, during promotion, a signal is also sent to the
backend running pg_sync_replication_slots(), allowing it to be interrupted
and exit immediately.

This change was originally committed to master only. However, it is now
backpatched to v17, where slot synchronization was introduced, because it
is required for an upcoming bug fix addressing slotsync (including
pg_sync_replication_slots()) blocking promotion when stuck in a wait.

Author: Ajin Cherian <itsajin@gmail.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAFPTHDZAA%2BgWDntpa5ucqKKba41%3DtXmoXqN3q4rpjO9cdxgQrw%40mail.gmail.com
Backpatch-through: 17
2026-04-08 11:19:31 +09:00
Tom Lane
49f3cb453b Fix WITHOUT OVERLAPS' interaction with domains.
UNIQUE/PRIMARY KEY ... WITHOUT OVERLAPS requires the no-overlap
column to be a range or multirange, but it should allow a domain
over such a type too.  This requires minor adjustments in both
the parser and executor.

In passing, fix a nearby break-instead-of-continue thinko in
transformIndexConstraint.  This had the effect of disabling
parse-time validation of the no-overlap column's type in the context
of ALTER TABLE ADD CONSTRAINT, if it follows a dropped column.
We'd still complain appropriately at runtime though.

Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxGoAmN_0iJ=hjTG0vGpOSOyy-vYyfE+-q0AWxrq2_p5XQ@mail.gmail.com
Backpatch-through: 18
2026-04-07 14:45:33 -04:00
Michael Paquier
93f08dc92c Fix shmem allocation of fixed-sized custom stats kind
StatsShmemSize(), which computes the shmem size needed for pgstats,
includes the amount of shared memory wanted by all the custom stats
kinds registered.  However, the shared memory allocation was done by
ShmemAlloc() in StatsShmemInit(), meaning that the space reserved was
not used, wasting some memory.

These extra allocations would show up under "<anonymous>" in
pg_shmem_allocations, as the allocations done by ShmemAlloc() are not
tracked by ShmemIndexEnt.

Issue introduced by 7949d95945.

Author: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/04b04387-92f5-476c-90b0-4064e71c5f37@iki.fi
Backpatch-through: 18
2026-04-07 11:59:54 +09:00
Michael Paquier
af04b04f2f Fix shared memory size of template code for custom fixed-sized pgstats
On HEAD, the template code for custom fixed-sized pgstats is in the test
module test_custom_stats.  On REL_18_STABLE, this code lives in the test
module injection_points.

Both cases were underestimating the size of the shared memory area
required for the storage of the stats data, using a single entry rather
than the whole area.  This underestimation meant that there was no
memory allocated for the LWLock required for the stats, among other
things.  This problem would also be misleading for extension developers
looking at this code.

This issue has been noticed while digging into a different bug reported
by Heikki Linnakangas, showing that the underestimation was causing
failures in the TAP tests of the test modules for 32-bit builds.  The
other issue reported, related to the memory allocation of custom
fixed-sized pgstats, will be fixed in a follow-up commit.

Discussion: https://postgr.es/m/adMk_lWbnz3HDOA8@paquier.xyz
Backpatch-through: 18
2026-04-07 08:24:36 +09:00
Tom Lane
11c2c0cc8d Avoid unsafe access to negative index in a TupleDesc.
Commit aa606b931 installed a test that would reference a nonexistent
TupleDesc array entry if a system column is used in COPY FROM WHERE.
Typically this would be harmless, but with bad luck it could result
in a phony "generated columns are not supported in COPY FROM WHERE
conditions" error, and at least in principle it could cause SIGSEGV.
(Compare 570e2fcc0 which fixed the identical problem in another
place.)  Also, since c98ad086a it throws an Assert instead.

In the back branches, just guard the test to make it a safe no-op for
system columns.  Commit 21c69dc73 installed a more aggressive answer
in master.
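The back-branch guard amounts to refusing to index the attribute array for non-positive attribute numbers. The types below are hypothetical, heavily simplified stand-ins for a TupleDesc, used only to show the shape of the check; they rely on the PostgreSQL convention that user columns are numbered from 1 and system columns such as ctid have negative attribute numbers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for TupleDesc entries. */
typedef struct
{
    bool        generated;
} ToyAttr;

typedef struct
{
    int         natts;
    const ToyAttr *attrs;
} ToyTupleDesc;

/*
 * System columns have attnum <= 0 and no entry in attrs[], so the test
 * becomes a safe no-op for them instead of reading attrs[-1] or worse.
 */
static bool
column_is_generated(const ToyTupleDesc *desc, int attnum)
{
    if (attnum <= 0 || attnum > desc->natts)
        return false;
    return desc->attrs[attnum - 1].generated;
}
```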

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/6f435023-8ab6-47c2-ba07-035d0c4212f9@gmail.com
Backpatch-through: 14-18
2026-04-06 14:22:17 -04:00
Tom Lane
14bf2c39ee Fix null-bitmap combining in array_agg_array_combine().
This code missed the need to update the combined state's
nullbitmap if state1 already had a bitmap but state2 didn't.
We need to extend the existing bitmap with 1's but didn't.
This could result in wrong output from a parallelized
array_agg(anyarray) calculation, if the input has a mix of
null and non-null elements.  The errors depended on timing
of the parallel workers, and therefore would vary from one
run to another.
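The combining rule can be illustrated with a small sketch. The helper names are hypothetical and this is not the array_agg_array_combine() code. It assumes PostgreSQL's array null-bitmap convention (bit set means "not null", lowest bit first), where a missing bitmap means the corresponding items contain no nulls. Whichever side lacks a bitmap must therefore contribute 1's, which is the case the original code missed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Bit i of a null bitmap; a NULL bitmap means "no nulls at all". */
static bool
item_not_null(const uint8_t *bitmap, int i)
{
    return bitmap == NULL || (bitmap[i / 8] & (1 << (i % 8))) != 0;
}

/*
 * Hypothetical helper: build the bitmap for n1 items from state1 followed
 * by n2 items from state2.  Returns false when neither side has a bitmap,
 * since none is needed.  Otherwise fills out, which must hold
 * (n1 + n2 + 7) / 8 bytes; a side without a bitmap contributes all 1's.
 */
static bool
combine_null_bitmaps(const uint8_t *b1, int n1,
                     const uint8_t *b2, int n2, uint8_t *out)
{
    if (b1 == NULL && b2 == NULL)
        return false;

    memset(out, 0, (size_t) (n1 + n2 + 7) / 8);
    for (int i = 0; i < n1 + n2; i++)
    {
        bool        set = (i < n1) ? item_not_null(b1, i) : item_not_null(b2, i - n1);

        if (set)
            out[i / 8] |= (uint8_t) (1 << (i % 8));
    }
    return true;
}
```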

Also install guards against integer overflow when calculating
the combined object's sizes, and make some trivial cosmetic
improvements.

Author: Dmytro Astapov <dastapov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAFQUnFj2pQ1HbGp69+w2fKqARSfGhAi9UOb+JjyExp7kx3gsqA@mail.gmail.com
Backpatch-through: 16
2026-04-06 13:14:50 -04:00
Thomas Munro
5079e420b9 More tar portability adjustments.
For the three implementations that have caused problems so far:

* GNU and BSD (libarchive) tar both understand --format=ustar
* ustar doesn't support large UID/GID values, so set them to 0 to
  avoid a hard error from at least GNU tar
* OpenBSD tar needs -F ustar, and it appears to warn but carry
  on with "nobody" if a UID is too large
* -f /dev/null is a more portable way to throw away the output, since
  the default destination might be a tape device depending on build
  options that a distribution might change
* Windows ships BSD tar but lacks /dev/null, so ask perl for its name

Based on their manuals, the other two implementations the tests are
likely to encounter in the wild don't seem to need any special handling:

* Solaris/illumos tar uses ustar and replaces large UIDs with 60001
* AIX tar uses ustar (unless --format=pax) and truncates large UIDs

Backpatch-through: 18
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Sami Imseih <samimseih@gmail.com> (large UIDs)
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier version)
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> (OpenBSD)
Reviewed-by: Andrew Dunstan <andrew@dunslane.net> (Windows)
Discussion: https://postgr.es/m/3676229.1775170250%40sss.pgh.pa.us
Discussion: https://postgr.es/m/CAA5RZ0tt89MgNi4-0F4onH%2B-TFSsysFjMM-tBc6aXbuQv5xBXw%40mail.gmail.com
2026-04-04 14:02:28 +13:00
Thomas Munro
5c54e0f48f jit: No backport::SectionMemoryManager for LLVM 22.
LLVM 22 has the fix that we copied into our tree in commit 9044fc1d and
a new function to reach it[1][2], so we only need to use our copy for
Aarch64 + LLVM < 22.  The only change to the final version that our copy
didn't get is a new LLVM_ABI macro, but that isn't appropriate for us.
Our copy is hopefully now frozen and would only need maintenance if bugs
are found in the upstream code.

Non-Aarch64 systems now also use the new API with LLVM 22.  It allocates
all sections with one contiguous mmap() instead of one per
section.  We could have done that earlier, but commit 9044fc1d wanted to
limit the blast radius to the affected systems.  We might as well
benefit from that small improvement everywhere now that it is available
out of the box.

We can't delete our copy until LLVM 22 is our minimum supported version,
or we switch to the newer JITLink API for at least Aarch64.

[1] https://github.com/llvm/llvm-project/pull/71968
[2] https://github.com/llvm/llvm-project/pull/174307

Backpatch-through: 14
Discussion: https://postgr.es/m/CA%2BhUKGJTumad75o8Zao-LFseEbt%3DenbUFCM7LZVV%3Dc8yg2i7dg%40mail.gmail.com
2026-04-03 14:58:59 +13:00
Tom Lane
c4b7be4ecb Further harden tests that might use not-so-compatible tar versions.
Buildfarm testing shows that OpenSUSE (and perhaps related platforms?)
configures GNU tar in such a way that it'll archive sparse WAL files
by default, thus triggering the pax-extension detection code added by
bc30c704a.  Thus, we need something similar to 852de579a but for
GNU tar's option set.  "--format=ustar" seems to do the trick.

Moreover, the buildfarm shows that pg_verifybackup's 003_corruption.pl
test script is also triggering creation of pax-format tar files on
that platform.  We had not noticed because those test cases all fail
(intentionally) before getting to the point of trying to verify WAL
data.

Since that means two TAP scripts need this option-selection logic, and
plausibly more will do so in future, factor it out into a subroutine
in Test::Utils.  We also need to back-patch the 003_corruption.pl fix
into v18, where it's also failing.

While at it, clean up some places where guards for $tar being empty
or undefined were incomplete or even outright backwards.  Presumably,
we missed noticing because the set of machines that run TAP tests
and don't have tar installed is empty.  But if we're going to try
to handle that scenario, we should do it correctly.

Reported-by: Tomas Vondra <tomas@vondra.me>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/02770bea-b3f3-4015-8a43-443ae345379c@vondra.me
Backpatch-through: 18
2026-04-02 17:21:18 -04:00
Tom Lane
698eae7db7 Harden astreamer tar parsing logic against archives it can't handle.
Previously, there was essentially no verification in this code that
the input is a tar file at all, let alone that it fits into the
subset of valid tar files that we can handle.  This was exposed by
the discovery that we couldn't handle files that FreeBSD's tar
makes, because it's fairly aggressive about converting sparse WAL
files into sparse tar entries.  To fix:

* Bail out if we find a pax extension header.  This covers the
sparse-file case, and also protects us against scenarios where
the pax header changes other file properties that we care about.
(Eventually we may extend the logic to actually handle such
headers, but that won't happen in time for v19.)

* Be more wary about tar file type codes in general: do not assume
that anything that's neither a directory nor a symlink must be a
regular file.  Instead, we just ignore entries that are none of the
three supported types.

* Apply pg_dump's isValidTarHeader to verify that a purported
header block is actually in tar format.  To make this possible,
move isValidTarHeader into src/port/tar.c, which is probably where
it should have been since that file was created.

I also took the opportunity to const-ify the arguments of
isValidTarHeader and tarChecksum, and to use symbols not hard-wired
constants inside tarChecksum.
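For reference, the ustar header checksum is the sum of all 512 header bytes with the 8-byte checksum field (offset 148) counted as spaces, stored as an octal number in that field. The sketch below follows that rule; the function names are hypothetical, and a real validity test such as isValidTarHeader may check more than the checksum (for example, the format magic).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TAR_BLOCK_SIZE      512
#define TAR_CHKSUM_OFFSET   148     /* ustar "chksum" field */
#define TAR_CHKSUM_LEN      8

/* Sum of all header bytes, with the checksum field counted as spaces. */
static int
tar_header_checksum(const unsigned char *header)
{
    int         sum = TAR_CHKSUM_LEN * ' ';

    for (int i = 0; i < TAR_BLOCK_SIZE; i++)
        if (i < TAR_CHKSUM_OFFSET || i >= TAR_CHKSUM_OFFSET + TAR_CHKSUM_LEN)
            sum += header[i];
    return sum;
}

/* Hypothetical check: does the stored octal checksum match the computed one? */
static bool
tar_header_checksum_ok(const unsigned char *header)
{
    char        field[TAR_CHKSUM_LEN + 1];

    memcpy(field, header + TAR_CHKSUM_OFFSET, TAR_CHKSUM_LEN);
    field[TAR_CHKSUM_LEN] = '\0';   /* strtol stops at NUL or space */
    return strtol(field, NULL, 8) == tar_header_checksum(header);
}
```

A block of arbitrary data is very unlikely to carry a matching checksum, which is what makes this a cheap "is this really a tar header?" test.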

Back-patch to v18 but not further.  Although this code exists inside
pg_basebackup in older branches, it's not really exposed in that
usage to tar files that weren't generated by our own code, so it
doesn't seem worth back-porting these changes across 3c9056981
and f80b09bac.  I did choose to include a back-patch of 5868372bb
into v18 though, to minimize cosmetic differences between these
two branches.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/3049460.1775067940@sss.pgh.pa.us
Backpatch-through: 18
2026-04-02 12:20:26 -04:00
Thomas Munro
78cea19bf7 jit: Stop emitting lifetime.end for LLVM 22.
The lifetime.end intrinsic can now only be used for stack memory
allocated with alloca[1][2][3].  We use it to tell LLVM about the
lifetime of function arguments/isnull values that we keep in palloc'd
memory, so that it can avoid spilling registers to memory.

We might need to rearrange things and put them on the stack, but that'll
take some research.  In the meantime, unbreak the build on LLVM 22.

[1] https://github.com/llvm/llvm-project/pull/149310
[2] https://llvm.org/docs/LangRef.html#llvm-lifetime-end-intrinsic
[3] https://llvm.org/docs/LangRef.html#i-alloca

Backpatch-through: 14
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> (earlier attempt)
Reviewed-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> (earlier attempt)
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier attempt)
Discussion: https://postgr.es/m/CA%2BhUKGJTumad75o8Zao-LFseEbt%3DenbUFCM7LZVV%3Dc8yg2i7dg%40mail.gmail.com
2026-04-02 15:54:01 +13:00
Nathan Bossart
9ed5015f0d doc: Add missing description for DROP SUBSCRIPTION IF EXISTS.
Oversight in commit 665d1fad99.

Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHut%2BPv72haFerrCdYdmF6hu6o2jKcGzkXehom%2BsP-JBBmOVDg%40mail.gmail.com
Backpatch-through: 14
2026-04-01 09:48:48 -05:00
Tom Lane
adb7873bb9 Be more careful to preserve consistency of a tuplestore.
Several places in tuplestore.c would leave the tuplestore data
structure effectively corrupt if some subroutine were to throw
an error.  Notably, if WRITETUP() failed after some number of
successful calls within dumptuples(), the tuplestore would
contain some memtuples pointers that were apparently live
entries but in fact pointed to pfree'd chunks.

In most cases this sort of thing is fine because transaction
abort cleanup is not too picky about the contents of memory that
it's going to throw away anyway.  There's at least one exception
though: if a Portal has a holdStore, we're going to call
tuplestore_end() on that, even during transaction abort.
So it's not cool if that tuplestore is corrupt, and that means
tuplestore.c has to be more careful.

This oversight demonstrably leads to crashes in v15 and before,
if a holdable cursor fails to persist its data due to an undersized
temp_file_limit setting.  Very possibly the same thing can happen in
v16 and v17 as well, though the specific test case submitted failed
to fail there (cf. 095555daf).  The failure is accidentally dodged
as of v18 because 590b045c3 got rid of tuplestore_end's retail tuple
deletion loop.  Still, it seems unwise to permit tuplestores to become
internally inconsistent in any branch, so I've applied the same fix
across the board.

Since the known test case for this is rather expensive and doesn't
fail in recent branches, I've omitted it.

Bug: #19438
Reported-by: Dmitriy Kuzmin <kuzmin.db4@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/19438-9d37b179c56d43aa@postgresql.org
Backpatch-through: 14
2026-03-30 13:59:54 -04:00
Tom Lane
3f3eefc288 Detect pfree or repalloc of a previously-freed memory chunk.
Before the major rewrite in commit c6e0fe1f2, AllocSetFree() would
typically crash when asked to free an already-free chunk.  That was
an ugly but serviceable way of detecting coding errors that led to
double pfrees.  But since that rewrite, double pfrees went through
just fine, because the "hdrmask" of a freed chunk isn't changed at all
when putting it on the freelist.  We'd end up with a corrupt freelist
that circularly links back to the doubly-freed chunk, which would
usually result in trouble later, far removed from the actual bug.

This situation is no good at all for debugging purposes.  Fortunately,
we can fix it at low cost in MEMORY_CONTEXT_CHECKING builds by making
AllocSetFree() check for chunk->requested_size == InvalidAllocSize,
relying on the pre-existing code that sets it that way just below.
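The idea can be shown with a toy allocator. Everything here is hypothetical and much simpler than aset.c: `INVALID_ALLOC_SIZE` stands in for InvalidAllocSize, and the sketch reports a double free by returning false rather than raising an error. Because the free path already stamps requested_size with the invalid marker, checking for that marker on entry costs one comparison and catches the second free before the chunk is linked into the freelist a second time (which would make it point to itself).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define INVALID_ALLOC_SIZE ((size_t) -1)    /* stand-in for InvalidAllocSize */

/* Hypothetical chunk header with a debug-build requested_size field. */
typedef struct Chunk
{
    size_t      requested_size;
    struct Chunk *next_free;
} Chunk;

typedef struct
{
    Chunk      *freelist;
} ToyContext;

/* Returns false, leaving the freelist untouched, on a double free. */
static bool
toy_free(ToyContext *cxt, Chunk *chunk)
{
    if (chunk->requested_size == INVALID_ALLOC_SIZE)
        return false;           /* already free: detected, not corrupted */

    chunk->requested_size = INVALID_ALLOC_SIZE;
    chunk->next_free = cxt->freelist;
    cxt->freelist = chunk;
    return true;
}
```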

I investigated the alternative of changing a freed chunk's methodid
field, which would allow detection in non-MEMORY_CONTEXT_CHECKING
builds too.  But that adds measurable overhead.  Seeing that we didn't
notice this oversight for more than three years, it's hard to argue
that detecting this type of bug is worth any extra overhead in
production builds.

Likewise fix AllocSetRealloc() to detect repalloc() on a freed chunk,
and apply similar changes in generation.c and slab.c.  (generation.c
would hit an Assert failure anyway, but it seems best to make it act
like aset.c.)  bump.c doesn't need changes since it doesn't support
pfree in the first place.  Ideally alignedalloc.c would receive
similar changes, but in debugging builds it's impossible to reach
AlignedAllocFree() or AlignedAllocRealloc() on a pfreed chunk, because
the underlying context's pfree would have wiped the chunk header of
the aligned chunk.  But that means we should get an error of some
sort, so let's be content with that.

Per investigation of why the test case for bug #19438 didn't appear to
fail in v16 and up, even though the underlying bug was still present.
(This doesn't fix the underlying double-free bug; it just causes it
to be detected.)

Bug: #19438
Reported-by: Dmitriy Kuzmin <kuzmin.db4@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/19438-9d37b179c56d43aa@postgresql.org
Backpatch-through: 16
2026-03-30 12:02:08 -04:00
Fujii Masao
5db5e33969 Fix FK triggers losing DEFERRABLE/INITIALLY DEFERRED when marked ENFORCED again
Previously, a foreign key defined as DEFERRABLE INITIALLY DEFERRED could
behave as NOT DEFERRABLE after being set to NOT ENFORCED and then back
to ENFORCED.

This happened because recreating the FK triggers on re-enabling the constraint
forgot to restore the tgdeferrable and tginitdeferred fields in pg_trigger.

Fix this bug by properly setting those fields when the foreign key constraint
is marked ENFORCED again and its triggers are recreated, so the original
DEFERRABLE and INITIALLY DEFERRED properties are preserved.

Backpatch to v18, where NOT ENFORCED foreign keys were introduced.

Author: Yasuo Honda <yasuo.honda@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAKmOUTms2nkxEZDdcrsjq5P3b2L_PR266Hv8kW5pANwmVaRJJQ@mail.gmail.com
Backpatch-through: 18
2026-03-30 14:38:58 +09:00
David Rowley
49315de0c0 Fix datum_image_*()'s inability to detect sign-extension variations
Functions such as hash_numeric() are not careful to use the correct
PG_RETURN_*() macro according to the return type of that function as
defined in pg_proc.  Because that function is meant to return int32,
when the hashed value exceeds 2^31, the 64-bit Datum value won't wrap to
a negative number, which means the Datum won't have the same value as it
would have had it been cast to int32 on a two's complement machine.  This
isn't harmless as both datum_image_eq() and datum_image_hash() may receive
a Datum that's been formed and deformed from a tuple in some cases, and
not in other cases.  When formed into a tuple, the Datum value will be
coerced into an integer according to the attlen as specified by the
TupleDesc.  This can result in two Datums that should be equal being
classed as not equal, which could result in, among other things, an
error such as:

ERROR:  could not find memoization table entry

Here we fix this by ensuring we cast the Datum value to a signed integer
according to the typLen specified in the datum_image_eq/datum_image_hash
function call before comparing or hashing.
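A minimal sketch of the normalization, assuming a 64-bit Datum and the two's complement truncation that PostgreSQL relies on elsewhere. The function names are hypothetical, not the datum_image_eq() code. Both inputs are narrowed to the type's width and sign-extended back, so a value returned without sign extension compares equal to the same value after a tuple round trip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Datum;         /* assumption: 64-bit Datum */

/* Sign-extend a by-value Datum according to its type length. */
static Datum
normalize_byval_datum(Datum d, int typlen)
{
    switch (typlen)
    {
        case 1:
            return (Datum) (int64_t) (int8_t) d;
        case 2:
            return (Datum) (int64_t) (int16_t) d;
        case 4:
            return (Datum) (int64_t) (int32_t) d;
        default:
            return d;           /* full-width values need no adjustment */
    }
}

/* Hypothetical image comparison for by-value types. */
static bool
byval_datums_equal(Datum a, Datum b, int typlen)
{
    return normalize_byval_datum(a, typlen) == normalize_byval_datum(b, typlen);
}
```

The same normalization has to be applied before hashing, otherwise equal values could still land in different hash buckets.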

Author: David Rowley <dgrowleyml@gmail.com>
Reported-by: Tender Wang <tndrwang@gmail.com>
Backpatch-through: 14
Discussion: https://postgr.es/m/CAHewXNmcXVFdB9_WwA8Ez0P+m_TQy_KzYk5Ri5dvg+fuwjD_yw@mail.gmail.com
2026-03-30 16:16:09 +13:00
Amit Langote
a1baf64589 Doc: fix stale text about partition locking with cached plans
Commit 121d774cae added text to master describing pruning-aware
locking behavior introduced by 525392d57.  That behavior was
reverted in May 2025, making the text incorrect.  Replace it with
the text used in back branches, which correctly describes current
behavior: pruned partitions are still locked at the beginning of
execution.

Discussion: https://postgr.es/m/CA+HiwqFT0fPPoYBr0iUFWNB-Og7bEXB9hB=6ogk_qD9=OM8Vbw@mail.gmail.com
2026-03-30 10:29:52 +09:00
Andrew Dunstan
5095f3f4a0 Fix multiple bugs in astreamer pipeline code.
astreamer_tar_parser_content() sent the wrong data pointer when
forwarding MEMBER_TRAILER padding to the next streamer.  After
astreamer_buffer_until() buffers the padding bytes, the 'data'
pointer has been advanced past them, but the code passed 'data'
instead of bbs_buffer.data.  This caused the downstream consumer
to receive bytes from after the padding rather than the padding
itself, and could read past the end of the input buffer.

astreamer_gzip_decompressor_content() only checked for
Z_STREAM_ERROR from inflate(), silently ignoring Z_DATA_ERROR
(corrupted data) and Z_MEM_ERROR (out of memory).  Fix by
treating any return other than Z_OK, Z_STREAM_END, and
Z_BUF_ERROR as fatal.

astreamer_gzip_decompressor_free() missed calling inflateEnd() to
release zlib's internal decompression state.

astreamer_tar_parser_free() neglected to pfree() the streamer
struct itself, leaking it.

astreamer_extractor_content() did not check the return value of
fclose() when closing an extracted file.  A deferred write error
(e.g., disk full on buffered I/O) would be silently lost.
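A generic C sketch of the pattern behind the last fix; the function name is hypothetical and this is not the astreamer code. With stdio buffering, a small fwrite() can succeed and the real write error only appears when fclose() flushes the stream.

```c
#include <stdio.h>
#include <stdbool.h>
#include <assert.h>

/*
 * Write a buffer to a file, reporting failure from fopen(), fwrite() or
 * fclose().  fclose() returns EOF if flushing the buffered data fails,
 * which is where a deferred error such as ENOSPC surfaces.
 */
static bool
write_file_checked(const char *path, const char *buf, size_t len)
{
    FILE *fp = fopen(path, "wb");

    if (fp == NULL)
        return false;

    if (fwrite(buf, 1, len, fp) != len)
    {
        fclose(fp);             /* already failing; ignore secondary error */
        return false;
    }

    if (fclose(fp) != 0)        /* deferred write error lands here */
        return false;

    return true;
}
```

On Linux, writing a few bytes to /dev/full exercises exactly this path: fwrite() reports success, and only fclose() fails.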

Discussion: https://postgr.es/m/results/98c6b630-acbb-44a7-97fa-1692ce2b827c@dunslane.net
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 15
2026-03-29 09:02:15 -04:00
Heikki Linnakangas
25b02320e1 Avoid memory leak on error while parsing pg_stat_statements dump file
By using palloc() instead of raw malloc().

Reported-by: Gaurav Singh <gaurav.singh@yugabyte.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAEcQ1bYR9s4eQLFDjzzJHU8fj-MTbmRpW-9J-r2gsCn+HEsynw@mail.gmail.com
Backpatch-through: 14
2026-03-27 12:21:29 +02:00
Andres Freund
fb072e1721 Fix off-by-one error in read IO tracing
AsyncReadBuffer()'s no-IO-needed path passed
TRACE_POSTGRESQL_BUFFER_READ_DONE the wrong block number because it had
already incremented operation->nblocks_done. Fix by folding the
nblocks_done offset into the blocknum local variable at initialization.
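The ordering problem can be shown with a hypothetical, heavily simplified model. The struct and functions below are illustrative only and do not reproduce AsyncReadBuffer().

```c
#include <stdint.h>
#include <assert.h>

typedef uint32_t BlockNumber;

/* Hypothetical stand-in for a multi-block read operation. */
typedef struct ReadOp
{
    BlockNumber blocknum;       /* first block of the whole operation */
    int         nblocks_done;   /* blocks already completed */
} ReadOp;

/* Buggy shape: the traced block is computed after the counter moved. */
static BlockNumber
traced_block_buggy(ReadOp *op)
{
    op->nblocks_done += 1;      /* block counted as done, no IO needed */
    return op->blocknum + op->nblocks_done;
}

/* Fixed shape: fold the offset into a local before incrementing. */
static BlockNumber
traced_block_fixed(ReadOp *op)
{
    BlockNumber blocknum = op->blocknum + op->nblocks_done;

    op->nblocks_done += 1;
    return blocknum;
}
```

Starting from blocknum 100 with nothing done yet, the buggy version reports block 101 while the fixed version reports block 100.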

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/u73un3xeljr4fiidzwi4ikcr6vm7oqugn4fo5vqpstjio6anl2%40hph6fvdiiria
Backpatch-through: 18
2026-03-26 10:08:13 -04:00
Fujii Masao
98e96e579b Fix premature NULL lag reporting in pg_stat_replication
pg_stat_replication is documented to keep the last measured lag values for
a short time after the standby catches up, and then set them to NULL when
there is no WAL activity. However, previously lag values could become NULL
prematurely even while WAL activity was ongoing, especially in logical
replication.

This happened because the code cleared lag when two consecutive reply messages
indicated that the apply location had caught up with the send location.
It did not verify that the reported positions were unchanged, so lag could be
cleared even when positions had advanced between messages. In logical
replication, where the apply location often quickly catches up, this issue was
more likely to occur.

This commit fixes the issue by clearing lag only when the standby reports that
it has fully replayed WAL (i.e., both flush and apply locations have caught up
with the send location) and the write/flush/apply positions remain unchanged
across two consecutive reply messages.

The second message with unchanged positions typically results from
wal_receiver_status_interval, so lag values are cleared after that interval
when there is no activity. This avoids showing stale lag data while preventing
premature NULL values.
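The clearing rule can be written as a small predicate. This is a hypothetical sketch with invented names, not the walsender code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

typedef uint64_t XLogRecPtr;

/* Hypothetical snapshot of the positions carried by one standby reply. */
typedef struct ReplyPositions
{
    XLogRecPtr write;
    XLogRecPtr flush;
    XLogRecPtr apply;
} ReplyPositions;

/*
 * Clear lag only if the standby has flushed and applied everything that
 * was sent, and its write/flush/apply positions did not move since the
 * previous reply.
 */
static bool
should_clear_lag(const ReplyPositions *prev, const ReplyPositions *cur,
                 XLogRecPtr sent)
{
    bool caught_up = cur->flush == sent && cur->apply == sent;
    bool unchanged = cur->write == prev->write &&
                     cur->flush == prev->flush &&
                     cur->apply == prev->apply;

    return caught_up && unchanged;
}
```

Under the old behavior, two replies that were each caught up were enough, even if the positions advanced in between; the `unchanged` check is what rules that case out.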

Even with this fix, lag may rarely become NULL during activity if identical
position reports are sent repeatedly. Eliminating such duplicate messages
would address this fully, but that change is considered too invasive for stable
branches and will be handled later, in master only.

Backpatch to all supported branches.

Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAOzEurTzcUrEzrH97DD7+Yz=HGPU81kzWQonKZvqBwYhx2G9_A@mail.gmail.com
Backpatch-through: 14
2026-03-26 20:50:22 +09:00
Robert Haas
cceb9c18a5 Prevent spurious "indexes on virtual generated columns are not supported".
Both of the checks in DefineIndex() that can produce this error
message have a guard against negative attribute numbers, but lack a
guard to ensure that attno is non-zero. As a result, we can index
off the beginning of the TupleDesc and read a garbage byte for
attgenerated. If that byte happens to be 'v', we'll incorrectly
produce the error mentioned above.
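A simplified sketch of the missing guard. The types are hypothetical stand-ins for a tuple descriptor, not PostgreSQL's TupleDesc. Attribute numbers are 1-based, so only positive numbers may be used as array indexes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <assert.h>

typedef short AttrNumber;

/* Hypothetical per-column metadata. */
typedef struct Attr
{
    char attgenerated;          /* 'v' for virtual generated, else '\0' */
} Attr;

/* Hypothetical descriptor: attrs[0] describes attribute number 1. */
typedef struct FakeTupleDesc
{
    int   natts;
    Attr *attrs;
} FakeTupleDesc;

/*
 * Negative numbers are system columns, and 0 stands for an index
 * expression (or a whole-row reference).  Checking only attno < 0 would
 * let attno == 0 read attrs[-1], off the beginning of the array.
 */
static bool
is_virtual_generated(const FakeTupleDesc *desc, AttrNumber attno)
{
    if (attno <= 0)
        return false;
    return desc->attrs[attno - 1].attgenerated == 'v';
}
```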

The first call site is easy to hit: any attempt to create an
expression index does so. The second one is not currently hit in
the regression tests, but can be hit by something like
CREATE INDEX ON some_table ((some_function(some_table))).

Found by study of a test_plan_advice failure on buildfarm member
skink, though this issue has nothing to do with test_plan_advice
and seems to have only been revealed by happenstance.

Backpatch-through: 18
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoacixUZVvi00hOjk_d9B4iYKswWP1gNqQ8Vfray-AcOCA@mail.gmail.com
2026-03-24 06:28:32 -04:00
John Naylor
51b7316a7c Fix copy-paste error in test_ginpostinglist
The check for a mismatch on the second decoded item pointer
was an exact copy of the first item pointer check, comparing
orig_itemptrs[0] with decoded_itemptrs[0] instead of orig_itemptrs[1]
with decoded_itemptrs[1].  The error message also reported (0, 1) as
the expected value instead of (blk, off).  As a result, any decoding
error in the second item pointer (where the varbyte delta encoding
is exercised) would go undetected.
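One way to make this kind of copy-paste slip impossible is to compare the items by index in a loop rather than in duplicated blocks. This is a hypothetical sketch, not the actual test_ginpostinglist code.

```c
#include <stdio.h>
#include <stdint.h>
#include <assert.h>

/* Hypothetical simplified item pointer: block and offset numbers. */
typedef struct ItemPtr
{
    uint32_t blk;
    uint16_t off;
} ItemPtr;

/*
 * Compare every decoded item against the original at the same index and
 * report the expected values from the original array.  Returns the index
 * of the first mismatch, or -1 if all items match.
 */
static int
first_mismatch(const ItemPtr *orig, const ItemPtr *decoded, int n)
{
    for (int i = 0; i < n; i++)
    {
        if (orig[i].blk != decoded[i].blk || orig[i].off != decoded[i].off)
        {
            fprintf(stderr, "item %d: got (%u, %u), expected (%u, %u)\n",
                    i, (unsigned) decoded[i].blk, (unsigned) decoded[i].off,
                    (unsigned) orig[i].blk, (unsigned) orig[i].off);
            return i;
        }
    }
    return -1;
}
```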

This has been wrong since commit bde7493d1, so backpatch to all
supported versions.

Author: Jianghua Yang <yjhjstz@gmail.com>
Discussion: https://postgr.es/m/CAAZLFmSOD8R7tZjRLZsmpKtJLoqjgawAaM-Pne1j8B_Q2aQK8w@mail.gmail.com
Backpatch-through: 14
2026-03-24 17:17:48 +07:00
Alexander Korotkov
8c73ab9da9 Further improve commentary about ChangeVarNodesWalkExpression()
The updated comment explains why we use ChangeVarNodes_walker() instead of
expression_tree_walker(), and provides a bit more detail about the differences
in processing top-level Query and subqueries.

Author: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAPpHfdvbjq342WTQ705Wmqhe8794pcp7wospz%2BWUJ2qB7vuOqA%40mail.gmail.com
Backpatch-through: 18
2026-03-24 09:53:28 +02:00
Tom Lane
a0e0b3cc68 Improve commentary about ChangeVarNodesWalkExpression().
IMO the proximate cause of the bug fixed in commit 07b7a964d
was sloppy thinking about what ChangeVarNodesWalkExpression()
is to be used for.  Flesh out its header comment to try to
improve that situation.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1607553.1774017006@sss.pgh.pa.us
Backpatch-through: 18
2026-03-23 11:14:24 -04:00
Heikki Linnakangas
0852643e1c Fix multixact backwards-compatibility with CHECKPOINT race condition
If a CHECKPOINT record with nextMulti N is written to the WAL before
the CREATE_ID record for N, and N happens to be the first multixid on
an offset page, the backwards compatibility logic to tolerate WAL
generated by older minor versions (before commit 789d65364c) failed to
compensate for the missing XLOG_MULTIXACT_ZERO_OFF_PAGE record. In
that case, the latest_page_number was initialized at the start of WAL
replay to the page for nextMulti from the CHECKPOINT record, even if
we had not seen the CREATE_ID record for that multixid yet, which
fooled the backwards compatibility logic into thinking that the page was
already initialized.

To fix, track the last XLOG_MULTIXACT_ZERO_OFF_PAGE that we've seen
separately from latest_page_number. If we haven't seen any
XLOG_MULTIXACT_ZERO_OFF_PAGE records yet, use
SimpleLruDoesPhysicalPageExist() to check if the page needs to be
initialized.

Reported-by: duankunren.dkr <duankunren.dkr@alibaba-inc.com>
Analyzed-by: duankunren.dkr <duankunren.dkr@alibaba-inc.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/c4ef1737-8cba-458e-b6fd-4e2d6011e985.duankunren.dkr@alibaba-inc.com
Backpatch-through: 14-18
2026-03-23 11:53:32 +02:00
Michael Paquier
882bdcf9fd Fix invalid value of pg_aios.pid, function pg_get_aios()
When the value of pg_aios.pid is found to be 0, the function had the
idea to set "nulls" to "false" instead of "true", without setting the
value stored in the tuplestore.  This could lead to the display of buggy
data.  The intention of the code is clearly to display NULL when a PID
of 0 is found, and this commit adjusts the logic to do so.
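A sketch of the intended values/nulls convention when building a result row. The helper is hypothetical, and Datum is simplified here to an unsigned integer.

```c
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

typedef uintptr_t Datum;        /* simplified stand-in */

/*
 * Fill one output column.  A PID of 0 means "no process" and should be
 * shown as SQL NULL, so the column is marked null.  Marking it not-null
 * without storing a value would expose whatever values[col] held.
 */
static void
set_pid_column(Datum *values, bool *nulls, int col, int32_t pid)
{
    if (pid == 0)
    {
        values[col] = (Datum) 0;    /* ignored when the column is null */
        nulls[col] = true;
    }
    else
    {
        values[col] = (Datum) pid;
        nulls[col] = false;
    }
}
```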

Issue introduced by 60f566b4f2.

Author: ChangAo Chen <cca5507@qq.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/tencent_7D61A85D6143AD57CA8D8C00DEC541869D06@qq.com
Backpatch-through: 18
2026-03-23 18:14:28 +09:00
Tom Lane
5f96426142 Fix finalization of decompressor astreamers.
Send the correct amount of data to the next astreamer, not the
whole allocated buffer size.  This bug escaped detection because
in present uses, the next astreamer is always a tar-file parser
which is insensitive to trailing garbage.  But that may not
be true in future uses.
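A sketch of the size computation, assuming zlib-style accounting where avail_out counts the space still unused in the output buffer, as in zlib's z_stream. The struct and function are hypothetical.

```c
#include <stddef.h>
#include <assert.h>

/* Hypothetical output buffer with zlib-style avail_out accounting. */
typedef struct OutBuf
{
    size_t capacity;            /* bytes allocated */
    size_t avail_out;           /* bytes still free after decompression */
} OutBuf;

/* Bytes actually produced, which is what the next streamer should get. */
static size_t
bytes_to_forward(const OutBuf *buf)
{
    return buf->capacity - buf->avail_out;
}
```

Forwarding `capacity` instead would also pass along the unused tail of the buffer as trailing garbage.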

Author: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2178517.1774064942@sss.pgh.pa.us
Backpatch-through: 15
2026-03-22 18:06:48 -04:00
Alexander Korotkov
e8b9d64974 Fix self-join removal to update bare Var references in join clauses
Self-join removal failed to update Var nodes when the join clause was a
bare Var (e.g., ON t1.bool_col) rather than an expression containing
Vars.  ChangeVarNodesWalkExpression() used expression_tree_walker(),
which descends into child nodes but does not process the top-level node
itself.  When a bare Var referencing the removed relation appeared as
the clause, its varno was left unchanged, leading to "no relation entry
for relid N" errors.

Fix by calling ChangeVarNodes_walker() directly instead of
expression_tree_walker(), so the top-level node is also processed.
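The difference between walking only the children and processing the node itself can be shown on a toy expression tree. The node types and walkers below are hypothetical and far simpler than PostgreSQL's.

```c
#include <stddef.h>
#include <assert.h>

/* Hypothetical two-kind expression tree: variables and binary operators. */
typedef enum { NODE_VAR, NODE_OP } NodeKind;

typedef struct Node
{
    NodeKind     kind;
    int          varno;         /* NODE_VAR: relation index */
    struct Node *left;          /* NODE_OP: operands */
    struct Node *right;
} Node;

typedef struct { int from; int to; } ChangeCtx;

static void change_var_walker(Node *node, ChangeCtx *ctx);

/* Visits only the children of 'node', never 'node' itself. */
static void
walk_children(Node *node, ChangeCtx *ctx)
{
    if (node->kind == NODE_OP)
    {
        change_var_walker(node->left, ctx);
        change_var_walker(node->right, ctx);
    }
}

/* Processes 'node' itself first, then recurses into its children. */
static void
change_var_walker(Node *node, ChangeCtx *ctx)
{
    if (node == NULL)
        return;
    if (node->kind == NODE_VAR && node->varno == ctx->from)
        node->varno = ctx->to;
    walk_children(node, ctx);
}
```

Calling walk_children() on a bare Var leaves its varno untouched, which mirrors the reported bug, while change_var_walker() renumbers it.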

Bug: #19435
Reported-by: Hang Ammmkilo <ammmkilo@163.com>
Author: Andrei Lepikhov <lepihov@gmail.com>
Co-authored-by: Tender Wang <tndrwang@gmail.com>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/19435-3cc1a87f291129f1%40postgresql.org
Backpatch-through: 18
2026-03-20 15:47:11 +02:00
Álvaro Herrera
6958077ceb SET NOT NULL: Call object-alter hook only after the catalog change
... otherwise, the function invoked by the hook might consult the
catalog and not see that the new constraint exists.

This relies on set_attnotnull doing CommandCounterIncrement()
after successfully modifying the catalog.

Oversight in commit 14e87ffa5c.

Author: Artur Zakirov <zaartur@gmail.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/CAKNkYnxUPCJk-3Xe0A3rmCC8B8V8kqVJbYMVN6ySGpjs_qd7dQ@mail.gmail.com
2026-03-20 14:38:50 +01:00
Jeff Davis
c11f87b1a3 Fix dependency on FDW handler.
ALTER FOREIGN DATA WRAPPER could drop the dependency on the handler
function if it wasn't explicitly specified.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/35c44a4b7fb76d35418c4d66b775a88f4ce60c86.camel@j-davis.com
Backpatch-through: 14
2026-03-19 14:59:30 -07:00
Fujii Masao
9804981386 Fix WAL flush LSN used by logical walsender during shutdown
Commit 6eedb2a5fd made the logical walsender call
XLogFlush(GetXLogInsertRecPtr()) to ensure that all pending WAL is flushed,
fixing a publisher shutdown hang. However, if the last WAL record ends at
a page boundary, GetXLogInsertRecPtr() can return an LSN pointing past
the page header, which can cause XLogFlush() to report an error.

A similar issue previously existed in the GiST code. Commit b1f14c9672
introduced GetXLogInsertEndRecPtr(), which returns a safe WAL insertion end
location (returning the start of the page when the last record ends at a page
boundary), and updated the GiST code to use it with XLogFlush().

This commit fixes the issue by making the logical walsender use
XLogFlush(GetXLogInsertEndRecPtr()) when flushing pending WAL during shutdown.

Backpatch to all supported versions.

Reported-by: Andres Freund <andres@anarazel.de>
Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/vzguaguldbcyfbyuq76qj7hx5qdr5kmh67gqkncyb2yhsygrdt@dfhcpteqifux
Backpatch-through: 14
2026-03-17 08:12:25 +09:00
Tomas Vondra
0e5ff9b9b4 Tighten asserts on ParallelWorkerNumber
The comment about ParallelWorkerNumber in parallel.c says:

  In parallel workers, it will be set to a value >= 0 and < the number
  of workers before any user code is invoked; each parallel worker will
  get a different parallel worker number.

However, asserts in various places that collect instrumentation allowed
ParallelWorkerNumber == num_workers. That would be a bug, as the value
is used as an index into an array with num_workers entries.

Fixed by adjusting the asserts accordingly. Backpatch to all supported
versions.
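The corrected bound, shown on a hypothetical per-worker slot lookup (not the actual instrumentation code):

```c
#include <assert.h>

/*
 * Valid worker numbers are 0 .. num_workers - 1, so the upper bound must
 * use '<' rather than '<=': a value equal to num_workers would index one
 * element past the end of the array.
 */
static int *
worker_slot(int *slots, int num_workers, int worker_number)
{
    assert(worker_number >= 0 && worker_number < num_workers);
    return &slots[worker_number];
}
```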

Discussion: https://postgr.es/m/5db067a1-2cdf-4afb-a577-a04f30b69167@vondra.me
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Backpatch-through: 14
2026-03-14 15:27:56 +01:00
Tomas Vondra
5b3f63a1bf Use GetXLogInsertEndRecPtr in gistGetFakeLSN
The function used GetXLogInsertRecPtr() to generate the fake LSN. Most
of the time this is the same as what XLogInsert() would return, and so
it works fine with the XLogFlush() call. But if the last record ends at
a page boundary, GetXLogInsertRecPtr() returns an LSN pointing after the
page header. In such a case, XLogFlush() fails with errors like this:

  ERROR: xlog flush request 0/01BD2018 is not satisfied --- flushed only to 0/01BD2000

Such failures are very hard to trigger, particularly outside aggressive
test scenarios.

Fixed by introducing GetXLogInsertEndRecPtr(), returning the correct LSN
without skipping the header. This is the same as GetXLogInsertRecPtr(),
except that it calls XLogBytePosToEndRecPtr().
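A simplified model of the two conversions, assuming 8 kB WAL pages that each begin with a 24-byte short page header, and ignoring the longer header at segment starts. These functions are illustrative and are not XLogBytePosToRecPtr() or XLogBytePosToEndRecPtr() themselves.

```c
#include <stdint.h>
#include <assert.h>

typedef uint64_t XLogRecPtr;

/* Assumed layout; 'bytepos' counts only usable (non-header) bytes. */
#define WAL_PAGE_SIZE   8192
#define WAL_HEADER_SIZE 24
#define WAL_USABLE      (WAL_PAGE_SIZE - WAL_HEADER_SIZE)

/* Start-of-record style: a position at a page boundary skips the header. */
static XLogRecPtr
bytepos_to_recptr(uint64_t bytepos)
{
    uint64_t page = bytepos / WAL_USABLE;
    uint64_t off = bytepos % WAL_USABLE;

    return page * WAL_PAGE_SIZE + WAL_HEADER_SIZE + off;
}

/* End-of-record style: a position at a page boundary stays on it. */
static XLogRecPtr
bytepos_to_end_recptr(uint64_t bytepos)
{
    uint64_t page = bytepos / WAL_USABLE;
    uint64_t off = bytepos % WAL_USABLE;

    if (off == 0)
        return page * WAL_PAGE_SIZE;
    return page * WAL_PAGE_SIZE + WAL_HEADER_SIZE + off;
}
```

When the last record fills a page exactly, the first function returns 8216 and the second 8192. That 24-byte gap matches the error message above: 0x01BD2018 minus 0x01BD2000 is 0x18.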

Initial investigation by me, root cause identified by Andres Freund.

This is a long-standing bug in gistGetFakeLSN(), probably introduced by
c6b92041d3 in PG13. Backpatch to all supported versions.

Reported-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/vf4hbwrotvhbgcnknrqmfbqlu75oyjkmausvy66ic7x7vuhafx@e4rvwavtjswo
Backpatch-through: 14
2026-03-13 23:26:02 +01:00
Michael Paquier
e33a4fda00 xml2: Fix failure with xslt_process() under -fsanitize=undefined
The logic of xslt_process() has never considered the fact that
xsltSaveResultToString() would return NULL for an empty string (the
upstream code has always done so, with a string length of 0).  This
would cause memcpy() to be called with a NULL pointer, something
forbidden by POSIX.
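A generic sketch of the guard; the helper is hypothetical and not the xml2 code. A library may report an empty result as a NULL pointer with length 0, and memcpy() must not be given a NULL pointer even when the length is zero.

```c
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/*
 * Copy a possibly-NULL result buffer into a freshly allocated,
 * NUL-terminated string, skipping memcpy() for the empty case.
 */
static char *
copy_result(const char *src, size_t len)
{
    char *dst = malloc(len + 1);

    if (dst == NULL)
        return NULL;
    if (src != NULL && len > 0)
        memcpy(dst, src, len);
    dst[len] = '\0';
    return dst;
}
```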

Like 46ab07ffda and similar fixes, this is backpatched down to all the
supported branches, with a test case to cover this scenario.  Based on
the history of the module, xml2 has always returned an empty string in
this case, so this is an old issue.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/c516a0d9-4406-47e3-9087-5ca5176ebcf9@gmail.com
Backpatch-through: 14
2026-03-13 16:06:46 +09:00
Robert Haas
9540c0e5dd Prevent restore of incremental backup from bloating VM fork.
When I (rhaas) wrote the WAL summarizer code, I incorrectly believed
that XLOG_SMGR_TRUNCATE truncates all forks to the same length.  In
fact, what other parts of the code do is compute the truncation length
for the FSM and VM forks from the truncation length used for the main
fork. But, because I was confused, I coded the WAL summarizer to set the
limit block for the VM fork to the same value as for the main fork.
(Incremental backup always copies FSM forks in full, so there is no
similar issue in that case.)
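A sketch of how a VM limit block can be derived from the main fork's truncation length. It assumes default 8 kB blocks, a 24-byte page header and 2 visibility-map bits per heap block. The function name is hypothetical; the commit does not name its new helper here.

```c
#include <stdint.h>
#include <assert.h>

typedef uint32_t BlockNumber;

/*
 * Assumed layout: each VM page holds 8192 - 24 = 8168 bytes of map data
 * and each heap block uses 2 bits, so one VM page covers 8168 * 4 heap
 * blocks (32672).
 */
#define VM_HEAPBLOCKS_PER_PAGE ((BlockNumber) 8168 * 4)

/*
 * VM blocks to keep when the main fork is truncated to heap_nblocks:
 * every fully covered VM page, plus one more if the last VM page still
 * covers some surviving heap blocks.
 */
static BlockNumber
vm_truncation_length(BlockNumber heap_nblocks)
{
    BlockNumber full = heap_nblocks / VM_HEAPBLOCKS_PER_PAGE;

    return full + (heap_nblocks % VM_HEAPBLOCKS_PER_PAGE != 0 ? 1 : 0);
}
```

For example, truncating the main fork to 100 blocks leaves a 1-block VM fork. Reusing the main fork's limit instead would make reconstruction treat the VM as 100 blocks long.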

Doing that doesn't directly cause any data corruption, as far as I can
see. However, it does create a serious risk of consuming a large amount
of extra disk space, because pg_combinebackup's reconstruct.c believes
that the reconstructed file should always be at least as long as the
limit block value. We might want to be smarter about that at some point
in the future, because it's always safe to omit all-zeroes blocks at the
end of the last segment of a relation, and doing so could save disk
space, but the current algorithm will rarely waste enough disk space to
worry about unless we believe that a relation has been truncated to a
length much longer than its actual length on disk, which is exactly what
happens as a result of the problem mentioned in the previous paragraph.

To fix, create a new visibilitymap helper function and use it to include
the right limit block in the summary files. Incremental backups taken
with existing summary files will still have this issue, but this should
improve the situation going forward.

Diagnosed-by: Oleg Tkachenko <oatkachenko@gmail.com>
Diagnosed-by: Amul Sul <sulamul@gmail.com>
Discussion: http://postgr.es/m/CAAJ_b97PqG89hvPNJ8cGwmk94gJ9KOf_pLsowUyQGZgJY32o9g@mail.gmail.com
Discussion: http://postgr.es/m/6897DAF7-B699-41BF-A6FB-B818FCFFD585%40gmail.com
Backpatch-through: 17
2026-03-09 06:46:20 -04:00