The existing logic computed an updated replicationSlotMinLSN from all
slots' restart_lsn only when catalog_xmin also advanced. This is not a
problem in normal (non-repack) cases, because catalog_xmin changes
pretty frequently, so the recomputation is triggered frequently enough.
However, REPACK does not currently change its catalog snapshot, so the
recomputation is not triggered often enough if no other replication
slot is being used.
(After this commit, we still don't recycle WAL properly for REPACK,
because its background worker is not advancing its restart_lsn either;
that will be fixed in a separate commit. However, this preexisting
problem in older code is logically separate from that one.)
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/TY4PR01MB17718B44164522D0798F8E898940A2@TY4PR01MB17718.jpnprd01.prod.outlook.com
The coverage report shows that some error cases were not being tested;
add test cases for them.
While at it, move some recently added ones to the test_decoding suite:
the preventative check added in 43649b6a53 now causes servers with
wal_level=minimal to error out earlier than before.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Baji Shaik <baji.pgdev@gmail.com>
Discussion: https://postgr.es/m/ahiwD29RNfVT4tjQ@alvherre.pgsql
We had discussed changing the wording of messages from "cannot repack
table X" to "cannot execute REPACK on table X", so that translators
don't have to figure out how to translate REPACK as a verb in their
language. We already use VACUUM and others as verbs, and it's not very
nice. Also remove extra double-quotes in a message of that form which I
mistakenly added in commit 43649b6a53.
While at it, add specific error messages for the cases of a table with a
deferrable primary key, and of REPLICA IDENTITY FULL; otherwise the user
gets a message that the table doesn't have an identity index and it's
not clear why that is.
Author: Baji Shaik <baji.pgdev@gmail.com>
Discussion: https://postgr.es/m/CA+fm-ROdgh0rEVuXoViBk4TVgjodrN=MTR_RYuOuKLZ9voX4YA@mail.gmail.com
Discussion: https://postgr.es/m/CABV9wwOo=wvq1hwTRK6HgBWUB=ekzsEebY30EWoc1V9UJQrrrw@mail.gmail.com
With address sanitizer's stack-use-after-return check, stack variables are
moved to heap allocations, so that references to the memory can be detected at a
later time. That broke our stack-depth check, which is why we had to disable
detect_stack_use_after_return in CI. Luckily __builtin_frame_address() works
correctly, even under asan, so use that.
We started using __builtin_frame_address() with de447bb8e6; however, as of
that commit we just used it for the stack base address, not for the value to
compare to the base address. Now we use it for both.
When building without __builtin_frame_address() support, we continue to use
stack variables for the stack depth determination.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2kk4z4odvuyrg7qlwjd7ft4eron4cle4btb33v4qatgsdkayir@gj6e62rgsel4
Backpatch-through: 14
The error emitted when REPACK (CONCURRENTLY) is run with too low a
wal_level is thrown by CheckSlotRequirements(), which is a bit
mysterious when the user doesn't know what's up. Add an upfront check
in check_concurrent_repack_requirements() for a more explicit, REPACK-
centered report, which is easier to understand -- this also saves
starting the worker just to have it die immediately.
Author: Baji Shaik <baji.pgdev@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CA+fm-ROdgh0rEVuXoViBk4TVgjodrN=MTR_RYuOuKLZ9voX4YA@mail.gmail.com
Previously, ProcSignalInit() read the global barrier generation before
publishing its PID into pss_pid. This created a race condition: a
process could initialize its local generation with an older global
value, while a concurrent EmitProcSignalBarrier() might skip that
process because its pss_pid was still zero. This resulted in
WaitForProcSignalBarrier() hanging indefinitely.
Fix this by publishing pss_pid before reading psh_barrierGeneration,
with a memory barrier in between so that the store to pss_pid is ordered
before the load. A concurrent EmitProcSignalBarrier() then either observes
the published PID and signals this slot, or completes its generation
increment before we load it.
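The ordering argument above can be checked with a small toy model. This
is only a sketch, not the PostgreSQL code: it assumes sequentially
consistent steps (which is what the memory barrier is meant to provide),
and the step functions, the dictionary keys and the PID value are
hypothetical stand-ins for ProcSignalInit() and EmitProcSignalBarrier().
It enumerates every interleaving of the two processes and reports whether
any of them leaves the new process with a stale generation and no signal.

```python
from itertools import combinations


def interleavings(a, b):
    """Yield every merge of step lists a and b that keeps each list's order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]


# Steps of the initializing process (toy stand-in for ProcSignalInit()).
def read_generation(s):
    s["local_gen"] = s["global_gen"]


def publish_pid(s):
    s["pss_pid"] = 1234  # hypothetical PID


# Steps of the emitter (toy stand-in for EmitProcSignalBarrier()).
def bump_generation(s):
    s["global_gen"] += 1


def scan_slots(s):
    # The emitter only signals slots whose PID is already published.
    if s["pss_pid"] != 0:
        s["signaled"] = True


def can_hang(init_steps):
    """True if some interleaving leaves a stale, unsignaled process."""
    for schedule in interleavings(init_steps, [bump_generation, scan_slots]):
        s = {"global_gen": 0, "pss_pid": 0, "local_gen": None, "signaled": False}
        for step in schedule:
            step(s)
        if s["local_gen"] < s["global_gen"] and not s["signaled"]:
            return True
    return False
```

In this model, can_hang([read_generation, publish_pid]) finds the
problematic schedule of the old ordering, while
can_hang([publish_pid, read_generation]) finds none.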
While this race has become more visible due to recent features using
signal barriers in more places (such as online wal_level changes), the
issue has theoretically been present since signal barriers were
introduced to release smgr caches (e.g., in DROP DATABASE). v14 has the
procsignal barrier infrastructure but no in-tree caller that actually
emits a barrier, so the case is unreachable there.
This issue was also reported by buildfarm member flaviventris.
Reported-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAEze2WgAJmWReDN7Chtba8Er2YBvKCoa0KVN25-1evnTrHsLyA@mail.gmail.com
Backpatch-through: 15
Commit 2af1dc8928 placed the new "logical decoding disabled after
REPACK (CONCURRENTLY)" check at the end of
051_effective_wal_level.pl. That placement assumed the logical slot
"test_slot" no longer existed when the check ran, but the assumption
only holds on builds with injection points: the earlier
injection-point-driven tests drop "test_slot" as a side effect, while
on builds without injection points the slot persists. When
"test_slot" still exists, logical decoding remains enabled and the new
check fails on those buildfarm members.
Move the REPACK test earlier in the script, ensuring that the test
starts with logical decoding disabled.
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CAD21AoBmdmBQ-+Jga+jSKKq5OPGEP1pEjSJfRPT6MCwVHLD6og@mail.gmail.com
REPACK (CONCURRENTLY) uses a temporary logical replication slot, which
is dropped once done, but it wasn't calling RequestDisableLogicalDecoding(),
leaving effective_wal_level stuck at 'logical'.
Fix by adding a Boolean flag to ReplicationSlotDropAcquired() to have it
request to disable logical decoding, and passing it as true on REPACK.
Other callers of that function preserve their existing behavior.
Author: Imran Zaheer <imran.zhir@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Discussion: https://postgr.es/m/CA+UBfaktds57dw2M8BEv_kS-=ixph3w+3MxKixtaDQMi_k7Ybg@mail.gmail.com
Commit 282b1cde9 made SignalBackends() ignore ListenerEntry entries
whose "listening" flag said that the listener was not yet committed.
That will be true for a new listener that has already registered its
queue position, but has not yet reached AtCommit_Notify(). If another
backend notifies the same channel in that window, SignalBackends()
would directly advance the new listener's queue position, causing it
to miss message(s). Really this is a definitional question: is a new
listener active as of PreCommit, or as of AtCommit? But it seems to
make more sense to expect that the new listener will see all messages
after its initially-registered queue position, especially since the
direct-advance logic is supposed to be an optimization that doesn't
affect semantics.
Fix this by treating all channel entries as valid wakeup targets.
Rename the "listening" flag to removeOnAbort to reflect its remaining
purpose: identifying staged LISTEN entries that abort cleanup must
remove.
While we're here, remove an obsolete test case added by 282b1cde9.
The check for "ChannelHashAddListener array growth" was meant to
exercise code that never made it into the committed patch, so now
it's just a waste of test cycles.
Author: Joel Jacobson <joel@compiler.org>
Reviewed-by: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/9835b0a4-9121-47ac-9c44-427b8b1a7f1b@app.fastmail.com
Discussion: https://postgr.es/m/6fe5ee75-537d-4d4f-909a-b21303c3ce75@app.fastmail.com
Concurrent DDL can leave behind objects referencing other objects that
no longer exist. This can happen if an object is dropped, while a new
object that depends on it is created concurrently. For example:
session 1: BEGIN; CREATE FUNCTION myschema.myfunc() ...;
session 2: DROP SCHEMA myschema;
session 1: COMMIT;
DROP SCHEMA does check that there are no objects depending on the
schema being dropped, but it does not see objects being concurrently
created by other sessions. Even if it did, this scenario would still
fail:
session 1: BEGIN; DROP SCHEMA myschema;
session 2: CREATE FUNCTION myschema.myfunc() ...;
session 1: COMMIT;
When the DROP SCHEMA runs, the schema was empty, but the new function
is created in it before the dropping transaction completes. The CREATE
FUNCTION does not see that the schema is concurrently being dropped.
In both of these scenarios, the function is left behind in the schema
that no longer exists.
To fix, acquire AccessShareLock on all referenced objects when
recording dependencies. This conflicts with the AccessExclusiveLock
taken by DROP, preventing the race. After acquiring the lock, verify
that the object still exists, and if it was dropped concurrently,
report an error. We already had such a mechanism for shared
dependencies, but for some reason we didn't do it for in-database
dependencies.
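The lock-then-recheck pattern described above can be sketched with a toy
lock table. This is an illustration under simplifying assumptions, not
the catalog code: ToyCatalog, WouldBlock and DependencyError are
hypothetical names, a conflicting lock raises WouldBlock where a real
session would wait, and transaction visibility is not modeled.

```python
class DependencyError(Exception):
    """The referenced object no longer exists."""


class WouldBlock(Exception):
    """Stands in for waiting on a conflicting lock held by another session."""


class ToyCatalog:
    def __init__(self, objects):
        self.objects = set(objects)
        self.share = {}      # object -> sessions holding a share lock
        self.exclusive = {}  # object -> session holding the exclusive lock

    def lock_share(self, session, obj):
        holder = self.exclusive.get(obj)
        if holder is not None and holder != session:
            raise WouldBlock(obj)
        self.share.setdefault(obj, set()).add(session)

    def lock_exclusive(self, session, obj):
        if self.share.get(obj, set()) - {session}:
            raise WouldBlock(obj)
        holder = self.exclusive.get(obj)
        if holder is not None and holder != session:
            raise WouldBlock(obj)
        self.exclusive[obj] = session

    def release_all(self, session):
        # Called at transaction end in this model.
        for holders in self.share.values():
            holders.discard(session)
        for obj in [o for o, s in self.exclusive.items() if s == session]:
            del self.exclusive[obj]

    def record_dependency(self, session, referenced):
        # Lock first, then re-check existence: the object may have been
        # dropped by a transaction that finished while we were waiting.
        self.lock_share(session, referenced)
        if referenced not in self.objects:
            raise DependencyError(f"{referenced} was concurrently dropped")

    def drop(self, session, obj):
        self.lock_exclusive(session, obj)
        self.objects.discard(obj)
```

In the first scenario, a session that has already recorded a dependency
on the schema makes the concurrent drop block. In the second, the
creating session blocks behind the drop and, once the dropping session
has released its locks, gets an error instead of leaving an orphaned
object.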
Ideally the locks would be acquired much earlier when creating a new
object, but that will require modifying a lot of callers. This check
while recording the dependency is a nice wholesale protection, and
even if we change all the CREATE commands to acquire locks earlier,
it's still good to have this as a backstop to catch any cases where we
forgot to do so.
The patch adds a few tests for some cases that left behind orphaned
objects before this. It also adds a test for roles, which already had
such protection, although that test is partially disabled because the
error message includes an OID which is not predictable.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Discussion: https://postgr.es/m/ZiYjn0eVc7pxVY45@ip-10-97-1-34.eu-west-3.compute.internal
Backpatch-through: 14
When creating a relation with a dropped column, we called
recordDependencyOn() also on the datatype of the dropped column, which
is always InvalidOid. In versions 15 and above, that was harmless
because recordDependencyOn() considers InvalidOid as a pinned object,
and skips over it. On version 14, isPinnedObject() does not consider
InvalidOid as pinned, so we created a bogus pg_depend entry with
refobjectid == 0.
As far as I can tell, the only case when AddNewAttributeTuples() is
called with dropped columns is when performing a table-rewriting ALTER
TABLE command. That temporarily creates a new relation with the same
columns, including dropped ones, then swaps the relations, and drops
the newly created table again. So even on version 14, the bogus
pg_depend entry was only on the transient relation that was dropped at
the end of the ALTER TABLE command, which was harmless.
Even though this is harmless, let's be tidy, similar to commit
713bce9484. The reason I noticed this now, and the reason for
backpatching, is that the next commit will add code to acquire locks on
the referenced objects, and we don't want to acquire a lock on InvalidOid.
Discussion: https://postgr.es/m/ZiYjn0eVc7pxVY45@ip-10-97-1-34.eu-west-3.compute.internal
Backpatch-through: 14
Commit 316472146 introduced support for ECDH key exchange with an ifdef
guard to ensure support in the underlying OpenSSL installation. Commit
10bf4fc2c3 in OpenSSL removed this guard in 2015, which effectively made
our check a no-op. There have been no complaints that this doesn't work
and OpenSSL installations without ECDH support are likely very rare, so
remove the checks rather than re-implementing support. Not backpatched
since this fix doesn't alter functionality.
Also fix a typo introduced in the original commit which had survived
till this day.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/1787BA9F-A11C-4A7A-9252-94C470D5CBE3@yesql.se
DisownLatch() was executed after the PGPROC entry of the terminated
process had been pushed back onto a freelist. A newly-forked backend that
recycles the slot could call OwnLatch() and PANIC with a "latch already
owned by PID", taking down the server.
There were two scenarios related to lock groups where this issue could
be reached:
* A follower pushes the leader's PGPROC back to the freelist while the
leader has not yet called DisownLatch() in its own ProcKill().
* A leader outliving all its followers pushes its own PGPROC onto the
freelist before reaching DisownLatch(), which would be the most common
scenario.
This issue is fixed by calling SwitchBackToLocalLatch() and
DisownLatch() at an earlier phase of ProcKill(), before any freelist
manipulation happens, so that the slot of the terminated backend is
never exposed as owning a latch.
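The effect of the reordering can be illustrated with a toy model. It is
a sketch only: the state dictionary, the PIDs 100 and 200, and the step
functions are hypothetical stand-ins for the PGPROC slot, the freelist,
and the calls made in ProcKill() and by a starting backend. The model
lets a new backend start at every possible point of the exit sequence and
reports whether it can find a recycled slot whose latch is still owned.

```python
class Panic(Exception):
    """Stands in for the PANIC raised when the latch is already owned."""


def push_to_freelist(state):
    state["freelist"].append(state["slot"])


def disown_latch(state):
    state["slot"]["latch_owner"] = 0


def new_backend_starts(state, pid=200):
    # A starting backend recycles a slot from the freelist, if any,
    # and then tries to take ownership of its latch.
    if not state["freelist"]:
        return
    slot = state["freelist"].pop()
    if slot["latch_owner"] != 0:
        raise Panic(f"latch already owned by PID {slot['latch_owner']}")
    slot["latch_owner"] = pid


def can_panic(exit_steps):
    """Try the new backend's start at every point of the exit sequence."""
    for position in range(len(exit_steps) + 1):
        state = {"slot": {"latch_owner": 100}, "freelist": []}
        schedule = (exit_steps[:position] + [new_backend_starts]
                    + exit_steps[position:])
        try:
            for step in schedule:
                step(state)
        except Panic:
            return True
    return False
```

With the old order, can_panic([push_to_freelist, disown_latch]) finds the
window between the push and the disown; with the order used by this
commit, can_panic([disown_latch, push_to_freelist]) finds none.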
Note that pgstat_reset_wait_event_storage() is kept at a later stage.
An upcoming commit will take advantage of that by introducing a test
able to check the original PANIC scenario.
Author: Vlad Lesin <vladlesin@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/d2983796-2603-41b7-a66e-fc8489ddb954@gmail.com
Backpatch-through: 14
This commit fixes two bugs in ProcKill()'s lock-group teardown freelist
publication:
* a double push of the leader's PGPROC that corrupts the freelist.
* a leak of the last follower's PGPROC slot.
ProcKill()'s lock-group teardown had two PGPROC freelist updates
scattered through the function, done under two separate freeProcsLock
acquisitions:
* A follower's push of the leader's PGPROC, done when a follower is the
last group member exiting.
* Every backend's self-push at the bottom of the function.
The two freelist updates were coordinated only by inspecting
proc->lockGroupLeader, which a follower could clear as a side effect of
pushing the leader. This coordination was broken. For example, with
two concurrent backends:
* The follower clears leader->lockGroupLeader and pushes the leader's
PGPROC under leader_lwlock.
* The follower does not clear its own proc->lockGroupLeader, as that
step is skipped.
* When the leader reaches the bottom of ProcKill(), it sees a NULL
proc->lockGroupLeader (the follower cleared it) and pushes itself,
causing a second dlist_push_tail() of the same node onto the same
freelist.
* The follower at the bottom sees its own proc->lockGroupLeader being
not NULL (never cleared) and skips its own push, causing its own slot
to leak.
This commit refactors the freelist manipulation to be done in two
distinct phases, each step using its own lock acquisition to ensure that
each freelist operation happens in an isolated manner for each backend
(follower or leader):
- First, under a single leader_lwlock acquisition, check the state of
the lock-group. Depending on whether we are dealing with a follower and/or a
leader, and if the leader has exited before a follower, then set some
state booleans that define which actions should be taken with the
freelist.
- Second, under a single freeProcsLock acquisition, perform the cleanup
actions, self-push of a backend and/or push of the leader back to the
freelist.
This is an old issue, dating back to 9.6, where parallel workers and lock
grouping were added.
Author: Vlad Lesin <vladlesin@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/d2983796-2603-41b7-a66e-fc8489ddb954@gmail.com
Backpatch-through: 14
When pg_createsubscriber fails after creating logical replication
objects, it should remove the publication and replication slot that
it created on the publisher.
Previously, if dropping subscriber-side objects failed,
pg_createsubscriber reset its internal cleanup state too early. As a
result, the exit-time cleanup could skip removing the publication or
replication slot on the publisher.
This could leave pg_createsubscriber-created objects behind on
the publisher after a failed run. That can make a retry harder,
because the leftover publication or replication slot may need to be
removed manually before running pg_createsubscriber again.
In the case of a replication slot, leaving it behind can also retain
WAL files longer than expected.
The cause of this issue was that the flags made_publication and
made_replslot tracking whether pg_createsubscriber created
a publication or replication slot on the primary were incorrectly
reset to false when failures occurred while dropping objects
on the subscriber.
This commit fixes the issue by preventing those cleanup flags from
being reset even when failures occurred while dropping objects
on the subscriber, ensuring proper cleanup of primary objects
before exit on failure.
Backpatch to v17, where pg_createsubscriber was added.
Author: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/CABdArM5V9QKK1PkLY9dpgAcZa3kUp84-wPqPovxvdLOri4=69w@mail.gmail.com
Backpatch-through: 17
Update stale comments and test names in 019_replslot_limit.pl to match
the actual WAL advancement and wal_status checks. Remove a redundant
standby stop in the inactive_since coverage.
Discussion: https://postgr.es/m/CABPTF7XxDonXAcz6DsN6AUJB3swYrZkJHq3UCDaD3Q2H%2Bj0gUA%40mail.gmail.com
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
wait_for_catchup() has "wait for the standby to reach the target LSN"
semantics. However, the previous polling implementation actually waited for
the primary to observe that position via pg_stat_replication.
7e8aeb9e48 introduced the new WAIT FOR LSN-based implementation, which
just probes the standby.
019_replslot_limit.pl relied on the old side effect: its
"slot state changes to extended/unreserved" subtests inspect
primary-side pg_replication_slots, whose wal_status depends on
restart_lsn, which only advances after the walsender processes a
standby reply. Make the test wait on what it actually needs by
replacing each wait_for_catchup() with
wait_for_slot_catchup('rep1', 'restart', primary->lsn('write')).
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/63f6abc9-c0ae-465d-a4e6-667eca6ea008@gmail.com
Author: Xuneng Zhou <xunengzhou@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
EventTriggerOnLogin() tries to clear pg_database.dathasloginevt when
the database no longer has any login event triggers but the flag is
still set. To make that safe against concurrent flag setters, it
takes a conditional AccessExclusiveLock on the database object.
On a hot standby, that lock acquisition fails outright with
FATAL: cannot acquire lock mode AccessExclusiveLock on database
objects while recovery is in progress
because LockAcquireExtended() refuses locks stronger than
RowExclusiveLock on database objects during recovery. The standby
already replays the flag's value from the primary, so the dangling
flag is the result of replaying a state in which the primary had
already dropped its login event triggers but not yet run a login
event trigger pass to clear the flag. Any session connecting to the
standby in that window therefore fails to connect.
Skip the cleanup on a standby. The flag will be cleared via WAL
replay once the primary clears it on its side.
Add a recovery TAP test that reproduces the original report: create
and drop a login event trigger on the primary in one session, wait
for the standby to replay, then verify that a fresh connection to
the standby succeeds.
Backpatch to v17, where the login event triggers were introduced.
Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reported-by: Egor Chindyaskin <kyzevan23@mail.ru>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Discussion: https://postgr.es/m/19488-d7ccfca2bf6b74b0%40postgresql.org
Backpatch-through: 17
Unlike the slotsync worker, whose retry cycles are separated by
transaction boundaries, pg_sync_replication_slots() retries within a
single SQL function call. Per-cycle allocations for slot names, plugin
names, database names, and auxiliary list containers accumulated
across retries until the function returned. Memory growth is proportional
to the number of retries and remote slots, and the function may wait an
extended period between cycles when slots are slow to persist.
Fix by running each retry cycle in a short-lived memory context
(sync_retry_ctx) that is reset before the next attempt. Additionally,
release tuple slots created with MakeSingleTupleTableSlot() before
clearing the walreceiver result.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CABPTF7VVPxgfYyr8Kyi=+JACjckQ6NpniV9eRtHboj2hMn0REw@mail.gmail.com
QueueFKConstraintValidation() recurses through the partition hierarchy
to queue child constraint validations and to mark child rows as
validated. With a sufficiently deep partition tree, this can result
in a stack-overflow crash. Defend against that as we do elsewhere.
Bug: #19482
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19482-4cc37cbf52d55235@postgresql.org
Backpatch-through: 18
The original code would leave a shared memory segment unreleased if we
fail partway through initialization. Change the shutdown order so that
we always free it.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Antonin Houska <ah@cybertec.at>
Discussion: https://postgr.es/m/agtNn6ZCmdI2KJFn@alvherre.pgsql
pg_get_multixact_stats() uses members_size to report the amount of
storage used by the currently retained multixact members. However,
MultiXactOffsetStorageSize() divided the member count by the number of
members per storage group before multiplying by the group size, so it
was rounding down its result and incorrectly reported zero when there
were few retained members. The calculation is changed to compute the
size directly from the member count.
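The truncation is easy to reproduce with integer arithmetic. The sketch
below is illustrative only: the constants (4 members per group, 20 bytes
per group) and both function names are assumptions made for the example,
and the second function shows just one way to avoid the early division,
not necessarily the committed formula.

```python
MEMBERS_PER_GROUP = 4   # assumed for illustration
GROUP_SIZE_BYTES = 20   # assumed for illustration


def members_size_truncating(members):
    # Dividing first discards any partially filled group, so a small
    # member count collapses to zero bytes.
    return (members // MEMBERS_PER_GROUP) * GROUP_SIZE_BYTES


def members_size_from_count(members):
    # Multiplying before dividing keeps the contribution of every member.
    return members * GROUP_SIZE_BYTES // MEMBERS_PER_GROUP
```

With three retained members the first form reports 0 bytes, while the
second reports 15; both agree once the count is a multiple of the group
size.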
While at it, this fixes a different issue in the isolation test
multixact-stats. Three fields were defined for checks related to the
oldest offset values, but were not used. The offsets existed in an
older version of the patch than what has been committed. These are
replaced by checks for members_size, checking the new calculation
formula.
Thinkos introduced in 97b101776c.
Author: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/819AC1B2-1A71-4244-B081-3ADD85D1725D@gmail.com
The wording of two error hints is tweaked in this commit:
- Import of extended statistics, where the value of an array element is
not a NULL or a string.
- Online data checksum switch, where a period was missing.
Author: Baji Shaik <baji.pgdev@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CA+fm-RMrKbyky_+vi5SDdAVnFVjWh7zW3GoDAVnrp5OpDnW6tw@mail.gmail.com
Tab completion for CHECKPOINT options contained FLUSH_UNLOGGED, but
the boolean value was not part of the completion. Fix to make this
consistent with other boolean values.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/4855807D-F1CA-44E6-9B58-406691832848@gmail.com
ALTER TABLE ... SPLIT PARTITION allows a DEFAULT partition to be created
as one of the replacement partitions when the parent table does not
already have one. However, it should not allow the degenerate case where
a non-DEFAULT partition keeps exactly the same bound as the split
partition and the command merely adds a DEFAULT partition through the
SPLIT PARTITION path.
Detect that case by comparing the bound of the split partition with the
bound of the only non-DEFAULT replacement partition, and raise an error
when they are the same. Users should add a DEFAULT partition directly
with CREATE TABLE ... PARTITION OF ... DEFAULT or ALTER TABLE ... ATTACH
PARTITION ... DEFAULT instead.
The comparison goes through the partition operator family rather than
byte equality so that values which are binary-different but compare
equal under the partition key's comparator are treated as the same
bound. The corresponding regression test uses a float8 LIST partition
with -0.0 and 0.0 -- they have different bit patterns but are equal
under float8 -- to verify that a datumIsEqual()-based check would let
the degenerate split through while the partsupfunc-based check
correctly rejects it.
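The difference between the two kinds of comparison can be seen outside
the server as well. The following sketch is only an analogy: it compares
the IEEE 754 byte representation of two doubles (roughly what a bytewise
check such as datumIsEqual() would see) against ordinary floating-point
equality (roughly what the partition comparator decides). The function
names are hypothetical.

```python
import struct


def bitwise_equal(a, b):
    # Compare the raw 8-byte encodings of the two doubles.
    return struct.pack("<d", a) == struct.pack("<d", b)


def comparator_equal(a, b):
    # Compare by value, as a float comparison function would.
    return a == b
```

For 0.0 and -0.0, bitwise_equal() reports a difference because the sign
bit differs, while comparator_equal() treats them as the same value.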
Author: Chao Li <lic@highgo.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Discussion: https://postgr.es/m/C18878AB-DEB2-4A61-9995-A035DD644B81@gmail.com
The check for the minimum expected bytea size of a MVDependencies object
was using SizeOfItem() for its calculation. This macro uses the number
of attributes in a single dependency.
This minimum size calculation should be based on MinSizeOfItems(), which
computes the minimum expected size as the header plus the
minimally-sized number of dependency items.
Oversight in d08c44f7a4.
Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Discussion: https://postgr.es/m/4b8d299d-2505-4c30-bf80-0f697410db35@tantorlabs.com
Backpatch-through: 14
This reverts commit 0d3dba38c7, which was determined to have
fundamental flaws. This restricts REPACK (CONCURRENTLY) so that only
one process can run it at a time, even on different tables or in
different databases; we'll lift that restriction in another way during
the next development cycle.
Reported-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1Jg21ODQ7fS2fvN5W_S5kDRhAP5inj3XMRQaa=s-GbYhw@mail.gmail.com
When reusing an existing WAL receiver after it has reached
WALRCV_WAITING for new instructions, RequestXLogStreaming() copied
PrimaryConnInfo into WalRcv->conninfo before switching the state to
WALRCV_RESTARTING. At that point ready_to_display could still be true,
so pg_stat_wal_receiver could expose the raw connection string,
including sensitive fields, but it should only show the user-displayable
version of the connection string.
WALRCV_RESTARTING does not establish a new connection. The waiting WAL
receiver reuses its existing connection and only needs a new startpoint
and timeline, so there is no need to copy the raw connection string into
shared memory again. Let's only copy conninfo when launching a new WAL
receiver after WALRCV_STOPPED, not while waiting for instructions.
Also add coverage for this case to the timeline-switch test, by
verifying that the WAL receiver conninfo remains consistent across the
jump.
Backpatch all the way down, as this issue has been possible since
pg_stat_wal_receiver was introduced.
Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/EF91FF76-1E2B-4F3B-9162-290B4DC517FF@gmail.com
Backpatch-through: 14
Commit a36164e746 added a CONNECTING status for the WAL receiver, but
pg_stat_wal_receiver returned no information while the connection to the
primary was attempted, limiting the usability of the feature in
high-latency environments where the connection attempt to the primary
could take time.
This commit improves the report of the status by splitting the way the
shared memory state of the WAL receiver is filled before and after the
connection to the primary is attempted with walrcv_connect():
- Before the attempt, reset all the connection fields, switch
ready_to_display to true.
- After the attempt, fill in the connection fields.
This change means two spinlock acquisitions instead of one, but at least
monitoring tools can know about the connection attempt before its
completion, improving the usability of the feature. This code path is
taken only once when a WAL receiver is spawned, so the extra acquisition
does not matter performance-wise.
Reported-by: Chao Li <li.evan.chao@gmail.com>
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/EF91FF76-1E2B-4F3B-9162-290B4DC517FF@gmail.com
Commit 112faf1378 added custom notice receivers for replication,
postgres_fdw, and dblink so that remote NOTICE, WARNING, and similar
messages are reported via ereport(). However, those notice receivers were
installed only after libpqsrv_connect() and libpqsrv_connect_params()
returned, by which point libpq connection startup had already completed.
As a result, messages emitted during connection establishment could be
missed.
This commit fixes the issue by splitting libpqsrv_connect() and
libpqsrv_connect_params() into separate start and complete phases:
libpqsrv_connect_start(), libpqsrv_connect_params_start(), and
libpqsrv_connect_complete(). This allows callers to perform
per-connection setup, such as installing a notice receiver, after the
connection has been started but before startup completes.
Note that callers of libpqsrv_connect_start() and
libpqsrv_connect_params_start() must still call
libpqsrv_connect_complete(), even if the start function returns NULL, so
that any external FDs reserved during startup are released properly.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com>
Discussion: https://postgr.es/m/A2B8B7DE-C119-492F-A9FA-14CF86849777@gmail.com
The documentation states that NOT NULL constraints on partitioned tables
are always inherited by all partitions, and therefore cannot be declared
NO INHERIT. While a check already existed to reject creating such
constraints with NO INHERIT, previously the same check was missing for
ALTER TABLE ... ALTER CONSTRAINT ... NO INHERIT.
This commit adds the missing check so that attempting to set NO INHERIT
on a partitioned NOT NULL constraint now fails.
Backpatch to v18, where ALTER TABLE ... ALTER CONSTRAINT ... [NO] INHERIT
was added.
Author: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/ecc985ad-6ec1-4094-a315-317943ca5f3f@proxel.se
Backpatch-through: 18
ALTER TABLE ... SPLIT PARTITION allows a DEFAULT partition to be created
as one of the replacement partitions when the parent table does not
already have one. However, it should not allow the degenerate case where
a non-DEFAULT partition keeps exactly the same bound as the split
partition and the command merely adds a DEFAULT partition through the
SPLIT PARTITION path.
Detect that case by comparing the bound of the split partition with the
bound of the only non-DEFAULT replacement partition, and raise an error
when they are the same. Users should add a DEFAULT partition directly
with CREATE TABLE ... PARTITION OF ... DEFAULT or ALTER TABLE ... ATTACH
PARTITION ... DEFAULT instead.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Discussion: https://postgr.es/m/C18878AB-DEB2-4A61-9995-A035DD644B81@gmail.com
Commit 263d1e6dfe changed pg_recvlogical to honor source cluster file
permissions when creating output files. This commit adds tests verifying
that output files are created with mode 0600 when the source cluster is
initialized without group access, and with mode 0640 when group access is
enabled.
Author: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwHhpizYzMo3nFP4GkNMueSNMY3QfC-gBN1VTXtuiANDvw@mail.gmail.com
Commit c37b3d08ca attempted to preserve group permissions on pg_recvlogical
output files when group access was enabled on the source cluster. However,
the output files were still created with a fixed S_IRUSR | S_IWUSR mode,
preventing group-read permissions from being applied.
This commit fixes the issue by creating output files with pg_file_create_mode
instead of a hard-coded mode. This allows pg_recvlogical to correctly preserve
group permissions from the source cluster.
Backpatch to all supported branches.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwHhpizYzMo3nFP4GkNMueSNMY3QfC-gBN1VTXtuiANDvw@mail.gmail.com
Backpatch-through: 14
When the launching backend of REPACK (CONCURRENTLY) is terminated via
pg_terminate_backend(), ProcDiePending causes ereport(FATAL) which
bypasses PG_FINALLY blocks. As a result, stop_repack_decoding_worker()
is never called, leaving the decoding worker running indefinitely and
holding its temporary replication slot.
Fix by using PG_ENSURE_ERROR_CLEANUP, which handles both ERROR and
FATAL exits.
Author: Baji Shaik <baji.pgdev@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CA+fm-RNoPxL2N7db_A0anMXV_aDu6jWj4PNOPtMtBUAPDPvSXQ@mail.gmail.com
When ALTER TABLE ... SPLIT PARTITION specifies a DEFAULT partition, the
explicit partitions do not need to cover the split partition's bound
exactly. They may cover only part of it, with the DEFAULT partition
covering the remaining range.
However, the existing hint said that the combined bounds of the new
partitions must exactly match the bound of the split partition, which is
misleading for this case and inconsistent with the code comment.
Fix the hint to state the actual requirement: explicit partition bounds
must stay within the bounds of the split partition when a DEFAULT
partition is specified.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Discussion: https://postgr.es/m/C18878AB-DEB2-4A61-9995-A035DD644B81@gmail.com
When splitting a range partition and defining a new DEFAULT partition, the
validation checked the lower bound of the first explicit partition and the
upper bound of explicit partitions only when they were not first. If there
was exactly one explicit non-DEFAULT partition, its upper bound was therefore
not checked.
This could allow the replacement partition to extend beyond the upper bound
of the partition being split, potentially overlapping another existing
partition.
Fix this by checking the upper bound whenever the explicit partition is the
last one. Add a regression test covering the single explicit partition plus
DEFAULT case.
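
The shape of the corrected check can be sketched in isolation. This is a
minimal illustration, not the partitioning code itself: DemoRange and
explicit_parts_within_split() are hypothetical names, bounds are plain
integers, and the explicit partitions are assumed to be sorted.

```c
#include <stdbool.h>

/* Hypothetical integer range standing in for a partition bound. */
typedef struct DemoRange
{
    int         lower;
    int         upper;
} DemoRange;

/*
 * Return true if the sorted explicit partitions stay within the bound of
 * the partition being split.  The upper-bound test depends only on whether
 * the partition is the last one, so a single explicit partition (which is
 * both first and last) gets both checks.
 */
static bool
explicit_parts_within_split(const DemoRange *parts, int nparts,
                            DemoRange split)
{
    for (int i = 0; i < nparts; i++)
    {
        bool        is_first = (i == 0);
        bool        is_last = (i == nparts - 1);

        if (is_first && parts[i].lower < split.lower)
            return false;
        if (is_last && parts[i].upper > split.upper)
            return false;
    }
    return true;
}
```

With a split bound of [0, 100), a lone explicit partition [10, 150) is
rejected by this sketch because the last-partition test no longer depends
on the partition not being first.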
Author: Chao Li <lic@highgo.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Zhenwei Shang <a934172442@gmail.com>
Reviewed-by: Dmitry Koval <d.koval@postgrespro.ru>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Discussion: https://postgr.es/m/C18878AB-DEB2-4A61-9995-A035DD644B81@gmail.com
When using COPY FROM ... ON_ERROR SET_NULL with a selective column list, the
domain_with_constraint array was incorrectly allocated based on the length of
the target column list. While the array was populated sequentially,
CopyFromTextLikeOneRow attempted to access it using the physical attribute
index (attnum - 1). This mismatch caused out-of-bounds reads when targeting
high-numbered columns, allowing NULL values to bypass NOT NULL domain checks
and be silently inserted.
Fix by allocating the array to match the total number of physical attributes
(num_phys_attrs) and indexing via attnum - 1, bringing it into alignment with
other per-column arrays in BeginCopyFrom.
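
The indexing rule can be shown with a small standalone sketch. The function
below is hypothetical and is not the COPY code; it only demonstrates sizing
a per-column array by the number of physical attributes so that it can be
indexed with attnum - 1.

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: build a per-column flag array with one slot per
 * physical attribute.  target_attnums holds the 1-based attribute numbers
 * named in the column list.  Returns NULL on allocation failure; the
 * caller frees the result.
 */
static bool *
build_flags_by_attnum(int num_phys_attrs, const int *target_attnums,
                      int ntargets)
{
    /* sized by physical attributes, not by the length of the target list */
    bool       *flags = calloc((size_t) num_phys_attrs, sizeof(bool));

    if (flags == NULL)
        return NULL;
    for (int i = 0; i < ntargets; i++)
        flags[target_attnums[i] - 1] = true;
    return flags;
}
```

Had the array been sized by ntargets (2 for a column list naming attributes
5 and 2), a lookup at index 4 would read past its end.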
Author: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDdej0c0gWJi2FnbirzhgzyZNPiTwC1P5B_-dSNCzq-91A@mail.gmail.com
The macro for enabling single-copy atomicity on i586+ when using
GCC has been incorrect since 2017 (commit e8fdbd58f) without any
complaints, and getting it to work is non-trivial.
Getting this to work reliably requires C11 atomics, which in turn
also bumps the required MSVC version. For now, simply remove the
attempted support, which doesn't work anyway.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reported-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAKZiRmycHOOJyEPc9FUss1_69_U62WoSx32jT7wyES-YkStZKA@mail.gmail.com
Discussion: https://postgr.es/m/CA+hUKGKFvu3zyvv3aaj5hHs9VtWcjFAmisOwOc7aOZNc5AF3NA@mail.gmail.com
This comment was originally added to RegisterInvalid() in POSTGRES before
Postgres95, and came in via the Postgres95 import. It has been obsolete
for quite some time, so remove it.
Author: Steven Niu <niushiji@highgo.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/MN2PR15MB30219837B2381AE2518A4C45A7FCA@MN2PR15MB3021.namprd15.prod.outlook.com
ParseVariableDouble missed returning false after logging an error when
the parsed value exceeded max, making the value assigned rather than
rejected. Backpatch down to v18, where this was introduced as part of
\WATCH_INTERVAL.
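
The control-flow pattern at issue can be sketched as follows. This is a
hypothetical stand-in, not psql's ParseVariableDouble(): the name
parse_bounded_double() and its messages are assumptions made for
illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical bounded parser: on any failure it reports the problem and
 * returns false without touching *result.
 */
static bool
parse_bounded_double(const char *value, double min, double max,
                     double *result)
{
    char       *end;
    double      d;

    errno = 0;
    d = strtod(value, &end);
    if (end == value || *end != '\0' || errno == ERANGE)
    {
        fprintf(stderr, "invalid value \"%s\"\n", value);
        return false;
    }
    if (d < min)
    {
        fprintf(stderr, "\"%s\" is below the minimum %g\n", value, min);
        return false;
    }
    if (d > max)
    {
        fprintf(stderr, "\"%s\" is above the maximum %g\n", value, max);
        return false;           /* without this, the value would be stored */
    }
    *result = d;
    return true;
}
```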
Author: Sven Klemm <sven@tigerdata.com>
Co-authored-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAMCrgp31p_5SDVi7dwnP39tTW5icQ0MWHA+N4kJdXgkL0PEy8w@mail.gmail.com
Backpatch-through: 18
The error message for incorrect oauth validator configuration was missing
a quote character. OAuth was introduced in v18 but there is no need for a
backpatch since this was introduced in 22f9207aaa.
Author: Jonathan Gonzalez V. <jonathan.abdiel@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/ff9b84b9e6d5a3fef1f320ee5d63ec7dae722739.camel@gmail.com
This commit addresses some defects with the handling of expressions in
pg_restore_extended_stats() and pg_clear_extended_stats():
- Misleading WARNING for an incorrect number of expressions, where the
number of required expressions was reported as the number of elements
given in input rather than the actual number of expressions expected by
the extstats object definition.
- Incorrect matching of expression names, where a key name was
considered as valid as long as it matched with the prefix of a legit key
name. For example "correlatio" given in input would match with
"correlation", and be considered valid. The consequence of this bug was
a silent discard of the input data, where the operation would be
considered a success. The value associated to the prefixed key was not
inserted in the catalogs, just ignored. pg_dump would not generate such
input data patterns, but a user doing manual stats injection could.
- Missing heap_freetuple() in pg_clear_extended_stats(), for the case
where the extstats object in input does not match with its parent
relation.
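
The prefix-matching defect described above can arise from a length-limited
comparison. The sketch below shows one such pattern next to an exact
comparison; both functions are hypothetical and are not taken from the
extended statistics code.

```c
#include <stdbool.h>
#include <string.h>

/* Buggy pattern: compares only strlen(input) bytes, so a prefix matches. */
static bool
key_matches_prefix(const char *input, const char *key)
{
    return strncmp(input, key, strlen(input)) == 0;
}

/* Exact comparison: the whole key name must match. */
static bool
key_matches_exact(const char *input, const char *key)
{
    return strcmp(input, key) == 0;
}
```

Here key_matches_prefix("correlatio", "correlation") returns true, while
key_matches_exact() rejects the same input.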
Author: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/A7C11B83-7534-4A09-9071-FBD09175CFC8@gmail.com
Previously, REPACK option parsing had two bugs.
First, REPACK (CONCURRENTLY OFF) failed with:
ERROR: unrecognized REPACK option "concurrently"
while CONCURRENTLY ON was accepted correctly.
Second, when the same option was specified multiple times, the last value
specified was not always honored. If any occurrence set the option to ON,
the option was treated as enabled even when the final setting was OFF.
This commit fixes these issues by correctly accepting CONCURRENTLY
regardless of its value, and by making the last specified value take precedence
when an option appears multiple times.
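
The difference between the two behaviors for repeated options can be
sketched in a few lines. DemoOption and both functions are hypothetical;
the real parser works on a list of parsed option nodes rather than on an
array like this.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parsed option: a name and its boolean value. */
typedef struct DemoOption
{
    const char *name;
    bool        value;
} DemoOption;

/* Bug pattern: once any occurrence is ON, the result stays ON. */
static bool
option_sticky(const DemoOption *opts, size_t n, const char *name)
{
    bool        result = false;

    for (size_t i = 0; i < n; i++)
        if (strcmp(opts[i].name, name) == 0)
            result = result || opts[i].value;
    return result;
}

/* Intended behavior: each occurrence overwrites the previous one. */
static bool
option_last_wins(const DemoOption *opts, size_t n, const char *name)
{
    bool        result = false;

    for (size_t i = 0; i < n; i++)
        if (strcmp(opts[i].name, name) == 0)
            result = opts[i].value;
    return result;
}
```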
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CAHGQGwGAY4kfDtC4i+hAOX-a3u0yOA6__6EDTQz-ytsDHgh-yQ@mail.gmail.com
The IGNORE NULLS implementation caches whether a window function argument
evaluated to NULL or NOT NULL for a given partition row. That is safe for
ordinary expressions, but not for volatile expressions, where evaluating the
same argument on the same row can produce a different NULL/NOT NULL result
later.
This could produce wrong results in two ways. A row previously cached as
NULL could be skipped even though a later evaluation would return NOT NULL.
Conversely, a row cached as NOT NULL could be chosen as the target row, then
re-evaluated to fetch the actual value and return NULL.
Make the nullness cache conditional per argument. Do not use it for
arguments containing volatile functions or subplans, following the same
conservative approach used for moving window aggregates. Also avoid
re-evaluating non-cacheable partition arguments after the scan has already
found the target row.
Add regression tests covering volatile arguments and subplan arguments with
IGNORE NULLS.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Tatsuo Ishii <ishii@postgresql.org>
Discussion: https://postgr.es/m/42B42506-6972-4266-8422-FB73E61D9DA7@gmail.com
This commit moves the definitions of InjectionPointConditionType and
InjectionPointCondition into a new header local to the test module,
injection_points.h, so that these can be shared across more files in the
module. A patch for a bug fix is under discussion, whose proposed test
will benefit from this refactoring.
Backpatch down to where the module exists, as this should be useful for
future bug fixes, even cases unrelated to the thread where this change
has been discussed.
Author: Andrey Borodin <x4mmm@yandex-team.ru>
Author: Vlad Lesin <vladlesin@gmail.com>
Discussion: https://postgr.es/m/d2983796-2603-41b7-a66e-fc8489ddb954@gmail.com
Backpatch-through: 17
Three locations use Assert() to guard against a mismatch between the
number of columns advertised in the RELATION message and the number
actually received in the subsequent INSERT/UPDATE tuple message. Since
these values originate from the publisher, the check must survive into
production builds.
A malicious or buggy publisher can send a RELATION claiming N columns
and an INSERT claiming M < N columns. The subscriber's apply worker
indexes into colvalues[]/colstatus[] using column indices from the
RELATION message's attribute map, causing a heap out-of-bounds read when
the tuple's column array is smaller than expected. We've looked, without
success, for a scenario in which the publisher holds sufficient control
over these out-of-bounds bytes to exploit this or even to reach a
SIGSEGV. Despite not finding one, the code has been fragile. Back-patch
to v14 (all supported versions).
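
The general idea, replacing an assertion with a check that survives
production builds, can be sketched as below. The function name and message
are hypothetical; the actual fix reports an error through the backend's
usual error machinery.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical validation of untrusted input: returns false after
 * reporting when the tuple carries a different number of columns than the
 * relation advertised.  Unlike assert(), this is not compiled out when
 * assertions are disabled.
 */
static bool
tuple_matches_relation(int relation_natts, int tuple_ncols)
{
    if (tuple_ncols != relation_natts)
    {
        fprintf(stderr,
                "tuple has %d columns, but relation advertised %d\n",
                tuple_ncols, relation_natts);
        return false;
    }
    return true;
}
```

A caller would run this check before indexing into the per-column arrays
and stop processing the message when it fails.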
Reported-by: Varik Matevosyan <varikmatevosyan@gmail.com>
Author: Varik Matevosyan <varikmatevosyan@gmail.com>
Discussion: https://postgr.es/m/CA+bBoog3cCogktzfLb9bppUByu-10B3CFp8u=iKXG_OvtAguCw@mail.gmail.com
Backpatch-through: 14
There is now only one caller of ProcessStartupPacket(). Let's simplify
the routine so that the GSS and SSL states are tracked inside it. If
future callers are added, there is less guessing to do.
Suggested-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/aga7lCWluyc5zLb5@paquier.xyz
In some cases it's necessary to understand whether TSC frequency data was
sourced from CPUID, and which of the registers. Show this debug info at
the end of pg_test_timing, and rework TSC functions to support that.
This would have helped debug the buildfarm report fixed in 7fc36c5db5
and is likely going to aid in any TSC-related issues reported during the
beta period or later.
Additionally, emit a warning if TSC frequency from calibration differs
by more than 10% from the TSC frequency in use, and suggest the use
of timing_clock_source = 'system'.
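
The 10% comparison can be sketched as follows. The function name and the
choice of kHz as the unit are assumptions; only the threshold and the
choice of the in-use frequency as the reference come from the text above.

```c
#include <stdbool.h>

/*
 * Hypothetical check: does the calibrated TSC frequency differ from the
 * frequency in use by more than 10% of the latter?
 */
static bool
tsc_frequency_diverges(double calibrated_khz, double in_use_khz)
{
    double      diff;

    if (in_use_khz <= 0.0)
        return false;           /* nothing meaningful to compare against */
    diff = calibrated_khz - in_use_khz;
    if (diff < 0.0)
        diff = -diff;
    return diff > 0.10 * in_use_khz;
}
```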
In passing, add an explicit early return in the output function if the
loop count is zero. This can't happen in practice, but coverity complained
because we unconditionally call output for the fast TSC measurement.
Author: Lukas Fittl <lukas@fittl.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (coverity fix only)
Discussion: https://postgr.es/m/CAP53Pkw3Gzb+KTF5pu_o7tzbfZ7+qm2m6uDWuGtTJjZpV9yNpg@mail.gmail.com
Previously, the subscription setting retain_dead_tuples didn't cause
ALTER SUBSCRIPTION ... SERVER to check the publisher. And if the
publisher was checked for some other reason, then it would use the old
conninfo.
Fix ALTER SUBSCRIPTION ... SERVER to always check the publisher when
retain_dead_tuples is set, and to use the new connection info, like
ALTER SUBSCRIPTION ... CONNECTION.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/f13a8e29410bbbf9999290f2c04513a8884fa51c.camel@j-davis.com
Previously, tab completion for REPACK parenthesized boolean options
(ANALYZE, CONCURRENTLY, and VERBOSE) did not suggest the boolean values
ON and OFF, unlike VACUUM.
This commit fixes the issue by adding ON/OFF completion for those options.
Author: Baji Shaik <baji.pgdev@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CA+fm-RNZpy7MAceR9gSyy833H_uL-fTx0LxO73RnvwEaprpuRA@mail.gmail.com
Commit 4bea91f21f enabled COPY TO on a partitioned table to read
tuples from its partitions and mapped them to the root table's tuple
descriptor before output. However, it incorrectly built the attribute
map from the root table to the partition.
This commit fixes this by building the attribute map from the partition to
the root table, ensuring that partition attributes are correctly
mapped to their corresponding root attributes.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/85EA70F3-C3DB-477B-B856-EA569FDAAE7C@gmail.com
Commit b7b0f3f272 ("Use streaming I/O in sequential scans") routed
sequential scans through read_stream_next_buffer(), bypassing the
RELATION_IS_OTHER_TEMP() check in ReadBufferExtended(). As a result,
a superuser can attempt to read or modify temp tables of other
sessions through the read-stream path. When the query plan uses no index,
SELECT/UPDATE/DELETE/MERGE silently see no rows / report zero affected rows,
and COPY produces an empty output -- because the buffer manager has no
visibility into the owning session's local buffers and silently returns
nothing. Any query plan that uses, for instance, a btree index
still errors out via the existing check in ReadBufferExtended(), which
is reached from hio.c and nbtree respectively, but this is incidental.
Fix by enforcing RELATION_IS_OTHER_TEMP() at the three additional
buffer-manager entry points:
- read_stream_begin_impl() rejects the read at stream setup time,
covering sequential and bitmap scans that go through the
read-stream path.
- ReadBuffer_common() becomes the canonical place for the check,
consolidating the existing one previously kept in
ReadBufferExtended(). All ReadBufferExtended() callers go through
ReadBuffer_common(), so the consolidation is behavior-preserving.
- StartReadBuffersImpl() catches direct callers of StartReadBuffers()
that bypass both of the above. This is currently defense-in-depth,
but documents the contract for future code.
The companion test in src/test/modules/test_misc was added in the
preceding commit; this commit updates the assertions for SELECT,
UPDATE, DELETE, MERGE, and COPY (which previously documented the
bug as silent success) to expect the new error.
Author: Jim Jones <jim.jones@uni-muenster.de>
Author: Daniil Davydov <3danissimo@gmail.com>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Soumya S Murali <soumyamurali.work@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAJDiXghdFcZ8%3Dnh4G69te7iRr3Q0uFyXxb3ZdG09_GTNZXwH0g%40mail.gmail.com
Backpatch-through: 17
Add a TAP test in src/test/modules/test_misc that documents what
happens when one session attempts to read or modify another session's
temporary table. This commit only adds tests; it does not change
backend behavior, so the assertions reflect current behavior:
- SELECT, UPDATE, DELETE, MERGE, COPY on a table without an index
silently succeed with no error and zero rows / zero affected rows.
These commands run through the read-stream path, which currently
bypasses the RELATION_IS_OTHER_TEMP() check. This is the
underlying bug to be fixed in a follow-up.
- INSERT errors with "cannot access temporary tables of other
sessions" because hio.c calls ReadBufferExtended() to find a page
with free space and is caught by the existing check there.
- Index scan errors via the same existing check, reached through
nbtree -> ReadBuffer -> ReadBufferExtended.
- TRUNCATE / ALTER TABLE / ALTER INDEX / CLUSTER fail with their
command-specific error messages.
- VACUUM is silently skipped to avoid noise during database-wide
VACUUM (vacuum_rel() returns without warning).
- DROP TABLE is intentionally allowed: DROP does not touch the
table's contents, and autovacuum relies on this to clean up
temp relations orphaned by a crashed backend.
- ALTER FUNCTION / DROP FUNCTION on an owner-created function over
its own temp row type work as catalog operations -- they don't
read the underlying data.
- CREATE FUNCTION from a separate session, using another session's
temp row type as an argument, is allowed but emits a NOTICE: the
function is moved into the creator's pg_temp namespace with an
auto-dependency on the borrowed type, so it disappears together
with the session that created it.
- A bare DROP TABLE on a temp table that has a cross-session
dependent function fails with a catalog-level dependency error.
- LOCK TABLE in ACCESS SHARE mode on another session's temp table
succeeds and properly blocks the owner's session-exit cleanup
(which acquires AccessExclusiveLock via findDependentObjects).
This exercises the same LockRelationOid path used by autovacuum
when cleaning up orphaned temp relations.
- When the owner session ends, the normal session-exit cleanup
cascades through DEPENDENCY_NORMAL and removes both the temp
objects and any cross-session functions that depended on them.
Also, document the contract for RELATION_IS_OTHER_TEMP() so that
future buffer-access entry points enforce the same rule.
Backpatch this through PostgreSQL 17, where b7b0f3f272 introduces a code
path bypassing this check.
Author: Jim Jones <jim.jones@uni-muenster.de>
Author: Daniil Davydov <3danissimo@gmail.com>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Soumya S Murali <soumyamurali.work@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAJDiXghdFcZ8%3Dnh4G69te7iRr3Q0uFyXxb3ZdG09_GTNZXwH0g%40mail.gmail.com
Backpatch-through: 17
The jsonpath .split_part() method passed its field-position argument
through numeric_int4(), which can fail hard if called directly.
This commit switches the code to use numeric_int4_safe() with an error
context for soft reporting, so that the overflow and zero field-position
cases can be handled in silent mode.
Oversight in bd4f879a9c.
Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/FCF996D0-580B-431C-8DE1-A540C58E444C@gmail.com
When pgbench runs with multiple threads and verbose error reporting is
enabled (--verbose-errors), multiple clients can build verbose error
messages concurrently. Previously, a function-local static
PQExpBuffer was used for these messages, causing the buffer to be
shared across threads. This was not thread-safe and could result in
corrupted or incorrect log output.
Fix this by using a local PQExpBufferData instead of a static buffer.
This keeps verbose error messages correct during concurrent execution.
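
The hazard of a function-local static buffer, and the caller-owned
alternative, can be sketched with plain character arrays. Both functions
are hypothetical and use snprintf() rather than PQExpBuffer, so this shows
only the storage pattern, not pgbench's code.

```c
#include <stdio.h>

/* Not thread-safe: every caller shares the same static storage. */
static const char *
format_error_static(int client_id, const char *msg)
{
    static char buf[128];

    snprintf(buf, sizeof(buf), "client %d: %s", client_id, msg);
    return buf;
}

/* Safe under concurrency: storage is supplied by the caller. */
static void
format_error_local(char *buf, size_t buflen, int client_id,
                   const char *msg)
{
    snprintf(buf, buflen, "client %d: %s", client_id, msg);
}
```

Even in a single thread, two calls to format_error_static() return the
same pointer, so the second message overwrites the first; with several
threads the writes can interleave.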
Backpatch to v15, where this issue was introduced.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Alex Guo <guo.alex.hengchen@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwER1AjGXpkKB9t9820NBhMQ_Ghv7=HsKeodUr3=SZsF4g@mail.gmail.com
Backpatch-through: 15
Use consistent "REPACK (CONCURRENTLY)" naming in errhint messages,
matching the actual command syntax and the errmsg text used elsewhere
in the same file. Also improve the ereport() after XLogReadRecord
failure to be like others in the tree.
While at it, remove direct mentions of the DDL in the translatable
strings, both in the same errhint() calls as well as some errmsg()
calls. Add periods where missing.
These are all oversights in 28d534e2ae.
Reported-by: Baji Shaik <baji.pgdev@gmail.com>
Discussion: https://postgr.es/m/CA+fm-RPxX1xTcYY4qQGPRDXB2-Fy2SDNdZi=zVjr0j=MPg2PaA@mail.gmail.com
"egrep" has never been in POSIX; the standard way to access this
functionality is "grep -E". Recent versions of GNU grep have
started to warn about this, so stop using "egrep".
This could be back-patched, but I see little need to do so
because the affected places are not code that runs during
normal builds. (Perhaps src/backend/port/aix/mkldexport.sh
is an exception, but let's wait to see if any AIX users
complain before touching that.)
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/473272.1778685870@sss.pgh.pa.us
Run renumber_oids.pl to move high-numbered OIDs down, as per pre-beta
tasks specified by RELEASE_CHANGES. For reference, the command was
./renumber_oids.pl --first-mapped-oid 8000 --target-oid 6400
(but there were already some used OIDs at 6400, so the first one
actually assigned was 6434).
Update typedefs.list from the buildfarm, and run pgindent.
The changes from the new typedefs list are pretty minimal,
since we'd been pretty good (not perfect) about updating
typedefs.list by hand. But the pgindent behavior changes
installed by a3e6beba6, b518ba4af, and 60f9467c3 add up
to make this a relatively sizable diff.
Enforce this standard formatting of multiline comments that start
in column 1:
    /*
     * line 1
     * line 2
     */
Unlike indented comments, we don't reconsider line breaks, except
for forcing the initial /* and trailing */ onto their own lines.
We do make each line start with " *", with some whitespace following.
We preserve pgindent's existing behavior of not touching comments
that begin with /**... or /*-... Also, if the first line looks like
/* === or /* ---, we don't split that line; similarly for the last
line.
The vast majority of multiline comments in our tree already look
like this, but this change will clean up some stragglers.
Author: Aleksander Alekseev <aleksander@tigerdata.com>
Reported-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAJ7c6TPQ0kkHQG-AqeAJ3PV_YtmDzcc7s%2B_V4%3Dt%2BxgSnZm1cFw%40mail.gmail.com
Discussion: https://postgr.es/m/EB0141C5-ACC2-4F0B-85EA-0E3AFBCE322F@umbc.edu
Formatting of variadic functions and struct literals with named fields
used to be ugly due to pg_bsd_indent treating period as always being a
binary operator. After a comma, it's not that, so insert a space.
Bump pg_bsd_indent's version so that people who use out-of-tree
copies will know they need to update. (This also covers the other
pg_bsd_indent behavioral change introduced in a3e6beba6.)
Author: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/c3327be8-09e2-46a1-88b4-228a339d6916@proxel.se
When a struct member name matches a registered typedef, pgindent
removes the space after "!=" (and some other operators), like so:
entry->dsh.dsa_handle !=DSA_HANDLE_INVALID
The problem is that the related code in lexi.c sets last_u_d to
true before jumping to found_typename, causing the next operator to
be classified as unary and suppressing the following space. This
is correct for type names, but not for struct members. For
example, "Datum *x" needs "*" to be unary to suppress the space
before "x". To fix, only set last_u_d before jumping to
found_typename if the typedef name doesn't appear after "." or
"->".
Note that this does not bump INDENT_VERSION. We'll do that just
once after some other changes to pg_bsd_indent are committed.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/aS9hkwnkWf3dZIA_%40nathan
Both UPDATE and DELETE were failing to test that the application-time
column was updatable. The column is not part of
perminfo->updatedCols, because it should not be checked for
permissions. And it needs to be checked in the DELETE case as well,
since we might insert leftovers with a value for that column.
Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Co-authored-by: jian he <jian.universality@gmail.com>
Discussion: https://www.postgresql.org/message-id/CACJufxFRqg8%3DgbZ-Q6ZS_UQ%2BYdwfZpk%2B9rf7jgWrk8m4RMUm%3DA%40mail.gmail.com
Two cases fixed by 2b5ba2a0a1 were not covered by tests that emulate the
handling of corrupted data:
- set control bit with a valid 2-byte match tag where offset is 0.
- set control bit with a valid 2-byte match tag where offset exceeds
output written.
Oversight in 67d318e704.
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/agF4xkIdRcrCIprs@paquier.xyz
Backpatch-through: 14
Previously, pg_stat_progress_copy in the subscriber could continue to show
the initial COPY operation for logical replication table synchronization as
active even after the data copy had finished. The stale progress entry
remained visible until synchronization caught up with the publisher.
This happened because the table synchronization code called BeginCopyFrom()
and CopyFrom(), but failed to call EndCopyFrom() afterward.
This commit fixes the issue by adding the missing EndCopyFrom() call so that
the COPY progress state in the subscriber is cleared as soon as the initial
data copy completes.
Backpatch to all supported branches.
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: ChangAo Chen <cca5507@qq.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAOzEurQKuy3RiPkd=25PEwEzaqHuGvEOf=X7vaVzhgNjaukYzA@mail.gmail.com
Backpatch-through: 14
Oleg's original comment was intelligible only to him.
Aleksander has reverse-engineered what seems like a plausible
explanation of what the code is trying to do, so replace the
comment with that. (Also, re-order the final expression to
match the new comment.)
In passing, this makes the comment satisfy our usual formatting
conventions. pgindent has let it pass as-is so far, but planned
changes would mess it up without some sort of intervention.
Author: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAJ7c6TO0xvunpeOv89i1eKQBhKF9=GEETkTz+yAGs1xGYH25MQ@mail.gmail.com
REPACK replay builds scan keys for the replica identity index, but it
hard-coded BTEqualStrategyNumber when looking up the equality operator.
That is not correct for non-btree identity indexes, such as the GiST
indexes created for WITHOUT OVERLAPS primary keys. In addition,
find_target_tuple() accepted the first tuple returned by the identity
index scan, which is unsafe for lossy index scans because the index AM may
return false positives with xs_recheck set.
Fix this by using IndexAmTranslateCompareType() to translate COMPARE_EQ
to the equality strategy number for the index AM, and by continuing the
scan when recheck is required until a candidate tuple matches the locator
tuple on all replica identity key columns.
The recheck uses the same equality operator functions as the identity
index scan keys, preserving ScanKey argument ordering.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/7B0EC0EC-5461-41EF-9B31-F9BBE608DEA5@gmail.com
When result_is_int is set to 0, PQfn() cannot validate that the
result fits in result_buf, so it will write data beyond the end of
the buffer when the server returns more data than requested. Since
this function is insecurable and obsolete, add a warning to the top
of the pertinent documentation advising against its use.
The only in-tree caller of PQfn() is the frontend large object
interface. To fix that, add a buf_size parameter to
pqFunctionCall3() that is used to protect against overruns, and use
it in a private version of PQfn() that also accepts a buf_size
parameter.
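
The protection that a buf_size parameter makes possible can be sketched as
a bounded copy. copy_result_checked() is a hypothetical helper and does not
reflect the signature of pqFunctionCall3() or of the private PQfn()
variant.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy server-supplied data into result_buf only if
 * it fits; otherwise refuse instead of writing past the end of the buffer.
 */
static bool
copy_result_checked(void *result_buf, size_t buf_size,
                    const void *data, size_t data_len)
{
    if (data_len > buf_size)
        return false;
    memcpy(result_buf, data, data_len);
    return true;
}
```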
Reported-by: Yu Kunpeng <yu443940816@live.com>
Reported-by: Martin Heistermann <martin.heistermann@unibe.ch>
Author: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Security: CVE-2026-6477
Backpatch-through: 14
If you accumulate many arrays full of NULLs, you could overflow
'nitems', before reaching the MaxAllocSize limit on the allocations.
Add an explicit check that the number of items doesn't grow too large.
With more than MaxArraySize items, getting the final result with
makeArrayResultArr() would fail anyway, so better to error out early.
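
An explicit limit check of this kind can be sketched as below.
DEMO_MAX_ITEMS is a hypothetical stand-in for the real limit and
accum_items_checked() is not the array accumulation code; the point is that
the comparison is arranged so that it cannot itself overflow.

```c
#include <stdbool.h>

#define DEMO_MAX_ITEMS 1000     /* hypothetical stand-in for the limit */

/*
 * Add to_add to *nitems unless the total would exceed DEMO_MAX_ITEMS.
 * The test subtracts from the limit rather than adding to the counter, so
 * no intermediate value can overflow.
 */
static bool
accum_items_checked(int *nitems, int to_add)
{
    if (to_add < 0 || *nitems < 0 || *nitems > DEMO_MAX_ITEMS ||
        to_add > DEMO_MAX_ITEMS - *nitems)
        return false;
    *nitems += to_add;
    return true;
}
```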
Reported-by: Xint Code
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 14
Security: CVE-2026-6473
pg_locale_icu.c was full of places where a very long input string
could cause integer overflow while calculating a buffer size,
leading to buffer overruns.
It also was cavalier about using char-type local arrays as buffers
holding arrays of UChar. The alignment of a char[] variable isn't
guaranteed, so this risked failure on alignment-picky platforms.
The lack of complaints suggests that such platforms are very rare
nowadays; but it's likely that we are paying a performance price on
rather more platforms. Declare those arrays as UChar[] instead,
keeping their physical size the same.
pg_locale_libc.c's strncoll_libc_win32_utf8() also had the
disease of assuming it could double or quadruple the input
string length without concern for overflow.
Reported-by: Xint Code
Reported-by: Pavel Kohout <pavel.kohout@aisle.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 14
Security: CVE-2026-6473
pg_rewind and pg_basebackup could be fed paths from rogue endpoints that,
when received, could overwrite files on the client through path
traversal.
There were two areas in the tree that were sensitive to this problem:
- pg_basebackup, through the astreamer code, where no validation was
performed before building an output path when streaming tar data. This
is an issue in v15 and newer versions.
- pg_rewind file operations for paths received through libpq, for all
the stable branches supported.
In order to address this problem, this commit adds a helper function in
path.c, that reuses path_is_relative_and_below_cwd() after applying
canonicalize_path(). This can be used to validate the paths received
from a connection point. A path is considered invalid if either of the
following two conditions is satisfied:
- The path is absolute.
- The path includes a direct parent-directory reference.
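
The two conditions can be approximated with a standalone check. This is a
simplified sketch under stated assumptions: it handles only '/' separators,
does not canonicalize the path first, and path_is_safe_relative() is a
hypothetical name rather than the helper added to path.c. Because it skips
canonicalization it is stricter than the real check and rejects "a/../b".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return false for absolute paths and for paths that contain a ".."
 * component; return true otherwise.
 */
static bool
path_is_safe_relative(const char *path)
{
    const char *p = path;

    if (path[0] == '/')
        return false;           /* absolute path */

    while (*p != '\0')
    {
        const char *sep = strchr(p, '/');
        size_t      len = sep ? (size_t) (sep - p) : strlen(p);

        if (len == 2 && p[0] == '.' && p[1] == '.')
            return false;       /* parent-directory component */
        if (sep == NULL)
            break;
        p = sep + 1;
    }
    return true;
}
```

A name such as "..data" is accepted, since only a component that is exactly
".." counts as a parent-directory reference.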
Reported-by: XlabAI Team of Tencent Xuanwu Lab
Reported-by: Valery Gubanov <valerygubanov95@gmail.com>
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6475
A few functions in this file were incautious about multiplying a
possibly large integer by a factor more than 1 and then using it as
an allocation size. This is harmless on 64-bit systems where we'd
compute a size exceeding MaxAllocSize and then fail, but on 32-bit
systems we could overflow size_t, leading to an undersized
allocation and buffer overrun. To fix, use palloc_array() or
mul_size() instead of handwritten multiplication.
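As a rough illustration of the overflow-checked approach, here is a
standalone sketch. It is not the PostgreSQL implementation:
checked_mul_size() and alloc_array_checked() are hypothetical names,
plain malloc() stands in for palloc(), and failure is reported through a
return value, whereas the real mul_size() and palloc_array() raise an
error.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: multiply two sizes, reporting overflow instead of
 * silently wrapping around.  Returns true and stores the product in
 * *result on success; returns false if the product does not fit in size_t.
 */
static bool
checked_mul_size(size_t a, size_t b, size_t *result)
{
    if (b != 0 && a > SIZE_MAX / b)
        return false;
    *result = a * b;
    return true;
}

/*
 * Hypothetical array allocator: returns NULL rather than an undersized
 * buffer when count * elem_size would overflow.
 */
static void *
alloc_array_checked(size_t count, size_t elem_size)
{
    size_t      total;

    if (!checked_mul_size(count, elem_size, &total))
        return NULL;
    return malloc(total);
}
```

The division-based test avoids computing the wrapped product at all, so
it behaves the same whether size_t is 32 or 64 bits wide.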
Reported-by: Sven Klemm <sven@tigerdata.com>
Reported-by: Xint Code
Author: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tatsuo Ishii <ishii@postgresql.org>
Security: CVE-2026-6473
Backpatch-through: 14
This omission allowed roles to create multirange types in any
schema, potentially leading to privilege escalations. Note that
when a multirange type name is not specified in CREATE TYPE, it is
automatically placed in the range type's schema, which is checked
at the beginning of DefineRange().
Reported-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Security: CVE-2026-6472
Backpatch-through: 14
drop_existing_subscription() neglected to escape the subscription
name when generating its query string. To fix, use
PQescapeIdentifier() to construct a properly escaped name, and use
it in the ALTER SUBSCRIPTION and DROP SUBSCRIPTION commands.
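To show what identifier escaping buys here, the following standalone
sketch mimics the quoting rule without needing a libpq connection.
quote_identifier_sketch() is a hypothetical stand-in, not the function
used by the fix; unlike PQescapeIdentifier(), it knows nothing about
client encodings.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for identifier escaping: wrap the name in double
 * quotes and double any embedded double quote.  Returns a malloc'd string,
 * or NULL on allocation failure.  Client encoding is ignored.
 */
static char *
quote_identifier_sketch(const char *name)
{
    size_t      len = strlen(name);
    char       *out;
    char       *dst;

    /* Worst case: every byte doubled, plus two quotes and a terminator. */
    if (len > (SIZE_MAX - 3) / 2)
        return NULL;
    out = malloc(2 * len + 3);
    if (out == NULL)
        return NULL;

    dst = out;
    *dst++ = '"';
    for (const char *src = name; *src != '\0'; src++)
    {
        if (*src == '"')
            *dst++ = '"';       /* double an embedded quote */
        *dst++ = *src;
    }
    *dst++ = '"';
    *dst = '\0';
    return out;
}
```

With this rule, a name containing a double quote followed by SQL text
stays inside the quoted identifier instead of terminating it.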
Reported-by: Yu Kunpeng <yu443940816@live.com>
Author: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Security: CVE-2026-6476
Backpatch-through: 17
The SQL functions for the restore of attribute and expression statistics
accept "most_common_vals" and "most_common_freqs" as independent arrays.
The planner assumes these have the same number of elements, but it was
possible to insert data into the catalogs that would cause an over-read
when the planner loads it.
The stats restore logic failed to enforce two requirements:
- Both arrays should match in size.
- The input array must be one-dimensional, and it should match what
pg_dump delivers when scanning the pg_stats catalogs.
The multivariate extended statistics MCV path (import_mcv) already
validated these inputs via check_mcvlist_array(), and is not affected.
These problems exist in v18 and newer versions for the restore of
attribute statistics. These problems affect only HEAD for the restore
of the expression statistics.
Reported-by: Jeroen Gui <jeroen.gui1@proton.me>
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Security: CVE-2026-6575
Backpatch-through: 18
Although pg_strftime() has defined error conditions, no callers bother
to check for errors. This is problematic because the output string is
very likely not null-terminated if an error occurs, so that blindly
using it is unsafe. Rather than trusting that we can find and fix all
the callers, let's alter the function's API spec slightly: make it
guarantee a null-terminated result so long as maxsize > 0.
Furthermore, if we do get an error, let's make that null-terminated
result be an empty string. We could instead truncate at the buffer
length, but that risks producing mis-encoded output if the tz_name
string contains multibyte characters. It doesn't seem reasonable for
src/timezone/ to make use of our encoding-aware truncation logic.
Also, the only really likely source of a failure is a user-supplied
timezone name that is intentionally trying to overrun our buffers.
I don't feel a need to be particularly friendly about that case.
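The revised contract can be sketched as a wrapper around the standard C
strftime(). This is an assumption-laden illustration, not the change
made to pg_strftime() itself; strftime_terminated() is a hypothetical
name.

```c
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical wrapper around the standard strftime(): on failure (a zero
 * return), the buffer is reset to an empty string, so callers that ignore
 * the return value never read an unterminated buffer.
 */
static size_t
strftime_terminated(char *buf, size_t maxsize, const char *format,
                    const struct tm *tm)
{
    size_t      n;

    if (maxsize == 0)
        return 0;               /* no room even for a terminator */

    n = strftime(buf, maxsize, format, tm);
    if (n == 0)
        buf[0] = '\0';          /* error or empty result: empty string */
    return n;
}
```

Returning an empty string rather than a truncated one sidesteps the
question of where a multibyte character could safely be cut.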
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6474
timeofday() assumed that the output of pg_strftime() could not contain
% signs, other than the one it explicitly asks for with %%. However,
we don't have that guarantee with respect to the time zone name (%Z).
A crafted time zone setting could abuse the subsequent snprintf()
call, resulting in crashes or disclosure of server memory.
To fix, split the pg_strftime() call into two and then treat the
outputs as literal strings, not a snprintf format string. The
extra pg_strftime() call doesn't really cost anything, since the
bulk of the conversion work was done by pg_localtime().
Also, adjust buffer widths so that we're not risking string truncation
during the snprintf() step, as that would create a hazard of producing
mis-encoded output.
This also fixes a latent portability issue: the format string expects
an int, but tp.tv_usec is long int on many platforms.
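The core of the fix, treating previously formatted text as data rather
than as a format string, looks roughly like the sketch below.
format_timestamp_sketch() and its parameters are hypothetical and do not
reproduce timeofday(); the two string arguments stand for the outputs of
the two pg_strftime() calls.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical formatter: "prefix" and "suffix" are already-formatted
 * pieces that may contain arbitrary text, including % signs.  They are
 * passed as arguments to %s, never as the format string itself.  The
 * microseconds value is a long, matching the %ld conversion.
 */
static int
format_timestamp_sketch(char *buf, size_t buflen,
                        const char *prefix, long usec, const char *suffix)
{
    return snprintf(buf, buflen, "%s%06ld%s", prefix, usec, suffix);
}
```

Any % sequences in a time zone name are then copied through verbatim
instead of being interpreted by snprintf().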
Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6474
ALTER SUBSCRIPTION ... REFRESH PUBLICATION interpolates schema and
relation names into SQL without quoting them. A crafted subscriber
relation name can inject arbitrary SQL on the publisher. Test such a
name. Back-patch to v16, where commit 8756930190 first appeared.
Reported-by: Pavel Kohout <pavel.kohout@aisle.com>
Author: Pavel Kohout <pavel.kohout@aisle.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Backpatch-through: 16
Security: CVE-2026-6638
This commit applies timingsafe_bcmp() to authentication paths that
handle attributes or data previously compared with memcmp() or strcmp(),
which are sensitive to timing attacks.
The following data is affected by this change, some in the backend and
some in the frontend:
- For a SCRAM or MD5 password, the computed key or the MD5 hash compared
with a password during a plain authentication.
- For a SCRAM exchange, the stored key, the client's final nonce and the
server nonce.
- For RADIUS (up to v18), the encrypted password.
- For MD5 authentication, the MD5(MD5()) hash.
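For readers unfamiliar with the technique, a constant-time comparison
can be sketched as follows. ct_bcmp_sketch() is a hypothetical name and
is not the timingsafe_bcmp() implementation used by the commit; a
production version needs more care to keep the compiler from
reintroducing an early exit.

```c
#include <stddef.h>

/*
 * Hypothetical constant-time comparison: returns 0 if the two buffers are
 * equal and nonzero otherwise.  Every byte is examined regardless of where
 * the first difference occurs, so the loop count depends only on len.
 */
static int
ct_bcmp_sketch(const void *a, const void *b, size_t len)
{
    const volatile unsigned char *pa = a;
    const volatile unsigned char *pb = b;
    unsigned char diff = 0;

    for (size_t i = 0; i < len; i++)
        diff |= (unsigned char) (pa[i] ^ pb[i]);

    return diff != 0;
}
```

By contrast, memcmp() and strcmp() may return at the first mismatching
byte, which lets an attacker learn how long a matching prefix is.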
Reported-by: Joe Conway <mail@joeconway.com>
Security: CVE-2026-6478
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Backpatch-through: 14
The handling of SSL and GSS negotiation messages in
ProcessStartupPacket() could recurse without bound, because negotiation
attempts were not tracked across the multiple calls processing startup
packets.
A malicious client could therefore alternate rejected SSL and GSS
requests indefinitely, each adding a stack frame, until the backend
crashed with a stack overflow, taking down the server.
This commit addresses the issue by modifying ProcessStartupPacket() so
that processed negotiation attempts are tracked, preventing infinite
recursion. A TAP test is added to check this problem, where multiple
SSL and GSS negotiation attempts are stacked.
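The idea of tracking attempts can be modeled in a few lines. The sketch
below is a simplified assumption about the shape of the fix, not the
backend code: RequestKind and process_startup_sketch() are hypothetical,
and a loop replaces the recursion so the bound is easy to see.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of incoming startup-phase packets. */
typedef enum
{
    REQ_SSL,
    REQ_GSS,
    REQ_STARTUP
} RequestKind;

/*
 * Hypothetical model of startup-packet handling.  Each kind of negotiation
 * request is honored at most once per connection, whether it was accepted
 * or rejected, so the number of negotiation rounds is bounded.  Returns
 * true if a regular startup packet is reached, false if the connection
 * should be closed.
 */
static bool
process_startup_sketch(const RequestKind *reqs, size_t nreqs)
{
    bool        ssl_done = false;
    bool        gss_done = false;

    for (size_t i = 0; i < nreqs; i++)
    {
        switch (reqs[i])
        {
            case REQ_SSL:
                if (ssl_done)
                    return false;   /* repeated SSL request */
                ssl_done = true;
                break;
            case REQ_GSS:
                if (gss_done)
                    return false;   /* repeated GSS request */
                gss_done = true;
                break;
            case REQ_STARTUP:
                return true;
        }
    }
    return false;               /* input ended without a startup packet */
}
```

With both flags set, at most two negotiation rounds can precede the
startup packet, however the client orders its requests.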
Reported-by: Calif.io in collaboration with Claude and Anthropic Research
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Security: CVE-2026-6479
Backpatch-through: 14
multirange_recv and BlockRefTableReaderNextRelation were incautious
about multiplying a possibly-large integer by a factor more than 1
and then using it as an allocation size. This is harmless on 64-bit
systems where we'd compute a size exceeding MaxAllocSize and then
fail, but on 32-bit systems we could overflow size_t leading to an
undersized allocation and buffer overrun.
Fix these places by using palloc_array() instead of a handwritten
multiplication. (In HEAD, some of them were fixed already, but
none of that work got back-patched at the time.)
In addition, BlockRefTableReaderNextRelation passes the same value
to BlockRefTableRead's "int length" parameter. If built for
64-bit frontend code, palloc_array() allows a larger array size
than it otherwise would, potentially allowing that parameter to
overflow. Add an explicit check to forestall that and keep the
behavior the same cross-platform.
Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 14
Security: CVE-2026-6473
Some UTF8 characters decompose to more than a dozen codepoints.
It is possible for an input string that fits into well under
1GB to produce more than 4G decomposed codepoints, causing
unicode_normalize()'s decomp_size variable to wrap around to a
small positive value. This results in a small output buffer
allocation and subsequent buffer overrun.
To fix, test after each addition to see if we've overrun MaxAllocSize,
and break out of the loop early if so. In frontend code we want to
just return NULL for this failure (treating it like OOM). In the
backend, we can rely on the following palloc() call to throw error.
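The check-after-each-addition pattern can be sketched independently of
the Unicode tables. sum_sizes_capped() is a hypothetical helper, and the
"limit" argument merely plays the role of MaxAllocSize; the sketch
assumes each individual size is small relative to the range of size_t,
as is the case for per-character decomposition lengths.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical accumulator: add up per-character sizes, stopping as soon
 * as the running total exceeds "limit".  Because the check runs after
 * every addition and each addend is assumed small, the total can never
 * get close to wrapping around.  On success, stores the sum in *total.
 */
static bool
sum_sizes_capped(const size_t *sizes, size_t n, size_t limit, size_t *total)
{
    size_t      sum = 0;

    for (size_t i = 0; i < n; i++)
    {
        sum += sizes[i];
        if (sum > limit)
            return false;       /* caller treats this like out-of-memory */
    }
    *total = sum;
    return true;
}
```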
I also tightened things up in the calling functions in varlena.c,
using size_t rather than int and allocating the input workspace
with palloc_array(). These changes are probably unnecessary
given the knowledge that the original input and the normalized
output_chars array must fit into 1GB, but it's a lot easier to
believe the code is safe with these changes.
Reported-by: Xint Code
Reported-by: Bruce Dang <bruce@calif.io>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Co-authored-by: Heikki Linnakangas <hlinnaka@iki.fi>
Backpatch-through: 14
Security: CVE-2026-6473
The number of NFA states, number of NFA arcs, and number of colors
are all bounded to reasonably small values. However, there are
places where we try to allocate arrays sized by products of those
quantities, and those calculations could overflow, enabling
buffer-overrun attacks. In practice there's no problem on 64-bit
machines, but there are some live scenarios on 32-bit machines.
A related problem is that citerdissect() and creviterdissect()
allocate arrays based on the length of the input string, which
potentially could overflow.
To fix, invent MALLOC_ARRAY and REALLOC_ARRAY macros that rely on
palloc_array_extended and repalloc_array_extended with the NO_OOM
option, similarly to the existing MALLOC and REALLOC macros.
(Like those, they'll throw an error rather than return a NULL result
for oversize requests. This doesn't really fit into the regex code's
view of error handling, but it'll do for now. We can consider
whether to change that behavior in a non-security follow-up patch.)
I installed similar defenses in the colormap construction code.
It's not entirely clear whether integer overflow is possible
there, but analyzing the behavior in detail seems not worth
the trouble, as the risky spots are not in hot code paths.
I left a bunch of calls as-is after verifying that they can't
overflow given reasonable limits on nstates and narcs. Those
limits were enforced already via REG_MAX_COMPILE_SPACE, but
add commentary to document the interactions.
In passing, also fix a related edge case, which is that the
special color numbers used in LACON carcs could overflow the
"color" data type, if ncolors is close to MAX_COLOR.
In v14 and v15, the regex engine calls malloc() directly instead
of using palloc(), so MALLOC_ARRAY and REALLOC_ARRAY do likewise.
Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6473
Sufficiently large "count" arguments could result in undetected
overflow, causing the allocated memory chunk to be much smaller
than what the caller will subsequently write into it. This is
unlikely to be a hazard with 64-bit size_t but can sometimes
happen on 32-bit builds, primarily where a function allocates
workspace that's significantly larger than its input data.
Rather than trying to patch the at-risk callers piecemeal,
let's just redefine these macros so that they always check.
To do that, move the longstanding add_size() and mul_size() functions
into palloc.h and mcxt.c, and adjust them to not be specific to
shared-memory allocation. Then invent palloc_mul(), palloc0_mul(), and
palloc_mul_extended() to use these functions. Actually, the latter
use inlined copies to save one function call. repalloc_array() gets
similar treatment. I didn't bother trying to inline the calls for
repalloc0_array() though.
In v14 and v15, this also adds repalloc_extended(), which previously
was only available in v16 and up.
We need copies of all this in fe_memutils.[hc] as well, since that
module also provides palloc_array() etc.
Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6473
The options "StartSel", "StopSel" and "FragmentDelimiter" given by a
caller of the SQL function ts_headline() have their lengths stored as
int16. When providing values larger than PG_INT16_MAX, it was possible
to overflow the length values stored, leading to incorrect behaviors in
generateHeadline(), in most cases translating to a crash.
Attempting to use values for these options larger than PG_INT16_MAX is
now blocked. Some test cases are added to cover this behavior.
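The kind of guard involved can be sketched as below. set_option_len() is
a hypothetical helper, not the code added to the ts_headline() option
parsing; it only shows the length being validated before it is narrowed
to 16 bits.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical check: store an option string's length in a 16-bit field
 * only if it fits; otherwise report failure so the caller can raise an
 * error instead of keeping a wrapped-around length.
 */
static bool
set_option_len(const char *value, int16_t *len_out)
{
    size_t      len = strlen(value);

    if (len > INT16_MAX)
        return false;
    *len_out = (int16_t) len;
    return true;
}
```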
Reported-by: Xint Code
Author: Michael Paquier <michael@paquier.xyz>
Backpatch-through: 14
Security: CVE-2026-6473