Previously, ProcSignalInit() read the global barrier generation before
publishing its PID into pss_pid. This created a race condition: a
process could initialize its local generation with an older global
value, while a concurrent EmitProcSignalBarrier() might skip that
process because its pss_pid was still zero. This resulted in
WaitForProcSignalBarrier() hanging indefinitely.
Fix this by publishing pss_pid before reading psh_barrierGeneration
with a memory barrier so that the store to pss_pid is ordered before
the load. A concurrent EmitProcSignalBarrier() then either observes
the published PID and signals this slot, or completes its generation
increment before we load it.
While this race has become more visible due to recent features using
signal barriers in more places (such as online wal_level changes), the
issue is theoretically present since signal barriers were introduced
to release smgr caches (e.g., in DROP DATABASE). v14 has the
procsiangl barrier infrastricutre but no in-tree caller that actually
emits a barrier, so the case is unreachable there.
This issue was also reported by buildfarm member flaviventris.
Reported-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAEze2WgAJmWReDN7Chtba8Er2YBvKCoa0KVN25-1evnTrHsLyA@mail.gmail.com
Backpatch-through: 15
Commit 2af1dc8928 placed the new "logical decoding disabled after
REPACK (CONCURRENTLY)" check at the end of
051_effective_wal_level.pl. That placement assumed the logical slot
"test_slot" no longer existed when the check ran, but the assumption
only holds on builds with injection points: the earlier
injection-point-driven tests drop "test_slot" as a side effect, while
on builds without injection points the slot persists. When
"test_slot" still exists, logical decoding remains enabled and the new
check fails on those buildfarm members.
Move the REPACK test earlier in the script, ensuring that the test
starts with logical decoding disabled.
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CAD21AoBmdmBQ-+Jga+jSKKq5OPGEP1pEjSJfRPT6MCwVHLD6og@mail.gmail.com
REPACK (CONCURRENTLY) uses a temporary logical replication slot, which
is dropped once done, but it wasn't calling RequestDisableLogicalDecoding(),
leaving effective_wal_level stuck at 'logical'.
Fix by adding a Boolean flag to ReplicationSlotDropAcquired() to have it
request to disable logical decoding, and passing it as true on REPACK.
Other callers of that function preserve their existing behavior.
Author: Imran Zaheer <imran.zhir@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Discussion: https://postgr.es/m/CA+UBfaktds57dw2M8BEv_kS-=ixph3w+3MxKixtaDQMi_k7Ybg@mail.gmail.com
Commit 282b1cde9 made SignalBackends() ignore ListenerEntry entries
whose "listening" flag said that the listener was not yet committed.
That will be true for a new listener that has already registered its
queue position, but has not yet reached AtCommit_Notify(). If another
backend notifies the same channel in that window, SignalBackends()
would directly advance the new listener's queue position, causing it
to miss message(s). Really this is a definitional question: is a new
listener active as of PreCommit, or as of AtCommit? But it seems to
make more sense to expect that the new listener will see all messages
after its initially-registered queue position, especially since the
direct-advance logic is supposed to be an optimization that doesn't
affect semantics.
Fix this by treating all channel entries as valid wakeup targets.
Rename the "listening" flag to removeOnAbort to reflect its remaining
purpose: identifying staged LISTEN entries that abort cleanup must
remove.
While we're here, remove an obsolete test case added by 282b1cde9.
The check for "ChannelHashAddListener array growth" was meant to
exercise code that never made it into the committed patch, so now
it's just a waste of test cycles.
Author: Joel Jacobson <joel@compiler.org>
Reviewed-by: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/9835b0a4-9121-47ac-9c44-427b8b1a7f1b@app.fastmail.com
Discussion: https://postgr.es/m/6fe5ee75-537d-4d4f-909a-b21303c3ce75@app.fastmail.com
Concurrent DDL can leave behind objects referencing other objects that
no longer exist. This can happen if an object is dropped, while a new
object that depends on it is created concurrently. For example:
session 1: BEGIN; CREATE FUNCTION myschema.myfunc() ...;
session 2: DROP SCHEMA myschema;
session 1: COMMIT;
DROP SCHEMA does check that there are no objects dependending on the
schema being dropped, but it does not see objects being concurrently
created by other sessions. Even if it did, this scenario would still
fail:
session 1: BEGIN: DROP SCHEMA myschema;
session 2: CREATE FUNCTION myschema.myfunc() ...;
session 1: COMMIT;
When the DROP SCHEMA runs, the schema was empty, but the new function
is created in it before the dropping transaction completes. The CREATE
FUNCTION does not see that the schema is concurrently being dropped.
In both of these scenarios, the function is left behind in the schema
that no longer exists.
To fix, acquire AccessShareLock on all referenced objects when
recording dependencies. This conflicts with the AccessExclusiveLock
taken by DROP, preventing the race. After acquiring the lock, verify
that the object still exists, and if it was dropped concurrently,
report an error. We already had such a mechanism for shared
dependencies, but for some reason we didn't do it for in-database
dependendies.
Ideally the locks would be acquired much earlier when creating a new
object, but that will require modifying a lot of callers. This check
while recording the dependency is a nice wholesale protection, and
even if we change all the CREATE commands to acquire locks earlier,
it's still good to have this as a backstop to catch any cases where we
forgot to do so.
The patch adds a few tests for some cases that left behind orphaned
objects before this. It also adds a test for roles, which already had
such protection, although that test is partially disabled because the
error message includes an OID which is not predictable.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Discussion: https://postgr.es/m/ZiYjn0eVc7pxVY45@ip-10-97-1-34.eu-west-3.compute.internal
Backpatch-through: 14
When creating a relation with a dropped column, we called
recordDependencyOn() also on the datatype of the dropped column, which
is always InvalidOid. In versions 15 and above, that was harmless
because recordDependencyOn() considers InvalidOid as a pinned object,
and skips over it. On version 14, isPinnedObject() does not consider
InvalidOid as pinned, so we created a bogus pg_depend entry with
refobjectid == 0.
As far as I can tell, the only case when AddNewAttributeTuples() is
called with dropped columns is when performing a table-rewriting ALTER
TABLE command. That temporarily creates a new relation with the same
columns, including dropped ones, then swaps the relations, and drops
the newly created table again. So even on version 14, the bogus
pg_depend entry was only on the transient relation that was dropped at
the end of the ALTER TABLE command, which was harmless.
Even though this is harmless, let's be tidy, similar to commit
713bce9484. The reason I noticed this now and why I backported this,
is because the next commit will add code to acquire locks on the
referenced objects, and we don't want to acquire a lock on InvalidOid.
Discussion: https://postgr.es/m/ZiYjn0eVc7pxVY45@ip-10-97-1-34.eu-west-3.compute.internal
Backpatch-through: 14
Commit 316472146 introduced support for ECDH key exchange with an ifdef
guard to ensure support in the underlying OpenSSL installation. Commit
10bf4fc2c3 in OpenSSL removed this guard in 2015 which effectively made
our check a no-op. There has been no complaints that this doesn't work
and OpenSSL installations without ECDH support are likely very rare, so
remove the checks rather than re-implementing support. Not backpatched
since this fix doesn't alter functionality.
Also fix a typo introduced in the original commit which had survived
till this day.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/1787BA9F-A11C-4A7A-9252-94C470D5CBE3@yesql.se
DisownLatch() was executed after the PGPROC entry of the process
terminated is pushed back into a freelist. A newly-forked backend that
recycles the slot could call OwnLatch() and PANIC with a "latch already
owned by PID", taking down the server.
There were two scenarios related to lock groups where this issue could
be reached:
* A follower pushes the leader's PGPROC back to the freelist while the
leader has not yet called DisownLatch() in its own ProcKill().
* A leader outliving all its followers pushes its own PGPROC onto the
freelist before reaching DisownLatch(), which would be the most common
scenario.
This issue is fixed by calling SwitchBackToLocalLatch() and
DisownLatch() at an earlier phase of ProcKill(), before any freelist
manipulation happens, so that the slot of the backend terminated is
never exposed as owning a latch.
Note that pgstat_reset_wait_event_storage() is kept at a later stage.
An upcoming commit will take advantage of that by introducing a test
able to check the original PANIC scenario.
Author: Vlad Lesin <vladlesin@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/d2983796-2603-41b7-a66e-fc8489ddb954@gmail.com
Backpatch-through: 14
This commit fixes two bugs in ProcKill()'s lock-group teardown freelist
publication:
* a double push of the leader's PGPROC that corrupts the freelist.
* a leak of the last follower's PGPROC slot.
ProcKill()'s lock-group teardown had two PGPROC freelist updates
scattered through the function, done under two separate freeProcsLock
acquisitions:
* A follower's push of the leader's PGPROC, done when a follower is the
last group member exiting.
* Every backend's self-push at the bottom of the function.
The two freelist updates were coordinated only by inspecting
proc->lockGroupLeader, which a follower could clear as a side effect of
pushing the leader. This coordination was broken. For example, with
two concurrent backends:
* The follower clears leader->lockGroupLeader and pushes the leader's
PGPROC under leader_lwlock.
* The follower does not clear its own proc->lockGroupLeader, being
skipped.
* When the leader reaches the bottom of ProcKill(), it sees a NULL
proc->lockGroupLeader (the follower cleared it) and pushes itself,
causing a second dlist_push_tail() of the same node onto the same
freelist.
* The follower at the bottom sees its own proc->lockGroupLeader being
not NULL (never cleared) and skips its own push, causing its own slot
to leak.
This commit refactors the freelist manipulation to be done in two
distinct phases, each step using its own lock acquisition to ensure that
each freelist operation happens in an isolated manner for each backend
(follower or leader):
- First, under a single leader_lwlock acquisition, check the state of
the lock-group. Depending on if we are dealing with a follower and/or a
leader, and if the leader has exited before a follower, then set some
state booleans that define which actions should be taken with the
freelist.
- Second, under a single freeProcsLock acquisition, perform the cleanup
actions, self-push of a backend and/or push of the leader back to the
freelist.
This is an old issue, dating back to 9.6 where parallel workers and lock
grouping has been added.
Author: Vlad Lesin <vladlesin@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/d2983796-2603-41b7-a66e-fc8489ddb954@gmail.com
Backpatch-through: 14
When pg_createsubscriber fails after creating logical replication
objects, it should remove the publication and replication slot that
it created on the publisher.
Previously, if dropping subscriber-side objects failed,
pg_createsubscriber reset its internal cleanup state too early. As a
result, the exit-time cleanup could skip removing the publication or
replication slot on the publisher.
This could leave pg_createsubscriber-created objects behind on
the publisher after a failed run. That can make a retry harder,
because the leftover publication or replication slot may need to be
removed manually before running pg_createsubscriber again.
In the case of a replication slot, leaving it behind can also retain
WAL files longer than expected.
The cause of this issue was that the flags made_publication and
made_replslot tracking whether pg_createsubscriber created
a publication or replication slot on the primary were incorrectly
reset to false when failures occurred while dropping objects
on the subscriber.
This commit fixes the issue by preventing those cleanup flags from
being reset even when failures occurred while dropping objects
on the subscriber, ensuring proper cleanup of primary objects
before exit on failure.
Backpatch to v17, where pg_createsubscriber was added.
Author: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/CABdArM5V9QKK1PkLY9dpgAcZa3kUp84-wPqPovxvdLOri4=69w@mail.gmail.com
Backpatch-through: 17
Update stale comments and test names in 019_replslot_limit.pl to match
the actual WAL advancement and wal_status checks. Remove a redundant
standby stop in the inactive_since coverage.
Discussion: https://postgr.es/m/CABPTF7XxDonXAcz6DsN6AUJB3swYrZkJHq3UCDaD3Q2H%2Bj0gUA%40mail.gmail.com
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
wait_for_catchup() has "wait for the standby to reach the target LSN"
semantics. However, the previous polling implementation actually waited for
the primary to observe that position via pg_stat_replication.
7e8aeb9e48 introduced the new WAIT FOR LSN-based implementation, which
just probes the standby.
019_replslot_limit.pl relied on the old side effect: its
"slot state changes to extended/unreserved" subtests inspect
primary-side pg_replication_slots, whose wal_status depends on
restart_lsn, which only advances after the walsender processes a
standby reply. Make the test wait on what it actually needs by
replacing each wait_for_catchup() with
wait_for_slot_catchup('rep1', 'restart', primary->lsn('write')).
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/63f6abc9-c0ae-465d-a4e6-667eca6ea008@gmail.com
Author: Xuneng Zhou <xunengzhou@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
EventTriggerOnLogin() tries to clear pg_database.dathasloginevt when
the database no longer has any login event triggers but the flag is
still set. To make that safe against concurrent flag setters, it
takes a conditional AccessExclusiveLock on the database object.
On a hot standby, that lock acquisition fails outright with
FATAL: cannot acquire lock mode AccessExclusiveLock on database
objects while recovery is in progress
because LockAcquireExtended() refuses locks stronger than
RowExclusiveLock on database objects during recovery. The standby
already replays the flag's value from the primary, so the dangling
flag is the result of replaying a state in which the primary had
already dropped its login event triggers but not yet run a login
event trigger pass to clear the flag. Any session connecting to the
standby in that window therefore fails to connect.
Skip the cleanup on a standby. The flag will be cleared via WAL
replay once the primary clears it on its side.
Add a recovery TAP test that reproduces the original report: create
and drop a login event trigger on the primary in one session, wait
for the standby to replay, then verify that a fresh connection to
the standby succeeds.
Backpatch to v17, where the login event triggers were introduced.
Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reported-by: Egor Chindyaskin <kyzevan23@mail.ru>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Discussion: https://postgr.es/m/19488-d7ccfca2bf6b74b0%40postgresql.org
Backpatch-through: 17
Unlike the slotsync worker, whose retry cycles are separated by
transaction boundaries, pg_sync_replication_slots() retries within a
single SQL function call. Per-cycle allocations for slot names, plugin
names, database names, and auxiliary list containers get accumulated
across retries until the function returned. Memory growth is proportional
to the number of retries and remote slots, and the function may wait an
extended period between cycles when slots are slow to persist.
Fix by running each retry cycle in a short-lived memory context
(sync_retry_ctx) that is reset before the next attempt. Additionally,
release tuple slots created with MakeSingleTupleTableSlot() before
clearing the walreceiver result.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CABPTF7VVPxgfYyr8Kyi=+JACjckQ6NpniV9eRtHboj2hMn0REw@mail.gmail.com
QueueFKConstraintValidation() recurses through the partition hierarchy
to queue child constraint validations and to mark child rows as
validated. With a sufficiently deep partition tree, this can result
in a stack-overflow crash. Defend against that as we do elsewhere.
Bug: #19482
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19482-4cc37cbf52d55235@postgresql.org
Backpatch-through: 18
The original code would leave a shared memory segment unreleased if we
fail partway through initialization. Change the shutdown order so that
we always free it.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Antonin Houska <ah@cybertec.at>
Discussion: https://postgr.es/m/agtNn6ZCmdI2KJFn@alvherre.pgsql
pg_get_multixact_stats() uses members_size to report the amount of
storage used by the currently retained multixact members. However,
MultiXactOffsetStorageSize() divided the member count by the number of
members per storage group before multiplying by the group size, so it
was rounding down its result and incorrectly reported zero when there
were few retained members. The calculation is changed to calculate the
same based on the member count.
While on it, this fixes a different issue in the isolation test
multixact-stats. Three fields were defined for checks related to the
oldest offset values, but were not used. The offsets existed in an
older version of the patch than what has been committed. These are
replaced by checks for members_size, checking the new calculation
formula.
Thinkos introduced in 97b101776c.
Author: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/819AC1B2-1A71-4244-B081-3ADD85D1725D@gmail.com
The wording of two error hints is tweaked in this commit:
- Import of extended statistics, where the value of an array element is
not a NULL or a string.
- Online data checksum switch, where a period was missing.
Author: Baji Shaik <baji.pgdev@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CA+fm-RMrKbyky_+vi5SDdAVnFVjWh7zW3GoDAVnrp5OpDnW6tw@mail.gmail.com
Given a WHERE clause like "int[] @@ query_int" or "query_int ~~ int[]"
where the query_int side is a table column having statistics,
_int_matchsel() exited without remembering to free the statistics
tuple. This would typically lead to warnings about cache refcount
leakage, like
WARNING: resource was not closed: cache pg_statistic (73), tuple 42/12 has count 1
It's been wrong since this code was added, in commit c6fbe6d6f.
Bug: #19492
Reported-by: Man Zeng <zengman@halodbtech.com>
Author: Man Zeng <zengman@halodbtech.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19492-ddcd0e22399ef85a@postgresql.org
Backpatch-through: 14
Previously, dblink accepted the use_scram_passthrough option on
foreign-data wrappers via ALTER FOREIGN DATA WRAPPER dblink_fdw
OPTIONS, even though the setting had no effect there.
use_scram_passthrough should be only meaningful for foreign servers
and user mappings, so this commit updates dblink to accept the option
only in those contexts.
Backpatch to v18, where use_scram_passthrough was introduced.
Author: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwEJ8rZjmbOvCicyr4vbuLio082bNTde0WNoSWaWr9wVcg@mail.gmail.com
Backpatch-through: 18
Commit 97f6fc10ff changed postgres_fdw so that user-mapping settings
override foreign server settings for use_scram_passthrough. This commit
applies the same behavior to dblink.
Backpatch to v18, where use_scram_passthrough was introduced.
Author: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwEJ8rZjmbOvCicyr4vbuLio082bNTde0WNoSWaWr9wVcg@mail.gmail.com
Backpatch-through: 18
Previously, when use_scram_passthrough was specified on both a foreign server
and a user mapping, the server-level setting took precedence over the
user-mapping setting. This was inconsistent with the usual semantics of
postgres_fdw options, where foreign server options provide shared defaults
and user mapping options override them on a per-user basis.
This commit updates postgres_fdw so that the user-mapping setting takes
precedence when use_scram_passthrough is specified in both places. This
matches the behavior of other connection options such as sslcert and sslkey.
Backpatch to v18, where use_scram_passthrough was introduced. In v18,
this only affects limited configurations that specify conflicting values
at both the foreign server and user-mapping levels. In such cases, users
would naturally expect the user-mapping setting to override the server-level
setting, so changing the behavior should be minimally disruptive.
Also keeping v18 as the only branch with different semantics for
use_scram_passthrough would be unnecessarily confusing, so backpatch
this fix to v18.
Author: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwEJ8rZjmbOvCicyr4vbuLio082bNTde0WNoSWaWr9wVcg@mail.gmail.com
Backpatch-through: 18
The CHECKPOINT reference page still described checkpoints as flushing
all data files, which could be misleading as it depends on the value
of FLUSH_UNLOGGED option. Update the description to make it clearer
that only data files of permanent relations are flushed by default.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/4855807D-F1CA-44E6-9B58-406691832848@gmail.com
Tab completion for CHECKPOINT options contained FLUSH_UNLOGGED, but
the boolean value was not part of the completion. Fix to make this
consistent with other boolean values.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/4855807D-F1CA-44E6-9B58-406691832848@gmail.com
ALTER TABLE ... SPLIT PARTITION allows a DEFAULT partition to be created
as one of the replacement partitions when the parent table does not
already have one. However, it should not allow the degenerate case where
a non-DEFAULT partition keeps exactly the same bound as the split
partition and the command merely adds a DEFAULT partition through the
SPLIT PARTITION path.
Detect that case by comparing the bound of the split partition with the
bound of the only non-DEFAULT replacement partition, and raise an error
when they are the same. Users should add a DEFAULT partition directly
with CREATE TABLE ... PARTITION OF ... DEFAULT or ALTER TABLE ... ATTACH
PARTITION ... DEFAULT instead.
The comparison goes through the partition operator family rather than
byte equality so that values which are binary-different but compare
equal under the partition key's comparator are treated as the same
bound. The corresponding regression test uses a float8 LIST partition
with -0.0 and 0.0 -- they have different bit patterns but are equal
under float8 -- to verify that a datumIsEqual()-based check would let
the degenerate split through while the partsupfunc-based check
correctly rejects it.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Discussion: https://postgr.es/m/C18878AB-DEB2-4A61-9995-A035DD644B81@gmail.com
The check for the minimum expected bytea size of a MVDependencies object
was using SizeOfItem() for its calculation. This macro uses the number
of attributes in a single dependency.
This minimum size calculation should be based on MinSizeOfItems(), that
computes the minimum expected size as the header plus the
minimally-sized number of dependency items.
Oversight in d08c44f7a4.
Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Discussion: https://postgr.es/m/4b8d299d-2505-4c30-bf80-0f697410db35@tantorlabs.com
Backpatch-through: 14
This reverts commit 0d3dba38c7, which was determined to have
fundamental flaws. This restricts REPACK (CONCURRENTLY) so that only
one process can run it concurrently on different tables and even on
different databases; we'll lift that restriction in another way during
the next development cycle.
Reported-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1Jg21ODQ7fS2fvN5W_S5kDRhAP5inj3XMRQaa=s-GbYhw@mail.gmail.com
When reusing an existing WAL receiver after it has reached
WALRCV_WAITING for new instructions, RequestXLogStreaming() copied
PrimaryConnInfo into WalRcv->conninfo before switching the state to
WALRCV_RESTARTING. At that point ready_to_display could still be true,
so pg_stat_wal_receiver could expose the raw connection string,
including sensitive fields, but it should only show the user-displayable
version of the connection string.
WALRCV_RESTARTING does not establish a new connection. The waiting WAL
receiver reuses its existing connection and only needs a new startpoint
and timeline, so there is no need to copy the raw connection string into
shared memory again. Let's only copy conninfo when launching a new WAL
receiver after WALRCV_STOPPED, not while waiting for instructions.
This commit adds coverage for the case fixed by this commit to the
timeline-switch test by verifying that the WAL receiver conninfo remains
consistent across the jump.
Backpatch all the way down, as this issue is possible since
pg_stat_wal_receiver has been introduced.
Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/EF91FF76-1E2B-4F3B-9162-290B4DC517FF@gmail.com
Backpatch-through: 14
Commit a36164e746 added a CONNECTING status for the WAL receiver, but
pg_stat_wal_receiver returned no information while the connection to the
primary was attempted, limiting the usability of the feature in
high-latency environments where the connection attempt to the primary
could take time.
This commit improves the report of the status by splitting the way the
shared memory state of the WAL receiver is filled before and after the
connection to the primary is attempted with walrcv_connect():
- Before the attempt, reset all the connection fields, switch
ready_to_display to true.
- After the attempt, fill in the connection fields.
This change means two spinlock acquisitions instead of one, but at least
monitoring tools can know about the connection attempt before its
completion, enlarging the usability of the feature. This code path is
taken only once when a WAL receiver is spawned, so the extra acquisition
does not matter performance-wise.
Reported-by: Chao Li <li.evan.chao@gmail.com>
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/EF91FF76-1E2B-4F3B-9162-290B4DC517FF@gmail.com
Commit 112faf1378 added custom notice receivers for replication,
postgres_fdw, and dblink so that remote NOTICE, WARNING, and similar
messages are reported via ereport(). However, those notice receivers were
installed only after libpqsrv_connect() and libpqsrv_connect_params()
returned, by which point libpq connection startup had already completed.
As a result, messages emitted during connection establishment could be
missed.
This commit fixes the issue by splitting libpqsrv_connect() and
libpqsrv_connect_params() into separate start and complete phases:
libpqsrv_connect_start(), libpqsrv_connect_params_start(), and
libpqsrv_connect_complete(). This allows callers to perform
per-connection setup, such as installing a notice receiver, after the
connection has been started but before startup completes.
Note that callers of libpqsrv_connect_start() and
libpqsrv_connect_params_start() must still call
libpqsrv_connect_complete(), even if the start function returns NULL, so
that any external FDs reserved during startup are released properly.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com>
Discussion: https://postgr.es/m/A2B8B7DE-C119-492F-A9FA-14CF86849777@gmail.com
The documentation states that NOT NULL constraints on partitioned tables
are always inherited by all partitions, and therefore cannot be declared
NO INHERIT. While a check already existed to reject creating such
constraints with NO INHERIT, previously the same check was missing for
ALTER TABLE ... ALTER CONSTRAINT ... NO INHERIT.
This commit adds the missing check so that attempting to set NO INHERIT
on a partitioned NOT NULL constraint now fails.
Backpatch to v18, where ALTER TABLE ... ALTER CONSTRAINT ... [NO] INHERIT
was added.
Author: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/ecc985ad-6ec1-4094-a315-317943ca5f3f@proxel.se
Backpatch-through: 18
ALTER TABLE ... SPLIT PARTITION allows a DEFAULT partition to be created
as one of the replacement partitions when the parent table does not
already have one. However, it should not allow the degenerate case where
a non-DEFAULT partition keeps exactly the same bound as the split
partition and the command merely adds a DEFAULT partition through the
SPLIT PARTITION path.
Detect that case by comparing the bound of the split partition with the
bound of the only non-DEFAULT replacement partition, and raise an error
when they are the same. Users should add a DEFAULT partition directly
with CREATE TABLE ... PARTITION OF ... DEFAULT or ALTER TABLE ... ATTACH
PARTITION ... DEFAULT instead.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Discussion: https://postgr.es/m/C18878AB-DEB2-4A61-9995-A035DD644B81@gmail.com
Commit 263d1e6dfe changed pg_recvlogical to honor source cluster file
permissions when creating output files. This commit adds tests verifying
that output files are created with mode 0600 when the source cluster is
initialized without group access, and with mode 0640 when group access is
enabled.
Author: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwHhpizYzMo3nFP4GkNMueSNMY3QfC-gBN1VTXtuiANDvw@mail.gmail.com
Commit c37b3d08ca attempted to preserve group permissions on pg_recvlogical
output files when group access was enabled on the source cluster. However,
the output files were still created with a fixed S_IRUSR | S_IWUSR mode,
preventing group-read permissions from being applied.
This commit fixes the issue by creating output files with pg_file_create_mode
instead of a hard-coded mode. This allows pg_recvlogical to correctly preserve
group permissions from the source cluster.
Backpatch to all supported branches.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwHhpizYzMo3nFP4GkNMueSNMY3QfC-gBN1VTXtuiANDvw@mail.gmail.com
Backpatch-through: 14
When the launching backend of REPACK (CONCURRENTLY) is terminated via
pg_terminate_backend(), ProcDiePending causes ereport(FATAL) which
bypasses PG_FINALLY blocks. As a result, stop_repack_decoding_worker()
is never called, leaving the decoding worker running indefinitely and
holding its temporary replication slot.
Fix by using PG_ENSURE_ERROR_CLEANUP, which handles both ERROR and
FATAL exits.
Author: Baji Shaik <baji.pgdev@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CA+fm-RNoPxL2N7db_A0anMXV_aDu6jWj4PNOPtMtBUAPDPvSXQ@mail.gmail.com
The documentation said that the bounds of new partitions should not
overlap and that their combined bounds should equal the bounds of the
split partition. That is misleading when a new DEFAULT partition is
specified, because the explicit partitions may cover only part of the
split partition while the DEFAULT partition covers the rest.
Clarify that new non-DEFAULT partition bounds must not overlap with
other new or existing partitions and must be contained within the bounds
of the split partition. Also state that the combined bounds must exactly
match the split partition only when no new DEFAULT partition is specified.
While here, improve nearby wording about hash-partitioned target tables
and splitting a DEFAULT partition with the same partition name.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Discussion: https://postgr.es/m/C18878AB-DEB2-4A61-9995-A035DD644B81@gmail.com
When ALTER TABLE ... SPLIT PARTITION specifies a DEFAULT partition, the
explicit partitions do not need to cover the split partition's bound
exactly. They may cover only part of it, with the DEFAULT partition
covering the remaining range.
However, the existing hint said that the combined bounds of the new
partitions must exactly match the bound of the split partition, which is
misleading for this case and inconsistent with the code comment.
Fix the hint to state the actual requirement: explicit partition bounds
must stay within the bounds of the split partition when a DEFAULT
partition is specified.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Discussion: https://postgr.es/m/C18878AB-DEB2-4A61-9995-A035DD644B81@gmail.com
When splitting a range partition and defining a new DEFAULT partition, the
validation checked the lower bound of the first explicit partition and the
upper bound of explicit partitions only when they were not first. If there
was exactly one explicit non-DEFAULT partition, its upper bound was therefore
not checked.
This could allow the replacement partition to extend beyond the upper bound
of the partition being split, potentially overlapping another existing
partition.
Fix this by checking the upper bound whenever the explicit partition is the
last one. Add a regression test covering the single explicit partition plus
DEFAULT case.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Zhenwei Shang <a934172442@gmail.com>
Reviewed-by: Dmitry Koval <d.koval@postgrespro.ru>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Discussion: https://postgr.es/m/C18878AB-DEB2-4A61-9995-A035DD644B81@gmail.com
When using COPY FROM ... ON_ERROR SET_NULL with a selective column list, the
domain_with_constraint array was incorrectly allocated based on the length of
the target column list. While the array was populated sequentially,
CopyFromTextLikeOneRow attempted to access it using the physical attribute
index (attnum - 1). This mismatch caused out-of-bounds reads when targeting
high-numbered columns, allowing NULL values to bypass NOT NULL domain checks
and be silently inserted.
Fix by allocating the array to match the total number of physical attributes
(num_phys_attrs) and indexing via attnum - 1, bringing it into alignment with
other per-column arrays in BeginCopyFrom.
Author: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDdej0c0gWJi2FnbirzhgzyZNPiTwC1P5B_-dSNCzq-91A@mail.gmail.com
The macro for enabling single-copy atomicity on i586+ when using
GCC has been incorrect since 2017 (commit e8fdbd58f) without any
complaints, and getting it to work is non-trivial.
Getting this to work reliably require C11 atomics, which in turn
also bumps the required MSVC version. For now, simply remove the
attempted support which doesn't work anyways.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reported-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAKZiRmycHOOJyEPc9FUss1_69_U62WoSx32jT7wyES-YkStZKA@mail.gmail.com
Discussion: https://posrgr.es/m/CA+hUKGKFvu3zyvv3aaj5hHs9VtWcjFAmisOwOc7aOZNc5AF3NA@mail.gmail.com
This comment was originally added to RegisterInvalid() in POSTGRES before
Postgres95, and came in via the Postgres95 import. It has been obsolote
for quite some time so remove.
Author: Steven Niu <niushiji@highgo.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/MN2PR15MB30219837B2381AE2518A4C45A7FCA@MN2PR15MB3021.namprd15.prod.outlook.com
ParseVariableDouble missed returning false after logging an error when
the parsed value exceeded max, making the value assigned rather than
rejected. Backpatch down to v18 where this was introduced as part of
the \WATCH_INTERVAL.
Author: Sven Klemm <sven@tigerdata.com>
Co-authored-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAMCrgp31p_5SDVi7dwnP39tTW5icQ0MWHA+N4kJdXgkL0PEy8w@mail.gmail.com
Backpatch-through: 18
The error message for incorrect oauth validator configuration was missing
a quote character. OAuth was introduced in v18 but there is no need for a
backpatch since this was introduced in 22f9207aaa.
Author: Jonathan Gonzalez V. <jonathan.abdiel@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/ff9b84b9e6d5a3fef1f320ee5d63ec7dae722739.camel@gmail.com
This commit addresses some defects with the handling of expressions in
pg_restore_extended_stats() and pg_clear_extended_stats():
- Misleading WARNING for an incorrect number of expressions, where the
number of required expressions was reported as the number of elements
given in input rather than the actual number of expressions expected by
the extstats object definition.
- Incorrect matching of expression names, where a key name was
considered as valid as long as it matched with the prefix of a legit key
name. For example "correlatio" given in input would match with
"correlation", and be considered valid. The consequence of this bug was
a silent discard of the input data, where the operation would be
considered a success. The value associated to the prefixed key was not
inserted in the catalogs, just ignored. pg_dump would not generate such
input data patterns, but a user doing manual stats injection could.
- Missing heap_freetuple() in pg_clear_extended_stats(), for the case
where the extstats object in input does not match with its parent
relation.
Author: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/A7C11B83-7534-4A09-9071-FBD09175CFC8@gmail.com
Previously, REPACK option parsing had two bugs.
First, REPACK (CONCURRENTLY OFF) failed with:
ERROR: unrecognized REPACK option "concurrently"
while CONCURRENTLY ON was accepted correctly.
Second, when the same option was specified multiple times, the last value
specified was not always honored. If any occurrence set the option to ON,
the option was treated as enabled even when the final setting was OFF.
This commit fixes these issues by correctly accepting CONCURRENTLY
regardless of its value, and by making the last specified value take precedence
when an option appears multiple times.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CAHGQGwGAY4kfDtC4i+hAOX-a3u0yOA6__6EDTQz-ytsDHgh-yQ@mail.gmail.com
The IGNORE NULLS implementation caches whether a window function argument
evaluated to NULL or NOT NULL for a given partition row. That is safe for
ordinary expressions, but not for volatile expressions, where evaluating the
same argument on the same row can produce a different NULL/NOT NULL result
later.
This could produce wrong results in two ways. A row previously cached as
NULL could be skipped even though a later evaluation would return NOT NULL.
Conversely, a row cached as NOT NULL could be chosen as the target row, then
re-evaluated to fetch the actual value and return NULL.
Make the nullness cache conditional per argument. Do not use it for
arguments containing volatile functions or subplans, following the same
conservative approach used for moving window aggregates. Also avoid
re-evaluating non-cacheable partition arguments after the scan has already
found the target row.
Add regression tests covering volatile arguments and subplan arguments with
IGNORE NULLS.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Tatsuo Ishii <ishii@postgresql.org>
Discussion: https://postgr.es/m/42B42506-6972-4266-8422-FB73E61D9DA7@gmail.com
This commit moves the definitions of InjectionPointConditionType and
InjectionPointCondition into a new header local to the test module
injection_points.h, so as these can be shared across more files in the
module. A patch for a bug fix is under discussion, whose proposed test
will benefit from this refactoring.
Backpatch down to where the module exists, as this should be useful for
future bug fixes, even cases unrelated to the thread where this change
has been discussed.
Author: Andrey Borodin <x4mmm@yandex-team.ru>
Author: Vlad Lesin <vladlesin@gmail.com>
Discussion: https://postgr.es/m/d2983796-2603-41b7-a66e-fc8489ddb954@gmail.com
Backpatch-through: 17
Three locations use Assert() to guard against a mismatch between the
number of columns advertised in the RELATION message and the number
actually received in the subsequent INSERT/UPDATE tuple message. Since
these values originate from the publisher, the check must survive into
production builds.
A malicious or buggy publisher can send a RELATION claiming N columns
and an INSERT claiming M < N columns. The subscriber's apply worker
indexes into colvalues[]/colstatus[] using column indices from the
RELATION message's attribute map, causing a heap out-of-bounds read when
the tuple's column array is smaller than expected. We've looked, without
success, for a scenario in which the publisher holds sufficient control
over these out-of-bounds bytes to exploit this or even to reach a
SIGSEGV. Despite not finding one, the code has been fragile. Back-patch
to v14 (all supported versions).
Reported-by: Varik Matevosyan <varikmatevosyan@gmail.com>
Author: Varik Matevosyan <varikmatevosyan@gmail.com>
Discussion: https://postgr.es/m/CA+bBoog3cCogktzfLb9bppUByu-10B3CFp8u=iKXG_OvtAguCw@mail.gmail.com
Backpatch-through: 14
There is now only one caller of ProcessStartupPacket(). Let's simplify
the routine so as the GSS and SSL states are tracked inside it. If
future callers are added, there is less guessing to do.
Suggested-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/aga7lCWluyc5zLb5@paquier.xyz
In some cases its necessary to understand whether TSC frequency data was
sourced from CPUID, and which of the registers. Show this debug info at
the end of pg_test_timing, and rework TSC functions to support that.
This would have helped debug the buildfarm report fixed in 7fc36c5db5
and is likely going to aid in any TSC-related issues reported during the
beta period or later.
Additionally, emit a warning if TSC frequency from calibration differs
by more than 10% from the TSC frequency in use, and suggest the use
of timing_clock_source = 'system'.
In passing, add an explicit early return in the output function if the
loop count is zero. This can't happen in practice, but coverity complained
because we unconditionally call output for the fast TSC measurement.
Author: Lukas Fittl <lukas@fittl.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (coverity fix only)
Discussion: https://postgr.es/m/CAP53Pkw3Gzb+KTF5pu_o7tzbfZ7+qm2m6uDWuGtTJjZpV9yNpg@mail.gmail.com
Commit 28972b6fc ("Add support for importing statistics from remote
servers.") stored the names of local/remote columns for a foreign table
into the buffers of NAMEDATALEN bytes in this structure, without
accounting for the possibility that the remote column name in particular
could be longer than NAMEDATALEN - 1. If it was longer than that, this
would leave it unterminated/truncated in the buffer, invoking undefined
behavior when match_attrmap() processes it, which assumes that it's
fully-contained/terminated in the buffer.
To fix, replace the buffers with char pointers, pstrdup the local/remote
column names, and store the results into the pointers. This commit also
adds a function to clean up the nested data structure.
Per Coverity and Tom Lane.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Discussion: https://postgr.es/m/342868.1776017700%40sss.pgh.pa.us
Previously, the subscription setting retain_dead_tuples didn't cause
ALTER SUBSCRIPTION ... SERVER to check the publisher. And if the
publisher was checked for some other reason, then it would use the old
conninfo.
Fix ALTER SUBSCRIPTION ... SERVER to always check the publisher when
retain_dead_tuples is set, and to use the new connection info, like
ALTER SUBSCRIPTION ... CONNECTION.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/f13a8e29410bbbf9999290f2c04513a8884fa51c.camel@j-davis.com
These tests have been removed by 906ea101d0, due to some of them being
unstable in the buildfarm with low max_stack_depth values. They are now
reworked so as they should be more portable.
The tests to cover the findoprnd() overflows use a balanced tree to
avoid using too much stack, per a suggestion and an investigation by Tom
Lane.
Note: This is initially applied only on HEAD; a backpatch will follow
should the buildfarm be fine with the situation.
Discussion: https://postgr.es/m/agZc6XecyE7E7fep@paquier.xyz
Backpatch-through: 14
Previously, tab completion for REPACK parenthesized boolean options
(ANALYZE, CONCURRENTLY, and VERBOSE) did not suggest the boolean values
ON and OFF, unlike VACUUM.
This commit fixes the issue by adding ON/OFF completion for those options.
Author: Baji Shaik <baji.pgdev@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CA+fm-RNZpy7MAceR9gSyy833H_uL-fTx0LxO73RnvwEaprpuRA@mail.gmail.com
When an UPDATE statement triggers check_foreign_key() with the
action set to "cascade", it generates more UPDATE statements to
modify the key values in referencing relations. If a new key value
is NULL, SPI_getvalue() returns a NULL pointer, which is
subsequently passed to quote_literal_cstr(), causing a segfault.
To fix, skip quoting when a new key value is NULL and insert an
unquoted NULL keyword instead.
Oversight in commit 260e97733b. While the refint documentation
recommends marking primary key columns NOT NULL, the aforementioned
scenario accidentally worked on platforms where snprintf()
substitutes "(null)" for NULL pointers. Note that for
character-type columns, the old code quoted "(null)" as a string
literal, so this didn't always produce correct results. But it
still seems better to fix this than to reject cases that previously
worked.
Reported-by: Nikita Kalinin <n.kalinin@postgrespro.ru>
Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: Pierre Forstmann <pierre.forstmann@gmail.com>
Discussion: https://postgr.es/m/19476-bd04ea6241345303%40postgresql.org
Backpatch-through: 14
Commit 4bea91f21f enabled COPY TO on a partitioned table to read
tuples from its partitions and mapped them to the root table's tuple
descriptor before output. However, it incorrectly built the attribute
map from the root table to the partition.
This commit fixes by building the attribute map from the partition to
the root table, ensuring that partition attributes are correctly
mapped to their corresponding root attributes.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/85EA70F3-C3DB-477B-B856-EA569FDAAE7C@gmail.com
Commit b7b0f3f272 ("Use streaming I/O in sequential scans") routed
sequential scans through read_stream_next_buffer(), bypassing the
RELATION_IS_OTHER_TEMP() check in ReadBufferExtended(). As a result,
a superuser can attempt to read or modify temp tables of other
sessions through the read-stream path. When the query plan uses no index,
SELECT/UPDATE/DELETE/MERGE silently see no rows / report zero affected rows,
and COPY produces an empty output -- because the buffer manager has no
visibility into the owning session's local buffers and silently returns
nothing. Any query plan that uses, for instance, a btree index
still errors out via the existing check in ReadBufferExtended(), which
is reached from hio.c and nbtree respectively, but this is incidental.
Fix by enforcing RELATION_IS_OTHER_TEMP() at the three additional
buffer-manager entry points:
- read_stream_begin_impl() rejects the read at stream setup time,
covering sequential and bitmap scans that go through the
read-stream path.
- ReadBuffer_common() becomes the canonical place for the check,
consolidating the existing one previously kept in
ReadBufferExtended(). All ReadBufferExtended() callers go through
ReadBuffer_common(), so the consolidation is behavior-preserving.
- StartReadBuffersImpl() catches direct callers of StartReadBuffers()
that bypass both of the above. This is currently defense-in-depth,
but documents the contract for future code.
The companion test in src/test/modules/test_misc was added in the
preceding commit; this commit updates the assertions for SELECT,
UPDATE, DELETE, MERGE, and COPY (which previously documented the
bug as silent success) to expect the new error.
Author: Jim Jones <jim.jones@uni-muenster.de>
Author: Daniil Davydov <3danissimo@gmail.com>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Soumya S Murali <soumyamurali.work@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAJDiXghdFcZ8%3Dnh4G69te7iRr3Q0uFyXxb3ZdG09_GTNZXwH0g%40mail.gmail.com
Backpatch-through: 17
Add a TAP test in src/test/modules/test_misc that documents what
happens when one session attempts to read or modify another session's
temporary table. This commit only adds tests; it does not change
backend behavior, so the assertions reflect current behavior:
- SELECT, UPDATE, DELETE, MERGE, COPY on a table without an index
silently succeed with no error and zero rows / zero affected rows.
These commands run through the read-stream path, which currently
bypasses the RELATION_IS_OTHER_TEMP() check. This is the
underlying bug to be fixed in a follow-up.
- INSERT errors with "cannot access temporary tables of other
sessions" because hio.c calls ReadBufferExtended() to find a page
with free space and is caught by the existing check there.
- Index scan errors via the same existing check, reached through
nbtree -> ReadBuffer -> ReadBufferExtended.
- TRUNCATE / ALTER TABLE / ALTER INDEX / CLUSTER fail with their
command-specific error messages.
- VACUUM is silently skipped to avoid noise during database-wide
VACUUM (vacuum_rel() returns without warning).
- DROP TABLE is intentionally allowed: DROP does not touch the
table's contents, and autovacuum relies on this to clean up
temp relations orphaned by a crashed backend.
- ALTER FUNCTION / DROP FUNCTION on an owner-created function over
its own temp row type work as catalog operations -- they don't
read the underlying data.
- CREATE FUNCTION from a separate session, using another session's
temp row type as an argument, is allowed but emits a NOTICE: the
function is moved into the creator's pg_temp namespace with an
auto-dependency on the borrowed type, so it disappears together
with the session that created it.
- A bare DROP TABLE on a temp table that has a cross-session
dependent function fails with a catalog-level dependency error.
- LOCK TABLE in ACCESS SHARE mode on another session's temp table
succeeds and properly blocks the owner's session-exit cleanup
(which acquires AccessExclusiveLock via findDependentObjects).
This exercises the same LockRelationOid path used by autovacuum
when cleaning up orphaned temp relations.
- When the owner session ends, the normal session-exit cleanup
cascades through DEPENDENCY_NORMAL and removes both the temp
objects and any cross-session functions that depended on them.
Also, document the contract for RELATION_IS_OTHER_TEMP() so that
future buffer-access entry points enforce the same rule.
Backpatch this through PostgreSQL 17, where b7b0f3f272 introduces a code
path bypassing this check.
Author: Jim Jones <jim.jones@uni-muenster.de>
Author: Daniil Davydov <3danissimo@gmail.com>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Soumya S Murali <soumyamurali.work@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAJDiXghdFcZ8%3Dnh4G69te7iRr3Q0uFyXxb3ZdG09_GTNZXwH0g%40mail.gmail.com
Backpatch-through: 17
build_remattrmap() deparses a list of remote column names for a query
that retrieves attribute stats for them from the remote server.
Previously, it did so by using the array-literal syntax with each column
name individually quoted by quote_identifier(), causing the query to
fail on the remote server with a syntax error or no results when that
column name included a single quote or backslash, as quote_identifier()
doesn't escape those characters, making the query invalid or incorrect.
Fix by switching from the array-literal syntax to the ARRAY constructor
syntax with each column name individually quoted by
deparseStringLiteral().
Oversight in commit 28972b6fc.
Reported-by: Satya Narlapuram <satyanarlapuram@gmail.com>
Reported-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: Alex Guo <guo.alex.hengchen@gmail.com>
Reviewed-by: Zhenwei Shang <a934172442@gmail.com>
Reviewed-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Discussion: https://postgr.es/m/CAHg%2BQDc9%3DWtYi%3DJW6QUL6ASOJc6PcGPTuxoMkhnkQ7oi7j5atg%40mail.gmail.com
Discussion: https://postgr.es/m/CAJTYsWWGhVDFjr%2BsmdYdU-Q_TT9YMzXA4QcLCr7rizDOyrEEow%40mail.gmail.com
The jsonpath .split_part() method passed its field-position argument
through numeric_int4(), that can fail hard if called directly.
This commit switches the code to use numeric_int4_safe() with an error
context for soft reporting, so as the overflow and zero field-position
cases can be handled in silent mode.
Oversight in bd4f879a9c.
Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/FCF996D0-580B-431C-8DE1-A540C58E444C@gmail.com
When pgbench runs with multiple threads and verbose error reporting is
enabled (--verbose-errors), multiple clients can build verbose error
messages concurrently. Previously, a function-local static
PQExpBuffer was used for these messages, causing the buffer to be
shared across threads. This was not thread-safe and could result in
corrupted or incorrect log output.
Fix this by using a local PQExpBufferData instead of a static buffer.
This keeps verbose error messages correct during concurrent execution.
Backpatch to v15, where this issue was introduced.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Alex Guo <guo.alex.hengchen@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwER1AjGXpkKB9t9820NBhMQ_Ghv7=HsKeodUr3=SZsF4g@mail.gmail.com
Backpatch-through: 15
Use consistent "REPACK (CONCURRENTLY)" naming in errhint messages,
matching the actual command syntax and the errmsg text used elsewhere
in the same file. Also improve the ereport() after XLogReadRecord
failure to be like others in the tree.
While at it, remove direct mentions of the DDL in the translatable
strings, both in the same errhint() calls as well as some errmsg()
calls. Add periods where missing.
There are all oversights in 28d534e2ae.
Reported-by: Baji Shaik <baji.pgdev@gmail.com>
Discussion: https://postgr.es/m/CA+fm-RPxX1xTcYY4qQGPRDXB2-Fy2SDNdZi=zVjr0j=MPg2PaA@mail.gmail.com
"egrep" has never been in POSIX; the standard way to access this
functionality is "grep -E". Recent versions of GNU grep have
started to warn about this, so stop using "egrep".
This could be back-patched, but I see little need to do so
because the affected places are not code that runs during
normal builds. (Perhaps src/backend/port/aix/mkldexport.sh
is an exception, but let's wait to see if any AIX users
complain before touching that.)
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/473272.1778685870@sss.pgh.pa.us
Run renumber_oids.pl to move high-numbered OIDs down, as per pre-beta
tasks specified by RELEASE_CHANGES. For reference, the command was
./renumber_oids.pl --first-mapped-oid 8000 --target-oid 6400
(but there were already some used OIDs at 6400, so the first one
actually assigned was 6434).
Update typedefs.list from the buildfarm, and run pgindent.
The changes from the new typedefs list are pretty minimal,
since we'd been pretty good (not perfect) about updating
typedefs.list by hand. But the pgindent behavior changes
installed by a3e6beba6, b518ba4af, and 60f9467c3 add up
to make this a relatively sizable diff.
Enforce this standard formatting of multiline comments that start
in column 1:
/*
* line 1
* line 2
*/
Unlike indented comments, we don't reconsider line breaks, except
for forcing the initial /* and trailing */ onto their own lines.
We do make each line start with " *", with some whitespace following.
We preserve pgindent's existing behavior of not touching comments
that begin with /**... or /*-... Also, if the first line looks like
/* === or /* ---, we don't split that line; similarly for the last
line.
The vast majority of multiline comments in our tree already look
like this, but this change will clean up some stragglers.
Author: Aleksander Alekseev <aleksander@tigerdata.com>
Reported-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAJ7c6TPQ0kkHQG-AqeAJ3PV_YtmDzcc7s%2B_V4%3Dt%2BxgSnZm1cFw%40mail.gmail.com
Discussion: https://postgr.es/m/EB0141C5-ACC2-4F0B-85EA-0E3AFBCE322F@umbc.edu
Formatting of variadic functions and struct literals with named fields
used to be ugly due to pg_bsd_indent treating period as always being a
binary operator. After a comma, it's not that, so insert a space.
Bump pg_bsd_indent's version so that people who use out-of-tree
copies will know they need to update. (This also covers the other
pg_bsd_indent behavioral change introduced in a3e6beba6.)
Author: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/c3327be8-09e2-46a1-88b4-228a339d6916@proxel.se
When a struct member name matches a registered typedef, pgindent
removes the space after "!=" (and some other operators), like so:
entry->dsh.dsa_handle !=DSA_HANDLE_INVALID
The problem is that the related code in lexi.c sets last_u_d to
true before jumping to found_typename, causing the next operator to
be classified as unary and suppressing the following space. This
is correct for type names, but not for struct members. For
example, "Datum *x" needs "*" to be unary to suppress the space
before "x". To fix, only set last_u_d before jumping to
found_typename if the typedef name doesn't appear after "." or
"->".
Note that this does not bump INDENT_VERSION. We'll do that just
once after some other changes to pg_bsd_indent are committed.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/aS9hkwnkWf3dZIA_%40nathan
Both UPDATE and DELETE were failing to test that the application-time
column was updatable. The column is not part of
perminfo->updatedCols, because it should not be checked for
permissions. And it needs to be checked in the DELETE case as well,
since we might insert leftovers with a value for that column.
Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Co-authored-by: jian he <jian.universality@gmail.com>
Discussion: https://www.postgresql.org/message-id/CACJufxFRqg8%3DgbZ-Q6ZS_UQ%2BYdwfZpk%2B9rf7jgWrk8m4RMUm%3DA%40mail.gmail.com
As mentioned in 8268e41aca, pgss_ProcessUtility() may free the
PlannedStmt after an internal ROLLBACK. This commit sets the
PlannedStmt "pstmt" to NULL once it is no longer safe to rely on it,
making bugs similar to the one fixed by the previous commit easier to
detect.
Suggested-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/0A9A8DAC-BC3C-4C7A-9504-2C6050405544@anarazel.de
Two cases fixed by 2b5ba2a0a1 were not covered, to emulate the
handling of corrupted data, for:
- set control bit with a valid 2-byte match tag where offset is 0.
- set control bit with a valid 2-byte match tag where offset exceeds
output written.
Oversight in 67d318e704.
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/agF4xkIdRcrCIprs@paquier.xyz
Backpatch-through: 14
Previously, pg_stat_progress_copy in the subscriber could continue to show
the initial COPY operation for logical replication table synchronization as
active even after the data copy had finished. The stale progress entry
remained visible until synchronization caught up with the publisher.
This happened because the table synchronization code called BeginCopyFrom()
and CopyFrom(), but failed to call EndCopyFrom() afterward.
This commit fixes the issue by adding the missing EndCopyFrom() call so that
the COPY progress state in the subscriber is cleared as soon as the initial
data copy completes.
Backpatch to all supported branches.
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: ChangAo Chen <cca5507@qq.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAOzEurQKuy3RiPkd=25PEwEzaqHuGvEOf=X7vaVzhgNjaukYzA@mail.gmail.com
Backpatch-through: 14
Oleg's original comment was intelligible only to him.
Aleksander has reverse-engineered what seems like a plausible
explanation of what the code is trying to do, so replace the
comment with that. (Also, re-order the final expression to
match the new comment.)
In passing, this makes the comment satisfy our usual formatting
conventions. pgindent has let it pass as-is so far, but planned
changes would mess it up without some sort of intervention.
Author: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAJ7c6TO0xvunpeOv89i1eKQBhKF9=GEETkTz+yAGs1xGYH25MQ@mail.gmail.com
pgss_ProcessUtility() included a reference to a portion of a PlannedStmt
after the point where this data's structure could have been freed,
causing an incorrect memory access. There was a comment documenting
this requirement, missed in 3357471cf9.
This commit includes a test able to make valgrind complain with a
PlannedStmt freed by an internal ROLLBACK query. Similarly to what is
mentioned in 495e73c207, this can be triggered by using the extended
query protocol, something that can be now tested thanks to the recent
meta-command additions in psql. This commit mentions potential other
cases, but as far as I can see the extended protocol case with an
internal ROLLBACK is the only problematic pattern reachable in practice.
Issue introduced by 3357471cf9, gone unnoticed due to a lack of test
coverage. The fix is authored by Chao, my contribution being the new
test.
Author: Chao Li <li.evan.chao@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/2F91906A-F2B5-4A6B-9695-D136957D4545@gmail.com
REPACK replay builds scan keys for the replica identity index, but it
hard-coded BTEqualStrategyNumber when looking up the equality operator.
That is not correct for non-btree identity indexes, such as the GiST
indexes created for WITHOUT OVERLAPS primary keys. In addition,
find_target_tuple() accepted the first tuple returned by the identity
index scan, which is unsafe for lossy index scans because the index AM may
return false positives with xs_recheck set.
Fix this by using IndexAmTranslateCompareType() to translate COMPARE_EQ
to the equality strategy number for the index AM, and by continuing the
scan when recheck is required until a candidate tuple matches the locator
tuple on all replica identity key columns.
The recheck uses the same equality operator functions as the identity
index scan keys, preserving ScanKey argument ordering.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/7B0EC0EC-5461-41EF-9B31-F9BBE608DEA5@gmail.com
These checks are failing in the buildfarm, reporting stack overflows
rather than the expected errors, though seemingly only on ppc64 and
s390x platforms. Perhaps there is something off about our tests
for stack depth on those architectures? But there's no time to
debug that right now, and surely these tests aren't too essential.
Revert for now and plan to revisit after the release dust settles.
Backpatch-through: 14
Security: CVE-2026-6473
Maliciously crafted key value updates could achieve SQL injection
within check_foreign_key(). To fix, ensure new key values are
properly quoted and escaped in the internally generated SQL
statements. While at it, avoid potential buffer overruns by
replacing the stack buffers for internally generated SQL statements
with StringInfo.
Reported-by: Nikolay Samokhvalov <nik@postgres.ai>
Author: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Security: CVE-2026-6637
Backpatch-through: 14
When result_is_int is set to 0, PQfn() cannot validate that the
result fits in result_buf, so it will write data beyond the end of
the buffer when the server returns more data than requested. Since
this function is insecurable and obsolete, add a warning to the top
of the pertinent documentation advising against its use.
The only in-tree caller of PQfn() is the frontend large object
interface. To fix that, add a buf_size parameter to
pqFunctionCall3() that is used to protect against overruns, and use
it in a private version of PQfn() that also accepts a buf_size
parameter.
Reported-by: Yu Kunpeng <yu443940816@live.com>
Reported-by: Martin Heistermann <martin.heistermann@unibe.ch>
Author: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Security: CVE-2026-6477
Backpatch-through: 14
If you accumulate many arrays full of NULLs, you could overflow
'nitems', before reaching the MaxAllocSize limit on the allocations.
Add an explicit check that the number of items doesn't grow too large.
With more than MaxArraySize items, getting the final result with
makeArrayResultArr() would fail anyway, so better to error out early.
Reported-by: Xint Code
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 14
Security: CVE-2026-6473
pg_locale_icu.c was full of places where a very long input string
could cause integer overflow while calculating a buffer size,
leading to buffer overruns.
It also was cavalier about using char-type local arrays as buffers
holding arrays of UChar. The alignment of a char[] variable isn't
guaranteed, so that this risked failure on alignment-picky platforms.
The lack of complaints suggests that such platforms are very rare
nowadays; but it's likely that we are paying a performance price on
rather more platforms. Declare those arrays as UChar[] instead,
keeping their physical size the same.
pg_locale_libc.c's strncoll_libc_win32_utf8() also had the
disease of assuming it could double or quadruple the input
string length without concern for overflow.
Reported-by: Xint Code
Reported-by: Pavel Kohout <pavel.kohout@aisle.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 14
Security: CVE-2026-6473
pg_rewind and pg_basebackup could be fed paths from rogue endpoints that
could overwrite the contents of the client when received, achieving path
traversal.
There were two areas in the tree that were sensitive to this problem:
- pg_basebackup, through the astreamer code, where no validation was
performed before building an output path when streaming tar data. This
is an issue in v15 and newer versions.
- pg_rewind file operations for paths received through libpq, for all
the stable branches supported.
In order to address this problem, this commit adds a helper function in
path.c, that reuses path_is_relative_and_below_cwd() after applying
canonicalize_path(). This can be used to validate the paths received
from a connection point. A path is considered invalid if any of the two
following conditions is satisfied:
- The path is absolute.
- The path includes a direct parent-directory reference.
Reported-by: XlabAI Team of Tencent Xuanwu Lab
Reported-by: Valery Gubanov <valerygubanov95@gmail.com>
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6475
A few functions in this file were incautious about multiplying a
possibly large integer by a factor more than 1 and then using it as
an allocation size. This is harmless on 64-bit systems where we'd
compute a size exceeding MaxAllocSize and then fail, but on 32-bit
systems we could overflow size_t, leading to an undersized
allocation and buffer overrun. To fix, use palloc_array() or
mul_size() instead of handwritten multiplication.
Reported-by: Sven Klemm <sven@tigerdata.com>
Reported-by: Xint Code
Author: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tatsuo Ishii <ishii@postgresql.org>
Security: CVE-2026-6473
Backpatch-through: 14
This omission allowed roles to create multirange types in any
schema, potentially leading to privilege escalations. Note that
when a multirange type name is not specified in CREATE TYPE, it is
automatically placed in the range type's schema, which is checked
at the beginning of DefineRange().
Reported-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Security: CVE-2026-6472
Backpatch-through: 14
drop_existing_subscription() neglected to escape the subscription
name when generating its query string. To fix, use
PQescapeIdentifier() to construct a properly escaped name, and use
it in the ALTER SUBSCRIPTION and DROP SUBSCRIPTION commands.
Reported-by: Yu Kunpeng <yu443940816@live.com>
Author: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Security: CVE-2026-6476
Backpatch-through: 17
The SQL functions for the restore of attribute and expression statistics
accept "most_common_vals" and "most_common_freqs" as independent arrays.
The planner assumes these have the same number of elements, but it was
possible to insert in the catalogs data that would cause an over-read
when the catalog data is loaded in the planner.
There were two holes in the stats restore logic:
- Both arrays should match in size.
- The input array must be one-dimensional, and it should match with what
is delivered by pg_dump when scanning the pg_stats catalogs.
The multivariate extended statistics MCV path (import_mcv) already
validated these inputs via check_mcvlist_array(), and is not affected.
These problems exist in v18 and newer versions for the restore of
attribute statistics. These problems affect only HEAD for the restore
of the expression statistics.
Reported-by: Jeroen Gui <jeroen.gui1@proton.me>
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Security: CVE-2026-6575
Backpatch-through: 18
Although pg_strftime() has defined error conditions, no callers bother
to check for errors. This is problematic because the output string is
very likely not null-terminated if an error occurs, so that blindly
using it is unsafe. Rather than trusting that we can find and fix all
the callers, let's alter the function's API spec slightly: make it
guarantee a null-terminated result so long as maxsize > 0.
Furthermore, if we do get an error, let's make that null-terminated
result be an empty string. We could instead truncate at the buffer
length, but that risks producing mis-encoded output if the tz_name
string contains multibyte characters. It doesn't seem reasonable for
src/timezone/ to make use of our encoding-aware truncation logic.
Also, the only really likely source of a failure is a user-supplied
timezone name that is intentionally trying to overrun our buffers.
I don't feel a need to be particularly friendly about that case.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6474
timeofday() assumed that the output of pg_strftime() could not contain
% signs, other than the one it explicitly asks for with %%. However,
we don't have that guarantee with respect to the time zone name (%Z).
A crafted time zone setting could abuse the subsequent snprintf()
call, resulting in crashes or disclosure of server memory.
To fix, split the pg_strftime() call into two and then treat the
outputs as literal strings, not a snprintf format string. The
extra pg_strftime() call doesn't really cost anything, since the
bulk of the conversion work was done by pg_localtime().
Also, adjust buffer widths so that we're not risking string truncation
during the snprintf() step, as that would create a hazard of producing
mis-encoded output.
This also fixes a latent portability issue: the format string expects
an int, but tp.tv_usec is long int on many platforms.
Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6474
ALTER SUBSCRIPTION ... REFRESH PUBLICATION interpolates schema and
relation names into SQL without quoting them. A crafted subscriber
relation name can inject arbitrary SQL on the publisher. Test such a
name. Back-patch to v16, where commit
8756930190 first appeared.
Reported-by: Pavel Kohout <pavel.kohout@aisle.com>
Author: Pavel Kohout <pavel.kohout@aisle.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Backpatch-through: 16
Security: CVE-2026-6638
This commit applies timingsafe_bcmp() to authentication paths that
handle attributes or data previously compared with memcpy() or strcmp(),
which are sensitive to timing attacks.
The following data is concerned by this change, some being in the
backend and some in the frontend:
- For a SCRAM or MD5 password, the computed key or the MD5 hash compared
with a password during a plain authentication.
- For a SCRAM exchange, the stored key, the client's final nonce and the
server nonce.
- RADIUS (up to v18), the encrypted password.
- For MD5 authentication, the MD5(MD5()) hash.
Reported-by: Joe Conway <mail@joeconway.com>
Security: CVE-2026-6478
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Backpatch-through: 14
contrib/intarray's query_int type uses an int16 field to hold the
offset from a binary operator node to its left operand. However, it
allows the number of nodes to be as much as will fit in MaxAllocSize,
so there is a risk of overflowing int16 depending on the precise shape
of the tree. Simple right-associative cases like "a | b | c | ..."
work fine, so we should not solve this by restricting the overall
number of nodes. Instead add a direct test of whether each individual
offset is too large.
contrib/ltree's ltxtquery type uses essentially the same logic and
has the same 16-bit restriction.
(The core backend's tsquery.c has a variant of this logic too, but
in that case the target field is 32 bits, so it is okay so long
as varlena datums are restricted to 1GB.)
In v16 and up, these types support soft error reporting, so we have
to complicate the recursive findoprnd function's API a bit to allow
the complaint to be reported softly. v14/v15 don't need that.
Undocumented and overcomplicated code like this makes my head hurt,
so add some comments and simplify while at it.
Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Backpatch-through: 14
Security: CVE-2026-6473
The handling of SSL and GSS negotiation messages in
ProcessStartupPacket() could cause a recursion of the backend,
ultimately crashing the server as the negotiation attempts were not
tracked across multiple calls processing startup packets.
A malicious client could therefore alternate rejected SSL and GSS
requests indefinitely, each adding a stack frame, until the backend
crashed with a stack overflow, taking down a server.
This commit addresses this issue by modifying ProcessStartupPacket() so
as processed negotiation attempts are tracked, preventing infinite
recursive attempts. A TAP test is added to check this problem, where
multiple SSL and GSS negotiated attempts are stacked.
Reported-by: Calif.io in collaboration with Claude and Anthropic
Research
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Security: CVE-2026-6479
Backpatch-through: 14
multirange_recv and BlockRefTableReaderNextRelation were incautious
about multiplying a possibly-large integer by a factor more than 1
and then using it as an allocation size. This is harmless on 64-bit
systems where we'd compute a size exceeding MaxAllocSize and then
fail, but on 32-bit systems we could overflow size_t leading to an
undersized allocation and buffer overrun.
Fix these places by using palloc_array() instead of a handwritten
multiplication. (In HEAD, some of them were fixed already, but
none of that work got back-patched at the time.)
In addition, BlockRefTableReaderNextRelation passes the same value
to BlockRefTableRead's "int length" parameter. If built for
64-bit frontend code, palloc_array() allows a larger array size
than it otherwise would, potentially allowing that parameter to
overflow. Add an explicit check to forestall that and keep the
behavior the same cross-platform.
Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 14
Security: CVE-2026-6473
Some UTF8 characters decompose to more than a dozen codepoints.
It is possible for an input string that fits into well under
1GB to produce more than 4G decomposed codepoints, causing
unicode_normalize()'s decomp_size variable to wrap around to a
small positive value. This results in a small output buffer
allocation and subsequent buffer overrun.
To fix, test after each addition to see if we've overrun MaxAllocSize,
and break out of the loop early if so. In frontend code we want to
just return NULL for this failure (treating it like OOM). In the
backend, we can rely on the following palloc() call to throw error.
I also tightened things up in the calling functions in varlena.c,
using size_t rather than int and allocating the input workspace
with palloc_array(). These changes are probably unnecessary
given the knowledge that the original input and the normalized
output_chars array must fit into 1GB, but it's a lot easier to
believe the code is safe with these changes.
Reported-by: Xint Code
Reported-by: Bruce Dang <bruce@calif.io>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Co-authored-by: Heikki Linnakangas <hlinnaka@iki.fi>
Backpatch-through: 14
Security: CVE-2026-6473
The number of NFA states, number of NFA arcs, and number of colors
are all bounded to reasonably small values. However, there are
places where we try to allocate arrays sized by products of those
quantities, and those calculations could overflow, enabling
buffer-overrun attacks. In practice there's no problem on 64-bit
machines, but there are some live scenarios on 32-bit machines.
A related problem is that citerdissect() and creviterdissect()
allocate arrays based on the length of the input string, which
potentially could overflow.
To fix, invent MALLOC_ARRAY and REALLOC_ARRAY macros that rely on
palloc_array_extended and repalloc_array_extended with the NO_OOM
option, similarly to the existing MALLOC and REALLOC macros.
(Like those, they'll throw an error not return a NULL result for
oversize requests. This doesn't really fit into the regex code's
view of error handling, but it'll do for now. We can consider
whether to change that behavior in a non-security follow-up patch.)
I installed similar defenses in the colormap construction code.
It's not entirely clear whether integer overflow is possible
there, but analyzing the behavior in detail seems not worth
the trouble, as the risky spots are not in hot code paths.
I left a bunch of calls as-is after verifying that they can't
overflow given reasonable limits on nstates and narcs. Those
limits were enforced already via REG_MAX_COMPILE_SPACE, but
add commentary to document the interactions.
In passing, also fix a related edge case, which is that the
special color numbers used in LACON carcs could overflow the
"color" data type, if ncolors is close to MAX_COLOR.
In v14 and v15, the regex engine calls malloc() directly instead
of using palloc(), so MALLOC_ARRAY and REALLOC_ARRAY do likewise.
Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6473
Sufficiently large "count" arguments could result in undetected
overflow, causing the allocated memory chunk to be much smaller
than what the caller will subsequently write into it. This is
unlikely to be a hazard with 64-bit size_t but can sometimes
happen on 32-bit builds, primarily where a function allocates
workspace that's significantly larger than its input data.
Rather than trying to patch the at-risk callers piecemeal,
let's just redefine these macros so that they always check.
To do that, move the longstanding add_size() and mul_size() functions
into palloc.h and mcxt.c, and adjust them to not be specific to
shared-memory allocation. Then invent palloc_mul(), palloc0_mul(),
palloc_mul_extended() to use these functions. Actually, the latter
use inlined copies to save one function call. repalloc_array() gets
similar treatment. I didn't bother trying to inline the calls for
repalloc0_array() though.
In v14 and v15, this also adds repalloc_extended(), which previously
was only available in v16 and up.
We need copies of all this in fe_memutils.[hc] as well, since that
module also provides palloc_array() etc.
Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6473
The options "StartSel", "StopSel" and "FragmentDelimiter" given by a
caller of the SQL function ts_headline() have their lengths stored as
int16. When providing values larger than PG_INT16_MAX, it was possible
to overflow the length values stored, leading to incorrect behaviors in
generateHeadline(), in most cases translating to a crash.
Attempting to use values for these options larger than PG_INT16_MAX is
now blocked. Some test cases are added to cover our tracks.
Reported-by: Xint Code
Author: Michael Paquier <michael@paquier.xyz>
Backpatch-through: 14
Security: CVE-2026-6473
The lquery parser in contrib/ltree/ had two overflow problems:
- A single lquery level with many OR-separated variants (e.g.,
'label1|label2|...'), could cause an overflow of totallen, this being
stored as a uint16, meaning a maximum value of UINT16_MAX or 65k. Each
variant contributes MAXALIGN(LVAR_HDRSIZE + len) bytes. With enough
long variants, the value would wraparound. This would corrupt the data
written by LQL_NEXT(), leading to a stack corruption, most likely
translating into a crash, but it would allow incorrect memory access.
- numvar, labelled as a uint16, counts the number of OR-variants in a
single level, and it is incremented without bounds checking. With more
than PG_UINT16_MAX (65k) variants in a single level, and a minimum of
131kB of input data, it would wrap to 0. When a (wildcard) '*' is
used, this would change the query results silently.
For both issues, a set of overflows checks are added to guard against
these problematic patterns.
The first issue has been reported by the three people listed below,
affecting v16 and newer versions due to b1665bf01e. Its coding was
still unsafe in v14 and v15. The second issue affects all the stable
branches; I have bumped into while reviewing the code of the module.
Reported-by: Vergissmeinnicht <vergissmeinnichtzh@gmail.com>
Reported-by: A1ex <alex000young@gmail.com>
Reported-by: Jihe Wang <wangjihe.mail@gmail.com>
Author: Michael Paquier <michael@paquier.xyz>
Security: CVE-2026-6473
Backpatch-through: 14
Commit 16743db06 assumed that the CPUID instruction was always
available when the usual x86 symbols were defined. That is not the
case, so zero out the info rather than error out.
Reported-by: Jakob Egger <jakob@eggerapps.at>
Reported-by: Tobias Bussmann <t.bussmann@gmx.net>
Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/223EA201-A0E8-4A13-B220-EB903E8DF817@eggerapps.at
Commit 8d829f5a0 introduced a COALESCE wrapper around the
JSON_ARRAYAGG subquery so that JSON_ARRAY(query) returns '[]' rather
than NULL when the subquery yields no rows, per the SQL/JSON standard.
The empty-array Const used as the COALESCE fallback was, however,
built with typmod -1 and the type input function was likewise invoked
with typmod -1. As a result, any length restriction from the
RETURNING clause was silently bypassed on the empty-set path, while
the non-empty path enforced it via the JSON_ARRAYAGG coercion.
Build the empty-array Const using the typmod of the COALESCE's
non-empty argument, and pass that typmod to OidInputFunctionCall as
well so the value is length-checked at parse time. This makes the
empty-set and non-empty-set paths behave consistently.
Reported-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAJTYsWXPYqa58YXrU+SQMVonsAhjLS46HNUMU=wO5zm9MgY3_g@mail.gmail.com
Error messages in check_publication_add_relation() previously reported
only the relation name when a table in an EXCEPT clause could not be
processed, which is ambiguous when the same name exists in multiple
schemas. Use schema-qualified names instead, consistent with other error
messages that reference relation names.
Author: Dilip Kumar <dilipbalaut@gmail.com>
Author: vignesh C <vignesh21@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/CAFiTN-scG7b11Jsp+VoDRT8ZFE84eSKLcDsSB18dZ8AaP=R-mw@mail.gmail.com
When importing remote stats for a foreign table backed by a pre-v17
remote server, the query built/executed in this function has three NULL
placeholders for the range stats supported in v17 at the end of the
SELECT list. Previously, it included a trailing comma after the last
NULL, like "SELECT ..., NULL, NULL, NULL, FROM pg_catalog.pg_stats ...",
causing a syntax error on the remote server. Fix by removing the comma.
Oversight in commit 28972b6fc.
Author: Satya Narlapuram <satyanarlapuram@gmail.com>
Discussion: https://postgr.es/m/CAHg%2BQDdEE7wp1S60Fn9Kmna8KfdMo5Tu6dROLpMn_-EOUBKmWQ%40mail.gmail.com
remove_useless_groupby_columns() uses a relation's unique indexes to
prove that some GROUP BY columns are functionally dependent on others,
and so can be dropped from the GROUP BY clause. The match between
index columns and GROUP BY columns was done by attno alone, ignoring
two equality-relation issues.
A type may belong to multiple btree opfamilies whose notions of
equality differ. The record type, for instance, has record_ops
(per-field equality) and record_image_ops (bytewise equality). A
unique index under one opfamily does not prove uniqueness under the
equality used by GROUP BY when the SortGroupClause's eqop comes from a
different opfamily.
Likewise, since nondeterministic collations were introduced in PG 12,
two collations may disagree on equality, and a unique index under one
collation does not prove uniqueness under another.
In either case, rows that the index considers distinct can collapse
into a single GROUP BY group, taking ungrouped columns of differing
values with them, so the planner drops a column that is not in fact
functionally dependent and produces wrong results.
Fix by requiring, for each unique-index key column, that some GROUP BY
item on the same column has an eqop in the index's opfamily and a
collation that agrees on equality with the index's collation. This
mirrors the combined check relation_has_unique_index_for() applies to
join clauses.
This is a v18 regression: commit bd10ec529 extended
remove_useless_groupby_columns() from primary-key constraints to
arbitrary unique indexes. Before that, the function consulted only
primary keys, whose enforcement index is required by parse_utilcmd.c
to use the default opclass and the column's declared collation, so
neither mismatch could arise. Back-patch to v18 only.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49t6uArWoTT-cHY+nhsi23nJJKcF9Xb9cYGzaZ9kNJ98g@mail.gmail.com
Backpatch-through: 18
Commit f76686ce7 added a walker that detects when a HAVING clause uses
a collation that conflicts with the GROUP BY's nondeterministic
collation, keeping such clauses in HAVING. The walker uses
exprInputCollation() to identify each ancestor's comparison collation,
but missed the simple-CASE case: parse analysis builds each WHEN as
OpExpr(CaseTestExpr op val), where CaseTestExpr is a placeholder for
the arg, while the actual arg expression sits at cexpr->arg, outside
the OpExpr that carries the comparison's inputcollid. A GROUP Var at
cexpr->arg was therefore visited with the WHEN's inputcollid absent
from the ancestor stack, the conflict went undetected, and the clause
was wrongly pushed to WHERE.
Fix by handling simple CASE explicitly: before walking cexpr->arg,
push every WHEN's inputcollid onto the ancestor stack so a GROUP Var
at the arg is checked against the same collations the WHEN comparisons
would apply. Then walk the WHEN bodies and defresult under the
unchanged stack, where their own collation contexts are picked up by
the default path.
Back-patch to v18 only; this fix extends the walker added by commit
f76686ce7 and inherits its dependency on the v18 RTE_GROUP mechanism.
Author: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDcqPdd=2V0PQ_oNYj50OUeqSqznqFaYtP3RdokLBDXBqw@mail.gmail.com
Backpatch-through: 18
afterTriggerInvokeEvents() may repalloc afterTriggers.query_stack
while firing trigger events, leaving any precomputed entry pointer
dangling. The loop body in AfterTriggerEndQuery() recomputes qs
after each afterTriggerInvokeEvents() call for that reason, but the
"all fired" break path exits without the recompute, and the
subsequent FireAfterTriggerBatchCallbacks(qs->batch_callbacks)
dereferences the freed pointer.
Fix by recomputing qs immediately before
FireAfterTriggerBatchCallbacks(), as the loop body already does
after each afterTriggerInvokeEvents() call.
The hazard was introduced in 34a3078629, which added the
qs->batch_callbacks dereference at this site.
Reported-by: Amul Sul <sulamul@gmail.com>
Author: Amul Sul <sulamul@gmail.com>
Reviewed-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CAAJ_b95p6-qiVpE2Gpr=bUsNAqTcejD_rPgLnfjx9m=fo3Rf3Q@mail.gmail.com
Previously, InitializeProcessXLogLogicalInfo() was called before
ProcSignalInit(). This created a window where a process could miss a
signal barrier if it was issued between these two calls. As a result,
the process could fail to update its local XLogLogicalInfo cache,
leading to an inconsistent logical decoding state.
This commit fixes this by moving InitializeProcessXLogLogicalInfo()
after ProcSignalInit(). This ensures that the process is registered to
participate in signal barriers before its state is initialized,
preventing it from missing any state changes propagated during the
startup sequence.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAD21AoBzdeSyLSSPM5E6ysN1r8qzp8u_BRmnLvuAp_S8QxS_fQ@mail.gmail.com
Discussion: https://postgr.es/m/CAD21AoBj+zKvgw_Q8gjr4YbKccW_uMe3OFQ5+KT246FHUuNXSQ@mail.gmail.com
The regression tests had a copy of the full error, detail, and hint
text in comments above each failing statement in the .sql files. This
is a maintenance hazard, so simplify to "-- ERROR", in line with
other tests.
Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CANWCAZap26BRLwtd+A7GFDSD6-+C3F0NVdUGUAu2LUfvpOTy=w@mail.gmail.com
Fix spelling and grammar, turn an accidental duplicate errmsg into
errdetail, and remove an errposition that was not pointing at anything
relevant to the error.
Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Yuchen Li <liyuchen_xyz@163.com> (earlier version)
Discussion: https://postgr.es/m/CAJTYsWUvMT5uKOasPnm6-o9CrdXbRONiAYHTKJb7wx66LB8S1A@mail.gmail.com
Property graph element labels and label properties relied on a direct
systable scan when retrieving their object descriptions. These can be
simplified with get_catalog_object_by_oid(). This offers the benefit to
do a direct syscache lookup, if available.
The same logic will be used in a follow-up patch when retrieving the
object identity parts, applying the same rule across the board for these
object types.
Extracted from a larger patch by the author.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Alex Guo <guo.alex.hengchen@gmail.com>
Discussion: https://postgr.es/m/aej1DkLwhyZWmtxJ@bdtpg
WAIT FOR LSN registers the current backend in shared memory before entering an
interruptible wait loop. Top-level abort and backend exit already call
WaitLSNCleanup(), but subtransaction abort did not. If an interrupt, such as
statement_timeout, occurred while waiting inside a savepoint, rolling back to
the savepoint left the backend marked as present in the WAIT FOR LSN heap.
Clean up WAIT FOR LSN state from AbortSubTransaction() as well, and add
a TAP test covering reuse of WAIT FOR LSN after a savepoint rollback.
Reported-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/CAJTYsWXDRwo-RVRaQgwxVcXgURVFeX8BKnijQrPiPcSCkDDX9A%40mail.gmail.com
Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
The test for finding page verification failures in the logfiles
were missing the /m modifier to make sure it anchors to every
newline in the search space buffer, and not just the last one.
Spotted while adding a test for the recently reported issue with
excessive WAL for unlogged relations.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDeGrpZbNZdLjd_T4b43xKEEXZN0HGhkFm-1bkBdyzK7AQ@mail.gmail.com
The DataChecksumsWorker accepts cost_delay and cost_limit parameters
from pg_enable_data_checksums() so users can throttle the I/O caused
by enabling checksums. Due to the API for setting the cost parameters
changing between when the code was written, and when it was committed
the new cost update function call was omitted and thus the parameters
were silently ignored.
Fix by calling VacuumUpdateCosts() after assigning the parameters
(both during worker startup and on the runtime cost-update path), and
by leaving the page-cost weights at their GUC-controlled defaults.
Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDeevH6aTyWdXYBJW0wOmfoZy66gDi5TfinK_dXeCrHQLg@mail.gmail.com
ProcessSingleRelationFork() unconditionally generated an FPI WAL
record for every page of every relation when enabling checksums.
Unlogged relations, which by definition never generate WAL for
data changes, were not exempt which generated excessive WAL to
be emitted.
Fix by guarding the FPI WAL record call with RelationNeedsWAL()
to avoid emitting WAL for unlogged main forks. Unlogged pages
are still dirtied to ensure the checksum is written to disk at
the next checkpoint. The init fork remains WAL-logged even for
unlogged relations, as it's needed on the standby to materialize
the relation after promotion (see ResetUnloggedRelations()).
Skipping init-fork WAL would leave the standby with a stale init
fork that, once copied to the main fork on promotion, would fail
checksum verification on every read of the unlogged relation.
A test which creates an unlogged table with an index, enables
checksums, promotes the standby, and verifies that the unlogged
relation and its indexes are still readable post-promotion has
been added.
Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDeGrpZbNZdLjd_T4b43xKEEXZN0HGhkFm-1bkBdyzK7AQ@mail.gmail.com
Commit b3cf461b3c renamed --wal-directory to --wal-path but retained
the former as a silent alias. Per project policy, all options,
including deprecated ones, should be documented to assist users
transitioning between versions.
This patch restores --wal-directory to the documentation and --help
output.
Author: Amul Sul <sulamul@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/E1w3fZp-000gje-31%40gemulon.postgresql.org
get_tables_to_repack() and get_all_vacuum_rels() were including other
sessions' temporary tables in their output work list, causing REPACK,
CLUSTER and VACUUM FULL (when executed without a table list) to attempt
to acquire AccessExclusiveLock on them, potentially blocking for an
extended time. Fix by skipping other-session temp tables early, before
they are added to the list.
This issue is ancient, but there have been no complaints about it that I
know of, so I'm opting for not backpatching at present.
Author: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/0b555318-2bf2-46df-9377-09629a2a59db@uni-muenster.de
As connections that failed abort cleanup can't safely be further used,
if a remote query tries to get such a connection, we reject it.
Previously, this rejection involved dropping the connection if it was
open, without accounting for the possibility of open cursors using it,
causing a server crash when such an open cursor tried to use an
already-dropped connection, as a cursor-handling function
(create_cursor, fetch_more_data, or close_cursor) was called on a freed
PGconn. To fix, delay dropping failed connections until abort cleanup
of the main transaction, to ensure open cursors using such a connection
can safely refer to the PGconn for it.
Oversight in commit 8bf58c0d9.
Reported-by: Zhibai Song <songzhibai1234@gmail.com>
Diagnosed-by: Zhibai Song <songzhibai1234@gmail.com>
Author: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/CAPmGK176y6JP017-Cn%2BhS9CEJx_6iVhRoYbAqzuLU4d8-XPPNg%40mail.gmail.com
Backpatch-through: 14
Commit 28d534e2ae introduced reform_tuple() with a fast path that
returns the source tuple verbatim when no dropped columns require fixing
up. I (Álvaro) failed to realize that this broke handling of columns
with a 'missingval' defined: after a VACUUM FULL, CLUSTER, or REPACK
operation, the catalogued missingval is thrown away, so the tuples are
no longer correct.
Fix by forcing the rewrite when the tuple is shorter than the tuple
descriptor.
Author: Satya Narlapuram <satyanarlapuram@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDeoccU5CudrJpmSKZfKZ1gRMNY=5BxSC=JpHgkonzgcOw@mail.gmail.com
rel_is_distinct_for()'s RTE_SUBQUERY branch passed only the equality
operator from each join clause to query_is_distinct_for(), discarding
the operator's input collation. query_is_distinct_for() then verified
opfamily compatibility but never checked collations, so a DISTINCT /
GROUP BY / set-op operating under one collation was trusted to prove
uniqueness for a comparison performed under an unrelated collation.
As with the recent fix in relation_has_unique_index_for(), this is
unsound for nondeterministic collations and yields wrong query results
in any optimization that consumes the proof.
Fix by carrying each clause's operator input collation into
query_is_distinct_for() and validating it at every check-site against
the subquery target expression's collation.
Back-patch to all supported branches. query_is_distinct_for() is
declared in an installed header, so on stable branches the existing
two-list signature is retained as a thin wrapper that forwards to a
new collation-aware entry point; external callers continue to receive
the historical collation-blind answer.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAMbWs4_XUUSTyzCaRjUeeahWNqi=8ZOA5Q4coi8zUVEDSBkM6A@mail.gmail.com
Backpatch-through: 14
relation_has_unique_index_for() has long had an XXX noting that it
doesn't check collations when matching a unique index's columns
against equality clauses. This was benign as long as all collations
in play reduced to the same notion of equality, but has been incorrect
since nondeterministic collations were introduced in PG 12: a unique
index under a deterministic collation does not prove uniqueness under
a nondeterministic collation, nor vice versa.
The consequence is wrong query results for any planner optimization
that consumes the faulty proof, including inner-unique join execution
(which stops the inner search after the first match per outer row),
useless-left-join removal, semijoin-to-innerjoin reduction, and
self-join elimination.
Fix by requiring the index's collation to agree on equality with the
clause's input collation. Two collations agree on equality if either
is InvalidOid (denoting a non-collation-sensitive operation, which
cannot conflict with the other side), if they have the same OID, or if
both are deterministic: by definition a deterministic collation treats
two strings as equal iff they are byte-wise equal (see CREATE
COLLATION), so any two deterministic collations share the same
equality relation and the uniqueness proof carries over. Any mismatch
involving a nondeterministic collation is rejected.
Back-patch to all supported branches; the bug has existed since
nondeterministic collations were introduced in PG 12.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAMbWs4_XUUSTyzCaRjUeeahWNqi=8ZOA5Q4coi8zUVEDSBkM6A@mail.gmail.com
Backpatch-through: 14
This function returns some value of enum HostsFileLoadResult,
but for reasons lost in the development process was declared to
return "int". Fix that, for clarity and so that our typedefs
collection tooling sees the typedef as used. Also fix the
variable that the sole call assigns into. Move the typedef
to the header file that declares load_hosts() to avoid creating
header dependency problems.
Discussion: https://postgr.es/m/359138.1777922557@sss.pgh.pa.us
expression_tree_mutator_impl() did not handle T_GraphPattern,
T_GraphElementPattern, and T_GraphPropertyRef. The corresponding
expression_tree_walker_impl() already handles all three node types.
This causes an "unrecognized node type" error whenever a GRAPH_TABLE
appeared in an expression tree.
While at it, also update raw_expression_tree_walker() and
expression_tree_walker() to handle missing nodes that may appear in
GraphPattern expression trees. When raw_expression_tree_walker() is
called, GraphElementPattern::labelexpr contains ColumnRefs instead of
GraphLabelRefs. Hence those are not handled in
raw_expression_tree_walker().
Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAHg%2BQDc97WFTSkXg%3Dg_ZAH8GnY2gJrvq72cs%2BYjqEAuZgXnkAQ%40mail.gmail.com
append_tuple_value_detail() constructed user-visible messages using
separately translated fragments such as ": ", ", ", and ".",. This
makes correct translation difficult or impossible in some languages.
Refactor append_tuple_value_detail() to move all punctuation and
sentence construction to the callers, which now use a single
translatable string with a %s placeholder for the tuple data.
Reported-by: David Rowley <dgrowleyml@gmail.com>
Author: vignesh C <vignesh21@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/227279.1775956328%40sss.pgh.pa.us#8f3a5f50543556c60cc5a13270cb7ba4
Discussion: https://postgr.es/m/CAApHDvohYOdrvhVxXzCJNX_GYMSWBfjTTtB6hgDauEtZ8Nar2A@mail.gmail.com
The XLogRecordPageWithFreeSpace function updates the freespace map (FSM) data
while replaying data-level WAL records during the recovery. If the FSM block
is updated, it needs to be marked as modified. Currently, this is done with
the MarkBufferDirtyHint call (as in all other cases for modifying FSM data).
However, in the recovery context, this function will actually do nothing if
checksums are enabled. It's assumed that the page should not be dirtied
during recovery while modifying hints to protect against torn pages, since no
new WAL data can be generated at this point to store FPI.
Such logic does not seem fully aligned with the FSM case, as its blocks could
be simply zeroed if a checksum mismatch is detected. Currently, changes to an
FSM block could be lost if each change to that block occurs infrequently
enough to allow it to be evicted from the cache. To persist the change, the
modification needs to be performed while the FSM block is still kept in
buffers and marked as dirty after receiving its FPI. If the block has already
been cleaned, the change won't be persisted, so stored FSM blocks may remain
in an obsolete state.
If a large number of discrepancies between the data in leaf FSM blocks and the
actual data blocks accumulate on the replica server, this could cause
significant delays in insert operations after switchover. Such an insert
operation may need to visit many data blocks marked as having sufficient
space in the FSM, only to discover that the information is incorrect and the
FSM records need to be corrected. In a heavily trafficked insert-only table
with many concurrent clients performing inserts, this has been observed to
cause several-second stalls, causing visible application malfunction. The
desire to avoid such cases was the reason behind the commit ab7dbd681, which
introduced an update of FSM data during the heap_xlog_visible invocation.
However, an update to the FSM data on the standby side could be lost due to a
missing 'dirty' flag, so there is still a possibility that a large number of
FSM records will contain incorrect data. Note that having a zeroed FSM page
in such a case (due to a checksum mismatch) is preferable, as a zero value
will be interpreted as an indication of full data blocks, and the inserter
will be routed to the next FSM block or to the end of the table.
Given that FSM is ready to handle torn page writes and
XLogRecordPageWithFreeSpace is called only during the recovery, there seems
to be no reason to use MarkBufferDirtyHint here instead of a regular
MarkBufferDirty call.
Discussion: https://postgr.es/m/596c4f1c-f966-4512-b9c9-dd8fbcaf0928%40postgrespro.ru
Author: Alexey Makhmutov <a.makhmutov@postgrespro.ru>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
WAIT FOR LSN compares only the numeric LSN and has no notion of which
timeline a WAL record belongs to. There are many possible scenarios when
timeline-switching can break read-your-writes consistency. The proper
analysis and timeline support is possible in the next major release. Yet
just document the current behaviour.
Reported-by: Xuneng Zhou <xunengzhou@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Add regression coverage for several WAIT FOR LSN edge cases.
First, cover fresh walreceiver shared-memory initialization after a
standby restart. Restart the standby while its upstream is down, so
RequestXLogStreaming() seeds writtenUpto/flushedUpto to the
segment-aligned receiveStart and the walreceiver cannot immediately
advance them. Verify that the seeded flush position is segment-aligned,
that replay can be ahead of it, and that standby_write/standby_flush
still succeed for an already-replayed LSN via the replay-position floor
in GetCurrentLSNForWaitType().
Second, add fencepost checks for the target <= currentLSN predicate.
With replay paused and walreceiver stopped, verify exact boundaries for
standby_replay using pg_last_wal_replay_lsn(), and for standby_flush
using pg_last_wal_receive_lsn(). Also verify that a waiter for
current + 1 sleeps while replay is paused and wakes with success once
new WAL is delivered and replay advances.
Finally, add a cascading-standby timeline-switch test. Start a waiter
on the downstream standby, promote its upstream, generate WAL on the new
timeline, and verify that the cascade follows the new timeline and the
wait completes successfully once replay reaches the target LSN.
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/1957514.1775526774%40sss.pgh.pa.us
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Xuneng Zhou <xunengzhou@gmail.com>
The startup process only woke STANDBY_REPLAY waiters after replaying
each WAL record. STANDBY_WRITE and STANDBY_FLUSH waiters depended only
on walreceiver write/flush callbacks. As a result, replay progress alone
did not wake those waiters, and in pure archive recovery (where no
walreceiver exists) they could sleep until timeout.
Fix by also calling WaitLSNWakeup() for STANDBY_WRITE and
STANDBY_FLUSH after each replay. For the replay-floor semantics used by
GetCurrentLSNForWaitType(), replay progress is a valid lower bound for
both modes: WAL cannot be replayed unless it has already been written
and flushed locally.
This works together with the replay-position floor in
GetCurrentLSNForWaitType(). The getter ensures that a waiter woken by
replay can recheck successfully; the replay-side wakeups ensure that a
waiter already asleep is notified when replay reaches its target.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1957514.1775526774%40sss.pgh.pa.us
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
GetCurrentLSNForWaitType() for standby_write and standby_flush modes
returned only the walreceiver position, which may lag behind WAL
already present on the standby from a base backup, archive restore,
or prior streaming. This could cause unnecessary blocking if the
target LSN falls between the walreceiver's tracked position and the
replay position.
Fix by returning the maximum of the walreceiver position and the
replay position. WAL up to the replay point is physically on disk
regardless of its origin, so there is no reason to wait for the
walreceiver to re-receive it.
This complements 29e7dbf5e4, which seeded writtenUpto to
receiveStart in RequestXLogStreaming() to fix the most common
hang scenario. The getter-level floor handles the remaining edge
cases: targets between receiveStart and the replay position, and
standbys running with archive recovery only (no walreceiver).
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1957514.1775526774%40sss.pgh.pa.us
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
All five wakeup call sites duplicate WaitLSNWakeup()'s internal
fast-path minWaitedLSN check and add an unnecessary NULL check
on waitLSNState.
Remove the inline pre-checks and call WaitLSNWakeup() directly.
The fast-path check inside WaitLSNWakeup() already returns early
when no waiter's target has been reached, so there is no
performance difference.
The waitLSNState NULL checks are also unnecessary: shared memory
is fully initialized before any backend or auxiliary process
starts, so waitLSNState is always non-NULL at these call sites.
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/jzq5shdewncpxc35r3s2mcfsmo4bjovkza5mnqf5bdfumhfi3g%40bglckf7dxmw5
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
WAIT FOR LSN uses a Dekker-style handshake: the waker stores an LSN
position then reads minWaitedLSN; the waiter stores its target into
minWaitedLSN then reads the position. Without a barrier between each
side's store and load, a CPU may satisfy the load before the store
becomes globally visible, causing either side to miss a concurrent
update. The result is a missed wakeup: the waiter sleeps indefinitely
until the next unrelated event.
Fix by embedding the required barriers into the atomic operations on
minWaitedLSN:
- In updateMinWaitedLSN(), use pg_atomic_write_membarrier_u64() so the
waiter's preceding heap update is visible before the new minWaitedLSN
value is published.
- In WaitLSNWakeup(), use pg_atomic_read_membarrier_u64() in the
fast-path check so the waker's preceding position store is globally
visible before minWaitedLSN is read.
The waiter side is also covered by the barrier semantics already present
in GetCurrentLSNForWaitType(): GetWalRcvWriteRecPtr() uses an explicit
read barrier (from patch 0001), while the remaining getters acquire a
spinlock, which implies the same ordering.
Also call ResetLatch() unconditionally after WaitLatch(), following the
standard latch loop pattern. WaitLatch() does not guarantee that all
simultaneously true wake conditions are reported in one return, so a
timeout can race with SetLatch(). If we skip ResetLatch() on a timeout
return, the code performs further asynchronous-state checks before
consuming the latch, violating the latch API's required wait/reset
pattern. That can leave the latch set across loop exit and cause a
later unrelated WaitLatch() in the same backend to return immediately.
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/zqbppucpmkeqecfy4s5kscnru4tbk6khp3ozqz6ad2zijz354k%40w4bdf4z3wqoz
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
The walreceiver publishes its write position lock-free via writtenUpto.
On weakly-ordered architectures (ARM, PowerPC), both sides of this
handshake need explicit barriers so that the lock-less reader sees a
consistent state.
Use pg_atomic_write_membarrier_u64() at both write sites and
pg_atomic_read_membarrier_u64() in GetWalRcvWriteRecPtr(). This matches
the barrier semantics that GetWalRcvFlushRecPtr() and other LSN-position
functions get implicitly from their spinlock acquire/release, and
protects from bugs caused by expectations of similar barrier guarantees
from different LSN-position functions.
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/zqbppucpmkeqecfy4s5kscnru4tbk6khp3ozqz6ad2zijz354k%40w4bdf4z3wqoz
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
ECPGdeallocate_all(), ECPGprepared_statement(), ECPGget_desc(), and
ecpg_freeStmtCacheEntry() could crash with a SIGSEGV when called
without an established connection (for example, when EXEC SQL CONNECT
was forgotten or a non-existent connection name was used), because
they dereferenced the result of ecpg_get_connection() without first
checking it for NULL.
Each site is fixed in the style of the surrounding code.
New tests are added for these conditions.
Author: Shruthi Gowda <gowdashru@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Mahendra Singh Thalor <mahi6run@gmail.com>
Reviewed-by: Nishant Sharma <nishant.sharma@enterprisedb.com>
Discussion: https://postgr.es/m/3007317.1765210195@sss.pgh.pa.us
Backpatch-through: 14
The errdetail() added in 55890a9194 (and reworked in 3e2a1496ba)
exposed the operating-system PID and UID of whoever sent the
termination signal directly to the affected client.
Discussion suggested this should not be sent to the client, but only
recorded in the server log where the admin can use it for diagnosis.
Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Discussion: https://postgr.es/m/E5CA274C-74BD-4067-8B73-A3AD8C080EFA@gmail.com
The sequence subscription test switches regress_seq_sub to connect to the
publisher as regress_seq_repl (a non-superuser) when checking behavior
with insufficient sequence privileges but forgot to set up pg_hba.conf to
allow connections from it. The special setup is only needed on Windows
machines that don't use UNIX sockets.
As per buildfarm.
Reported-by: Ajin Cherian <itsajin@gmail.com>
Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/CAFPTHDad911HUMkHgD1KZk+WOvTopiBcYf4C_8Fqj1-sZk3xgw@mail.gmail.com
When walsender finishes streaming during shutdown, it sends a
CommandComplete message to tell the receiver that WAL streaming is done.
Previously, that path used EndCommand() followed by pq_flush().
Those functions can block indefinitely waiting for the socket to become
writeable. As a result, even when wal_sender_shutdown_timeout is set,
walsender could remain stuck while sending the final completion message,
and the shutdown timeout would not be enforced.
Fix this by introducing EndCommandExtended(), which allows
CommandComplete to be queued with pq_putmessage_noblock(), and by
using the walsender nonblocking flush path instead of pq_flush(), so
the shutdown timeout continues to be checked while pending output is
flushed.
Per CI testing on FreeBSD.
Reported-by: Andres Freund <andres@anarazel.de>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/vwlugmsogfn36jhm56zwrgd7m6xe6ircltvfh3kzt6kldvbtht@f45dgow5uhnx
When GROUP BY uses a nondeterministic collation, the planner's
optimization of moving HAVING clauses to WHERE can produce incorrect
query results. The HAVING clause may apply a stricter collation that
distinguishes values the GROUP BY considers equal. Pushing such a
clause to WHERE causes it to filter individual rows before grouping,
potentially eliminating group members and changing aggregate results.
Fix this by detecting collation conflicts before flatten_group_exprs,
while the HAVING clause still contains GROUP Vars (Vars referencing
RTE_GROUP). At that point, each GROUP Var directly carries the GROUP
BY collation as its varcollid, making it straightforward to compare
against the operator's inputcollid. A mismatch where the GROUP BY
collation is nondeterministic means the clause is unsafe to push down.
RowCompareExpr is treated specially, since it carries per-column
inputcollids[] rather than a single inputcollid.
The conflicting clause indices are recorded in a Bitmapset and
consulted during the existing HAVING-to-WHERE loop, so that only
affected clauses are kept in HAVING; other safe clauses in the same
query are still pushed.
Back-patch to v18 only. The fix relies on the RTE_GROUP mechanism
introduced in v18 (commit 247dea89f), which is what lets us identify
grouping expressions and their resolved collations via GROUP Vars on
pre-flatten havingQual. Pre-v18 branches lack that machinery, so a
back-patch there would need a different approach. Given the absence
of field reports of this bug on back branches, the risk of carrying a
different fix on stable branches is not justified.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Discussion: https://postgr.es/m/CAMbWs48Dn2wW6XM94GZsoyMiH42=KgMo+WcobPKuWvGYnWaPOQ@mail.gmail.com
Backpatch-through: 18
In ExecLockRows() and ri_LockPKTuple(), the TM_Deleted code path was
using the same "could not serialize access due to concurrent update"
message as the TM_Updated path. Use "concurrent delete" instead, since
the tuple was deleted, not updated. The ExecLockRows() instance was
likely a copy-paste error per Andres; the ri_LockPKTuple() instance
was carried over from the same pattern in commit 2da86c1ef9.
Update affected isolation test expected files accordingly and add
a new test to fk-concurrent-pk-upd.spec with concurrent delete of the
PK row.
The ExecLockRows() change is master-only for lack of user complaints
and to avoid breaking anything that might match on the error text.
Reported-by: Jian He <jian.universality@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CACJufxEG1JTCq4A1gnNAu-bGAq9Xn=Xkf7kC3TRWFz6iuUOuRA@mail.gmail.com
According to the SQL/JSON standard, JSON_ARRAY(query) must return an
empty JSON array ('[]') when the subquery returns zero rows.
Previously, the parser rewrote JSON_ARRAY(query) into a JSON_ARRAYAGG
aggregate function. Because this aggregate evaluates to NULL over an
empty set without a GROUP BY clause, the constructor erroneously
returned NULL. Additionally, this premature rewrite baked physical
implementation details into the catalog, preventing ruleutils.c from
deparsing the original syntax for views.
This patch resolves both issues by introducing a new
JSCTOR_JSON_ARRAY_QUERY constructor type. The parser builds the
executable form --- a COALESCE-wrapped JSON_ARRAYAGG subquery --- from
raw parse nodes via transformExprRecurse, and stores it in the func
field. The original transformed Query is kept in a new orig_query
field so that ruleutils.c can deparse the original syntax for views.
During planning, eval_const_expressions replaces the node with the
pre-built func expression.
The deparsing issue was reported by Tom Lane.
Bump catalog version.
Bug: #19418
Reported-by: Lukas Eder <lukas.eder@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/19418-591ba1f29862ef5b@postgresql.org
In order to process tuples inserted or updated while REPACK executes, we
write those tuples to disk and later restore them; however, some forms
of toasted tuples were not being processed correctly. Fix that.
Also expand the tests a bit for better coverage.
Author: Satya Narlapuram <satyanarlapuram@gmail.com>
Author: Antonin Houska <ah@cybertec.at>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDeXb9HM2VGKXQedyCp52GzajJK5KOUdNi6oLjsS0nerQw@mail.gmail.com
When cloning extended statistics via CREATE TABLE ... LIKE ... INCLUDING
STATISTICS, stxkeys holds attribute numbers from the source (parent)
table, but get_attname() was being called with the child relation's
OID. If the parent has dropped columns, the child's attribute numbers
are renumbered sequentially and no longer match, so the lookup either
returns the wrong column name (silent corruption) or errors out when
the attnum does not exist in the child.
Fix it by remapping the parent attnum through attmap before the lookup,
consistent with how expression statistics are already handled a few
lines below.
Add a regression test covering both manifestations: a 3-column parent
where the stale attnum refers to no child column (cache-lookup error),
and a 4-column parent where the stale attnum silently refers to the
wrong child column.
Author: Julien Tachoires <julmon@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Discussion: https://postgr.es/m/20260415105718.tomuncfbmlt67oel@poseidon.home.virt
Backpatch-through: 14
There is a narrow race in which a concurrent ALTER DATABASE ... SET
TABLESPACE moves the database off the tablespace and a DROP TABLESPACE
removes it between the syscache lookup and the catalog scan. If that
happens, output an error.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Jack Bonatakis <jack@bonatak.is>
Reviewed-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Discussion: https://postgr.es/m/573E45C1-31A4-4885-A00C-1A2171159A2A@gmail.com
The worker need to know whether a database which failed checksum
processing still exists, or has been dropped. This improves the
detection logic by checking for being partially dropped.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se
When pg_{enable|disable}_data_checksums is called while checksums are
being enabled or disabled, the already running launcher is detected
and the new desired state is recorded. Processing will then pick up
the new state and change its operation to fulfill the new request.
If the same state is requested but with different cost values, the
new cost values will take effect on the next relation processed.
The previous coding had a complex logic of starting a new launcher
for this, which is now avoided with the shared mem structure instead
used to signal current processing.
This makes the logic more robust, and fixes a bug where the launcher
would erroneously revert back to the "off" state.
Access to the shared memory is also protected with LWLocks in all
cases. Since the shmem structure is used for signalling between
the worker and the launcher, and there can be only one of each,
there were no concurrency issues detected but it's better to stick
to proper locking protocol should this ever be updated to handle
multiple workers.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se
A collection of spelling, wording and punctuation fixups for the code
documentation from postcommit review.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se
Commit 78e950cb8 added checksum state handling to all XLOG_CHECKPOINT
records which caused unnecessary state transitions and emission of
procsignal barriers. Remove as only the _REDO record need to handle
checksum state. Barrier emission is also consistently made after
controlfile updates to avoid race conditions.
Additionally, interrupts are held between calling ProcSignalInit and
InitLocalDataChecksumState to remove a window where otherwise invalid
state transitions can happen.
Also remove a pointless assertion on Controlfile which will never hit.
Author: Tomas Vondra <tomas@vondra.me>
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se
When erroring out from the datachecksums launcher during data checksum
enabling, before state has transitioned to "on", we revert back to the
"off" state. Since checksums weren't enabled, there is no use staying
in an inprogress state since the checksum launcher currently doesn't
support restarting from where it left off. Should restartability get
added in the future, this would need to be revisited. This state
transition was however missing from the allowed transitions in the
statemachine causing an error.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se
This includes a number of smaller fixups to the online checksums test
module which were found during postcommit review and stabilization
work.
* Fix scope increase for PG_TEST_EXTRA: The online checksums tests
have two levels of PG_TEST_EXTRA, checksum and checksums_extended
for extra test runs and test runs with increased randomization.
The logic for increasing the number of test iterations was however
backwards.
* Change stopmode for PITR test: The pitr suite used immediate stop
mode which caused problems on slower machines where the sigquit
would interrupt archive commands leaving partial WAL files behind.
This would then prevent restart. Fix by using fast mode which is
the appropriate mode for the test at hand. Also increase timeouts
to help slower test systems since an expired timeout will incur
the same effect as an immediate standby with a partial WAL left
behind. This issue was observed when running the test suites on
a Raspberry Pi 4 machine.
* Improve logging: The test suite for data checksums use a set of
helper functions in a Perl module to avoid repeating code, this
makes sure that the helper functions do a better job of logging
their test output to make debug easier.
* Remove unused code: wait_for_cluster_crash was used during the
development of online checksums but was never used in any test
which shipped, so remove the function.
* Standby fixes: Ensure no vacuum on pgbench init on standby with
-n to avoid bogus error message in the log, and enable
hot_standby_feedback to prevent queries from getting cancelled
due to recovery on slower systems.
Author: Daniel Gustafsson <daniel@yesql.se>
Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se
These functions missed a RecoveryInProgress() check, allowing them to
be called on a hot standby. Enabling, or disabling, checksums on the
standby only would cause the cluster to get out of sync and replaying
checksum transitions to fail.
Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAHg+QDfRk4-S7DMmdbXJnQ-xF=sUpMAKuh8b83ObLqYVKx5QLA@mail.gmail.com
sequence_rel was declared at batch scope, so when a row is skipped due to
concurrent drop or insufficient privileges, the end-of-row cleanup closes
the stale pointer from the previous row, tripping the relcache refcount
assertion.
Move sequence_rel inside the per-row loop.
Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/CAJTYsWWOuw-yfmzotV4jCJ6LLxEsb=STLcGtDYXOxRcU9Te3Pw@mail.gmail.com
Upon a failure of sync_file_range(), EINTR was checked based on the
returned result of the routine rather than its errno. sync_file_range()
returns -1 on failure, making the check a no-op, invalidating the retry
attempt in this case.
Oversight in 0d369ac650.
Author: DaeMyung Kang <charsyam@gmail.com>
Discussion: https://postgr.es/m/20260429151811.1810874-1-charsyam@gmail.com
Backpatch-through: 16
This reverts portions of commit 6dcfac9696, which is wrong in trying
to use a *GetDatum() that matches with the C types of the values read.
*GetDatum() should match with the output argument types of the SQL
functions.
The portions of 6dcfac9696 that are right regarding this rule are:
- gistget.c, where the GiST support functions use DatumGetUInt16() to
retrieve the strategy number.
- The BRIN code for strategynum, used in syscache lookups.
The adjustments done in this commit are for pageinspect, pg_buffercache
and pg_lock_status().
While double-checking the whole state of the tree regarding non-matching
pairs of DatumGet*() and *GetDatum(), I have found much more code paths
that are incorrect, unrelated to 6dcfac9696. These may be adjusted in
the future, in a different patch (perhaps not for v19, as we are already
past feature freeze).
Reported-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/97f9375a-be61-4272-a44d-408337fe8fa6@eisentraut.org
Discussion: https://postgr.es/m/CAJ7c6TMcGu8qmRe1gZfJ-gOzVnZq-t=fwn-UuyStx1w6ZyydMw@mail.gmail.com
"lock" a values is supported since 4019f725f5, but the error message
of the function used when specifying an incorrect value forgot about it.
Author: Maksim Logvinenko <logvinenko-ms@yandex.ru>
Discussion: https://postgr.es/m/433431777389005@mail.yandex.ru
After a recent macOS update, building Postgres produces warnings
that look like this:
ranlib: warning: 'libpgport_shlib.a(pg_cpu_x86.c.o)' has no symbols
ranlib: warning: 'libpgport_shlib.a(pg_popcount_x86.c.o)' has no symbols
To fix, add a dummy symbol to files that may otherwise have none.
Per project policy, this is a candidate for back-patching into
out-of-support branches: it suppresses annoying compiler warnings
but changes no behavior.
Reported-by: Zhang Mingli <zmlpostgres@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/229aaaf3-f529-44ed-8e50-00cb6909af21%40Spark
Backpatch-through: 13
TidStoreSetBlockOffsets() requires its offsets array to be strictly
ascending and asserts this precondition. In test_tidstore, we were
passing random offset numbers deduplicated by a DISTINCT clause in an
array_agg() call directly to the do_set_block_offsets() test
harness. However, DISTINCT without an ORDER BY clause does not
guarantee sorted results according to the SQL standard.
Fix this by sorting the offsets in-place inside do_set_block_offsets()
before calling TidStoreSetBlockOffsets().
While this assertion failure is not observed during regular regression
tests because they use queries simple enough that the optimizer
consistently chooses plans yielding sorted results, it makes sense to
stabilize the test. The failure could theoretically occur depending on
the optimizer's plan choice, and has been reported when experimenting
with certain third-party extensions.
Backpatch to v17, where test_tidstore was introduced, to ensure
extension development on stable branches does not hit this assertion.
Reported-by: Andrei Lepikhov <lepihov@gmail.com>
Author: Andrei Lepikhov <lepihov@gmail.com>
Discussion: https://postgr.es/m/b97f1850-fc7b-43c4-9b04-4e97bb9e7dc0@gmail.com
Backpatch-through: 17
The tests introduced in c529ee38b9 are timezone sensitive.
Pin the cluster's timezone to UTC at init time so timestamptz output
is deterministic regardless of the host's local timezone.
The regression tests for pg_get_role_ddl(), pg_get_database_ddl(),
and pg_get_tablespace_ddl() created databases and tablespaces, which
are heavyweight operations. As noted by Andres Freund, this is
wasteful in the core regression suite which gets run repeatedly.
Convert the three test files (role_ddl.sql, database_ddl.sql,
tablespace_ddl.sql) into a single TAP test that runs once, covering
all the same functionality: basic DDL generation, pretty-printing,
option handling, error cases, permission checks, and edge cases like
quoted names and role memberships.
Discussion: https://postgr.es/m/5c67dc79-909a-4e17-8606-6686667da6c6@dunslane.net
btestimateparallelscan neglected to add btps_arrElems[] space overhead
for skip array scan keys that were later output by nbtree preprocessing.
Skip arrays don't actually need to use this space, but a scan with a
subsequent SAOP array will need to subscript btps_arrElems[] using a
simple so->arrayKeys[]-wise offset. so->arrayKeys[] has entries for
both kinds of arrays.
As a result of this oversight, it was possible for an index scan with a
skip array and a lower-order SAOP array to write past the allocated
shared memory boundary when storing the SAOP array's cur_elem. In
practice the problem seems to be limited to scans with many skipped
index columns, since our general approach to estimating the amount of
shared memory that will be required is fairly conservative.
To fix, have btestimateparallelscan request an extra sizeof(int) space
for key columns that might require a skip array later on.
Oversight in commit 92fe23d9, which added the nbtree skip scan
optimization.
Author: Siddharth Kothari <sidkot@google.com>
Discussion: https://postgr.es/m/CAGCUe0Lwk3C0qdkBa+OLpYc7yXwW=pbaz8Sju4xMXEQAmyp+5g@mail.gmail.com
Backpatch-through: 18
Do minor comment fixes and remove implicit cast to Datum.
While here, let's prefer crashing instead of entering an infinite
loop in case of future programming mistakes when computing next_level,
suggested by ChangAo Chen.
Discussion: https://postgr.es/m/tencent_49E3F11E74D8A584A2144ED532A490CBC40A@qq.com
When a subscription has retain_dead_tuples enabled and maxretention is
zero (unlimited), adjust_xid_advance_interval() mistakenly caps
xid_advance_interval to zero.
This zero interval forces get_candidate_xid() to evaluate
TimestampDifferenceExceeds() as always true, causing the apply worker to
call GetOldestActiveTransactionId() for every WAL message. This
leads to unnecessary ProcArrayLock acquisitions.
Fix this by only capping the interval when maxretention > 0, allowing
the exponential back-off to function properly.
Author: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDdKVnCLHot=AcoPpEiSyDzGz7wGYjAFHVOw57oDtmUDWQ@mail.gmail.com
Similarly to logical replication, REPACK CONCURRENTLY needs to ability
to reliably locate a tuple based on an identity. A replica identity
index is okay. Primary keys normally also are, except when they are
deferrable, because a tuple being modified might not yet be indexed,
causing REPACK to fail.
Change the REPACK CONCURRENTLY code to use GetRelationIdentityOrPK(),
similar to what the logical replication code does. (Though we don't yet
support locating tuples based on arbitrary indexes for replica identity
FULL.)
While at it, add a few more test cases for situations that aren't
supported by REPACK, to improve coverage.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Antonin Houska <ah@cybertec.at>
Reviewed-by: Yuchen Li <liyuchen_xyz@163.com>
Discussion: https://postgr.es/m/10DD5E13-B45D-44F1-BE08-C63E00ABCAC0@gmail.com
Previously, these test cases would give internal errors or crash. The
fix is to add some missing fields of ForPortionOfExpr to
expression_tree_walker.
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://postgr.es/m/CACJufxHs1Hs00EqsZ4NbuAjmYzMzjJyP1sAj12Ne=cBsEVmQOA@mail.gmail.com
remove_self_join_rel() called adjust_relid_set() on all_result_relids
and leaf_result_relids but threw away the return value. Since
adjust_relid_set() returns a freshly-built Relids and does not modify
the input in place, the calls did nothing. This has been the case
since the SJE feature went in (commit fc069a3a6).
There has been no observable misbehavior, because the relid being
passed is guaranteed not to be a member of either set. At the point
remove_self_join_rel() runs, those sets contain only resultRelation;
inheritance children have not been added yet, as that happens later in
query_planner(), in expand_single_inheritance_child() called from
add_other_rels_to_query(). And remove_self_joins_recurse() rejects
parse->resultRelation as an SJE candidate to preserve the EvalPlanQual
mechanism. Even with the result assigned, the calls would be no-ops
in practice.
Rather than make the calls do the cleanup they pretend to do, replace
them with assertions of the invariant. Any future loosening of the
SJE candidate filter -- for instance to allow eliminating a result
relation under provable conditions -- will trip the assertion and
force whoever does it to revisit this code.
Additionally, decorate adjust_relid_set() with pg_nodiscard so that
any future accidental discard of its return value is caught at compile
time.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49fYQcqJfJ_Gtn8r1GFNoYtb1=2AUab4ieuqY4Zid9ocQ@mail.gmail.com
These are old leaks, that can pile up if a WAL receiver stays alive,
waiting for new WAL data after the sender has switched to a new
timeline.
While this is technically a bug, the impact is minimal and would only
become noticeable if the WAL sender handles a lot of timeline switches,
so no backpatch is done. Note that in most cases, primary_conninfo
would be updated in a standby to point to a new sender, meaning a
restart of the WAL receiver. Let's be clean on HEAD, though.
Author: DaeMyung Kang <charsyam@gmail.com>
Discussion: https://postgr.es/m/20260426170100.847923-1-charsyam@gmail.com
Discussion: https://postgr.es/m/20260426170219.849330-1-charsyam@gmail.com
Quote pg_hosts.conf fields derived from the build directory, since
hba.c:next_token() treats a comma as a token separator. Commit
4f433025f6 introduced pg_hosts.conf and
this test. A build directory name containing a comma worked before that
commit. A build directory name containing a quote character has not
worked, so don't handle that.
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/20260426213252.7a@rfd.leadboat.com
British Columbia (America/Vancouver) moved to permanent UTC-07 on
2026-03-09, which will affect their clocks beginning on 2026-11-01.
For lack of any clarity on the point, assume their TZ abbreviation
will be MST from that time forward.
Moldova (Europe/Chisinau) has followed EU DST transition times since
2022.
Backpatch-through: 14
We need to create top-level targets to run targets with the ninja
command like `ninja <target_name>`.
Some targets (man, html, ...) have the same target name on both
top-level and custom target. This creates a confusion for the meson
build:
$ meson compile -C build html
```
ERROR: Can't invoke target `html`: ambiguous name. Add target type
and/or path:
- ./doc/src/sgml/html:custom
- ./doc/src/sgml/html:alias
```
Solve that problem by adding '-custom' suffix to these problematic
targets' custom target names. Top-level targets can be called with
both meson and ninja now:
$ meson compile -C build html
$ ninja -C build html
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Suggested-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/5508e572-79ae-4b20-84d0-010a66d077f2%40eisentraut.org
Expressions in GRAPH_TABLE COLUMNS list may have lateral references.
get_rule_expr() requires lateral namespaces to deparse such
references. get_from_clause_item() does not pass them when processing
the expressions in COLUMNS list causing ERROR "bogus varlevelsup: 0
offset 0". Fix get_from_clause_item() to pass input deparse_context
containing lateral namespaces to get_rule_expr() instead of the dummy
context.
Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAHg%2BQDcLVa2iBnggkHxY4itZbXtDMfsYHEjnCUYe9hNbnxDi-w%40mail.gmail.com
GRAPH_TABLE clause is converted into a rangetable entry, which is
ignored by assign_query_collations(). Hence we assign collations
while transforming its parts. But expressions in COLUMNS clause
missed that treatment, so fix that.
While at it, also add comments about collation assignment to the parts
of GRAPH_TABLE clause, and also fix a small grammar issue.
Reported-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAHg+QDc4aaiufYSgrwMMPMMRTPtQ66SghcrPFbWJFZMqNaG+BA@mail.gmail.com
generate_queries_for_path_pattern_recurse() and
generate_setop_from_pathqueries() are recursive functions. For a
property graph with hundreds of tables, a graph pattern with a handful
element patterns can cause stack overflow. Fix it by calling
check_stack_depth() at the beginning of these functions.
Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAHg+QDfgK0xddH8f3eAb+UVn7sBDOnv8RvM6OkP4HtHAt6aD7w@mail.gmail.com
Commit 0b096e379e changed pg_test_timing to measure timing
differences in nanoseconds instead of microseconds, but the resulting
deltas continued to be stored in int32.
That can overflow for large gaps (for example, values greater than about
2.14 seconds in nanoseconds), leading to truncation or incorrect output.
This commit fixes the issue by storing measured timing deltas in int64.
This prevents overflow for large values and better matches
nanosecond-resolution measurements.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Xiaopeng Wang <wxp_728@163.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/F780CEEB-A237-4302-9F55-60E9D8B6533D@gmail.com
ExecEvalHashedScalarArrayOp(), when using a strict equality function,
performs a short-circuit when looking up NULL values. When the function
is non-strict, the code incorrectly looked up the hash table for a
zero-valued Datum, which could have resulted in an accidental true
return if the hash table contained zero valued Datum, or could result
in a crash for non-byval types.
Here we fix this by adding an extra step when we build the hash table to
check what the result of a NULL lookup would be. This requires looping
over the array and checking what the non-hashed version of the code
would do. We cache the results of that in the expression so that we can
reuse the result any time we're asked to search for a NULL value.
It's important to note that non-strict equality functions are free to
treat any NULL value as equal to any non-NULL value. For example,
someone may wish to design a type that treats an empty string and NULL
as equal.
All built-in types have strict equality functions, so this could affect
custom / user-defined types.
Author: Chengpeng Yan <chengpeng_yan@outlook.com>
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: ChangAo Chen <cca5507@qq.com>
Discussion: https://postgr.es/m/A16187AE-2359-4265-9F5E-71D015EC2B2D@outlook.com
Backpatch-through: 14
pg_test_timing reports timing differences in nanoseconds in master, and
in microseconds in v14 through v18, but previously the backward-clock
warning incorrectly labeled the value as milliseconds.
This commit fixes the warning message to use "ns" in master and
"us" in v14 through v18, matching the actual unit being reported.
Backpatch to all supported versions.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Xiaopeng Wang <wxp_728@163.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/F780CEEB-A237-4302-9F55-60E9D8B6533D@gmail.com
Backpatch-through: 14
If CheckAttributeType() is called with InvalidOid, it performs a bunch
of pointless, futile syscache lookups with InvalidOid, but ultimately
tolerates it and has no effect. We were calling it with InvalidOid on
dropped columns, but it seems accidental that it works, so let's stop
doing it.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/93ce56cd-02a6-4db1-8224-c8999372facc@iki.fi
Backpatch-through: 14
CheckAttributeType() checks that a composite type is not made a member
of itself with ALTER TABLE ADD COLUMN or ALTER TYPE ADD ATTRIBUTE,
even indirectly via a domain, array, another composite type or a range
type. But it missed checking for multiranges. That was a simple
oversight when multiranges were added.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/93ce56cd-02a6-4db1-8224-c8999372facc@iki.fi
Backpatch-through: 14
These tests sometimes run with wal_level=minimal, which does not allow
to run REPACK (CONCURRENTLY). Move them to test_decoding, which is
ensured to run with high enough wal_level.
Discussion: https://postgr.es/m/260901.1776696126@sss.pgh.pa.us
The psql describe (`\d`) footer titles were previously unintuitive when
listing publications that included or excluded specific tables. Even
though the tag for included publications was pre-existing, it is better
to update it to "Included in publications:" to match the phrasing of
the "Excluded from publications:" tag.
Footer titles for sequence and schema descriptions have been updated
similarly to maintain consistency.
Reported-by: Álvaro Herrera <alvherre@kurilemu.de>
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Yuchen Li <liyuchen_xyz@163.com>
Discussion: https://postgr.es/m/aeDs7iZUox1bbKAK%40alvherre.pgsql
There's some talk about upgrading our current -Wshadow=compatible-local
up to -Wshadow. There's some pending questions as to whether the churn
and extra backpatching pain are worthwhile for doing all of them. We
can't use the latter argument for ones that are new to v19, providing we
fix them now. So let's fix those ones so that the problem is not any
worse for if we decide to fix the remainder for v20.
Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Yuchen Li <liyuchen_xyz@163.com>
Discussion: https://postgr.es/m/CAApHDvp=rx5GxM=yW8QhFF3noXtYt7LkOxJ7zkaPOzpti4Gm8w@mail.gmail.com
The problem report was about setting GUCs in the startup packet for a
physical replication connection. Setting the GUC required an ACL
check, which performed a lookup on pg_parameter_acl.parname. The
catalog cache was hardwired to use DEFAULT_COLLATION_OID for
texteqfast() and texthashfast(), but the database default collation
was uninitialized because it's a physical walsender and never connects
to a database. In versions 18 and later, this resulted in a NULL
pointer dereference, while in version 17 it resulted in an ERROR.
As the comments stated, using DEFAULT_COLLATION_OID was arbitrary
anyway: if the collation actually mattered, it should have used the
column's actual collation. (In the catalog, some text columns are the
default collation and some are "C".)
Fix by using C_COLLATION_OID, which doesn't require any initialization
and is always available. When any deterministic collation will do,
it's best to consistently use the simplest and fastest one, so this is
a good idea anyway.
Another problem was raised in the thread, which this commit doesn't
fix (see second discussion link).
Reported-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/D18AD72A-5004-4EF8-AF80-10732AF677FA@yandex-team.ru
Discussion: https://postgr.es/m/4524ed61a015d3496fc008644dcb999bb31916a7.camel%40j-davis.com
Backpatch-through: 17
Commit 7a1f0f8747 optimized the slot verification query but
overlooked cases where all logical replication slots are already
invalidated. In this scenario, the CTE returns no rows, causing the
main query (which used a cross join) to return an empty result even
when invalid slots exist.
This commit fixes this by using a LEFT JOIN with the CTE, ensuring
that slots are properly reported even if the CTE returns no rows.
Author: Lakshmi N <lakshmin.jhs@gmail.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CA+3i_M8eT6j8_cBHkYykV-SXCxbmAxpVSKptjDVq+MFtpT-Paw@mail.gmail.com
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in a few places. Most of
these inconsistencies were introduced during Postgres 19 development.
This commit was written with help from clang-tidy, by mechanically
applying the same rules as similar clean-up commits (the earliest such
commit was commit 035ce1fe).
to_char() allocates its output buffer with 8 bytes per formatting
code in the pattern. If the locale's currency symbol, thousands
separator, or decimal or sign symbol is more than 8 bytes long,
in principle we could overrun the output buffer. No such locales
exist in the real world, so it seems sufficient to truncate the
symbol if we do see it's too long.
Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/638232.1776790821@sss.pgh.pa.us
Backpatch-through: 14
parse_affentry() and addCompoundAffixFlagValue() each collect fields
from an affix file into working buffers of size BUFSIZ. They failed
to defend against overlength fields, so that a malicious affix file
could cause a stack smash. BUFSIZ (typically 8K) is certainly way
longer than any reasonable affix field, but let's fix this while
we're closing holes in this area.
I chose to do this by silently truncating the input before it can
overrun the buffer, using logic comparable to the existing logic in
get_nextfield(). Certainly there's at least as good an argument for
raising an error, but for now let's follow the existing precedent.
Reported-by: Igor Stepansky <igor.stepansky@orca.security>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/864123.1776810909@sss.pgh.pa.us
Backpatch-through: 14
This function writes into a caller-supplied buffer of length
2 * MAXNORMLEN, which should be plenty in real-world cases.
However a malicious affix file could supply an affix long
enough to overrun that. Defend by just rejecting the match
if it would overrun the buffer. I also inserted a check of
the input word length against Affix->replen, just to be sure
we won't index off the buffer, though it would be caller error
for that not to be true.
Also make the actual copying steps a bit more readable, and remove
an unnecessary requirement for the whole input word to fit into the
output buffer (even though it always will with the current caller).
The lack of documentation in this code makes my head hurt, so
I also reverse-engineered a basic header comment for CheckAffix.
Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/641711.1776792744@sss.pgh.pa.us
Backpatch-through: 14
When using ALTER TABLE ... MERGE PARTITIONS or ALTER TABLE ... SPLIT
PARTITION, extension dependencies on partition indexes were being lost.
This happened because the new partition indexes are created fresh from
the parent partitioned table's indexes, while the old partition indexes
(with their extension dependencies) are dropped.
Fix this by collecting extension dependencies from source partition
indexes before detaching them, then applying those dependencies to the
corresponding new partition indexes after they're created. The mapping
between old and new indexes is done via their common parent partitioned
index.
For MERGE operations, all source partition indexes sharing a parent
partitioned index must have the same extension dependencies; if they
differ, an error naming both conflicting partition indexes is raised.
The check is implemented by collecting one entry per partition index,
sorting by parent index OID, and comparing adjacent entries in a single
pass. This is order-independent: the same set of partitions produces
the same decision regardless of the order they are listed in the MERGE
command, and subset mismatches are caught in both directions.
For SPLIT operations, the new partition indexes simply inherit all
extension dependencies from the source partition's index.
The regression tests exercising this feature live under
src/test/modules/test_extensions, where the test_ext3 and test_ext5
extensions are available; core regression tests cannot assume any
particular extension is installed.
Author: Matheus Alcantara <matheusssilv97@gmail.com>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Reported-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Dmitry Koval <d.koval@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/CALdSSPjXtzGM7Uk4fWRwRMXcCczge5uNirPQcYCHKPAWPkp9iQ%40mail.gmail.com
Formerly, attempting to use WHERE CURRENT OF to update or delete from
a table with virtual generated columns would fail with the error
"WHERE CURRENT OF on a view is not implemented".
The reason was that the check preventing WHERE CURRENT OF from being
used on a view was in replace_rte_variables_mutator(), which presumed
that the only way it could get there was as part of rewriting a query
on a view. That is no longer the case, since replace_rte_variables()
is now also used to expand the virtual generated columns of a table.
Fix by doing the check for WHERE CURRENT OF on a view at parse time.
This is safe, since it is no longer possible for the relkind to change
after the query is parsed (as of b23cd185f).
Reported-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDc_TwzSgb=B_QgNLt3mvZdmRK23rLb+RkanSQkDF40GjA@mail.gmail.com
Backpatch-through: 18
If the SET or WHERE clause of an INSERT ... ON CONFLICT command
references EXCLUDED.col, where col is a virtual generated column, the
column was not properly expanded, leading to an "unexpected virtual
generated column reference" error, or incorrect results.
The problem was that expand_virtual_generated_columns() would expand
virtual generated columns in both the SET and WHERE clauses and in the
targetlist of the EXCLUDED pseudo-relation (exclRelTlist). Then
fix_join_expr() from set_plan_refs() would turn the expanded
expressions in the SET and WHERE clauses back into Vars, because they
would be found to match the expression entries in the indexed tlist
produced from exclRelTlist.
To fix this, arrange for expand_virtual_generated_columns() to not
expand virtual generated columns in exclRelTlist. This forces
set_plan_refs() to resolve generation expressions in the query using
non-virtual columns, as required by the executor.
In addition, exclRelTlist now always contains only Vars. That was
something already claimed in a couple of existing comments in the
planner, which relied on that fact to skip some processing, though
those did not appear to constitute active bugs.
Reported-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDf7wTLz_vqb1wi1EJ_4Uh+Vxm75+b4c-Ky=6P+yOAHjbQ@mail.gmail.com
Backpatch-through: 18
The ri_FetchConstraintInfo() and ri_LoadConstraintInfo() functions
were declared to return const RI_ConstraintInfo *, but callers
sometimes need to modify the struct, requiring casts to drop the
const. Remove the misapplied const qualifiers and the casts that
worked around them.
Reported-by: Peter Eisentraut <peter@eisentraut.org>
Author: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/548600ed-8bbb-4e50-8fc3-65091b122276@eisentraut.org
This commit tweaks ALTER INDEX .. ATTACH PARTITION to attempt a
validation of a parent index in the case where an index is already
attached but the parent is not yet valid. This occurs in cases where a
parent index was created invalid such as with CREATE INDEX ONLY, but was
left invalid after an invalid child index was attached (partitioned
indexes set indisvalid to false if at least one partition is
!indisvalid, indisvalid is true in a partitioned table iff all
partitions are indisvalid). This could leave a partition tree in a
situation where a user could not bring the parent index back to valid
after fixing the child index, as there is no built-in mechanism to do
so. This commit relies on the fact that repeated ATTACH PARTITION
commands on the same index silently succeed.
An invalid parent index is more than just a passive issue. It causes
for example ON CONFLICT on a partitioned table if the invalid parent
index is used to enforce a unique constraint.
Some test cases are added to track some of problematic patterns, using a
set of partition trees with combinations of invalid indexes and ATTACH
PARTITION.
Reported-by: Mohamed Ali <moali.pg@gmail.com>
Author: Sami Imseih <sanmimseih@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Discussion: http://postgr.es/m/CAGnOmWqi1D9ycBgUeOGf6mOCd2Dcf=6sKhbf4sHLs5xAcKVCMQ@mail.gmail.com
Backpatch-through: 14
This neglected to set TAP_TESTS = 1, and partially compensated
for that by writing duplicative hand-made rules for check and
installcheck. That's not really sufficient though. The way
I noticed the error was that "make distclean" didn't clean out
the tmp_check subdirectory, and there might be other consequences.
Do it the standard way instead.
FlushUnlockedBuffer() accepted io_object and io_context arguments but
hardcoded IOOBJECT_RELATION and IOCONTEXT_NORMAL when calling
FlushBuffer(). Pass them through instead. Also fix FlushBuffer() to use
its io_object parameter for I/O timing stats rather than hardcoding
IOOBJECT_RELATION.
Not actively broken since all current callers pass IOOBJECT_RELATION and
IOCONTEXT_NORMAL, so not backpatched.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/BC97546F-5C15-42F2-AD57-CFACDB9657D0@gmail.com
The btree_gist enum test expects a bitmap heap scan. Since b46e1e54d0
enabled setting the VM during on-access pruning and 378a21618 set
pd_prune_xid on INSERT, scans of enumtmp may set pages all-visible.
If autovacuum or autoanalyze then updates pg_class.relallvisible, the
planner could choose an index-only scan instead.
Make the enumtmp a temp table to exclude it from autovacuum/autoanalyze.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/46733d68-aec0-4d09-8120-4c66b87047a4%40gmail.com
Since b46e1e54d0 allowed setting the VM on-access and 378a21618 set
pd_prune_xid on INSERT, the testing of generic/custom plans in
src/test/regress/sql/plancache.sql was destabilized.
One of the queries of test_mode could have set the pages all-visible and
if autovacuum/autoanalyze ran and updated pg_class.relallvisible, it
would affect whether we got an index-only or sequential scan.
Preclude this by disabling autovacuum and autoanalyze for test_mode and
carefully sequencing when ANALYZE is run.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/71277259-264e-4983-a201-938b404049d7%40gmail.com
GetLocalPinLimit() and GetAdditionalLocalPinLimit(), currently in use
only by the read stream, previously allowed a backend to pin all
num_temp_buffers local buffers. This meant that the read stream could
use every available local buffer for read-ahead, leaving none for other
concurrent pin-holders like other read streams and related buffers like
the visibility map buffer needed during on-access pruning.
This became more noticeable since b46e1e54d0, which allows on-access
pruning to set the visibility map, which meant that some scans also
needed to pin a page of the VM. It caused a test in
src/test/regress/sql/temp.sql to fail in some cases.
Cap the local pin limit to num_temp_buffers / 4, providing some
headroom. This doesn't guarantee that all needed pins will be available
— for example, a backend can still open more cursors than there are
buffers — but it makes it less likely that read-ahead will exhaust the
pool.
Note that these functions are not limited by definition to use in the
read stream; however, this cap should be appropriate in other contexts.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/97529f5a-ec10-46b1-ab50-4653126c6889%40gmail.com
We installed this in commit eea9fa9b2 to protect against foreseeable
mistakes that would break ABI in stable branches by renumbering
NodeTag enum entries. However, we now have much more thorough
ABI stability checks thanks to buildfarm members using libabigail
(see the .abi-compliance-history mechanism). So this incomplete,
single-purpose check seems like an anachronism. I wouldn't object
to keeping it were it not that it requires an additional manual step
when making a new stable git branch. That seems like something easy
to screw up, so let's get rid of it.
This patch just removes the logic that checks for changes in the last
auto-assigned NodeTag value. We still need eea9fa9b2's cross-check
on the supplied list of header files, to prevent divergence between
the makefile and meson build systems. We'll also sometimes need the
nodetag_number() infrastructure for hand-assigning new NodeTags in
stable branches.
Discussion: https://postgr.es/m/1458883.1776143073@sss.pgh.pa.us
We were using "select count(*) into x from generate_series(1,
1_000_000_000_000)" to waste one second waiting for a statement
timeout trap. Aside from consuming CPU to little purpose, this could
easily eat several hundred MB of temporary file space, which has been
observed to cause out-of-disk-space errors in the buildfarm.
Let's just use "pg_sleep(10)", which is far less resource-intensive.
Also update the "when others" exception handler so that if it does
ever again trap an error, it will tell us what error. The cause of
these intermittent buildfarm failures had been obscure for awhile.
Discussion: https://postgr.es/m/557992.1776779694@sss.pgh.pa.us
Backpatch-through: 14
This batch is similar to 462fe0ff62 and addresses a variety of code
style issues, including grammar mistakes, typos, inconsistent variable
names in function declarations, and incorrect function names in comments
and documentation. These fixes have accumulated on the community
mailing lists since the commit mentioned above.
Notably, Alexander Lakhin previously submitted a patch identifying many
of the trivial typos and grammar issues that had been reported on
pgsql-hackers. His patch covered a somewhat large portion of the issues
addressed here, though not all of them.
The documentation changes only affect HEAD.
When a rule action or rule qualification references NEW.col where col
is a generated column (stored or virtual), the rewriter produces
incorrect results.
rewriteTargetListIU removes generated columns from the query's target
list, since stored generated columns are recomputed by the executor
and virtual ones store nothing. However, ReplaceVarsFromTargetList
then cannot find these columns when resolving NEW references during
rule rewriting. For UPDATE, the REPLACEVARS_CHANGE_VARNO fallback
redirects NEW.col to the original target relation, making it read the
pre-update value (same as OLD.col). For INSERT,
REPLACEVARS_SUBSTITUTE_NULL replaces it with NULL. Both are wrong
when the generated column depends on columns being modified.
Fix by building target list entries for generated columns from their
generation expressions, pre-resolving the NEW.attribute references
within those expressions against the query's targetlist, and passing
them together with the query's targetlist to ReplaceVarsFromTargetList.
Back-patch to all supported branches. Virtual generated columns were
added in v18, so the back-patches in pre-v18 branches only handle
stored generated columns.
Reported-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDexGTmCZzx=73gXkY2ZADS6LRhpnU+-8Y_QmrdTS6yUhA@mail.gmail.com
Backpatch-through: 14
When the startup process exists with a FATAL error during PM_STARTUP,
the postmaster called ExitPostmaster() directly, assuming that no other
processes are running at this stage. Since 7ff23c6d27, this
assumption is not true, as the checkpointer, the background writer, the
IO workers and bgworkers kicking in early would be around.
This commit removes the startup-specific shortcut happening in
process_pm_child_exit() for a failing startup process during PM_STARTUP,
falling down to the existing exit() flow to signal all the started
children with SIGQUIT, so as we have no risk of creating orphaned
processes.
This required an extra change in HandleFatalError() for v18 and newer
versions, as an assertion could be triggered for PM_STARTUP. It is now
incorrect. In v17 and older versions, HandleChildCrash() needs to be
changed to handle PM_STARTUP so as children can be waited on.
While on it, fix a comment at the top of postmaster.c. It was claiming
that the checkpointer and the background writer were started after
PM_RECOVERY. That is not the case.
Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/CAJTYsWVoD3V9yhhqSae1_wqcnTdpFY-hDT7dPm5005ZFsL_bpA@mail.gmail.com
Backpatch-through: 15
The documentation previously described the io_max_workers,
io_worker_idle_timeout, and io_worker_launch_interval GUCs as
type "int". However, the documentation consistently uses "integer"
for parameters of this type.
This commit updates these parameter descriptions to use "integer"
for consistency.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAHGQGwEpMDpB-K8SSUVRRHg6L6z3pLAkekd9aviOS=ns0EC=+Q@mail.gmail.com
The documentation for jit_debugging_support and jit_profiling_support
previously stated that these parameters can only be set at server start.
However, both parameters use the PGC_SU_BACKEND context, meaning they
can be set at session start by superusers or users granted the appropriate
SET privilege, but cannot be changed within an active session.
This commit updates the documentation to reflect the actual behavior.
Backpatch to all supported versions.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAHGQGwEpMDpB-K8SSUVRRHg6L6z3pLAkekd9aviOS=ns0EC=+Q@mail.gmail.com
Backpatch-through: 14
Replace the outdated DatumGetCString(DirectFunctionCall1(textout, ...))
pattern with TextDatumGetCString(). The macro is the modern, more
efficient way to convert a text Datum to a C string as it avoids
unnecessary function call machinery and handles detoasting internally.
Since plsample serves as reference code for extension authors, it
should follow current idiomatic practices.
Author: Amul Sul <sulamul@gmail.com>
Discussion: https://postgr.es/m/CAAJ_b95-xMvUN1PEqxv8y6g-A-8k+fSgyv20kSZc9eF1wZAUPg@mail.gmail.com
Commit cfcd57111 et al fell over under Valgrind testing.
(It seems to be enough to #define USE_VALGRIND, you don't actually
need to run it under Valgrind to see failures.) The cause is that
remove_rel_from_eclass updates each EquivalenceMember's em_relids,
and those can be aliases of the left_relids or right_relids of some
RestrictInfo in ec_sources. If the update made em_relids empty then
bms_del_member will have pfree'd the relid set, so that the subsequent
attempt to clean up ec_sources accesses already-freed memory.
We missed seeing ill effects before cfcd57111 because (a) if the
pfree happens then we will remove the EquivalenceMember altogether,
making the source RestrictInfo no longer of use, and (b) the
cleanup of ec_sources didn't touch left/right_relids before that.
I'm unclear though on how cfcd57111 managed to pass non-USE_VALGRIND
testing. Apparently we managed to store another Bitmapset into the
freed space before trying to access it, but you'd not think that would
happen 100% of the time. I think what USE_VALGRIND changes is that it
makes list.c much more memory-hungry, so that the freed space gets
claimed by some List node before a Bitmapset can be put there.
This failure can be seen in v16, v17, and master, but oddly enough not
v18. That's because the SJE patch replaced the simple bms_del_members
calls used here with adjust_relid_set, which is careful not to
scribble on its input. But commit 20efbdffe just recently put back
the old coding and thus resurrected the problem.
Discussion: https://postgr.es/m/458729.1776724816@sss.pgh.pa.us
Backpatch-through: 16, 17, master
The second function signature entry for pg_get_tablespace_ddl() was
missing the role="func_signature" attribute. This commit adds the
missing attribute to ensure consistent formatting with other function
entries.
Author: Tatsuya Kawata <kawatatatsuya0913@gmail.com>
Discussion: https://postgr.es/m/CAHza6qcSgwdh+f41zEm6NSaGHvs5_cwjVu22+KTic=TfnonrFA@mail.gmail.com
The original implementation of remove_rel_from_restrictinfo()
thought it could skate by with removing no-longer-valid relid
bits from only the clause_relids and required_relids fields.
This is quite bogus, although somehow we had not run across a
counterexample before now. At minimum, the left_relids and
right_relids fields need to be fixed because they will be
examined later by clause_sides_match_join(). But it seems
pretty foolish not to fix all the relid fields, so do that.
This needs to be back-patched as far as v16, because the
bug report shows a planner failure that does not occur
before v16. I'm a little nervous about back-patching,
because this could cause unexpected plan changes due to
opening up join possibilities that were rejected before.
But it's hard to argue that this isn't a regression. Also,
the fact that this changes no existing regression test results
suggests that the scope of changes may be fairly narrow.
I'll refrain from back-patching further though, since no
adverse effects have been demonstrated in older branches.
Bug: #19460
Reported-by: François Jehl <francois.jehl@pigment.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/19460-5625143cef66012f@postgresql.org
Backpatch-through: 16
Before each call to the SRF, initialize isnull and isDone, as per the
comments for struct ReturnSetInfo. This fixes a Coverity warning
about rsi.isDone not being initialized. The built-in
{multi,}range_minus_multi functions don't return without setting it,
but a user-supplied function might not be as accommodating.
We also add statistics tracking around the function call, which
will be expected once user-defined withoutPortionProcs functions
are supported, and a cross-check on rsi.returnMode just for
paranoia's sake.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Co-authored-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://postgr.es/m/4126231.1776622202@sss.pgh.pa.us
Although REPACK (CONCURRENTLY) uses replication slots, there is no
concern that the slot will leak data of other users, because the
MAINTAIN privilege on the table is required anyway; requiring
REPLICATION is user-unfriendly without providing any actual protection.
A related aspect is that the REPLICATION attribute is not needed to
prevent REPACK from stealing slots from logical replication, since
commit e76d8c749c made REPACK use a separate pool of replication
slots.
Similarly, there's no reason to require that the table owner has the
LOGIN privilege. Bypass the default behavior in the background worker
launch sequence.
Because there are now successful concurrent repack runs in the
regression tests, we're forced to run test_plan_advice under
wal_level=replica, so add that. Also, move the cluster.sql test to a
different parallel group in parallel_schedule: apparently the use of the
repack worker causes it to exceed the maximum limit of processes in some
runs (the actual limit reached is the number of XIDs in a snapshot's xip
array).
Author: Antonin Houska <ah@cybertec.at>
Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/aeJHPNmL4vVy3oPw@pryzbyj2023
Create the PL/pgSQL function and procedure for the top-level WAIT FOR
checks in a single transaction, then wait once for standby replay before
running both tests. Also revise some surrounding comments.
This avoids an extra 'wait_for_catchup()' on the delayed standby without
changing the test coverage.
Discussion: https://postgr.es/m/CABPTF7WZ1yuYz8V%3Dxsbghg8e7qaAm5MpyNw6BthWcbN7%2BP6biw%40mail.gmail.com
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
The self-join elimination (SJE) commit grafted self-join removal onto
remove_rel_from_query(), which was originally written for left-join
removal only. This resulted in several issues:
- Comments throughout remove_rel_from_query() still assumed only
left-join removal, making the code misleading.
- ChangeVarNodesExtended was called on phv->phexpr with subst=-1
during left-join removal, which is pointless and confusing since any
surviving PHV shouldn't reference the removed rel.
- phinfo->ph_lateral was adjusted for left-join removal, which is
unnecessary since the removed relid cannot appear in ph_lateral for
outer joins.
- The comment about attr_needed reconstruction was in
remove_rel_from_query(), but the actual rebuild is performed by the
callers.
- EquivalenceClass processing in remove_rel_from_query() is redundant
for self-join removal, since the caller (remove_self_join_rel)
already handles ECs via update_eclasses().
- In remove_self_join_rel(), ChangeVarNodesExtended was called on
root->processed_groupClause, which contains SortGroupClause nodes
that have no Var nodes to rewrite. The accompanying comment
incorrectly mentioned "HAVING clause".
This patch fixes all these issues, clarifying the separation between
left-join removal and self-join elimination code paths within
remove_rel_from_query(). The resulting code is also better structured
for adding new types of join removal (such as inner-join removal) in
the future.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Discussion: https://postgr.es/m/CAMbWs48JC4OVqE=3gMB6se2WmRNNfMyFyYxm-09vgpm+Vwe8Hg@mail.gmail.com
Parallel apply workers previously failed to report statistics while
waiting for new work in the main loop. This resulted in the stats from the
most recent transaction remaining unbuffered, leading to arbitrary
reporting delays—particularly when streamed transactions were infrequent.
This commit ensures that statistics are explicitly flushed when the worker
is idle, providing timely visibility into accumulated worker activity.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 16, where it was introduced
Discussion: https://postgr.es/m/TYRPR01MB1419579F217CC4332B615589594202@TYRPR01MB14195.jpnprd01.prod.outlook.com
Since f039c22441, the minimum version of meson supported is 0.57.2,
meaning that it is possible to use the result of declare_dependency()
when checking for headers with check_header(). There were two TODOs for
readline and gssapi to change declare_dependency() after upgrading to at
least 0.57.0, which were not addressed yet.
While on it, this fixes a comment related to str.replace(). The
function has been introduced in meson 0.58.0, not 0.56.
Author: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Tristan Partin <tristan@partin.io>
Discussion: https://postgr.es/m/00cd2e0c-85df-4cf9-a889-125d85e66980@proxel.se
1. Make it so test_random_operations() can accept a NULL to have the
function select a random seed.
2. Widen the seed parameter of test_random_operations() to bigint.
Without that, it'll be impossible to run the function with a seed
which was selected by GetCurrentTimestamp(), and if a randomly
selected seed ever results in a failure, we'll likely want to run
with the same seed to debug the issue.
3. Report the seed in the error messages in test_random_operations().
If the buildfarm were ever to fail there, we'd certainly want to know
what this was.
4. Add CHECK_FOR_INTERRUPTS() to test_random_operations(). Someone might
run with a large num_ops and they'd have no way to cancel the query.
5. Minor cosmetic fixes; header order and whitespace issue.
To allow #1, the STRICT modifier had to be removed. The additional
prechecks were added as I didn't see how else to handle someone passing
those parameters as NULL.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Greg Burd <greg@burd.me>
Discussion: https://postgr.es/m/CAApHDvrDW9W72vAr7h7XeCu7+Qz-_Vff02Q+RPPuVeM0Qf0MCw@mail.gmail.com
The switch from long to int64 in commit 13b935cd52 was incomplete.
It was shifting the constant 1L, which is not always 64 bit. Fix by
using an explicit int64 constant.
MSVC warning:
../src/backend/utils/hash/dynahash.c(1767): warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)
Also add the corresponding warning to the standard warning set on
MSVC, to help catch similar issues in the future.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/1142ad86-e475-41b3-aeee-c6ad913064fa%40eisentraut.org
The argument was marked as "const void *X", but that might rightly
give the compiler the idea that *X cannot be modified through the
resulting Datum, and make incorrect optimizations based on that. Some
functions use pointer Datums to pass output arguments, like GIN
support functions. Coverity started to complain after commit
6f5ad00ab7 that there's dead code in ginExtractEntries(), because it
didn't see that it passes PointerGetDatum(&nentries) to a function
that sets it.
This issue goes back to commit c8b2ef05f4 (version 16), which
changed PointerGetDatum() from a macro to a static inline function.
This commit changes it back to a macro, but uses a trick with a dummy
conditional expression to still produce a compiler error if you try to
pass a non-pointer as the argument.
Even though this goes back to v16, I'm only committing this to
'master' for now, to verify that this silences the Coverity warning.
If this works, we might want to introduce separate const and non-const
versions of PointerGetDatum() instead of this, but that's a bigger
patch. It's also not decided yet whether to back-patch this (or some
other fix), given that we haven't yet seen any hard evidence of
compilers actually producing buggy code because of this.
Discussion: https://www.postgresql.org/message-id/342012.1776017102@sss.pgh.pa.us
This one occurs when an outer join appears beneath the made-unique
side of a semijoin. The issue is that join RTEs are not featured
out of sj_unique_rels entries. Fix, and add a test case.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Analyzed-by: Tender Wang <tndrwang@gmail.com>
Discussion: http://postgr.es/m/c0c63979-43c2-4424-8fe8-56949934c9d8@gmail.com
Replace the previous negative phrasing such as "there are no slots whose
... is not true" with a direct expression that all slots must have
conflicting = false.
Similarly, reword the requirement on the new cluster to state that it must
not have any permanent logical slots, clarifying that any existing logical
slots must have temporary set to true.
These changes improve readability without altering the meaning.
Reported-by: <mimidatabase@gmail.com>
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/177609278737.403059.14174275013090471947%40wrigleys.postgresql.org
The documentation stated only that the log file created by pg_ctl -l is
inaccessible to other users by default. However, since commit c37b3d0,
the actual behavior is that only the cluster owner has access by default,
but users in the same group as the cluster owner may also read the file
if group access is enabled in the cluster.
This commit updates the documentation to describe this behavior
more clearly.
Backpatch to all supported versions.
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Xiaopeng Wang <wxp_728@163.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/OS9PR01MB1214959BE987B4839E3046050F54BA@OS9PR01MB12149.jpnprd01.prod.outlook.com
Backpatch-through: 14
Previously, tab completion after EXCEPT (...) always suggested FROM SERVER.
This was correct for IMPORT FOREIGN SCHEMA ... EXCEPT (...), but became
incorrect once commit fd366065e0 added CREATE PUBLICATION ... EXCEPT (...).
This commit updates tab completion so FROM SERVER is no longer suggested
after CREATE PUBLICATION ... EXCEPT (...), while preserving the existing
behavior for IMPORT FOREIGN SCHEMA ... EXCEPT (...).
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CALDaNm1-Fx6Msw6zcRuSjgQdw6asdTyp2DwP-4TCKGYAT+ndsA@mail.gmail.com
An invalid database has datconnlimit set to -2. pg_get_database_ddl()
emits this verbatim as CONNECTION LIMIT = -2, which ALTER DATABASE
rejects. Error out early instead.
Reported-by: Lakshmi N <lakshmin.jhs@gmail.com>
Author: Lakshmi N <lakshmin.jhs@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Hu Xunqi <huxunqi.08@gmail.com>
Discussion: https://postgr.es/m/CA+3i_M8m1k2gFch+tU0JmAQh9FRV+pFrfTXDrJo+BqmwsTmOhg@mail.gmail.com
Commit 3e2a1496ba made the psql TAP test require the DETAIL line on
platforms with SA_SIGINFO, rather than making it optional. This
unexpectedly blew up on OpenBSD buildfarm members, because OpenBSD does
not set si_pid for SIGTERM signals even though it has SA_SIGINFO
defined.
So revert to the test as it was in commit 55890a9194, where the detail
line being missing never causes an error.
Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/2007157.1776269052%40sss.pgh.pa.us
The backend running REPACK can check DecodingWorkerShared->initialized
before the worker could have the chance to initialize it, possibly
leading to wrong behavior.
While at it, remove DecodingWorkerShared->worker_dsm_segment, because
that doesn't actually need to be in shared memory; a simple local-memory
global variable is enough.
Oversights in commit 28d534e2ae.
Author: Antonin Houska <ah@cybertec.at>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/18181295-8375-4789-ad32-269d78d6001e@gmail.com
add323da40 started updating the visibility map in the same WAL record
as pruning and freezing. This included updating the freespace map during
replay of a record setting the VM, which we've done since ab7dbd681.
add323da40, however, conditioned doing so on there being > 0 freespace
on the page, which differed from the previous state for records updating
the VM.
The FSM is not WAL-logged and is instead updated heuristically on
standbys. In rare cases, this heuristic could lead to pages with 0
freespace having outdated entries in the FSM. If the standby is later
promoted and vacuum skips these pages because they are marked
all-visible/all-frozen, overly optimistic values would be propagated up
the FSM tree, causing slowness when searching for freespace for new
tuples.
Fix it by always updating the FSM during replay when setting VM bits.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reported-by: Alexey Makhmutov <a.makhmutov@postgrespro.ru>
Discussion: https://postgr.es/m/ead2f110-c736-48f5-99e1-023dc9acbf0b%40postgrespro.ru
Commit a2b02293bc switched various checks to use XLogRecPtrIsValid(),
but later changes reintroduced XLogRecPtrIsInvalid() and direct comparisons
with InvalidXLogRecPtr.
This commit replaces those uses with XLogRecPtrIsValid() for better
readability and consistency.
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Xiaopeng Wang <wxp_728@163.com>
Reviewed-by: Amul Sul <sulamul@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CALDaNm16knMFtcqyAG3XYSkyagmVXfhaR0T=hau8UTAU0+eLQQ@mail.gmail.com
The ssl_sni and hosts_file GUCs were missing from the configuration
section of the documentation, they were only described in the main
SSL SNI subsection. This adds the GUCs to the relevant sections as
well as rewords the existing SSL SNI documentation to refer to the
settings along with a few smaller fixups.
Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwESD2Pty+J1kP3mXmWwMKZ5uJmknZdJsSGrMSRR6CQBmw@mail.gmail.com
Calling an undeclared function should be an error as of C99, and GCC
and Clang do that, but MSVC doesn't even warn about it in the default
warning level. (Commit c86d2ccdb3 fixed an instance of this
problem.) This turns on this warning and makes it an error by
default, to match other compilers.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/1142ad86-e475-41b3-aeee-c6ad913064fa%40eisentraut.org
* JOHAB: replace the incorrect "simplified Chinese" description with
a correct one that identifies it as the Korean combining (Johab)
encoding standardized in KS X 1001 annex 3.
* EUC_KR: drop a stray space before the comma in the existing
comment, and note that the encoding covers the KS X 1001
precomposed (Wansung) form.
* UHC: spell out "Unified Hangul Code", clarify that it is
Microsoft Windows CodePage 949, and describe its relationship to
EUC-KR (superset covering all 11,172 precomposed Hangul syllables).
Backpatch-through: 14
Author: Henson Choi <assam258@gmail.com>
Discussion: https://postgr.es/m/CAAAe_zAFz1v-3b7Je4L%2B%3DwZM3UGAczXV47YVZfZi9wbJxspxeA%40mail.gmail.com
overexplain_range_table() emitted the "Unprunable RTIs" and "Result
RTIs" properties before closing the "Range Table" group. In the JSON
and YAML formats the Range Table group is rendered as an array of RTE
objects, so emitting key/value pairs inside it produced structurally
invalid output. The XML format had a related oddity, with these
elements nested inside <Range-Table> rather than appearing as its
siblings.
These fields are properties of the PlannedStmt as a whole, not of any
individual RTE, so close the Range Table group before emitting them.
They now appear as siblings of "Range Table" in the parent Query
object, which is what was intended.
Also add a test exercising FORMAT JSON with RANGE_TABLE so that any
future regression in the output structure is caught.
Reported-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDdDrdqMr98a_OBYDYmK3RaT7XwCEShZfvDYKZpZTfOEjQ@mail.gmail.com
Backpatch-through: 18
The comment on the return-false path when both UNION siblings are
exhausted said "there are more rows," which is the opposite of what
the code does. The code itself is correct, returning false to signal
no more rows, but the misleading comment could tempt a reader into
"fixing" the return value, which would cause UNION plans to loop
indefinitely.
Back-patch to 17, where JSON_TABLE was introduced.
Author: Chuanwen Hu <463945512@qq.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/tencent_4CC6316F02DECA61ACCF22F933FEA5C12806@qq.com
Backpatch-through: 17
Previously, when the walreceiver exited from WalRcvWaitForStartPosition()
at the startup process's request, it called exit(1) directly. This could
skip cleanup performed by the callback functions.
This commit makes the walreceiver to use proc_exit() instead, ensuring
normal cleanup is executed on exit.
Also this commit updates comments describing walreceiver termination.
Apply to master only, as this has not caused practical issues so far.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/74381238-4E8A-4621-B794-57025DCCE0BA@gmail.com
COPY TO with FORMAT json was including generated columns in the
output, unlike TEXT and CSV formats. Virtual generated columns
appeared as null, and stored ones showed their computed values.
The JSON code path only built a restricted TupleDesc when an explicit
column list was given (attnamelist != NIL), but CopyGetAttnums()
also excludes generated columns from the default list. Fix by
checking whether the attnumlist is shorter than the full TupleDesc
instead.
Bug introduced in 7dadd38cda.
Author: Satya Narlapuram <satya.narlapuram@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDcfpGDoPL3fvfjXRtfn=fny6DdJR6BAy6TpS1Xj2EZfXA@mail.gmail.com
Commit 095c9d4cf06 added errdetail() reporting of the PID and UID of
the process that sent a termination signal. However, as noted by
Andres Freund, the implementation had architectural problems:
1. wrapper_handler() in pqsignal.c contained SIGTERM-specific logic
(setting ProcDieSenderPid/Uid), violating its role as a generic
signal dispatch wrapper.
2. Using globals to pass sender info between wrapper_handler and the
real handler is unsafe when signals nest on some platforms.
3. The syncrep.c errdetail used psprintf() to conditionally embed
text via %s, breaking translatability.
Adopt the approach proposed by Andres Freund: introduce a
pg_signal_info struct that is passed as an argument to all signal
handlers via the SIGNAL_ARGS macro. wrapper_handler populates it
from siginfo_t when SA_SIGINFO is available, or with zeros otherwise.
This keeps wrapper_handler fully generic and avoids any globals for
passing signal metadata.
Since pqsigfunc now has a different signature from the system's
signal handler type, SIG_IGN and SIG_DFL can no longer be passed
directly to pqsignal(). Introduce PG_SIG_IGN and PG_SIG_DFL macros
that cast to the new pqsigfunc type, and update all call sites.
The legacy pqsignal() in libpq retains its original signature via
a local typedef.
Only die() reads pg_siginfo today, copying the sender PID/UID into
ProcDieSenderPid/Uid for later use by ProcessInterrupts(). Only the
first SIGTERM's sender info is recorded.
Also fix the syncrep.c translatability issue by using separate ereport
calls with complete, independently translatable errdetail strings.
Also make the psql TAP test require the DETAIL line on platforms with
SA_SIGINFO, rather than making it unconditionally optional.
On Windows, pg_signal_info uses uint32_t for pid and uid fields
since pid_t/uid_t are not available early enough in the include
chain. The Windows signal dispatch in pgwin32_dispatch_queued_signals()
passes a zeroed pg_signal_info to handlers.
Author: Andres Freund <andres@anarazel.de>
Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/cwyyryh2veejuxbj5ifzyaejw7jhhqc5mrdeq56xckknsdecn2@6hzfcxde2nm5
Discussion: https://postgr.es/m/jygesyr7mwg7ovdbxpmjvvbi3hccptpkcreqb645h7f56puwbz@hmkkwi3melfe
The NOTNULL_SOURCE_SYSCACHE code path in var_is_nonnullable() used
get_attnotnull() to check pg_attribute.attnotnull, which is true for
both valid and invalid (NOT VALID) NOT NULL constraints. An invalid
constraint does not guarantee the absence of NULLs, so this could lead
to incorrect results. For example, query_outputs_are_not_nullable()
could wrongly conclude that a subquery's output is non-nullable,
causing NOT IN to be incorrectly converted to an anti-join.
Fix by checking the attnullability field in the relation's tuple
descriptor instead, which correctly distinguishes valid from invalid
constraints, consistent with what the NOTNULL_SOURCE_HASHTABLE code
path already does.
While at it, rename NOTNULL_SOURCE_SYSCACHE to NOTNULL_SOURCE_CATALOG
to reflect that this code path no longer uses a syscache lookup, and
remove the now-unused get_attnotnull() function.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Discussion: https://postgr.es/m/CAMbWs48ALW=mR0ydQ62dGS-Q+3D7WdDSh=EWDezcKp19xi=TUA@mail.gmail.com
DatumGetArrayTypeP() can return a pointer into the tuple when the
datum is stored as a short varlena, so pfree() on the result crashes.
Use DatumGetArrayTypePCopy() to always get a palloc'd copy.
Bug introduced in 76e514ebb4 and a4f774cf1c.
Reported-by: Jeff Davis <pgsql@j-davis.com>
Author: Satya Narlapuram <satya.narlapuram@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDdWtv9PKtPZEokwGCNtbv4MVnfYw5wMZrsEj4xizSNe5Q@mail.gmail.com
The goal of this module is to provide an entry point for the coverage of
the low-level compression and decompression PGLZ routines. The new test
is moved to a new parallel group, with all the existing
compression-related tests added to it.
This includes tests for the cases detected by fuzzing that emulate
corrupted compressed data, as fixed by 2b5ba2a0a1:
- Set control bit with read of a match tag, where no data follows.
- Set control bit with read of a match tag, where 1 byte follows.
- Set control bit with match tag where length nibble is 3 bytes
(extended case).
While on it, some tests are added for compress/decompress roundtrips,
and for check_complete=false/true. Like 2b5ba2a0a1, backpatch to all
the stable branches.
Discussion: https://postgr.es/m/adw647wuGjh1oU6p@paquier.xyz
Backpatch-through: 14
It turns out that our main regression test suite queries tables upon
which concurrent DDL is occurring, which can, rarely, cause
test_plan_advice failures. We're not quite ready to fix that problem
just yet, because we want to gather some more information about how
often it actually happens first. But, our plan is going to require
test_plan_advice to access a few bits of pg_plan_advice that have
been considered internal up until now, so this commit rejiggers
things to expose those bits.
First, test_plan_advice is going to need to be able to interpret
the PGPA_TE_* constants which have been declared in pgpa_trove.h.
The "TE" stands for "trove entry" but that's kind of a silly name;
change the naming to "FB" (for "feedback") and move the declarations
to pg_plan_advice.h, which is a header file that's already installed.
This has the side benefit of making these constants available to any
other extensions that may want to examine plan advice feedback.
Second, test_plan_advice is going to call pgpa_planner_feedback_warning,
so make that function non-static and mark it PGDLLEXPORT.
Discussion: http://postgr.es/m/CA+TgmobOOmmXSJz3e+cjTY-bA1+W0dqVDqzxUBEvGtW62whYGg@mail.gmail.com
If a subquery is proven empty, and if that subquery contained a
semijoin, and if making one side or the other of that semijoin
unique and performing an inner join was a possible strategy, then
the previous code would fail with ERROR: no rtoffset for plan %s
when attempting to generate advice. Fix that.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: http://postgr.es/m/CA+TgmobOOmmXSJz3e+cjTY-bA1+W0dqVDqzxUBEvGtW62whYGg@mail.gmail.com
When a tablesample routine says that it is not repeatable across
scans, set_tablesample_rel_pathlist will (usually) materialize it,
confusing pg_plan_advice's plan walker machinery. To fix, update that
machinery to view such Material paths as essentially an extension of
the underlying scan.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: http://postgr.es/m/CA+TgmobOOmmXSJz3e+cjTY-bA1+W0dqVDqzxUBEvGtW62whYGg@mail.gmail.com
Previously we were relying on a snapshot-based check to detect invalid
execution contexts. However, when WAIT FOR is wrapped into a stored
procedure or a DO block, it could pass this check, causing an error
elsewhere.
This commit implements an explicit isTopLevel check to reject WAIT FOR
when called from within a function, procedure, or DO block. The
isTopLevel check catches these cases early with a clear error message,
matching the pattern used by other utility commands like VACUUM and
REINDEX. The snapshot check is retained for the remaining case:
top-level execution within a transaction block using an isolation level
higher than READ COMMITTED.
Also adds tests for WAIT FOR LSN wrapped in a procedure and DO block,
complementing the existing test that uses a function wrapper. Relevant
documentation paragraph is also added.
Reported-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Discussion: https://postgr.es/m/CAHg%2BQDcN-n3NUqgRtj%3DBQb9fFQmH8-DeEROCr%3DPDbw_BBRKOYA%40mail.gmail.com
Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Consistent with existing psql metadata display conventions, update the
description tags for EXCEPT publications to use lowercase for the second
word (e.g., "Except tables" instead of "Except Tables"). This aligns the
output style with other publication describe commands.
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/CAHut+Pt3t_tCYwDStkj5fG4Z=YXrHvPBA7iGdh745QipC5zKeg@mail.gmail.com
The slotsync worker was incorrectly identifying no-op states as successful
updates, triggering a busy loop to sync slots that logged messages every
200ms. This patch corrects the logic to properly classify these states,
enabling the worker to respect normal sleep intervals when no work is
performed.
Reported-by: Fujii Masao <masao.fujii@gmail.com>
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/CAHGQGwF6zG9Z8ws1yb3hY1VqV-WT7hR0qyXCn2HdbjvZQKufDw@mail.gmail.com
The data given in input of the function may not be null-terminated,
causing strlcpy() to complain with an invalid read.
Issue spotted using valgrind.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/09df9d75-13e7-45fe-89af-33fe118e797b@gmail.com
... and bms_prev_member().
Both of these functions won't work correctly when given a prevbit of
INT_MAX and would crash when operating on a Bitmapset that happened to
have a member with that value.
Here we fix that by using an unsigned int to calculate which member to
look for next.
I've also adjusted bms_prev_member() to check for < 0 rather than == -1
for starting the loop. This was done as it's safer and comes at zero
extra cost.
With our current use cases, it's likely impossible to have a Bitmapset
with an INT_MAX member, so no backpatch here. I only noticed this issue
when working on a bms function to bitshift a Bitmapset.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAApHDvr1B2gbf6JF69QmueM2QNRvbQeeKLxDnF=w9f9--022uA@mail.gmail.com
6d0eba662 already did most of the changes, but some new ones snuck in
just prior to that commit, so these got missed.
Having these short-lived StringInfoDatas on the stack rather than having
them get palloc'd by makeStringInfo() is simply for performance as it
saves doing a 2nd palloc.
Since this code is new to v19, it makes sense to improve it now rather
than wait until we branch as having v19 and v20 differ here just makes it
harder to backpatch fixes in this area.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/adt4wpj4FZwR+S7I@ip-10-97-1-34.eu-west-3.compute.internal
Three routines in pgstat_database.c incorrectly ignore the database OID
provided by their caller, using MyDatabaseId instead:
- pgstat_report_connect()
- pgstat_report_disconnect()
- pgstat_reset_database_timestamp()
The first two functions, for connection and disconnection, each have a
single caller that already passes MyDatabaseId. This was harmless,
still incorrect.
The timestamp reset function also has a single caller, but in this case
the issue has a real impact: it fails to reset the timestamp for the
shared-database entry (datid=0) when operating on shared objects. This
situation can occur, for example, when resetting counters for shared
relations via pg_stat_reset_single_table_counters().
There is currently one test in the tree that checks the reset of a
shared relation, for pg_shdescription, we rely on it to check what is
stored in pg_stat_database. As stats_reset may be NULL, two resets are
done to provide a baseline for comparison.
Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Dapeng Wang <wangdp20191008@gmail.com>
Discussion: https://postgr.es/m/ABBD5026-506F-4006-A569-28F72C188693@gmail.com
Backpatch-through: 15
When a nested set operation's output type doesn't match the parent's
expected type, recurse_set_operations builds a projection target list
using generate_setop_tlist with varno 0. If the required type
coercion involves an ArrayCoerceExpr, estimate_array_length could be
called on such a Var, and would pass it to examine_variable, which
errors in find_base_rel because varno 0 has no valid relation entry.
Fix by skipping the statistics lookup for Vars with varno 0.
Bug introduced by commit 9391f7152. Back-patch to v17, where
estimate_array_length was taught to use statistics.
Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Author: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/adjW8rfPDkplC7lF@pryzbyj2023
Backpatch-through: 17
The test in test_autovacuum was unstable because it called
log_contains() immediately after verifying autovacuum_count in
pg_stat_user_tables. This created a race condition where the
statistics could be updated before the autovacuum logs were fully
flushed to disk.
This commit replaces log_contains() with wait_for_log() to ensure the
test waits for the parallel vacuum messages to appear. Additionally,
remove the checks of the autovacuum count. Verifying the log messages
is sufficient to confirm parallel autovacuum behavior, as logging is
only enabled for the specific table under test.
Per report from buildfarm member flaviventris.
Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/525d0f48-93f7-493f-a988-f39b460a79bc@gmail.com
Use consistent phrasing for parallel vacuum descriptions between
manual VACUUM and autovacuum. Specifically, clarify that the parallel
worker count is limited by the respective options only if they are
explicitly specified.
Also, fix a typo in the parallel vacuum section.
Author: Aleksander Alekseev <aleksander@tigerdata.com>
Discussion: https://postgr.es/m/CAJ7c6TPcSqzhbhrsiCMmVwmE8F7pwS7i9J49SP1zPKS_ER+vcA@mail.gmail.com
Commit 21b018e7ea lowered some logical decoding messages from LOG to DEBUG1.
However, per discussion on pgsql-hackers, messages from background activity
(e.g., walsender or slotsync worker) should remain at LOG, as they are less
frequent and more likely to indicate issues that DBAs should notice.
For foreground SQL functions (e.g., pg_logical_slot_peek_binary_changes()),
keep these messages at DEBUG1 to avoid excessive log noise. They can still be
enabled by lowering client_min_messages or log_min_messages for the session.
This commit updates logical decoding to log these messages at LOG for
background activity and at DEBUG1 for foreground execution.
Suggested-by: Robert Haas <robertmhaas@gmail.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CA+TgmoYsu2+YAo9eLGkDp5VP-pfQ-jOoX382vS4THKHeRTNgew@mail.gmail.com
When decoding a match tag, pglz_decompress() reads 2 bytes (or 3
for extended-length matches) from the source buffer before checking
whether enough data remains. The existing bounds check (sp > srcend)
occurs after the reads, so truncated compressed data that ends
mid-tag causes a read past the allocated buffer.
Fix by validating that sufficient source bytes are available before
reading each part of the match tag. The post-read sp > srcend
check is no longer needed and is removed.
Found by fuzz testing with libFuzzer and AddressSanitizer.
When the incremental JSON parser splits a numeric token across chunk
boundaries, it accumulates continuation characters into the partial
token buffer. The accumulator's switch statement unconditionally
accepted '+', '-', '.', 'e', and 'E' as valid numeric continuations
regardless of position, which violated JSON number grammar
(-? int [frac] [exp]). For example, input "4-" fed in single-byte
chunks would accumulate the '-' into the numeric token, producing an
invalid token that later triggered an assertion failure during
re-lexing.
Fix by tracking parser state (seen_dot, seen_exp, prev character)
across the existing partial token and incoming bytes, so that each
character class is accepted only in its grammatically valid position.
The test added in 980c1a85d8 covered reordered FK columns with
different types, which triggered an "operator not a member of opfamily"
error in the fast-path prior to that commit. Add a test for the
same-type case, which is also fixed by that commit but where the wrong
scan key ordering instead produced a spurious FK violation without any
internal error.
Reported-by: Fredrik Widlert <fredrik.widlert@digpro.se>
Discussion: https://postgr.es/m/CADfhSr8hYc-4Cz7vfXH_oV-Jq81pyK9W4phLrOGspovsg2W7Kw@mail.gmail.com
The static variable afterTriggerFiringDepth introduced by commit
5c54c3ed1b is logically part of the after-trigger state and in
hindsight should have been a field in AfterTriggersData alongside
query_depth and the other per-transaction after-trigger state.
Move it there as firing_depth. Also update its comment to
accurately reflect its sole remaining purpose: signaling to
AfterTriggerIsActive() that after-trigger firing is active.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFt4NGTNk7BinOsHHM48E9zGAa852vCfGoSe1bbL=JNFQ@mail.gmail.com
var_is_nonnullable() failed to consider varreturningtype, which meant
it could incorrectly claim a Var is non-nullable based on a column's
NOT NULL constraint even when the Var refers to a non-existent row.
Specifically, OLD.col is NULL for INSERT (no old row exists) and
NEW.col is NULL for DELETE (no new row exists), regardless of any NOT
NULL constraint on the column.
This caused the planner's constant folding in eval_const_expressions
to incorrectly simplify IS NULL / IS NOT NULL tests on such Vars. For
example, "old.a IS NULL" in an INSERT's RETURNING clause would be
folded to false when column "a" has a NOT NULL constraint, even though
the correct result is true.
Fix by returning false from var_is_nonnullable() when varreturningtype
is not VAR_RETURNING_DEFAULT, since such Vars can be NULL regardless
of table constraints.
Author: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDfaAipL6YzOq2H=gAhKBbcUTYmfbAv+W1zueOfRKH43FQ@mail.gmail.com
The fast-path foreign key check introduced in 2da86c1ef9 assumed that
constraint key positions directly correspond to index column positions.
This is not always true as a FK constraint can reference PK columns in a
different order than they appear in the PK's unique index.
For example, if the PK is (a, b, c) and the FK references them as
(a, c, b), the constraint stores keys in the FK-specified order, but
the index has columns in PK order. The buggy code used the constraint
key index to access rd_opfamily[i], which retrieved the wrong operator
family when columns were reordered, causing "operator X is not a member
of opfamily Y" errors.
After fixing the opfamily lookup, a second issue started to happen:
btree index scans require scan keys to be ordered by attribute number.
The code was placing scan keys at array position i with attribute number
idx_attno, producing out-of-order keys when columns were swapped. This
caused "btree index keys must be ordered by attribute" errors.
The fix adds an index_attnos array to FastPathMeta that maps each
constraint key position to its corresponding index column position.
In ri_populate_fastpath_metadata(), we search indkey to find the actual
index column for each pk_attnums[i] and use that position for the
opfamily lookup. In build_index_scankeys(), we place each scan key at
the array position corresponding to its index column
(skeys[idx_attno-1]) rather than at the constraint key position,
ensuring scan keys are properly ordered by attribute number as btree
requires.
Reported-by: Fredrik Widlert <fredrik.widlert@digpro.se>
Author: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://www.postgresql.org/message-id/CADfhSr-pCkbDxmiOVYSAGE5QGjsQ48KKH_W424SPk%2BpwzKZFaQ%40mail.gmail.com
When a C-language function uses SPI_connect/SPI_execute/SPI_finish to
INSERT into a table with FK constraints, the FK AFTER triggers fire
and schedule ri_FastPathEndBatch via
RegisterAfterTriggerBatchCallback(), opening PK relations under
CurrentResourceOwner at the time of the SPI call. The query_depth > 0
guard in FireAfterTriggerBatchCallbacks suppresses the callback at
that nesting level, deferring teardown to the outer query's
AfterTriggerEndQuery. By then the resource owner active during the SPI
call may have been released, decrementing the cached relations'
refcounts to zero. ri_FastPathTeardown, running under the outer
query's resource owner, then crashes in assert builds when it attempts
to close relations whose refcounts are already zero:
TRAP: failed Assert("rel->rd_refcnt > 0")
Fix by storing batch callbacks at the level where they should fire:
in AfterTriggersQueryData.batch_callbacks for immediate constraints
(fired by AfterTriggerEndQuery) and in AfterTriggersData.batch_callbacks
for deferred constraints (fired by AfterTriggerFireDeferred and
AfterTriggerSetState). RegisterAfterTriggerBatchCallback() routes the
callback to the current query-level list when query_depth >= 0, and to
the top-level list otherwise. FireAfterTriggerBatchCallbacks() takes a
list parameter and simply iterates and invokes it; memory cleanup is
handled by the caller. This replaces the query_depth > 0 guard with
list-level scoping. Note that deferred constraints are unaffected by
this bug: their callbacks fire at commit via AfterTriggerFireDeferred,
under the outer transaction's resource owner, which remains valid
throughout.
Also add firing_batch_callbacks to AfterTriggersData to enforce that
callbacks do not register new callbacks during
FireAfterTriggerBatchCallbacks(), which would be unsafe as it could
modify the list being iterated. An Assert in
RegisterAfterTriggerBatchCallback() enforces this discipline for
future callers. The flag is reset at transaction and subtransaction
boundaries to handle cases where an error thrown by a callback is
caught and the subtransaction is rolled back.
While at it, ensure callbacks are properly accounted for at all
transaction boundaries, as cleanup of b7b27eb41a: discard any
remaining top-level callbacks on both commit and abort in
AfterTriggerEndXact(), and clean up query-level callbacks in
AfterTriggerFreeQuery().
Note that ri_PerformCheck() calls SPI with fire_triggers=false, which
skips AfterTriggerBeginQuery/EndQuery for that SPI command. Any
triggers queued during that SPI command are not fired immediately but
deferred to the outer query level. Since the fast-path check for
those triggers runs under the outer query's resource owner rather than
a nested SPI resource owner, and ri_PerformCheck() does not create
a dedicated child resource owner, the bug described above does not
apply.
Reported-by: Evan Montgomery-Recht <montge@mianetworks.net>
Reported-by: Sandro Santilli <strk@kbt.io>
Analyzed-by: Evan Montgomery-Recht <montge@mianetworks.net>
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAEg7pwcKf01FmDqFAf-Hzu_pYnMYScY_Otid-pe9uw3BJ6gq9g@mail.gmail.com
InjectionPointAttach() did not initialize the private_data buffer of the
shared memory entry before (perhaps partially) overwriting it. When the
private data is set to NULL by the caler, the buffer was left
uninitialized. If set, it could have stale contents.
The buffer is initialized to zero, so as the contents recorded when a
point is attached are deterministic.
Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0tsGHu2h6YLnVu4HiK05q+gTE_9WVUAqihW2LSscAYS-g@mail.gmail.com
Backpatch-through: 17
Presently, relation_needs_vacanalyze() unconditionally frees the
pgstat entry returned by pgstat_fetch_stat_tabentry_ext(). This
behavior was first added by commit 02502c1bca to avoid memory
leakage in autovacuum. While this is fine for autovacuum since it
forces stats_fetch_consistency to "none", it is not okay for other
callers that use "cache" or "snapshot". This manifests as a
double-free when pg_stat_autovacuum_scores is called multiple times
in the same transaction.
To fix, add a "bool *may_free" parameter to
pgstat_fetch_stat_tabentry_ext() that returns whether it is safe
for the caller to explicitly pfree() the result. If a caller would
rather leave it to the memory context machinery to free the result,
it can pass NULL as the "may_free" argument (or just ignore its
value).
Oversight in commit 87f61f0c82.
Reported-by: Tender Wang <tndrwang@gmail.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAHewXNkJKdwb3D5OnksrdOqzqUnXUEMpDam1TPW0vfUkW%3D7jUw%40mail.gmail.com
Discussion: https://postgr.es/m/5684f479-858e-4c5d-b8f5-bcf05de1f909%40gmail.com
The test 001_parallel_autovacuum.pl verified that vacuum delay
parameters are propagated to parallel vacuum workers by using
injection points. It previously waited for autovacuum to complete on
the test_autovac table. However, since injection points are
cluster-wide, an autovacuum worker could be triggered on tables in
other databases (e.g., template1) and get stuck at the same injection
point. This could lead to a timeout when the test waits for the
expected table's autovacuum to finish.
This commit removes the wait for autovacuum completion from this
specific test case. Since the primary goal is to verify the
propagation of parameter updates, which is already confirmed via log
messages, waiting for the entire vacuum process to finish is
unnecessary and prone to instability in concurrent test environments.
Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0s+kZZRMSF4HW7tZ9W2jS1o4B+Fg8dr5a-T6mANX+mdQA@mail.gmail.com
This restricts the retrieval of the TSC frequency whilst under a Hypervisor to
either Hypervisor-specific CPUID registers (0x40000010), or TSC
calibration. We previously allowed retrieving from the traditional CPUID
registers for TSC frequency (0x15/0x16) like on bare metal, but it turns out
that they are not trustworthy when virtualized and can report wildly incorrect
frequencies, like 7 kHz when the actual calibrated frequencty is 2.5 GHz.
Per report from buildfarm member drongo.
Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/jr4hk2sxhqcfpb67ftz5g4vw33nm67cgf7go3wwmqsafu5aclq%405m67ukuhyszz
This logging level means not to emit the log, which is useful for
functions like relation_needs_vacanalyze(). This function accepts
a log level argument but not all callers want it to emit logs.
Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3101163.1775676098%40sss.pgh.pa.us
In nodeWindowAgg.c, the calculations for frame start and end positions
in ROWS and GROUPS modes were performed using simple integer addition.
If a user-supplied offset was sufficiently large (close to INT64_MAX),
adding it to the current row or group index could cause a signed
integer overflow, wrapping the result to a negative number.
This led to incorrect behavior where frame boundaries that should have
extended indefinitely (or beyond the partition end) were treated as
falling at the first row, or where valid rows were incorrectly marked
as out-of-frame. Depending on the specific query and data, these
overflows can result in incorrect query results, execution errors, or
assertion failures.
To fix, use overflow-aware integer addition (ie, pg_add_s64_overflow)
to check for overflows during these additions. If an overflow is
detected, the boundary is now clamped to INT64_MAX. This ensures the
logic correctly treats the boundary as extending to the end of the
partition.
Bug: #19405
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/19405-1ecf025dda171555@postgresql.org
Backpatch-through: 14
When pulling up a subquery, its targetlist items may be wrapped in
PlaceHolderVars to enforce separate identity or as a result of outer
joins. This causes any upper-level WHERE clauses referencing these
outputs to contain PlaceHolderVars, which prevents partprune.c from
recognizing that they match partition key columns, defeating partition
pruning.
To fix, strip PlaceHolderVars from operands before comparing them to
partition keys. A PlaceHolderVar with empty phnullingrels appearing
in a relation-scan-level expression is effectively a no-op, so
stripping it is safe. This parallels the existing treatment in
indxpath.c for index matching.
In passing, rename strip_phvs_in_index_operand() to strip_noop_phvs()
and move it from indxpath.c to placeholder.c, since it is now a
general-purpose utility used by both index matching and partition
pruning code.
Back-patch to v18. Although this issue exists before that, changes in
that version made it common enough to notice. Given the lack of field
reports for older versions, I am not back-patching further. In the
v18 back-patch, strip_phvs_in_index_operand() is retained as a thin
wrapper around the new strip_noop_phvs() to avoid breaking third-party
extensions that may reference it.
Reported-by: Cándido Antonio Martínez Descalzo <candido@ninehq.com>
Diagnosed-by: David Rowley <dgrowleyml@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAH5YaUwVUWETTyVECTnhs7C=CVwi+uMSQH=cOkwAUqMdvXdwWA@mail.gmail.com
Backpatch-through: 18
The function looped over ii_NumIndexKeyAttrs elements of the skeys
array, but one caller (ri_FastPathFlushArray) passes a one-element
array since it only handles single-column FKs. The function
signature did not communicate this constraint, which static analysis
flags as a potential out-of-bounds read.
Add an nkeys parameter and assert that it matches
ii_NumIndexKeyAttrs, then use it in the loop. The call sites
already know the key count.
Reported-by: Evan Montgomery-Recht <montge@mianetworks.net>
Discussion: https://postgr.es/m/CAEg7pwcKf01FmDqFAf-Hzu_pYnMYScY_Otid-pe9uw3BJ6gq9g@mail.gmail.com
ee642cccc4 has added syscache.h in inval.h and objectaddress.h,
enlarging by a lot the footprint of this header, particularly via
objectaddress.h. A change in syscache.h would cause a lot more files to
be recompiled.
This commit reduces the presence of syscache.h by switching to a direct
use of syscache_ids.h in inval.h and objectaddress.h, where the enum
SysCacheIdentifier is defined. genbki.pl gains an #ifndef block for
this header, so as its inclusion is more controlled.
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/vlcexdcimsmvu3aplt2yxpfndkgtuvjsrms2fdl46rbw3k2kug@drspkoxlaije
Commit f19c0eccae changed the data_checksums GUC datatype from a
boolean to an enum. This updates the documentation to accurately
reflect its new type and document the new possible states: 'on',
'off', 'inprogress-on', and 'inprogress-off'.
Also update the xref for more information to point to the section
on data checksums rather than the initdb checksum option.
Author: Lakshmi N <lakshmin.jhs@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CA+3i_M-AtTnqTB2KLBTpu-c-jvnTuy7bGxyxs80rgiQLxWrRUQ@mail.gmail.com
Our RADIUS implementation supported only the deprecated RADIUS/UDP
variant, without the recommended Message-Authenticator attribute to
mitigate against the Blast-RADIUS vulnerability. By now, popular RADIUS
servers are expected to generate loud warnings or reject our
authentication attempts outright.
Since there have been no user reports about this, it seems unlikely that
there are users.
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Michael Banck <mbanck@gmx.net>
Discussion: https://postgr.es/m/CA%2BhUKG%2BSH309V8KECU5%3DxuLP9Dks0v9f9UVS2W74fPAE5O21dg%40mail.gmail.com
Add a new FDW callback routine that allows importing remote statistics
for a foreign table directly to the local server, instead of collecting
statistics locally. The new callback routine is called at the beginning
of the ANALYZE operation on the table, and if the FDW failed to import
the statistics, the existing callback routine is called on the table to
collect statistics locally.
Also implement this for postgres_fdw. It is enabled by "restore_stats"
option both at the server and table level. Currently, it is the user's
responsibility to ensure remote statistics to import are up-to-date, so
the default is false.
Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Discussion: https://postgr.es/m/CADkLM%3DchrYAx%3DX2KUcDRST4RLaRLivYDohZrkW4LLBa0iBhb5w%40mail.gmail.com
The size of the I/O worker pool used to implement io_method=worker was
previously controlled by the io_workers setting, defaulting to 3. It
was hard to know how to tune it effectively. That is replaced with:
io_min_workers=2
io_max_workers=8 (up to 32)
io_worker_idle_timeout=60s
io_worker_launch_interval=100ms
The pool is automatically sized within the configured range according to
recent variation in demand. It grows when existing workers detect that
latency might be introduced by queuing, and shrinks when the
highest-numbered worker is idle for too long. Work was already
concentrated into low-numbered workers in anticipation of this logic.
The logic for waking extra workers now also tries to measure and reduce
the number of spurious wakeups, though they are not entirely eliminated.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2Bm4xV0LMoH2c%3DoRAdEXuCnh%2BtGBTWa7uFeFMGgTLAw%2BQ%40mail.gmail.com
The vectorized path in commit fbc57f2bc had a side effect of putting
more branches in the path taken for small inputs. To reduce risk
of regressions, only proceed with the vectorized path if we can
guarantee that the remaining input after the alignment preamble is
greater than 64 bytes. That also allows removing length checks in
the alignment preamble.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/CANWCAZZ48GuLYhJCcTy8TXysjrMVJL6n1n7NP94=iG+t80YKPw@mail.gmail.com
Since we have dropped MULE_INTERNAL, add a check that all encodings used
in the source cluster are still supported according to
PG_ENCODING_BE_VALID(). This is done generically, in case we decide to
drop another encoding some day.
Suggested-by: Jeff Davis <pgsql@j-davis.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKGKXDXh-FdU0orjfv%2BF08f%3DD91BhV3Ra-4zL-q%2BJmGYqTA%40mail.gmail.com
This was useful before widespread Unicode adoption, and was based on the
internal encoding Emacs used to mix multiple sub-encodings. Emacs
itself has stopped using it, and our implementation hadn't been updated
with modern underlying standards. It is thought to be very unlikely
that anyone is still using it in the field. Since such a complex
encoding comes with costs and risks, we agreed to drop support.
Any existing database using this encoding would need to be dumped and
restored with a new encoding to upgrade to PostgreSQL 19, most likely
UTF8, since pg_upgrade would fail.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tatsuo Ishii <ishii@postgresql.org>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/CA%2BhUKGKXDXh-FdU0orjfv%2BF08f%3DD91BhV3Ra-4zL-q%2BJmGYqTA%40mail.gmail.com
Until now extensions that wanted to measure overall query execution could
create QueryDesc->totaltime, which the core executor would then start and
stop. That's a bit odd and composes badly, e.g. extensions always had to use
INSTRUMENT_ALL, because otherwise another extension might not get what they
need.
Instead this introduces a new field, QueryDesc->query_instr_options, that
extensions can use to indicate whether they need query level instrumentation
populated, and with which instrumentation options. Extensions should take care
to only add options they need, instead of replacing the options of others.
The prior name of the field, totaltime, sounded like it would only measure
time, but these days the instrumentation infrastructure can track more
resources. The secondary benefit is that this will make it obvious to
extensions that they may not create the Instrumentation struct themselves
anymore (often extensions build only against a postgres build without
assertions).
Adjust pg_stat_statements and auto_explain to match, and lower the
requested instrumentation level for auto_explain to INSTRUMENT_TIMER,
since the summary instrumentation it needs is only runtime.
The reason to push this now, rather in the PG 20 cycle, is that 5a79e78501
already required extensions using query level instrumentations to adjust their
code, and it seemed undesirable to require them to do so again for 20.
Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAP53Pkyqsht+exJQYRsjhSWYKu+vFGHhPub7m6PmFD6Or0=p1g@mail.gmail.com
Previously, on standby promotion, the startup process sent SIGUSR1 to
the slotsync worker (or a backend performing slot synchronization) and
waited for it to exit. This worked in most cases, but if the process was
blocked waiting for a response from the primary (e.g., due to a network
failure), SIGUSR1 would not interrupt the wait. As a result, the process
could remain stuck, causing the startup process to wait for a long time
and delaying promotion.
This commit fixes the issue by introducing a new procsignal reason,
PROCSIG_SLOTSYNC_MESSAGE. On promotion, the startup process
sends this signal, and the handler sets interrupt flags so the process
exits (or errors out) promptly at CHECK_FOR_INTERRUPTS(), allowing
promotion to complete without delay.
Backpatch to v17, where slotsync was introduced.
Author: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwFzNYroAxSoyJhqTU-pH=t4Ej6RyvhVmBZ91Exj_TPMMQ@mail.gmail.com
Backpatch-through: 17
This moves the implementation of ExecProcNodeInstr, the ExecProcNode variant
that gets used when instrumentation is on, to be defined in instrument.c
instead of execProcNode.c, and marks functions it uses as inline.
This allows compilers to generate an optimized implementation, and shows a 4
to 12% reduction in instrumentation overhead for queries that move lots of
rows.
Author: Lukas Fittl <lukas@fittl.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAP53PkzdBK8VJ1fS4AZ481LgMN8f9mJiC39ZRHqkFUSYq6KWmg@mail.gmail.com
Adds support for EXPLAIN (IO) instrumentation for TidRange scans. This
requires adding shared instrumentation for parallel scans, using the
separate DSM approach introduced by dd78e69cfc.
Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me
This adds support to pg_test_timing for the different timing sources added by
294520c444.
Author: Lukas Fittl <lukas@fittl.com>
Author: David Geier <geidav.pg@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: David Geier <geidav.pg@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> (in an earlier version)
Discussion: https://www.postgresql.org/message-id/flat/20200612232810.f46nbqkdhbutzqdg%40alap3.anarazel.de
Adds support for EXPLAIN (IO) instrumentation for sequential scans. This
requires adding shared instrumentation, using the separate DSM approach
introduced by dd78e69cfc.
Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me
Allows collecting details about AIO / prefetch for scan nodes backed by
a ReadStream. This may be enabled by a new "IO" option in EXPLAIN, and
it shows information about the prefetch distance and I/O requests.
As of this commit this applies only to BitmapHeapScan, because that's
the only scan node using a ReadStream and collecting instrumentation
from workers in a parallel query. Support for SeqScan and TidRangeScan,
the other scan nodes using ReadStream, will be added in subsequent
commits.
The stats are collected only when required by EXPLAIN ANALYZE, with the
IO option (disabled by default). The amount of collected statistics is
very limited, but we don't want to clutter EXPLAIN with too much data.
The IOStats struct is stored in the scan descriptor as a field, next to
other fields used by table AMs. A pointer to the field is passed to the
ReadStream, and updated directly.
It's the responsibility of the table AM to allocate the struct (e.g. in
ambeginscan) whenever the flag SO_SCAN_INSTRUMENT flag is passed to the
scan, so that the executor and ReadStream has access to it.
The collected stats are designed for ReadStream, but are meant to be
reasonably generic in case a TAM manages I/Os in different ways.
Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me
Use unaligned output for multiple EXPLAIN queries using non-text format
in regression tests. With aligned output adding/removing explain fields
can be very disruptive, as it often modifies the whole block because of
padding. Unaligned output does not have this issue.
Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me
UNIQUE/PRIMARY KEY ... WITHOUT OVERLAPS requires the no-overlap
column to be a range or multirange, but it should allow a domain
over such a type too. This requires minor adjustments in both
the parser and executor.
In passing, fix a nearby break-instead-of-continue thinko in
transformIndexConstraint. This had the effect of disabling
parse-time validation of the no-overlap column's type in the context
of ALTER TABLE ADD CONSTRAINT, if it follows a dropped column.
We'd still complain appropriately at runtime though.
Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxGoAmN_0iJ=hjTG0vGpOSOyy-vYyfE+-q0AWxrq2_p5XQ@mail.gmail.com
Backpatch-through: 18
This allows the direct use of the Time-Stamp Counter (TSC) value retrieved
from the CPU using RDTSC/RDTSCP instructions, instead of APIs like
clock_gettime() on POSIX systems.
This reduces the overhead of EXPLAIN with ANALYZE and TIMING ON. Tests showed
that the overhead on top of actual runtime when instrumenting queries moving
lots of rows through the plan can be reduced from 2x as slow to 1.2x as slow
compared to the actual runtime. More complex workloads such as TPCH queries
have also shown ~20% gains when instrumented compared to before.
To control use of the TSC, the new "timing_clock_source" GUC is introduced,
whose default ("auto") automatically uses the TSC when reliable, for example
when running on modern Intel CPUs, or when running on Linux and the system
clocksource is reported as "tsc". The use of the operating system clock source
can be enforced by setting "system", or on x86-64 architectures the use of TSC
can be enforced by explicitly setting "tsc".
In order to use the TSC the frequency is first determined by use of CPUID, and
if not available, by running a short calibration loop at program start,
falling back to the system clock source if TSC values are not stable.
Note, that we split TSC usage into the RDTSC CPU instruction which does not
wait for out-of-order execution (faster, less precise) and the RDTSCP
instruction, which waits for outstanding instructions to retire. RDTSCP is
deemed to have little benefit in the typical InstrStartNode() /
InstrStopNode() use case of EXPLAIN, and can be up to twice as slow. To
separate these use cases, the new macro INSTR_TIME_SET_CURRENT_FAST() is
introduced, which uses RDTSC.
The original macro INSTR_TIME_SET_CURRENT() uses RDTSCP and is supposed to be
used when precision is more important than performance. When the system timing
clock source is used both of these macros instead utilize the system
APIs (clock_gettime / QueryPerformanceCounter) like before.
Additional users of interval timing, such as track_io_timing and
track_wal_io_timing could also benefit from being converted to use
INSTR_TIME_SET_CURRENT_FAST() but are left for future changes.
Author: Lukas Fittl <lukas@fittl.com>
Author: Andres Freund <andres@anarazel.de>
Author: David Geier <geidav.pg@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: David Geier <geidav.pg@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com> (in an earlier version)
Reviewed-by: Maciek Sakrejda <m.sakrejda@gmail.com> (in an earlier version)
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (in an earlier version)
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> (in an earlier version)
Discussion: https://postgr.es/m/20200612232810.f46nbqkdhbutzqdg@alap3.anarazel.de
This adds additional x86 specific CPUID checks for flags needed for
determining whether the Time-Stamp Counter (TSC) is usable on a given system,
as well as a helper function to retrieve the TSC frequency from CPUID.
This is intended for a future patch that will utilize the TSC to lower the
overhead of timing instrumentation.
In passing, always make pg_cpuid_subleaf reset the variables used for its
result, to avoid accidentally using stale results if __get_cpuid_count errors
out and the caller doesn't check for it.
Author: Lukas Fittl <lukas@fittl.com>
Author: David Geier <geidav.pg@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: David Geier <geidav.pg@gmail.com>
Reviewed-by: John Naylor <john.naylor@postgresql.org>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> (in an earlier version)
Discussion: https://www.postgresql.org/message-id/flat/20200612232810.f46nbqkdhbutzqdg%40alap3.anarazel.de
The timing infrastructure (INSTR_* macros) measures time elapsed using
clock_gettime() on POSIX systems, which returns the time as nanoseconds,
and QueryPerformanceCounter() on Windows, which is a specialized timing
clock source that returns a tick counter that needs to be converted to
nanoseconds using the result of QueryPerformanceFrequency().
This conversion currently happens ad-hoc on Windows, e.g. when calling
INSTR_TIME_GET_NANOSEC, which calls QueryPerformanceFrequency() on every
invocation, despite the frequency being stable after program start,
incurring unnecessary overhead. It also causes a fractured implementation
where macros are defined differently between platforms.
To ease code readability, and prepare for a future change that intends
to use a ticks-to-nanosecond conversion on x86-64 for TSC use, introduce
new pg_ticks_to_ns() / pg_ns_to_ticks() functions that get called from
INSTR_* macros on all platforms.
These functions rely on a separately initialized ticks_per_ns_scaled
value, that represents the conversion ratio. This value is initialized
from QueryPerformanceFrequency() on Windows, and set to zero on x86-64
POSIX systems, which results in the ticks being treated as nanoseconds.
Other architectures always directly return the original ticks.
To support this, pg_initialize_timing() is introduced, and is now
mandatory for both the backend and any frontend programs to call before
utilizing INSTR_* macros.
In passing, fix variable names in comment documenting INSTR_TIME_ADD_NANOSEC().
Author: Lukas Fittl <lukas@fittl.com>
Author: David Geier <geidav.pg@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: David Geier <geidav.pg@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://www.postgresql.org/message-id/flat/20200612232810.f46nbqkdhbutzqdg%40alap3.anarazel.de
OAuth validators can already use custom GUCs to configure behavior
globally. But we currently provide no ability to adjust settings for
individual HBA entries, because the original design focused on a world
where a provider covered a "single audience" of users for one database
cluster. This assumption does not apply to multitenant use cases, where
a single validator may be controlling access for wildly different user
groups.
To improve this use case, add two new API calls for use by validator
callbacks: RegisterOAuthHBAOptions() and GetOAuthHBAOption().
Registering options "foo" and "bar" allows a user to set "validator.foo"
and "validator.bar" in an oauth HBA entry. These options are stringly
typed (syntax validation is solely the responsibility of the defining
module), and names are restricted to a subset of ASCII to avoid tying
our hands with future HBA syntax improvements.
Unfortunately, we can't check the custom option names during a reload of
the configuration, like we do with standard HBA options, without
requiring all validators to be loaded via shared_preload_libraries.
(I consider this to be a nonstarter: most validators should probably use
session_preload_libraries at most, since requiring a full restart just
to update authentication behavior will be unacceptable to many users.)
Instead, the new validator.* options are checked against the registered
list at connection time.
Multiple alternatives were proposed and/or prototyped, including
extending the GUC system to allow per-HBA overrides, joining forces with
recent refactoring work on the reloptions subsystem, and giving the
ability to customize HBA options to all PostgreSQL extensions. I
personally believe per-HBA GUC overrides are the best option, because
several existing GUCs like authentication_timeout and pre_auth_delay
would fit there usefully. But the recent addition of SNI per-host
settings in 4f433025f indicates that a more general solution is needed,
and I expect that to take multiple releases' worth of discussion.
This compromise patch, then, is intentionally designed to be an
architectural dead end: simple to describe, cheap to maintain, and
providing just enough functionality to let validators move forward for
PG19. The hope is that it will be replaced in the future by a solution
that can handle per-host, per-HBA, and other per-context configuration
with the same functionality that GUCs provide today. In the meantime,
the bulk of the code in this patch consists of strict guardrails on the
simple API, to try to ensure that we don't have any reason to regret its
existence during its unknown lifespan.
I owe particular thanks here to Zsolt Parragi, who prototyped several
approaches that guided the final design.
Suggested-by: Zsolt Parragi <zsolt.parragi@percona.com>
Suggested-by: VASUKI M <vasukianand0119@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CAN4CZFM3b8u5uNNNsY6XCya257u%2BDofms3su9f11iMCxvCacag%40mail.gmail.com
PGOAUTHDEBUG is a blunt instrument: you get all the debugging features,
or none of them. The most annoying consequence during manual use is the
Curl debug trace, which tends to obscure the device flow prompt
entirely. The promotion of PGOAUTHCAFILE into its own feature in
993368113 improved the situation somewhat, but there's still the
discomfort of knowing you have to opt into many dangerous behaviors just
to get the single debug feature you wanted.
Explode the PGOAUTHDEBUG syntax into a comma-separated list. The old
"UNSAFE" value enables everything, like before. Any individual unsafe
features still require the envvar to begin with an "UNSAFE:" prefix, to
try to interrupt the flow of someone who is about to do something they
should not.
So now, rather than
PGOAUTHDEBUG=UNSAFE # enable all the unsafe things
a developer can say
PGOAUTHDEBUG=call-count # only show me the call count. safe!
PGOAUTHDEBUG=UNSAFE:trace # print secrets, but don't allow HTTP
To avoid adding more build system scaffolding to libpq-oauth, implement
this entirely in a small private header. This unfortunately can't be
standalone, so it needs a headerscheck exception.
Author: Zsolt Parragi <zsolt.parragi@percona.com>
Co-authored-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CAOYmi%2B%3DfbZNJSkHVci%3DGpR8XPYObK%3DH%2B2ERRha0LDTS%2BifsWnw%40mail.gmail.com
Discussion: https://postgr.es/m/CAN4CZFMmDZMH56O9vb_g7vHqAk8ryWFxBMV19C39PFghENg8kA%40mail.gmail.com
Add a new GUC max_repack_replication_slots, which lets the user reserve
some additional replication slots for concurrent repack (and only
concurrent repack). With this, the user doesn't have to worry about
changing the max_replication_slots in order to cater for use of
concurrent repack.
(We still use the same pool of bgworkers though, but that's less
commonly a problem than slots.)
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Discussion: https://postgr.es/m/202604012148.nnnmyxxrr6nh@alvherre.pgsql
Checking for 'havePin' is sufficient here. An earlier version of the
patch didn't have the 'havePin' variable and used
'so->hashso_bucket_buf == so->currPos.buf' as the condition when both
locking and unlocking the page. The havePin variable was added later
during development, but the unlocking condition wasn't fully
updated. Tidy it up.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/b9de8d05-3b02-4a27-9b0b-03972fa4bfd3@iki.fi
When a backend is terminated via pg_terminate_backend() or an external
SIGTERM, the error message now includes the sender's PID and UID as
errdetail, making it easier to identify the source of unexpected
terminations in multi-user environments.
On platforms that support SA_SIGINFO (Linux, FreeBSD, and most modern
Unix systems), the signal handler captures si_pid and si_uid from the
siginfo_t structure. On platforms without SA_SIGINFO, the detail is
simply omitted.
Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Chao Li <1356863904@qq.com>
Discussion: https://postgr.es/m/CAKZiRmyrOWovZSdixpLd3PGMQXuQL_zw2Ght5XhHCkQ1uDsxjw@mail.gmail.com
If pg_stash_advice.persist = true, stashed advice will be written to
pg_stash_advice.tsv in the data directory, periodically and at
shutdown. On restart, stash modifications are locked out until this
file has been reloaded, but queries will not be, so there may be a
short window after startup during which previously-stashed advice is
not automatically applied.
Author: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Lukas Fittl <lukas@fittl.com>
Discussion: https://postgr.es/m/CA+Tgmob87qsWa-VugofU6epuV0H5XjWZGMbQas4Q-ADKmvSyBg@mail.gmail.com
The investigation into the negative test performance impact of 7e8aeb9e48
lead to discovering that there are a few issues with WAIT FOR.
This commit is just a minimal fix to prevent hangs in standby_flush mode, due
to WAIT FOR ... 'standby_flush' seeing a 0 LSN if a newly started walreceiver
does not receive any writes, because the stanby is already caught up.
There are several other issues and this is isn't necessarily the best fix. But
this way we get the hangs out of the way.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/zqbppucpmkeqecfy4s5kscnru4tbk6khp3ozqz6ad2zijz354k@w4bdf4z3wqoz
Remove unnecessary #ifdef guard around the function prototypes; they
are already inside a larger #ifdef block. Move #include "subsystems.h"
inside the USE_INJECTION_POINTS guard; it's needed for
InjectionPointShmemCallbacks, which is a also inside the guard.
Reported-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/87y0iz2c1v.fsf@wibble.ilmari.org
Buildfarm members which have specifically configured to use
wal_level=minimal fail the repack regression tests, which require
wal_level=replica. Add a temp config file to fix that.
Use templated qsort() so that the comparison function can be
inlined. To speed up qunique(), use a specialized comparison function
that only checks for equality.
Author: David Geier <geidav.pg@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/2a76b5ef-4b12-4023-93a1-eed6e64968f3@gmail.com
Use overflow-safe size arithmetic in the Index[Only]Scan and parallel
instrumentation functions, consistent with other executor nodes (Hash,
Sort, Agg, Memoize). This was an oversight in dd78e69cfc.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me
Allocates shared bitmap table scan instrumentation for all parallel
scans. Previously, the instrumentation was only allocated for
parallel-aware scans, other bitmap heap scans in the parallel query had
no shared instrumentation and EXPLAIN didn't report exact/lossy pages.
This affected cases like scans on the outside of a parallel join or
queries run with debug_parallel_query=regress.
Fixed by allocating a separate DSM chunk for shared instrumentation and
doing so regardless of parallel-awareness. The instrumentation is
allocated in its own DSM chunk, separate from ParallelBitmapHeapState.
Report an initial patch by me. The approach with a separate DSM was
proposed and implemented by Melanie.
Not backpatched. The issue affects Postgres 18 (since 5a1e6df3b8), but
having multiple DSM chunks is possible only since dd78e69cfc. If we
decide to fix this in backbranches too, it will need to be done in a
less invasive way.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me
By default, the logical decoding assumes access to shared catalogs, so
the snapshot builder needs to consider cluster-wide XIDs during startup.
That in turn means that, if any transaction is already running (and has
XID assigned), the snapshot builder needs to wait for its completion, as
it does not know if that transaction performed catalog changes earlier.
A possible problem with this concept is that if REPACK (CONCURRENTLY) is
running in some database, backends running the same command in other
databases get stuck until the first one has committed. Thus only a
single backend in the cluster can run REPACK (CONCURRENTLY) at any time.
Likewise, REPACK (CONCURRENTLY) can block walsenders starting on behalf
of subscriptions throughout the cluster.
This patch adds a new option to logical replication output plugin, to
declare that it does not use shared catalogs (i.e. catalogs that can be
changed by transactions running in other databases in the cluster). In
that case, no snapshot the backend will use during the decoding needs to
contain information about transactions running in other databases. Thus
the snapshot builder only needs to wait for completion of transactions
in the current database.
Currently we only use this option in the REPACK background worker. It
could possibly be used in the plugin for logical replication too,
however that would need thorough analysis of that plugin.
Bump WAL version number, due to a new field in xl_running_xacts.
Author: Antonin Houska <ah@cybertec.at>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/90475.1775218118@localhost
Remove NULLs from the array first, and use qsort to deduplicate only
the non-NULL items. This simplifies the comparison function. Also
replace qsort_arg() with a templated version so that the comparison
function can be inlined. These changes make ginExtractEntries() a
little faster especially for simple datatypes like integers.
Author: David Geier <geidav.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/6d16b6bd-a1ff-4469-aefb-a1c8274e561a@iki.fi
Buildfarm member skink reports that the new REPACK code is trying to
write uninitialized bytes to disk, which correspond to padding space in
the SerializedSnapshotData struct. Silence that by initializing the
memory in SerializeSnapshot() to all zeroes.
Co-authored-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/1976915.1775537087@sss.pgh.pa.us
Commit 5e13b0f24 used a .c file for a file containing a code fragment,
to avoid adding an exception to headerscheck. That turned out to be
too clever, since it meant installation didn't happen by the usual
mechanism. Make it look like a normal header and add the requisite
exception.
Bug: #19450
Reported-by: RekGRpth <rekgrpth@gmail.com>
Discussion: https://postgr.es/m/19450-bb0612c50c6786e5@postgresql.org
As of commit 6aebedc38 Datums are 64-bit values. Since MAC addresses
have only 6 bytes, the abbreviated key always contains the entire
MAC address and is thus authoritative (for practical purposes -- the
tuple sort machinery has no way of knowing that). Abbreviating this
datatype is cheap, and aborting abbreviation prevents optimizations
like radix sort, so remove cardinality estimation.
Author: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Suggested-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CAJ7c6TMk10rF_LiMz6j9rRy1rqk-5s+wBPuBefLix4cY+-4s1w@mail.gmail.com
This commit changes the post_parse_analyze_hook_type() hook to take a
const JumbleState, to tell external modules that they are not allowed to
touch the JumbleState that has been compiled by the core code. This
fixes a pretty old problem with pg_stat_statements, that had always the
idea of modifying the lengths of the constants stored in the
JumbleState. The previous state could confuse extensions that need to
look at a JumbleState depending on the loading order, if
pg_stat_statements is part of the stack loaded.
Another piece included in this commit is the move of the routine
fill_in_constant_lengths() to queryjumblefuncs.c, to give an option to
extensions to compile the lengths of the constants, if necessary. I was
surprised by the number of external code that carries a copy of this
routine (see the thread for details). Previously, this routine modified
JumbleState. It now copies the set of LocationLens from JumbleState,
and fills the constant lengths for separate use.
pg_stat_statements is updated to use the new ComputeConstantLengths().
JumbleState is now marked with a const in the module, where relevant.
Author: Sami Imseih <samimseih@gmail.com>
Co-authored-by: Lukas Fittl <lukas@fittl.com>
Discussion: https://postgr.es/m/CAA5RZ0tZp5qU0ikZEEqJnxvdSNGh1DWv80sb-k4QAUmiMoOp_Q@mail.gmail.com
Some errmsgs in statscmds.c were phrased as "...cannot be used
because...". Put the reasons into errdetails. While at it, switch
from passive voice to "cannot create..." for the errmsg.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Suggested-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CANWCAZaZeX0omWNh_ZbD_JVujzYQdRUW8UZOQ4dWh9Sg7OcAow@mail.gmail.com
injection_points_detach() could fail because of a concurrent cleanup
triggered by injection_points_set_local() when a session finishes.
This problem could be reproduced by adding a hardcoded sleep in
InjectionPointDetach(), and has been detected by the CI.
As the test is designed so as the injection point is detached before
being awaken, there is no need for it to be local, similarly to test
010_index_concurrently_upsert. This commit removes
injection_points_set_local(), replacing it with a confirmation that the
point has been attached in the session expected to block on a lock.
With this removal, the detach cannot happen concurrently anymore, only
before when the point is woken up.
Issue introduced by 557a9f1e3e, where the test has been added.
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/rp6wz4lnz5qn4zlh7uxtavzfrmqvycy2g42z4zasfss2gxi54f@zzcsjdvdflwp
StatsShmemSize(), that computes the shmem size needed for pgstats,
includes the amount of shared memory wanted by all the custom stats
kinds registered. However, the shared memory allocation was done by
ShmemAlloc() in StatsShmemInit(), meaning that the space reserved was
not used, wasting some memory.
These extra allocations would show up under "<anonymous>" in
pg_shmem_allocations, as the allocations done by ShmemAlloc() are not
tracked by ShmemIndexEnt.
Issue introduced by 7949d95945.
Author: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/04b04387-92f5-476c-90b0-4064e71c5f37@iki.fi
Backpatch-through: 18
That commit introduced AfterTriggerIsActive() to detect whether
we are inside the after-trigger firing machinery, so that RI trigger
functions can take the batched fast path. It was implemented using
query_depth >= 0, which correctly identified immediate trigger firing
but missed the deferred case where query_depth is -1 at COMMIT via
AfterTriggerFireDeferred(). This caused deferred FK checks to fall
back to the per-row fast path instead of the batched path.
The correct check is whether we are inside an after-trigger firing
loop specifically. Introduce afterTriggerFiringDepth, a counter
incremented around the trigger-firing loops in AfterTriggerEndQuery,
AfterTriggerFireDeferred, and AfterTriggerSetState, and decremented
after FireAfterTriggerBatchCallbacks() returns. AfterTriggerIsActive()
now returns afterTriggerFiringDepth > 0.
Reported-by: Chao Li <li.evan.chao@gmail.com>
Author: Chao Li <li.evan.chao@gmail.com>
Co-authored-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/C2133B47-79CD-40FF-B088-02D20D654806@gmail.com
On HEAD, the template code for custom fixed-sized pgstats is in the test
module test_custom_stats. On REL_18_STABLE, this code lives in the test
module injection_points.
Both cases were underestimating the size of the shared memory area
required for the storage of the stats data, using a single entry rather
than the whole area. This underestimation meant that there was no
memory allocated for the LWLock required for the stats, and even more.
This problem would be also misleading for extension developers looking
at this code.
This issue has been noticed while digging into a different bug reported
by Heikki Linnakangas, showing that the underestimation was causing
failures in the TAP tests of the test modules for 32-bit builds. The
other issue reported, related to the memory allocation of custom
fixed-sized pgstats, will be fixed in a follow-up commit.
Discussion: https://postgr.es/m/adMk_lWbnz3HDOA8@paquier.xyz
Backpatch-through: 18
Previously, parallel index and index-only scans packed the parallel scan
descriptor and shared instrumentation (for EXPLAIN ANALYZE) into a
single DSM allocation. Since scans may be instrumented without being
parallel-aware, and vice versa, using separate DSM chunks -- each with
its own TOC key -- is cleaner. A future commit will extend this pattern
to other scan node types.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me
shm_toc_insert() silently accepts duplicate keys. Since shm_toc_lookup()
returns the first matching entry, any later entry with the same key
would be unreachable. Add an assertion to catch this.
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me
This view contains one row for each table in the current database,
showing the current autovacuum scores for that specific table. It
also shows whether autovacuum would vacuum or analyze the table.
Bumps catversion.
Author: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Discussion: https://postgr.es/m/CAA5RZ0s4xjMrB-VAnLccC7kY8d0-4806-Lsac-czJsdA1LXtAw%40mail.gmail.com
For a long time, the online checksums patchset kept the "off" state as
literal zero without a label to be consistent with the previous coding
which only had a label for the "on" state. Later, when an "off" label
was made not all uses in the code got the memo. Fix by setting these
to PG_DATA_CHECKSUM_OFF.
While there, fix a duplicate word in a comment introduced by the same
commit.
Author: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAJ7c6TPRTnQFXXX1CRcYoTLXw2swtDH==uSz1MYoMKdLrKZHjA@mail.gmail.com
When this flag is specified, REPACK no longer acquires access-exclusive
lock while the new copy of the table is being created; instead, it
creates the initial copy under share-update-exclusive lock only (same as
vacuum, etc), and it follows an MVCC snapshot; it sets up a replication
slot starting at that snapshot, and uses a concurrent background worker
to do logical decoding starting at the snapshot to populate a stash of
concurrent data changes. Those changes can then be re-applied to the
new copy of the table just before swapping the relfilenodes.
Applications can continue to access the original copy of the table
normally until just before the swap, which is the only point at which
the access-exclusive lock is needed.
There are some loose ends in this commit:
1. concurrent repack needs its own replication slot in order to apply
logical decoding, which are a scarce resource and easy to run out of.
2. due to the way the historic snapshot is initially set up, only one
REPACK process can be running at any one time on the whole system.
3. there's a danger of deadlocking (and thus abort) due to the lock
upgrade required at the final phase.
These issues will be addressed in upcoming commits.
The design and most of the code are by Antonin Houska, heavily based on
his own pg_squeeze third-party implementation.
Author: Antonin Houska <ah@cybertec.at>
Co-authored-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Noriyoshi Shinoda <noriyoshi.shinoda@hpe.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/5186.1706694913@antos
Discussion: https://postgr.es/m/202507262156.sb455angijk6@alvherre.pgsql
Add a note to the WAIT FOR documentation explaining that sessions
using this command on a standby server may be interrupted by recovery
conflicts. Some conflicts are unavoidable - for example, replaying
a tablespace drop terminates all backends unconditionally.
Discussion: https://postgr.es/m/CAPpHfds7oSCbZqob7ytT_Lso8fv-NW8LnedUTE4Krde%2B3rkJeA%40mail.gmail.com
Author: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
When the standby is passed as a PostgreSQL::Test::Cluster instance,
use the WAIT FOR LSN command on the standby server to implement
wait_for_catchup() for replay, write, and flush modes. This is more
efficient than polling pg_stat_replication on the upstream, as the
WAIT FOR LSN command uses a latch-based wakeup mechanism.
The optimization applies when:
- The standby is passed as a Cluster object (not just a name string)
- The mode is 'replay', 'write', or 'flush' (not 'sent')
Rather than pre-checking pg_is_in_recovery() on the standby (which
would add an extra round-trip on every call), we issue WAIT FOR LSN
directly and handle the 'not in recovery' result as a signal to fall
back to polling.
For 'sent' mode, when the standby is passed as a string (e.g., a
subscription name for logical replication), when the standby has been
promoted, or when WAIT FOR LSN is interrupted by a recovery conflict,
the function falls back to the original polling-based approach using
pg_stat_replication on the upstream. The recovery conflict fallback
is necessary because some conflicts are unavoidable - for example,
ResolveRecoveryConflictWithTablespace() kills all backends
unconditionally, regardless of what they are doing.
The recovery conflict detection matches the English error message
"conflict with recovery", which is reliable because the test suite
runs with LC_MESSAGES=C.
Discussion: https://postgr.es/m/CABPTF7UiArgW-sXj9CNwRzUhYOQrevLzkYcgBydmX5oDes1sjg%40mail.gmail.com
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@kurilemu.de>
Use TupleDescInitBuiltinEntry instead of TupleDescInitEntry when building
the result tuple descriptor for the WAIT FOR command. This avoids a syscache
access that could re-establish a catalog snapshot after we've explicitly
released all snapshots before the wait.
Discussion: https://postgr.es/m/CABPTF7U%2BSUnJX_woQYGe%3D%3DR9Oz%2B-V6X0VO2stBLPGfJmH_LEhw%40mail.gmail.com
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
This function is a thin wrapper around relation_needs_vacanalyze()
that handles fetching and freeing the pgstat entry for the table.
Since all callers of relation_needs_vacanalyze() do that anyway, we
can teach that function to fetch/free the pgstat entry and use it
instead.
Suggested-by: Álvaro Herrera <alvherre@kurilemu.de>
Author: Sami Imseih <samimseih@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0s4xjMrB-VAnLccC7kY8d0-4806-Lsac-czJsdA1LXtAw%40mail.gmail.com
The associated value should look like something that could be
part of an EXPLAIN options list, but restricted to EXPLAIN options
added by extensions.
For example, if pg_overexplain is loaded, you could set
auto_explain.log_extension_options = 'DEBUG, RANGE_TABLE'.
You can also specify arguments to these options in the same manner
as normal e.g. 'DEBUG 1, RANGE_TABLE false'.
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Discussion: http://postgr.es/m/CA+Tgmob-0W8306mvrJX5Urtqt1AAasu8pi4yLrZ1XfwZU-Uj1w@mail.gmail.com
Having rejected the principle that we should know how to re-order
the sub-commands of CREATE SCHEMA, there is not really anything
except a little coding to stop us from supporting more object types.
This patch adds support for creating functions (including procedures
and aggregates), operators, types (including domains), collations,
and text search objects.
SQL:2021 specifies that we should allow functions, procedures,
types, domains, and collations, so this moves us a great deal
closer to full SQL compatibility of CREATE SCHEMA. What remains
missing from their list are casts, transforms, roles, and some
object types we don't support yet (e.g. CREATE CHARACTER SET).
Supporting casts or transforms would be problematic because
they don't have names at all, let alone schema-qualified names,
so it'd be quite a stretch to say that they belong to a schema.
Roles likewise are not schema-qualified, plus they are global
to a cluster, making it even less reasonable to consider them
as belonging to a schema. So I don't see us trying to complete
the list.
User-defined aggregates and operators are outside the spec's ken,
as are text search objects, so adding them does not do anything for
spec compatibility. But they go along with these other object types,
plus it takes no additional code to support them since they are
represented as DefineStmts like some variants of CREATE TYPE.
It would indeed take some effort to reject them.
Author: Kirill Reshke <reshkekirill@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CALdSSPh4jUSDsWu3K58hjO60wnTRR0DuO4CKRcwa8EVuOSfXxg@mail.gmail.com
The previous patch simplified CREATE SCHEMA's behavior to "execute all
subcommands in the order they are written". However, that's a bit too
simple, as the spec clearly requires forward references in foreign key
constraint clauses to work, see feature F311-01. (Most other SQL
implementations seem to read more into the spec than that, but it's
not clear that there's justification for more in the text, and this is
the only case that doesn't introduce unresolvable issues.) We never
implemented that before, but let's do so now.
To fix it, transform FOREIGN KEY clauses into ALTER TABLE ... ADD
FOREIGN KEY commands and append them to the end of the CREATE SCHEMA's
subcommand list. This works because the foreign key constraints are
independent and don't affect any other DDL that might be in CREATE
SCHEMA. For simplicity, we do this for all FOREIGN KEY clauses even
if they would have worked where they were.
Author: Jian He <jian.universality@gmail.com>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1075425.1732993688@sss.pgh.pa.us
transformCreateSchemaStmtElements has always believed that it is
supposed to re-order the subcommands of CREATE SCHEMA into a safe
execution order. However, it is nowhere near being capable of doing
that correctly. Nor is there reason to think that it ever will be,
or that that is a well-defined requirement. (The SQL standard does
say that it should be possible to do foreign-key forward references
within CREATE SCHEMA, but it's not clear that the text requires
anything more than that.) Moreover, the problem will get worse as
we add more subcommand types. Let's just drop the whole idea and
execute the commands in the order given, which seems like a much
less astonishment-prone definition anyway. The foreign-key issue
will be handled in a follow-up patch.
This will result in a release-note-worthy incompatibility,
which is that forward references like
CREATE SCHEMA myschema
CREATE VIEW myview AS SELECT * FROM mytable
CREATE TABLE mytable (...);
used to work and no longer will. Considering how many closely
related variants never worked, this isn't much of a loss.
Along the way, pass down a ParseState so that we can provide an
error cursor for "wrong schema name" and related errors, and fix
transformCreateSchemaStmtElements so that it doesn't scribble
on the parsetree passed to it.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/1075425.1732993688@sss.pgh.pa.us
Previously, autovacuum always disabled parallel vacuum regardless of
the table's index count or configuration. This commit enables
autovacuum workers to use parallel index vacuuming and index cleanup,
using the same parallel vacuum infrastructure as manual VACUUM.
Two new configuration options control the feature. The GUC
autovacuum_max_parallel_workers sets the maximum number of parallel
workers a single autovacuum worker may launch; it defaults to 0,
preserving existing behavior unless explicitly enabled. The per-table
storage parameter autovacuum_parallel_workers provides per-table
limits. A value of 0 disables parallel vacuum for the table, a
positive value caps the worker count (still bounded by the GUC), and
-1 (the default) defers to the GUC.
To handle cases where autovacuum workers receive a SIGHUP and update
their cost-based vacuum delay parameters mid-operation, a new
propagation mechanism is added to vacuumparallel.c. The leader stores
its effective cost parameters in a DSM segment. Parallel vacuum
workers poll for changes in vacuum_delay_point(); if an update is
detected, they apply the new values locally via VacuumUpdateCosts().
A new test module, src/test/modules/test_autovacuum, is added to
verify that parallel autovacuum workers are correctly launched and
that cost-parameter updates are propagated as expected.
The patch was originally proposed by Maxim Orlov, but the
implementation has undergone significant architectural changes
since then during the review process.
Author: Daniil Davydov <3danissimo@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: zengman <zengman@halodbtech.com>
Discussion: https://postgr.es/m/CACG=ezZOrNsuLoETLD1gAswZMuH2nGGq7Ogcc0QOE5hhWaw=cw@mail.gmail.com
CLUSTER is no longer the favored way to invoke this functionality, and
the code is about to shift its focus to the REPACK more ambitiously.
Rename the file to avoid leaving an unnecessary historical artifact
around.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/202603271635.owyhm7btgoic@alvherre.pgsql
These columns haven't been computed yet when the filtering happens
(since we've not written the candidate tuple into the table); so
any check on them is wrong or useless. Worse, since aa606b931 such a
reference results in an access off the end of a TupleDesc, potentially
causing a phony "generated columns are not supported in COPY FROM
WHERE conditions" error; and since c98ad086a it throws an Assert
instead.
Actually we could allow tableoid, which has been set to the OID of the
table named as the COPY target. However, plausible uses for tests of
tableoid would involve a partitioned target table, and the user would
wish it to read as the OID of the destination partition. There has
been some discussion of changing things to make it work like that,
but pending that happening we should just disallow tableoid along
with other system columns.
It seems best though to install this prohibition only in HEAD.
In the back branches we'll just guard the unsafe TupleDesc access,
and people will keep getting whatever semantics they got before.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/6f435023-8ab6-47c2-ba07-035d0c4212f9@gmail.com
contrib/pg_stash_advice and src/test/modules/test_shmem
missed these, leading to complaints from git after an
in-tree check-world run.
Use our standard boilerplate list of ignorable subdirectories,
although the two modules presently create different subsets
of that.
This code missed the need to update the combined state's
nullbitmap if state1 already had a bitmap but state2 didn't.
We need to extend the existing bitmap with 1's but didn't.
This could result in wrong output from a parallelized
array_agg(anyarray) calculation, if the input has a mix of
null and non-null elements. The errors depended on timing
of the parallel workers, and therefore would vary from one
run to another.
Also install guards against integer overflow when calculating
the combined object's sizes, and make some trivial cosmetic
improvements.
Author: Dmytro Astapov <dastapov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAFQUnFj2pQ1HbGp69+w2fKqARSfGhAi9UOb+JjyExp7kx3gsqA@mail.gmail.com
Backpatch-through: 16
It would be useful to be able to tell auto_explain to set a custom
EXPLAIN option, but it would be bad if it tried to do so and the
option name or value wasn't valid, because then every query would fail
with a complaint about the EXPLAIN option. So add a guc_check_handler
that auto_explain will be able to use to only try to set option
name/value/type combinations that have been determined to be legal,
and to emit useful messages about ones that aren't.
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Discussion: http://postgr.es/m/CA+Tgmob-0W8306mvrJX5Urtqt1AAasu8pi4yLrZ1XfwZU-Uj1w@mail.gmail.com
The restructuring in commit 53b8ca6881 revealed an interesting
corner case: if a table needs vacuuming for wraparound prevention
and autovacuum is disabled for it, we might still choose to analyze
it. Research seems to indicate this was an accidental addition by
commit 48188e1621, and further discussion indicates there is
consensus that it is unnecessary and can be removed.
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Shinya Kato <shinya11.kato@gmail.com>
Discussion: https://postgr.es/m/adB9nSsm_S0D9708%40nathan
Previously, this logic was embedded within SplitIdentifierString,
SplitDirectoriesString, and SplitGUCList. Factoring it out saves
a bit of duplicated code, and also makes it available to extensions
that might want to do similar things without necessarily wanting to
do exactly the same thing.
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Discussion: http://postgr.es/m/CA+Tgmob-0W8306mvrJX5Urtqt1AAasu8pi4yLrZ1XfwZU-Uj1w@mail.gmail.com
This commit updates 011_lock_stats.pl to verify log_lock_waits behavior.
The tests check that messages are emitted both when a wait occurs and
when the lock is acquired, and that the "still waiting for" message is logged
exactly once per wait, even if the backend wakes up during the wait.
The latter covers the behavior introduced by commit fd6ecbfa75.
Author: Hüseyin Demir <huseyin.d3r@gmail.com>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAB5wL7YB1my9W5k5i=SY+=sTjeozyJ0YkvGXrVfeDNzuRkoTPg@mail.gmail.com
Child processes do not need the postmaster's working memory context and
normally release it at the start of their main entry point. However,
the slotsync worker forgot to do so.
This commit makes the slotsync worker release the postmaster's working
memory context at startup, preventing unintended use.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tiancheng Ge <getiancheng_2012@163.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwHO05JaUpgKF8FBDmPdBUJsK22axRRcgmAUc2Jyi8OK8g@mail.gmail.com
Some compilers didn't like the empty initializer when compiled without
USE_INJECTION_POINTS. Per buildfarm member 'drongo', using Visual
Studio 2019.
Author: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/adNHcBVJO5gIOp1l@paquier.xyz
This module allows plan advice strings to be provided automatically
from an in-memory advice stash. Advice stashes are stored in dynamic
shared memory and must be recreated and repopulated after a server
restart. If pg_stash_advice.stash_name is set to the name of an advice
stash, and if query identifiers are enabled, the query identifier
for each query will be looked up in the advice stash and the
associated advice string, if any, will be used each time that query
is planned.
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Discussion: http://postgr.es/m/CA+TgmoaeNuHXQ60P3ZZqJLrSjP3L1KYokW9kPfGbWDyt+1t=Ng@mail.gmail.com
Previously, one LWLock was used for each lock type, adding complexity
without an observable performance benefit as data is gathered only for
paths involving lock waits, at least currently. This commit replaces
the per-type set of LWLocks with a single LWLock protecting the stats
data of all the lock types, like the stats kinds for SLRU or WAL. A
good chunk of the callbacks get simpler thanks to this change.
The previous approach also had one bug in the flush callback when nowait
was called with "true": a backend iterating over all entries could
successfully flush some entries while skipping others due to contention,
then unconditionally reset the pending data. This would cause some
stats data loss.
Oversight in 4019f725f5.
Reported-by: Tomas Vondra <tomas@vondra.me>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/1af63e6d-16d5-4d5b-9b03-11472ef1adf9@vondra.me
Alexander Lakhin has noticed that it can be possible on machines with
slow storage to have the spawned workers be stuck in
initialize_worker_spi(), before they reach their main loop. Waiting for
a flush to happen would block the interrupt attempts done by the
database commands, causing the test to fail on timeout once the number
of interrupt attempts is reached in CountOtherDBBackends().
This commit switches the test to wait for the spawned bgworkers to reach
their main loops before attempting the database commands that would
trigger the interrupts, napping for a time larger than the default, with
worker_spi.naptime set at 10 minutes. Another thing that could be
attempted is to enforce a larger number of tries in
CountOtherDBBackends(), if what is done here is not enough. Let's see
first if what this commit does is enough for the buildfarm members
widowbird and jay.
Analyzed-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/f913fba1-da59-404c-9eb3-07c7304be637@gmail.com
Previously the stats.sql regression test used conditions like
"datname = (SELECT current_database())" to check the current database name.
The subquery is unnecessary, so this commit simplifies these expressions to
"datname = current_database()".
Author: Chao Li <lic@highgo.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/A1535A8F-65AF-4C3D-ACBE-25891CB5D38B@gmail.com
Pushing aggregates containing volatile functions below a join can
violate volatility semantics by changing the number of times the
function is executed.
Here we check the Aggref nodes in the targetlist and havingQual for
volatile functions and disable eager aggregation when such functions
are present.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/CAMbWs48A53PY1Y4zoj7YhxPww9fO1hfnbdntKfA855zpXfVFRA@mail.gmail.com
When determining if it is safe to use an expression as a grouping key
for partial aggregation, eager aggregation relies on the B-tree
equalimage support function to ensure that equality implies image
equality.
Previously, the code incorrectly passed the default collation of the
expression's data type to the equalimage procedure, rather than the
expression's actual collation. As a result, if a column used a
non-deterministic collation but the base type's default collation was
deterministic, eager aggregation would incorrectly assume that the
column was safe for byte-level grouping. This could cause rows to be
prematurely grouped and subsequently discarded by strict join
conditions, resulting in incorrect query results.
This patch fixes the issue by passing the expression's actual
collation to the equalimage procedure.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/CAMbWs48A53PY1Y4zoj7YhxPww9fO1hfnbdntKfA855zpXfVFRA@mail.gmail.com
Previously, during shutdown, walsenders always waited until all pending data
was replicated to receivers. This ensures sender and receiver stay in sync
after shutdown, which is important for physical replication switchovers,
but it can significantly delay shutdown. For example, in logical replication,
if apply workers are blocked on locks, walsenders may wait until those locks
are released, preventing shutdown from completing for a long time.
This commit introduces a new GUC, wal_sender_shutdown_timeout,
which specifies the maximum time a walsender waits during shutdown for all
pending data to be replicated. When set, shutdown completes once all data is
replicated or the timeout expires. A value of -1 (the default) disables
the timeout.
This can reduce shutdown time when replication is slow or stalled. However,
if the timeout is reached, the sender and receiver may be left out of sync,
which can be problematic for physical replication switchovers.
Author: Andrey Silitskiy <a.silitskiy@postgrespro.ru>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Takamichi Osumi <osumi.takamichi@fujitsu.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Vitaly Davydov <v.davydov@postgrespro.ru>
Reviewed-by: Ronan Dunklau <ronan@dunklau.fr>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/TYAPR01MB586668E50FC2447AD7F92491F5E89@TYAPR01MB5866.jpnprd01.prod.outlook.com
On MSVC Arm, USE_ARMV8_CRC32C is defined, but __builtin_constant_p
is not available. Use pg_integer_constant_p and add appropriate
guards. There is a similar potential hazard for the x86 path, but
for now let's get the buildfarm green.
Oversight in commit fbc57f2bc, per buildfarm member hoatzin.
Postcommit review and buildfarm/CI failures revealed a few issues in
the test code which this commit attempts to resolve. These failures
are verified using synthetic means.
* Wait for launcher exit in enable/disable checksum tests
When enabling or disabling data checksums in a test with waiting
for an end state (on or off), the test typically want to perform
more test against the cluster immediately. Make sure to wait for
the launcher to exit in these cases before returning in order to
know it can immediately be acted on. This is a more generic way
of implementating 0036232ba8.
* Refactor injection point tests to use the injection_points test
extension. Two injection points added for online checksums were
better expressed using the injection_points extension with the
test code embedded in datachecksum_state.c.
* Make tests less timing dependent and allow transitions to "on"
and not just "inprogress-on" in case a test manages to finish
before it's checked for state.
* When waiting on a blocking background psql keeping a temporary
table open, the test first closed the background session abd
then the server. This could cause data checksums to manage to
get enabled in the brief window between dropping the temporary
table and closing the server. Fix by closing the server first
before the background session.
* Remove a few superfluous duplicate checks and general cleanup
of comments as well as making LSN logging consistent.
These issues were reported by Andres as well as spotted in the
buildfarm and on CI.
Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/92F25C14-801E-4198-994D-D83E31FEB0D8@yesql.se
If the background worker for processing databases manages to finish
before the launcher starts waiting for it, the launcher would treat
it erroneously as an error. Fix by ensureing to check result state
in this case. Identified on CI and synthetically reproduced during
local testing.
Also while, make sure to properly lock the shared memory structure
before updating tje result state.
Author: Daniel Gustafsson <daniel@yesql.seA
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/4fxw37ge47v5baeozla5phymi233hxbcjbwwsfwv3mpg3kyl2z@6jk4nkf6jp4
Commit 7c64d56fd9 has removed the isolation test providing coverage
for lock statistics due to some instability in the CI, where the
deadlock timeout may not have enough time to process, preventing the
stats data to be updated. These also relied on a set of hardcoded
sleeps.
This commit switches the test suite to TAP, instead, that uses an
injection point with a wait to avoid the sleeps. The injection point is
added in ProcSleep(), once we know that the deadlock timeout has fired
and that the stats have been updated.
Multiple lock patterns are checked, all rely on the same workflow, with
two sessions:
- session 1 holds a given lock type.
- session 2 attaches to the new injection point with the wait action.
- session 2 attempts to acquire a lock conflicting with the lock of
session 1, waiting for the injection point to be reached.
- session 1 releases its lock, session 2 commits.
- pg_stat_lock is polled until the counters are updated for the lock
type.
Bertrand's version of the patch introduced a new routine to
BackgroundPsql() to detect the blocked background sessions. I have
tweaked the test so as we use the same method as some of the other tests
instead, based on some \echo commands. This test has been run multiple
times in the CI, all passing, so I'd like to think that this is more
stable than the first version attempted.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/acNTR1lLHwQJ0o+P@ip-10-97-1-34.eu-west-3.compute.internal
This rectifies the initialization functions a little, making the
"buffer strategy" stuff in freelist.c and buffer mapping hash table in
buf_init.c top-level "subsystems" of their own, registered directly in
subsystemlist.h. Previously they were called indirectly from
BufferManagerShmemInit() and BufferManagerShmemSize()
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
The buffer blocks, converted to use ShmemRequestStruct() in the next
commit, are IO-aligned. This might come handy in other places too, so
make it an explicit feature of ShmemRequestStruct().
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
This replaces the "shmem_size" and "shmem_init" callbacks in the IO
methods table with the same ShmemCallback struct that we now use in
other subsystems
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
I replaced the old SimpleLruInit() function without a backwards
compatibility wrapper, because few extensions define their own SLRUs.
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
This is in preparation to convert it to use the new shmem allocation
functions, making the next commit that does that smaller. This inlines
SerialInit() to the caller, and moves all the initialization steps
within PredicateLockShmemInit() to happen after all the
ShmemInit{Struct|Hash}() calls.
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
These subsystems have some complicating properties, making them
slightly harder to convert than most:
- The initialization callbacks of some of these subsystems have
dependencies, i.e. they need to be initialized in the right order.
- The ProcGlobal pointer still needs to be inherited by the
BackendParameters mechanism on EXEC_BACKEND builds, because
ProcGlobal is required by InitProcess() to get a PGPROC entry, and
the PGPROC entry is required to use LWLocks, and usually attaching
to shared memory areas requires the use of LWLocks.
- Similarly, ProcSignal pointer still needs to be handled by
BackendParameters, because query cancellation connections access it
without calling InitProcess
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
It seems like a good candidate to convert first because it needs to
initialized before any other subsystem, but other than that it's
nothing special.
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
To add a new built-in subsystem, add it to subsystemslist.h. That
hooks up its shmem callbacks so that they get called at the right
times during postmaster startup. For now this is unused, but will
replace the current SubsystemShmemSize() and SubsystemShmemInit()
calls in the next commits.
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
As part of this, embed the LWLock it needs in the shared memory struct
itself, so that we don't need to use RequestNamedLWLockTranche()
anymore. LWLockNewTrancheId() + LWLockInitialize() is more convenient
to use in extensions.
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
The old ShmemInit{Struct/Hash}() functions could be used after
postmaster statup, as long as the allocation is small enough to fit in
spare shmem reserved at startup. I believe some extensions do that,
although we hadn't really documented it and had not coverage for it.
The new test module covers that after-startup usage with the new
ShmemRequestStruct() functions.
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
This replaces the [Subsystem]ShmemSize() and [Subsystem]ShmemInit()
functions called at postmaster startup with a new set of callbacks.
The new mechanism is designed to be more ergonomic. Notably, the size
of each shmem area is specified in the same ShmemRequestStruct() call,
together with its name. The same mechanism is used in extensions,
replacing the shmem_{request/startup}_hooks.
ShmemInitStruct() and ShmemInitHash() become backwards-compatibility
wrappers around the new functions. In future commits, I will replace
all ShmemInitStruct() and ShmemInitHash() calls with the new
functions, although we'll still need to keep them around for
extensions.
Co-authored-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
Previously, different places (e.g. query "total time") were repurposing the
Instrumentation struct initially introduced for capturing per-node statistics
during execution. This overuse of the same struct is confusing, e.g. by
cluttering calls of InstrStartNode/InstrStopNode in unrelated code paths, and
prevents future refactorings.
Instead, simplify the Instrumentation struct to only track time and WAL/buffer
usage. Similarly, drop the use of InstrEndLoop outside of per-node
instrumentation - these calls were added without any apparent benefit since
the relevant fields were never read.
Introduce the NodeInstrumentation struct to carry forward the per-node
instrumentation information. WorkerInstrumentation is renamed to
WorkerNodeInstrumentation for clarity.
In passing, clarify that InstrAggNode is expected to only run after
InstrEndLoop (as it does in practice), and drop unused code.
This also fixes a consequence-less bug: Previously ->async_mode was only set
when a non-zero instrument_option was passed. That turns out to be harmless
right now, as ->async_mode only affects a timing related field.
Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAP53PkzdBK8VJ1fS4AZ481LgMN8f9mJiC39ZRHqkFUSYq6KWmg@mail.gmail.com
Introduce TriggerInstrumentation to capture trigger timing and firings
(previously counted in "ntuples"), to aid a future refactoring that
splits out all Instrumentation fields beyond timing and WAL/buffers into
more specific structs.
In passing, drop the "n" argument to InstrAlloc, as all remaining callers need
exactly one Instrumentation struct. The duplication between InstrAlloc() and
InstrInit(), as well as the conditional initialization of async_mode will be
addressed in a subsequent commit.
Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/CAP53PkzdBK8VJ1fS4AZ481LgMN8f9mJiC39ZRHqkFUSYq6KWmg@mail.gmail.com
The database name was warned about when building with
-DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS, leading to BF and CI failures.
It is somewhat confusing that the required prefix is different for databases
than other object types.
Also fix a pgindent violation that caused koel to start to fail.
Discussion: https://postgr.es/m/ptyiexyhmtxf4lm524s7o7w64r26ra237uusv4tjav4yhpmeoo@vfwwllz7tivb
The two new functions allow to extract the block number and offset from a tid.
There are existing ways to do so (e.g. by doing (ctid::text::point)[0]), but
they are hard to remember and not pretty.
tid_block() returns int8 (bigint) because BlockNumber is uint32, which exceeds
the range of int4. tid_offset() returns int4 (integer) because OffsetNumber is
uint16, which fits safely in int4.
Bumps catversion.
Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/CAJTYsWUzok2+mvSYkbVUwq_SWWg-GdHqCuYumN82AU97SjwjCA@mail.gmail.com
You could request two tranches with same name, but things would get
confusing when you called GetNamedLWLockTranche() to get the LWLocks
allocated for them; it would always return the first tranche with the
name. That doesn't make sense, so forbid duplicates.
We still allow duplicates with LWLockNewTrancheId(). That works better
as you don't use the name to look up the tranche ID later. It's still
confusing in wait events, for example, but it's not dangerous in the
same way.
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://www.postgresql.org/message-id/463a28db-0c0b-4af6-bac6-3891828bbbfe@iki.fi
While working on refactoring how shmem is allocated, I made a mistake
where the main LWLock array did not reserve space for the LWLocks
allocated with RequestNamedLWLockTranche(), and the test still
passed. Matthias van de Meent spotted that before it got committed,
but in order to catch such mistakes in the future, add checks in
test_lwlock_tranches that the locks allocated with
RequestNamedLWLockTranche() can be acquired and released.
Another change is to stop requesting multiple tranches with the same
name with RequestNamedLWLockTranche(). As soon as I started to test
using the locks I realized that's bogus, and the next commit will
forbid it. Keep test coverage for duplicates requested with
LWLockNewTrancheId() for now, but make it more clear that that's what
the test does.
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://www.postgresql.org/message-id/463a28db-0c0b-4af6-bac6-3891828bbbfe@iki.fi
Discussion: https://www.postgresql.org/message-id/CAEze2WjgCROMMXY0+j8FFdm3iFcr7By-+6Mwiz=PgGSEydiW3A@mail.gmail.com
Add a new SQL-callable function that returns the DDL statements needed
to recreate a database. It takes a regdatabase argument and an optional
VARIADIC text argument for options that are specified as alternating
name/value pairs. The following options are supported: pretty (boolean)
for formatted output, owner (boolean) to include OWNER and tablespace
(boolean) to include TABLESPACE. The return is one or multiple rows
where the first row is a CREATE DATABASE statement and subsequent rows are
ALTER DATABASE statements to set some database properties.
The caller must have CONNECT privilege on the target database.
Author: Akshay Joshi <akshay.joshi@enterprisedb.com>
Co-authored-by: Andrew Dunstan <andrew@dunslane.net>
Co-authored-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Quan Zongliang <quanzongliang@yeah.net>
Discussion: https://postgr.es/m/CANxoLDc6FHBYJvcgOnZyS+jF0NUo3Lq_83-rttBuJgs9id_UDg@mail.gmail.com
Discussion: https://postgr.es/m/e247c261-e3fb-4810-81e0-a65893170e94@dunslane.net
Add a new SQL-callable function that returns the DDL statements needed
to recreate a tablespace. It takes a tablespace name or OID and an
optional VARIADIC text argument for options that are specified as
alternating name/value pairs. The following options are supported: pretty
(boolean) for formatted output and owner (boolean) to include OWNER.
(It includes two variants because there is no regtablespace pseudotype.)
The return is one or multiple rows where the first row is a CREATE
TABLESPACE statement and subsequent rows are ALTER TABLESPACE statements
to set some tablespace properties.
The caller must have SELECT privilege on pg_tablespace.
get_reloptions() in ruleutils.c is made non-static so it can be called
from the new ddlutils.c file.
Author: Nishant Sharma <nishant.sharma@enterprisedb.com>
Author: Manni Wood <manni.wood@enterprisedb.com>
Co-authored-by: Andrew Dunstan <andrew@dunslane.net>
Co-authored-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAKWEB6rmnmGKUA87Zmq-s=b3Scsnj02C0kObQjnbL2ajfPWGEw@mail.gmail.com
Discussion: https://postgr.es/m/e247c261-e3fb-4810-81e0-a65893170e94@dunslane.net
Add a new SQL-callable function that returns the DDL statements needed
to recreate a role. It takes a regrole argument and an optional VARIADIC
text argument for options that are specified as alternating name/value
pairs. The following options are supported: pretty (boolean) for
formatted output and memberships (boolean) to include GRANT statements
for role memberships and membership options. The return is one or
multiple rows where the first row is a CREATE ROLE statement and
subsequent rows are ALTER ROLE statements to set some role properties.
Password information is never included in the output.
The caller must have SELECT privilege on pg_authid.
Author: Mario Gonzalez <gonzalemario@gmail.com>
Author: Bryan Green <dbryan.green@gmail.com>
Co-authored-by: Andrew Dunstan <andrew@dunslane.net>
Co-authored-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Quan Zongliang <quanzongliang@yeah.net>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/4c5f895e-3281-48f8-b943-9228b7da6471@gmail.com
Discussion: https://postgr.es/m/e247c261-e3fb-4810-81e0-a65893170e94@dunslane.net
Add parse_ddl_options(), append_ddl_option(), and append_guc_value()
helper functions in a new ddlutils.c file that provide common option
parsing and output formatting for the pg_get_*_ddl family of functions
which will follow in later patches. These accept VARIADIC text
arguments as alternating name/value pairs.
Callers declare an array of DdlOption descriptors specifying the
accepted option names and their types (boolean, text, or integer).
parse_ddl_options() matches each supplied pair against the array,
validates the value, and fills in the result fields. This
descriptor-based scheme is based on an idea from Euler Taveira.
This is placed in a new ddlutils.c file which will contain the
pg_get_*_ddl functions.
Author: Akshay Joshi <akshay.joshi@enterprisedb.com>
Co-authored-by: Andrew Dunstan <andrew@dunslane.net>
Co-authored-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/CAKWEB6rmnmGKUA87Zmq-s=b3Scsnj02C0kObQjnbL2ajfPWGEw@mail.gmail.com
Discussion: https://postgr.es/m/4c5f895e-3281-48f8-b943-9228b7da6471@gmail.com
Discussion: https://postgr.es/m/CANxoLDc6FHBYJvcgOnZyS+jF0NUo3Lq_83-rttBuJgs9id_UDg@mail.gmail.com
Discussion: https://postgr.es/m/e247c261-e3fb-4810-81e0-a65893170e94@dunslane.net
A future REPACK patch wants a way to suppress index_build doing its
progress reports when building an index, because that would interfere
with repack's own reporting; so add an INDEX_CREATE_SUPPRESS_PROGRESS
bit that enables this.
Furthermore, change the index_create_copy() API so that it takes flag
bits for index_create() and passes them unchanged. This gives its
callers more direct control, which eases the interface -- now its
callers can pass the INDEX_CREATE_SUPPRESS_PROGRESS bit directly. We
use it for the current caller in REINDEX CONCURRENTLY, since it's also
not interested in progress reporting, since it doesn't want
index_build() to be called at all in the first place.
One thing to keep in mind, pointed out by Mihail, is that we're not
suppressing the index-AM-specific progress report updates which happen
during ambuild(). At present this is not a problem, because the values
updated by those don't overlap with those used by commands other than
CREATE INDEX; but maybe in the future we'll want the ability to suppress
them also. (Alternatively we might want to display how each
index-build-subcommand progresses during REPACK and others.)
Author: Antonin Houska <ah@cybertec.at>
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Discussion: https://postgr.es/m/102906.1773668762@localhost
READ ONLY transactions should prevent modifications to foreign data as
well as local data, but postgres_fdw transactions declared as READ ONLY
that reference foreign tables mapped to a remote view executing volatile
functions would modify data on remote servers, as it would open remote
transactions in READ WRITE mode.
Similarly, DEFERRABLE transactions should not abort due to a
serialization failure even when accessing foreign data, but postgres_fdw
transactions declared as DEFERRABLE would abort due to that failure in a
remote server, as it would open remote transactions in NOT DEFERRABLE
mode.
To fix, modify postgres_fdw to open remote transactions in the same
access/deferrable modes as the local transaction. This commit also
modifies it to open remote subtransactions in the same access mode as
the local subtransaction.
This commit changes the behavior of READ ONLY/DEFERRABLE transactions
using postgres_fdw; in particular, it doesn't allow the READ ONLY
transactions to modify data on remote servers anymore, so such
transactions should be redeclared as READ WRITE or rewritten using other
tools like dblink. The release notes should note this as an
incompatibility.
These issues exist since the introduction of postgres_fdw, but to avoid
the incompatibility in the back branches, fix them in master only.
Author: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAPmGK16n_hcUUWuOdmeUS%2Bw4Q6dZvTEDHb%3DOP%3D5JBzo-M3QmpQ%40mail.gmail.com
Discussion: https://postgr.es/m/E1uLe9X-000zsY-2g%40gemulon.postgresql.org
This avoids increasing the distance to the maximum in cases where the I/O
subsystem is already keeping up. This turns out to be important for
performance for two reasons:
- Pinning a lot of buffers is not cheap. If additional pins allow us to avoid
IO waits, it's definitely worth it, but if we can already do all the
necessary readahead at a distance of 16, reading ahead 512 buffers can
increase the CPU overhead substantially. This is particularly noticeable
when the to-be-read blocks are already in the kernel page cache.
- If the read stream is read to completion, reading in data earlier than
needed is of limited consequences, leaving aside the CPU costs mentioned
above. But if the read stream will not be fully consumed, e.g. because it is
on the inner side of a nested loop join, the additional IO can be a serious
performance issue. This is not that commonly a problem for current read
stream users, but the upcoming work, to use a read stream to fetch table
pages as part of an index scan, frequently encounters this.
Note that this commit would have substantial performance downsides without
earlier commits:
- Commit 6e36930f9a, which avoids decreasing the readahead distance when
there was recent IO, is crucial, as otherwise we very often would end up not
reading ahead aggressively enough anymore with this commit, due to
increasing the distance less often.
- "read stream: Split decision about look ahead for AIO and combining" is
important as we would otherwise not perform IO combining when the IO
subsystem can keep up.
- "aio: io_uring: Trigger async processing for large IOs" is important to
continue to benefit from memory copy parallelism when using fewer IOs.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Tested-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/f3xxfrkafjxpyqxywcxricxgyizjirfceychyxsgn7bwjp5eda@kwbduhy7tfmu
Discussion: https://postgr.es/m/CA+hUKGL2PhFyDoqrHefqasOnaXhSg48t1phs3VM8BAdrZqKZkw@mail.gmail.com
In a subsequent commit the read-ahead distance will only be increased when
waiting for IO. Without further work that would cause a regression: As IO
combining and read-ahead are currently controlled by the same mechanism, we
would end up not allowing IO combining when never needing to wait for IO (as
the distance ends up too small to allow for full sized IOs), which can
increase CPU overhead. A typical reason to not have to wait for IO completion
at a low look-ahead distance is use of io_uring with the to-be-read data in
the page cache. But even with worker the IO submission rate may be low enough
for the worker to keep up.
One might think that we could just always perform IO combining, but doing so
at the start of a scan can cause performance regressions:
1) Performing a large IO commonly has a higher latency than smaller IOs. That
is not a problem once reading ahead far enough, but at the start of a stream
it can lead to longer waits for IO completion.
2) Sometimes read streams will not be read to completion. Immediately starting
with full sized IOs leads to more wasted effort. This is not commonly an
issue with existing read stream users, but the upcoming use of read streams
to fetch table pages as part of an index scan frequently encounters this.
Solve this issue by splitting ReadStream->distance into ->combine_distance and
->readahead_distance. Right now they are increased/decreased at the same time,
but that will change in the next commit.
One of the comments in read_stream_should_look_ahead() refers to a motivation
that only really exists as of the next commit, but without it the code doesn't
make sense on its own.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/f3xxfrkafjxpyqxywcxricxgyizjirfceychyxsgn7bwjp5eda@kwbduhy7tfmu
Discussion: https://postgr.es/m/CA+hUKGL2PhFyDoqrHefqasOnaXhSg48t1phs3VM8BAdrZqKZkw@mail.gmail.com
The long if statements were hard to read and hard to document. Splitting them
into inline helpers makes it much easier to explain each part separately.
This is done in preparation for making the logic more complicated...
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/f3xxfrkafjxpyqxywcxricxgyizjirfceychyxsgn7bwjp5eda@kwbduhy7tfmu
io_method=io_uring has a heuristic to trigger asynchronous processing of IOs
once the IO depth is a bit larger. That heuristic is important when doing
buffered IO from the kernel page cache, to allow parallelizing of the memory
copy, as otherwise io_method=io_uring would be a lot slower than
io_method=worker in that case.
An upcoming commit will make read_stream.c only increase the read-ahead
distance if we needed to wait for IO to complete. If to-be-read data is in the
kernel page cache, io_uring will synchronously execute IO, unless the IO is
flagged as async. Therefore the aforementioned change in read_stream.c
heuristic would lead to a substantial performance regression with io_uring
when data is in the page cache, as we would never reach a deep enough queue to
actually trigger the existing heuristic.
Parallelizing the copy from the page cache is mainly important when doing a
lot of IO, which commonly is only possible when doing largely sequential IO.
The reason we don't just mark all io_uring IOs as asynchronous is that the
dispatch to a kernel thread has overhead. This overhead is mostly noticeable
with small random IOs with a low queue depth, as in that case the gain from
parallelizing the memory copy is small and the latency cost high.
The facts from the two prior paragraphs show a way out: Use the size of the IO
in addition to the depth of the queue to trigger asynchronous processing.
One might think that just using the IO size might be enough, but
experimentation has shown that not to be the case - with deep look-ahead
distances being able to parallelize the memory copy is important even with
smaller IOs.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/f3xxfrkafjxpyqxywcxricxgyizjirfceychyxsgn7bwjp5eda@kwbduhy7tfmu
Discussion: https://postgr.es/m/CA+hUKGL2PhFyDoqrHefqasOnaXhSg48t1phs3VM8BAdrZqKZkw@mail.gmail.com
Also rename it to index_create_copy. Add a 'boolean concurrent' option,
and make it work for both cases: in concurrent mode, just create the
catalog entries; caller is responsible for the actual building later.
In non-concurrent mode, the index is built right away.
This allows it to be reused for other purposes -- specifically, for
concurrent REPACK.
(With the CONCURRENTLY option, REPACK cannot simply swap the heap file and
rebuild its indexes. Instead, it needs to build a separate set of
indexes, including their system catalog entries, *before* the actual
swap, to reduce the time AccessExclusiveLock needs to be held for. This
approach is different from what CREATE INDEX CONCURRENTLY does.)
Per a suggestion from Mihail Nikalayeu.
Author: Antonin Houska <ah@cybertec.at>
Reviewed-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/41104.1754922120@localhost
Avoid dropping the heap page pin (xs_cbuf) and visibility map pin
(xs_vmbuffer) within heapam_index_fetch_reset. Retaining these pins
saves cycles during certain nested loop joins and merge joins that
frequently restore a saved mark: cases where the next tuple fetched
after a reset often falls on the same heap page will now avoid the cost
of repeated pinning and unpinning.
Avoiding dropping the scan's heap page buffer pin is preparation for an
upcoming patch that will add I/O prefetching to index scans. Testing of
that patch (which makes heapam tend to pin more buffers concurrently
than was typical before now) shows that the aforementioned cases get a
small but clearly measurable benefit from this optimization.
Upcoming work to add a slot-based table AM interface for index scans
(which is further preparation for prefetching) will move VM checks for
index-only scans out of the executor and into heapam. That will expand
the role of xs_vmbuffer to include VM lookups for index-only scans (the
field won't just be used for setting pages all-visible during on-access
pruning via the enhancement recently introduced by commit b46e1e54).
Avoiding dropping the xs_vmbuffer pin will preserve the historical
behavior of nodeIndexonlyscan.c, which always kept this pin on a rescan;
that aspect of this commit isn't really new.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-Wz=g=JTSyDB4UtB5su2ZcvsS7VbP+ZMvvaG6ABoCb+s8Lw@mail.gmail.com
Add an explicit BlockNumber field (xs_blk) to IndexFetchHeapData that
tracks which heap block is currently pinned in xs_cbuf.
heapam_index_fetch_tuple now uses xs_blk to determine when buffer
switching is needed, replacing the previous approach that compared
buffer identities via ReleaseAndReadBuffer on every non-HOT-chain call.
This is preparatory work for an upcoming commit that will add index
prefetching using a read stream. Delegating the release of a currently
pinned buffer to ReleaseAndReadBuffer won't work anymore -- at least not
when the next buffer that the scan needs to pin is one returned by
read_stream_next_buffer (not a buffer returned by ReadBuffer).
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-Wz=g=JTSyDB4UtB5su2ZcvsS7VbP+ZMvvaG6ABoCb+s8Lw@mail.gmail.com
Move the heapam index fetch callbacks (index_fetch_begin,
index_fetch_reset, index_fetch_end, and index_fetch_tuple) into a new
dedicated file. Also move heap_hot_search_buffer over. This is a
purely mechanical move with no functional impact.
Upcoming work to add a slot-based table AM interface for index scans
will substantially expand this code. Keeping it in heapam_handler.c
would clutter a file whose primary role is to wire up the TableAmRoutine
callbacks. Bitmap heap scans and sequential scans would benefit from
similar separation in the future.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/bmbrkiyjxoal6o5xadzv5bveoynrt3x37wqch7w3jnwumkq2yo@b4zmtnrfs4mh
Rename heapam_index_fetch_tuple's call_again argument to heap_continue,
for consistency with the pointed-to variable name (IndexScanDescData's
xs_heap_continue field).
Preparation for an upcoming commit that will move index scan related
heapam functions into their own file.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/bmbrkiyjxoal6o5xadzv5bveoynrt3x37wqch7w3jnwumkq2yo@b4zmtnrfs4mh
In similar vein to commit 3c6e8c123, the ARMv8 cryptography extension
has 64x64 -> 128-bit carryless multiplication instructions suitable
for computing CRC. This was tested to be around twice as fast as
scalar CRC instructions for longer inputs.
We now do a runtime check, even for builds that target "armv8-a+crc",
but those builds can still use a direct call for constant inputs,
which we assume are short.
As for x86, the MIT-licensed implementation was generated with the
"generate" program from
https://github.com/corsix/fast-crc32/
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/CANWCAZaKhE+RD5KKouUFoxx1EbUNrNhcduM1VQ=DkSDadNEFng@mail.gmail.com
We already rely on autovectorization for computing page checksums,
but on x86 we can get a further several-fold performance increase by
annotating pg_checksum_block() with a function target attribute for
the AVX2 instruction set extension. Not only does that use 256-bit
registers, it can also use vector multiplication rather than the
vector shifts and adds used in SSE2.
Similar to other hardware-specific paths, we set a function pointer
on first use. We don't bother to avoid this on platforms without AVX2
since the overhead of indirect calls doesn't matter for multi-kilobyte
inputs. However, we do arrange so that only core has the function
pointer mechanism. External programs will continue to build a normal
static function and don't need to be aware of this.
This matters most when using io_uring since in that case the checksum
computation is not done in parallel by IO workers.
Co-authored-by: Matthew Sterrett <matthewsterrett2@gmail.com>
Co-authored-by: Andrew Kim <andrew.kim@intel.com>
Reviewed-by: Oleg Tselebrovskiy <o.tselebrovskiy@postgrespro.ru>
Tested-by: Ants Aasma <ants.aasma@cybertec.at>
Tested-by: Stepan Neretin <slpmcf@gmail.com> (earlier version)
Discussion: https://postgr.es/m/CA+vA85_5GTu+HHniSbvvP+8k3=xZO=WE84NPwiKyxztqvpfZ3Q@mail.gmail.com
Discussion: https://postgr.es/m/20250911054220.3784-1-root%40ip-172-31-36-228.ec2.internal
It's been missing ever since fast-path locking was introduced. It's a
small discrepancy, about 4 kB, but let's be tidy. This doesn't seem
worth backpatching, however; in stable branches we were less precise
about the estimates and e.g. added a 10% margin to the hash table
estimates, which is usually much bigger than this discrepancy.
For the three implementations that have caused problems so far:
* GNU and BSD (libarchive) tar both understand --format=ustar
* ustar doesn't support large UID/GID values, so set them to 0 to
avoid a hard error from at least GNU tar
* OpenBSD tar needs -F ustar, and it appears to warn but carry
on with "nobody" if a UID is too large
* -f /dev/null is a more portable way to throw away the output, since
the default destination might be a tape device depending on build
options that a distribution might change
* Windows ships BSD tar but lacks /dev/null, so ask perl for its name
Based on their manuals, the other two implementations the tests are
likely to encounter in the wild don't seem to need any special handling:
* Solaris/illumos tar uses ustar and replaces large UIDs with 60001
* AIX tar uses ustar (unless --format=pax) and truncates large UIDs
Backpatch-through: 18
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Sami Imseih <samimseih@gmail.com> (large UIDs)
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier version)
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> (OpenBSD)
Reviewed-by: Andrew Dunstan <andrew@dunslane.net> (Windows)
Discussion: https://postgr.es/m/3676229.1775170250%40sss.pgh.pa.us
Discussion: https://postgr.es/m/CAA5RZ0tt89MgNi4-0F4onH%2B-TFSsysFjMM-tBc6aXbuQv5xBXw%40mail.gmail.com
It's not very useful to specify a non-standard directory size. The
HASH_DIRSIZE option was only used for shared memory hash tables, and
those always used hash_select_dirsize() to choose the size, which in
turn just uses the default algorithm anyway. That assumption was
ingrained in hash_estimate_size(), too.
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://www.postgresql.org/message-id/01ab1d41-3eda-4705-8bbd-af898f5007f1@iki.fi
Previously, the shared header (HASHHDR) and the directory were
allocated by the caller, and passed to hash_create(), while the actual
elements were allocated separately with ShmemAlloc(). After this
commit, all the memory needed by the header, the directory, and all
the elements is allocated using a single ShmemInitStruct() call, and
the different parts are carved out of that allocation. This way the
ShmemIndex entries (and thus pg_shmem_allocations) reflect the size of
the whole hash table, rather than just the directories.
Commit f5930f9a98 attempted this earlier, but it had to be reverted.
The new strategy is to let dynahash.c perform all the allocations with
the alloc function, but have the alloc function carve out the parts
from the one larger allocation. The shared header and the directory
are now also allocated with alloc calls, instead of passing the area
for those directly from the caller.
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://www.postgresql.org/message-id/01ab1d41-3eda-4705-8bbd-af898f5007f1@iki.fi
Set HASH_FIXED_SIZE on all shared memory hash tables, to prevent them
from growing after the initial allocation. It was always weirdly
indeterministic that if one hash table used up all the unused shared
memory, you could not use that space for other things anymore until
restart. We just got rid of that behavior for the LOCK and PROCLOCK
tables, but it's similarly weird for all other hash tables.
Increase SHMEM_INDEX_SIZE because we were already above the max size,
on that one, and it's now a hard limit.
Some callers of ShmemInitHash() still pass HASH_FIXED_SIZE, but that's
now unnecessary. They should perhaps now be removed, but it doesn't do
any harm either to pass it.
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://www.postgresql.org/message-id/01ab1d41-3eda-4705-8bbd-af898f5007f1@iki.fi
Replace the separate init and max size options with a single size
option. We didn't make much use of the feature, all callers except the
ones in wait_event.c already used the same size for both, and the hash
tables in wait_event.c are small so there's little harm in just
allocating them to the max size.
The only reason why you might want to not reserve the max size upfront
is to make the memory available for other hash tables to grow beyond
their max size. Letting hash tables grow much beyond their max size is
bad for performance, however, because we cannot resize the directory,
and we never had very much "wiggle room" to grow to anyway so you
couldn't really rely on it. We recently marked the LOCK and PROCLOCK
tables with HAS_FIXED_SIZE, so there's nothing left in core that would
benefit from more unallocated shared memory.
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://www.postgresql.org/message-id/01ab1d41-3eda-4705-8bbd-af898f5007f1@iki.fi
At the moment, the only way for a validator module to report error
details on failure is to log them separately before returning from
validate_cb. Independently of that problem, the ereport() calls that we
make during validation failure partially duplicate some of the work of
auth_failed().
The end result is overly verbose and confusing for readers of the logs:
[768233] LOG: [my_validator] bad signature in bearer token
[768233] LOG: OAuth bearer authentication failed for user "jacob"
[768233] DETAIL: Validator failed to authorize the provided token.
[768233] FATAL: OAuth bearer authentication failed for user "jacob"
[768233] DETAIL: Connection matched file ".../pg_hba.conf" line ...
Solve both problems by making use of the existing logdetail pointer
that's provided by ClientAuthentication. Validator modules may set
ValidatorModuleResult->error_detail to override our default generic
message.
The end result looks something like
[242284] FATAL: OAuth bearer authentication failed for user "jacob"
[242284] DETAIL: [my_validator] bad signature in bearer token
Connection matched file ".../pg_hba.conf" line ...
Reported-by: Álvaro Herrera <alvherre@kurilemu.de>
Reported-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/202601241015.y5uvxd7oxnfs%40alvherre.pgsql
The test for re-running checksum enabling was only checking for the
data checksum state to transition to 'on', but didn't account for
the launcher process having had time to exit, thus getting an error
instead of the expected no-op. Adding a pg_stat_activity check for
the launcher exiting resolves the error, verified by inducing delay
in the launcher.
Also wrap a variable only used in injection point tests within the
correct USE macros to avoid warning for an unused variable.
All per the buildfarm.
Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Buildfarm
Discussion: https://postgr.es/m/1CB288C9-564B-4664-B096-C2F4377D17AB@yesql.se
Presently, this function only computes component scores when the
corresponding threshold is reached. A follow-up commit will add a
view that shows tables' autovacuum scores, and we anticipate that
users will want to use this view to discover tables that are
nearing autovacuum eligibility. This commit teaches this function
to always compute autovacuum scores, even when a threshold has not
been reached or autovacuum is disabled.
The restructuring in this commit revealed an interesting edge case.
If the table needs vacuuming for wraparound prevention and
autovacuum is disabled for it, we might still choose to analyze it.
It's not clear if this is intentional, but it has been this way for
nearly 20 years, so it seems best to avoid changing it without
further discussion.
Author: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0s4xjMrB-VAnLccC7kY8d0-4806-Lsac-czJsdA1LXtAw%40mail.gmail.com
This allows data checksums to be enabled, or disabled, in a running
cluster without restricting access to the cluster during processing.
Data checksums could prior to this only be enabled during initdb or
when the cluster is offline using the pg_checksums app. This commit
introduce functionality to enable, or disable, data checksums while
the cluster is running regardless of how it was initialized.
A background worker launcher process is responsible for launching a
dynamic per-database background worker which will mark all buffers
dirty for all relation with storage in order for them to have data
checksums calculated on write. Once all relations in all databases
have been processed, the data_checksums state will be set to on and
the cluster will at that point be identical to one which had data
checksums enabled during initialization or via offline processing.
When data checksums are being enabled, concurrent I/O operations
from backends other than the data checksums worker will write the
checksums but not verify them on reading. Only when all backends
have absorbed the procsignalbarrier for setting data_checksums to
on will they also start verifying checksums on reading. The same
process is repeated during disabling; all backends write checksums
but do not verify them until the barrier for setting the state to
off has been absorbed by all. This in-progress state is used to
ensure there are no false negatives (or positives) due to reading
a checksum which is not in sync with the page.
A new testmodule, test_checksums, is introduced with an extensive
set of tests covering both online and offline data checksum mode
changes. The tests which run concurrent pgbdench during online
processing are gated behind the PG_TEST_EXTRA flag due to being
very expensive to run. Two levels of PG_TEST_EXTRA flags exist
to turn on a subset of the expensive tests, or the full suite of
multiple runs.
This work is based on an earlier version of this patch which was
reviewed by among others Heikki Linnakangas, Robert Haas, Andres
Freund, Tomas Vondra, Michael Banck and Andrey Borodin. During
the work on this new version, Tomas Vondra has given invaluable
assistance with not only coding and reviewing but very in-depth
testing.
Author: Daniel Gustafsson <daniel@yesql.se>
Author: Magnus Hagander <magnus@hagander.net>
Co-authored-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CABUevExz9hUUOLnJVr2kpw9Cx=o4MCr1SVKwbupzuxP7ckNutA@mail.gmail.com
Discussion: https://postgr.es/m/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de
Discussion: https://postgr.es/m/CABUevEwE3urLtwxxqdgd5O2oQz9J717ZzMbh+ziCSa5YLLU_BA@mail.gmail.com
This commit adds an early return to this function, allowing us to
remove a level of indentation on a decent chunk of code. This is
preparatory work for follow-up commits that will add a new system
view to show tables' autovacuum scores.
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0s4xjMrB-VAnLccC7kY8d0-4806-Lsac-czJsdA1LXtAw%40mail.gmail.com
The previous commits reduced the amount of memory available for locks
by eliminating the "safety margins" and by settling the split between
LOCK and PROCLOCK tables at startup. The allocation is now more
deterministic, but it also means that you often hit one of the limits
sooner than before. To compensate for that, bump up
max_locks_per_transactions from 64 to 128. With that there is a little
more space in the both hash tables than what was the effective maximum
size for either table before the previous commits.
This only changes the default, so if you had changed
max_locks_per_transactions in postgresql.conf, you will still have
fewer locks available than before for the same setting value. This
should be noted in the release notes. A good rule of thumb is that if
you double max_locks_per_transactions, you should be able to get as
many locks as before.
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/e07be2ba-856b-4ff5-8313-8b58b6b4e4d0@iki.fi
This prevents the LOCK table from "stealing" space that was originally
calculated for the PROLOCK table, and vice versa. That was weirdly
indeterministic so that if you e.g. took a lot of locks consuming all
the available shared memory for the LOCK table, subsequent
transactions that needed the more space for the PROCLOCK table would
fail, but if you restarted the system then the space would be
available for PROCLOCK again. Better to be strict and predictable,
even though that means that in many cases you can acquire far fewer
locks than before.
This also prevents the lock hash tables from using up the
general-purpose 100 kB reserve we set aside for "stuff that's too
small to bother estimating" in CalculateShmemSize(). We are pretty
good at accounting for everything nowadays, so we could probably make
that reservation smaller, but I'll leave that for another commit.
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/e07be2ba-856b-4ff5-8313-8b58b6b4e4d0@iki.fi
As the comment says, the hash table sizes are just estimates, but that
doesn't mean we need a "safety margin" here. hash_estimate_size()
estimates the needed size in bytes pretty accurately for the given
number of elements, so if we wanted room for more elements in the
table, we should just use larger max_table_size in the
hash_estimate_size() call.
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/e07be2ba-856b-4ff5-8313-8b58b6b4e4d0@iki.fi
The 10% safety margin was copy-pasted from lock.c when the predicate
locking code was originally added. However, we later (commit
7c797e7194) added the HASH_FIXED_SIZE flag to the hash tables, which
means that they cannot actually use the safety margin that we're
calculating for them.
The extra memory was mainly used by the main lock manager, which is
the only shmem hash table of non-trivial size that does not use the
HASH_FIXED_SIZE flag. If we wanted to have more space for the lock
manager, we should reserve it directly in lock.c. After this commit,
the lock manager will just have less memory available than before.
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/e07be2ba-856b-4ff5-8313-8b58b6b4e4d0@iki.fi
Instead of probing the PK index on each trigger invocation, buffer
FK rows in a new per-constraint cache entry (RI_FastPathEntry) and
flush them as a batch.
On each trigger invocation, the new ri_FastPathBatchAdd() buffers
the FK row in RI_FastPathEntry. When the buffer fills (64 rows)
or the trigger-firing cycle ends, the new ri_FastPathBatchFlush()
probes the index for all buffered rows, sharing a single
CommandCounterIncrement, snapshot, permission check, and security
context switch across the batch, rather than repeating each per row
as the SPI path does. Per-flush CCI is safe because all AFTER
triggers for the buffered rows have already fired by flush time.
For single-column foreign keys, the new ri_FastPathFlushArray()
builds an ArrayType from the buffered FK values (casting to the
PK-side type if needed) and constructs a scan key with the
SK_SEARCHARRAY flag. The index AM sorts and deduplicates the array
internally, then walks matching leaf pages in one ordered traversal
instead of descending from the root once per row. A matched[] bitmap
tracks which batch items were satisfied; the first unmatched item is
reported as a violation. Multi-column foreign keys fall back to
per-row probing via the new ri_FastPathFlushLoop().
The fast path introduced in the previous commit (2da86c1ef9) yields
~1.8x speedup. This commit adds ~1.6x on top of that, for a combined
~2.9x speedup over the unpatched code (int PK / int FK, 1M rows, PK
table and index cached in memory).
FK tuples are materialized via ExecCopySlotHeapTuple() into a new
purpose-specific memory context (flush_cxt), child of
TopTransactionContext, which is also used for per-flush transient
work: cast results, the search array, and index scan allocations.
It is reset after each flush and deleted in teardown.
The PK relation, index, tuple slots, and fast-path metadata are
cached in RI_FastPathEntry across trigger invocations within a
trigger-firing batch, avoiding repeated open/close overhead. The
snapshot and IndexScanDesc are taken fresh per flush. The entry is
not subject to cache invalidation: cached relations are held with
locks for the transaction duration, and the entry's lifetime is
bounded by the trigger-firing cycle.
Lifecycle management for RI_FastPathEntry relies on three new
mechanisms:
- AfterTriggerBatchCallback: A new general-purpose callback
mechanism in trigger.c. Callbacks registered via
RegisterAfterTriggerBatchCallback() fire at the end of each
trigger-firing batch (AfterTriggerEndQuery for immediate
constraints, AfterTriggerFireDeferred at COMMIT, and
AfterTriggerSetState for SET CONSTRAINTS IMMEDIATE). The RI
code registers ri_FastPathEndBatch as a batch callback.
- Batch callbacks only fire at the outermost query level
(checked inside FireAfterTriggerBatchCallbacks), so nested
queries from SPI inside other AFTER triggers do not tear down
the cache mid-batch.
- XactCallback: ri_FastPathXactCallback NULLs the static cache
pointer at transaction end, handling the abort path where the
batch callback never fired.
- SubXactCallback: ri_FastPathSubXactCallback NULLs the static
cache pointer on subtransaction abort, preventing the batch
callback from accessing already-released resources.
- AfterTriggerBatchIsActive(): A new exported accessor that
returns true when afterTriggers.query_depth >= 0. During
ALTER TABLE ... ADD FOREIGN KEY validation, RI triggers are
called directly outside the after-trigger framework, so batch
callbacks would never fire. The fast-path code uses this to
fall back to the non-cached per-invocation path in that
context.
ri_FastPathEndBatch() flushes any partial batch before tearing
down cached resources. Since the FK relation may already be
closed by flush time (e.g. for deferred constraints at COMMIT),
it reopens the relation using entry->fk_relid if needed.
The existing ALTER TABLE validation path bypasses batching and
continues to call ri_FastPathCheck() directly per row, because
RI triggers are called outside the after-trigger framework there
and batch callbacks would never fire to flush the buffer.
Suggested-by: David Rowley <dgrowleyml@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Co-authored-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Tested-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CA+HiwqF4C0ws3cO+z5cLkPuvwnAwkSp7sfvgGj3yQ=Li6KNMqA@mail.gmail.com
LLVM 22 has the fix that we copied into our tree in commit 9044fc1d and
a new function to reach it[1][2], so we only need to use our copy for
Aarch64 + LLVM < 22. The only change to the final version that our copy
didn't get is a new LLVM_ABI macro, but that isn't appropriate for us.
Our copy is hopefully now frozen and would only need maintenance if bugs
are found in the upstream code.
Non-Aarch64 systems now also use the new API with LLVM 22. It allocates
all sections with one contiguous mmap() instead of one per
section. We could have done that earlier, but commit 9044fc1d wanted to
limit the blast radius to the affected systems. We might as well
benefit from that small improvement everywhere now that it is available
out of the box.
We can't delete our copy until LLVM 22 is our minimum supported version,
or we switch to the newer JITLink API for at least Aarch64.
[1] https://github.com/llvm/llvm-project/pull/71968
[2] https://github.com/llvm/llvm-project/pull/174307
Backpatch-through: 14
Discussion: https://postgr.es/m/CA%2BhUKGJTumad75o8Zao-LFseEbt%3DenbUFCM7LZVV%3Dc8yg2i7dg%40mail.gmail.com
Buildfarm testing shows that OpenSUSE (and perhaps related platforms?)
configures GNU tar in such a way that it'll archive sparse WAL files
by default, thus triggering the pax-extension detection code added by
bc30c704a. Thus, we need something similar to 852de579a but for
GNU tar's option set. "--format=ustar" seems to do the trick.
Moreover, the buildfarm shows that pg_verifybackup's 003_corruption.pl
test script is also triggering creation of pax-format tar files on
that platform. We had not noticed because those test cases all fail
(intentionally) before getting to the point of trying to verify WAL
data.
Since that means two TAP scripts need this option-selection logic, and
plausibly more will do so in future, factor it out into a subroutine
in Test::Utils. We also need to back-patch the 003_corruption.pl fix
into v18, where it's also failing.
While at it, clean up some places where guards for $tar being empty
or undefined were incomplete or even outright backwards. Presumably,
we missed noticing because the set of machines that run TAP tests
and don't have tar installed is empty. But if we're going to try
to handle that scenario, we should do it correctly.
Reported-by: Tomas Vondra <tomas@vondra.me>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/02770bea-b3f3-4015-8a43-443ae345379c@vondra.me
Backpatch-through: 18
Add the following jsonpath methods:
* l/r/btrim()
* lower(), upper()
* initcap()
* replace()
* split_part()
Each simply dispatches to the standard string processing functions.
These depend on the locale, but since it's set at `initdb`, they can be
considered immutable and therefore allowed in any jsonpath expression.
Author: Florents Tselai <florents.tselai@gmail.com>
Co-authored-by: David E. Wheeler <david@justatheory.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CA+v5N40sJF39m0v7h=QN86zGp0CUf9F1WKasnZy9nNVj_VhCZQ@mail.gmail.com
This is just cleanup in the jsonpath grammar.
Rename the `csv_` tokens to `int_`, because they represent signed or
unsigned integers, as follows:
* `csv_elem` => `int_elem`
* `csv_list` => `int_list`
* `opt_csv_list` => `opt_int_list`
Rename the `datetime_precision` tokens to `uint_arg`, as they represent
unsigned integers and will be useful for other methods in the future, as
follows:
* `datetime_precision` => `uint_elem`
* `opt_datetime_precision` => `opt_uint_arg`
Rename the `datetime_template` tokens to `str_arg`, as they represent
strings and will be useful for other methods in the future, as follows:
* `datetime_template` => `str_elem`
* `opt_datetime_template` => `opt_str_arg`
Author: David E. Wheeler <david@justatheory.com>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CA+v5N40sJF39m0v7h=QN86zGp0CUf9F1WKasnZy9nNVj_VhCZQ@mail.gmail.com
When a tablesync worker checks whether a specific table is published,
it previously issued a query to the publisher calling
pg_get_publication_tables() and filtering the result by relid via a
WHERE clause. Because the function itself was fully evaluated before
the filter was applied, this forced the publisher to enumerate all
tables in the publication. For publications covering a large number of
tables, this resulted in expensive catalog scans and unnecessary CPU
overhead on the publisher.
This commit adds a new overloaded form of pg_get_publication_tables()
that accepts an array of publication names and a target table
OID. Instead of enumerating all published tables, it evaluates
membership for the specified relation via syscache lookups, using the
new is_table_publishable_in_publication() helper. This helper
correctly accounts for publish_via_partition_root, ALL TABLES with
EXCEPT clauses, schema publications, and partition inheritance, while
avoiding the overhead of building the complete published table list.
The existing VARIADIC array form of pg_get_publication_tables() is
preserved for backward compatibility. Tablesync workers use the new
two-argument form when connected to a publisher running PostgreSQL 19
or later.
Bump catalog version.
Reported-by: Marcos Pegoraro <marcos@f10.com.br>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Haoyan Wang <wanghaoyan20@163.com>
Discussion: https://postgr.es/m/CAB-JLwbBFNuASyEnZWP0Tck9uNkthBZqi6WoXNevUT6+mV8XmA@mail.gmail.com
Previously, there was essentially no verification in this code that
the input is a tar file at all, let alone that it fits into the
subset of valid tar files that we can handle. This was exposed by
the discovery that we couldn't handle files that FreeBSD's tar
makes, because it's fairly aggressive about converting sparse WAL
files into sparse tar entries. To fix:
* Bail out if we find a pax extension header. This covers the
sparse-file case, and also protects us against scenarios where
the pax header changes other file properties that we care about.
(Eventually we may extend the logic to actually handle such
headers, but that won't happen in time for v19.)
* Be more wary about tar file type codes in general: do not assume
that anything that's neither a directory nor a symlink must be a
regular file. Instead, we just ignore entries that are none of the
three supported types.
* Apply pg_dump's isValidTarHeader to verify that a purported
header block is actually in tar format. To make this possible,
move isValidTarHeader into src/port/tar.c, which is probably where
it should have been since that file was created.
I also took the opportunity to const-ify the arguments of
isValidTarHeader and tarChecksum, and to use symbols not hard-wired
constants inside tarChecksum.
Back-patch to v18 but not further. Although this code exists inside
pg_basebackup in older branches, it's not really exposed in that
usage to tar files that weren't generated by our own code, so it
doesn't seem worth back-porting these changes across 3c9056981
and f80b09bac. I did choose to include a back-patch of 5868372bb
into v18 though, to minimize cosmetic differences between these
two branches.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/3049460.1775067940@sss.pgh.pa.us>
Backpatch-through: 18
Interrupt handling functions (e.g., HandleCatchupInterrupt(),
HandleParallelApplyMessageInterrupt()) are called only by
procsignal_sigusr1_handler(), which already calls SetLatch()
for the current process at the end of its processing.
Therefore, these interrupt handling functions do not need to
call SetLatch() themselves.
However, previously, some of these functions redundantly
called SetLatch(). This commit removes those unnecessary
calls.
While duplicate SetLatch() calls are redundant, they are
harmless, so this change is not backpatched.
Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Discussion: https://postgr.es/m/CALj2ACWd5apddj6Cd885WwJ6LquYu_G81C4GoR4xSoDV1x-FEA@mail.gmail.com
Previously we would only check for the availability of __cpuidex if
the related __get_cpuid_count was not available on a platform.
Future commits will need to access hypervisor information about
the TSC frequency of x86 CPUs. For that case __cpuidex is the only
viable option for accessing a high leaf (e.g. 0x40000000), since
__get_cpuid_count does not allow that.
__cpuidex is defined in cpuid.h for gcc/clang, but in intrin.h
for MSVC, so adjust tests to suite. We also need to cast the array
of unsigned ints to signed, since gcc (with -Wall) and clang emit
warnings otherwise.
Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: John Naylor <john.naylor@postgresql.org>
Discussion: https://postgr.es/m/CAP53PkyooCeR8YV0BUD_xC7oTZESHz8OdA=tP7pBRHFVQ9xtKg@mail.gmail.com
Now that command_ok() captures and displays failure output, use it
instead of system() plus manual diff-dumping in these two tests. This
simplifies both scripts and produces consistent, truncated output on
failure.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/DFYFWM053WHS.10K8ZPJ605UFK@jeltef.nl
Replace die with croak throughout Cluster.pm and Utils.pm (except in
INIT blocks and signal handlers, where die is correct) so that error
messages report the test script's line number rather than the helper
module's.
Add @CARP_NOT in Utils.pm listing PostgreSQL::Test::Cluster, so that
when a Utils function is called through a Cluster.pm wrapper, croak
skips both packages and reports the actual test-script caller.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/DFYFWM053WHS.10K8ZPJ605UFK@jeltef.nl
Install a $SIG{__DIE__} handler in the INIT block of Utils.pm that emits
the die message as a TAP diagnostic. Previously, an unexpected die
(e.g. from safe_psql) produced only "no plan was declared" with no
indication of the actual error. The handler also calls done_testing()
to suppress that confusing message.
Dies during compilation ($^S undefined) and inside eval ($^S == 1) are
left alone.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/DFYFWM053WHS.10K8ZPJ605UFK@jeltef.nl
Discussion: https://postgr.es/m/20220222181924.eehi7o4pmneeb4hm%40alap3.anarazel.de
Capture stdout and stderr from command_ok() and command_fails() and emit
them as TAP diagnostics on failure. Output is truncated to the first
and last 30 lines per channel to avoid flooding.
A new helper _diag_command_output() is introduced in Utils.pm so
both functions share the same truncation and formatting logic.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/DFYFWM053WHS.10K8ZPJ605UFK@jeltef.nl
When pg_regress fails it is often tedious to find the actual diffs,
especially in CI where you must navigate a file browser. Emit the first
80 lines of the combined regression.diffs as TAP diagnostics so the
failure reason is visible directly in the test output.
The line limit is across all failing tests in a single pg_regress run to
avoid flooding when a crash causes every subsequent test to fail.
New DIAG_DETAIL / DIAG_END tap output types are added, mirroring the
existing NOTE_DETAIL / NOTE_END pair, so that long diff lines can be
emitted without spurious '#' prefixes on continuation lines.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/DFYFWM053WHS.10K8ZPJ605UFK@jeltef.nl
While JIT can speed up large analytical queries, it can also cause
serious performance issues on otherwise very fast queries. Compiling
and optimizing the expressions may be so expensive, it completely
outweighs the JIT benefits for shorter queries.
Ideally, we'd address this in the cost model, but the part deciding
whether to enable JIT for a query is rather simple, partially because we
don't have any reliable estimates of how expensive the LLVM compilation
and optimization is.
Sometimes seemingly unrelated changes (for example a couple additional
INSERTs into a table) increase the cost just enough to enable JIT,
resulting in a performance cliff.
Because of these risks, most large-scale deployments already disable JIT
by default. Notably, this includes all hyperscalers.
This commit changes our default to align with that established practice.
If we improve the JIT (be it better costing or cheaper execution), we
can consider enabling it by default again.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://postgr.es/m/DG1VZJEX1AQH.2EH4OKGRUDB71@jeltef.nl
c456e3911 added various optimizations to the tuple deformation routines.
One optimization assumed that heap tuples would never contain cstrings.
That optimization also made its way into nocachegetattr(), which isn't
correct as ROW() types get formed into HeapTuples by ExecEvalRow() and
those can contain cstring Datums. nocachegetattr() gets used to extract
Datums from those tuples.
Here we remove the pg_assume(), which was there to instruct the compiler
to omit the attlen == -2 related code in att_addlength_pointer().
Author: David Rowley <dgrowleyml@gmail.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/80aeac57-8f50-4732-a5b4-c2373c3f8149@gmail.com
The pg_test_timing program was previously using INSTR_TIME_GET_NANOSEC on an
absolute instr_time value in order to do a diff, which goes against the spirit
of how the GET_* macros are supposed to be used, and will cause overhead in a
future change that assumes these macros are typically used on intervals only.
Additionally the program was doing unnecessary work in the test loop by
measuring the time elapsed, instead of checking the existing current time
measurement against a target end time. To support that, introduce a new
INSTR_TIME_ADD_NANOSEC macro that allows adding user-defined nanoseconds
to an instr_time variable.
While modifying the relevant code anyway, simplify it by not handling
durations <= 0 in test_timing(), since duration is unsigned and 0 is
disallowed by the caller.
Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAP53Pkyxv3-3gX+aOxC5tX0p2v9RHU+XH0iyvb64+ZnBXj92vg@mail.gmail.com
Until now we reduced the look-ahead distance by 1 on every hit, and doubled it
on every miss. That is problematic because there are very common IO patterns
where this prevents us from ever reaching a sufficiently high distance (e.g. a
miss followed by a hit will never have the distance grow beyond 2). In many
such cases, if we had ever reached a sufficient look-ahead distance, things
would have been fine, because we grow the distance faster than we decrease it.
One might think that the most obvious answer to this problem would be to never
reduce the distance. However, that would not work well, as (particularly with
upcoming users of read streams), it is reasonably common to at first have a
lot of misses and then to transition to a fully cached workload, e.g. because
the same blocks are needed repeatedly within one stream. Doing unnecessarily
deep readahead can be costly, due to having to pin a lot more buffers, which
increases CPU overhead.
Because the cost of a synchronously handled miss can be very high (multiple
milliseconds for every IO with commonly used storage) compared to the CPU
overhead of keeping the distance too high, we want to err on the side of not
reducing the distance too early.
The insight that a decrease of the distance by 1 at ever hit may be ok at
large distances, but not at low distances, shows a way out: If we only allow
decreasing the distance once there were no misses for our maximum look-ahead
distance, we will keep the distance high as long as readahead has a chance to
do IO asynchronously, but not commonly when not.
Several folks have written variants of this patch, including at least Thomas
Munro, Melanie Plageman and I.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/f3xxfrkafjxpyqxywcxricxgyizjirfceychyxsgn7bwjp5eda@kwbduhy7tfmu
Discussion: https://postgr.es/m/CA+hUKGL2PhFyDoqrHefqasOnaXhSg48t1phs3VM8BAdrZqKZkw@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-Wz%3DkMg3PNay96cHMT0LFwtxP-cQSRZTZzh1Cixxf8G%3Dzrw%40mail.gmail.com
While in fast-path, execute any IO that we might encounter synchronously.
Because we are, in that moment, not reading ahead, dispatching any occasional
IO to workers has the dispatch overhead, without any realistic chance of the
IO completing before we need it.
This helps io_method=worker performance for workloads that have only
occasional cache misses, but where those occasional misses still take long
enough to matter. It is likely this is only measurable with fast local
storage or workloads with the data in the kernel page cache, as with remote
storage the IO latency, not the dispatch-to-worker latency, is the determining
factor.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/f3xxfrkafjxpyqxywcxricxgyizjirfceychyxsgn7bwjp5eda@kwbduhy7tfmu
Discussion: https://postgr.es/m/CAH2-Wz%3DkMg3PNay96cHMT0LFwtxP-cQSRZTZzh1Cixxf8G%3Dzrw%40mail.gmail.com
The tuple_insert() method already has an equivalent argument, so this
makes sense just on consistency grounds, for future growth.
table_delete() can immediately use it to carry the 'changingPart'
boolean; for table_update we don't have any options at present.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> (older version)
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Antonin Houska <ah@cybertec.at>
Discussion: https://postgr.es/m/202603171606.kf6pmhscqbqz@alvherre.pgsql
This is an extension of the UPDATE and DELETE commands to do a
"temporal update/delete" based on a range or multirange column. The
user can say UPDATE t FOR PORTION OF valid_at FROM '2001-01-01' TO
'2002-01-01' SET ... (or likewise with DELETE) where valid_at is a
range or multirange column.
The command is automatically limited to rows overlapping the targeted
portion, and only history within those bounds is changed. If a row
represents history partly inside and partly outside the bounds, then
the command truncates the row's application time to fit within the
targeted portion, then it inserts one or more "temporal leftovers":
new rows containing all the original values, except with the
application-time column changed to only represent the untouched part
of history.
To compute the temporal leftovers that are required, we use the *_minus_multi
set-returning functions defined in 5eed8ce50c.
- Added bison support for FOR PORTION OF syntax. The bounds must be
constant, so we forbid column references, subqueries, etc. We do
accept functions like NOW().
- Added logic to executor to insert new rows for the "temporal
leftover" part of a record touched by a FOR PORTION OF query.
- Documented FOR PORTION OF.
- Added tests.
Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/ec498c3d-5f2b-48ec-b989-5561c8aa2024%40illuminatedcomputing.com
Oversight in commit 1bd6f22f43: I was way too optimistic about the
compiler letting me know what variables needed to be updated, and missed
a few of them. Clean it up.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reported-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/40E570EE-5A60-49D8-B8F7-2F8F2B7C8DFA@gmail.com
This allows both univariate and multivariate statistics to be built on
virtual generated columns and expressions that refer to virtual
generated columns. The restriction disallowing extended statistics on
a single column is lifted in the case of a single virtual generated
column, since it is treated as a single expression.
In the catalogs, references to virtual generated columns are stored
as-is. They are expanded at ANALYZE time to build the statistics, and
at planning time to allow the optimizer to make use of the statistics.
This allows the statistics to be correctly rebuilt using ANALYZE, if a
column's generation expression is altered (which causes any existing
statistics data to be deleted).
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/20250422181006.dd6f9d1d81299f5b2ad55e1a@sraoss.co.jp
Until now pgaio_wref_check_done() with io_method=io_uring would not detect if
IOs are known to have completed to the kernel, but the completion has not yet
been consumed by userspace. This can lead to inferior performance and also
makes it harder to use smarter feedback logic in read_stream, because we
cannot use knowledge about whether an IO completed to control the readahead
distance.
This commit just adds the io_uring specific infrastructure. Later commits will
return whether a wait was needed from WaitReadBuffers() and then use that
knowledge.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/f3xxfrkafjxpyqxywcxricxgyizjirfceychyxsgn7bwjp5eda@kwbduhy7tfmu
Discussion: https://postgr.es/m/CAH2-Wz%3DkMg3PNay96cHMT0LFwtxP-cQSRZTZzh1Cixxf8G%3Dzrw%40mail.gmail.com
FastPathMeta stored pointers into ri_compare_cache entries via
compare_entries[], creating a dependency on that cache remaining
stable. If ri_compare_cache entries were invalidated after fpmeta
was populated, the pointers would dangle.
Replace compare_entries[] with inline copies of the two FmgrInfo
fields actually needed (cast_func_finfo and eq_opr_finfo), copied
at populate time via fmgr_info_copy(). fpmeta now depends only on
riinfo remaining valid, which is already handled by the invalidation
callback.
Introduced by commit 2da86c1ef9 ("Add fast path for foreign key
constraint checks"), noticed while reviewing code for robustness
under CLOBBER_CACHE_ALWAYS.
Discussion: https://postgr.es/m/CA+HiwqFQ+ZA7hSOygv4uv_t75B3r0_gosjadetCsAEoaZwTu6g@mail.gmail.com
First, under CLOBBER_CACHE_ALWAYS, the RI_ConstraintInfo entry can
be invalidated by relcache callbacks triggered inside table_open()
or index_open(), leaving ri_FastPathCheck() calling
ri_populate_fastpath_metadata() with a stale entry whose valid flag
is false. Fix by moving the fpmeta initialization to after
ri_CheckPermissions(), reloading riinfo first to ensure it is
valid, then calling ri_ExtractValues() and build_index_scankeys()
immediately after before any further operations that could trigger
invalidation.
Second, fpmeta allocated in TopMemoryContext was not freed when the
entry was invalidated in InvalidateConstraintCacheCallBack(),
leaking memory each time the constraint cache entry was recycled.
Fix by freeing and NULLing fpmeta at invalidation time.
Noticed locally when testing with CLOBBER_CACHE_ALWAYS.
Discussion: https://postgr.es/m/CA+HiwqGBU__7-VZZhQWQ3EQuwLYNPd9==ngnzduhGWKHMj9mvw@mail.gmail.com
During the counting step, keep track of the bits that are the same
for the entire input. If we counted only a single distinct byte,
the next recursion will start at the next byte position that has
more than one distinct byte in the input. This allows us to skip over
multiple passes where the byte is the same for the entire input.
This provides a significant speedup for integers that have some upper
bytes with all-zeros or all-ones, which is common.
Reviewed-by: Chengpeng Yan <chengpeng_yan@outlook.com>
Reviewed-by: ChangAo Chen <cca5507@qq.com>
Discussion: https://postgr.es/m/CANWCAZYpGMDSSwAa18fOxJGXaPzVdyPsWpOkfCX32DWh3Qznzw@mail.gmail.com
Previously some logical decoding messages (e.g., "logical decoding found
consistent point") were logged at level LOG, even though they provided
low-level, developer-oriented information that DBAs were typically not
interested in.
Since these messages can occur routinely (for example, when keeping calling
pg_logical_slot_get_changes() to obtain the changes from logical decoding),
logging them at LOG can be overly verbose.
This commit reduces their log level to DEBUG1 to avoid unnecessary log noise.
This change applies to a small set of messages for now. Additional messages
may be adjusted similarly in the future.
Even with this change, if these messages from walsender still need to be
observed, enabling DEBUG1 logging selectively for walsender (e.g.,
log_min_messages = 'warning,walsender:debug1') would be helpful to avoid
increasing overall log volume.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwGTyHgtD9tyN664x6vQ8Q1G53H7ZUCgBU9_X=nLt3f1QA@mail.gmail.com
Use the standard C23 and C++ attributes [[nodiscard]], [[noreturn]],
and [[maybe_unused]], if available.
This makes pg_nodiscard and pg_attribute_unused() available in
not-GCC-compatible compilers that support C23 as well as in C++.
For pg_noreturn, we can now drop the GCC-specific and MSVC-specific
fallbacks, because the C11 and the C++ implementation will now cover
all required cases.
Note, in a few places, we need to change the position of the attribute
because it's not valid in that place in C23.
Discussion: https://www.postgresql.org/message-id/flat/pxr5b3z7jmkpenssra5zroxi7qzzp6eswuggokw64axmdixpnk@zbwxuq7gbbcw
The test_cplusplusext test module has so far been disabled on MSVC.
The only remaining problem now is that designated initializers, as
used in PG_MODULE_MAGIC, require C++20. (With GCC and Clang they work
in older C++ versions as well.)
This adds another test in the top-level meson.build to check that the
compiler supports C++20 designated initializers. This is not
required, we are just checking and recording the answer. If yes, we
can enable the test module.
Most current compilers likely won't be in C++20 mode by default. This
doesn't change that; we are not doing anything to try to switch the
compiler into that mode. This might be a separate project, but for
now we'll leave that for the user or the test scaffolding.
The VS task on Cirrus CI is changed to provide the required flag to
turn on C++20 mode.
There is no equivalent change in configure, since this change mainly
targets MSVC.
Co-authored-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/flat/CAGECzQR21OnnKiZO_1rLWO0-16kg1JBxnVq-wymYW0-_1cUNtg%40mail.gmail.com
The new test fails with an error about a missing WAL file on that
stack, because it is archived in GNU tar's --sparse --format=posix
format. BSD tar uses that format by default, unlike GNU tar itself, and
ZFS triggers it by implicitly creating sparse files when it sees a lot
of zeroes.
The problem will surely also affect real users of the new tar support in
pg_waldump (commit b15c1513) and pg_verifybackup (commit b3cf461b3) on
such systems. Ideas under discussion, but for now the test is made to
pass by disabling sparse file detection in BSD tar.
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/1624716.1774736283%40sss.pgh.pa.us
The check for skip_if_not_valid added in 819dc118c0 was put at the start of
the loop. A CAS loop in theory does allow to make that check in a race free
manner. However, just after the check, there's a
old_buf_state = WaitBufHdrUnlocked(buf);
which introduces a race, because it would allow BM_VALID to be cleared, after
the skip_if_not_valid check.
Fix by restarting the loop after WaitBufHdrUnlocked().
Reported-by: Yura Sokolov <y.sokolov@postgrespro.ru>
Discussion: https://postgr.es/m/5bf667f3-5270-4b19-a08f-0facbecdff68@postgrespro.ru
If the archive file was made with --no-schema or related options
then it likely does not have enough dependency information to
ensure that parallel restore will choose a workable restore order.
Document this.
In passing, do some minor wordsmithing on new nearby text about
restoring from pg_dumpall archives.
Author: vaibhave postgres <postgresvaibhave@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAM_eQjzTLtt1X9WKvMV6Rew0UvxT3mmhimZa9WT-vqaPjmXk-g@mail.gmail.com
In a production build, StartReadBuffers() doesn't populate all fields
of a ReadBuffersOperation for a buffer hit because no callers use them
(they are populated in assert builds).
read_buffers() (a test-only function) relied on some of these fields, so
AIO tests failed on non-assert builds (discovered on the buildfarm after
commit 020c02bd90).
Fix by tracking the required information ourselves in read_buffers() and
avoiding reliance on the ReadBuffersOperation unless we know that we did
IO.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/9ce8f5d8-8ab2-4aa2-b062-c5d74161069c%40gmail.com
Currently, when the client sends a parameter discovery request within
OAUTHBEARER, the server logs the attempt with
FATAL: OAuth bearer authentication failed for user
These log entries are difficult to distinguish from true authentication
failures, and by default, libpq sends a discovery request as part of
every OAuth connection, making them annoyingly noisy. Use the new
PG_SASL_EXCHANGE_ABANDONED status to suppress them.
Patch by Zsolt Parragi, with some additional comments added by me.
Author: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAN4CZFPim7hUiyb7daNKQPSZ8CvQRBGkVhbvED7yZi8VktSn4Q%40mail.gmail.com
Introduce PG_SASL_EXCHANGE_ABANDONED, which allows CheckSASLAuth to
suppress the failing log entry for any SASL exchange that isn't actually
an authentication attempt. This is desirable for OAUTHBEARER's discovery
exchanges (and a subsequent commit will make use of it there).
This might have some overlap in the future with in-band aborts for SASL
exchanges, but it's intentionally not named _ABORTED to avoid confusion.
(We don't currently support clientside aborts in our SASL profile.)
Adapted from a patch by Zsolt Parragi.
Author: Zsolt Parragi <zsolt.parragi@percona.com>
Co-authored-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAN4CZFPim7hUiyb7daNKQPSZ8CvQRBGkVhbvED7yZi8VktSn4Q%40mail.gmail.com
SASL exchanges must end with either an AuthenticationOk or an
ErrorResponse from the server, and the standard way to produce an
ErrorResponse packet is for auth_failed() to call ereport(FATAL). This
means that there's no way for a SASL mechanism to suppress the server
log entry if the "authentication attempt" was really just a query for
authentication metadata, as is done with OAUTHBEARER.
Following the example of 1f9158ba4, add a FATAL_CLIENT_ONLY elevel. This
will allow ClientAuthentication() to choose not to log a particular
failure, while still correctly ending the authentication exchange before
process exit.
(The provenance of this patch is convoluted: since it's a mechanical
copy-paste of 1f9158ba4, both Zsolt Parragi and I produced nearly
identical versions independently, and Andrey Borodin reviewed Zsolt's
version. Tom Lane is the author of 1f9158ba4, but I don't want to imply
that he's signed off on this adaptation. See Discussion.)
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CAN4CZFPim7hUiyb7daNKQPSZ8CvQRBGkVhbvED7yZi8VktSn4Q%40mail.gmail.com
For PG19, since we won't have the ability to officially switch out flow
plugins, relax the flow-loading code to not require the internal init
function. Modules that don't have one will be treated as custom user
flows in error messages.
This will let bleeding-edge developers more easily test out the API and
provide feedback for PG20, by telling the runtime linker to find a
different libpq-oauth. It remains undocumented for end users.
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAOYmi%2BmrGg%2Bn_X2MOLgeWcj3v_M00gR8uz_D7mM8z%3DdX1JYVbg%40mail.gmail.com
The new PGoauthBearerRequestV2 API (which has similarities to the
"subclass" pointer architecture in use by the backend, for Nodes)
carries the risk of a developer ignoring the type of hook in use and
just casting directly to the V2 struct. This will appear to work fine in
19, but crash (or worse) when speaking to libpq 18.
However, we're in a unique position to catch this problem, because we
have tight control over the struct. Add poisoning code to the v1 path
which does the following:
- masks the v2 request->issuer pointer, to hopefully point at nonsense
memory
- abort()s if the v2 request->error is assigned by the hook
- attempts to cover both with VALGRIND_MAKE_MEM_NOACCESS for the
duration of the callback (a potential AddressSanitizer implementation
is left for future work)
The struct is unpoisoned after the call, so we can switch back to the v2
internal implementation when necessary.
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAOYmi%2BnCg5upBVOo_UCSjMfO%3DYMkZXcSEsgaADKXqerr5wahZQ%40mail.gmail.com
Commit 2252fcd427 modified some function prototypes in tableam.h
and heapam.h to take a VacuumParams argument instead of a pointer,
which required including vacuum.h in those headers. vacuum.h has a
reasonably large dependency tree, and headers like tableam.h are
widely included, so this is not ideal. To fix, change the
functions in question to accept a "const VacuumParams *" argument
instead. That allows us to use a forward declaration for
VacuumParams and avoid including vacuum.h. Since vacuum_rel()
needs to scribble on the params argument, we still pass it by value
to that function so that the original struct is not modified.
Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/rzxpxod4c4la62yvutyrvgoyilrl2fx55djaf2suidy7np5m6c%403l2ln476eadh
This is nonsense on its face, since the textsearch parsing logic
generally uses int32 to count words (see, eg, struct ParsedText).
Not to mention that we don't support input strings larger than
1GB.
The actual limitation of interest is documented nearby: a tsvector
can't be larger than 1MB, thanks to 20-bit offset fields within it
(see WordEntry.pos). That constrains us to well under 256K lexemes
per tsvector, depending on how many positions are stored per lexeme.
It seems sufficient therefore to just remove the bit about number
of lexemes.
Author: Dharin Shah <dharinshah95@gmail.com>
Discussion: https://postgr.es/m/CAOj6k6d0YO6AO-bhxkfUXPxUi-+YX9-doh2h5D5z0Bm8D2w=OA@mail.gmail.com
The docs previously didn't explain that leaf and non-leaf keys
could be treated differently, even though many of our opclasses
do exactly that. It also wasn't explained how that relates to
the STORAGE option, particularly since only one storage type
can be specified for both leaf and non-leaf keys.
While here, reorganize the text slightly, rather than sticking
additional detail into what's supposed to be a brief summary
paragraph.
Author: Paul A Jungwirth <pj@illuminatedcomputing.com>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA+renyWs5Np+FLSYfL+eu20S4U671A3fQGb-+7e22HLrD1NbYw@mail.gmail.com
When converting the WHERE clause in an element pattern,
generate_query_for_graph_path() calls replace_property_refs() to
replace the property references in it. Only the current graph element
pattern is passed as the context for replacement. If there are
references to variables from other element patterns, it causes a
segmentation fault (an assertion failure in an Assert enabled build)
since it does not find path_element object corresponding to those
variables.
We do not support forward and backward variable references within a
graph table clause. Hence prohibit all the cross references.
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reported-by: Man Zeng <zengman@halodbtech.com>
Reviewed-by: Henson Choi <assam258@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAExHW5u6AoDfNg4%3DR5eVJn_bJn%3DC%3DwVPrto02P_06fxy39fniA%40mail.gmail.com
When a ColumnRef can be resolved as a graph table property reference
and a lateral table column reference prefer the graph table property
reference since element pattern variables in the GRAPH_TABLE clause
form the innermost namespace.
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Henson Choi <assam258@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAExHW5u6AoDfNg4%3DR5eVJn_bJn%3DC%3DwVPrto02P_06fxy39fniA%40mail.gmail.com
XLOG_CHECKPOINT_REDO only contains the wal_level copied straight in
without an encapsulating record structure. While it works, it makes
future uses of XLOG_CHECKPOINT_REDO hard as there is nowhere to put
new data items. This fix this was inspired by the online checksums
patch which adds data to this record, but this change has value on
its own.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/c92b5d8b-bc03-47bc-b209-2e4a719eee32@iki.fi
Add a fast-path optimization for foreign key checks that bypasses SPI
by directly probing the unique index on the referenced table.
Benchmarking shows ~1.8x speedup for bulk FK inserts (int PK/int FK,
1M rows, where PK table and index are cached).
The fast path applies when the referenced table is not partitioned and
the constraint does not involve temporal semantics. Otherwise, the
existing SPI path is used.
This optimization covers only the referential check trigger
(RI_FKey_check). The action triggers (CASCADE, SET NULL, SET DEFAULT,
RESTRICT, NO ACTION) must find rows on the FK side to modify, which
requires a table scan with no guaranteed index available, and then
execute DML against those rows through the full executor path including
any triggered actions. Replicating that without substantial code
duplication is not feasible, so those triggers remain on the SPI path.
Extending the fast path to action triggers remains possible as future
work if the necessary infrastructure is built.
The new ri_FastPathCheck() function extracts the FK values, builds scan
keys, performs an index scan, and locks the matching tuple with
LockTupleKeyShare via ri_LockPKTuple(), which handles the RI-specific
subset of table_tuple_lock() results.
If the locked tuple was reached by chasing an update chain
(tmfd.traversed), recheck_matched_pk_tuple() verifies that the key
is still the same, emulating EvalPlanQual.
The scan uses GetTransactionSnapshot(), matching what the SPI path
uses (via _SPI_execute_plan pushing GetTransactionSnapshot() as the
active snapshot). Under READ COMMITTED this is a fresh snapshot;
under REPEATABLE READ / SERIALIZABLE it is the frozen transaction-
start snapshot, so PK rows committed after the transaction started
are not visible.
The ri_CheckPermissions() function performs schema USAGE and table
SELECT checks, matching what the SPI path gets implicitly through
the executor's permission checks. The fast path also switches to
the PK table owner's security context (with SECURITY_NOFORCE_RLS)
before the index probe, matching the SPI path where the query runs
as the table owner.
ri_HashCompareOp() is adjusted to handle cross-type equality operators
(e.g. int48eq for int4 PK / int8 FK) which can appear in conpfeqop.
The existing code asserted same-type operators only, which was correct
for its existing callers (ri_KeysEqual compares same-type FK column
values via ff_eq_oprs), but the fast path is the first caller to pass
pf_eq_oprs, which can be cross-type.
Per-key metadata (compare entries, operator procedures, strategy
numbers) is cached in RI_ConstraintInfo via
ri_populate_fastpath_metadata() on first use, eliminating repeated
calls to ri_HashCompareOp() and get_op_opfamily_properties().
conindid and pk_is_partitioned are also cached at constraint load
time, avoiding per-invocation syscache lookups and the need to open
pk_rel before deciding whether the fast path applies.
New regression tests cover RLS bypass and ACL enforcement for the
fast-path permission checks. New isolation tests exercise concurrent
PK updates under both READ COMMITTED and REPEATABLE READ.
Author: Junwang Zhao <zhjwpku@gmail.com>
Co-authored-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Tested-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CA+HiwqF4C0ws3cO+z5cLkPuvwnAwkSp7sfvgGj3yQ=Li6KNMqA@mail.gmail.com
Adjust the syntax of the EXCEPT clause in CREATE/ALTER PUBLICATION
added in commits fd366065e0 and 493f8c6439 to move the TABLE keyword
inside the relation list.
Old syntax:
CREATE PUBLICATION ... FOR ALL TABLES EXCEPT TABLE (t1, ...);
ALTER PUBLICATION ... SET ALL TABLES EXCEPT TABLE (t1, ...);
New syntax:
CREATE PUBLICATION ... FOR ALL TABLES EXCEPT (TABLE t1, ...);
ALTER PUBLICATION ... SET ALL TABLES EXCEPT (TABLE t1, ...);
This is to ensure that inclusion and exclusion list can be specified in
a same way. Previously, the exclusion table list can be specified as
TABLE (t1, t2, t3) and inclusion list can be specified as TABLE t1, t2,
t3, or TABLE t1, TABLE t2, TABLE t3.
This change is purely syntactic and does not alter behavior.
Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
Author: vignesh C <vignesh21@gmail.com>
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAD21AoCC8XuwfX62qKBSfHUAoww_XB3_84HjswgL9jxQy696yw@mail.gmail.com
Discussion: https://postgr.es/m/CALDaNm3=JrucjhiiwsYQw5-PGtBHFONa6F7hhWCXMsGvh=tamA@mail.gmail.com
We long ago folded these two tuple header fields into one field
to save space. However, nothing was done to the user-facing
documentation about them, perhaps with the idea that we'd add
code to emit something approximating the original definitions.
That never happened and presumably never will, so update the
text to reflect current reality.
Author: Paul A Jungwirth <pj@illuminatedcomputing.com>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA+renyVYYboiTayRRE0j1oKpeB+NjEBSUXfwgEu6O0JESSmauQ@mail.gmail.com
PG18 hid the PGOAUTHCAFILE envvar behind PGOAUTHDEBUG=UNSAFE, because I
thought that any "real" production usage of private CA certificates
would have them added to the Curl system trust store. But there are use
cases, such as containerized environments, that prefer to manage custom
CA settings more granularly; some of them consider envvar configuration
of certificates to be standard practice.
Move PGOAUTHCAFILE out from under the debug flag, and add an
oauth_ca_file option to libpq to configure trusted CAs per connection.
Patch by Jonathan Gonzalez V., with some additional wordsmithing and
test organization by me.
Author: Jonathan Gonzalez V. <jonathan.abdiel@gmail.com>
Co-authored-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/16a91d02795cb991963326a902afa764e4d721db.camel%40gmail.com
In addition to removing the bits8, bits16, and bits32 typedefs,
this commit replaces all uses with uint8, uint16, or uint32. bits*
provided little benefit beyond establishing the intent of the
variable, and they were inconsistently used for that purpose.
Third-party code should instead use the corresponding uint*
typedef.
Suggested-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://postgr.es/m/absbX33E4eaA0Ity%40nathan
Now that on-access pruning can update the visibility map (VM) during
read-only queries, set the page’s pd_prune_xid hint during INSERT and on
the new page during UPDATE.
This allows heap_page_prune_and_freeze() to set the VM the first time a
page is read after being filled with tuples. This may avoid I/O
amplification by setting the page all-visible when it is still in shared
buffers and allowing later vacuums to skip scanning the page. It also
enables index-only scans of newly inserted data much sooner.
As a side benefit, this addresses a long-standing note in heap_insert()
and heap_multi_insert(): aborted inserts can now be pruned on-access
rather than lingering until the next VACUUM.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.
This commit implements on-access VM setting for sequential scans, tid
range scans, sample scans, bitmap heap scans, and the underlying heap
relation in index scans.
Setting the visibility map on-access can avoid write amplification
caused by vacuum later needing to set the page all-visible, which could
trigger a write and potentially an FPI. It also allows more frequent
index-only scans, since they require pages to be marked all-visible in
the VM.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
The comment part of d9dd406fe2 mentioned that -Wvla is not applicable
for C++. That is not fully correct: it is true that VLAs are not part of the
C++ standard, and g++ with -pedantic will also warn about them as a non-standard
extension. However, -Wvla itself works fine in C++ and will catch VLA
usage just as in C.
Fix configure.ac to apply -Werror=vla to C++ as well. There is no need to
fix meson.build as it already includes it in common_warning_flags.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Suggested-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/7bf60ab1-2b5d-4a77-93ce-815072a0a014%40eisentraut.org
Several places in tuplestore.c would leave the tuplestore data
structure effectively corrupt if some subroutine were to throw
an error. Notably, if WRITETUP() failed after some number of
successful calls within dumptuples(), the tuplestore would
contain some memtuples pointers that were apparently live
entries but in fact pointed to pfree'd chunks.
In most cases this sort of thing is fine because transaction
abort cleanup is not too picky about the contents of memory that
it's going to throw away anyway. There's at least one exception
though: if a Portal has a holdStore, we're going to call
tuplestore_end() on that, even during transaction abort.
So it's not cool if that tuplestore is corrupt, and that means
tuplestore.c has to be more careful.
This oversight demonstrably leads to crashes in v15 and before,
if a holdable cursor fails to persist its data due to an undersized
temp_file_limit setting. Very possibly the same thing can happen in
v16 and v17 as well, though the specific test case submitted failed
to fail there (cf. 095555daf). The failure is accidentally dodged
as of v18 because 590b045c3 got rid of tuplestore_end's retail tuple
deletion loop. Still, it seems unwise to permit tuplestores to become
internally inconsistent in any branch, so I've applied the same fix
across the board.
Since the known test case for this is rather expensive and doesn't
fail in recent branches, I've omitted it.
Bug: #19438
Reported-by: Dmitriy Kuzmin <kuzmin.db4@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/19438-9d37b179c56d43aa@postgresql.org
Backpatch-through: 14
Some of these probably could continue using non-re-entrant getopt()
even if we start using threads in the future, but it seems better to
make them all anyway, so that we have a clear-cut rule of "no plain
getopt() in the postgres binary".
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/d1da5f0e-0d68-47c9-a882-eb22f462752f@iki.fi
The standard getopt(3) function is not re-entrant nor thread-safe.
That's OK for current usage, but it's one more little thing we need to
change in order to make the server multi-threaded.
There's no standard getopt_r() function on any platform, I presume
because command line arguments are usually parsed early when you start
a program, before launching any threads, so there isn't much need for
it. However, we call it at backend startup to parse options from the
startup packet. Because there's no standard, we're free to define our
own.
The pg_getopt_start/next() implementation is based on the old getopt
implementation, I just gathered all the state variables to a struct.
The non-re-entrant getopt() function is now a wrapper around the
re-entrant variant, on platforms that don't have getopt(3).
getopt_long() is not used in the server, so we don't need to provide a
re-entrant variant of that.
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/d1da5f0e-0d68-47c9-a882-eb22f462752f@iki.fi
The function is supposed to look at the passed in 'arg' argument, but
peeks at the 'optarg' global variable that's part of getopt()
instead. It happened to work anyway, because all callers passed
'optarg' as the argument.
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/d1da5f0e-0d68-47c9-a882-eb22f462752f@iki.fi
Pass down information to sequential scan, index [only] scan, bitmap
table scan, sample scan, and TID range scan nodes on whether or not the
query modifies the relation being scanned. A later commit will use this
information to update the VM during on-access pruning only if the
relation is not modified by the query.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
Add an AM user-settable flags parameter to several of the table scan
functions, one table AM callback, and index_beginscan(). This allows
users to pass additional context to be used when building the scan
descriptors.
For index scans, a new flags field is added to IndexFetchTableData, and
the heap AM saves the caller-provided flags there.
This introduces an extension point for follow-up work to pass per-scan
information (such as whether the relation is read-only for the current
query) from the executor to the AM layer.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/2be31f17-5405-4de9-8d73-90ebc322f7d8%40vondra.me
Before the major rewrite in commit c6e0fe1f2, AllocSetFree() would
typically crash when asked to free an already-free chunk. That was
an ugly but serviceable way of detecting coding errors that led to
double pfrees. But since that rewrite, double pfrees went through
just fine, because the "hdrmask" of a freed chunk isn't changed at all
when putting it on the freelist. We'd end with a corrupt freelist
that circularly links back to the doubly-freed chunk, which would
usually result in trouble later, far removed from the actual bug.
This situation is no good at all for debugging purposes. Fortunately,
we can fix it at low cost in MEMORY_CONTEXT_CHECKING builds by making
AllocSetFree() check for chunk->requested_size == InvalidAllocSize,
relying on the pre-existing code that sets it that way just below.
I investigated the alternative of changing a freed chunk's methodid
field, which would allow detection in non-MEMORY_CONTEXT_CHECKING
builds too. But that adds measurable overhead. Seeing that we didn't
notice this oversight for more than three years, it's hard to argue
that detecting this type of bug is worth any extra overhead in
production builds.
Likewise fix AllocSetRealloc() to detect repalloc() on a freed chunk,
and apply similar changes in generation.c and slab.c. (generation.c
would hit an Assert failure anyway, but it seems best to make it act
like aset.c.) bump.c doesn't need changes since it doesn't support
pfree in the first place. Ideally alignedalloc.c would receive
similar changes, but in debugging builds it's impossible to reach
AlignedAllocFree() or AlignedAllocRealloc() on a pfreed chunk, because
the underlying context's pfree would have wiped the chunk header of
the aligned chunk. But that means we should get an error of some
sort, so let's be content with that.
Per investigation of why the test case for bug #19438 didn't appear to
fail in v16 and up, even though the underlying bug was still present.
(This doesn't fix the underlying double-free bug, just cause it to
get detected.)
Bug: #19438
Reported-by: Dmitriy Kuzmin <kuzmin.db4@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/19438-9d37b179c56d43aa@postgresql.org
Backpatch-through: 16
An Append node that is part of a partitionwise aggregate has no
apprelids. If such a node was elided, the previous coding would
attempt to call unique_nonjoin_rtekind() on a NULL pointer, which
leads to an assertion failure. Insert a NULL check to prevent that.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: http://postgr.es/m/0afba1ce-c946-4131-972d-191d9a1c097c@gmail.com
PlannedStmt->resultRelations was an integer list of range table indexes
because at the time it was added (to Query), the Bitmapset data type did
not yet exist in Postgres.
0f4c170cf3 added a Bitmapset of result relations, so remove the integer
list of RTIs and use the more compact resultRelationRelids.
Discussion: https://postgr.es/m/CAApHDvqAOeOwCKh9g0gfxWa040%3DHyc7_oA%3DC59rjod8kXJDWyw%40mail.gmail.com
Save the range table indexes of result relations and row mark relations
in separate bitmapsets in the PlannedStmt. Precomputing them allows
cheap membership checks during execution. Together, these two groups
approximate all relations that will be modified by a query. This
includes relations targeted by INSERT, UPDATE, DELETE, and MERGE as well
as relations with any row mark (like SELECT FOR UPDATE).
Future work will use information on whether or not a relation is
modified by a query in a heuristic.
PlannedStmt->resultRelations is only used in a membership check, so it
will be removed in a separate commit.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
Headers that c.h includes early should not have another header
included before them in the headerscheck test file, especially not
c.h.
A particular instance of a problem is that pg_config.h defines some
symbols that c.h later undefines in some cases, such as in the code
added by commit cd083b54bd, but there were also some before that.
This only works correctly if pg_config.h is included first.
pg_config_manual.h and pg_config_os.h are closely related to
pg_config.h and should be treated the same way.
postgres_ext.h is meant to be usable standalone, so testing it with
c.h included first defeats the point.
c.h also includes port.h, but this commit leaves that alone, since
port.h does need some of c.h to be processed first. (But because of
header guards, testing port.h separately is probably ineffective.)
Discussion: https://www.postgresql.org/message-id/flat/579116be-5016-4dbc-aed0-c06f8d9f5bbb%40eisentraut.org
Previously, the function casting type circle to type polygon could not
be made error safe, because it is an SQL language function.
This refactors it as a C/internal function, by sharing code with the
C/internal function that the SQL function previously wrapped, and soft
error support is added.
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Amul Sul <sulamul@gmail.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: Discussion: https://www.postgresql.org/message-id/flat/CADkLM%3Dfv1JfY4Ufa-jcwwNbjQixNViskQ8jZu3Tz_p656i_4hQ%40mail.gmail.com
Previously, a foreign key defined as DEFERRABLE INITIALLY DEFERRED could
behave as NOT DEFERRABLE after being set to NOT ENFORCED and then back
to ENFORCED.
This happened because recreating the FK triggers on re-enabling the constraint
forgot to restore the tgdeferrable and tginitdeferred fields in pg_trigger.
Fix this bug by properly setting those fields when the foreign key constraint
is marked ENFORCED again and its triggers are recreated, so the original
DEFERRABLE and INITIALLY DEFERRED properties are preserved.
Backpatch to v18, where NOT ENFORCED foreign keys were introduced.
Author: Yasuo Honda <yasuo.honda@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAKmOUTms2nkxEZDdcrsjq5P3b2L_PR266Hv8kW5pANwmVaRJJQ@mail.gmail.com
Backpatch-through: 18
Functions such as hash_numeric() are not careful to use the correct
PG_RETURN_*() macro according to the return type of that function as
defined in pg_proc. Because that function is meant to return int32,
when the hashed value exceeds 2^31, the 64-bit Datum value won't wrap to
a negative number, which means the Datum won't have the same value as it
would have had it been cast to int32 on a two's complement machine. This
isn't harmless as both datum_image_eq() and datum_image_hash() may receive
a Datum that's been formed and deformed from a tuple in some cases, and
not in other cases. When formed into a tuple, the Datum value will be
coerced into an integer according to the attlen as specified by the
TupleDesc. This can result in two Datums that should be equal being
classed as not equal, which could result in (but not limited to) an error
such as:
ERROR: could not find memoization table entry
Here we fix this by ensuring we cast the Datum value to a signed integer
according to the typLen specified in the datum_image_eq/datum_image_hash
function call before comparing or hashing.
Author: David Rowley <dgrowleyml@gmail.com>
Reported-by: Tender Wang <tndrwang@gmail.com>
Backpatch-through: 14
Discussion: https://postgr.es/m/CAHewXNmcXVFdB9_WwA8Ez0P+m_TQy_KzYk5Ri5dvg+fuwjD_yw@mail.gmail.com
This followw up on the previous change (commit 7bff9f106a) for partitions by
applying the same formatting to inheritance tables lists.
Previously, \d+ <table> displayed inheritance tables differently from other
object lists: the first inheritance table appeared on the same line as the
"Inherits" header. For example:
Inherits: test_like_5,
test_like_5x
This commit updates the output so that inheritance tables are listed
consistently with other objects, with each entry on its own line starting
below the header:
Inherits:
test_like_5
test_like_5x
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Neil Chen <carpenter.nail.cz@gmail.com>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Soumya S Murali <soumyamurali.work@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHut+Pu1puO00C-OhgLnAcECzww8MB3Q8DCsvx0cZWHRfs4gBQ@mail.gmail.com
Previously, \d+ <table> displayed partitions differently from other object
lists: the first partition appeared on the same line as the "Partitions"
header. For example:
Partitions: pt12 FOR VALUES IN (1, 2),
pt34 FOR VALUES IN (3, 4)
This commit updates the output so that partitions are listed consistently
with other objects, with each entry on its own line starting below the header:
Partitions:
pt12 FOR VALUES IN (1, 2)
pt34 FOR VALUES IN (3, 4)
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Neil Chen <carpenter.nail.cz@gmail.com>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Soumya S Murali <soumyamurali.work@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHut+Pu1puO00C-OhgLnAcECzww8MB3Q8DCsvx0cZWHRfs4gBQ@mail.gmail.com
Commit 121d774cae added text to master describing pruning-aware
locking behavior introduced by 525392d57. That behavior was
reverted in May 2025, making the text incorrect. Replace it with
the text used in back branches, which correctly describes current
behavior: pruned partitions are still locked at the beginning of
execution.
Discussion: https://postgr.es/m/CA+HiwqFT0fPPoYBr0iUFWNB-Og7bEXB9hB=6ogk_qD9=OM8Vbw@mail.gmail.com
The reason for passing fire_triggers=false to SPI_execute_snapshot()
in ri_PerformCheck() was not documented, making it unclear why it was
done that way. Add a comment explaining that it ensures AFTER triggers
on rows modified by the RI action are queued in the outer query's
after-trigger context and fire only after all RI updates on the same
row are complete.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Surya Poondla <suryapoondla4@gmail.com>
Discussion: https://postgr.es/m/20250331212648.ad4ab804559001d7f0788741@sraoss.co.jp
This adjusts cast functions of the geometry types to support soft
errors. This requires refactoring of various helper functions to
support error contexts. Also make the float8 to float4 cast error
safe. It requires some of the same helper functions.
This is in preparation for a future feature where conversion errors in
casts can be caught.
(The function casting type circle to type polygon is not yet made error
safe, because it is an SQL language function.)
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Amul Sul <sulamul@gmail.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CADkLM%3Dfv1JfY4Ufa-jcwwNbjQixNViskQ8jZu3Tz_p656i_4hQ%40mail.gmail.com
Most of the pairs of incompatible options (such as --file and --dbname)
are pretty obvious and need no explanation. But it may not be obvious
that --single-transaction cannot be used together with --create or
multiple jobs, so let's mention that in the documentation.
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://postgr.es/m/CAExHW5ti5igDwOOde6shgfS7JPtCY9gNrkB3xNr=FuGTYVDSjQ@mail.gmail.com
Add a sentence that describes the parts of a cluster's state that are
*not* included in the output.
Also swap two sentences in the introductory paragraph. Without that,
it is not clear what the "it" at the beginning of the second sentence
is referring to. Also add a reference to pg_restore, since not all
output formats are restored with pg_dump.
Also clarify the recently-added text about where different output
formats go, and relocate it above the ancillary text about having
to run as superuser.
Reported-by: Dimitre Radoulov <cichomitiko@gmail.com>
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAGJBphSX2oMPPu=VM4U8NP4+qffFH_483tFQCJ_s-mOcN3DLDw@mail.gmail.com
astreamer_tar_parser_content() sent the wrong data pointer when
forwarding MEMBER_TRAILER padding to the next streamer. After
astreamer_buffer_until() buffers the padding bytes, the 'data'
pointer has been advanced past them, but the code passed 'data'
instead of bbs_buffer.data. This caused the downstream consumer
to receive bytes from after the padding rather than the padding
itself, and could read past the end of the input buffer.
astreamer_gzip_decompressor_content() only checked for
Z_STREAM_ERROR from inflate(), silently ignoring Z_DATA_ERROR
(corrupted data) and Z_MEM_ERROR (out of memory). Fix by
treating any return other than Z_OK, Z_STREAM_END, and
Z_BUF_ERROR as fatal.
astreamer_gzip_decompressor_free() missed calling inflateEnd() to
release zlib's internal decompression state.
astreamer_tar_parser_free() neglected to pfree() the streamer
struct itself, leaking it.
astreamer_extractor_content() did not check the return value of
fclose() when closing an extracted file. A deferred write error
(e.g., disk full on buffered I/O) would be silently lost.
Discussion: https://postgr.es/m/results/98c6b630-acbb-44a7-97fa-1692ce2b827c@dunslane.net
Reviewed-By: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 15
This adjusts cast functions from jsonb to other types to support soft
errors. This just involves some refactoring of the underlying helper
functions to use ereturn.
This is in preparation for a future feature where conversion errors in
casts can be caught.
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Amul Sul <sulamul@gmail.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CADkLM%3Dfv1JfY4Ufa-jcwwNbjQixNViskQ8jZu3Tz_p656i_4hQ%40mail.gmail.com
When a backend attempts to start a read IO and finds the first buffer already
has I/O in progress, previously it waited for that I/O to complete before
initiating reads for any of the subsequent buffers.
Although it must wait for the I/O to finish when acquiring the buffer, there's
no reason for it to wait when setting up the read operation. Waiting at this
point prevents starting I/O on subsequent buffers and can significantly reduce
concurrency.
This matters in two workloads:
1) When multiple backends scan the same relation concurrently.
2) When a single backend requests the same block multiple times within the
readahead distance.
Waiting each time an in-progress read is encountered effectively degenerates
the access pattern into synchronous I/O.
To fix this, when encountering an already in-progress IO for the head buffer,
the wait reference is now recorded and waiting is deferred until
WaitReadBuffers(), when the buffer actually needs to be acquired.
In rare cases, a backend may still need to wait synchronously at IO
start time: If another backend has set BM_IO_IN_PROGRESS on the buffer
but has not yet set the wait reference. Such windows should be brief and
uncommon.
Author: Melanie Plageman <melanieplageman@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/flat/zljergweqti7x67lg5ije2rzjusie37nslsnkjkkby4laqqbfw%403p3zu522yykv
Until now StartBufferIO() had a few weaknesses:
- As it did not submit staged IOs, it was not safe to call StartBufferIO()
where there was a potential for unsubmitted IO, which required
AsyncReadBuffers() to use a wrapper (ReadBuffersCanStartIO()) around
StartBufferIO().
- With nowait = true, the boolean return value did not allow to distinguish
between no IO being necessary and having to wait, which would lead
ReadBuffersCanStartIO() to unnecessarily submit staged IO.
- Several callers needed to handle both local and shared buffers, requiring
the caller to differentiate between StartBufferIO() and StartLocalBufferIO()
- In a future commit some callers of StartBufferIO() want the BufferDesc's
io_wref to be returned, to asynchronously wait for in-progress IO
- Indicating whether to wait with the nowait parameter was somewhat confusing
compared to a wait parameter
Address these issues as follows:
- StartBufferIO() is renamed to StartSharedBufferIO()
- A new StartBufferIO() is introduced that supports both shared and local
buffers
- The boolean return value has been replaced with an enum, indicating whether
the IO is already done, already in progress or that the buffer has been
readied for IO
- A new PgAioWaitRef * argument allows the caller to get the wait reference is
desired. All current callers pass NULL, a user of this will be introduced
subsequently
- Instead of the nowait argument there now is wait
This probably would not have been worthwhile on its own, but since all these
lines needed to be touched anyway...
Author: Andres Freund <andres@anarazel.de>
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/zljergweqti7x67lg5ije2rzjusie37nslsnkjkkby4laqqbfw@3p3zu522yykv
PostmasterContext is not available in single-user mode, use
TopMemoryContext instead. Also make sure that we use the correct
memory context in the lappend().
Author: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/acb_Eo1XtmCO_9z7@nathan
While we have a lot of indirect coverage of read streams, there are corner
cases that are hard to test when only indirectly controlling and observing the
read stream. This commit adds an SQL callable SRF interface for a read stream
and uses that in a few tests.
To make some of the tests possible, the injection point infrastructure in
test_aio had to be expanded to allow blocking IO completion.
While at it, fix a wrong debug message in inj_io_short_read_hook().
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/zljergweqti7x67lg5ije2rzjusie37nslsnkjkkby4laqqbfw@3p3zu522yykv
Create a <sect4> section for each function that the previous text
described in one long series of paragraphs. Also split the functions'
previously in-line syntax summaries into <synopsis> clauses, which is
more readable and allows us to sneak in an explicit mention of the
result data type.
This change gives us an opportunity to make cross-reference links
more specific, too, so do that.
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxFuk9P=P4=BZ=qCkgvo6im8aL8NnCkjxx2S2MQDWNdouw@mail.gmail.com
Now that the buffer content lock is implemented as part of BufferDesc.state,
releasing the lock and unpinning the buffer can be implemented as a single
atomic operation.
This improves workloads that have heavy contention on a small number of
buffers substantially, I e.g., see a ~20% improvement for pipelined readonly
pgbench on an older two socket machine.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/5ubipyssiju5twkb7zgqwdr7q2vhpkpmuelxfpanetlk6ofnop@hvxb4g2amb2d
An upcoming commit will make UnlockReleaseBuffer() considerably faster and
more scalable than doing LockBuffer(BUFFER_LOCK_UNLOCK); ReleaseBuffer();. But
it's a small performance benefit even as-is.
Most of the callsites changed in this patch are not performance sensitive,
however some, like the nbtree ones, are in critical paths.
This patch changes all the easily convertible places over to
UnlockReleaseBuffer() mainly because I needed to check all of them anyway, and
reducing cases where the operations are done separately makes the checking
easier.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/5ubipyssiju5twkb7zgqwdr7q2vhpkpmuelxfpanetlk6ofnop@hvxb4g2amb2d
After the series of preceding commits introducing and using
BufferBeginSetHintBits()/BufferSetHintBits16(), hint bits are not set anymore
while IO is going on. Therefore we do not need to copy pages while they are
being written out anymore.
For the same reason XLogSaveBufferForHint() now does not need to operate on a
copy of the page anymore, but can instead use the normal XLogRegisterBuffer()
mechanism. For that the assertions and comments to XLogRegisterBuffer() had to
be updated to allow share-exclusive locked buffers to be registered.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/5ubipyssiju5twkb7zgqwdr7q2vhpkpmuelxfpanetlk6ofnop@hvxb4g2amb2d
Not only is this good style, but it dodges some obscure bugs within
pg_bsd_indent. We could try to fix said bugs, but the amount of
effort required seems far out of proportion to the benefit.
Reported-by: Akshay Joshi <akshay.joshi@enterprisedb.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CANxoLDfca8O5SkeDxB_j6SVNXd+pNKaDmVmEW+2yyicdU8fy0w@mail.gmail.com
While fixing the base32hex UUID sortability test in commit
89210037a0, it turned out that the expected lexicographical order is
only maintained under the C collation (or an equivalent byte-wise
collation). Natural language collations may employ different rules,
breaking the sortability.
This commit updates the documentation to explicitly state that
base32hex is "byte-wise sortable", ensuring users do not fall into the
trap of using natural language collations when querying their encoded
data.
Co-Authored-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CAD21AoAwX1D6baSGuQXm0mzPXPWB07kgaoaaahjNHHenbdY24A@mail.gmail.com
Autovacuum workers scan pg_class twice to collect the set of tables
to process. The first pass is for plain relations and materialized
views, and the second is for TOAST tables. When the worker finds a
table to process, it adds it to the end of a list. Later on, it
processes the tables in the same order as the list. This simple
strategy has worked surprisingly well for a long time, but there
have been many discussions over the years about trying to improve
it.
This commit introduces a scoring system that is used to sort the
aforementioned list of tables to process. The idea is to have
autovacuum workers prioritize tables that are furthest beyond their
thresholds (e.g., a table nearing transaction ID wraparound should
be vacuumed first). This prioritization scheme is certainly far
from perfect; there are simply too many possibilities for any
scoring technique to work across all workloads, and the situation
might change significantly between the time we calculate the score
and the time that autovacuum processes it. However, we have
attemped to develop something that is expected to work for a large
portion of workloads with reasonable parameter settings.
The score is calculated as the maximum of the ratios of each of the
table's relevant values to its threshold. For example, if the
number of inserted tuples is 100, and the insert threshold for the
table is 80, the insert score is 1.25. If all other scores are
below that value, the table's score will be 1.25. The other
criteria considered for the score are the table ages (both
relfrozenxid and relminmxid) compared to the corresponding
freeze-max-age setting, the number of update/deleted tuples
compared to the vacuum threshold, and the number of
inserted/updated/deleted tuples compared to the analyze threshold.
Once exception to the previous paragraph is for tables nearing
wraparound, i.e., those that have surpassed the effective failsafe
ages. In that case, the relfrozenxid/relminmxid-based score is
scaled aggressively so that the table has a decent chance of
sorting to the front of the list.
To adjust how strongly each component contributes to the score, the
following parameters can be adjusted from their default of 1.0 to
anywhere between 0.0 and 10.0 (inclusive). Setting all of these to
0.0 restores pre-v19 prioritization behavior:
autovacuum_freeze_score_weight
autovacuum_multixact_freeze_score_weight
autovacuum_vacuum_score_weight
autovacuum_vacuum_insert_score_weight
autovacuum_analyze_score_weight
This is intended to be a baby step towards smarter autovacuum
workers. Possible future improvements include, but are not limited
to, periodic reprioritization, automatic cost limit adjustments,
and better observability (e.g., a system view that shows current
scores). While we do not expect this commit to produce any
earth-shattering improvements, it is arguably a prerequisite for
the aforementioned follow-up changes.
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/aOaAuXREwnPZVISO%40nathan
These tests were intended to be aligned with each other, but
additional tests for virtual generated columns disrupted that
alignment. The test confirming that user-defined types are not
allowed in virtual generated columns has also been moved to the
generated_virtual.sql-specific section.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Mutaamba Maasha <maasha@gmail.com>
Reviewed-by: Surya Poondla <s_poondla@apple.com>
Discussion: https://www.postgresql.org/message-id/flat/20250808115142.e9ccb81f35466a9a131a4c55@sraoss.co.jp
The PredicateLockShmemInit function is pretty complicated, and one
source of confusion is that it reuses the same local variable for
sizes of things. Replace the different uses with separate variables
for clarity.
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/113724ab-0028-493f-9605-6e8570f0939f@iki.fi
An element pattern variable may be repeated in the path pattern.
GraphTableParseState maintains a list of all variable names used in
the graph pattern. Add a new variable name to that list only when it
is not present already. This isn't a problem right now, but it could
be in the future.
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAExHW5tR4O0vjeqTCPr2VB5pYjNYbJgbCBEQf63NtU5Pz1MiOQ%40mail.gmail.com
Adding an implicit empty vertex pattern when a path pattern starts or
ends with an edge pattern or when two consecutive edge patterns appear
in the pattern is not supported right now. Prohibit such path
patterns.
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Henson Choi <assam258@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/72a23702-6d96-4103-a54b-057c2352e885%2540eisentraut.org
Previously we reused the shmem allocator's ShmemLock to also protect
lwlock.c's shared memory structures. Introduce a separate spinlock for
lwlock.c for the sake of modularity. Now that lwlock.c has its own
shared memory struct (LWLockTranches), this is easy to do.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/47aaf57e-1b7b-4e12-bda2-0316081ff50e@iki.fi
Merge the LWLockTranches and NamedLWLockTrancheRequest data structures
in shared memory into one array of user-defined tranches. The
NamedLWLockTrancheRequest list is now only used in postmaster, to hold
the requests until shared memory is initialized.
Introduce a C struct, LWLockTranches, to hold all the different fields
kept in shared memory. This gives an easier overview of what are all
the things kept in shared memory. Previously, we had separate pointers
for LWLockTrancheNames, LWLockCounter and the (shared memory copy of)
NamedLWLockTrancheRequestArray.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/47aaf57e-1b7b-4e12-bda2-0316081ff50e@iki.fi
The "named tranches" term is a little confusing. In most places it
refers to tranches requested with RequestNamedLWLockTranche(), even
though all built-in tranches and tranches allocated with
LWLockNewTrancheId() also have a name. But in MAX_NAMED_TRANCHES, it
refers to tranches requested with either RequestNamedLWLockTranche()
or LWLockNewTrancheId(), as it's the maximum of all of those in total.
The "user defined" term is already used in
LWTRANCHE_FIRST_USER_DEFINED, so let's standardize on that to mean
tranches allocated with either RequestNamedLWLockTranche() or
LWLockNewTrancheId().
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://www.postgresql.org/message-id/47aaf57e-1b7b-4e12-bda2-0316081ff50e@iki.fi
Factor out the "persistence mode" and storage/compression parts
of the syntax synopsis to reduce line lengths and increase
readability. Also add an introductory para about the persistence
modes so that the Description section still lines up with the
synopsis.
Author: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CAKFQuwYfMV-2SdrP-umr5SVNSqTn378BUvHsebetp5=DhT494w@mail.gmail.com
The premise of src/test/modules/test_plan_advice is that if we plan
a query once, generate plan advice, and then replan it using that
same advice, all of that advice should apply cleanly, since the
settings and everything else are the same. Unfortunately, that's
not the case: the test suite is the main regression tests, and
concurrent activity can change the statistics on tables involved
in the query, especially system catalogs. That's OK as long as it
only affects costing, but in a few cases, it affects which relations
appear in the final plan at all.
In the buildfarm failures observed to date, this happens because
we consider alternative subplans for the same portion of the query;
in theory, MinMaxAggPath is vulnerable to a similar hazard. In both
cases, the planner clones an entire subquery, and the clone has a
different plan name, and therefore different range table identifiers,
than the original. If a cost change results in flipping between one
of these plans and the other, the test_plan_advice tests will fail,
because the range table identifiers to which advice was applied won't
even be present in the output of the second planning cycle.
To fix, invent a new DO_NOT_SCAN advice tag. When generating advice,
emit it for relations that should not appear in the final plan at
all, because some alternative version of that relation was used
instead. When DO_NOT_SCAN is supplied, disable all scan methods for
that relation.
To make this work, we reuse a bunch of the machinery that previously
existed for the purpose of ensuring that we build the same set of
relation identifiers during planning as we do from the final
PlannedStmt. In the process, this commit slightly weakens the
cross-check mechanism: before this commit, it would fire whenever
the pg_plan_advice module was loaded, even if pg_plan_advice wasn't
actually doing anything; now, it will only engage when we have some
other reason to create a pgpa_planner_state. The old way was complex
and didn't add much useful test coverage, so this seems like an
acceptable sacrifice.
Discussion: http://postgr.es/m/CA+TgmoYuWmN-00Ec5pY7zAcpSFQUQLbgAdVWGR9kOR-HM-fHrA@mail.gmail.com
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Typically, we have only one PlannerInfo for any given subquery, but
when we are considering a MinMaxAggPath or a hashed subplan, we end
up creating a second PlannerInfo for the same portion of the query,
with a clone of the original range table. In fact, in the MinMaxAggPath
case, we might end up creating several clones, one per aggregate.
At present, there's no easy way for a plugin, such as pg_plan_advice,
to understand the relationships between the original range table and
the copies of it that are created in these cases. To fix, add an
alternative_plan_name field to PlannerInfo. For a hashed subplan, this
is the plan name for the non-hashed alternative; for minmax aggregates,
this is the plan_name from the parent PlannerInfo; otherwise, it's the
same as plan_name.
Discussion: http://postgr.es/m/CA+TgmoYuWmN-00Ec5pY7zAcpSFQUQLbgAdVWGR9kOR-HM-fHrA@mail.gmail.com
Reviewed-by: Lukas Fittl <lukas@fittl.com>
The COMMIT command handles an aborted transaction in the same
manner as the ROLLBACK command, but this wasn't explained in
its official reference page. Also mention that behavior in
the tutorial's material on transactions.
Also add a comment mentioning that we don't raise an exception
for COMMIT within an aborted transaction, as the SQL standard
would have us do.
Hyperlink a couple of cross-references while we're at it.
Author: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Gurjeet Singh <gurjeet@singh.im>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAKFQuwYgYR3rWt6vFXw=ZWZ__bv7PqvdOnHujG+UyqE11f+3sg@mail.gmail.com
Restructure AsyncReadBuffers() to use early return when the head buffer is
already valid, instead of using a did_start_io flag and if/else branches. Also
move around a bit of the code to be located closer to where it is used. This
is a refactor only.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/zljergweqti7x67lg5ije2rzjusie37nslsnkjkkby4laqqbfw@3p3zu522yykv
Already two places count buffer hits, requiring quite a few lines of
code since we do accounting in so many places. Future commits will add
more locations, so refactor into a helper.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/zljergweqti7x67lg5ije2rzjusie37nslsnkjkkby4laqqbfw@3p3zu522yykv
PinBufferForBlock() is always_inline and called in a loop in
StartReadBuffersImpl(). Previously it computed io_context and io_object
internally, which required calling IOContextForStrategy() -- a non-inline
function the compiler cannot prove is side-effect-free. This could potential
cause unneeded redundant function calls.
Compute io_context and io_object in the callers instead, allowing
StartReadBuffersImpl() to do so once before entering the loop.
Author: Melanie Plageman <melanieplageman@gmail.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/zljergweqti7x67lg5ije2rzjusie37nslsnkjkkby4laqqbfw@3p3zu522yykv
pg_plan_advice tracks two pieces of per-PlannerInfo data: (1) for each
RTI, the corresponding relation identifier, for purposes of
cross-checking those calculations against the final plan; and (2) the
set of semijoins seen during planning for which the strategy of making
one side unique was considered. The former is tracked using a hash
table that uses <plan_name, RTI> as the key, and the latter is
tracked using a List of <plan_name, relids>.
It seems better to track both of these things in the same way and
to try to reuse some code instead of having everything be completely
separate, so invent pgpa_planner_info; we'll create one every time we
see a new PlannerInfo and need to associate some data with it, and
we'll use the plan_name field to distinguish between PlannerInfo
objects, as it should always be unique. Then, refactor the two
systems mentioned above to use this new infrastructure.
(Note that the adjustment in pgpa_plan_walker is necessary in order
to avoid spuriously triggering the sanity check in that function,
in the case where a pgpa_planner_info is created for a purpose not
related to sj_unique_rels.)
Discussion: https://postgr.es/m/CA+TgmoaK=4w7-qknUo3QhUJ53pXZq=c=KgZmRyD+k7ytqfmgSg@mail.gmail.com
Reviewed-by: Lukas Fittl <lukas@fittl.com>
We recommend looking at psql's "-E" output to help understand the
system catalogs, but in some cases (particularly table displays)
there's a bunch of rather impenetrable SQL there. As a small
improvement, label each query issued by describe.c with a short
description of its purpose. The code is arranged so that the
labels also appear as SQL comments in the server log, if the
server is logging these commands.
We could expand this policy to every use of PSQLexec(), but most of
the ones outside describe.c are issuing simple commands like "BEGIN"
or "COMMIT", which don't seem to need such glosses. I did add
labels to the commands issued by \sf, \sv and friends.
Also, make the -E and log output for hidden queries say
"INTERNAL QUERY" not just "QUERY", to distinguish them from
user-written queries.
Author: Greg Sabino Mullane <htamfids@gmail.com>
Co-authored-by: David Christensen <david+pg@pgguru.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAKAnmmJz8Hh=8Ru8jgzySPWmLBhnv4=oc_0KRiz-UORJ0Dex+w@mail.gmail.com
AsyncReadBuffer()'s no-IO needed path passed
TRACE_POSTGRESQL_BUFFER_READ_DONE the wrong block number because it had
already incremented operation->nblocks_done. Fix by folding the
nblocks_done offset into the blocknum local variable at initialization.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/u73un3xeljr4fiidzwi4ikcr6vm7oqugn4fo5vqpstjio6anl2%40hph6fvdiiria
Backpatch-through: 18
In a future commit more AIO related tests are due to be introduced. However
001_aio.pl already is fairly large.
This commit introduces a new TestAio package with helpers for writing AIO
related tests. Then it uses the new helpers to simplify the existing
001_aio.pl by iterating over all supported io_methods. This will be
particularly helpful because additional methods already have been submitted.
Additionally this commit splits out testing of initdb using a non-default
method into its own test. While that test is somewhat important, it's fairly
slow and doesn't break that often. For development velocity it's helpful for
001_aio.pl to be faster.
While particularly the latter could benefit from being its own commit, it
seems to introduce more back-and-forth than it's worth.
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/zljergweqti7x67lg5ije2rzjusie37nslsnkjkkby4laqqbfw@3p3zu522yykv
When my commit e222534679 added the
concept of disabled_nodes, it failed to add a disabled_nodes field
to SubPlan. This is a regression: before that commit, when
fix_alternative_subplan compared the costs of two plans, the number
of disabled nodes affected the result, because it was just a
component of the total cost. After that commit, it no longer did,
making it possible for a disabled path to win on cost over one that
is not disabled. Fix that.
As usual for planner fixes that might destabilize plan choices,
no back-patch.
Discussion: https://postgr.es/m/CA+TgmoaK=4w7-qknUo3QhUJ53pXZq=c=KgZmRyD+k7ytqfmgSg@mail.gmail.com
Reviewed-by: Lukas Fittl <lukas@fittl.com>
This dials back a couple of the qualifiers added by commit
7724cb9935. Specifically, in match_boolean_partition_clause() the
call to negate_clause() casts away the const, so we shouldn't make the
input argument const.
Previously, when the startup process applied WAL and requested walreceiver
to send an apply notification to the primary, walreceiver sent a status reply
unconditionally, even if the WAL locations had not advanced since
the previous update.
As a result, the standby could send two consecutive status reply messages
with identical WAL locations even though wal_receiver_status_interval had
not yet elapsed. This could unexpectedly reset the reported replication lag,
making it difficult for users to monitor lag. The second message was also
unnecessary because it reported no progress.
This commit updates walreceiver to send a reply only when the apply location
has advanced since the last status update, even when the startup process
requests a notification.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAOzEurTzcUrEzrH97DD7+Yz=HGPU81kzWQonKZvqBwYhx2G9_A@mail.gmail.com
pg_stat_replication is documented to keep the last measured lag values for
a short time after the standby catches up, and then set them to NULL when
there is no WAL activity. However, previously lag values could become NULL
prematurely even while WAL activity was ongoing, especially in logical
replication.
This happened because the code cleared lag when two consecutive reply messages
indicated that the apply location had caught up with the send location.
It did not verify that the reported positions were unchanged, so lag could be
cleared even when positions had advanced between messages. In logical
replication, where the apply location often quickly catches up, this issue was
more likely to occur.
This commit fixes the issue by clearing lag only when the standby reports that
it has fully replayed WAL (i.e., both flush and apply locations have caught up
with the send location) and the write/flush/apply positions remain unchanged
across two consecutive reply messages.
The second message with unchanged positions typically results from
wal_receiver_status_interval, so lag values are cleared after that interval
when there is no activity. This avoids showing stale lag data while preventing
premature NULL values.
Even with this fix, lag may rarely become NULL during activity if identical
position reports are sent repeatedly. Eliminating such duplicate messages
would address this fully, but that change is considered too invasive for stable
branches and will be handled in master only later.
Backpatch to all supported branches.
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAOzEurTzcUrEzrH97DD7+Yz=HGPU81kzWQonKZvqBwYhx2G9_A@mail.gmail.com
Backpatch-through: 14
The MSVC warning option /w24777 added by commit 2307cfe316 was a
typo, it should have been /w24477. But this option is already enabled
by default in level 1, so we don't need to add it explicitly. So just
remove it.
Compound literals, as used in pg_list.h for list_makeN(), are not a
C++ feature. MSVC doesn't accept these. (GCC and Clang accept them,
but they would warn in -pedantic mode.) Replace with equivalent
inline functions. (These are the only instances of compound literals
used in PostgreSQL header files.)
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/flat/CAGECzQR21OnnKiZO_1rLWO0-16kg1JBxnVq-wymYW0-_1cUNtg%40mail.gmail.com
Reorder the validation checks in replorigin_session_setup() to provide a
more logical flow. This makes the function easier to follow and ensures
that basic state checks are performed consistently.
Additionally, update an error message to align its phrasing with similar
diagnostics in the replication origin subsystem, improving overall
consistency.
Author: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/e0508305-bc6a-417c-b969-36564d632f9e@iki.fi
Commit 497c1170cb added base32hex encoding support, but its
regression test for UUIDs failed on buildfarm members hippopotamus and
jay using natural language locales (such as cs_CZ). This happened
because those collations may sort characters differently, which breaks
the strict byte-wise lexicographical ordering expected by base32hex
encoding.
This commit fixes the regression tests by explicitly using the C
collation.
Per buildfarm members hippopotamus and jay.
Analyzed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/682417.1774482047@sss.pgh.pa.us
Previously, pg_promote() looped a fixed number of times, calculated from
the specified timeout, and waited 100ms on a latch, once per iteration,
for the promotion of a standby to complete. However, unrelated signals
to the backend could set the latch and wake up the backend early,
resulting in a faster consumption of the loops and an execution time of
the function that does not match with the timeout input given in input.
This could be confusing for the function caller, especially if some
backend-side timeout is aggressive, because the function would return
much earlier than expected and report that the promote request has not
completed within the time requested.
This commit refines the logic to track the time actually elapsed, by
looping until the requested duration has truly passed. The code
calculates the end time we expect, then uses it when looping.
Author: Robert Pang <robertpang@google.com>
Reviewed-by: Tiancheng Ge <getiancheng_2012@163.com>
Discussion: https://postgr.es/m/CAJhEC07OK8J7tLUbyiccnuOXRE7UKxBNqD2-pLfeFXa=tBoWtw@mail.gmail.com
The code removed here deleted already-used data from a partially-read
WAL segment's hashtable entry. The intent was evidently to try to
keep the entry's memory consumption below the WAL segment's total
size, but we don't use WAL segments that are so large as to make that
a big win. The important memory-space optimization is to remove
hashtable entries altogether when done with them, and that's handled
elsewhere. To buy that, we must accept a substantially more complex
(and under-documented) logical invariant about what is in entry->buf,
as well as complex and under-documented interactions with the entry
spilling logic, various re-checking code paths in xlogreader.c,
and pg_waldump's overall data processing order. Any of those aspects
could have bugs lurking still, and are quite likely to be prone to
new bugs after future code changes.
Given the number of bugs we've already found in commit b15c15139,
I judge that simplifying anything we possibly can is a good decision.
While here, revise and extend some related comments.
Discussion: https://postgr.es/m/374225.1774459521@sss.pgh.pa.us
Both ArchivedWAL_insert() and ArchivedWAL_delete_item() can cause
existing hashtable entries to move. The code didn't account for this
and could leave privateInfo->cur_file pointing at a dead or incorrect
entry, with hilarity ensuing. Likewise, read_archive_wal_page calls
read_archive_file which could result in movement of the hashtable
entry it is working with.
I believe these bugs explain some odd buildfarm failures, although
the amount of data we use in pg_waldump's TAP tests isn't enough to
trigger them reliably.
This code's all new as of commit b15c15139, so no need for back-patch.
Discussion: https://postgr.es/m/374225.1774459521@sss.pgh.pa.us
TarWALDumpCloseSegment was of the opinion that it didn't need to
do anything. It was mistaken: it has to close the open file if
any, because nothing else will, leading to a descriptor leak.
In addition, we failed to ensure that any file being read by the
XLogReader machinery gets closed before the atexit callback tries to
cleanup the temporary directory holding spilled WAL files. While the
file would have been closed already in case of a success exit, this
doesn't happen in case of pg_fatal() exits. The least messy way
to fix that is to move the atexit function into pg_waldump.c,
where it has easier access to the XLogReaderState pointer and to
WALDumpCloseSegment.
These FD leakages are pretty insignificant on Unix-ish platforms,
but they're a bug on Windows, because they prevent successful cleanup
of the temporary directory for extracted WAL files. (Windows can't
delete a directory that holds a deleted-but-still-open file.)
This is visible in occasional buildfarm failures.
This code's all new as of commit b15c15139, so no need for back-patch.
Discussion: https://postgr.es/m/374225.1774459521@sss.pgh.pa.us
This adds support for base32hex encoding and decoding, as defined in
RFC 4648 Section 7. Unlike standard base32, base32hex uses the
extended hex alphabet (0-9, A-V) which preserves the lexicographical
order of the encoded data.
This is particularly useful for representing UUIDv7 values in a
compact string format while maintaining their time-ordered sort
property.
The encode() function produces output padded with '=', while decode()
accepts both padded and unpadded input. Following the behavior of
other encoding types, decoding is case-insensitive.
Suggested-by: Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au>
Author: Andrey Borodin <x4mmm@yandex-team.ru>
Co-authored-by: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Илья Чердаков <i.cherdakov.pg@gmail.com>
Reviewed-by: Chengxi Sun <chengxisun92@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAJ7c6TOramr1UTLcyB128LWMqita1Y7%3Darq3KHaU%3Dqikf5yKOQ%40mail.gmail.com
Commit 8185bb5347 extended the CREATE/ALTER SUBSCRIPTION and
CREATE/ALTER FOREIGN DATA WRAPPER commands, but missed the
corresponding tab-completion logic. This commit fixes that oversight
by adding completion support for:
- The CONNECTION keyword in CREATE/ALTER FOREIGN DATA WRAPPER.
- The list of foreign servers in CREATE/ALTER SUBSCRIPTION.
Author: Yamaguchi Atsuo <acrobatcoder@gmail.com>
Discussion: https://postgr.es/m/CAKSyusJWdWcUKVd3qJXcEaQxJewGymQWV_r3-mc=Knrqo0AZ_g@mail.gmail.com
This is similar to the standard behavior in GCC. For MSVC, we set all
headers in angle brackets to be considered system headers. (GCC goes
by path, not include style.)
The required option is available since VS 2017. (Before VS 2019
version 16.10, the additional option /experimental:external is
required, but per discussion in [0], we effectively require 16.11, so
this shouldn't be a problem.)
[0]: https://www.postgresql.org/message-id/04ab76a3-186c-4a37-8076-e6882ebf9d43%40eisentraut.org
Then, we can remove one workaround for avoiding a warning from a
system header. (And some warnings to be enabled in the future could
benefit from this.)
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/aa73q1aT0A3/vke/%40ip-10-97-1-34.eu-west-3.compute.internal
This commit introduces a -l (or --logdir) argument to pg_createsubscriber,
allowing users to specify a directory for log files.
When enabled, a timestamped subdirectory is created within the specified
log directory, containing:
pg_createsubscriber_server.log: Captures logs from the standby server
during its start/stop cycles.
pg_createsubscriber_internal.log: Captures the tool's own internal
diagnostic and progress messages.
This ensures that transient server and utility messages are preserved for
troubleshooting after the subscriber creation process completes or errored
out.
Author: Gyan Sreejith <gyan.sreejith@gmail.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAEqnbaUthOQARV1dscGvB_EsqC-YfxiM6rWkVDHc+G+f4oSUHw@mail.gmail.com
Introduce two helpers for CPUID, pg_cpuid and pg_cpuid_subleaf that wrap
the platform specific __get_cpuid/__cpuid and __get_cpuid_count/__cpuidex
functions.
Additionally, use macros to specify registers names (e.g. EAX) for clarity,
instead of numeric integers into the result array.
Author: Lukas Fittl <lukas@fittl.com>
Suggested-By: John Naylor <john.naylor@postgresql.org>
Discussion: https://postgr.es/m/CANWCAZZ+Crjt5za9YmFsURRMDW7M4T2mutDezd_3s1gTLnrzGQ@mail.gmail.com
This test is proving to be unstable in the CI for Windows, at least.
The origin of the issue is that the deadlock_timeout requests may not
be processed, causing the lock stats to not be updated. This could be
mitigated by making the hardcoded sleep longer, however this would cost
in runtime on fast machines. On slow machines, there is no guarantee
that an augmented sleep would be enough.
An isolation test may not be the best method to write this test
(TAP test with injection point with a NOTICE+wait_for_log before
processing the deadlock_timeout request should remove the need of a
sleep). As we are late in the release cycle, I am removing the test for
now to keep the CI and the buildfarm a maximum stable. Let's revisit
this part later.
Discussion: https://postgr.es/m/hlkdrplgrmudbspibsuq6xooxrqxqsgwo6x5b6x5ptvkgjbe7w@xogt6xgua6dz
Constructing a Subcription object uses a number of small or temporary
allocations. Use a per-object memory context for easy cleanup.
Get rid of FreeSubscription() which did not free all the allocations
anyway. Also get rid of the PG_TRY()/PG_CATCH() logic in
ForeignServerConnectionString() which were used to avoid leaks during
GetSubscription().
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Suggested-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/xvdjrdqnpap3uq7owbaox3r7p5gf7sv62aaqf2ju3vb6yglatr%40kvvwhoudrlxq
Discussion: https://postgr.es/m/CAA4eK1K=WjZ1maBCmj=5ZdO66AwPORK5ZBxVKedS0xdCcb621A@mail.gmail.com
There are no remaining users that emit XLOG_HEAP2_VISIBLE records, so it
can be removed. This includes deleting the xl_heap_visible struct and
all functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE
records.
Bumps XLOG_PAGE_MAGIC because we removed a WAL record type.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible and all-frozen in a
XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
This has no real independent benefit, but empty pages were the last user
of XLOG_HEAP2_VISIBLE, so by making this change we can next remove all
of the XLOG_HEAP2_VISIBLE code.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Earlier version Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.
Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications. This reduces WAL volume produced by vacuum.
For now, vacuum is still the only user of heap_page_prune_and_freeze()
allowed to set the VM. On-access pruning is not yet able to set the VM.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Earlier version Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
This function exits early in the case where the number of inner rows
is estimated to be less than 2, on the theory that in that case a
Nested Loop with inner Memoize must lose to a plain Nested Loop.
But since commit 4020b370f2 it's
possible for a plain Nested Loop to be disabled, while a Nested Loop
with inner Memoize is still enabled. In that case, this reasoning
is not valid, so adjust the code not to exit early in that case.
This issue was revealed by a test_plan_advice failure on buildfarm
member skink, where NESTED_LOOP_MEMOIZE() couldn't be enforced on
replanning due to this early exit.
Discussion: http://postgr.es/m/CA+TgmoZUN8FT1Ah=m6Uis5bHa4FUa+_hMDWtcABG17toEfpiUg@mail.gmail.com
During pruning, we keep track of the newest xmin of live tuples on the
page visible to all running and future transactions so that we can use
it later as the snapshot conflict horizon when setting the VM if the
page turns out to be all-visible.
Previously, we stopped updating this value once we determined the page
was not all-visible. However, maintaining it even when the page is not
all-visible is inexpensive and makes the snapshot conflict horizon
calculation clearer. This guarantees it won't contain a stale value.
Since we'll keep it up to date all the time now anyway, there's no
reason not to maintain set_all_visible for on-access pruning. This will
allow us to set the VM on-access in the future.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
During vacuum's first and third phases, we examine tuples' visibility to
determine if we can set the page all-visible in the visibility map.
Previously, this check compared tuple xmins against a single XID chosen
at the start of vacuum (OldestXmin). We now use GlobalVisState, which
enables future work to set the VM during on-access pruning, since
ordinary queries have access to GlobalVisState but not OldestXmin.
This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to become considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.
OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare occurrences where GlobalVisState moves
backwards.
Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. Therefore, we perform the
GlobalVisState check only once per page. This is safe because
visibility_cutoff_xid records the newest live xmin on the page; if it is
globally visible, then the entire page is all-visible.
Using GlobalVisState means on-access pruning can also maintain
visibility_cutoff_xid, which is required to set the visibility map
on-access in the future.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk#c755ef151507aba58471ffaca607e493
The number of .c files that must include access/clog.h can currently be
counted on one's fingers and miss only one (assuming one has the usual
number of hands). However, due to indirect inclusion via proc.h,
there's a lot of files that are pointlessly including it. This is easy
to avoid with the easy trick implemented by this commit.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/202603221856.iwlhitt6dxxx@alvherre.pgsql
astreamer_gzip.c and astreamer_lz4.c left their decompression
output buffers at StringInfo's default allocation, merely 1kB.
This results in a lot of ping-ponging between the decompressor
and the next astreamer filter. This patch increases these buffer
sizes to 256kB. In a simple test this had a small but measurable
effect (saving a few percent) on the overall runtime of pg_waldump
for the gzipped-data case; I didn't bother measuring for lz4.
astreamer_zstd.c used ZSTD_DStreamOutSize() to size its
compression output buffer, but the libzstd API says you should use
ZSTD_CStreamOutSize(); ZSTD_DStreamOutSize() is for decompression.
The two functions seem to produce the same value (256kB) here, so
this is just cosmetic, but nonetheless we should play by the rules.
While these issues are old, they don't seem significant enough to
warrant back-patching.
Discussion: https://postgr.es/m/3424809.1774234940@sss.pgh.pa.us
Instead, always try to fill the allocated buffer completely.
The previous coding apparently intended (though it's undocumented)
to read only small amounts of data until we are able to identify the
WAL segment size and begin filtering out unwanted segments. However
this extra complication has no measurable value according to simple
testing here, and it could easily be a net loss if there is a
substantial amount of non-WAL data in the archive file before the
first WAL file.
Discussion: https://postgr.es/m/3341199.1774221191@sss.pgh.pa.us
Since storage/locktags.h was added by commit 322bab7974, many headers
can be made leaner by depending on that instead of on storage/lock.h,
which has many other dependencies.
(In fact, some of these changes were possible even before that.)
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/abvrRZo52Yx9ZzWQ@ip-10-97-1-34.eu-west-3.compute.internal
check_backtrace_functions() and check_archive_directory() were doing an
empty-string check this way:
*newval[0] == '\0'
which, because of operator precedence, is interpreted as *(newval[0])
instead of (*newval)[0] -- but these variables are pointers to C-strings
and we want to check the first character therein, rather than check the
first pointer of the array, so that interpretation is wrong. This would
be wrong for any index element other than 0, as evidenced by every other
dereference of the same variable in check_backtrace_functions, which use
parentheses.
Add parentheses to make the intended dereference explicit.
This is just cosmetic at this stage, so no backpatch, although it's been
"wrong" for a long time.
Author: Zhang Hu <kongbaik228@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/CAB5m2QssN6UO+ckr6ZCcV0A71mKUB6WdiTw1nHo43v4DTW1Dfg@mail.gmail.com
Previously, XLogFindNextRecord() did not return detailed error information
when it failed to find a valid WAL record. As a result, callers such as
the WAL summarizer, pg_waldump, and pg_walinspect could only report generic
errors (e.g., "could not find a valid record after ..."), making
troubleshooting difficult.
This commit fix the issue by extending XLogFindNextRecord() to return
detailed error information on failure, and updating its callers to include
those details in their error messages.
For example, when pg_waldump is run on a WAL file with an invalid magic number,
it now reports not only the generic error but also the specific cause
(e.g., "invalid magic number").
Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAO6_XqoxJXddcT4wkd9Xd+cD6Sz-fyspRGuV4Bq-wbXG4pVNzA@mail.gmail.com
The second argument to TupleDescAttr should always be at least zero
and less than natts; otherwise, we index outside of the attribute
array. Assert that this is the case.
Various violations, or possible violations, of this rule that are
currently in the tree are actually harmless, because while
we do call TupleDescAttr() before verifying that the argument is
within range, we don't actually dereference it unless the argument
was within range all along. Nonetheless, the Assert means we
should be more careful, so tidy up accordingly.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoacixUZVvi00hOjk_d9B4iYKswWP1gNqQ8Vfray-AcOCA@mail.gmail.com
This adjusts many C functions underlying casts to support soft errors.
This is in preparation for a future feature where conversion errors in
casts can be caught.
This patch covers cast functions that can be adjusted easily by
changing ereport to ereturn or making other light changes. The
underlying helper functions were already changed to support soft
errors some time ago as part of soft error support in type input
functions.
Other casts and types will require some more work and are being kept
as separate patches.
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Amul Sul <sulamul@gmail.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CADkLM%3Dfv1JfY4Ufa-jcwwNbjQixNViskQ8jZu3Tz_p656i_4hQ%40mail.gmail.com
Both of the checks in DefineIndex() that can produce this error
message have a guard against negative attribute numbers, but lack a
guard to ensure that attno is non-zero. As a result, we can index
off the beginning of the TupleDesc and read a garbage byte for
attgenerated. If that byte happens to be 'v', we'll incorrectly
produce the error mentioned above.
The first call site is easy to hit: any attempt to create an
expression index does so. The second one is not currently hit in
the regression tests, but can be hit by something like
CREATE INDEX ON some_table ((some_function(some_table))).
Found by study of a test_plan_advice failure on buildfarm member
skink, though this issue has nothing to do with test_plan_advice
and seems to have only been revealed by happenstance.
Backpatch-through: 18
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoacixUZVvi00hOjk_d9B4iYKswWP1gNqQ8Vfray-AcOCA@mail.gmail.com
The check for a mismatch on the second decoded item pointer
was an exact copy of the first item pointer check, comparing
orig_itemptrs[0] with decoded_itemptrs[0] instead of orig_itemptrs[1]
with decoded_itemptrs[1]. The error message also reported (0, 1) as
the expected value instead of (blk, off). As a result, any decoding
error in the second item pointer (where the varbyte delta encoding
is exercised) would go undetected.
This has been wrong since commit bde7493d1, so backpatch to all
supported versions.
Author: Jianghua Yang <yjhjstz@gmail.com>
Discussion: https://postgr.es/m/CAAZLFmSOD8R7tZjRLZsmpKtJLoqjgawAaM-Pne1j8B_Q2aQK8w@mail.gmail.com
Backpatch-through: 14
The updated comment explains why we use ChangeVarNodes_walker() instead of
expression_tree_walker(), and provides a bit more detail about the differences
in processing top-level Query and subqueries.
Author: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAPpHfdvbjq342WTQ705Wmqhe8794pcp7wospz%2BWUJ2qB7vuOqA%40mail.gmail.com
Backpatch-through: 18
This commit adds a new stats kind, called PGSTAT_KIND_LOCK, implementing
statistics for lock tags, as reported by pg_locks. The implementation
is fixed-sized, as the data is caped based on the number of lock tags in
LockTagType.
The new statistics kind records the following fields, providing insight
regarding lock behavior, while avoiding impact on performance-critical
code paths (such as fast-path lock acquisition):
- waits and wait_time: respectively track the number of times a lock
required waiting and the total time spent acquiring it. These metrics
are only collected once a lock is successfully acquired and after
deadlock_timeout has been exceeded.
fastpath_exceeded: counts how often a lock could not be acquired via
the fast path due to the max_locks_per_transaction slot limits.
A new view called pg_stat_lock can be used to access this data, coupled
with a SQL function called pg_stat_get_lock().
Bump stat file format PGSTAT_FILE_FORMAT_ID.
Bump catalog version.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aIyNxBWFCybgBZBS%40ip-10-97-1-34.eu-west-3.compute.internal
This change will simplify an upcoming change that will introduce lock
statistics, reducting code churn.
This commit means that we begin to calculate the time it took to acquire
a lock after the deadlock check interrupt has run should log_lock_waits
be off, when taken in isolation. This is not a performance-critical
code path, and note that log_lock_waits is enabled by default since
2aac62be8c.
Extracted from a larger patch by the same author.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aIyNxBWFCybgBZBS@ip-10-97-1-34.eu-west-3.compute.internal
This commit makes our implementation of SASLprep() compliant with RFC
3454 (Stringprep) and RFC 4013 (SASLprep). Originally, as introduced in
60f11b87a2, the operation considered a password made of only ASCII
characters as valid, performing an optimization for this case to skip
the internal NFKC transformation.
However, the RFCs listed above use a different definition, with the
following characters being prohibited:
- 0x00~0x1F (0~31), control characters.
- 0x7F (127, DEL).
In its SCRAM protocol, Postgres has the idea to apply a password as-is
if SASLprep() is not a success, so this change is safe on
backward-compatibility grounds:
- A libpq client with the compliant SASLprep can connect to a server
with a non-compliant SASLprep.
- A libpq client with the non-compliant SASLprep can connect to a server
with a compliant SASLprep.
This commit removes the all-ASCII optimization used in pg_saslprep() and
applies SASLprep even if a password is made only of ASCII characters,
making the operation compatible with the RFC. All the in-core callers
of pg_saslprep() do that:
- pg_be_scram_build_secret() in auth-scram.c, when generating a
SCRAM verifier for rolpassword in the backend.
- scram_init() in fe-auth-scram.c, when starting the SASL exchange.
- pg_fe_scram_build_secret() in fe-auth-scram.c, when generating a SCRAM
verifier for the frontend with libpq, to generate it for a ALTER/CREATE
ROLE command for example.
The test module test_saslprep shows the difference this change is
leading to.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/aaEJ-El2seZHeFcG@paquier.xyz
Our RHEL7-vintage buildfarm animals are complaining about
"the comparison will always evaluate as true" for a usage of
SOFT_ERROR_OCCURRED() on a local variable. This is the same
issue addressed in 7bc88c3d6 and some earlier commits, so solve
it the same way: write "escontext.error_occurred" instead.
Problem dates to recent commit a0b6ef29a, no need for back-patch.
My attention was drawn to this new documentation by overlength-line
complaints in the PDF docs builds: the synopsis for hostname lines was
too wide. I initially thought of shortening the parameter names to
fit, but it turns out that adding <optional> markup is enough to
persuade DocBook to break the line, and that seems more helpful
anyway.
While here, I couldn't resist some copy-editing, mostly being
consistent about whether to use Oxford commas or not. The biggest
change was to re-order the entries in the hostname-values table to
match the running text.
IMO the proximate cause of the bug fixed in commit 07b7a964d
was sloppy thinking about what ChangeVarNodesWalkExpression()
is to be used for. Flesh out its header comment to try to
improve that situation.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1607553.1774017006@sss.pgh.pa.us
Backpatch-through: 18
When the value of pg_aios.pid is found to be 0, the function had the
idea to set "nulls" to "false" instead of "true", without setting the
value stored in the tuplestore. This could lead to the display of buggy
data. The intention of the code is clearly to display NULL when a PID
of 0 is found, and this commit adjusts the logic to do so.
Issue introduced by 60f566b4f2.
Author: ChangAo Chen <cca5507@qq.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/tencent_7D61A85D6143AD57CA8D8C00DEC541869D06@qq.com
Backpatch-through: 18
headerscheck in C++ mode (cpluspluscheck) previously hardcoded
CXXFLAGS and documented that you might need to override them manually
from the environment. Now that we have better C++ support in the
build system, we can just get CXXFLAGS from Makefile.global, like we
do for other variables.
Furthermore, this is necessary in some configurations to make
cpluspluscheck work under meson, because under meson, some -I options
end up in CXXFLAGS where under make they would be in CPPFLAGS.
Therefore, getting the correct CXXFLAGS is required in those cases.
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAMSWrt-PoQt4sHryWrB1ViuGBJF_PpbjoSGrWR2Ry47bHNLDqg%40mail.gmail.com
Replace generic pg_log_* calls with report_createsub_log() and
report_createsub_fatal(). This refactor provides the necessary
infrastructure to support logging to external files via the -l option.
These new functions enable the utility to route messages to both the
terminal and a log file based on the logging configuration and verbosity
levels provided by the user.
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Author: Gyan Sreejith <gyan.sreejith@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAEqnbaUthOQARV1dscGvB_EsqC-YfxiM6rWkVDHc+G+f4oSUHw@mail.gmail.com
The gzip basebackup sink called deflateInit2() in begin_archive() but
never called deflateEnd(), leaking zlib's internal compression state
(~256KB per archive) until the memory context of the base backup is
destroyed.
The code tree has already a matching deflateEnd() call for each
deflateInit[2]() call (pgrypto, etc.), except for the file touched in
this commit, so this brings more consistency for all the compression
methods. The server-side LZ4 and zstd implementations require a
dedicated cleanup callback as they allocate their state outside the
context of a palloc().
As currently used, deflateInit2() is called once per tablespace in a
single backup. Memory would slightly bloat only when dealing with many
tablespaces at once, not across multiple base backups so this is not
worth a backpatch. This change could matter for future uses of this
code.
zlib allows the definition of memory allocation and free callbacks in
the z_stream object given to a deflateInit[2](). The base backup
backend code relies on palloc() for the allocations and deflateEnd()
internally only cleans up memory (no fd allocation for example).
Author: Jianghua Yang <yjhjstz@gmail.com>
Discussion: https://postgr.es/m/CAAZLFmQNJ0QNArpWEOZXwv=vbumcWKEHz-b1me5gBqRqG67EwQ@mail.gmail.com
While re-reading 860359ea0, I noticed another problem: when
spilling to a temp file, it did not bother to check the result
of fclose(). This is bad since write errors (like ENOSPC)
may not be reported until close time.
1. archive_waldump.c called astreamer_finalize() nowhere. This meant
that any data retained in decompression buffers at the moment we
detect archive EOF would never reach astreamer_waldump_content(),
resulting in surprising failures if we actually need the last few
bytes of the archive file.
To fix that, make read_archive_file() do the finalize once it detects
EOF. Change its API to return a boolean "yes there's more data"
rather than the entirely-misleading raw count of bytes read.
2. init_archive_reader() relied on privateInfo->cur_file to track
which WAL segment was being read, but cur_file can become NULL if a
member trailer is processed during a read_archive_file() call. This
could cause unreproducible "could not find WAL in archive" failures,
particularly with compressed archives where all the WAL data fits in
a small number of compressed bytes.
Fix by scanning the hash table after each read to find any cached
WAL segment with sufficient data, instead of depending on cur_file.
Also reduce the minimum data requirement from XLOG_BLCKSZ to
sizeof(XLogLongPageHeaderData), since we only need the long page
header to extract the segment size.
We likewise need to fix init_archive_reader() to scan the whole
hash table for irrelevant entries, since we might have already
loaded more than one entry when the data is compressible enough.
3. get_archive_wal_entry() relied on tracking cur_file to identify
WAL hash table entries that need to be spilled to disk. However,
this can't work for entries that are read completely within a
single read_archive_file call: the caller will never see cur_file
pointing at such an entry. Instead, scan the WAL hash table to
find entries we should spill. This also fixes a buglet that any
hash table entries completely loaded during init_archive_reader
were never considered for spilling.
Also, simplify the logic tremendously by not attempting to spill
entries that haven't been read fully. I am not convinced that the old
logic handled that correctly in every path, and it's really not worth
the complication and risk of bugs to try to spill entries on the fly.
We can just write them in a single go once they are no longer the
cur_file.
4. Fix a rather critical performance problem: the code thought that
resetStringInfo() will reclaim storage, but it doesn't. So by the
end of the run we'd have consumed storage space equal to the total
amount of WAL read, negating all the effort of the spill logic.
Also document the contract that cur_file can change (or become NULL)
during a single read_archive_file() call, since the decompression
pipeline may produce enough output to trigger multiple astreamer
callbacks.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Co-authored-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/2178517.1774064942@sss.pgh.pa.us
The ASTREAMER_ARCHIVE_TRAILER case in astreamer_tar_parser_content()
intended to reject tar files whose trailer exceeded 2 blocks. However,
the check compared 'len' after astreamer_buffer_bytes() had already
consumed all the data and set len to 0, so the pg_fatal() could never
fire.
Moreover, per the POSIX specification for the ustar format, the last
physical block of a tar archive is always full-sized, and "logical
records after the two zero logical records may contain undefined data."
GNU tar, for example, zero-pads its output to a 10kB boundary by
default. So rejecting extra data after the two zero blocks would be
wrong even if the check worked. (But if the check had worked, it
would have alerted us to the bug just fixed in 9aa1fcc54.)
Remove the dead check and update the comment to explain why trailing
data is expected and harmless.
Per report from Tom Lane.
Author: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2178517.1774064942@sss.pgh.pa.us
Send the correct amount of data to the next astreamer, not the
whole allocated buffer size. This bug escaped detection because
in present uses the next astreamer is always a tar-file parser
which is insensitive to trailing garbage. But that may not
be true in future uses.
Author: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2178517.1774064942@sss.pgh.pa.us
Backpatch-through: 15
Use fake LSNs in all hash AM critical sections that write a WAL record.
This gives us a reliable way (a way that works during scans of both
logged and unlogged relations) to detect when an index page was
concurrently modified during the window between when the page is
initially read (by _hash_readpage) and when the page has any known-dead
items LP_DEAD-marked (by _hash_kill_items).
Preparation for an upcoming patch that makes the hash index AM use the
amgetbatch interface, enabling I/O prefetching during hash index scans.
The amgetbatch design imposes certain rules on index AMs with respect to
how they hold on to index page buffer pins (at least in the case of pins
held as an interlock against unsafe concurrent TID recycling by VACUUM).
These rules have consequences for routines that set LP_DEAD bits on
index tuples from an amgetbatch index AM: such routines have an inherent
need to reason about concurrent TID recycling by VACUUM, but can no
longer rely on their amgettuple routine holding on to a buffer pin
(during the aforementioned window) as an interlock against such
recycling. Instead, they have to follow a new, standardized approach.
The new approach taken by amgetbatch index AMs when setting LP_DEAD bits
is heavily based on the current nbtree dropPin design, which was added
by commit 2ed5b87f. It also works by checking if the page's LSN
advanced during the window where unsafe concurrent TID recycling might
have taken place.
This commit is similar to commit 8a879119, which taught nbtree to use
fake LSNs to improve its dropPin behavior. However, unlike that commit,
this is not an independently useful enhancement, since hash doesn't
implement anything like nbtree's dropPin behavior (not yet).
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-WzkehuhxyuA8quc7rRN3EtNXpiKsjPfO8mhb+0Dr2K0Dtg@mail.gmail.com
Because of the SKIP_PAGES_THRESHOLD optimization or a stale prune XID,
heap_page_prune_and_freeze() can be invoked for pages with no pruning or
freezing work to do. To avoid this, if a page is already all-frozen or
it is all-visible and no freezing will be attempted, exit early. We
can't exit early if vacuum passed DISABLE_PAGE_SKIPPING, though.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
Change the IndexScanInstrumentation fields in IndexScanState,
IndexOnlyScanState, and BitmapIndexScanState from inline structs to
pointers. This avoids additional space overhead whenever new fields are
added to IndexScanInstrumentation in the future, at least in the common
case where the instrumentation isn't used (i.e. when the executor node
isn't being run through an EXPLAIN ANALYZE).
Preparation for an upcoming patch series that will add index
prefetching. The new slot-based interface that will enable index
prefetching necessitates that we add at least one more field to
IndexScanInstrumentation (to count heap fetches during index-only
scans).
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-Wz=g=JTSyDB4UtB5su2ZcvsS7VbP+ZMvvaG6ABoCb+s8Lw@mail.gmail.com
Move VM corruption detection and repair into heap page pruning. This
allows VM repair during on-access pruning, not only during vacuum.
Also, expand corruption detection to cover pages marked all-visible that
contain dead tuples and tuples inserted or deleted by in-progress
transactions, rather than only all-visible pages with LP_DEAD items.
Pinning the correct VM page before on-access pruning is cheap when
compared to the cost of actually pruning. The vmbuffer is saved in the
scan descriptor, so a query should only need to pin each VM page once,
and a single VM page covers a large number of heap pages.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
_hash_vacuum_one_page() in hashinsert.c is a routine related to hash
indexes that can perform a single-page VACUUM when dead tuples are
detected during index insertion. This routine previously had no test
coverage, and this commit adds a test case for that purpose.
To safely create dead tuples in a way that works with parallel tests,
this uses a technique based on a rollbacked INSERT, following a
suggestion by Heikki Linnakangas.
Author: Alexander Kuzmenkov <akuzmenkov@tigerdata.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CALzhyqxrc1ZHYmf5V8NE+yMboqVg7xZrQM7K2c7VS0p1v8z42w@mail.gmail.com
This commit moves all the declarations related to locktags from lock.h
to a new header called locktag.h. This header is useful so as code
paths that care about locktags but not the lock hashtable can know about
these without having to include lock.h and all its set of dependencies.
This move includes the basic locktag structures and the set of macros to
fill in the locktag fields before attempting to acquire a lock.
Based on a suggestion from me, suggestion done while discussing a
different feature.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/abufUya2oK-_PJ3E@paquier.xyz
Previously, we always fed SELECT ... INTO to the SPI machinery.
While that works for all cases, it's a great deal slower than
the otherwise-equivalent "var := expression" if the expression
is "simple" and the INTO target is a single variable. Users
coming from MSSQL or T_SQL are likely to be surprised by this;
they are used to writing SELECT ... INTO since there is no
"var := expression" syntax in those dialects. Hence, check for
a simple expression and use the faster code path if possible.
(Here, "simple" means whatever exec_is_simple_query accepts,
which basically means "SELECT scalar-expression" without any
input tables, aggregates, qual clauses, etc.)
This optimization is not entirely transparent. Notably, one of
the reasons it's faster is that the hooks that pg_stat_statements
uses aren't called in this path, so that the evaluated expression
no longer appears in pg_stat_statements output as it did before.
There may be some other minor behavioral changes too, although
I tried hard to make error reporting look the same. Hopefully,
none of them are significant enough to not be acceptable as
routine changes in a PG major version.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://postgr.es/m/CAFj8pRDieSQOPDHD_svvR75875uRejS9cN87FoAC3iXMXS1saQ@mail.gmail.com
Now that pg_waldump supports reading WAL from tar archives, remove the
restriction that forced --no-parse-wal for tar-format backups.
pg_verifybackup now automatically locates the WAL archive: it looks for
a separate pg_wal.tar first, then falls back to the main base.tar. A
new --wal-path option (replacing the old --wal-directory, which is kept
as a silent alias) accepts either a directory or a tar archive path.
The default WAL directory preparation is deferred until the backup
format is known, since tar-format backups resolve the WAL path
differently from plain-format ones.
Author: Amul Sul <sulamul@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
pg_waldump can now accept the path to a tar archive (optionally
compressed with gzip, lz4, or zstd) containing WAL files and decode
them. This was added primarily for pg_verifybackup, which previously
had to skip WAL parsing for tar-format backups.
The implementation uses the existing archive streamer infrastructure
with a hash table to track WAL segments read from the archive. If WAL
files within the archive are not in sequential order, out-of-order
segments are written to a temporary directory (created via mkdtemp under
$TMPDIR or the archive's directory) and read back when needed. An
atexit callback ensures the temporary directory is cleaned up.
The --follow option is not supported when reading from a tar archive.
Author: Amul Sul <sulamul@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
Several refactoring steps in preparation for adding tar archive WAL
decoding support to pg_waldump:
- Move XLogDumpPrivate and related declarations into a new pg_waldump.h
header, allowing a second source file to share them.
- Factor out required_read_len() so the read-size calculation can be
reused for both regular WAL files and tar-archived WAL.
- Move the WAL segment size variable into XLogDumpPrivate and rename it
to segsize, making it accessible to the archive streamer code.
Author: Amul Sul <sulamul@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
Consolidate tar archive identification and compression-type detection
logic into a shared location. Currently used by pg_basebackup and
pg_verifybackup, this functionality is also required for upcoming
pg_waldump enhancements.
This change promotes code reuse and simplifies maintenance across
frontend tools.
Author: Amul Sul <sulamul@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
These warning limits were last changed to 40M by commit cd5e82256d.
For the benefit of workloads that rapidly consume transactions or
multixacts, this commit bumps the limits to 100M. This will
hopefully give users enough time to react.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Discussion: https://postgr.es/m/aRdhSSFb9zZH_0zc%40nathan
This commit adds DETAIL messages to the existing wraparound
WARNINGs that include the percentage of transaction/multixact IDs
that remain available for use. The hope is that this more clearly
expresses the urgency of the situation.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Discussion: https://postgr.es/m/aRdhSSFb9zZH_0zc%40nathan
genericcostestimate() estimates the number of index leaf pages to
be visited as a pro-rata fraction of the total number of leaf pages.
Or at least that was the intention. What it actually used in the
calculation was the total number of index pages, so that non-leaf
pages were also counted. In a decent-sized index the error is
probably small, since we expect upper page fanout to be high.
But in a small index that's not true; in the worst case with one
data-bearing page plus a metapage, we had 100% relative error.
This led to surprising planning choices such as not using a small
partial index.
To fix, ask genericcostestimate's caller to supply an estimate of
the number of non-leaf pages, and subtract that. For the built-in
index AMs, it seems sufficient to count the index metapage (if the
AM uses one) as non-leaf. Per the above argument, counting upper
index pages shouldn't change the estimate much, and in most cases
we don't have any easy way of estimating the number of upper pages.
This might be an area for further research in future.
Any external genericcostestimate callers that do not set the new field
GenericCosts.numNonLeafPages will see the same behavior as before,
assuming they followed the advice to zero out that whole struct.
Unsurprisingly, this change affects a number of plans seen in the
core regression tests. I hacked up the existing tests to keep the
tests' plans the same, since in each case it appeared that the
test's intent was to test exactly that plan. Also add one new
test case demonstrating that a better index choice is now made.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Henson Choi <assam258@gmail.com>
Discussion: https://postgr.es/m/870521.1745860752@sss.pgh.pa.us
Self-join removal failed to update Var nodes when the join clause was a
bare Var (e.g., ON t1.bool_col) rather than an expression containing
Vars. ChangeVarNodesWalkExpression() used expression_tree_walker(),
which descends into child nodes but does not process the top-level node
itself. When a bare Var referencing the removed relation appeared as
the clause, its varno was left unchanged, leading to "no relation entry
for relid N" errors.
Fix by calling ChangeVarNodes_walker() directly instead of
expression_tree_walker(), so the top-level node is also processed.
Bug: #19435
Reported-by: Hang Ammmkilo <ammmkilo@163.com>
Author: Andrei Lepikhov <lepihov@gmail.com>
Co-authored-by: Tender Wang <tndrwang@gmail.com>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/19435-3cc1a87f291129f1%40postgresql.org
Backpatch-through: 18
... otherwise, the function invoked by the hook might consult the
catalog and not see that the new constraint exists.
This relies on set_attnotnull doing CommandCounterIncrement()
after successfully modifying the catalog.
Oversight in commit 14e87ffa5c.
Author: Artur Zakirov <zaartur@gmail.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/CAKNkYnxUPCJk-3Xe0A3rmCC8B8V8kqVJbYMVN6ySGpjs_qd7dQ@mail.gmail.com
Since this runs the main regression tests, it can take some time to
complete. Therefore, it's better to start it earlier, as we also do
for the main regression test suite.
Author: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: http://postgr.es/m/1095d3fe-a6eb-4d83-866e-649d6f369908@gmail.com
This adds the force_array option, which is available exclusively
when using COPY TO with the JSON format.
When enabled, this option wraps the output in a top-level JSON array
(enclosed in square brackets with comma-separated elements), making the
entire result a valid single JSON value. Without this option, the
default behavior is to output a stream of independent JSON objects.
Attempting to use this option with COPY FROM or with formats other than
JSON will raise an error.
Author: Joe Conway <mail@joeconway.com>
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Florents Tselai <florents.tselai@gmail.com>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
Discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
This introduces the JSON format option for the COPY TO command, allowing
users to export query results or table data directly as a stream of JSON
objects (one per line, NDJSON style).
The JSON format is currently supported only for COPY TO operations; it
is not available for COPY FROM.
JSON format is incompatible with some standard text/CSV formatting
options, including HEADER, DEFAULT, NULL, DELIMITER, FORCE QUOTE,
FORCE NOT NULL, and FORCE NULL.
Column list support is included: when a column list is specified, only
the named columns are emitted in each JSON object.
Regression tests covering valid JSON exports and error handling for
incompatible options have been added to src/test/regress/sql/copy.sql.
Author: Joe Conway <mail@joeconway.com>
Author: jian he <jian.universality@gmail.com>
Co-Authored-By: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Andrey M. Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Davin Shearer <davin@apache.org>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
Discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
Currently, the COPY command format is determined by two boolean fields
(binary, csv_mode) in CopyFormatOptions. This approach, while
functional, isn't ideal for implementing other formats in the future.
To simplify adding new formats, introduce a CopyFormat enum. This makes
the code cleaner and more maintainable, allowing for easier integration
of additional formats down the line.
Author: Joel Jacobson <joel@compiler.org>
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
Discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
This test leaves behind the roles and users it creates.
002_pg_upgrade test dumps and restore the regression when
PG_TEST_EXTRA contains regress_dump_restore. The global objects such
as users and roles are not dumped by pg_dump. But it still dumps the
policies associated with users, and commands to set the ownership.
Restoring these policies and the ownerships fails since the users and
roles do not exist. To fix this failure we could use --no-owner, but
it does not exclude the policy objects associated with users. Hence
drop the users, roles and policies that depend upon them at the end of
the test.
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reported-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/a855795d-e697-4fa5-8698-d20122126567@eisentraut.org
Following commit fd366065e0, which added EXCEPT TABLE support to
CREATE PUBLICATION, this commit extends ALTER PUBLICATION to allow
modifying the exclusion list.
New Syntax:
ALTER PUBLICATION name SET publication_all_object [, ... ]
where publication_all_object is one of:
ALL TABLES [ EXCEPT TABLE ( except_table_object [, ... ] ) ]
ALL SEQUENCES
If the EXCEPT clause is provided, the existing exclusion list in
pg_publication_rel is replaced with the specified relations. If the
EXCEPT clause is omitted, any existing exclusions for the publication
are cleared. Similarly, SET ALL SEQUENCES updates
Note that because this is a SET command, specifying only one object
type (e.g., SET ALL SEQUENCES) will reset the other unspecified flags
(e.g., setting puballtables to false).
Consistent with CREATE PUBLICATION, only root partitioned tables or
standard tables can be specified in the EXCEPT list. Specifying a
partition child will result in an error.
Author: vignesh C <vignesh21@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Discussion: https://postgr.es/m/CALDaNm3=JrucjhiiwsYQw5-PGtBHFONa6F7hhWCXMsGvh=tamA@mail.gmail.com
In c456e3911, I mistakenly thought that the deformer code would never
see cstrings and that I could use pg_assume() to have the compiler omit
producing code for attlen == -2 attributes. That saves bloating the
deforming code a bit with the extra check and strlen() call. While this
is ok to do for tuples from the heap, it's not ok to do for
MinimalTuples as those *can* contain cstrings and
tts_minimal_getsomeattrs() implements deforming by inlining the
(slightly misleadingly named) slot_deform_heap_tuple() code.
To fix, add a new parameter to the slot_deform_heap_tuple() and have the
callers define which code to inline. Because this new parameter is
passed as a const, the compiler can choose to emit or not emit the
cstring-related code based on the parameter's value.
Author: David Rowley <dgrowleyml@gmail.com>
Reported-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAHewXNmSK+gKziAt_WvQoMVWt3_LRVMmRYY9dAbMPMcpPV0QmA@mail.gmail.com
This commit adds both the number of parallel workers planned and the
number of parallel workers actually launched to the output of
VACUUM (VERBOSE) and autovacuum logs.
Previously, this information was only reported as an INFO message
during VACUUM (VERBOSE), which meant it was not included in autovacuum
logs in practice. Although autovacuum does not yet support parallel
vacuum, a subsequent patch will enable it and utilize these logs in
its regression tests. This change also improves observability by
making it easier to verify if parallel vacuum is utilizing the
expected number of workers.
Author: Daniil Davydov <3danissimo@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CACG=ezZOrNsuLoETLD1gAswZMuH2nGGq7Ogcc0QOE5hhWaw=cw@mail.gmail.com
This enables the use of functions such as encode() and decode() with
UUID values, allowing them to be converted to and from alternative
formats like base64 or hex.
The cast maps the 16-byte internal representation of a UUID directly
to a bytea datum. This is more efficient than going through a text
forepresentation.
Bump catalog version.
Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Co-authored-by: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://postgr.es/m/CAJ7c6TOramr1UTLcyB128LWMqita1Y7%3Darq3KHaU%3Dqikf5yKOQ%40mail.gmail.com
In a plain join, we can just summarily discard an input tuple
with null join key(s), since it cannot match anything from
the other side of the join (assuming a strict join operator).
However, if the tuple comes from the outer side of an outer join
then we have to emit it with null-extension of the other side.
Up to now, hash joins did that by inserting the tuple into the hash
table as though it were a normal tuple. This is unnecessarily
inefficient though, since the required processing is far simpler than
for a potentially-matchable tuple. Worse, if there are a lot of such
tuples they will bloat the hash bucket they go into, possibly causing
useless repeated attempts to split that bucket or increase the number
of batches. We have a report of a large join vainly creating many
thousands of batches when faced with such input.
This patch improves the situation by keeping such tuples out of the
hash table altogether, instead pushing them into a separate tuplestore
from which we return them later. (One might consider trying to return
them immediately; but that would require substantial refactoring, and
it doesn't work anyway for cases where we rescan an unmodified hash
table.) This works even in parallel hash joins, because whichever
worker reads a null-keyed tuple can just return it; there's no need
for consultation with other workers. Thus the tuplestores are local
storage even in a parallel join.
A pre-existing buglet that I noticed while analyzing the code's
behavior is that ExecHashRemoveNextSkewBucket fails to decrement
hashtable->skewTuples for tuples moved into the main hash table
from the skew hash table. This invalidates ExecHashTableInsert's
calculation of the number of main-hash-table tuples, though probably
not by a lot since we expect the skew table to be small relative
to the main one. Nonetheless, let's fix that too while we're here.
Bug: #18909
Reported-by: Sergey Koposov <Sergey.Koposov@ed.ac.uk>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/3061845.1746486714@sss.pgh.pa.us
pg_dump's compression modules had variations on the theme of
fp = fdopen(dup(fd), mode);
if (fp == NULL)
// fail, reporting errno
which is problematic for two reasons. First, if dup() succeeds but
fdopen() fails, we'd leak the duplicated FD. That's not important
at present since the program will just exit immediately after failure
anyway; but perhaps someday we'll try to continue, making the resource
leak potentially significant. Second, if dup() fails then fdopen()
will overwrite the useful errno (perhaps EMFILE) with a misleading
value EBADF, making it difficult to understand what went wrong.
Fix both issues by testing for dup() failure before proceeding to
the next call.
These failures are sufficiently unlikely, and the consequences minor
enough, that this doesn't seem worth the effort to back-patch.
But let's fix it in HEAD.
Author: Jianghua Yang <yjhjstz@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/62bbe34d-2315-4b42-b768-56d901aa83e1@gmail.com
Except for GRANT and REVOKE on roles, the GRANTED BY clause
currently only accepts the current role to match the SQL standard.
And even if an acceptable grantor (i.e., the current role) is
specified, Postgres ignores it and chooses the "best" grantor for
the command. Allowing the user to select a specific grantor would
allow better control over the precise behavior of GRANT/REVOKE
statements. This commit adds that ability. For consistency with
select_best_grantor(), we only permit choosing grantor roles for
which the current role inherits privileges.
Author: Nathan Bossart <nathandbossart@gmail.com>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/aRYLkTpazxKhnS_w%40nathan
Introduce dshash_find_or_insert_extended, which is just like
dshash_find_or_insert except that it takes a flags argument.
Currently, the only supported flag is DSHASH_INSERT_NO_OOM, but
I have chosen to use an integer rather than a boolean in case we
end up with more flags in the future.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoaJwUukUZGu7_yL74oMTQQz2=zqucMhF9+9xBmSC5us1w@mail.gmail.com
This patch reimplements JsonValueList to be more space-efficient
and arranges for temporary JsonValueLists created during jsonpath
evaluation to be freed when no longer needed, rather than being
leaked till the end of the function evaluation cycle as before.
The motivation is to prevent indefinite memory bloat while
evaluating jsonpath expressions that traverse a lot of data.
As an example, this query
SELECT
jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,10000) i),
'$[*] ? (@ < $)');
formerly required about 6GB to execute, with the space required
growing quadratically with the length of the input array.
With this patch the memory consumption stays static. (The time
required is still quadratic, but we can't do much about that: this
path expression asks to compare each array element to each other one.)
The bloat happens because we construct a JsonValueList containing all
the array elements to represent the second occurrence of "$", and then
just leak it after evaluating the filter expression for any one value
generated from "$[*]". If I were implementing this functionality from
scratch I'd probably try to avoid materializing that representation at
all, but changing that now looks like more trouble than it's worth.
This patch takes the more conservative approach of just making sure
we free the list after we're done with it.
The existing representation of JsonValueList is neither especially
compact nor especially easy to free: it's a List containing pointers
to separately-palloc'd JsonbValue structs. We could theoretically
use list_free_deep, but it's not 100% clear that all the JsonbValues
are always safe for us to free. In any case we are talking about a
lot of palloc/pfree traffic if we keep it like this. This patch
replaces that with what's essentially an expansible array of
JsonbValues, so that even a long list requires relatively few
palloc requests. Also, for the very common case that only one or
two elements appear in the list, this representation uses *zero*
pallocs: the elements can be kept in the on-the-stack base struct.
Note that we are only interested in freeing the JsonbValue structs
themselves. While many types of JsonbValue include pointers to
external data such as strings or numerics, we expect that that data
is part of the original jsonb input Datum(s) and need not (indeed
cannot) be freed here.
In this reimplementation, JsonValueListAppend() always copies the
supplied JsonbValue struct into the JsonValueList data. This allows
simplifying and regularizing many call sites that sometimes palloc'd
JsonbValues and sometimes passed a local-variable JsonbValue. Always
doing the latter is simpler, faster, and less bug-prone.
I also removed JsonValueListLength() in favor of constant-time tests
for whether the list has zero, one, or more than one member, which is
what the callers really need to know. JsonValueListLength() was not
a hot code path, so this aspect of the patch won't move the needle in
the least performance-wise. But it seems neater.
I've not done any wide-ranging performance testing, but this should
be faster than the old code thanks to reduction of palloc overhead.
On the specific example shown above, it's about twice as fast as
before on not-very-large inputs; and of course it wins big if you
consider an input large enough to drive the old code into swapping.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/569394.1773783211@sss.pgh.pa.us
The recent commit to change copyObject() to use typeof_unqual allows
cleaning up some APIs to take advantage of this improved qualifier
handling. EventTriggerCollectSimpleCommand() is a good example: It
takes a node tree and makes a copy that it keeps around for its
internal purposes, but it can't communicate via its function signature
that it promises not scribble on the passed node tree. That is now
fixed.
Reviewed-by: David Geier <geidav.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/92f9750f-c7f6-42d8-9a4a-85a3cbe808f3%40eisentraut.org
ScalarArrayOpExpr used for either NOT IN or <>/= ALL, when the array
contains a NULL constant, will never evaluate to true. Here we add an
explicit short-circuit in scalararraysel() to account for this and return
0.0 rows when we see that a NULL exists. When the array is a constant,
we can very quickly see if there are any NULL values and return early
before going to much effort in scalararraysel(). For non-const arrays,
we short-circuit after finding the first NULL and forego selectivity
estimations of any remaining elements.
In the future, it might be better to do something for this case in
constant folding. We would need to be careful to only do this for
strict operators on expressions located in places that don't care about
distinguishing false from NULL returns. i.e. EXPRKIND_QUAL expressions.
Doing that requires a bit more thought and effort, so here we just fix
some needlessly slow selectivity estimations for ScalarArrayOpExpr
containing many array elements and at least one NULL.
Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Reviewed-by: David Geier <geidav.pg@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/eaa2598c-5356-4e1e-9ec3-5fd6eb1cd704@tantorlabs.com
This module includes two functions:
- test_saslprep(), that performs pg_saslprep on a bytea.
- test_saslprep_ranges(), able to check for all valid ranges of UTF-8
codepoints pg_saslprep() handles each one of them.
This provides a detailed coverage of our implementation of SASLprep()
used for SCRAM, with:
- ASCII characters.
- Incomplete UTF-8 sequences, for 390b3cbbb2 (later backpatched).
- A more advanced check for all the valid UTF-8 ranges of codepoints, to
check for cases where these generate an empty password, based on an
original suggestion from Heikki Linnakangas. This part consumes
resources and time, so it is implemented as a TAP test under a
new PG_TEST_EXTRA value.
A different patch is still under discussion to tweak our internal
SASLprep() implementation, and this module can be used to track any
changes in behavior.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/aaEJ-El2seZHeFcG@paquier.xyz
widowbird has failed again after af8837a10b, with the same symptoms of
a backend still lying around when attempting a database rename with a
bgworker connected to the database being renamed.
We are still not sure yet how the failure can be reached, if this is a
timing issue in the test or an actual bug in the logic used for
interruptible bgworkers. This commit adds more debugging information in
the backend to help with the analysis as a temporary measure.
Another thing I have noticed is that the queries launching the dynamic
bgworkers or checking pg_stat_activity would connect to the database
renamed. These are switched to use 'postgres'. That will hopefully
remove some of the friction of the test, but I doubt that this is the
end of the story.
Discussion: https://postgr.es/m/abtJLEAsf1HZXWdR@paquier.xyz
The second half of this file is meant to test feedback, not
generated advice, and is meant to use the statements that it
prepares, not leftover prepared statements from earlier in the
file.
These mistakes resulted in failures under debug_discard_caches = 1,
because re-executing pt2 instead of executing pt4 for the first
time resulted in different output depending on whether the query
was replanned.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us> (per BF member avocet)
SSL password command reloading must be enabled on Windows and in
EXEC_BACKEND builds due to them always reloading the context. The
new tests in commit 4f433025 skipped under Windows but missed the
EXEC_BACKEND check. Reported by buildfarm member culicidae.
Author: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAOYmi+kXmCCgBWffzmSjaNhME5rD=gjyc_OP1FeWQTw2MmSNjg@mail.gmail.com
When headerscheck compiles ecpglib_extern.h, POSTGRES_ECPG_INTERNAL is
not defined, causing sqlca.h to expand "sqlca" as a macro
(*ECPGget_sqlca()). This causes the ecpg_init_sqlca() declaration to
trigger a -Wstrict-prototypes warning.
Fix by renaming the parameter from "sqlca" to "sqlca_p" in both the
declaration and definition, avoiding the macro expansion.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reported-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAN55FZ1VDwJ-ZD092ChYf%2B%2BhuP%2B-S3Cg45tJ8jNH5wx2c4BHAg%40mail.gmail.com
Unlike pg_dump and pg_dumpall, pg_restore first checks whether the
argument passed to --format, --host, and --port is empty before
setting the corresponding variable. Consequently, pg_restore does
not error if given an empty format name, whereas pg_dump and
pg_dumpall do. Empty arguments for --host and --port are ignored
by all three applications, so this commit produces no functionality
changes there. This behavior should perhaps be reconsidered, but
that is left as a future exercise. As with other recent changes to
option handling for these applications (commits b2898baaf7,
7c8280eeb5, and be0d0b457c), no back-patch.
Author: Mahendra Singh Thalor <mahi6run@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Discussion: https://postgr.es/m/CAKYtNApkh%3DVy2DpNRCnEJmPpxNuksbAh_QBav%3D2fLmVjBhGwFw%40mail.gmail.com
Like other Bison-written headers, it's not worth the trouble to
make this compilable standalone. (We might revisit this someday,
if we ever move up our minimum required Bison version.)
Support for SNI was added to clientside libpq in 5c55dc8b47 with the
sslsni parameter, but there was no support for utilizing it serverside.
This adds support for serverside SNI such that certificate/key handling
is available per host. A new config file, $datadir/pg_hosts.conf, is
used for configuring which certificate and key should be used for which
hostname. In order to use SNI the ssl_sni GUC must be set to on, when
it is off the ssl configuration works just like before. If ssl_sni is
enabled and pg_hosts.conf is non-empty it will take precedence over
the regular SSL GUCs, if it is empty or missing the regular GUCs will
be used just as before this commit with no hostname specific handling.
The TLS init hook is not compatible with ssl_sni since it operates on
a single TLS configuration and SNI break that assumption. If the init
hook and ssl_sni are both enabled, a WARNING will be issued.
Host configuration can either be for a literal hostname to match, non-
SNI connections using the no_sni keyword or a default fallback matching
all connections. By omitting no_sni and the fallback a strict mode
can be achieved where only connections using sslsni=1 and a specified
hostname are allowed.
CRL file(s) are applied from postgresql.conf to all configured hostnames.
Serverside SNI requires OpenSSL, currently LibreSSL does not support
the required infrastructure to update the SSL context during the TLS
handshake.
Author: Daniel Gustafsson <daniel@yesql.se>
Co-authored-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Dewei Dai <daidewei1970@163.com>
Reviewed-by: Cary Huang <cary.huang@highgo.ca>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/1C81CD0D-407E-44F9-833A-DD0331C202E5@yesql.se
These tests were originally written to test the SSL SNI patchset
but they have merit on their own since we lack coverage for these
scenarios in the non SNI case as well.
Author: Jacob Champion <jacob.champion@enterprisedb.com>
Co-authored-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/1C81CD0D-407E-44F9-833A-DD0331C202E5@yesql.se
Move explain_mask_costs() and the associated planner row-estimation
tests from misc_functions.sql to a new regression test file,
planner_est.sql.
Previously, there wasn't an ideal home for such tests, likely as there
were very few such tests due to width and selectivity estimations being
too dependent on statistics and hardware. That's not always the case, as
we have SupportRequestRows support functions. More such tests are
possibly on the way, so let's create a better home so that we don't have
to create the explain_mask_costs() function in each file we might have
added such tests to.
Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvphShGABn-3AoE36dTvGHW7gUpFSw0_ZZnH84wGCW3hHw@mail.gmail.com
The IS JSON predicate only accepted the base types text, json, jsonb, and
bytea. Extend it to also accept domain types over those base types by
resolving through getBaseType() during parse analysis.
The base type OID is stored in the JsonIsPredicate node (as exprBaseType)
so the executor can dispatch to the correct validation path without
repeating the domain lookup at runtime.
When a non-supported type (or domain over a non-supported type) is used,
the error message displays the original type name as written by the user,
rather than the resolved base type.
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/CACJufxEk34DnJFG72CRsPPT4tsJL9arobX0tNPsn7yH28J=zQg@mail.gmail.com
In 82467f627b I somehow ended up using 'so->currPos.buf' instead of the 'buf'
variable, which is incorrect when the buffer is not already pinned. At the
very least this can lead to assertion failures
Unfortunately this shows that this code path was not covered. Expand
src/test/modules/index/specs/killtuples.spec to test it. Until now the
'result' step always reported either a 0 or 1 buffer accesses, but when
exercising hash overflows, more buffers are accessed. To avoid depending on
the precise number of accesses, change the result step to return whether there
were any heap accesses. That makes the change a lot more verbose, but still
seems worth it.
Reported-by: Alexander Kuzmenkov <akuzmenkov@tigerdata.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reported-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/vjtmvwvbxt7w5uyacxpzibpj65ewcb7uqaqbhd4arvnjbp5jqz%405ksdh6fsyqve
Discussion: https://postgr.es/m/b9de8d05-3b02-4a27-9b0b-03972fa4bfd3@iki.fi
The previous code could allocate pgpa_sj_unique_rel objects in a context
that had too short a lifespan. Fix by allocating them (and any
associated List-related allocations) in the same context as the
pgpa_planner_state to which they are attached. We also need to copy
uniquerel->relids, because the associated RelOptInfo may also be
allocated within a short-lived context.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: http://postgr.es/m/a6e6d603-e847-44dc-acd5-879fb4570062@gmail.com
The TAP test included in this new module runs the regression tests
with pg_plan_advice loaded. It arranges for each query to be planned
twice. The first time, we generate plan advice. The second time, we
replan the query using the resulting advice string. If the tests
fail, that means that using pg_plan_advice to tell the planner to
do what it was going to do anyway breaks something, which indicates
a problem either with pg_plan_advice or with the planner.
The test also enables pg_plan_advice.feedback_warnings, so that if the
plan advice fails to apply cleanly when the query is replanned, a
failure will occur.
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Discussion: http://postgr.es/m/CA%2BTgmoZzM2i%2Bp-Rxdphs4qx7sshn-kzxF91ASQ5duOo0dFRXLQ%40mail.gmail.com
The Makefile failed to set HEADERS_pg_plan_advice, so the header wasn't
installed. Fixing that reveals another problem: since this is just a
loadable module, not an extension, the header file is installed into
$(includedir_server)/contrib rather than $(includedir_server)/extension.
While we have no existing cases of installing header files there, it
appears to be the intent of pgxs.mk. However, this is inconsistent with
meson.build, which was using dir_include_extension. Changing that to
dir_include_server / 'contrib' makes the install locations consistent
across the two builds.
Author: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: http://postgr.es/m/CAN4CZFP6NOjv__4Mx+iQD8StdpbHvzDAatEQn2n15UKJ=MySSQ@mail.gmail.com
This query fetches information from pg_stats, which did not return
table OIDs until recent commit 3b88e50d6c. Because of this, we had
to cart around arrays of schema and table names, and we needed an
extra filter clause to hopefully convince the planner to use the
correct index. With the introduction of pg_stats.tableid, we can
instead just use an array of OIDs, and we no longer need the extra
filter clause hack.
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CADkLM%3DcoCVy92QkVUUTLdo5eO2bMDtwMrzRn_8miAhX%2BuPaqXg%40mail.gmail.com
A new attempt was made in 63275ce84d to make typeof_unqual work on all
configurations of CC and CLANG. This re-introduced an old problem
though, where CLANG would only support __typeof_unqual__ but the
configure check for CC detected support for typeof_unqual.
This fixes that by always defining typeof_unqual as __typeof_unqual__
under clang.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/92f9750f-c7f6-42d8-9a4a-85a3cbe808f3%40eisentraut.org
Complete the TODOs in to_json_is_immutable() and to_jsonb_is_immutable()
by recursing into container types (arrays, composites, ranges, multiranges,
domains) to check element/sub-type mutability, rather than conservatively
returning "mutable" for all arrays and composites.
The shared logic is factored into a single json_check_mutability() function
in jsonfuncs.c, with the existing exported functions as thin wrappers.
Composite type inspection uses lookup_rowtype_tupdesc() (typcache) instead
of relation_open() to avoid unnecessary lock acquisition in the optimizer.
Range and multirange types are now also checked recursively: if the
subtype's conversion is immutable, the range is considered immutable
for JSON purposes, even though range_out is generically marked STABLE.
This is a behavioral change: range types with immutable subtypes (e.g.,
int4range) can now appear in expression indexes via JSON_ARRAY/JSON_OBJECT,
whereas previously they were conservatively rejected.
Add regression tests for JSON_ARRAY and JSON_OBJECT mutability with
expression indexes and generated columns, covering arrays, composites,
domains, ranges, multiranges and combinations thereof.
Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CACJufxFz=OsXQdsMJ-cqoqspD9aJrwntsQP-U2A-UaV_M+-S9g@mail.gmail.com
Commitfest: https://commitfest.postgresql.org/patch/5759
This commit adds table OID and attribute number columns to
pg_stats, and it adds table OID and statistics object OID columns
to pg_stats_ext and pg_stats_ext_exprs. A proposed follow-up
commit would use pg_stats.tableid to simplify a query in pg_dump.
The others have no immediate purpose but may be useful later.
Bumps catversion.
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM%3DcoCVy92QkVUUTLdo5eO2bMDtwMrzRn_8miAhX%2BuPaqXg%40mail.gmail.com
In pg_get_propgraphdef(), sort the labels before writing out, for a
consistent dump order. Also, since we now have a list, we can get rid
of the separate table scan to get the count.
Co-authored-by: Peter Eisentraut <peter@eisentraut.org>
Co-authored-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Co-authored-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/a855795d-e697-4fa5-8698-d20122126567@eisentraut.org
This commit adds two improvements to gen_guc_tables.pl:
1) When finding two entries with the same name, the script complained
about these being not in alphabetical order, which was confusing.
Duplicated entries are now reported as their own error.
2) While the presence of the required fields is checked for all the
parameters, the script did not perform any checks on the non-required
fields. A check is added to check that any field defined matches with
what can be accepted. Previously, a typo in the name of a required
field would cause the field to be reported as missing. Non-mandatory
fields would be silently ignored, which was problematic as we could lose
some information.
Author: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAN4CZFP=3xUoXb9jpn5OWwicg+rbyrca8-tVmgJsQAa4+OExkw@mail.gmail.com
[NO] INHERIT is not supported for partitioned tables, but this portion
of tablecmds.c did not apply the same rules as the other sub-commands,
checking the relkind in the execution phase, not the preparation phase.
This commit refactors the code to centralize the relkind and other
checks in the preparation phase for both command patterns, getting rid
of one translatable string on the way. ATT_PARTITIONED_TABLE is
removed from ATSimplePermissions(), and the child relation is checked
the same way for both sub-commands. The ALTER TABLE patterns that now
fail at preparation failed already at execution, hence there should be
no changes from the user perspective except more consistent error
messages generated.
Some comments at the top of ATPrepAddInherit() were incorrect,
CreateInheritance() being the routine checking the columns and
constraints between the parent and its to-be-child.
Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CAEoWx2kggo1N2kDH6OSfXHL_5gKg3DqQ0PdNuL4LH4XSTKJ3-g@mail.gmail.com
The test has been reported as having a race condition for the case of a
worker that should be terminated after a database rename. Based on the
report received from buildfarm member jay, the database renamed is
accessed by a different session, preventing the ALTER DATABASE to
complete, ultimately failing the test.
Honestly, I am not completely sure what is the origin of this
disturbance, but two possibilities are an autovacuum or parallel worker
(due to debug_parallel_query being used by the host). In order to
(hopefully) stabilize the test, autovacuum and debug_parallel_query are
now disabled in the configuration of the node used in the test.
The failure is hard to reproduce, so it will take a few weeks to make
sure that the test has become stable. Let's see where it goes.
Reported-by: Aya Iwata <iwata.aya@fujitsu.com>
Discussion: https://postgr.es/m/OS3PR01MB8889505E2F3E443CCA4BD72EEA45A@OS3PR01MB8889.jpnprd01.prod.outlook.com
Previously, this was 16 bytes. With the use of some bitflags and by
reducing the attcacheoff field size to a 16-bit type, we can halve the
size of the struct.
It's unlikely that caching the offsets for offsets larger than what will
fit in a 16-bit int will help much as the tuple is very likely to have
some non-fixed-width types anyway, the offsets of which we cannot cache.
Shrinking this down to 8 bytes helps by accessing fewer cachelines when
performing tuple deformation. The fields used there are all fully
fledged fields, which don't require any bitmasking to extract the value
of. It also helps to more efficiently calculate the address of a
compact_attrs[] element in TupleDesc as the x86 LEA instruction can work
with 8 byte offsets, which allows the element address to be calculated
from the TupleDesc's address in a single instruction using LEA's
concurrent shift and add.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CAApHDvodSVBj3ypOYbYUCJX%2BNWL%3DVZs63RNBQ_FxB_F%2B6QXF-A%40mail.gmail.com
Commit 6eedb2a5fd made the logical walsender call
XLogFlush(GetXLogInsertRecPtr()) to ensure that all pending WAL is flushed,
fixing a publisher shutdown hang. However, if the last WAL record ends at
a page boundary, GetXLogInsertRecPtr() can return an LSN pointing past
the page header, which can cause XLogFlush() to report an error.
A similar issue previously existed in the GiST code. Commit b1f14c9672
introduced GetXLogInsertEndRecPtr(), which returns a safe WAL insertion end
location (returning the start of the page when the last record ends at a page
boundary), and updated the GiST code to use it with XLogFlush().
This commit fixes the issue by making the logical walsender use
XLogFlush(GetXLogInsertEndRecPtr()) when flushing pending WAL during shutdown.
Backpatch to all supported versions.
Reported-by: Andres Freund <andres@anarazel.de>
Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/vzguaguldbcyfbyuq76qj7hx5qdr5kmh67gqkncyb2yhsygrdt@dfhcpteqifux
Backpatch-through: 14
This code was recently adjusted by c456e3911, but that commit didn't get
the logic correct when finding the attnum to start walking the tuple in.
If there is a NULL, we need to start walking the tuple before it.
Author: David Rowley <dgrowleyml@gmail.com>
Reported-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAHewXNnb-s_=VdVUZ9h7dPA0u3hxV8x2aU3obZytnqQZ_MiROA@mail.gmail.com
TOK_IDENT allows only non-keywords; identifier should be used
any place where either keywords or non-keywords should be accepted.
Hence, without this commit, any string that happens to be a keyword
can't be used as a partition schema, partition name, or plan name,
which is incorrect.
Author: Lukas Fittl <lukas@fittl.com>
Discussion: http://postgr.es/m/CAP53PkzKeD=t90OfeMsniYrcRe2THQbUx3g6wV17Y=ZtiwmWTQ@mail.gmail.com
The fundamental problem is that when we call clang to generate
bitcode, we might be using configure results from a different
compiler, which might not be fully compatible with the clang we are
using. In practice, clang supports most things other compilers
support, so this has apparently not been a problem in practice.
But commits 4cfce4e62c, 0af05b5dbb, and 59292f7aac have been
struggling to make typeof_unqual work in this situation. Clang added
support in version 19, GCC in version 14, so if you are using, say,
GCC 14 and Clang 16, the compilation with the latter will fail. Such
combinations are not very likely in practice, because GCC 14 and Clang
19 were released within a few months of each other, and so Linux
distributions are likely to have suitable combinations. But some
buildfarm members and some Fedora versions are affected, so this tries
to fix it.
The fully correct solution would be to run a separate set of configure
tests for that clang-for-bitcode, but that would be very difficult to
implement, and probably of limited use in practice. So the workaround
here is that we hardcodedly override the configure result under clang
based on the version number. As long as we only have a few of these
cases, this should be manageable.
Also swap the order of the tests of typeof_unqual: Commit 59292f7aac
tested the underscore variant first, but the reasons for that are now
gone.
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/92f9750f-c7f6-42d8-9a4a-85a3cbe808f3%40eisentraut.org
This commit teaches pg_dumpall to fail when both --clean and
--data-only are specified. Previously, it passed the options
through to pg_dump, which would fail after pg_dumpall had already
started producing output. Like recent commits b2898baaf7 and
7c8280eeb5, no back-patch.
Author: Mahendra Singh Thalor <mahi6run@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Discussion: https://postgr.es/m/CAKYtNArrHiJ0LDB9BFZiUWs6tC78QkBN50wiwO07WhxewYDS3Q%40mail.gmail.com
Remove a bunch of #include lines from execnodes.h. Most of these
requier suitable typedefs to be added, so that it still compiles
standalone. In one case, the fix is to move a struct definition to the
one .c file where it is needed.
Also some light clean up in plannodes.h and genam.h, though not as
extensive as in execnodes.h.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/202603131240.ihwqdxnj7w2o@alvherre.pgsql
Commit 8fe315f18d added the stats_reset column to pg_statio_all_sequences and
included a regression test to verify that statistics in this view are reset
correctly. However, this test caused buildfarm member crake to report
a pg_upgradeCheck failure.
The failing test assumed that the blks_read and blks_hit counters
in pg_statio_all_sequences would be zero after calling
pg_stat_reset_single_table_counters(). On crake, however, either blks_read or
blks_hit sometimes appeared as 1 during the pg_upgradeCheck test, even right
after the reset.
Since these counters may change due to concurrent activity and the test is
unstable, this commit removes the checks for blks_read and blks_hit in
pg_statio_all_sequences from the regression test.
Per buildfarm member crake.
Discussion: https://postgr.es/m/CAHGQGwFcay_tX=7HSS=N=+Yd0FLEm2GrJgwxnqHM4wvxX0B=4g@mail.gmail.com
When an extension is located via extension_control_path and it has a
hardcoded $libdir/ path, this is stripped by the
extension_control_path mechanism. But when pg_upgrade verifies the
extension using LOAD, this stripping does not happen, and so
pg_upgrade will fail because it cannot load the extension. To work
around that, change pg_upgrade to itself strip the prefix when it runs
its checks. A test case is also added.
Author: Jonathan Gonzalez V. <jonathan.abdiel@gmail.com>
Reviewed-by: Niccolò Fei <niccolo.fei@enterprisedb.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/43b3691c673a8b9158f5a09f06eacc3c63e2c02d.camel%40gmail.com
They were already using pg_attribute_aligned. This replaces that with
alignas and moves that into the required syntactic position.
Suggested-by: Peter Eisentraut <peter@eisentraut.org>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/d7a788fa-e609-4894-a8be-2f70e135424f%40eisentraut.org
A following commit will enable -Wstrict-prototypes and -Wold-style-definition
by default. This commit fixes the warnings that those new flags will generate
before actually adding the new flags.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/13d51b20-a69c-4ac1-8546-ec4fc278064f%40eisentraut.org
Implementation of SQL property graph queries, according to SQL/PGQ
standard (ISO/IEC 9075-16:2023).
This adds:
- GRAPH_TABLE table function for graph pattern matching
- DDL commands CREATE/ALTER/DROP PROPERTY GRAPH
- several new system catalogs and information schema views
- psql \dG command
- pg_get_propgraphdef() function for pg_dump and psql
A property graph is a relation with a new relkind RELKIND_PROPGRAPH.
It acts like a view in many ways. It is rewritten to a standard
relational query in the rewriter. Access privileges act similar to a
security invoker view. (The security definer variant is not currently
implemented.)
Starting documentation can be found in doc/src/sgml/ddl.sgml and
doc/src/sgml/queries.sgml.
Author: Peter Eisentraut <peter@eisentraut.org>
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Ajay Pal <ajay.pal.k@gmail.com>
Reviewed-by: Henson Choi <assam258@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/a855795d-e697-4fa5-8698-d20122126567@eisentraut.org
When log_lock_waits is enabled, the "still waiting on lock" message is normally
emitted only once while a session continues waiting. However, if the wait is
interrupted, for example by wakeups from client_connection_check_interval,
SIGHUP for configuration reloads, or similar events, the message could be
emitted again each time the wait resumes.
For example, with very small client_connection_check_interval values
(e.g., 100 ms), this behavior could flood the logs with repeated messages,
making them difficult to use.
To prevent this, this commit guards the "still waiting on lock" message so
it is reported at most once during a lock wait, even if the wait is interrupted.
This preserves the intended behavior when no interrupts occur.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Hüseyin Demir <huseyin.d3r@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwHZUmg+r4kMcPYt_Z-txxVX+CJJhfra+qemxKXvAxYbpw@mail.gmail.com
ALTER TABLE .. CLUSTER ON and SET WITHOUT CLUSTER are not supported for
partitioned tables and already fail with a check happening when the
sub-command is executed, not when it is prepared.
This commit moves the relkind check for partitioned tables to happen
when the sub-command is prepared in ATSimplePermissions(). This matches
with the practice of the other sub-commands of ALTER TABLE, shaving one
translatable string.
mark_index_clustered() can be a bit simplified, switching one
elog(ERROR) to an assertion. Note that mark_index_clustered() can also
be called through a CLUSTER command, but it cannot be reached for a
partitioned table, per the assertion based on the relkind in
cluster_rel(), and there is only one caller of rebuild_relation().
Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CAEoWx2kggo1N2kDH6OSfXHL_5gKg3DqQ0PdNuL4LH4XSTKJ3-g@mail.gmail.com
pg_statio_all_sequences lacked a stats_reset column, unlike the other
pg_statio_* views that already expose it. This commit adds the column so
users can see when the statistics in this view were last reset.
Also this commit updates the documentation for
pg_stat_reset_single_table_counters() to clarify that it can reset statistics
for sequences and materialized views as well.
Catalog version bumped.
Author: Sami Imseih <samimseih@gmail.com>
Co-authored-by: Shihao Zhong <zhong950419@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0v0OPGyDpwxkX81CtTt9xsj9-TNxhm=8JdOvEKPsVVFNg@mail.gmail.com
Commit 4daa140a2f introduced proper decoding for speculative aborts. As a
result, the internal state is guaranteed to be clean when a new
speculative insert is encountered. This patch removes the defensive
cleanup code that is no longer reachable.
Author: Antonin Houska <ah@cybertec.at>
Discussion: https://postgr.es/m/23256.1772702981@localhost
Commit 2a525cc97e introduced the ON_ERROR = 'set_null' option for COPY,
allowing it to be used with foreign tables backed by file_fdw. However,
unlike ON_ERROR = 'ignore', no regression test was added to verify
this behavior for file_fdw.
This commit adds a regression test to ensure that foreign tables using
file_fdw work correctly with ON_ERROR = 'set_null', improving test coverage.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Yi Ding <dingyi_yale@163.com>
Discussion: https://postgr.es/m/CAHGQGwGmPc6aHpA5=WxKreiDePiOEitfOFsW2dSo5m81xWXgRA@mail.gmail.com
This commit refactors hashbulkdelete() to use streaming reads, improving
the efficiency of the operation by prefetching upcoming buckets while
processing a current bucket. There are some specific changes required
to make sure that the cleanup work happens in accordance to the data
pushed to the stream read callback. When the cached metadata page is
refreshed to be able to process the next set of buckets, the stream is
reset and the data fed to the stream read callback has to be updated.
The reset needs to happen in two code paths, when _hash_getcachedmetap()
is called.
The author has seen better performance numbers than myself on this one
(with tweaks similar to 6c228755ad). The numbers are good enough for
both of us that this change is worth doing, in terms of IO and runtime.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CABPTF7VrqfbcDXqGrdLQ2xaQ=K0RzExNuw6U_GGqzSJu32wfdQ@mail.gmail.com
We had defenses against -ffast-math in timestamp-related files,
which is a pretty obsolete place for them since we've not supported
floating-point timestamps in a long time. Remove those and instead
put one in float.c, which is still broken by using this switch.
Add some commentary to put more color on why it's a bad idea.
Also remove the check from configure. That was just there to fail
faster, but it doesn't really seem necessary anymore, and besides
we have no corresponding check in meson.build.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Suggested-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/abFXfKC8zR0Oclon%40ip-10-97-1-34.eu-west-3.compute.internal
Print an OID value inserted into a SQL query with %u not %d.
The existing code accidentally fails to malfunction when
given an OID above 2^31, but only accidentally; future changes
to our SQL parser could perhaps break it.
Declare the Oid values that ecpg_type_infocache_push() and
ecpg_is_type_an_array() work with as "Oid" not "int".
This doesn't have any functional effect, but it's clearer.
At the moment I don't see a need to back-patch this.
Bug: #19429
Author: fairyfar@msn.com
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19429-aead3b1874be1a99@postgresql.org
This commit includes various optimizations to improve the performance of
tuple deformation.
We now precalculate CompactAttribute's attcacheoff, which allows us to
remove the code from the deform routines which was setting the
attcacheoff. Setting the attcacheoff is now handled by
TupleDescFinalize(), which must be called before the TupleDesc is used for
anything. Having TupleDescFinalize() means we can store the first
attribute in the TupleDesc which does not have an offset cached. That
allows us to add a dedicated deforming loop to deform all attributes up
to the final one with an attcacheoff set, or up to the first NULL
attribute, whichever comes first.
Here we also improve tuple deformation performance of tuples with NULLs.
Previously, if the HEAP_HASNULL bit was set in the tuple's t_infomask,
deforming would, one-by-one, check each and every bit in the NULL bitmap
to see if it was zero. Now, we process the NULL bitmap 1 byte at a time
rather than 1 bit at a time to find the attnum with the first NULL. We
can now deform the tuple without checking for NULLs up to just before that
attribute.
We also record the maximum attribute number which is guaranteed to exist
in the tuple, that is, has a NOT NULL constraint and isn't an
atthasmissing attribute. When deforming only attributes prior to the
guaranteed attnum, we've no need to access the tuple's natt count. As an
additional optimization, we only count fixed-width columns when
calculating the maximum guaranteed column, as this eliminates the need to
emit code to fetch byref types in the deformation loop for guaranteed
attributes.
Some locations in the code deform tuples that have yet to go through NOT
NULL constraint validation. We're unable to perform the guaranteed
attribute optimization when that's the case. This optimization is opt-in
via the TupleTableSlot using the TTS_FLAG_OBEYS_NOT_NULL_CONSTRAINTS
flag.
This commit also adds a more efficient way of populating the isnull
array by using a bit-wise SWAR trick which performs multiplication on the
inverse of the tuple's bitmap byte and masking out all but the lower bit
of each of the boolean's byte. This results in much more optimal code
when compared to determining the NULLness via att_isnull(). 8 isnull
elements are processed at once using this method, which means we need to
round the tts_isnull array size up to the next 8 bytes. The palloc code
does this anyway, but the round-up needed to be formalized so as not to
overwrite the sentinel byte in MEMORY_CONTEXT_CHECKING builds. Doing
this also allows the NULL-checking deforming loop to more efficiently
check the isnull array, rather than doing the bit-wise processing for each
attribute that att_isnull() does.
The level of performance improvement from these changes seems to vary
depending on the CPU architecture. Apple's M chips seem particularly
fond of the changes, with some of the tested deform-heavy queries going
over twice as fast as before. With x86-64, the speedups aren't quite as
large. With tables containing only a small number of columns, the
speedups will be less.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CAApHDvpoFjaj3%2Bw_jD5uPnGazaw41A71tVJokLDJg2zfcigpMQ%40mail.gmail.com
As of this commit all TupleDescs must have TupleDescFinalize() called on
them once the TupleDesc is set up and before BlessTupleDesc() is called.
In this commit, TupleDescFinalize() does nothing. This change has only
been separated out from the commit that properly implements this function
to make the change more obvious. Any extension which makes its own
TupleDesc will need to be modified to call the new function.
The follow-up commit which properly implements TupleDescFinalize() will
cause any code which forgets to do this to fail in assert-enabled builds in
BlessTupleDesc(). It may still be worth mentioning this change in the
release notes so that extension authors update their code.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CAApHDvpoFjaj3%2Bw_jD5uPnGazaw41A71tVJokLDJg2zfcigpMQ%40mail.gmail.com
CatalogCacheCreateEntry() computed the space needed for a CatCTup
as sizeof(CatCTup) + MAXIMUM_ALIGNOF. That's not our usual style,
and it wastes memory by allocating more padding than necessary.
On 64-bit machines sizeof(CatCTup) would be maxaligned already
since it contains pointer fields, therefore this code is wasting
8 bytes compared to the more usual MAXALIGN(sizeof(CatCTup)).
While at it, we don't really need to do MemoryContextSwitchTo()
when we're only allocating one block.
Author: ChangAo Chen <cca5507@qq.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/tencent_A42E0544C6184FE940CD8E3B14A3F0A39605@qq.com
Coverity complained that this function leaked the dumpdirpath string,
which it did. But we don't need to make a copy at all, because
there's not really any point in trimming trailing slashes from the
directory name here. If that were needed, the initial
file_exists_in_directory() test would have failed, since it doesn't
bother with that (and neither does anyplace else in this file).
Moreover, if we did want that, reimplementing canonicalize_path()
poorly is not the way to proceed. Arguably, all of this code should
be reexamined with an eye to using src/port/path.c's facilities, but
for today I'll settle for getting rid of the memory leak.
Future commits will use the visibility map in on-access pruning to fix
VM corruption and set the VM if the page is all-visible.
Saving the vmbuffer in the scan descriptor reduces the number of times
it would need to be pinned and unpinned, making the overhead of doing so
negligible.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/C3AB3F5B-626E-4AAA-9529-23E9A20C727F%40gmail.com
BufferGetPage() isn't cheap and heap_update() calls it multiple times
when it could just save the page from a single call. Do that.
While we are at it, make separate variables for old and new page in
heap_xlog_update(). It's confusing to reuse "page" for both pages.
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_a%2BhO4PCptyaPR7AMZd7FjcHfOFKKJT8ouU3KedMud0tQ%40mail.gmail.com
Solaris descendants (Illumos, OpenIndiana, OmniOS, etc.) hit System V
semaphore limits ("No space left on device" from semget) when running
many parallel test scripts under default system settings. We could
tell people to raise those settings, but there's a better answer.
Unnamed POSIX semaphores have been available on Solaris for decades
and work well, so prefer them, as was recently done for AIX.
This patch also updates the documentation to remove now-unnecessary
advice about raising project.max-sem-ids and project.max-msg-ids.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Greg Burd <greg@burd.me>
Discussion: https://postgr.es/m/470305.1772417108@sss.pgh.pa.us
"initdb -d" has been broken since commit f95d73ed4, because I changed
aclitemin to work in bootstrap mode but failed to consider aclitemout.
That routine isn't reached by default, but it is if the elog message
level is high enough, so it needs to work without catalog access too.
This patch just makes it use its existing code paths to print role
OIDs numerically. We could alternatively invent an inverse of
boot_get_role_oid() and print them symbolically, but that would take
more code and it's not apparent that it'd be any better for debugging
purposes.
Reported-by: Greg Burd <greg@burd.me>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/4416.1773328045@sss.pgh.pa.us
The comment about ParallelWorkerNumbr in parallel.c says:
In parallel workers, it will be set to a value >= 0 and < the number
of workers before any user code is invoked; each parallel worker will
get a different parallel worker number.
However asserts in various places collecting instrumentation allowed
(ParallelWorkerNumber == num_workers). That would be a bug, as the value
is used as index into an array with num_workers entries.
Fixed by adjusting the asserts accordingly. Backpatch to all supported
versions.
Discussion: https://postgr.es/m/5db067a1-2cdf-4afb-a577-a04f30b69167@vondra.me
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Backpatch-through: 14
This commit plugs into pgstattuple_approx(), the SQL function faster
than pgstattuple() that returns approximate results, the streaming read
APIs. A callback is used to be able to skip all-visible pages via VM
lookup, to match with the logic prior to this commit.
Under test conditions similar to 6c228755ad (some dm_delay and
debug_io_direct=data), this can substantially improve the execution time
of the function, particularly for large relations.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CABPTF7VrqfbcDXqGrdLQ2xaQ=K0RzExNuw6U_GGqzSJu32wfdQ@mail.gmail.com
This changes the TupleTableSlotOps contract to make it so the
getsomeattrs() function is in charge of calling
slot_getmissingattrs().
Since this removes all code from slot_getsomeattrs_int() aside from the
getsomeattrs() call itself, we may as well adjust slot_getsomeattrs() so
that it calls getsomeattrs() directly. We leave slot_getsomeattrs_int()
intact as this is still called from the JIT code.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CAApHDvodSVBj3ypOYbYUCJX%2BNWL%3DVZs63RNBQ_FxB_F%2B6QXF-A%40mail.gmail.com
Use fake LSNs in all nbtree critical sections that write a WAL record.
That way we can safely apply the _bt_killitems LSN trick with logged and
unlogged indexes alike. This brings the same benefits to plain scans of
unlogged relations that commit 2ed5b87f brought to plain scans of logged
relations: scans will drop their leaf page pin eagerly (by applying the
"dropPin" optimization), which avoids blocking progress by VACUUM. This
is particularly helpful with applications that allow a scrollable cursor
to remain idle for long periods.
Preparation for an upcoming commit that will add the amgetbatch
interface, and switch nbtree over to it (from amgettuple) to enable I/O
prefetching. The index prefetching read stream's effective prefetch
distance is adversely affected by any buffer pins held by the index AM.
At the same time, it can be useful for prefetching to read dozens of
leaf pages ahead of the scan to maintain an adequate prefetch distance.
The index prefetching patch avoids this tension by always eagerly
dropping index page pins of the kind traditionally held as an interlock
against unsafe concurrent TID recycling by VACUUM (essentially the same
way that amgetbitmap routines have always avoided holding onto pins).
The work from this commit makes that possible during scans of nbtree
unlogged indexes -- without our having to give up on setting LP_DEAD
bits on index tuples altogether.
Follow-up to commit d774072f, which moved the fake LSN infrastructure
out of GiST so that it could be used by other index AMs.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAH2-WzkehuhxyuA8quc7rRN3EtNXpiKsjPfO8mhb+0Dr2K0Dtg@mail.gmail.com
Move utility functions used by GiST to generate fake LSNs into xlog.c
and xloginsert.c, so that other index AMs can also generate fake LSNs.
Preparation for an upcoming commit that will add support for fake LSNs
to nbtree, allowing its dropPin optimization to be used during scans of
unlogged relations. That commit is itself preparation for another
upcoming commit that will add a new amgetbatch/btgetbatch interface to
enable I/O prefetching.
Bump XLOG_PAGE_MAGIC due to XLOG_GIST_ASSIGN_LSN becoming
XLOG_ASSIGN_LSN.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAH2-WzkehuhxyuA8quc7rRN3EtNXpiKsjPfO8mhb+0Dr2K0Dtg@mail.gmail.com
The function used GetXLogInsertRecPtr() to generate the fake LSN. Most
of the time this is the same as what XLogInsert() would return, and so
it works fine with the XLogFlush() call. But if the last record ends at
a page boundary, GetXLogInsertRecPtr() returns LSN pointing after the
page header. In such case XLogFlush() fails with errors like this:
ERROR: xlog flush request 0/01BD2018 is not satisfied --- flushed only to 0/01BD2000
Such failures are very hard to trigger, particularly outside aggressive
test scenarios.
Fixed by introducing GetXLogInsertEndRecPtr(), returning the correct LSN
without skipping the header. This is the same as GetXLogInsertRecPtr(),
except that it calls XLogBytePosToEndRecPtr().
Initial investigation by me, root cause identified by Andres Freund.
This is a long-standing bug in gistGetFakeLSN(), probably introduced by
c6b92041d3 in PG13. Backpatch to all supported versions.
Reported-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/vf4hbwrotvhbgcnknrqmfbqlu75oyjkmausvy66ic7x7vuhafx@e4rvwavtjswo
Backpatch-through: 14
Presently, many statements in stats_import.sql select all columns
from the pg_stats system view. A proposed follow-up commit would
add columns to this view (some of which are not stable across test
runs), breaking all of these tests. This commit introduces a
convenience view for those statements so that future changes are
minimally disruptive.
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CADkLM%3DcoCVy92QkVUUTLdo5eO2bMDtwMrzRn_8miAhX%2BuPaqXg%40mail.gmail.com
In 0b96e734c5 I (Andres) relied on page_collect_tuples() being called only
with an MVCC snapshot, and added assertions to that end, but did not realize
that IsMVCCSnapshot() allows both proper MVCC snapshots and historical
snapshots, which behave quite similarly to MVCC snapshots.
Unfortunately that can lead to incorrect visibility results during logical
decoding, as a historical snapshot is interpreted as a plain MVCC
snapshot. The only reason this wasn't noticed earlier is that it's hard to
reach as most of the time there are no sequential scans during logical
decoding.
To fix the bug and avoid issues like this in the future, split
IsMVCCSnapshot() into IsMVCCSnapshot() and IsMVCCLikeSnapshot(), where now
only the latter includes historic snapshots.
One effect of this is that during logical decoding no page-at-a-time snapshots
are used, as otherwise runtime branches to handle historic snapshots would be
needed in some performance critical paths. Given how uncommon sequential scans
are during logical decoding, that seems acceptable.
Author: Antonin Houska <ah@cybertec.at>
Reported-by: Antonin Houska <ah@cybertec.at>
Discussion: https://postgr.es/m/61812.1770637345@localhost
As part of 6225403f2, I'd removed the override for the `stlib` target,
since NAME no longer contains a major version number. But I forgot that
its dependencies are declared before Makefile.shlib is included; those
dependencies were then omitted entirely.
Per buildfarm member indri, which appears to be the only system so far
that's bothered by an empty archive.
Now that libpq-oauth doesn't have to match the major version of libpq,
some things in pg_wchar.h are technically unsafe for us to use. (See
b6c7cfac8 for a fuller discussion.) This is unlikely to be a problem --
we only care about UTF-8 in the context of OAuth right now -- but if
anyone did introduce a way to hit it, it'd be extremely difficult to
debug or reproduce, and it'd be a potential security vulnerability to
boot.
Define USE_PRIVATE_ENCODING_FUNCS so that anyone who tries to add a
dependency on the exported APIs will simply fail to link the shared
module.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CAOYmi%2BmrGg%2Bn_X2MOLgeWcj3v_M00gR8uz_D7mM8z%3DdX1JYVbg%40mail.gmail.com
Switch the private libpq-oauth ABI to a public one, based on the new
PGoauthBearerRequestV2 API. A huge amount of glue code can be removed as
part of this, and several code paths can be deduplicated. Additionally,
the shared library no longer needs to change its name for every major
release; it's now just "libpq-oauth.so".
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CAOYmi%2BmrGg%2Bn_X2MOLgeWcj3v_M00gR8uz_D7mM8z%3DdX1JYVbg%40mail.gmail.com
Since commit 5883ff30b0, some compilers have been warning that the
rtekind variable in unique_nonjoin_rtekind() may be used
uninitialized. There doesn't appear to be any actual risk, so
let's just initialize it to something to silence the compiler
warnings.
Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0sieVNfniCKMDdDjuXGd1OuzMQfTS5%3D9vX3sa-iiujKUA%40mail.gmail.com
Presently, such commands scan the input buffer one byte at a time
looking for special characters. This commit adds a new path that
uses SIMD instructions to skip over chunks of data without any
special characters. This can be much faster.
To avoid regressions, SIMD processing is disabled for the remainder
of the COPY FROM command as soon as we encounter a short line or a
special character (except for end-of-line characters, else we'd
always disable it after the first line). This is perhaps too
conservative, but it could probably be made more lenient in the
future via fine-tuned heuristics.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Co-authored-by: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Ayoub Kazar <ma_kazar@esi.dz>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Neil Conway <neil.conway@gmail.com>
Reviewed-by: Greg Burd <greg@burd.me>
Tested-by: Manni Wood <manni.wood@enterprisedb.com>
Tested-by: Mark Wong <markwkm@gmail.com>
Discussion: https://postgr.es/m/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig%40mail.gmail.com
Historically, all SLRUs were addressed by transaction IDs, but that
hasn't been true for a long time. However, the error message on I/O
error still always talked about accessing a transaction ID.
This commit adds a callback that allows subsystems to construct their
own error messages, which can then correctly refer to a transaction
ID, multixid or whatever else is used to address the particular SLRU.
Author: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://www.postgresql.org/message-id/CACG=ezZZfurhYV+66ceubxQAyWqv9vaUi0yoO4-t48OE5xc0DQ@mail.gmail.com
This commit adds a stats_reset column to pg_stat_database_conflicts,
allowing users to see when the statistics in this view were last reset.
This makes the view consistent with pg_stat_database and other statistics
views.
Catalog version bumped.
Author: Shihao Zhong <zhong950419@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAGRkXqS98OebEWjax99_LVAECsxCB8i=BfsdAL34i-5QHfwyOQ@mail.gmail.com
ginExtractEntries() can produce a lot of entries for a single item.
During index build, we check for interrupts between entries, and the
fast-update codepath does it as part of vacuum_delay_point(), but the
non-fast update insertion codepath was uninterruptible. Add
CHECK_FOR_INTERRUPTS() between entries in the non-fast update codepath
too.
Author: Vinod Sridharan <vsridh90@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAFMdLD6mQvAuStiOGvBJxAEfo6wdjZhj3+JveTLxOX8MVn4zmA@mail.gmail.com
Previously, ginScanToDelete() and ginDeletePage() passed BlockNumbers and
re-read pages that were already pinned and locked during the tree walk. The
caller ginVacuumPostingTree()) held a cleanup-locked root buffer, yet
ginScanToDelete() re-read it by block number with special-case code to skip
re-locking.
At first, this commit gives both functions more appropriate names,
ginScanPostingTreeToDelete() and ginDeletePostingPage(), indicating they deal
with posting trees/pages. This is more descriptive and similar to the way we
name other GIN functions, for instance, ginVacuumPostingTree() and
ginVacuumPostingTreeLeaves().
Then rework both functions to pass Buffers directly. DataPageDeleteStack now
carries buffer, myoff (downlink offset in parent), and isRoot per level,
so ginScanPostingTreeToDelete() takes only GinVacuumState and
DataPageDeleteStack pointers. Also, ginDeletePostingPage() receives the three
Buffers directly, and no longer reads or releases them itself. The caller
reads and locks child pages before recursing, and manages buffer lifecycle
afterward.
This eliminates the confusing isRoot special cases in buffer management,
including the apparent (but unreachable) double release of the root
buffer identified by Andres Freund.
Add comments explaining the locking protocol and the DataPageDeleteStack
structure.
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/utrlxij43fbguzw4kldte2spc4btoldizutcqyrfakqnbrp3ir@ph3sphpj4asz
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Jinbinge <jinbinge@126.com>
The logic of xslt_process() has never considered the fact that
xsltSaveResultToString() would return NULL for an empty string (the
upstream code has always done so, with a string length of 0). This
would cause memcpy() to be called with a NULL pointer, something
forbidden by POSIX.
Like 46ab07ffda and similar fixes, this is backpatched down to all the
supported branches, with a test case to cover this scenario. An empty
string has been always returned in xml2 in this case, based on the
history of the module, so this is an old issue.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/c516a0d9-4406-47e3-9087-5ca5176ebcf9@gmail.com
Backpatch-through: 14
Currently, when the argument of copyObject() is const-qualified, the
return type is also, because the use of typeof carries over all the
qualifiers. This is incorrect, since the point of copyObject() is to
make a copy to mutate. But apparently no code ran into it.
The new implementation uses typeof_unqual, which drops the qualifiers,
making this work correctly.
typeof_unqual is standardized in C23, but all recent versions of all
the usual compilers support it even in non-C23 mode, at least as
__typeof_unqual__. We add a configure/meson test for typeof_unqual
and __typeof_unqual__ and use it if it's available, else we use the
existing fallback of just returning void *.
This is the second attempt, after the first attempt in commit
4cfce4e62c was reverted. The following two points address problems
with the earlier version:
We test the underscore variant first so that there is a higher chance
that clang used for bitcode also supports it, since we don't test that
separately.
Unlike the typeof test, the typeof_unqual test also tests with a void
pointer similar to how copyObject() would use it, because that is not
handled by MSVC, so we want the test to fail there.
Reviewed-by: David Geier <geidav.pg@gmail.com>
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/flat/92f9750f-c7f6-42d8-9a4a-85a3cbe808f3%40eisentraut.org
This commit replaces the synchronous ReadBufferExtended() loops with the
streaming read routines, affecting pgstatindex() (for btree) and
pgstathashindex() (for hash indexes).
Under test conditions similar to 6c228755ad (some dm_delay and
debug_io_direct=data), this can result in nice runtime and IO gains.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CABPTF7VrqfbcDXqGrdLQ2xaQ=K0RzExNuw6U_GGqzSJu32wfdQ@mail.gmail.com
Previously, ALTER TABLE ADD COLUMN always forced a table rewrite when
the column type was a domain with constraints (CHECK or NOT NULL), even
if the default value satisfied those constraints. This was because
contain_volatile_functions() considers CoerceToDomain immutable, so
the code conservatively assumed any constrained domain might fail.
Improve this by using soft error handling (ErrorSaveContext) to evaluate
the CoerceToDomain expression at ALTER TABLE time. If the default value
passes the domain's constraints, the value is stored as a "missing"
attribute default and no table rewrite is needed. If the constraint
check fails, we fall back to a table rewrite, preserving the historical
behavior that constraint violations are only raised when the table
actually contains rows.
Domains with volatile constraint expressions always require a table
rewrite since the constraint result could differ per evaluation and
cannot be cached.
Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Viktor Holmberg <viktor.holmberg@aiven.io>
Discussion: https://postgr.es/m/CACJufxE_+iZBR1i49k_AHigppPwLTJi6km8NOsC7FWvKdEmmXg@mail.gmail.com
Add an optional bool *has_volatile output parameter to
DomainHasConstraints(). When non-NULL, the function checks whether any
CHECK constraint contains a volatile expression. Callers that don't
need this information pass NULL and get the same behavior as before.
This is needed by a subsequent commit that enables the fast default
optimization for domains with non-volatile constraints: we can safely
evaluate such constraints once at ALTER TABLE time, but volatile
constraints require a full table rewrite.
Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Viktor Holmberg <viktor.holmberg@aiven.io>
Discussion: https://postgr.es/m/CACJufxE_+iZBR1i49k_AHigppPwLTJi6km8NOsC7FWvKdEmmXg@mail.gmail.com
Commit ac58465e06 added and documented a new progress-report view for
REPACK, but neglected to list the 'command' column in the docs. This is
my (Álvaro's) fail, as I added the column in v23 of the patch and forgot
to document it.
In passing, add a note in the docs for pg_stat_progress_cluster that it
might contain rows for sessions running REPACK, though mapping the
command name to either the older commands; and that it is for backwards-
compatibility only. (Maybe we should just remove this older view.)
Author: Noriyoshi Shinoda <noriyoshi.shinoda@hpe.com>
Discussion: https://postgr.es/m/LV8PR84MB37870F0F35EF2E8CB99768CBEE47A@LV8PR84MB3787.NAMPRD84.PROD.OUTLOOK.COM
Discussion: https://postgr.es/m/202510101352.vvp4p3p2dblu@alvherre.pgsql
Replace dynahash with simplehash for the per-backend PrivateRefCountHash
overflow table. Simplehash generates inlined, open-addressed lookup
code, avoiding the per-call overhead of dynahash that becomes noticeable
when many buffers are pinned with a CPU-bound workload.
Motivated by testing of the index prefetching patch, which pins many
more buffers concurrently than typical index scans.
Author: Peter Geoghegan <pg@bowt.ie>
Suggested-by: Andres Freund <andres@anarazel.de>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-Wz=g=JTSyDB4UtB5su2ZcvsS7VbP+ZMvvaG6ABoCb+s8Lw@mail.gmail.com
Avoid allocating memory for an nbtree descent stack during index scans.
We only require a descent stack during inserts, when it is used to
determine where to insert a new pivot tuple/downlink into the target
leaf page's parent page in the event of a page split. (Page deletion's
first phase also performs a _bt_search that requires a descent stack.)
This optimization improves performance by minimizing palloc churn. It
speeds up index scans that call _bt_search frequently/descend the index
many times, especially when the cost of scanning the index dominates
(e.g., with index-only skip scans). Testing has shown that the
underlying issue causes performance problems for an upcoming patch that
will replace btgettuple with a new btgetbatch interface to enable I/O
prefetching.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAH2-Wzmy7NMba9k8m_VZ-XNDZJEUQBU8TeLEeL960-rAKb-+tQ@mail.gmail.com
Provide a facility that (1) can be used to stabilize certain plan choices
so that the planner cannot reverse course without authorization and
(2) can be used by knowledgeable users to insist on plan choices contrary
to what the planner believes best. In both cases, terrible outcomes are
possible: users should think twice and perhaps three times before
constraining the planner's ability to do as it thinks best; nevertheless,
there are problems that are much more easily solved with these facilities
than without them.
This patch takes the approach of analyzing a finished plan to produce
textual output, which we call "plan advice", that describes key
decisions made during plan; if that plan advice is provided during
future planning cycles, it will force those key decisions to be made in
the same way. Not all planner decisions can be controlled using advice;
for example, decisions about how to perform aggregation are currently
out of scope, as is choice of sort order. Plan advice can also be edited
by the user, or even written from scratch in simple cases, making it
possible to generate outcomes that the planner would not have produced.
Partial advice can be provided to control some planner outcomes but not
others.
Currently, plan advice is focused only on specific outcomes, such as
the choice to use a sequential scan for a particular relation, and not
on estimates that might contribute to those outcomes, such as a
possibly-incorrect selectivity estimate. While it would be useful to
users to be able to provide plan advice that affects selectivity
estimates or other aspects of costing, that is out of scope for this
commit.
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Dian Fay <di@nmfay.com>
Reviewed-by: Ajay Pal <ajay.pal.k@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
Test suites driven by pg_regress can use the following environment
variables for path substitutions since d1029bb5a2:
- PG_ABS_SRCDIR
- PG_ABS_BUILDDIR
- PG_DLSUFFIX
- PG_LIBDIR
These variables have never been documented, and they can be useful for
out-of-core code based on the options used by the pg_regress command
invoked by installcheck (or equivalent) to build paths to libraries for
various commands, like LOAD or CREATE FUNCTION.
Reviewed-by: Zhang Hu <kongbaik228@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/abDAWzHaesHLDFke@paquier.xyz
This commit replaces the synchronous ReadBufferExtended() loops done in
blbulkdelete() and blvacuumcleanup() with the streaming read equivalent,
to improve I/O efficiency during bloom index vacuum cleanup operations.
Under the same test conditions as 6c228755ad, the runtime is proving
to gain around 30% better, with most the benefits coming from a large
reduction of the IO operation based on the stats retrieved in the
scenarios run.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CABPTF7VrqfbcDXqGrdLQ2xaQ=K0RzExNuw6U_GGqzSJu32wfdQ@mail.gmail.com
This commit replace the synchronous ReadBufferExtended() loop done in
ginvacuumcleanup() with the streaming read equivalent, to improve I/O
efficiency during GIN index vacuum cleanup operations.
With dm_delay to emulate some latency and debug_io_direct=data to force
synchronous writes and force the read path to be exercised, the author
has noticed a 5x improvement in runtime, with a substantial reduction in
IO stats numbers. I have reproduced similar numbers while running
similar tests, with improvements becoming better with more tuples and
more pages manipulated.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CABPTF7VrqfbcDXqGrdLQ2xaQ=K0RzExNuw6U_GGqzSJu32wfdQ@mail.gmail.com
The planner has historically been unable to convert "x NOT IN (SELECT
y ...)" sublinks into anti-joins. This is because standard SQL
semantics for NOT IN require that if the comparison "x = y" returns
NULL, the "NOT IN" expression evaluates to NULL (effectively false),
causing the row to be discarded. In contrast, an anti-join preserves
the row if no match is found. Due to this semantic mismatch regarding
NULL handling, the conversion was previously considered unsafe.
However, if we can prove that neither side of the comparison can yield
NULL values, and further that the operator itself cannot return NULL
for non-null inputs, the behavior of NOT IN and anti-join becomes
identical. Enabling this conversion allows the planner to treat the
sublink as a first-class relation rather than an opaque SubPlan
filter. This unlocks global join ordering optimization and permits
the selection of the most efficient join algorithm based on cost,
often yielding significant performance improvements for large
datasets.
This patch verifies that neither side of the comparison can be NULL
and that the operator is safe regarding NULL results before performing
the conversion.
To verify operator safety, we require that the operator be a member of
a B-tree or Hash operator family. This serves as a proxy for standard
boolean behavior, ensuring the operator does not return NULL on valid
non-null inputs, as doing so would break index integrity.
For operand non-nullability, this patch makes use of several existing
mechanisms. It leverages the outer-join-aware-Var infrastructure to
verify that a Var does not come from the nullable side of an outer
join, and consults the NOT-NULL-attnums hash table to efficiently
verify schema-level NOT NULL constraints. Additionally, it employs
find_nonnullable_vars to identify Vars forced non-nullable by qual
clauses, and expr_is_nonnullable to deduce non-nullability for other
expression types.
The logic for verifying the non-nullability of the subquery outputs
was adapted from prior work by David Rowley and Tom Lane.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Reviewed-by: Zhang Mingli <zmlpostgres@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Discussion: https://postgr.es/m/CAMbWs495eF=-fSa5CwJS6B-BaEi3ARp0UNb4Lt3EkgUGZJwkAQ@mail.gmail.com
Unfortunately, in 30df61990c, I made GetPrivateRefCountEntrySlow() set a
wrong cache hint when moving entries from the hash table to the faster array.
There are no correctness concerns due to this, just an unnecessary loss of
performance.
Noticed while testing the index prefetching patch.
Discussion: https://postgr.es/m/CAH2-Wz=g=JTSyDB4UtB5su2ZcvsS7VbP+ZMvvaG6ABoCb+s8Lw@mail.gmail.com
This commit adds support for ALTER TABLE ALTER CONSTRAINT ... [NOT]
ENFORCED for CHECK constraints. Previously, only foreign key
constraints could have their enforceability altered.
When changing from NOT ENFORCED to ENFORCED, the operation not only
updates catalog information but also performs a full table scan in
Phase 3 to validate that existing data satisfies the constraint.
For partitioned tables and inheritance hierarchies, the operation
recurses to all child tables. When changing to NOT ENFORCED, we must
recurse even if the parent is already NOT ENFORCED, since child
constraints may still be ENFORCED.
Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Amul Sul <sulamul@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@cybertec.at>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CACJufxHCh_FU-FsEwsCvg9mN6-5tzR6H9ntn+0KUgTCaerDOmg@mail.gmail.com
The functions AlterConstrEnforceabilityRecurse and
ATExecAlterConstrEnforceability are being renamed to
AlterFKConstrEnforceabilityRecurse and ATExecAlterFKConstrEnforceability,
respectively.
The current alter constraint functions only handle Foreign Key constraints.
Renaming them to be more explicit about the constraint type is necessary;
otherwise, it will cause confusion when we later introduce the ability to alter
the enforceability of other constraints.
Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Amul Sul <sulamul@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Discussion: https://postgr.es/m/CACJufxHCh_FU-FsEwsCvg9mN6-5tzR6H9ntn+0KUgTCaerDOmg@mail.gmail.com
When we were updating hint bits with just a share lock MarkBufferDirtyHint()
had to use a non-standard order of operations, i.e. WAL log the buffer before
marking the buffer dirty. This was required because the lock level used to set
hints did not conflict with the lock level that was used to flush pages, which
would have allowed flushing the page out before the WAL record. The
non-standard order in turn required preventing the checkpoint from starting
between writing the WAL record and flushing out the page.
Now that setting hints and writing out buffers use share-exclusive, we can
revert back to the normal order of operations.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/5ubipyssiju5twkb7zgqwdr7q2vhpkpmuelxfpanetlk6ofnop@hvxb4g2amb2d
Due to the recent changes to use a share-exclusive mode for setting hint bits
and for flushing pages - instead of using share mode as before - a buffer
cannot be dirtied while the flush is ongoing. The reason we needed
JUST_DIRTIED was to handle the case where the buffer was dirtied while IO was
ongoing - which is not possible anymore.
Discussion: https://postgr.es/m/5ubipyssiju5twkb7zgqwdr7q2vhpkpmuelxfpanetlk6ofnop@hvxb4g2amb2d
GetVictimBuffer() rejects a victim buffer if it is from a bulkread
strategy ring and reusing it would require flushing WAL. Unlogged table
buffers can have fake LSNs (e.g. unlogged GiST pages) and calling
XLogNeedsFlush() on a fake LSN is meaningless.
This is a bit of future-proofing because currently the bulkread strategy
is not used for relations with fake LSNs.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Earlier version reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/flat/fmkqmyeyy7bdpvcgkheb6yaqewemkik3ls6aaveyi5ibmvtxnd%40nu2kvy5rq3a6
On platforms where we can read or write the whole LSN atomically, we do
not need to lock the buffer header to prevent torn LSNs. We can do this
only on platforms with PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY, and when the
pd_lsn field is properly aligned.
For historical reasons the PageXLogRecPtr was defined as a struct with
two uint32 fields. This replaces it with a single uint64 value, to make
the intent clearer. To prevent issues with weak typedefs the value is
still wrapped in a struct.
This also adjusts heapfuncs() in pageinspect, to ensure proper alignment
when reading the LSN from a page on alignment-sensitive hardware.
Idea by Andres Freund. Initial patch by Andreas Karlsson, improved by
Peter Geoghegan. Minor tweaks by me.
Author: Andreas Karlsson <andreas@proxel.se>
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/b6610c3b-3f59-465a-bdbb-8e9259f0abc4@proxel.se
With io_method=worker, there's a single I/O submission queue. With
enough workers, the backends and workers may end up spending a lot of
time competing for the AioWorkerSubmissionQueueLock lock. This can
happen with workloads that keep the queue full, in which case it's
impossible to add requests to the queue. Increasing the number of I/O
workers increases the pressure on the lock, worsening the issue.
This change improves the situation in two ways:
* If AioWorkerSubmissionQueueLock can't be acquired without waiting,
the I/O is performed synchronously (as if the queue was full).
* When an entry can't be added to a full queue, stop trying to add more
entries. All remaining entries are handled as synchronous I/O.
The regression was reported by Alexandre Felipe. Investigation and
patch by me, based on an idea by Andres Freund.
Reported-by: Alexandre Felipe <o.alexandre.felipe@gmail.com>
Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAE8JnxOn4+xUAnce+M7LfZWOqfrMMxasMaEmSKwiKbQtZr65uA@mail.gmail.com
This fixes two bugs in commit 1887d822f1.
First, if we are using the fallback C++ implementation of typeof, then
we need to include the C++ header <type_traits> for
std::remove_reference_t. This header is also likely to be used for
other C++ implementations of type tricks, so we'll put it into the
global includes.
Second, for the case that the C compiler supports typeof in a spelling
that is not "typeof" (for example, __typeof__), then we need to #undef
typeof in the C++ section to avoid warnings about duplicate macro
definitions.
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/flat/92f9750f-c7f6-42d8-9a4a-85a3cbe808f3%40eisentraut.org
table_open() is a wrapper around relation_open() that checks that the
relkind is table-like and gives a user-facing error message if not.
It is best used in directly user-facing areas to check that the user
used the right kind of command for the relkind. In internal uses
where the relkind was previously checked from the user's perspective,
table_open() is not necessary and might even be confusing if it were
to give out-of-context error messages.
In rewriteHandler.c, there were several such table_open() calls, which
this changes to relation_open(). This currently doesn't make a
difference, but there are plans to have other relkinds that could
appear in the rewriter but that shouldn't be accessible via
table-specific commands, and this clears the way for that.
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/6d3fef19-a420-4e11-8235-8ea534bf2080%40eisentraut.org
Discussion: https://www.postgresql.org/message-id/flat/a855795d-e697-4fa5-8698-d20122126567@eisentraut.org
At the moment hint bits can be set with just a share lock on a page (and,
until 45f658dacb, in one case even without any lock). Because of this we need
to copy pages while writing them out, as otherwise the checksum could be
corrupted.
The need to copy the page is problematic to implement AIO writes:
1) Instead of just needing a single buffer for a copied page we need one for
each page that's potentially undergoing I/O
2) To be able to use the "worker" AIO implementation the copied page needs to
reside in shared memory
It also causes problems for using unbuffered/direct-IO, independent of AIO:
Some filesystems, raid implementations, ... do not tolerate the data being
written out to change during the write. E.g. they may compute internal
checksums that can be invalidated by concurrent modifications, leading e.g. to
filesystem errors (as the case with btrfs).
It also just is plain odd to allow modifications of buffers that are just
share locked.
To address these issues, this commit changes the rules so that modifications
to pages are not allowed anymore while holding a share lock. Instead the new
share-exclusive lock (introduced in fcb9c977aa) allows at most one backend to
modify a buffer while other backends have the same page share locked. An
existing share-lock can be upgraded to a share-exclusive lock, if there are no
conflicting locks. For that BufferBeginSetHintBits()/BufferFinishSetHintBits()
and BufferSetHintBits16() have been introduced.
To prevent hint bits from being set while the buffer is being written out,
writing out buffers now requires a share-exclusive lock.
The use of share-exclusive to gate setting hint bits means that from now on
only one backend can set hint bits at a time. To allow multiple backends to
set hint bits would require more complicated locking: For setting hint bits
we'd need to store the count of backends currently setting hint bits and we
would need another lock-level for I/O conflicting with the lock-level to set
hint bits. Given that the share-exclusive lock for setting hint bits is only
held for a short time, that backends would often just set the same hint bits
and that the cost of occasionally not setting hint bits in hotly accessed
pages is fairly low, this seems like an acceptable tradeoff.
The biggest change to adapt to this is in heapam. To avoid performance
regressions for sequential scans that need to set a lot of hint bits, we need
to amortize the cost of BufferBeginSetHintBits() for cases where hint bits are
set at a high frequency. To that end HeapTupleSatisfiesMVCCBatch() uses the
new SetHintBitsExt(), which defers BufferFinishSetHintBits() until all hint
bits on a page have been set. Conversely, to avoid regressions in cases where
we can't set hint bits in bulk (because we're looking only at individual
tuples), use BufferSetHintBits16() when setting hint bits without batching.
Several other places also need to be adapted, but those changes are
comparatively simpler.
After this we do not need to copy buffers to write them out anymore. That
change is done separately however.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf%40gcnactj4z56m
This commit replaces the per-page buffer read look in blgetbitmap() with
a reading stream, to improve scan efficiency, particularly useful for
large bloom indexes. Some benchmarking with a large number of rows has
shown a very nice improvement in terms of runtime and IO read reduction
with test cases up to 10M rows for a bloom index scan.
For the io_uring method, The author has reported a 3x in runtime with
io_uring while I was at close to a 7x. For the worker method with 3
workers, the author has reported better numbers than myself in runtime,
with the reduction in IO stats being appealing for all the cases
measured.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CABPTF7VrqfbcDXqGrdLQ2xaQ=K0RzExNuw6U_GGqzSJu32wfdQ@mail.gmail.com
Commit 17f51ea818 introduced a new pendingRecoveryConflicts field in
PGPROC to replace the various ProcSignals. The new field was cleared
in ProcArrayEndTransaction(), which makes sense for conflicts with
e.g. locks or buffer pins which are gone at end of transaction. But it
is not appropriate for conflicts on a database, or a logical slot.
Because of this, the 035_standby_logical_decoding.pl test was
occasionally getting stuck in the buildfarm. It happens if the startup
process signals recovery conflict with the logical slot just when the
walsender process using the slot calls ProcArrayEndTransaction().
To fix, don't clear pendingRecoveryConflicts in
ProcArrayEndTransaction(). We could still clear certain conflict
flags, like conflicts on locks, but we didn't try to do that before
commit 17f51ea818 either.
In the passing, fix a misspelled comment, and make
InitAuxiliaryProcess() to also clear pendingRecoveryConflicts. I don't
think aux processes can have recovery conflicts, but it seems best to
initialize the field and keep InitAuxiliaryProcess() as close to
InitProcess() as possible.
Analyzed-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://www.postgresql.org/message-id/3e07149d-060b-48a0-8f94-3d5e4946ae45@gmail.com
Previously WAL records that froze tuples used OldestXmin as the snapshot
conflict horizon, or the visibility cutoff if the page would become
all-frozen. Both are newer than (or equal to) the newst XID actually
frozen on the page.
Track the newest XID that will be frozen and use that as the snapshot
conflict horizon instead. This yields an older horizon resulting in
fewer query cancellations on standbys.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAAKRu_bbaUV8OUjAfVa_iALgKnTSfB4gO3jnkfpcFgrxEpSGJQ%40mail.gmail.com
REPACK absorbs the functionality of VACUUM FULL and CLUSTER in a single
command. Because this functionality is completely different from
regular VACUUM, having it separate from VACUUM makes it easier for users
to understand; as for CLUSTER, the term is heavily overloaded in the
IT world and even in Postgres itself, so it's good that we can avoid it.
We retain those older commands, but de-emphasize them in the
documentation, in favor of REPACK; the difference between VACUUM FULL
and CLUSTER (namely, the fact that tuples are written in a specific
ordering) is neatly absorbed as two different modes of REPACK.
This allows us to introduce further functionality in the future that
works regardless of whether an ordering is being applied, such as (and
especially) a concurrent mode.
Author: Antonin Houska <ah@cybertec.at>
Reviewed-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/82651.1720540558@antos
Discussion: https://postgr.es/m/202507262156.sb455angijk6@alvherre.pgsql
Previously heap_inplace_update_and_unlock() used an operation order similar to
MarkBufferDirty(), to reduce the number of different approaches used for
updating buffers. However, in an upcoming patch, MarkBufferDirtyHint() will
switch to using the update protocol used by most other places (enabled by hint
bits only being set while holding a share-exclusive lock).
Luckily it's pretty easy to adjust heap_inplace_update_and_unlock(). As a
comment already foresaw, we can use the normal order, with the slight change
of updating the buffer contents after WAL logging.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/5ubipyssiju5twkb7zgqwdr7q2vhpkpmuelxfpanetlk6ofnop@hvxb4g2amb2d
There's no need for a StringInfo when all you want is a string
being constructed in a single pass.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reported-by: Ranier Vilela <ranier.vf@gmail.com>
Reviewed-by: Yang Yuanzhuo <1197620467@qq.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CAEudQAq2wyXZRdsh+wVHcOrungPU+_aQeQU12wbcgrmE0bQovA@mail.gmail.com
Commit dae761a added initialization of some BrinBuildState fields
in initialize_brin_buildstate(). Later, commit b437571 inadvertently
added the same initialization again.
This commit removes that redundant initialization. No behavioral
change is intended.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Shinya Kato <shinya11.kato@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2nmrca6-9SNChDvRYD6+r==fs9qg5J93kahS7vpoq8QVg@mail.gmail.com
A list of expressions with optional AS-labels is useful in a few
different places. Right now, this is available as xml_attribute_list
because it was first used in the XMLATTRIBUTES construct, but it is
already used elsewhere, and there are other possible future uses. To
reduce possible confusion going forward, rename it to
labeled_expr_list (like existing expr_list plus ColLabel).
Discussion: https://www.postgresql.org/message-id/flat/a855795d-e697-4fa5-8698-d20122126567@eisentraut.org
Up until now, the only way for a loadable module to disable the use of a
particular index was to use build_simple_rel_hook (or, previous to
yesterday's commit, get_relation_info_hook) to remove it from the index
list. While that works, it has some disadvantages. First, the index
becomes invisible for all purposes, and can no longer be used for
optimizations such as self-join elimination or left join removal, which
can severely degrade the resulting plan.
Second, if the module attempts to compel the use of a certain index
by removing all other indexes from the index list and disabling
other scan types, but the planner is unable to use the chosen index
for some reason, it will fall back to a sequential scan, because that
is only disabled, whereas the other indexes are, from the planner's
point of view, completely gone. While this situation ideally shouldn't
occur, it's hard for a loadable module to be completely sure whether
the planner will view a certain index as usable for a certain query.
If it isn't, it may be better to fall back to a scan using a disabled
index rather than falling back to an also-disabled sequential scan.
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA%2BTgmoYS4ZCVAF2jTce%3DbMP0Oq_db_srocR4cZyO0OBp9oUoGg%40mail.gmail.com
Crash recovery started without a backup_label previously crashed with a
PANIC if the checkpoint record could not be found. This commit lowers
the report generated to be a FATAL instead.
With recovery methods being more imaginative these days, this should
provide more flexibility when handling PostgreSQL recovery processing in
the event of a driver error, similarly to 15f68cebdc. An extra
benefit of this change is that it becomes possible to add a test to
check that a FATAL is hit with an expected error message pattern. With
the recovery code becoming more complicated over the last couple of
years, I suspect that this will be benefitial to cover in the long-term.
The original PANIC behavior has been introduced in the early days of
crash recovery, as of 4d14fe0048 (PANIC did not exist yet, the code
used STOP).
Author: Nitin Jadhav <nitinjadhavpostgres@gmail.com>
Discussion: https://postgr.es/m/CAMm1aWZbQ-Acp_xAxC7mX9uZZMH8+NpfepY9w=AOxbBVT9E=uA@mail.gmail.com
What should be used is not "volatile foo *ptr" but "foo *volatile ptr",
The incorrect (former) style means that what the pointer variable points
to is volatile. The correct (latter) style means that the pointer
variable itself needs to be treated as volatile. The latter style is
required to ensure a consistent treatment of these variables after a
longjmp with the TRY/CATCH blocks.
Some casts can be removed thanks to this change.
Issue introduced by 2e94721747, so no backpatch is required. A
similar set of issues has been fixed in 93001888d8 for contrib/xml2/.
Author: ChangAo Chen <cca5507@qq.com>
Discussion: https://postgr.es/m/tencent_5BE8DAD985EE140ED62EA728C8D4E1311F0A@qq.com
This commit makes use of the function added by commit b2898baaf7
for these applications' handling of conflicting options. It
doesn't fix any bugs, but it does trim several lines of code.
Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Steven Niu <niushiji@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CACJufxHDYn%2B3-2jR_kwYB0U7UrNP%2B0EPvAWzBBD5EfUzzr1uiw%40mail.gmail.com
For a long time, PostgreSQL has had a get_relation_info_hook which
plugins can use to editorialize on the information that
get_relation_info obtains from the catalogs. However, this hook is
only called for baserels of type RTE_RELATION, and there is
potential utility in a similar call back for other types of
RTEs. This might have had utility even before commit
4020b370f2 added pgs_mask to
RelOptInfo, but it certainly has utility now.
So, move the callback up one level, deleting get_relation_info_hook and
adding build_simple_rel_hook instead. The new callback is called just
slightly later than before and with slightly different arguments, but it
should be fairly straightforward to adjust existing code that currently
uses get_relation_info_hook: the values previously available as
relationObjectId and inhparent are now available via rte->relid and
rte->inh, and calls where rte->rtekind != RTE_RELATION can be ignored if
desired.
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA%2BTgmoYg8uUWyco7Pb3HYLMBRQoO6Zh9hwgm27V39Pb6Pdf%3Dug%40mail.gmail.com
Previously, the comments stated that there was no purpose to considering
startup cost for partial paths, but this is not the case: it's perfectly
reasonable to want a fast-start path for a plan that involves a LIMIT
(perhaps over an aggregate, so that there is enough data being processed
to justify parallel query but yet we don't want all the result rows).
Accordingly, rewrite add_partial_path and add_partial_path_precheck to
consider startup costs. This also fixes an independent bug in
add_partial_path_precheck: commit e222534679
failed to update it to do anything with the new disabled_nodes field.
That bug fix is formally separate from the rest of this patch and could
be committed separately, but I think it makes more sense to fix both
issues together, because then we can (as this commit does) just make
add_partial_path_precheck do the cost comparisons in the same way as
compare_path_costs_fuzzily, which hopefully reduces the chances of
ending up with something that's still incorrect.
This patch is based on earlier work on this topic by Tomas Vondra,
but I have rewritten a great deal of it.
Co-authored-by: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Tomas Vondra <tomas@vondra.me>
Discussion: http://postgr.es/m/CA+TgmobRufbUSksBoxytGJS1P+mQY4rWctCk-d0iAUO6-k9Wrg@mail.gmail.com
When I (rhaas) wrote the WAL summarizer code, I incorrectly believed
that XLOG_SMGR_TRUNCATE truncates all forks to the same length. In
fact, what other parts of the code do is compute the truncation length
for the FSM and VM forks from the truncation length used for the main
fork. But, because I was confused, I coded the WAL summarizer to set the
limit block for the VM fork to the same value as for the main fork.
(Incremental backup always copies FSM forks in full, so there is no
similar issue in that case.)
Doing that doesn't directly cause any data corruption, as far as I can
see. However, it does create a serious risk of consuming a large amount
of extra disk space, because pg_combinebackup's reconstruct.c believes
that the reconstructed file should always be at least as long as the
limit block value. We might want to be smarter about that at some point
in the future, because it's always safe to omit all-zeroes blocks at the
end of the last segment of a relation, and doing so could save disk
space, but the current algorithm will rarely waste enough disk space to
worry about unless we believe that a relation has been truncated to a
length much longer than its actual length on disk, which is exactly what
happens as a result of the problem mentioned in the previous paragraph.
To fix, create a new visibilitymap helper function and use it to include
the right limit block in the summary files. Incremental backups taken
with existing summary files will still have this issue, but this should
improve the situation going forward.
Diagnosed-by: Oleg Tkachenko <oatkachenko@gmail.com>
Diagnosed-by: Amul Sul <sulamul@gmail.com>
Discussion: http://postgr.es/m/CAAJ_b97PqG89hvPNJ8cGwmk94gJ9KOf_pLsowUyQGZgJY32o9g@mail.gmail.com
Discussion: http://postgr.es/m/6897DAF7-B699-41BF-A6FB-B818FCFFD585%40gmail.com
Backpatch-through: 17
Commit 2cd40adb85 added the IF NOT EXISTS option to ALTER TABLE ADD COLUMN.
This also enabled IF NOT EXISTS for ALTER FOREIGN TABLE ADD COLUMN,
but the ALTER FOREIGN TABLE documentation was not updated to mention it.
This commit updates the documentation to describe the IF NOT EXISTS option for
ALTER FOREIGN TABLE ADD COLUMN.
While updating that section, also this commit clarifies that the COLUMN keyword
is optional in ALTER FOREIGN TABLE ADD/DROP COLUMN. Previously, part of
the documentation could be read as if COLUMN were required.
This commit adds regression tests covering these ALTER FOREIGN TABLE syntaxes.
Backpatch to all supported versions.
Suggested-by: Fujii Masao <masao.fujii@gmail.com>
Author: Chao Li <lic@highgo.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwFk=rrhrwGwPtQxBesbT4DzSZ86Q3ftcwCu3AR5bOiXLw@mail.gmail.com
Backpatch-through: 14
When make_new_segment() creates an odd-sized segment, the pagemap was
only sized based on a number of usable_pages entries, forgetting that a
segment also contains metadata pages, and that the FreePageManager uses
absolute page indices that cover the entire segment. This
miscalculation could cause accesses to pagemap entries to be out of
bounds. During subsequent reuse of the allocated segment, allocations
landing on pages with indices higher than usable_pages could cause
out-of-bounds pagemap reads and/or writes. On write, 'span' pointers
are stored into the data area, corrupting the allocated objects. On
read (aka during a dsa_free), garbage is interpreted as a span pointer,
typically crashing the server in dsa_get_address().
The normal geometric path correctly sizes the pagemap for all pages in
the segment. The odd-sized path needs to do the same, but it works
forward from usable_pages rather than backward from total_size.
This commit fixes the sizing of the odd-sized case by adding pagemap
entries for the metadata pages after the initial metadata_bytes
calculation, using an integer ceiling division to compute the exact
number of additional entries needed in one go, avoiding any iteration in
the calculation.
An assertion is added in the code path for odd-sized segments, ensuring
that the pagemap includes the metadata area, and that the result is
appropriately sized.
This problem would show up depending on the size requested for the
allocation of a DSA segment. The reporter has noticed this issue when a
parallel hash join makes a DSA allocation large enough to trigger the
odd-sized segment path, but it could happen for anything that does a DSA
allocation.
A regression test is added to test_dsa, down to v17 where the test
module has been introduced. This adds a set of cheap tests to check the
problem, the new assertion being useful for this purpose. Sami has
proposed a test that took a longer time than what I have done here; the
test committed is faster and good enough to check the odd-sized
allocation path.
Author: Paul Bunn <paul.bunn@icloud.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/044401dcabac$fe432490$fac96db0$@icloud.com
Backpatch-through: 14
The tests of stats_import.sql include a set of queries to do
differential checks of the three statistics catalog relations, based on
the comparison of a source relation and a target relation, used for the
copy of the stats data with the restore functions:
- pg_statistic
- pg_stats_ext
- pg_stats_ext_exprs
This commit refactors the tests to reduce the bloat of such differential
queries, by creating a set of objects that make the differential queries
smaller:
- View for a base relation type.
- First function to retrieve stats data, that returns a type based on
the view previously created.
- Second function that checks the difference, based on two calls of the
first function.
This change leads to a nice reduction of stats_import.sql, with a larger
effect on the output file.
While on it, this adds some sanity checks for the three catalogs, to
warn developers that the stats import facilities may need to be updated
if any of the three catalogs change. These are rare in practice, see
918eee0c49 as one example. Another stylistic change is the use of the
extended output format for the differential queries, so as we avoid long
lines of output if a diff is caught.
Author: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/CADkLM=eEhxJpSUP+eC=eMGZZsVOpnfKDvVkuCbsFg9CajYwDsA@mail.gmail.com
We were testing the truth value of the array of booleans (which is
always true) instead of the boolean element specific to the affected
table column.
This causes a binary-upgrade dump fail to omit the name of a constraint;
that is, the correct constraint name is always printed, even when it's
not needed. The affected case is a binary-upgrade dump of a not-null
constraint in an inherited column, which must in addition have no
comment.
Another point is that in order for this to make a difference, the
constraint must have the default name in the child table. That is, the
constraint must have been created _in the parent table_ with the name
that it would have in the child table, like so:
CREATE TABLE parent (a int CONSTRAINT child_a_not_null NOT NULL);
CREATE TABLE child () INHERITS (parent);
Otherwise, the correct name must be printed by binary-upgrade pg_dump
anyway, since it wouldn't match the name produced at the parent.
Moreover, when it does hit, the pre-18-compatibility code (which has to
work with a constraint that has no name) gets involved and uses the
UPDATE on pg_constraint using the conkey instead of column name ... and
so everything ends up working correctly AFAICS.
I think it might cause a problem if the table and column names are
overly long, but I didn't want to spend time investigating further.
Still, it's wrong code, and static analyzers have twice complained about
it, so fix it by adding the array index accessor that was obviously
meant.
Reported-by: Ranier Vilela <ranier.vf@gmail.com>
Reported-by: George Tarasov <george.v.tarasov@gmail.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/CAEudQAo7ah=4TDheuEjtb0dsv6bHoK7uBNqv53Tsub2h-xBSJw@mail.gmail.com
Discussion: https://postgr.es/m/f3029f25-acc9-4cb9-a74f-fe93bcfb3a27@gmail.com
For the libpq-oauth module to eventually make use of the
PGoauthBearerRequest API, it needs some additional functionality: the
derived Issuer ID for the authorization server needs to be provided, and
error messages need to be built without relying on PGconn internals.
These features seem useful for application hooks, too, so that they
don't each have to reinvent the wheel.
The original plan was for additions to PGoauthBearerRequest to be made
without a version bump to the PGauthData type. Applications would simply
check a LIBPQ_HAS_* macro at compile time to decide whether they could
use the new features. That theoretically works for applications linked
against libpq, since it's not safe to downgrade libpq from the version
you've compiled against.
We've since found that this strategy won't work for plugins, due to a
complication first noticed during the libpq-oauth module split: it's
normal for a plugin on disk to be *newer* than the libpq that's loading
it, because you might have upgraded your installation while an
application was running. (In other words, a plugin architecture causes
the compile-time and run-time dependency arrows to point in opposite
directions, so plugins won't be able to rely on the LIBPQ_HAS_* macros
to determine what APIs are available to them.)
Instead, extend the original PGoauthBearerRequest (now retroactively
referred to as "v1" in the code) with a v2 subclass-style struct. When
an application implements and accepts PQAUTHDATA_OAUTH_BEARER_TOKEN_V2,
it may safely cast the base request pointer it receives in its callbacks
to v2 in order to make use of the new functionality. libpq will query
the application for a v2 hook first, then v1 to maintain backwards
compatibility, before giving up and using the builtin flow.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CAOYmi%2BmrGg%2Bn_X2MOLgeWcj3v_M00gR8uz_D7mM8z%3DdX1JYVbg%40mail.gmail.com
pg_dumpall is missing checks for some conflicting options,
including those passed through to pg_dump. To fix, introduce a
new function that checks whether mutually exclusive options are
set, and use that in pg_dumpall. A similar change could likely be
made for pg_dump and pg_restore, but that is left as a future
exercise.
This is arguably a bug fix, but since this might break existing
scripts, no back-patch for now.
Author: Jian He <jian.universality@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Wang Peng <215722532@qq.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CACJufxFf5%3DwSv2MsuO8iZOvpLZQ1-meAMwhw7JX5gNvWo5PDug%40mail.gmail.com
The idea is to encourage the use of newer routines across the tree, as
these offer stronger type-safety guarantees than raw palloc().
Similar work has been done in commits 1b105f9472, 0c3c5c3b06,
31d3847a37, and 4f7dacc5b8. This commit extends those changes to
more locations within src/backend/replication/logical/.
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAHut+Pv4N7Vpxo18+NAR1r9RGvR8b0BtwTkoeCE2PfFoXgmR6A@mail.gmail.com
Until now, substitute_grouped_columns and its predecessor
check_ungrouped_columns intentionally did not cope with references
to GROUP BY expressions (anything more complex than a Var) within
subqueries of the query having GROUP BY. Because they didn't try to
match subexpressions of subqueries to the GROUP BY list, they'd drill
down to raw Vars of the grouping level and then fail with "subquery
uses ungrouped column from outer query". There have been remarkably
few complaints about this deficiency, so nobody ever did anything
about it.
The reason for not wanting to deal with it is that within a subquery,
Vars will have varlevelsup different from zero and will thus not be
equal() to the expressions seen in the outer query. We recognized
this at least as far back as 96ca8ffeb, although I think the comment
I added about it then was just documenting a pre-existing deficiency.
It looks like at the time, the solutions I considered were
(1) write a version of equal() that permits an offset in varlevelsup,
or (2) dynamically apply IncrementVarSublevelsUp at each
subexpression. (1) would require an amount of new code that seems
rather out of proportion to the benefit, while (2) would add an
exponential amount of cost to the matching process. But rethinking
it now, what seems attractive is (3) apply IncrementVarSublevelsUp to
the groupingClause list not the subexpressions, and do so only once
per subquery depth level. Then we can still use plain equal() to
check for matches, and we're not incurring cost proportional to some
power of the subquery's complexity.
This patch continues to use the old logic when the GROUP BY list is
all Vars. We could discard the special comparison logic for that and
always do it the more general way, but that would be a good deal
slower. (Micro-benchmarking just parse analysis suggests it's about
50% slower than the Vars-only path. But we've not heard complaints
about the speed of matching within the main query, so I doubt that
applying the same matching logic within subqueries will be a problem.)
The lack of complaints suggests strongly that this is a very minority
use-case, so I don't want to make the typical case slower to fix it.
While testing that, I was surprised to discover a nearby bug:
GROUPING() within a subquery fails to match GROUP BY Vars that are
join alias Vars. It tries to apply flatten_join_alias_vars to make
such cases work, but that fails to work inside a subquery because
varlevelsup is wrong. Therefore, this patch invents a new entry point
flatten_join_alias_for_parser() that allows specification of a
sublevels_up offset. (It seems cleaner to give the parser its own
entry point rather than abuse the planner's conventions even further.)
While this is pretty clearly a bug fix, I'm hesitant to take the risk
of back-patching, seeing that the existing behavior has stood for so
long with so few complaints. Maybe we can reconsider once this patch
has baked awhile in master.
Reported-by: PALAYRET Jacques <jacques.palayret@meteo.fr>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/531183.1772058731@sss.pgh.pa.us
Allow CREATE SUBSCRIPTION to accept a foreign server using the SERVER
clause instead of a raw connection string using the CONNECTION clause.
* Enables a user with sufficient privileges to create a subscription
using a foreign server by name without specifying the connection
details.
* Integrates with user mappings (and other FDW infrastructure) using
the subscription owner.
* Provides a layer of indirection to manage multiple subscriptions
to the same remote server more easily.
Also add CREATE FOREIGN DATA WRAPPER ... CONNECTION clause to specify
a connection_function. To be eligible for a subscription, the foreign
server's foreign data wrapper must specify a connection_function.
Add connection_function support to postgres_fdw, and bump postgres_fdw
version to 1.3.
Bump catversion.
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/61831790a0a937038f78ce09f8dd4cef7de7456a.camel@j-davis.com
wait_event.h itself includes wait_event_types.h, which is a generated
file, so it's nice that we can avoid compiling >10% of the tree just
because that file is regenerated.
To avoid breaking too many third-party modules, we now #include
utils/wait_classes.h in storage/latch.h. Then, the very common case
of doing
WaitLatch(..., PG_WAIT_EXTENSION)
continues to work by including just storage/latch.h. (I didn't try to
determine how many modules would actually break if we don't do this, but
this seems a convenient and low-impact measure.)
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/202602181214.gcmhx2vhlxzp@alvherre.pgsql
Use a different way to write StaticAssertExpr() that does not require
the GCC extension statement expressions.
For C, we put the static_assert into a struct. This appears to be a
common approach.
We still need to keep the fallback implementation to support buggy
MSVC < 19.33.
For C++, we put it into a lambda expression. (The C approach doesn't
work; it's not permitted to define a new type inside sizeof.)
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/flat/5fa3a9f5-eb9a-4408-9baf-403d281f8b10%40eisentraut.org
Previously, when logical replication was running, shutting down
the publisher could cause the logical walsender to enter a busy loop
and prevent the publisher from completing shutdown.
During shutdown, the logical walsender waits for all pending WAL
to be written out. However, some WAL records could remain unflushed,
causing the walsender to wait indefinitely.
The issue occurred because the walsender used XLogBackgroundFlush() to
flush pending WAL. This function does not guarantee that all WAL is written.
For example, WAL generated by a transaction without an assigned
transaction ID that aborts might not be flushed.
This commit fixes the bug by making the logical walsender call XLogFlush()
instead, ensuring that all pending WAL is written and preventing
the busy loop during shutdown.
Backpatch to all supported versions.
Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAO6_Xqo3co3BuUVEVzkaBVw9LidBgeeQ_2hfxeLMQcXwovB3GQ@mail.gmail.com
Backpatch-through: 14
Commit fd7d7b7191 added regression tests to verify recovery_target_timeline
settings. To confirm that invalid values are rejected, those tests started the
server with an invalid setting and then verified that startup failed. While
functionally correct, this approach was expensive because it required
setting up and starting the server for each test case.
This commit updates the tests for recovery_target_timeline to use
the simpler approach introduced by commit bffd7130 for recovery_target_xid,
using ALTER SYSTEM SET to verify that invalid settings are rejected.
This avoids the need to set up and start the server when checking invalid
recovery_target_timeline values.
Author: David Steele <david@pgbackrest.org>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwG44vZbSoBmg076G+xkR6n=Tj2=q+fVkfP7yEsyF1daFA@mail.gmail.com
heap_freetuple() is a thin wrapper doing a pfree(), and the function
import_pg_statistic(), introduced by ba97bf9cb7, had the idea to call
directly pfree() rather than the "dedicated" heap tuple routine.
upsert_pg_statistic_ext_data already uses heap_freetuple(). This code
is harmless as-is, but let's be consistent across the board.
Reported-by: Yonghao Lee <yonghao_lee@qq.com>
Discussion: https://postgr.es/m/tencent_CA1315EE8FB9C62F742C71E95FAD72214205@qq.com
The commit 0d2d4a0ec3 allowed pg_sync_replication_slots() to retry sync
attempts, but missed a case, when WAL prior to a slot's
confirmed_flush_lsn is not yet flushed locally.
By changing the elevel from ERROR to LOG, we allow the sync loop to
continue. This provides the opportunity for the slot to be synchronized
once the standby catches up with the necessary WAL.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAFPTHDZAA+gWDntpa5ucqKKba41=tXmoXqN3q4rpjO9cdxgQrw@mail.gmail.com
This commit introduces pg_stat_recovery, that exposes at SQL level the
state of recovery as tracked by XLogRecoveryCtlData in shared memory,
maintained by the startup process. This new view includes the following
fields, that are useful for monitoring purposes on a standby, once it
has reached a consistent state (making the execution of the SQL function
possible):
- Last-successfully replayed WAL record LSN boundaries and its timeline.
- Currently replaying WAL record end LSN and its timeline.
- Current WAL chunk start time.
- Promotion trigger state.
- Timestamp of latest processed commit/abort.
- Recovery pause state.
Some of this data can already be recovered from different system
functions, but not all of it. See pg_get_wal_replay_pause_state or
pg_last_xact_replay_timestamp. This new view offers a stronger
consistency guarantee, by grabbing the recovery state for all fields
through one spinlock acquisition.
The system view relies on a new function, called pg_stat_get_recovery().
Querying this data requires the pg_read_all_stats privilege. The view
returns no rows if the node is not in recovery.
This feature originates from a suggestion I have made while discussion
the addition of a CONNECTING state to the WAL receiver's shared memory
state, because we lacked access to some of the state data. The author
has taken the time to implement it, so thanks for that.
Bump catalog version.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/CABPTF7W+Nody-+P9y4PNk37-QWuLpfUrEonHuEhrX+Vx9Kq+Kw@mail.gmail.com
Discussion: https://postgr.es/m/aW13GJn_RfTJIFCa@paquier.xyz
This refactoring is going to be useful in an upcoming commit, to avoid
some code duplication with the function pg_get_wal_replay_pause_state(),
that returns a string for the recovery pause state.
Refactoring opportunity noticed while hacking on a different patch.
Discussion: https://postgr.es/m/CABPTF7W+Nody-+P9y4PNk37-QWuLpfUrEonHuEhrX+Vx9Kq+Kw@mail.gmail.com
Up to now, to create such a function, one had to make a pg_proc.dat
entry and then modify it with GRANT/REVOKE commands, which we put in
system_functions.sql. That seems a little ugly though, because it
violates the idea of having a single source of truth about the initial
contents of pg_proc, and it results in leaving dead rows in the
initial contents of pg_proc.
This patch improves matters by allowing aclitemin to work during early
bootstrap, before pg_authid has been loaded. On the same principle
that we use for early access to pg_type details, put a table of known
built-in role names into bootstrap.c, and use that in bootstrap mode.
To create a built-in function with a non-default ACL, one should write
the desired ACL list in its pg_proc.dat entry, using a simplified
version of aclitemout's notation: omit the grantor (if it is the
bootstrap superuser, which it pretty much always should be) and spell
the bootstrap superuser's name as POSTGRES, similarly to the notation
used elsewhere in src/include/catalog. This results in entries like
proacl => '{POSTGRES=X,pg_monitor=X}'
which shows that we've revoked public execute permissions and instead
granted that to pg_monitor.
In addition to fixing up pg_proc.dat entries, I got rid of some
role grants that had been stuck into system_functions.sql,
and instead put them into a new file pg_auth_members.dat;
that seems like a far less random place to put the information.
The correctness of the data changes can be verified by comparing the
initial contents of pg_proc and pg_auth_members before and after.
pg_proc should match exactly, but the OID column of pg_auth_members
will probably be different because those OIDs now get assigned a
little earlier in bootstrap. (I forced a catversion bump out of
caution, but it wasn't really necessary.)
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/183292bb-4891-4c96-a3ca-e78b5e0e1358@dunslane.net
Do not replace the target string unless the occurrence is surrounded
by whitespace or line start/end. This avoids potential false match
to a substring of a field. While we've not had trouble with that
up to now, the next patch creates hazards of false matches to
POSTGRES within an ACL field.
There is one call site that needs adjustment, as it was presuming
it could write "::1" and have that match "::1/128". For all the
others, this restriction is okay and strictly safer.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/183292bb-4891-4c96-a3ca-e78b5e0e1358@dunslane.net
The PruneState had members called "all_visible" and "all_frozen" which
reflect not the current state of the page but the state it could be in
once pruning and freezing have been executed. These are then saved in
the PruneFreezeResult so the caller can set the VM accordingly.
Prefix the PruneState members as well as the corresponsding
PruneFreezeResult members with "set_" to clarify that they represent the
proposed state of the all-visible and all-frozen bits for a heap page in
the visibility map, not the current state.
Author: Melanie Plageman <melanieplageman@gmail.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
This is similar to the other page accessors in bufpage.h. It improves
readability and avoids long lines.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/BD8B69E7-26D8-4706-9164-597C6AE57812%40gmail.com
heap_page_prune_and_freeze() and many of its helpers use the heap
buffer, block number, and page. Other helpers took the heap page and
didn't use it. Initializing these values once during
prune_freeze_setup() simplifies the helpers' interfaces and avoids any
repeated calls to BufferGetBlockNumber() and BufferGetPage().
While updating PruneState, also reorganize its fields to make the layout
and member documentation more consistent.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/BD8B69E7-26D8-4706-9164-597C6AE57812%40gmail.com
It looks like whoever wrote the astreamer (nee bbstreamer) code
thought that pg_log_error() is equivalent to elog(ERROR), but
it's not; it just prints a message. So all these places tried to
continue on after a compression or decompression error return,
with the inevitable result being garbage output and possibly
cascading error messages. We should use pg_fatal() instead.
These error conditions are probably pretty unlikely in practice,
which no doubt accounts for the lack of field complaints.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/1531718.1772644615@sss.pgh.pa.us
Backpatch-through: 15
The oauth_validator tests don't currently support HTTPS, which makes
testing PGOAUTHCAFILE difficult. Add a localhost certificate to
src/test/ssl and make use of it in oauth_server.py.
In passing, explain the hardcoded use of IPv4 in our issuer identifier,
after intermittent failures on NetBSD led to commit 8d9d5843b. (The new
certificate is still set up for IPv6, to make it easier to improve that
behavior in the future.)
Patch by Jonathan Gonzalez V., with some additional tests and tweaks by
me.
Author: Jonathan Gonzalez V. <jonathan.abdiel@gmail.com>
Discussion: https://postgr.es/m/8a296a2c128aba924bff0ae48af2b88bf8f9188d.camel@gmail.com
Allow libpq clients to retrieve the current pg_g_threadlock pointer with
PQgetThreadLock(). Single-threaded applications could already do this in
a convoluted way:
pgthreadlock_t tlock;
tlock = PQregisterThreadLock(NULL);
PQregisterThreadLock(tlock); /* re-register the callback */
/* use tlock */
But a generic library can't do that without potentially breaking
concurrent libpq connections.
The motivation for doing this now is the libpq-oauth plugin, which
currently relies on direct injection of pg_g_threadlock, and should
ideally not.
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CAOYmi%2BmEU_q9sr1PMmE-4rLwFN%3DOjyndDwFZvpsMU3RNJLrM9g%40mail.gmail.com
Discussion: https://postgr.es/m/CAOYmi%2B%3DMHD%2BWKD4rsTn0v8220mYfyLGhEc5EfhmtqrAb7SmC5g%40mail.gmail.com
Using conn->errorMessage for these "shouldn't-happen" cases will only
work if the connection itself fails. Our SSL and password callbacks
print WARNINGs when they find themselves in similar situations, so
follow their lead.
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CAOYmi%2BmEU_q9sr1PMmE-4rLwFN%3DOjyndDwFZvpsMU3RNJLrM9g%40mail.gmail.com
This branch missed the IsolationUsesXactSnapshot() check. That led to EPQ on
repeatable read and serializable isolation levels. This commit fixes the
issue and provides a simple isolation check for that. Backpatch through v15
where MERGE statement was introduced.
Reported-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAPpHfdvzZSaNYdj5ac-tYRi6MuuZnYHiUkZ3D-AoY-ny8v%2BS%2Bw%40mail.gmail.com
Author: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Backpatch-through: 15
Previously, the recovery_target_xid GUC values were not sufficiently validated.
As a result, clearly invalid inputs such as the string "bogus", a decimal value
like "1.1", or 0 (a transaction ID smaller than the minimum valid value of 3)
were unexpectedly accepted. In these cases, the value was interpreted as
transaction ID 0, which could cause recovery to behave unexpectedly.
This commit improves validation of recovery_target_xid GUC so that invalid
values are rejected with an error. This prevents recovery from proceeding
with misconfigured recovery_target_xid settings.
Also this commit updates the documentation to clarify the allowed values
for recovery_target_xid GUC.
Author: David Steele <david@pgbackrest.org>
Reviewed-by: Hüseyin Demir <huseyin.d3r@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/f14463ab-990b-4ae9-a177-998d2677aae0@pgbackrest.org
In ALTER TABLE ... ADD/DROP COLUMN, the COLUMN keyword is optional. However,
part of the documentation could be read as if COLUMN were required, which may
mislead users about the command syntax.
This commit updates the ALTER TABLE documentation to clearly state that
COLUMN is optional for ADD and DROP.
Also this commit adds regression tests covering ALTER TABLE ... ADD/DROP
without the COLUMN keyword.
Backpatch to all supported versions.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2n6ShLMOnjOtf63TjjgGbgiTVT5OMsSOFmbjGb6Xue1Bw@mail.gmail.com
Backpatch-through: 14
XLogRecoveryCtlData is the structure that stores the shared-memory state
of WAL recovery, including information such as promotion requests, the
timeline ID (TLI), and the LSNs of replayed records.
This refactoring is independently useful because it allows code outside
of core to access the recovery state in live. It will be used by an
upcoming patch that introduces a SQL function for querying this
information, that can be accessed on a standby once a consistent state
has been reached. This only moves code around, changing nothing
functionally.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/CABPTF7W+Nody-+P9y4PNk37-QWuLpfUrEonHuEhrX+Vx9Kq+Kw@mail.gmail.com
This fixes a problem similar to ad8c86d22c. In this case, the test
could fail under the following circumstances:
- The primary is stopped with teardown_node(), meaning that it may not
be able to send all its WAL records to standby_1 and standby_2.
- If standby_2 receives more records than standby_1, attempting to
reconnect standby_2 to the promoted standby_1 would fail because of a
timeline fork.
This race condition is fixed with a simple trick: instead of tearing
down the primary, it is stopped cleanly so as all the WAL records of the
primary are received and flushed by both standby_1 and standby_2. Once
we do that, there is no need for a wait_for_catchup() before stopping
the node. The test wants to check that a timeline jump can be achieved
when reconnecting a standby to a promoted standby in the same cluster,
hence an immediate stop of the primary is not required.
This failure is harder to reach than the previous instability of
009_twophase, still the buildfarm has been able to detect this failure
at least once. I have tried Alexander Lakhin's test trick with the
bgwriter and very aggressive standby snapshots, but I could not
reproduce it directly. It is reachable, as the buildfarm has proved.
Backpatch down to all supported branches, and this problem can lead to
spurious failures in the buildfarm.
Discussion: https://postgr.es/m/493401a8-063f-436a-8287-a235d9e065fc@gmail.com
Backpatch-through: 14
The default value for default_toast_compression was "pglz". The main
reason for this choice is that this option is always available, pglz
code being embedded in Postgres. However, it is known that LZ4 is more
efficient than pglz: less CPU required, more compression on average. As
of this commit, the default value of default_toast_compression becomes
"lz4", if available. By switching to LZ4 as the default, users should
see natural speedups on TOAST data reads and/or writes.
Support for LZ4 in TOAST compression was added in Postgres v14, or 5
releases ago. This should be long enough to consider this feature as
stable.
While at it, quotes are removed from default_toast_compression in
postgresql.conf.sample. Quotes are not required in this case. The
in-place value replacement done by initdb if the build supports LZ4
would not use them in the postgresql.conf file added to a
freshly-initialized cluster.
Note that this is a version lighter than 7c1849311e, that included a
replacement of --with-lz4 by --without-lz4 in configure builds, forcing
a requirement for LZ4 in all environments. The buildfarm did not like
it, at all. This commit switches default_toast_compression to lz4 as
default only when --with-lz4 is defined, which should keep the buildfarm
at bay while still allowing users to benefit from LZ4 compression in
TOAST as long as the code is compiled with it.
Author: Euler Taveira <euler@eulerto.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Aleksander Alekseev <aleksander@tigerdata.com>
Discussion: https://posgr.es/m/435df33a-129e-4f0c-a803-f3935c5a5ecb@eisentraut.org
This reverts commit 7c1849311e, due to the fact that more than 60% of
the buildfarm members do not have lz4 installed. As we are in the last
commit fest of the development cycle, and that it could take a couple
of weeks to stabilize things, this change is reverted for now.
This commit will be reworked in a lighter version, as
default_toast_compression's default can be changed to "lz4" without the
switch from --with-lz4 to --without-lz4. This approach will keep the
buildfarm at bay, and still allow builds to take advantage of LZ4 in
TOAST by default, as long as the code is compiled with LZ4 support.
A harder requirement based on LZ4 should be achievable at some point,
but it is going to require some work from the buildfarm owners first.
Perhaps this part could be revisited at the beginning of the next
development cycle.
Discussion: https://postgr.es/m/CAOYmi+meTT0NbLbnVqOJD5OKwCtHL86PQ+RZZTrn6umfmHyWaw@mail.gmail.com
This is a followup to commit 763aaa06f0 Add non-text output formats to
pg_dumpall.
Add a --no-globals option to pg_restore that skips restoring global
objects (roles and tablespaces) when restoring from a pg_dumpall
archive. When -C/--create is not specified, databases that do not
already exist on the target server are also skipped.
This is useful when restoring only specific databases from a pg_dumpall
archive without needing the global objects to be restored first.
Author: Mahendra Singh Thalor <mahi6run@gmail.com>
With small tweaks by me.
Discussion: https://postgr.es/m/CAKYtNArdcc5kx1MdTtTKFNYiauo3=zCA-NB0LmBCW-RU_kSb3A@mail.gmail.com
The previous idea was "scale up the bucketsize estimate by the ratio
of the MCV's frequency to the average value's frequency". But we
should have been suspicious of that plan, since it frequently led to
impossible (> 1) values which we had to apply an ad-hoc clamp to.
Joel Jacobson demonstrated that it sometimes leads to making the
wrong choice about which side of the hash join should be inner.
Instead, drop the whole business of estimating average frequency, and
just clamp the bucketsize estimate to be at least the MCV's frequency.
This corresponds to the bucket size we'd get if only the MCV appears
in a bucket, and the MCV's frequency is not affected by the
WHERE-clause filters. (We were already making the latter assumption.)
This also matches the coding used since 4867d7f62 in the case where
only a default ndistinct estimate is available.
Interestingly, this change affects no existing regression test cases.
Add one to demonstrate that it helps pick the smaller table to be
hashed when the MCV is common enough to affect the results.
This leaves estimate_hash_bucket_stats not considering the effects of
null join keys at all, which we should probably improve. However,
I have a different patch in the queue that will change the executor's
handling of null join keys, so it seems appropriate to wait till
that's in before doing anything more here.
Reported-by: Joel Jacobson <joel@compiler.org>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Joel Jacobson <joel@compiler.org>
Discussion: https://postgr.es/m/341b723c-da45-4058-9446-1514dedb17c1@app.fastmail.com
The code path in astreamer_lz4_decompressor_content() that updated
the output pointers when the output buffer isn't full was wrong.
It advanced next_out by bytes_written, which could include previous
decompression output not just that of the current cycle. The
correct amount to advance is out_size. While at it, make the
output pointer updates look more like the input pointer updates.
This bug is pretty hard to reach, as it requires consecutive
compression frames that are too small to fill the output buffer.
pg_dump could have produced such data before 66ec01dc4, but
I'm unsure whether any files we use astreamer with would be
likely to contain problematic data.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/0594CC79-1544-45DD-8AA4-26270DE777A7@gmail.com
Backpatch-through: 15
Extend CREATE PUBLICATION ... FOR ALL TABLES to support the EXCEPT TABLE
syntax. This allows one or more tables to be excluded. The publisher will
not send the data of excluded tables to the subscriber.
To support this, pg_publication_rel now includes a prexcept column to flag
excluded relations. For partitioned tables, the exclusion is applied at
the root level; specifying a root table excludes all current and future
partitions in that tree.
Follow-up work will implement ALTER PUBLICATION support for managing these
exclusions.
Author: vignesh C <vignesh21@gmail.com>
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Discussion: https://postgr.es/m/CALDaNm3=JrucjhiiwsYQw5-PGtBHFONa6F7hhWCXMsGvh=tamA@mail.gmail.com
The phase of the test where we want to check that 2PC transactions
prepared on a primary can be committed on a promoted standby relied on
an immediate stop of the primary. This logic has a race condition: it
could be possible that some records (most likely standby snapshot
records) are generated on the primary before it finishes its shutdown,
without the promoted standby know about them. When the primary is
recycled as new standby, the test could fail because of a timeline fork
as an effect of these extra records.
This fix takes care of the instability by doing a clean stop of the
primary instead of a teardown (aka immediate stop), so as all records
generated on the primary are sent to the promoted standby and flushed
there. There is no need for a teardown of the primary in this test
scenario: the commit of 2PC transactions on a promoted standby do not
care about the state of the primary, only of the standby.
This race is very hard to hit in practice, even slow buildfarm members
like skink have a very low rate of reproduction. Alexander Lakhin has
come up with a recipe to improve the reproduction rate a lot:
- Enable -DWAL_DEBUG.
- Patch the bgwriter so as standby snapshots are generated every
milliseconds.
- Run 009_twophase tests under heavy parallelism.
With this method, the failure appears after a couple of iterations.
With the fix in place, I have been able to run more than 50 iterations
of the parallel test sequence, without seeing a failure.
Issue introduced in 30820982b2, due to a copy-pasto coming from the
surrounding tests. Thanks also to Hayato Kuroda for digging into the
details of the failure. He has proposed a fix different than the one of
this commit. Unfortunately, it relied on injection points, feature only
available in v17. The solution of this commit is simpler, and can be
applied to v14~v16.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/b0102688-6d6c-c86a-db79-e0e91d245b1a@gmail.com
Backpatch-through: 14
The default value for default_toast_compression was "pglz". The main
reason for this choice is that this option is always available, pglz
code being embedded in Postgres. However, it is known that LZ4 is more
efficient than pglz: less CPU required, more compression on average. As
of this commit, the default value of default_toast_compression becomes
"lz4", if available. By switching to LZ4 as the default, users should
see natural speedups on TOAST data reads and/or writes.
Support for LZ4 in TOAST compression was added in Postgres v14, or 5
releases ago. This should be long enough to consider this feature as
stable.
--with-lz4 is removed, replaced by a --without-lz4 to disable LZ4 in the
builds on an option-basis, following a practice similar to readline or
ICU. References to --with-lz4 are removed from the documentation.
While at it, quotes are removed from default_toast_compression in
postgresql.conf.sample. Quotes are not required in this case. The
in-place value replacement done by initdb if the build supports LZ4
would not use them in the postgresql.conf file added to a
freshly-initialized cluster.
For the reference, a similar switch has been done with ICU in
fcb21b3acd. Some of the changes done in this commit are consistent
with that.
Note: this is going to create some disturbance in the buildfarm, in
environments where lz4 is not installed.
Author: Euler Taveira <euler@eulerto.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Aleksander Alekseev <aleksander@tigerdata.com>
Discussion: https://posgr.es/m/435df33a-129e-4f0c-a803-f3935c5a5ecb@eisentraut.org
In apply_child_basequals, after translating a parent relation's
restriction quals for a child relation, we simplify each child qual by
calling eval_const_expressions. Historically, the code then called
restriction_is_always_false and restriction_is_always_true to reduce
NullTest quals that are provably false or true.
However, since commit e2debb643, the planner natively performs
NullTest deduction during constant folding. Therefore, calling
restriction_is_always_false and restriction_is_always_true immediately
afterward is redundant and wastes CPU cycles. We can safely remove
them and simply rely on the constant folding to handle the deduction.
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4-vLmGXaUEZyOMacN0BVfqWCt2tM-eDVWdDfJnOQaauGg@mail.gmail.com
The SAMESIGN macro was historically used as a helper for manual
integer overflow checks. However, since commit 4d6ad3125 introduced
overflow-aware integer operations, this manual sign-checking logic is
no longer necessary.
The macro remains defined in brin_minmax_multi.c and timestamp.c, but
is not used in either file. This patch removes these definitions to
clean things up.
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4-NL3J3hQ3LzrwV-YUkQC18P+jM7ZiegQyAHzgdZev2qg@mail.gmail.com
When working on an already-defined view with matching attributes, CREATE
OR REPLACE VIEW would internally generate an ALTER TABLE command with a
set of AT_AddColumnToView sub-commands, one for each attribute added.
Such a command is stored in event triggers twice:
- Once as a simple command.
- Once as an ALTER TABLE command, as it has sub-commands.
There was no test coverage to track this command pattern in terms of
event triggers and DDL deparsing:
- For the test module test_ddl_deparse, two command notices are issued.
- For event triggers, a CREATE VIEW command is logged twice, which may
look a bit weird first, but again this maps with the internal behavior
of how the commands are built, and how the event trigger code reacts in
terms of commands gathered.
While on it, this adds a test for CREATE SCHEMA with a CREATE VIEW
command embedded in it, case supported by the grammar but not covered
yet.
This hole in the test coverage has been found while digging into what
would be a similar behavior for sequences if adding attributes to them
with ALTER TABLE variants, after the initial relation creation.
Discussion: https://postgr.es/m/aaFG9bqkEn0RhLJG@paquier.xyz
Read stream users can now pause lookahead when no blocks are currently
available. After resuming, subsequent read_stream_next_buffer() calls
continue lookahead with the previous lookahead distance.
This is especially useful for read stream users with self-referential
access patterns (where consuming already-read buffers can produce
additional block numbers).
Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGJLT2JvWLEiBXMbkSSc5so_Y7%3DN%2BS2ce7npjLw8QL3d5w%40mail.gmail.com
The documentation previously had a systemd unit file that would not
attempt to recover from process failures such as OOM's, segfaults,
etc. This commit adds "Restart=on-failure",` which tells systemd to
attempt to restart the process after failure. This is the recommended
configuration per the systemd documentation: "Setting this to
on-failure is the recommended choice for long-running services". Many
PostgreSQL users will simply copy/paste what the PostgreSQL
documentation recommends and will probably do their own research and
change the service file to restart on failure, so might as well set
this as the default in the PostgreSQL documentation.
Author: Andrew Jackson <andrewjackson947@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAKK5BkFfMpAQnv8CLs%3Di%3DrZwurtCV_gmfRb0uZi-V%2Bd6wcryqg%40mail.gmail.com
Adjust a couple of for-loops where a local variable was shadowed by
another in the same scope, by renaming it as well as reducing its scope
to the containing for-loop.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CAEoWx2kQ2x5gMaj8tHLJ3=jfC+p5YXHkJyHrDTiQw2nn2FJTmQ@mail.gmail.com
Commit c66a7d75e6 introduced a new "cast discards ‘volatile’"
warning (-Wcast-qual) in vac_truncate_clog().
Instead of making use of unvolatize(), remove the warning by reducing the
scope of the volatile qualifier (added in commit 2d2e40e3be) to only
2 fields.
Also do the same for vac_update_datfrozenxid(), since the intent of
commit f65ab862e3 was to prevent the same kind of race condition that
commit 2d2e40e3be was fixing.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Suggested-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/aZ3a%2BV82uSfEjDmD%40ip-10-97-1-34.eu-west-3.compute.internal
If ON_ERROR SET_NULL is specified during COPY FROM, any data type
conversion errors will result in the affected column being set to a
null value. A column's not-null constraints are still enforced, and
attempting to set a null value in such columns will raise a constraint
violation error. This applies to a column whose data type is a domain
with a NOT NULL constraint.
Author: Jian He <jian.universality@gmail.com>
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: "David G. Johnston" <david.g.johnston@gmail.com>
Reviewed-by: Yugo NAGATA <nagata@sraoss.co.jp>
Reviewed-by: torikoshia <torikoshia@oss.nttdata.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/CAKFQuwawy1e6YR4S%3Dj%2By7pXqg_Dw1WBVrgvf%3DBP3d1_aSfe_%2BQ%40mail.gmail.com
Clarify the documentation of COMMENT ON to state that specifying an empty
string is treated as NULL, meaning that the comment is removed.
This makes the behavior explicit and avoids possible confusion about how
empty strings are handled.
Also adds regress test cases that use empty string to remove a comment.
Backpatch to all supported versions.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Shengbin Zhao <zshengbin91@gmail.com>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: zhangqiang <zhang_qiang81@163.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/26476097-B1C1-4BA8-AA92-0AD0B8EC7190@gmail.com
Backpatch-through: 14
This commit adds support for the restore of extended statistics of the
kind "exprs", counting for the statistics data computed for expressions.
The input format consists of a jsonb object which must be an array of
objects which are keyed by statistics parameter names, like this:
[{"stat_type1": "...", "stat_type2": "...", ...},
{"stat_type1": "...", "stat_type2": "...", ...}, ...]
The outer array must have as many elements as there are expressions
defined in the statistics object, mapping with the way extended
statistics are built with one pg_statistic tuple stored for each
expression whose statistics have been computed. The elements of the
array must be either objects or null values (equivalent of invalid data,
case also supported by the stats computations when its data is inserted
in the catalogs).
The keys of the inner objects are names of the statistical columns in
pg_stats_ext_exprs (i.e. everything after "inherited"). Not all
parameter keys need to be provided, those omitted being silently
ignored. Key values that do not match a statistical column name will
cause a warning to be issued, but do not otherwise fail the expression
or the import as a whole.
The expected value type for all parameters is jbvString, which allows
us to validate the values using the input function specific to that
parameter. Any parameters with a null value are silently ignored, same
as if they were not provided in the first place.
This commit includes a battery of test cases:
- Sanity checks for what-should-be-all the failures in restore code
paths, including parsing errors, parameter sanity checks depending on
the extended stats object definition, etc.
- Value injection, for scalar, array, range, multi-range cases.
- Stats data cloning, with differential checks between the source
relation and its target. The source and the target should hold the same
stats data after restore.
- While expressions are supported in extended statistics since v14,
range_length_histogram, range_empty_frac, and range_bounds_histogram
have been added to pg_stat_ext_exprs only in v19. A test case has been
added to emulate a dump taken from v18, with expression stats restored
for a range data type where these three fields are NULL.
Support for pg_dump is included, with expressions supported since v14,
inherited since v15, and data for range types in expressions in v19.
pg_upgrade is the main use-case of this feature; it is also possible to
inject statistics, same as for the other extstat kinds.
As of this commit, ANALYZE should not be required after pg_upgrade when
the cluster upgrading from uses extended statistics, as MCV,
dependencies, expressions and ndistinct stats are all covered. The
stats data related to range types used in expressions requires v19,
whose support has also been added.
Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=fPcci6oPyuyEZ0F4bWqAA7HzaWO+ZPptufuX5_uWt6kw@mail.gmail.com
We hadn't noticed this violation of -Wshadow=compatible-local
because this function isn't compiled without -DTRGM_REGEXP_DEBUG.
As long as we have to clean it up, let's do so by converting all
this function's loops to use C99 loop-local control variables.
Reported-by: Sergei Kornilov <sk@zsrv.org>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3009911772478436@08341ecb-668d-43a9-af4d-b45f00c72521
Presently, the GUC check hook for basic_archive.archive_directory
checks that the specified directory exists. Consequently, if the
directory does not exist at server startup, archiving will be stuck
indefinitely, even if it appears later. To fix, remove this check
from the hook so that archiving will resume automatically once the
directory is present. basic_archive must already be prepared to
deal with the directory disappearing at any time, so no additional
special handling is required.
Reported-by: Олег Самойлов <splarv@ya.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Sergei Kornilov <sk@zsrv.org>
Discussion: https://postgr.es/m/73271769675212%40mail.yandex.ru
Backpatch-through: 15
Commit ab355e3a88 changed how the OldestMemberMXactId array is
indexed. It's no longer indexed by synthetic dummyBackendId, but with
ProcNumber. The PGPROC entries for prepared xacts come after auxiliary
processes in the allProcs array, which rendered the calculation for
MaxOldestSlot and the indexes into the array incorrect. (The
OldestVisibleMXactId array is not used for prepared xacts, and thus
never accessed with ProcNumber's greater than MaxBackends, so this
only affects the OldestMemberMXactId array.)
As a result, a prepared xact would store its value past the end of the
OldestMemberMXactId array, overflowing into the OldestVisibleMXactId
array. That could cause a transaction's row lock to appear invisible
to other backends, or other such visibility issues. With a very small
max_connections setting, the store could even go beyond the
OldestVisibleMXactId array, stomping over the first element in the
BufferDescriptor array.
To fix, calculate the array sizes more precisely, and introduce helper
functions to calculate the array indexes correctly.
Author: Yura Sokolov <y.sokolov@postgrespro.ru>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/7acc94b0-ea82-4657-b1b0-77842cb7a60c@postgrespro.ru
Backpatch-through: 17
Detailed completion of the RESET clause is still missing. Not sure a
detailed implementation is worth the trouble.
Author: Ian Lawrence Barwick <barwick@gmail.com>
Author: Vasuki M <vasukianand0119@gmail.com>
Reviewed-by: zengman <zengman@halodbtech.com>
Reviewed-by: Dharin Shah <dharinshah95@gmail.com>
Reviewed-by: Surya Poondla <suryapoondla4@gmail.com>
Discussion: https://postgr.es/m/CAB8KJ=iH_v1YB2ss1A=BqvOAf28OVYiWRqUdE6TJ3pP-RdsPig@mail.gmail.com
In commits 29d75b25b et al, I made pg_dumpall's dumpRoleMembership
logic treat a dangling grantor OID the same as dangling role and
member OIDs: print a warning and skip emitting the GRANT. This wasn't
terribly well thought out; instead, we should handle the case by
emitting the GRANT without the GRANTED BY clause. When the source
database is pre-v16, such cases are somewhat expected because those
versions didn't prevent dropping the grantor role; so don't even
print a warning that we did this. (This change therefore restores
pg_dumpall's pre-v16 behavior for these cases.) The case is not
expected in >= v16, so then we do print a warning, but soldiering on
with no GRANTED BY clause still seems like a reasonable strategy.
Per complaint from Robert Haas that we were now dropping GRANTs
altogether in easily-reachable scenarios.
Reported-by: Robert Haas <robertmhaas@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA+TgmoauoiW4ydDhdrseg+DD4Kwha=+TSZp18BrJeHKx3o1Fdw@mail.gmail.com
Backpatch-through: 16
All-visible pages can't contain prunable tuples. We already clear the
prune hint (pd_prune_xid) during pruning of all-visible pages, but we
were not doing so in vacuum phase three, nor initializing it for
all-frozen pages created by COPY FREEZE, and we were not clearing it on
standbys.
Because page hints are not WAL-logged, pages on a standby carry stale
pd_prune_xid values. After promotion, that stale hint triggers
unnecessary on-access pruning.
Fix this by clearing the prune hint everywhere we currently mark a heap
page all-visible. Clearing it when setting PD_ALL_VISIBLE ensures no
extra overhead.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/flat/CAAKRu_b-BMOyu0X-0jc_8bWNSbQ5K6JTEueayEhcQuw-OkCSKg%40mail.gmail.com
Calling copyObject in C++ without GNU extensions (e.g. when using
-std=c++11 instead of -std=gnu++11) fails with an error like this:
error: use of undeclared identifier 'typeof'; did you mean 'typeid'
This is due to the C compiler used to compile PostgreSQL supporting
typeof, but that function actually not being present in the C++
compiler. This fixes that by explicitely checking for typeof support
in C++, and then either use that or define typeof ourselves as:
std::remove_reference_t<decltype(x)>
According to the paper that led to adding typeof to the C standard,
that's the C++ equivalent of the C typeof:
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2927.htm#existing-decltype
Author: Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/flat/DGPW5WCFY7WY.1IHCDNIVVT300%2540jeltef.nl
We can use either of these to implement a missing explicit_bzero().
explicit_memset() is supported on NetBSD. NetBSD hitherto didn't have
a way to implement explicit_bzero() other than the fallback variant.
memset_explicit() is the C23 standard, so we use it as first
preference. It is currently supported on:
- NetBSD 11
- FreeBSD 15
- glibc 2.43
It doesn't provide additional coverage, but as it's the new standard,
its availability will presumably grow.
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/c4701776-8d99-41da-938d-88528a3adc15%40eisentraut.org
There are no known users of this flag. The last supposed user was
pglesslog, which is the reason why this flag has been introduced in
core, based on an historical search pointing at a8d539f124.
I have mentioned that we may want to remove this flag back in 2018, due
to zero users of it in core. More recently, Noah has pointed out that
this flag is not safe to use: XLP_BKP_REMOVABLE can be set by the WAL
writer in a lock-free fashion with runningBackups > 0, meaning that some
full-page images could be required but not logged, ultimately corrupting
backups.
Bump XLOG_PAGE_MAGIC.
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/20250705001628.c3.nmisch@google.com
Discussion: https://postgr.es/m/CAEze2WhiwKSoAvfUggjDeoeY0-rz9cTpfrHcqvBMmJxv-K_5DA@mail.gmail.com
The allocations used for the static array ExplainExtensionOptionArray,
that tracks a set of ExplainExtensionOption, used "char *" instead of
ExplainExtensionOption as the memory size consumed by one element,
underestimating the memory required by half.
The initial allocation of ExplainExtensionNameArray wants to hold 16
elements before being reallocated, and with "char *" it meant that there
was enough space only for 8 ExplainExtensionOption elements, 16 bytes
required for each element. The backend would crash once one tries to
register a 9th EXPLAIN option.
As far as I can see, the allocation formulas of GetExplainExtensionId()
have been copy-pasted to RegisterExtensionExplainOption(), but the
internal maths of the copy were not adjusted accordingly.
Oversight in c65bc2e1d1.
Author: Joel Jacobson <joel@compiler.org>
Discussion: https://postgr.es/m/2a4bd2f5-2a2f-409f-8ac7-110dd3fad4fc@app.fastmail.com
Backpatch-through: 18
This commit adds a new test module called "test_custom_types", that can
be used to stress code paths related to custom data type
implementations.
Currently, this is used as a test suite to validate the set of fixes
done in 3b7a6fa157, that requires some typanalyze callbacks that can
force very specific backend behaviors, as of:
- typanalyze callback that returns "false" as status, to mark a failure
in computing statistics.
- typanalyze callback that returns "true" but let's the backend know
that no interesting stats could be computed, with stats_valid set to
"false".
This could be extended more in the future if more problems are found.
For simplicity, the module uses a fake int4 data type, that requires a
btree operator class to be usable with extended statistics. The type is
created by the extension, and its properties are altered in the test.
Like 3b7a6fa157, this module is backpatched down to v14, for coverage
purposes.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aaDrJsE1I5mrE-QF@paquier.xyz
Backpatch-through: 14
This implements the tab completion that was marked as XXX TODO in the
source code. The following completion is now supported:
DELETE FROM <table> USING <TAB> -> list of relations supporting SELECT
This uses Query_for_list_of_selectables (instead of Query_for_list_of_tables)
because the USING clause can reference not only tables but also views and
other selectable objects, following the same syntax as the FROM clause
of a SELECT statement.
Author: Tatsuya Kawata <kawatatatsuya0913@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Soumya S Murali <soumyamurali.work@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHza6qf0CLJuJr+5cQw0oWNebM5VyMB-ghoKBgnEjOQ_JtAiuw@mail.gmail.com
This commit addresses two defects regarding extended statistics on
expressions:
- When building extended statistics in lookup_var_attr_stats(), the call
to examine_attribute() did not account for the possibility of a NULL
return value. This can happen depending on the behavior of a typanalyze
callback — for example, if the callback returns false, if no rows are
sampled, or if no statistics are computed. In such cases, the code
attempted to build MCV, dependency, and ndistinct statistics using a
NULL pointer, incorrectly assuming valid statistics were available,
which could lead to a server crash.
- When loading extended statistics for expressions,
statext_expressions_load() did not account for NULL entries in the
pg_statistic array storing expression statistics. Such NULL entries can
be generated when statistics collection fails for an expression, as may
occur during the final step of serialize_expr_stats(). An extended
statistics object defining N expressions requires N corresponding
elements in the pg_statistic array stored for the expressions, and some
of these elements can be NULL. This situation is reachable when a
typanalyze callback returns true, but sets stats_valid to indicate that
no useful statistics could be computed.
While these scenarios cannot occur with in-core typanalyze callbacks, as
far as I have analyzed, they can be triggered by custom data types with
custom typanalyze implementations, at least.
No tests are added in this commit. A follow-up commit will introduce a
test module that can be extended to cover similar edge cases if
additional issues are discovered. This takes care of the core of the
problem.
Attribute and relation statistics already offer similar protections:
- ANALYZE detects and skips the build of invalid statistics.
- Invalid catalog data is handled defensively when loading statistics.
This issue exists since the support for extended statistics on
expressions has been added, down to v14 as of a4d75c86bf. Backpatch
to all supported stable branches.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aaDrJsE1I5mrE-QF@paquier.xyz
Backpatch-through: 14
In commit bd3e3e9e5, I over-hastily used 1 / rel->rows as the assumed
frequency of entries in a column that ANALYZE has found to be unique.
However, rel->rows is the number of table rows that are estimated to
pass the query's restriction conditions, so that we got a too-large
result if the query has selective restrictions. What I should have
used is 1 / rel->tuples, since that is the estimated total number of
table rows. The pre-existing code path that digs a frequency out of
the histogram produces a frequency relative to the whole table, so
surely this new alternative code path must do so as well. Any
correction needed on the basis of selectivity must be done by the
user of the mcv_freq value.
Fixing this causes all the regression test plans changed by bd3e3e9e5
to revert to what they had been, except for the first change in
join.out. As I correctly argued in bd3e3e9e5, in that test case we
have no stats and should not risk a hash join. Evidently I was less
correct to argue that the other changes were improvements.
Reported-by: Joel Jacobson <joel@compiler.org>
Diagnosed-by: Tender Wang <tndrwang@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/341b723c-da45-4058-9446-1514dedb17c1@app.fastmail.com
Previously, the psql meta-commands that list publications, subscriptions,
and extended statistics did not display their associated comments,
whereas other \d meta-commands did. This made it inconvenient for users
to view these objects together with their descriptions.
This commit improves \dRp+ and \dRs+ to include comments for publications
and subscriptions. It also extends the \dX meta-command to accept the + option,
allowing comments for extended statistics to be shown when requested.
Author: Fujii Masao <masao.fujii@gmail.com>
Co-authored-by: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwGL4JqiKA26fnGx-cTM=VzoTs_uzqejvj4Fawyr4uLUUw@mail.gmail.com
- Call _xgetbv within x86_set_runtime_features rather than in a
separate function
- Use symbols for XCR mask bits rather than a magic constant
A future commit will build on this to detect YMM registers without
code duplication.
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CANWCAZbgEUFw7LuYSVeJ=Tj98R5HoOB1Ffeqk3aLvbw5rU5NTw@mail.gmail.com
Point out that Postgres automatically optimizes away the target list
of an EXISTS' subquery, except in weird cases such as target lists
containing set-returning functions. Thus, both common conventions
EXISTS(SELECT * FROM ...) and EXISTS(SELECT 1 FROM ...) are
overhead-free and there's little reason to prefer one over the other.
In the code comments, mention that the SQL spec says that
EXISTS(SELECT * FROM ...) should be interpreted as EXISTS(SELECT
some-literal FROM ...), but we don't choose to do it exactly that way.
Author: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/9b301c70-3909-4f0f-98ca-9e3c4d142f3e@eisentraut.org
The RTE's groupexprs list is used for deparsing views, and for that
usage it must contain the original alias Vars; else we can get
incorrect SQL output. But since commit 247dea89f,
parseCheckAggregates put the GROUP BY expressions through
flatten_join_alias_vars before building the RTE_GROUP RTE.
Changing the order of operations there is enough to fix it.
This patch unfortunately can do nothing for already-created views:
if they use a coding pattern that is subject to the bug, they will
deparse incorrectly and hence present a dump/reload hazard in the
future. The only fix is to recreate the view from the original SQL.
But the trouble cases seem to be quite narrow. AFAICT the output
was only wrong for "SELECT ... t1 LEFT JOIN t2 USING (x) GROUP BY x"
where t1.x and t2.x were not of identical data types and t1.x was
the side that required an implicit coercion. If there was no hidden
coercion, or if the join was plain, RIGHT, or FULL, the deparsed
output was uglier than intended but not functionally wrong.
Reported-by: Swirl Smog Dowry <swirl-smog-dowry@duck.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CA+-gibjCg_vjcq3hWTM0sLs3_TUZ6Q9rkv8+pe2yJrdh4o4uoQ@mail.gmail.com
Backpatch-through: 18
We now maintain an array of booleans that indicate which features were
detected at runtime. When code wants to check for a given feature,
the array is automatically checked if it has been initialized and if
not, a single function checks all features at once.
Move all x86 feature detection to pg_cpu_x86.c, and move the CRC
function choosing logic to the file where the hardware-specific
functions are defined, consistent with more recent hardware-specific
files in src/port.
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CANWCAZbgEUFw7LuYSVeJ=Tj98R5HoOB1Ffeqk3aLvbw5rU5NTw@mail.gmail.com
This commit updates the frontend tools (src/bin/, contrib/ and
src/test/) to use the memory allocation variants based on
pg_malloc_object() and pg_malloc_array() in various code paths. This
does not cover all the allocations, but a good chunk of them.
Like all the changes of this kind (31d3847a37, etc.), this should
encourage any future code to use this new style.
Author: Andreas Karlsson <andreas@proxel.se>
Discussion: https://postgr.es/m/cfb645da-6b3a-4f22-9bcc-5bc46b0e9c61@proxel.se
heapam_scan_analyze_next_tuple() doesn't distinguish between dead and
recently dead tuples when counting them, so it doesn't need OldestXmin.
GetOldestNonRemovableTransactionId() isn't free, so removing it is a
win.
Looking at other table AMs implementing table_scan_analyze_next_tuple(),
we couldn't find one using OldestXmin either, so remove it from the
callback.
Author: Melanie Plageman <melanieplageman@gmail.com>
Suggested-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CALdSSPjvhGXihT_9f-GJabYU%3D_PjrFDUxYaURuTbfLyQM6TErg%40mail.gmail.com
heap_page_would_be_all_visible() does not need to distinguish between
HEAPTUPLE_RECENTLY_DEAD and HEAPTUPLE_DEAD tuples: any tuple in a state
other than HEAPTUPLE_LIVE means the page is not all-visible and
heap_page_would_be_all_visible() returns false.
Given that, calling HeapTupleSatisfiesVacuum() is unnecessary, since it
performs extra work to distinguish between dead and recently dead tuples
using OldestXmin. Replace it with the more minimal
HeapTupleSatisfiesVacuumHorizon().
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CALdSSPjvhGXihT_9f-GJabYU%3D_PjrFDUxYaURuTbfLyQM6TErg%40mail.gmail.com
Commit 84d5efa7e3 missed some multibyte issues caused by short-circuit
logic in the callers. The callers assumed that if the predicate string
is longer than the label string, then it couldn't possibly be a match,
but it can be when using case-insensitive matching (LVAR_INCASE) if
casefolding changes the byte length.
Fix by refactoring to get rid of the short-circuit logic as well as
the function pointer, and consolidate the logic in a replacement
function ltree_label_match().
Discussion: https://postgr.es/m/02c6ef6cf56a5013ede61ad03c7a26affd27d449.camel@j-davis.com
Backpatch-through: 14
The LVRelState fields that track newly all-visible/all-frozen pages were
previously named vm_new_visible_pages, vm_new_frozen_pages, and
vm_new_visible_frozen_pages. The correct terminology is all-visible and
all-frozen; omitting “all” was open to misinterpretation, as the page
isn't visible or invisible, rather all the tuples on the page are
visible to all running and future transactions. Rename the members
accordingly.
Author: Melanie Plageman <melanieplageman@gmail.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
This macro had exactly one user in InstrStartNode, and the caller can
instead use INSTR_TIME_IS_ZERO / INSTR_TIME_SET_CURRENT directly.
This supports a future change that intends to modify the time source being
used in the InstrStartNode case.
Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAP53Pkx1bK1FB71_nBqYmzvSSXnp_MbE0ZDnU+baPJF6Ud2WDA@mail.gmail.com
This was incorrectly named "LT" for "larger than" in e5a5e0a907, but
that is against existing conventions, where "LT" means "less than".
Clarify by using "GT" for "greater than" in macro name, and add a missing
comment at the top of instr_time.h to note the macro's existence.
Reported by: Peter Smith <smithpb2250@gmail.com>
Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAHut%2BPut94CTpjQsqOJHdHkgJ2ZXq%2BqVSfMEcmDKLiWLW-hPfA%40mail.gmail.com
pg_dumpall can now produce output in custom, directory, or tar formats
in addition to plain text SQL scripts. When using non-text formats,
pg_dumpall creates a directory containing:
- toc.glo: global data (roles and tablespaces) in custom format
- map.dat: mapping between database OIDs and names
- databases/: subdirectory with per-database archives named by OID
pg_restore is extended to handle these pg_dumpall archives, restoring
globals and then each database. The --globals-only option can be used
to restore only the global objects.
This enables parallel restore of pg_dumpall output and selective
restoration of individual databases from a cluster-wide backup.
Author: Mahendra Singh Thalor <mahi6run@gmail.com>
Co-Author: Andrew Dunstan <andrew@dunslane.net>
Reviewed-By: Tushar Ahuja <tushar.ahuja@enterprisedb.com>
Reviewed-By: Jian He <jian.universality@gmail.com>
Reviewed-By: Vaibhav Dalvi <vaibhav.dalvi@enterprisedb.com>
Reviewed-By: Srinath Reddy <srinath2133@gmail.com>
Discussion: https://postgr.es/m/cb103623-8ee6-4ba5-a2c9-f32e3a4933fa@dunslane.net
Settings that ran the new test euc_kr.sql to completion would fail these
older src/pl tests. Use alternative expected outputs, for which psql
\gset and \if have reduced the maintenance burden. This fixes
"LANG=ko_KR.euckr LC_MESSAGES=C make check-world". (LC_MESSAGES=C fixes
IO::Pty usage in tests 010_tab_completion and 001_password.) That file
is new in commit c67bef3f32. Back-patch
to v14, like that commit.
Discussion: https://postgr.es/m/20260217184758.da.noahmisch@microsoft.com
Backpatch-through: 14
The documentation for the INCLUDING COMMENTS option of the LIKE clause
in CREATE TABLE was inaccurate and incomplete. It stated that comments for
copied columns, constraints, and indexes are copied, but regarding comments
on constraints in reality only comments on CHECK and NOT NULL constraints
are copied; comments on other constraints (such as primary keys) are not.
In addition, comments on extended statistics are copied, but this was not
documented.
The CREATE FOREIGN TABLE documentation had a similar omission: comments
on extended statistics are also copied, but this was not mentioned.
This commit updates the documentation to clarify the actual behavior.
The CREATE TABLE reference now specifies that comments on copied columns,
CHECK constraints, NOT NULL constraints, indexes, and extended statistics are
copied. The CREATE FOREIGN TABLE reference now notes that comments on
extended statistics are copied as well.
Backpatch to all supported versions. Documentation updates related to
CREATE FOREIGN TABLE LIKE and NOT NULL constraint comment copying are
not applied to v17 and earlier, since those features were introduced in v18.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwHSOSGcaYDvHF8EYCUCfGPjbRwGFsJ23cx5KbJ1X6JouQ@mail.gmail.com
Backpatch-through: 14
Previously, when one process woke another that was waiting on a lock,
ProcWakeup() incorrectly cleared its own waitStart field (i.e.,
MyProc->waitStart) instead of that of the process being awakened.
As a result, the awakened process retained a stale lock-wait start timestamp.
This did not cause user-visible issues. pg_locks.waitstart was reported as
NULL for the awakened process (i.e., when pg_locks.granted is true),
regardless of the waitStart value.
This bug was introduced by commit 46d6e5f567.
This commit fixes this by resetting the waitStart field of the process
being awakened in ProcWakeup().
Backpatch to all supported branches.
Reported-by: Chao Li <li.evan.chao@gmail.com>
Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: ji xu <thanksgreed@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/537BD852-EC61-4D25-AB55-BE8BE46D07D7@gmail.com
Backpatch-through: 14
The test added by commit 4b760a181 assumed that a table's physical
row order would be predictable after an UPDATE. But a non-heap table
AM might produce some other order. Even with heap AM, the assumption
seems risky; compare a3fd53bab for instance. Adding an ORDER BY is
cheap insurance and doesn't break any goal of the test.
Author: Pavel Borisov <pashkin.elfe@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CALT9ZEHcE6tpvumScYPO6pGk_ASjTjWojLkodHnk33dvRPHXVw@mail.gmail.com
Backpatch-through: 14
When simplify_EXISTS_query removes the GROUP BY clauses from an EXISTS
subquery, it previously deleted the RTE_GROUP RTE directly from the
subquery's range table.
This approach is dangerous because deleting an RTE from the middle of
the rtable list shifts the index of any subsequent RTE, which can
silently corrupt any Var nodes in the query tree that reference those
later relations. (Currently, this direct removal has not caused
problems because the RTE_GROUP RTE happens to always be the last entry
in the rtable list. However, relying on that is extremely fragile and
seems like trouble waiting to happen.)
Instead of deleting the RTE_GROUP RTE, this patch converts it in-place
to be RTE_RESULT type and clears its groupexprs list. This preserves
the length and indexing of the rtable list, ensuring all Var
references remain intact.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3472344.1771858107@sss.pgh.pa.us
Backpatch-through: 18
A future commit will move the CRC function choosing logic to the
file where the hardware-specific functions are defined, but until
then add guards for builds without those functions. Oversight in
commit b9278871f.
Per buildfarm animal rhinoceros
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/4014992.1771963187@sss.pgh.pa.us
The grease patch in 4966bd3ed found its first problem: prior to the
February 2018 patch releases, no server knew how to negotiate protocol
versions, so pg_upgrade needs to take that into account when speaking to
those older servers.
This will be true even after the grease feature is reverted; we don't
need anyone to trip over this again in the future. Backpatch so that all
supported versions of pg_upgrade can gracefully handle an update to the
default protocol version. (This is needed for any distributions that
link older binaries against newer libpqs, such as Debian.) Branches
prior to 18 need an additional version check, for the existence of
max_protocol_version.
Per buildfarm member crake.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAOYmi%2B%3D4QhCjssfNEoZVK8LPtWxnfkwT5p-PAeoxtG9gpNjqOQ%40mail.gmail.com
Backpatch-through: 14
Previously, backtrace generation on Windows would return an "unsupported"
message. With this commit, we rely on CaptureStackBackTrace() to capture
the call stack and the DbgHelp API (SymFromAddrW, SymGetLineFromAddrW64)
for symbol resolution.
Symbol handler initialization (SymInitialize) is performed once per
process and cached. If initialization fails, the report for it is
returned as the backtrace output. The symbol handler is cleaned up via
on_proc_exit() to release DbgHelp resources.
The implementation provides symbol names, offsets, and addresses. When
PDB files are available, it also includes source file names and line
numbers. Symbol names and file paths are converted from UTF-16 to the
database encoding using wchar2char(), which properly handles both UTF-8
and non-UTF-8 databases on Windows. When symbol information is
unavailable or encoding conversion fails, it falls back to displaying raw
addresses.
The implementation uses the explicit UTF16 versions of the DbgHelp
functions (SYMBOL_INFOW, SymFromAddrW, IMAGEHLP_LINEW64,
SymGetLineFromAddrW64) rather than the generic versions. This allows us
to rely on predictable encoding conversion, rather than using the
haphazard ANSI codepage that we'd get otherwise.
DbgHelp is apparently available on all Windows platforms we support, so
there are no version number checks.
Author: Bryan Green <dbryan.green@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Discussion: https://postgr.es/m/a692c0fe-caca-4c08-9c5d-debfd0ef2504@gmail.com
Currently, AlterDomainValidateConstraint will re-validate a constraint
that has already been validated, which would just waste cycles. This
operation should be a no-op when the constraint is already validated.
This also aligns with ATExecValidateConstraint.
Author: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxG=-Dv9fPJHqkA9c-wGZ2dDOWOXSp-X-0K_G7r-DgaASw@mail.gmail.com
Previously, changing the generation expression of a virtual column was
prohibited if the column was referenced by a CHECK constraint. This
lifts that restriction.
RememberAllDependentForRebuilding within ATExecSetExpression will
rebuild all the dependent constraints, later ATPostAlterTypeCleanup
queues the required AlterTableStmt operations for ALTER TABLE Phase 3
execution.
Overall, ALTER COLUMN SET EXPRESSION on virtual columns may require
scanning the table to re-verify any associated CHECK constraints, but
it does not require a table rewrite in ALTER TABLE Phase 3.
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/CACJufxH3VETr7orF5rW29GnDk3n1wWbOE3WdkHYd3iPGrQ9E_A@mail.gmail.com
This commit includes a batch of fixes for various minor typos and
grammar mistakes, that have been proposed to the hackers mailing list
since the beginning of January.
Similar batches are planned on a bi-monthly basis depending on the
amount received, with the next one for the end of April.
The idea is to encourage more the use of these allocation routines
across the tree, as these offer stronger type safety guarantees than
pg_malloc() & friends (type cast in the result, sizeof() embedded).
This commit updates some code paths of src/fe_utils/.
This commit is similar to 31d3847a37.
Author: Henrik TJ <henrik@0x48.dk>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://postgr.es/m/6df1b64e-1314-9afd-41a3-3fefb76225e1@0x48.dk
Since the initial goal of pg_read_all_data was to be able to run
pg_dump as a non-superuser without explicitly granting access to
every object, it follows that it should allow reading all large
objects. For consistency, pg_write_all_data should allow writing
all large objects, too.
Author: Nitin Motiani <nitinmotiani@google.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Discussion: https://postgr.es/m/CAH5HC96dxAEvP78s1-JK_nDABH5c4w2MDfyx4vEWxBEfofGWsw%40mail.gmail.com
lgamma(NaN) should produce NaN, but on older versions of AIX
it reports an ERANGE error. While that's been fixed in the latest
version of libm, it'll take awhile for the fix to propagate. This
workaround is harmless even when the underlying bug does get fixed.
Discussion: https://postgr.es/m/3603369.1771877682@sss.pgh.pa.us
There were a couple of comments in parse_relation.c
> Note: properly, lockmode should be declared LOCKMODE not int, but that
> would require importing storage/lock.h into parse_relation.h. Since
> LOCKMODE is typedef'd as int anyway, that seems like overkill.
but actually LOCKMODE has been in storage/lockdefs.h for a while,
which is intentionally a more narrow header. So we can include that
one in parse_relation.h and just use LOCKMODE normally.
An alternative would be to add a duplicate typedef into
parse_relation.h, but that doesn't seem necessary here.
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/4bcd65fb-2497-484c-bb41-83cb435eb64d%40eisentraut.org
Send PG_PROTOCOL_GREASE and _pq_.test_protocol_negotiation, which were
introduced in commit d8d7c5dc8, by default, and fail the connection if
the server attempts to claim support for them. The hope is to provide
feedback to noncompliant implementations and gain confidence in our
ability to advance the protocol. (See the other commit for details.)
To help end users navigate the situation, a link to our documentation
that explains the behavior is displayed. We append this to the error
message when the NegotiateProtocolVersion response is incorrect, or when
the peer sends an error during startup that appears to be grease-
related.
It's still possible for users to connect to servers that don't support
protocol negotiation, by adding max_protocol_version=3.0 to their
connection strings. Only the default connection behavior is impacted.
This commit is tracked as a PG19 open item and will be reverted before
RC1. (The implementation here doesn't handle negotiation with later
server versions, so it can't be released into the wild as a
five-year-supported feature. But an improved implementation might be
able to do so, in the future...)
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Co-authored-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/DDPR5BPWH1RJ.1LWAK6QAURVAY%40jeltef.nl
The concerns that led us to remove AIX support in commit 0b16bb877
have now been alleviated:
1. IBM has stepped forward to provide support, including buildfarm
animal(s).
2. AIX 7.2 and later seem to be fine with large pg_attribute_aligned
requirements. Since 7.1 is now EOL anyway, we can just cease to
support it.
3. Tossing xlc support overboard seems okay as well. It's a bit
sad to drop one of the few remaining non-gcc-alike compilers, but
working around xlc's bugs and idiosyncrasies doesn't seem justified
by the theoretical portability benefits.
4. Likewise, we can stop supporting 32-bit AIX builds. This is
not so much about whether we could build such executables as that
they're too much of a pain to manage in the field, due to limited
address space available for dynamic library loading.
5. We hit on a way to manage catalog column alignment that doesn't
require continuing developer effort (see commit ecae09725).
Hence, this commit reverts 0b16bb877 and some follow-on commits
such as e6bb491bf, except for not putting back XLC support nor
the changes related to catalog column alignment.
Some other notable changes from the way things were in v16:
Prefer unnamed POSIX semaphores on AIX, rather than the default
choice of SysV semaphores.
Include /opt/freeware/lib in -Wl,-blibpath, even when it is not
mentioned anywhere in LDFLAGS.
Remove platform-specific adjustment of MEMSET_LOOP_LIMIT; maybe
that's still the right thing, but it really ought to be re-tested.
Silence compiler warnings related to getpeereid(), wcstombs_l(),
and PAM conversation procs.
Accept "libpythonXXX.a" as an okay name for the Python shared
library (but only on AIX!).
Author: Aditya Kamath <Aditya.Kamath1@ibm.com>
Author: Srirama Kucherlapati <sriram.rk@in.ibm.com>
Co-authored-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CY5PR11MB63928CC05906F27FB10D74D0FD322@CY5PR11MB6392.namprd11.prod.outlook.com
Because we assume that int64 and double have the same alignment
requirement, AIX's default behavior that alignof(double) = 4 while
alignof(int64) = 8 is a headache. There are two issues:
1. We align both int8 and float8 tuple columns per ALIGNOF_DOUBLE,
which is an ancient choice that can't be undone without breaking
pg_upgrade and creating some subtle SQL-level compatibility issues
too. However, the cost of that is just some marginal inefficiency
in fetching int8 values, which can't be too awful if the platform
architects were willing to pay the same costs for fetching float8s.
So our decision is to leave that alone. This patch makes our
alignment choices the same as they were pre-v17, namely that
ALIGNOF_DOUBLE and ALIGNOF_INT64_T are whatever the compiler prefers
and then MAXIMUM_ALIGNOF is the larger of the two. (On all supported
platforms other than AIX, all three values will be the same.)
2. We need to overlay C structs onto catalog tuples, and int8 fields
in those struct declarations may not be aligned to match this rule.
In the old branches we had some annoying rules about ordering catalog
columns to avoid alignment problems, but nobody wants to resurrect
those. However, there's a better answer: make the compiler construe
those struct declarations the way we need it to by using the pack(N)
pragma. This requires no manual effort to maintain going forward;
we only have to insert the pragma into all the catalog *.h files.
(As the catalogs stand at this writing, nothing actually changes
because we've not moved any affected columns since v16; hence no
catversion bump is required. The point of this is to not have
to worry about the issue going forward.)
We did not have this option when the AIX port was first made. This
patch depends on the C99 feature _Pragma(), as well as the pack(N)
pragma which dates to somewhere around gcc 4.0, and probably doesn't
exist in xlc at all. But now that we've agreed to toss xlc support
out the window, there doesn't seem to be a reason not to go this way.
In passing, I got rid of LONGALIGN[_DOWN] along with the configure
probes for ALIGNOF_LONG. We were not using those anywhere and it
seems highly unlikely that we'd do so in future. Instead supply
INT64ALIGN[_DOWN], which isn't used either but at least could
have a good reason to be used.
Discussion: https://postgr.es/m/1127261.1769649624@sss.pgh.pa.us
This uses the "connection warning" infrastructure introduced by
commit 1d92e0c2cc to emit a WARNING when an MD5 password is used to
authenticate. MD5 password support was marked as deprecated in
v18 and will be removed in a future release of Postgres. These
warnings are on by default but can be turned off via the existing
md5_password_warnings parameter.
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Xiangyu Liang <liangxiangyu_2013@163.com>
Discussion: https://postgr.es/m/aYzeAYEbodkkg5e-%40nathan
There are three static definitions of validate_relation_kind() in the
codebase, one each in table.c, indexam.c and sequence.c, validating that
the given relation is a table, an index or a sequence respectively.
The compiler knows which definition to use where because they are static.
But this could be confusing to a reader. Rename these functions so that
their names reflect the kind of relation they are validating. While at
it, also update the comments in table.c to clarify the definition of
table-like relkinds so that we don't have to maintain the exclusion list
as the set of relkinds undergoes changes.
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/6d3fef19-a420-4e11-8235-8ea534bf2080%40eisentraut.org
It instead of checking which relkinds it shouldn't be, explicitly list
the ones we accept. This is used to check which relkinds are accepted
in table_open() and related functions. Before this change, figuring
that out was always a few steps too complicated. This also makes
changes for new relkinds more explicit instead of accidental.
Finally, this makes this more aligned with the functions of the same
name in src/backend/access/index/indexam.c and
src/backend/access/sequence/sequence.c.
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/6d3fef19-a420-4e11-8235-8ea534bf2080%40eisentraut.org
Previously, these characters could cause problems when passed through
shell commands, and were flagged with a comment in string_utils.c
suggesting they be rejected in a future major release.
The affected commands are CREATE DATABASE, CREATE ROLE, CREATE TABLESPACE,
ALTER DATABASE RENAME, ALTER ROLE RENAME, and ALTER TABLESPACE RENAME.
Also add a pg_upgrade check to detect these invalid names in clusters
being upgraded from pre-v19 versions, producing a report file listing
any offending objects that must be renamed before upgrading.
Tests have been modified accordingly.
Author: Mahendra Singh Thalor <mahi6run@gmail.com>
Reviewed-By: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-By: Andrew Dunstan <andrew@dunslane.net>
Reviewed-By: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-By: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-By: Srinath Reddy <srinath2133@gmail.com>
Discussion: https://postgr.es/m/CAKYtNApkOi4FY0S7+3jpTqnHVyyZ6Tbzhtbah-NBbY-mGsiKAQ@mail.gmail.com
We now support the common meson option -Ddefault_library, with values
'both' (the default), 'shared' (install only shared libraries), and
'static' (install only static libraries). The 'static' choice doesn't
actually work, since psql and other programs insist on linking to the
shared version of libpq, but it's there pro-forma. It could be built
out if we really wanted, but since we have never supported the
equivalent in the autoconf build system, there doesn't appear to be an
urgent need.
With an eye to re-supporting AIX, the internal implementation
distinguishes whether to install libpgport.a and other static-only
libraries from whether to build/install the static variant of
libraries that we can build both ways. This detail isn't exposed as a
meson option, though it could be if there's demand.
The Cirrus CI task SanityCheck now uses -Ddefault_library=shared to
save a little bit of build time (and to test this option).
Author: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/e8aa97db-872b-4087-b073-f296baae948d@eisentraut.org
This replaces some loops over word-length popcount functions with
calls to pg_popcount(). Since pg_popcount() may use a function
pointer for inputs with sizes >= a Bitmapset word, this produces a
small regression for the common one-word case in bms_num_members().
To deal with that, this commit adds an inlined fast-path for that
case. This fast-path could arguably go in pg_popcount() itself
(with an appropriate alignment check), but that is left for future
study.
Suggested-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CANWCAZY7R%2Biy%2Br9YM_sySNydHzNqUirx1xk0tB3ej5HO62GdgQ%40mail.gmail.com
This commit replaces the implementations of pg_popcount{32,64} with
branchless ones in plain C. While these new implementations do not
make use of more sophisticated population count instructions
available on some CPUs, testing indicates they perform well,
especially now that they are inlined. Newer versions of popular
compilers will automatically replace these with special
instructions if possible, anyway. A follow-up commit will replace
various loops over these functions with calls to pg_popcount(),
leaving us little reason to worry about micro-optimizing them
further.
Since this commit removes the only uses of the popcount builtins,
we can also remove the corresponding configuration checks.
Suggested-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CANWCAZY7R%2Biy%2Br9YM_sySNydHzNqUirx1xk0tB3ej5HO62GdgQ%40mail.gmail.com
Future commits will consolidate the CPU feature detection functionality
now scattered around in various files, and the CRC "*_choose.c"
files seem to be the natural place for it. For now, just rename in
a separate commit to make it easier to follow the git log. Do the
minimum necessary to keep the build systems functional, and build the
new file pg_cpu_x86.c unconditionally using guards to control the
visibility of its contents, following the model of some more recent
files in src/port.
Limit scope to x86 to reduce the number of moving parts, since the
motivation for doing this now is to clear out some technical debt
before adding AVX2 detection. Arm is left for future work.
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CANWCAZbgEUFw7LuYSVeJ=Tj98R5HoOB1Ffeqk3aLvbw5rU5NTw@mail.gmail.com
On clang, -Wimplicit-fallthrough requires annotations with attributes,
but on gcc, -Wimplicit-fallthrough is the same as
-Wimplicit-fallthrough=3, which allows annotations with comments. In
order to enforce consistent annotations with attributes on both
compilers, we test first for -Wimplicit-fallthrough=5, which will
succeed on gcc, and if that is not found we test for
-Wimplicit-fallthrough.
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/flat/76a8efcd-925a-4eaf-bdd1-d972cd1a32ff%40eisentraut.org
conflict.h currently includes utils/timestamp.h despite only requiring
basic timestamp type definitions. This creates unnecessary overhead.
Replace the include with datatype/timestamp.h to provide the necessary
types. This change requires explicitly including utils/timestamp.h in
test_custom_fixed_stats.c, which previously relied on the indirect
inclusion.
Extracted from the larger patch by Andres Freund.
Discussion: https://postgr.es/m/aY-UE-4t7FiYgH3t@alap3.anarazel.de
Adding this section brings consistency with the pages of other tools,
potentially easing the introduction of new options in the future as
these are now showing in the shape of a list.
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://postgr.es/m/CAHut+PtSF5AW3DHpYA-_muDLms2xBUzHpd545snVj8vFpmsmGg@mail.gmail.com
On common architectures, the PGPROC struct happened to be a multiple
of 64 bytes on PG 18, but it's changed on 'master' since. There was
worry that changing the alignment might hurt performance, due to false
cacheline sharing across elements in the proc array. However, there
was no explicit alignment, so any alignment to cache lines was
accidental. Add explicit alignment to remove worry about false
sharing.
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/3dd6f70c-b94d-4428-8e75-74a7136396be@iki.fi
The ordering was pretty random, making it hard to get an overview of
what's in it. Group related fields together, and add comments to act
as separators between the groups.
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/3dd6f70c-b94d-4428-8e75-74a7136396be@iki.fi
This command requires an input file (WAL summary file), that has to be
specified without an option name. The shape of the command and how to
use this parameter is implied in its synopsis. However, this page
lacked a description of the parameter. Listing parameters that do
not require an option is a common practice across the docs. See for
example pg_dump, pg_restore, etc.
Author: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/CAHut+PtbQi8Dw_0upS9dd=Oh9OqfOdAo=0_DOKG=YSRT_a+0Fw@mail.gmail.com
If a CREATE TABLE statement defined a constraint whose name is identical
to the name generated for a NOT NULL constraint, we'd throw an
(unnecessary) unique key violation error on
pg_constraint_conrelid_contypid_conname_index: this can easily be
avoided by choosing a different name for the NOT NULL constraint.
Fix by passing the constraint names already created by
AddRelationNewConstraints() to AddRelationNotNullConstraints(), so that
the latter can avoid name collisions with them.
Bug: #19393
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reported-by: Hüseyin Demir <huseyin.d3r@gmail.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/19393-6a82427485a744cf@postgresql.org
The field was mainly used for the position in a LOCK's wait queue, but
also as the position in a the freelist when the PGPROC entry was not
in use. The reuse saves some memory at the expense of readability,
which seems like a bad tradeoff. If we wanted to make the struct
smaller there's other things we could do, but we're actually just
discussing adding padding to the struct for performance reasons.
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/3dd6f70c-b94d-4428-8e75-74a7136396be@iki.fi
Following the example set by commit 58a359e585, we can squeeze out
a little more performance from COPY FROM (FORMAT {text,csv}) by
inlining CopyReadLineText() and passing the is_csv parameter as a
constant. This allows the compiler to emit specialized code with
fewer branches.
This is preparatory work for a proposed follow-up commit that would
further optimize this code with SIMD instructions.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Ayoub Kazar <ma_kazar@esi.dz>
Tested-by: Manni Wood <manni.wood@enterprisedb.com>
Discussion: https://postgr.es/m/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig%40mail.gmail.com
If the 'bounds' array needs to be expanded, because the input contains
more trigrams than the initial guess, the code didn't return the
reallocated array correctly to the caller. That could lead to a crash
in the rare case that the input string becomes longer when it's
lower-cased. The only known instance of that is when an ICU locale is
used with certain single-byte encodings. This was an oversight in
commit 00896ddaf4.
Author: Zsolt Parragi <zsolt.parragi@percona.com>
Backpatch-through: 18
When adjust_appendrel_attrs translates a Var referencing a parent
relation into a Var referencing a child relation, it propagates
varnullingrels from the parent Var to the translated Var. Previously,
the code simply overwrote the translated Var's varnullingrels with
those of the parent.
This was incorrect because the translated Var might already possess
nonempty varnullingrels. This happens, for example, when a LATERAL
subquery within a UNION ALL references a Var from the nullable side of
an outer join. In such cases, the translated Var correctly carries
the outer join's relid in its varnullingrels. Overwriting these bits
with the parent Var's set caused the planner to lose track of the fact
that the Var could be nulled by that outer join.
In the reported case, because the underlying column had a NOT NULL
constraint, the planner incorrectly deduced that the Var could never
be NULL and discarded essential IS NOT NULL filters. This led to
incorrect query results where NULL rows were returned instead of being
filtered out.
To fix, use bms_add_members to merge the parent Var's varnullingrels
into the translated Var's existing set, preserving both sources of
nullability.
Back-patch to v16. Although the reported case does not seem to cause
problems in v16, leaving incorrect varnullingrels in the tree seems
like a trap for the unwary.
Bug: #19412
Reported-by: Sergey Shinderuk <s.shinderuk@postgrespro.ru>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19412-1d0318089b86859e@postgresql.org
Backpatch-through: 16
pgstat.h is a widely included header. Including worker_internal.h there is
unnecessary and creates tight coupling. By refactoring
pgstat_report_subscription_error() to fetch the required
LogicalRepWorkerType internally rather than receiving it as an argument,
we can eliminate the need for the internal header.
Reported-by: Andres Freund <andres@anarazel.de>
Author: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/aY-UE-4t7FiYgH3t@alap3.anarazel.de
S_LOCK_FREE() is used by the test program in s_lock.c, but nobody
has voiced concerns about losing some coverage there.
SpinLockFree() appears to have been unused since it was introduced
by commit 499abb0c0f. There was agreement to remove these in 2020,
but it never happened. Since we still have agreement for removal
in 2026, let's do that now.
Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/aZX2oUcKf7IzHnnK%40nathan
Discussion: https://postgr.es/m/20200608225338.m5zho424w6lpwb2d%40alap3.anarazel.de
This has been a keyword since C99, and we now require C11, so we no
longer need to use __inline__ or to check for it at configure time.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/aZdGbDaV4_yKCMc-%40nathan
Commit e222534679 adjusted the logic in
add_path() to keep the path list sorted by disabled_nodes and then by
total_cost, but failed to make the corresponding adjustment to
add_partial_path. As a result, add_partial_path might sort the path list
just by total cost, which could lead to later planner misbehavior.
In principle, this should be back-patched to v18, but we are typically
reluctant to back-patch planner fixes for fear of destabilizing working
installations, and it is unclear to me that this has sufficiently
serious consequences to justify an exception, so for now, no back-patch.
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: http://postgr.es/m/CAMbWs4-mO3jMK4t_LgcJ+7Eo=NmGgkxettgRaVbJzZvVZ1koMA@mail.gmail.com
The source version of pg_hba.conf.sample contains
@remove-line-for-nolocal@ markers that indicate which lines should
be deleted for an installation that doesn't HAVE_UNIX_SOCKETS.
We no longer support that case, and since commit f55808828 all
that initdb is doing is unconditionally removing the markers.
We might as well remove the markers from the source version and
drop the removal code, which is unintelligible now anyway.
This will not of course save any noticeable number of cycles
in initdb, but it might save some confusion for future
developers looking at pg_hba.conf.sample. It also reduces the
number of distinct cases that replace_token() has to support,
possibly allowing some tightening of that function.
Discussion: https://postgr.es/m/2287786.1771458157@sss.pgh.pa.us
This commit allows setting wal_receiver_timeout per subscription
using the CREATE SUBSCRIPTION and ALTER SUBSCRIPTION commands.
The value is stored in the subwalrcvtimeout column of the pg_subscription
catalog.
When set, this value overrides the global wal_receiver_timeout for
the subscription's apply worker. The default is -1, which means the
global setting (from the server configuration, command line, role,
or database) remains in effect.
This feature is useful for configuring different timeout values for
each subscription, especially when connecting to multiple publisher
servers, to improve failure detection.
Bump catalog version.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/a1414b64-bf58-43a6-8494-9704975a41e9@oss.nttdata.com
When multiple subscribers connect to different publisher servers,
it can be useful to set different wal_receiver_timeout values for
each connection to better detect failures. However, previously
this wasn't possible, which limited flexibility in managing subscriptions.
This commit changes wal_receiver_timeout to be user-settable,
allowing different values to be assigned using ALTER ROLE SET for
each subscription owner. This effectively enables per-subscription
configuration.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/a1414b64-bf58-43a6-8494-9704975a41e9@oss.nttdata.com
Checkpoint completion log messages include more detail than checkpoint
start messages, but previously omitted the checkpoint request flags,
which were only logged at checkpoint start. As a result, users had to
correlate completion messages with earlier start messages to see
the full context.
This commit includes the checkpoint request flags in the checkpoint
completion log message as well. This duplicates some information,
but makes the completion message self-contained and easier to interpret.
Author: Soumya S Murali <soumyamurali.work@gmail.com>
Reviewed-by: Michael Banck <mbanck@gmx.net>
Reviewed-by: Yuan Li <carol.li2025@outlook.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAMtXxw9tPwV=NBv5S9GZXMSKPeKv5f9hRhSjZ8__oLsoS5jcuA@mail.gmail.com
Instead of using comments to mark fallthrough switch cases, use the
fallthrough attribute. This will (in the future, not here) allow
supporting other compilers besides gcc. The commenting convention is
only supported by gcc, the attribute is supported by clang, and in the
fullness of time the C23 standard attribute would allow supporting
other compilers as well.
Right now, we package the attribute into a macro called
pg_fallthrough. This commit defines that macro and replaces the
existing comments with that macro invocation.
We also raise the level of the gcc -Wimplicit-fallthrough= option from
3 to 5 to enforce the use of the attribute.
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/flat/76a8efcd-925a-4eaf-bdd1-d972cd1a32ff%40eisentraut.org
A fallthrough attribute after the last case is a constraint violation
in C23, and clang warns about it (not about this comment, but if we
changed it to an attribute). Remove it. (There was apparently never
anything after this to fall through to, even in the first commit
da07a1e8565.)
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/flat/76a8efcd-925a-4eaf-bdd1-d972cd1a32ff%40eisentraut.org
As transam's README documents, the general order of actions recommended
when WAL-logging a buffer is to unlock and unpin buffers after leaving a
critical section. This pattern was not being followed by some code
paths of GIN and GiST, adjusted in this commit, where buffers were
either unlocked or unpinned inside a critical section. Based on my
analysis of each code path updated here, there is no reason to not
follow the recommended unlocking/unpin pattern done outside of a
critical section.
These inconsistencies are rather old, coming mainly from ecaa4708e5
and ff301d6e69. The guidelines in the README predate these commits,
being introduced in 6d61cdec07.
Author: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/CALdSSPgBPnpNNzxv0Y+_GNFzW6PmzRZYh+_hpf06Y1N2zLhZaQ@mail.gmail.com
Up to now, to create such a function, one had to make a pg_proc.dat
entry and then overwrite it with a CREATE OR REPLACE command in
system_functions.sql. That's error-prone (cf. bug #19409) and
results in leaving dead rows in the initial contents of pg_proc.
Manual maintenance of pg_node_tree strings seems entirely impractical,
and parsing expressions during bootstrap would be extremely difficult
as well. But Andres Freund observed that all the current use-cases
are simple constants, and building a Const node is well within the
capabilities of bootstrap mode. So this patch invents a special case:
if bootstrap mode is asked to ingest a non-null value for
pg_proc.proargdefaults (which would otherwise fail in
pg_node_tree_in), it parses the value as an array literal and then
feeds the element strings to the input functions for the corresponding
parameter types. Then we can build a suitable pg_node_tree string
with just a few more lines of code.
This allows removing all the system_functions.sql entries that are
just there to set up default arguments, replacing them with
proargdefaults fields in pg_proc.dat entries. The old technique
remains available in case someone needs a non-constant default.
The initial contents of pg_proc are demonstrably the same after
this patch, except that (1) json_strip_nulls and jsonb_strip_nulls
now have the correct provolatile setting, as per bug #19409;
(2) pg_terminate_backend, make_interval, and drandom_normal
now have defaults that don't include a type coercion, which is
how they should have been all along.
In passing, remove some unused entries from bootstrap.c's TypInfo[]
array. I had to add some new ones because we'll now need an entry for
each default-possessing system function parameter, but we shouldn't
carry more than we need there; it's just a maintenance gotcha.
Bug: #19409
Reported-by: Lucio Chiessi <lucio.chiessi@trustly.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Author: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/183292bb-4891-4c96-a3ca-e78b5e0e1358@dunslane.net
Discussion: https://postgr.es/m/19409-e16cd2605e59a4af@postgresql.org
The previous default bgworker_die() signal would exit with elog(FATAL)
directly from the signal handler. That could cause deadlocks or
crashes if the signal handler runs while we're e.g holding a spinlock
or in the middle of a memory allocation.
All the built-in background workers overrode that to use the normal
die() handler and CHECK_FOR_INTERRUPTS(). Let's make that the default
for all background workers. Some extensions relying on the old
behavior might need to adapt, but the new default is much safer and is
the right thing to do for most background workers.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/5238fe45-e486-4c62-a7f3-c7d8d416e812@iki.fi
Previously, if --stamp_file was specified, libpq_check.pl would create a
new stamp file only if none could be found. If there was already a
stamp file, the script would do nothing, leaving the previous stamp file
in place. This logic could cause unnecessary rebuilds because meson
relies on the timestamp of the output files to determine if a rebuild
should happen. In this case, a stamp file generated during an older
check would be kept, but we need a stamp file from the latest moment
where the libpq check has been run, so as correct rebuild decisions can
be taken.
This commit changes libpq_check.pl so as a fresh stamp file is created
each time libpq_check.pl is run, when --stamp_file is specified.
Oversight in commit 4a8e6f43a6.
Reported-by: Andres Freund <andres@anarazel.de>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: VASUKI M <vasukim1992002@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ22rrN6gCn7urtmTR=_5z7ArZLUJu-TsMChdXwmRTaquA@mail.gmail.com
The main purpose of this change is to allow an ABI checker to understand
when the list of SysCacheIdentifier changes, by switching all the
routine declarations that relied on a signed integer for a syscache ID
to this new type. This is going to be useful in the long-term for
versions newer than v19 so as we will be able to check when the list of
values in SysCacheIdentifier is updated in a non-ABI compliant fashion.
Most of the changes of this commit are due to the new definition of
SyscacheCallbackFunction, where a SysCacheIdentifier is now required for
the syscache ID. It is a mechanical change, still slightly invasive.
There are more areas in the tree that could be improved with an ABI
checker in mind; this takes care of only one area.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/289125.1770913057@sss.pgh.pa.us
This commit tweaks the generation of the syscache IDs for the enum
SysCacheIdentifier to now include an invalid value, with -1 assigned as
value. The concept of an invalid syscache ID exists when handling
lookups of a ObjectAddress, based on their set of properties in
ObjectPropertyType. -1 is used for the case where an object type has no
option for a syscache lookup.
This has been found as independently useful while discussing a switch of
SysCacheIdentifier to a typedef, as we already have places that want to
know about the concept of an invalid value when dealing with
ObjectAddresses.
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://postgr.es/m/aZQRnmp9nVjtxAHS@paquier.xyz
get_catalog_object_by_oid_extended() has been doing a syscache lookup
when given a cache ID strictly higher than 0, which is wrong because the
first valid value of SysCacheIdentifier is 0.
This issue had no consequences, as the first value assigned in the
enum SysCacheIdentifier is AGGFNOID, which is not used in the object
type properties listed in objectaddress.c. Even if an ID of 0 was
hypotherically given, the code would still work with a less efficient
heap-or-index scan.
Discussion: https://postgr.es/m/aZTr_R6JGmqokUBb@paquier.xyz
... instead of passing a bunch of separate booleans.
Also, rearrange the argument list in a hopefully more sensible order.
Discussion: https://postgr.es/m/202602111846.xpvuccb3inbx@alvherre.pgsql
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> (older version)
Commit 38e0190ced forgot to pfree() an allocation (freed in other
places of the same function) in only one of several spots in
check_log_min_messages(). Per Coverity. Add that.
While at it, avoid open-coding guc_strdup(). The new coding does a
strlen() that wasn't there before, but I doubt it's measurable.
Previously, SIGINT was treated the same as SIGTERM in walwriter and
walsummarizer. That decision goes back to when the walwriter process
was introduced (commit ad4295728e), and was later copied to
walsummarizer. It was a pretty arbitrary decision back then, and we
haven't adopted that convention in all the other processes that have
been introduced later.
Summary of how other processes respond to SIGINT:
- Autovacuum launcher: Cancel the current iteration of launching
- bgworker: Ignore (unless connected to a database)
- checkpointer: Request shutdown checkpoint
- bgwriter: Ignore
- pgarch: Ignore
- startup process: Ignore
- walreceiver: Ignore
- IO worker: die()
IO workers are a notable exception in that they exit on SIGINT, and
there's a documented reason for that: IO workers ignore SIGTERM, so
SIGINT provides a way to manually kill them. (They do respond to
SIGUSR2, though, like all the other processes that we don't want to
exit immediately on SIGTERM on operating system shutdown.)
To make this a little more consistent, ignore SIGINT in walwriter and
walsummarizer. They have no "query" to cancel, and they react to
SIGTERM just fine.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/818bafaf-1e77-4c78-8037-d7120878d87c@iki.fi
Most of the StaticAssert macros already worked in C++ with Clang and
GCC:(the only compilers we're currently testing C++ extension support
for). This adds a regression test for them in our test C++ extension,
so we can safely change their implementation without accidentally
breaking C++.
The only macros that StaticAssert macros that don't work yet are the
StaticAssertVariableIsOfType and StaticAssertVariableIsOfTypeMacro.
These will be added in a follow-on commit.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/flat/CAGECzQR21OnnKiZO_1rLWO0-16kg1JBxnVq-wymYW0-_1cUNtg@mail.gmail.com
All of these macros already work in C++ with Clang and GCC (the only
compilers we're currently testing C++ extension support for). This
adds a regression test for them in our test C++ extension, so we can
safely change their implementation without accidentally breaking C++.
Some of the List macros didn't work in C++ in the past (see commit
d5ca15ee5), and this would have caught that.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/flat/CAGECzQR21OnnKiZO_1rLWO0-16kg1JBxnVq-wymYW0-_1cUNtg@mail.gmail.com
Commit c67bef3f32 introduced this test helper function for use by
src/test/regress/sql/encoding.sql, but its logic was incorrect. It
confused an encoding ID for a boolean so it gave the wrong results for
some inputs, and also forgot the usual return macro. The mistake didn't
affect values actually used in the test, so there is no change in
behavior.
Also drop it and another missed function at the end of the test, for
consistency.
Backpatch-through: 14
Author: Zsolt Parragi <zsolt.parragi@percona.com>
Various buildfarm members, having compilers like gcc 8.5 and 6.3, fail
to deduce that text_substring() variable "E" is initialized if
slice_size!=-1. This suppression approach quiets gcc 8.5; I did not
reproduce the warning elsewhere. Back-patch to v14, like commit
9f4fd119b2.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1157953.1771266105@sss.pgh.pa.us
Backpatch-through: 14
The receive function of hstore was not able to handle correctly
duplicate key values when a new duplicate links to a NULL value, where a
pfree() could be attempted on a NULL pointer, crashing due to a pointer
dereference.
This problem would happen for a COPY BINARY, when stacking values like
that:
aa => 5
aa => null
The second key/value pair is discarded and pfree() calls are attempted
on its key and its value, leading to a pointer dereference for the value
part as the value is NULL. The first key/value pair takes priority when
a duplicate is found.
Per offline report.
Reported-by: "Anemone" <vergissmeinnichtzh@gmail.com>
Reported-by: "A1ex" <alex000young@gmail.com>
Backpatch-through: 14
Before v12, pg_largeobject_metadata was defined WITH OIDS, so
unlike newer versions, the "oid" column was a hidden system column
that pg_dump's getTableAttrs() will not pick up. Thus, for commit
161a3e8b68, we did not bother trying to use COPY for
pg_largeobject_metadata for upgrades from older versions. This
commit removes that restriction by adjusting the query in
getTableAttrs() to pick up the "oid" system column and by teaching
dumpTableData_copy() to use COPY (SELECT ...) for this catalog,
since system columns cannot be used in COPY's column list.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/aYzuAz_ITUpd9ZvH%40nathan
syscache_info.h was installed into $installdir/include/server/catalog
if you use a non-VPATH autoconf build, but not if you use a VPATH
build or meson. That happened because the makefiles blindly install
src/include/catalog/*.h, and in a non-VPATH build the generated
header files would be swept up in that. While it's hard to conjure
a reason to need syscache_info.h outside of backend build, it's
also hard to get the makefiles to skip syscache_info.h, so let's
go the other way and install it in the other two cases too.
Another problem, new in v19, was that meson builds install a copy of
src/include/catalog/README, while autoconf builds do not. The issue
here is that that file is new and wasn't added to meson.build's
exclusion list.
While it's clearly a bug if different build methods don't install
the same set of files, I doubt anyone would thank us for changing
the behavior in released branches. Hence, fix in master only.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/946828.1771185367@sss.pgh.pa.us
The X25519 curve is not allowed when OpenSSL is configured for
FIPS mode, so add a note to the documentation that the default
setting must be altered for such setups.
Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3521653.1770666093@sss.pgh.pa.us
The X25519 curve is disallowed when OpenSSL is configured for
FIPS mode which makes the testsuite fail. Since X25519 isn't
required for the tests we can remove it to allow FIPS enabled
configurations to run the tests.
Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3521653.1770666093@sss.pgh.pa.us
This completes the work started by commit 75f49221c2.
In basebackup.c, changing the StaticAssertStmt to StaticAssertDecl
results in having the same StaticAssertDecl() in 2 functions. So, it
makes more sense to move it to file scope instead.
Also, as it depends on some computations based on 2 tar blocks, define
TAR_NUM_TERMINATION_BLOCKS.
In deadlock.c, change the StaticAssertStmt to StaticAssertDecl and
keep it in the function scope. Add new braces to avoid warning from
-Wdeclaration-after-statement.
In aset.c, change the StaticAssertStmt to StaticAssertDecl and move it
to file scope.
Finally, update the comments in c.h a bit.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Co-authored-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/aYH6ii46AvGVCB84%40ip-10-97-1-34.eu-west-3.compute.internal
When both standby.signal and recovery.signal are present, standby.signal
takes precedence and the server runs in standby mode. Previously,
in this case, recovery.signal was not removed at the end of standby mode
(i.e., on promotion) or at the end of archive recovery, while standby.signal
was removed. As a result, a leftover recovery.signal could cause
a subsequent restart to enter archive recovery unexpectedly, potentially
preventing the server from starting. This behavior was surprising and
confusing to users.
This commit fixes the issue by updating the recovery code to remove
recovery.signal alongside standby.signal when both files are present and
recovery completes.
Because this code path is particularly sensitive and changes in recovery
behavior can be risky for stable branches, this change is applied only to
the master branch.
Reported-by: Nikolay Samokhvalov <nik@postgres.ai>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: David Steele <david@pgbackrest.org>
Discussion: https://postgr.es/m/CAM527d8PVAQFLt_ndTXE19F-XpDZui861882L0rLY3YihQB8qA@mail.gmail.com
The error message added in 379695d3cc referred to the public key being
too long. This is confusing as it is in fact the session key included
in a PGP message which is too long. This is harmless, but let's be
precise about what is wrong.
Per offline report.
Reported-by: Zsolt Parragi <zsolt.parragi@percona.com>
Backpatch-through: 14
Commit 1e7fe06c10 changed
pg_mbstrlen_with_len() to ereport(ERROR) if the input ends in an
incomplete character. Most callers want that. text_substring() does
not. It detoasts the most bytes it could possibly need to get the
requested number of characters. For example, to extract up to 2 chars
from UTF8, it needs to detoast 8 bytes. In a string of 3-byte UTF8
chars, 8 bytes spans 2 complete chars and 1 partial char.
Fix this by replacing this pg_mbstrlen_with_len() call with a string
traversal that differs by stopping upon finding as many chars as the
substring could need. This also makes SUBSTRING() stop raising an
encoding error if the incomplete char is past the end of the substring.
This is consistent with the general philosophy of the above commit,
which was to raise errors on a just-in-time basis. Before the above
commit, SUBSTRING() never raised an encoding error.
SUBSTRING() has long been detoasting enough for one more char than
needed, because it did not distinguish exclusive and inclusive end
position. For avoidance of doubt, stop detoasting extra.
Back-patch to v14, like the above commit. For applications using
SUBSTRING() on non-ASCII column values, consider applying this to your
copy of any of the February 12, 2026 releases.
Reported-by: SATŌ Kentarō <ranvis@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Bug: #19406
Discussion: https://postgr.es/m/19406-9867fddddd724fca@postgresql.org
Backpatch-through: 14
The prior order caused spurious Valgrind errors. They're spurious
because the ereport(ERROR) non-local exit discards the pointer in
question. pg_mblen_cstr() ordered the checks correctly, but these other
two did not. Back-patch to v14, like commit
1e7fe06c10.
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20260214053821.fa.noahmisch@microsoft.com
Backpatch-through: 14
Radix sort can be much faster than quicksort, but for our purposes it
is limited to sequences of unsigned bytes. To make tuples with other
types amenable to this technique, several features of tuple comparison
must be accounted for, i.e. the sort key must be "normalized":
1. Signedness -- It's possible to modify a signed integer such that
it can be compared as unsigned. For example, a signed char has range
-128 to 127. If we cast that to unsigned char and add 128, the range
of values becomes 0 to 255 while preserving order.
2. Direction -- SQL allows specification of ASC or DESC. The
descending case is easily handled by taking the complement of the
unsigned representation.
3. NULL values -- NULLS FIRST and NULLS LAST must work correctly.
This commmit only handles the case where datum1 is pass-by-value
Datum (possibly abbreviated) that compares like an ordinary
integer. (Abbreviations of values of type "numeric" are a convenient
counterexample.) First, tuples are partitioned by nullness in the
correct NULL ordering. Then the NOT NULL tuples are sorted with radix
sort on datum1. For tiebreaks on subsequent sortkeys (including the
first sort key if abbreviated), we divert to the usual qsort.
ORDER BY queries on pre-warmed buffers are up to 2x faster on high
cardinality inputs with radix sort than the sort specializations added
by commit 697492434, so get rid of them. It's sufficient to fall back
to qsort_tuple() for small arrays. Moderately low cardinality inputs
show more modest improvents. Our qsort is strongly optimized for very
low cardinality inputs, but radix sort is usually equal or very close
in those cases.
The changes to the regression tests are caused by under-specified sort
orders, e.g. "SELECT a, b from mytable order by a;". For unstable
sorts, such as our qsort and this in-place radix sort, there is no
guarantee of the order of "b" within each group of "a".
The implementation is taken from ska_byte_sort() (Boost licensed),
which is similar to American flag sort (an in-place radix sort) with
modifications to make it better suited for modern pipelined CPUs.
The technique of normalization described above can also be extended
to the case of multiple keys. That is left for future work (Thanks
to Peter Geoghegan for the suggestion to look into this area).
Reviewed-by: Chengpeng Yan <chengpeng_yan@outlook.com>
Reviewed-by: zengman <zengman@halodbtech.com>
Reviewed-by: ChangAo Chen <cca5507@qq.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com> (earlier version)
Discussion: https://postgr.es/m/CANWCAZYzx7a7E9AY16Jt_U3+GVKDADfgApZ-42SYNiig8dTnFA@mail.gmail.com
Commit dfd79e2d added a TODO comment to update this paragraph
when support for PASSING was added. Commit 6185c9737c added
PASSING but missed resolving this TODO. Fix by expanding the
paragraph with a reference to PASSING.
Author: Aditya Gollamudi <adigollamudi@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/20260117051406.sx6pss4ryirn2x4v@pgs
The URL for Ditaa linked to the old Sourceforge version which is
too old for what we need, the fork over on Github is the correct
version to use for re-generating the SVG files for the docs. The
required Ditaa version is 0.11.0 as it when SVG support as added.
Running the version found on Sourceforge produce the error below:
$ ditaa -E -S --svg in.txt out.txt
Unrecognized option: --svg
usage: ditaa <INPFILE> [OUTFILE] [-A] [-b <BACKGROUND>] [-d] [-E] [-e
<ENCODING>] [-h] [--help] [-o] [-r] [-S] [-s <SCALE>] [-T] [-t
<TABS>] [-v] [-W]
While there, also mention that meson rules exists for building
images.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://postgr.es/m/CAN55FZ2O-23xERF2NYcvv9DM_1c9T16y6mi3vyP=O1iuXS0ASA@mail.gmail.com
This adds an 'images' target to the meson build system in order to
be able to regenerate the images used in the docs.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAN55FZ0c0Tcjx9=e-YibWGHa1-xmdV63p=THH4YYznz+pYcfig@mail.gmail.com
The idea is to encourage more the use of these allocation routines
across the tree, as these offer stronger type safety guarantees than
pg_malloc() & co (type cast in the result, sizeof() embedded). This set
of changes is dedicated to the pg_dump code.
Similar work has been done as of 31d3847a37, as one example.
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Aleksander Alekseev <aleksander@tigerdata.com>
Discussion: https://postgr.es/m/CAHut+PvpGPDLhkHAoxw_g3jdrYxA1m16a8uagbgH3TGWSKtXNQ@mail.gmail.com
Use BackgroundPsql's published API for automatically restarting
its timer for each query, rather than manually reaching into it
to achieve the same thing.
010_tab_completion.pl's logic for this predates the invention
of BackgroundPsql (and 664d75753 missed the opportunity to
make it cleaner). 030_pager.pl copied-and-pasted the code.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1100715.1712265845@sss.pgh.pa.us
This log message was referring to conflicts, but it is about checksum
failures. The log message improved in this commit should never show up,
due to the fact that pgstat_prepare_report_checksum_failure() should
always be called before pgstat_report_checksum_failures_in_db(), with a
stats entry already created in the pgstats shared hash table. The three
code paths able to report database-level checksum failures follow
already this requirement.
Oversight in b96d3c3897.
Author: Wang Peng <215722532@qq.com>
Discussion: https://postgr.es/m/tencent_9B6CD6D9D34AE28CDEADEC6188DB3BA1FE07@qq.com
Backpatch-through: 18
It's currently only used in the server, but it was placed in src/port
with the idea that it might be useful in client programs too. However,
it will currently fail to link if used in a client program, because
CHECK_FOR_INTERRUPTS() is not usable in client programs. Fix that by
wrapping it in "#ifndef FRONTEND".
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://www.postgresql.org/message-id/21cc7a48-99d9-4f69-9a3f-2c2de61ac8e5%40iki.fi
Backpatch-through: 18
The uses of these functions do not justify the level of
micro-optimization we've done and may even hurt performance in some
cases (e.g., due to using function pointers). This commit removes
all architecture-specific implementations of pg_popcount{32,64} and
converts the portable ones to inlined functions in pg_bitutils.h.
These inlined versions should produce the same code as before (but
inlined), so in theory this is a net gain for many machines. A
follow-up commit will replace the remaining loops over these
word-length popcount functions with calls to pg_popcount(), further
reducing the need for architecture-specific implementations.
Suggested-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Greg Burd <greg@burd.me>
Discussion: https://postgr.es/m/CANWCAZY7R%2Biy%2Br9YM_sySNydHzNqUirx1xk0tB3ej5HO62GdgQ%40mail.gmail.com
Over the past few releases, we've added a huge amount of complexity
to our popcount implementations. Commits fbe327e5b4, 79e232ca01,
8c6653516c, and 25dc485074 did some preliminary refactoring, but
many opportunities remain. In particular, if we disclaim interest
in micro-optimizing this code for 32-bit builds and in unnecessary
alignment checks on x86-64, we can remove a decent chunk of code.
I cannot find public discussion or benchmarks for the code this
commit removes, but it seems unlikely that this change will
noticeably impact performance on affected systems.
Suggested-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CANWCAZY7R%2Biy%2Br9YM_sySNydHzNqUirx1xk0tB3ej5HO62GdgQ%40mail.gmail.com
This adds a new ON CONFLICT action DO SELECT [FOR UPDATE/SHARE], which
returns the pre-existing rows when conflicts are detected. The INSERT
statement must have a RETURNING clause, when DO SELECT is specified.
The optional FOR UPDATE/SHARE clause allows the rows to be locked
before they are are returned. As with a DO UPDATE conflict action, an
optional WHERE clause may be used to prevent rows from being selected
for return (but as with a DO UPDATE action, rows filtered out by the
WHERE clause are still locked).
Bumps catversion as stored rules change.
Author: Andreas Karlsson <andreas@proxel.se>
Author: Marko Tiikkaja <marko@joh.to>
Author: Viktor Holmberg <v@viktorh.net>
Reviewed-by: Joel Jacobson <joel@compiler.org>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/d631b406-13b7-433e-8c0b-c6040c4b4663@Spark
Discussion: https://postgr.es/m/5fca222d-62ae-4a2f-9fcb-0eca56277094@Spark
Discussion: https://postgr.es/m/2b5db2e6-8ece-44d0-9890-f256fdca9f7e@proxel.se
Discussion: https://postgr.es/m/CAL9smLCdV-v3KgOJX3mU19FYK82N7yzqJj2HAwWX70E=P98kgQ@mail.gmail.com
Following e68b6adad9, the reason for skipping slot synchronization is
stored as a slot property. This commit removes redundant function
parameters that previously tracked this state, instead relying directly on
the slot property.
Additionally, this change centralizes the logic for skipping
synchronization when required WAL has not yet been received or flushed. By
consolidating this check, we reduce code duplication and the risk of
inconsistent state updates across different code paths.
In passing, add an assertion to ensure a slot is marked as temporary if a
consistent point has not been reached during synchronization.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/TY4PR01MB16907DD16098BE3B20486D4569463A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Discussion: https://postgr.es/m/CAFPTHDZAA+gWDntpa5ucqKKba41=tXmoXqN3q4rpjO9cdxgQrw@mail.gmail.com
The only place that used p_is_insert was transformAssignedExpr(),
which used it to distinguish INSERT from UPDATE when handling
indirection on assignment target columns -- see commit c1ca3a19df.
However, this information is already available to
transformAssignedExpr() via its exprKind parameter, which is always
either EXPR_KIND_INSERT_TARGET or EXPR_KIND_UPDATE_TARGET.
As noted in the commit message for c1ca3a19df, this use of
p_is_insert isn't particularly pretty, so have transformAssignedExpr()
use the exprKind parameter instead. This then allows p_is_insert to be
removed entirely, which simplifies state management in a few other
places across the parser.
Author: Viktor Holmberg <v@viktorh.net>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/badc3b4c-da73-4000-b8d3-638a6f53a769@Spark
For a LEFT JOIN, if any var from the right-hand side (RHS) is forced
to null by upper-level quals but is known to be non-null for any
matching row, the only way the upper quals can be satisfied is if the
join fails to match, producing a null-extended row. Thus, we can
treat this left join as an anti-join.
Previously, this transformation was limited to cases where the join's
own quals were strict for the var forced to null by upper qual levels.
This patch extends the logic to check table constraints, leveraging
the NOT NULL attribute information already available thanks to the
infrastructure introduced by e2debb643. If a forced-null var belongs
to the RHS and is defined as NOT NULL in the schema (and is not
nullable due to lower-level outer joins), we know that the left join
can be reduced to an anti-join.
Note that to ensure the var is not nullable by any lower-level outer
joins within the current subtree, we collect the relids of base rels
that are nullable within each subtree during the first pass of the
reduce-outer-joins process. This allows us to verify in the second
pass that a NOT NULL var is indeed safe to treat as non-nullable.
Based on a proposal by Nicolas Adenis-Lamarre, but this is not the
original patch.
Suggested-by: Nicolas Adenis-Lamarre <nicolas.adenis.lamarre@gmail.com>
Author: Tender Wang <tndrwang@gmail.com>
Co-authored-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CACPGbctKMDP50PpRH09in+oWbHtZdahWSroRstLPOoSDKwoFsw@mail.gmail.com
If the variable's value is null, exec_stmt_return() missed filling
in estate->rettype. This is a pretty old bug, but we'd managed not
to notice because that value isn't consulted for a null result ...
unless we have to cast it to a domain. That case led to a failure
with "cache lookup failed for type 0".
The correct way to assign the data type is known by exec_eval_datum.
While we could copy-and-paste that logic, it seems like a better
idea to just invoke exec_eval_datum, as the ROW case already does.
Reported-by: Pavel Stehule <pavel.stehule@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAFj8pRBT_ahexDf-zT-cyH8bMR_qcySKM8D5nv5MvTWPiatYGA@mail.gmail.com
Backpatch-through: 14
The pg_stat_activity view shows information for aux processes, but the
pg_stat_get_backend_wait_event() and
pg_stat_get_backend_wait_event_type() functions did not. To fix, call
AuxiliaryPidGetProc(pid) if BackendPidGetProc(pid) returns NULL, like
we do in pg_stat_get_activity().
In version 17 and above, it's a little silly to use those functions
when we already have the ProcNumber at hand, but it was necessary
before v17 because the backend ID was different from ProcNumber. I
have other plans for wait_event_info on master, so it doesn't seem
worth applying a different fix on different versions now.
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/c0320e04-6e85-4c49-80c5-27cfb3a58108@iki.fi
Backpatch-through: 14
This commit adds a new parameter called
password_expiration_warning_threshold that controls when the server
begins emitting imminent-password-expiration warnings upon
successful password authentication. By default, this parameter is
set to 7 days, but this functionality can be disabled by setting it
to 0. This patch also introduces a new "connection warning"
infrastructure that can be reused elsewhere. For example, we may
want to warn about the use of MD5 passwords for a couple of
releases before removing MD5 password support.
Author: Gilles Darold <gilles@darold.net>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: songjinzhou <tsinghualucky912@foxmail.com>
Reviewed-by: liu xiaohui <liuxh.zj.cn@gmail.com>
Reviewed-by: Yuefei Shi <shiyuefei1004@gmail.com>
Reviewed-by: Steven Niu <niushiji@gmail.com>
Reviewed-by: Soumya S Murali <soumyamurali.work@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/129bcfbf-47a6-e58a-190a-62fc21a17d03%40migops.com
The buildfarm occasionally shows a variant row order in the output
of this UPDATE ... RETURNING, implying that the preceding INSERT
dropped one of the rows into some free space within the table rather
than appending them all at the end. It's not entirely clear why that
happens some times and not other times, but we have established that
it's affected by concurrent activity in other databases of the
cluster. In any case, the behavior is not wrong; the test is at fault
for presuming that a seqscan will give deterministic row ordering.
Add an ORDER BY atop the update to stop the buildfarm noise.
The buildfarm seems to have shown this only in v18 and master
branches, but just in case the cause is older, back-patch to
all supported branches.
Discussion: https://postgr.es/m/3866274.1770743162@sss.pgh.pa.us
Backpatch-through: 14
* Remove an unused variable
* Use "default log level" consistently (instead of "generic")
* Keep the process types in alphabetical order (missed one place in the
SGML docs)
* Since log_min_messages type was changed from enum to string, it
is a good idea to add single quotes when printing it out. Otherwise
it fails if the user copies and pastes from the SHOW output to SET,
except in the simplest case. Using single quotes reduces confusion.
* Use lowercase string for the burned-in default value, to keep the same
output as previous versions.
Author: Euler Taveira <euler@eulerto.com>
Author: Man Zeng <zengman@halodbtech.com>
Author: Noriyoshi Shinoda <noriyoshi.shinoda@hpe.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/202602091250.genyflm2d5dw@alvherre.pgsql
It protects the freeProcs and some other fields in ProcGlobal, so
let's move it there. It's good for cache locality to have it next to
the thing it protects, and just makes more sense anyway. I believe it
was allocated as a separate shared memory area just for historical
reasons.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/b78719db-0c54-409f-b185-b0d59261143f@iki.fi
On the INSERT page, mention that SELECT privileges are also required
for any columns mentioned in the arbiter clause, including those
referred to by the constraint, and clarify that this applies to all
forms of ON CONFLICT, not just ON CONFLICT DO UPDATE.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Viktor Holmberg <v@viktorh.net>
Discussion: https://postgr.es/m/CAEZATCXGwMQ+x00YY9XYG46T0kCajH=21QaYL9Xatz0dLKii+g@mail.gmail.com
Backpatch-through: 14
On the CREATE POLICY page, the description of per-command policies
stated that SELECT policies are applied when an INSERT has an ON
CONFLICT DO NOTHING clause. However, that is only the case if it
includes an arbiter clause, so clarify that.
While at it, also clarify the comment in the regression tests that
cover this.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Viktor Holmberg <v@viktorh.net>
Discussion: https://postgr.es/m/CAEZATCXGwMQ+x00YY9XYG46T0kCajH=21QaYL9Xatz0dLKii+g@mail.gmail.com
Backpatch-through: 14
An extension (or core code) might want to reconstruct the planner's
decisions about whether and where to perform partitionwise joins from
the final plan. To do so, it must be possible to find all of the RTIs
of partitioned tables appearing in the plan. But when an AppendPath
or MergeAppendPath pulls up child paths from a subordinate AppendPath
or MergeAppendPath, the RTIs of the subordinate path do not appear
in the final plan, making this kind of reconstruction impossible.
To avoid this, propagate the RTI sets that would have been present
in the 'apprelids' field of the subordinate Append or MergeAppend
nodes that would have been created into the surviving Append or
MergeAppend node, using a new 'child_append_relid_sets' field for
that purpose. The value of this field is a list of Bitmapsets,
because each relation whose append-list was pulled up had its own
set of RTIs: just one, if it was a partitionwise scan, or more than
one, if it was a partitionwise join. Since our goal is to see where
partitionwise joins were done, it is essential to avoid losing the
information about how the RTIs were grouped in the pulled-up
relations.
This commit also updates pg_overexplain so that EXPLAIN (RANGE_TABLE)
will display the saved RTI sets.
Co-authored-by: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
This commit changes the definition of varlena to a typedef, so as it
becomes possible to remove "struct" markers from various declarations in
the code base. Historically, "struct" markers are not the project style
for variable declarations, so this update simplifies the code and makes
it more consistent across the board.
This change has an impact on the following structures, simplifying
declarations using them:
- varlena
- varatt_indirect
- varatt_external
This cleanup has come up in a different path set that played with
TOAST and varatt.h, independently worth doing on its own.
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aW8xvVbovdhyI4yo@paquier.xyz
An extension (or core code) might want to reconstruct the planner's
choice of join order from the final plan. To do so, it must be possible
to find all of the RTIs that were part of the join problem in that plan.
Commit adbad833f3, together with the
earlier work in 8c49a484e8, is enough to
let us match up RTIs we see in the final plan with RTIs that we see
during the planning cycle, but we still have a problem if the planner
decides to drop some RTIs out of the final plan altogether.
To fix that, when setrefs.c removes a SubqueryScan, single-child Append,
or single-child MergeAppend from the final Plan tree, record the type of
the removed node and the RTIs that the removed node would have scanned
in the final plan tree. It would be natural to record this information
on the child of the removed plan node, but that would require adding an
additional pointer field to type Plan, which seems undesirable. So,
instead, store the information in a separate list that the executor need
never consult, and use the plan_node_id to identify the plan node with
which the removed node is logically associated.
Also, update pg_overexplain to display these details.
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
Suppose that we're currently planning a query and, when that same
query was previously planned and executed, we learned something about
how a certain table within that query should be planned. We want to
take note when that same table is being planned during the current
planning cycle, but this is difficult to do, because the RTI of the
table from the previous plan won't necessarily be equal to the RTI
that we see during the current planning cycle. This is because each
subquery has a separate range table during planning, but these are
flattened into one range table when constructing the final plan,
changing RTIs.
Commit 8c49a484e8 allows us to match up
subqueries seen in the previous planning cycles with the subqueries
currently being planned just by comparing textual names, but that's
not quite enough to let us deduce anything about individual tables,
because we don't know where each subquery's range table appears in
the final, flattened range table.
To fix that, store a list of SubPlanRTInfo objects in the final
planned statement, each including the name of the subplan, the offset
at which it begins in the flattened range table, and whether or not
it was a dummy subplan -- if it was, some RTIs may have been dropped
from the final range table, but also there's no need to control how
a dummy subquery gets planned. The toplevel subquery has no name and
always begins at rtoffset 0, so we make no entry for it.
This commit teaches pg_overexplain's RANGE_TABLE option to make use
of this new data to display the subquery name for each range table
entry.
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
Commit 94f3ad3961 failed to do this
because I couldn't think of a use for the information, but this has
proven to be short-sighted. Best to fix it before this code is
officially released.
Now, the only argument to standard_planenr that isn't passed to
planner_setup_hook is boundParams, but that is accessible via
glob->boundParams, and so doesn't need to be passed separately.
Discussion: https://www.postgresql.org/message-id/CA+TgmoYS4ZCVAF2jTce=bMP0Oq_db_srocR4cZyO0OBp9oUoGg@mail.gmail.com
Commit 4020b370f2 had the idea that it
would be a good idea to handle testing PGS_CONSIDER_NONPARTIAL within
cost_material to save callers the trouble, but that turns out not to be
a very good idea. One concern is that it makes cost_material() dependent
on the caller having initialized certain fields in the MaterialPath,
which is a bit awkward for materialize_finished_plan, which wants to use
a dummy path.
Another problem is that it can result in generated materialized nested
loops where the Materialize node is disabled, contrary to the intention
of joinpath.c's logic in match_unsorted_outer() and
consider_parallel_nestloop(), which aims to consider such paths only
when they would not need to be disabled. In the previous coding, it was
possible for the pgs_mask on the joinrel to have PGS_CONSIDER_NONPARTIAL
set, while the inner rel had the same bit clear. In that case, we'd
generate and then disable a Materialize path.
That seems wrong, so instead, pull up the logic to test the
PGS_CONSIDER_NONPARTIAL bit into joinpath.c, restoring the historical
behavior that either we don't generate a given materialized nested loop
in the first place, or we don't disable it.
Discussion: http://postgr.es/m/CA+TgmoawzvCoZAwFS85tE5+c8vBkqgcS8ZstQ_ohjXQ9wGT9sw@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoYS4ZCVAF2jTce=bMP0Oq_db_srocR4cZyO0OBp9oUoGg@mail.gmail.com
Two changes here:
1. Introduce a separate RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK flag to
indicate a suspected deadlock that involves a buffer pin. Previously
the startup process used the same flag for a deadlock involving just
regular locks, and to check for deadlocks involving the buffer
pin. The cases are handled separately in the startup process, but the
receiving backend had to deduce which one it was based on
HoldingBufferPinThatDelaysRecovery(). With a separate flag, the
receiver doesn't need to guess.
2. Rewrite the ProcessRecoveryConflictInterrupt() function to not rely
on fallthrough through the switch-statement. That was difficult to
read.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/4cc13ba1-4248-4884-b6ba-4805349e7f39@iki.fi
The log messages used in this file applied too much quoting logic:
- No need for quote_identifier(), which is fine to not use in the
context of a log entry.
- The usual project style is to group the namespace and object together
in a quoted string, when mentioned in an log message. This code quoted
the namespace name and the extended statistics object name separately,
which was confusing.
Reported-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20260210.143752.1113524465620875233.horikyota.ntt@gmail.com
This commit adds three attributes to the system view pg_stats_ext_exprs,
whose data can exist when involving a range type in an expression:
range_length_histogram
range_empty_frac
range_bounds_histogram
These statistics fields exist since 918eee0c49, and have become
viewable in pg_stats later in bc3c8db8ae. This puts the definition of
pg_stats_ext_exprs on par with pg_stats.
This issue has showed up during the discussion about the restore of
extended statistics for expressions, so as it becomes possible to query
the stats data to restore from the catalogs. Having access to this data
is useful on its own, without the restore part.
Some documentation and some tests are added, written by me. Corey has
authored the part in system_views.sql.
Bump catalog version.
Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aYmCUx9VvrKiZQLL@paquier.xyz
In the spirit of 8d19d0e13, this patch teaches the planner about the
principle that NullTest with !argisrow is fully equivalent to SQL's IS
[NOT] DISTINCT FROM NULL.
The parser already performs this transformation for literal NULLs.
However, a DistinctExpr expression with one input evaluating to NULL
during planning (e.g., via const-folding of "1 + NULL" or parameter
substitution in custom plans) currently remains as a DistinctExpr
node.
This patch closes the gap for const-folded NULLs. It specifically
targets the case where one input is a constant NULL and the other is a
nullable non-constant expression. (If the other input were otherwise,
the DistinctExpr node would have already been simplified to a constant
TRUE or FALSE.)
This transformation can be beneficial because NullTest is much more
amenable to optimization than DistinctExpr, since the planner knows a
good deal about the former and next to nothing about the latter.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49BMAOWvkdSHxpUDnniqJcEcGq3_8dd_5wTR4xrQY8urA@mail.gmail.com
The BooleanTest construct (IS [NOT] TRUE/FALSE/UNKNOWN) treats a NULL
input as the logical value "unknown". However, when the input is
proven to be non-nullable, this special handling becomes redundant.
In such cases, the construct can be simplified directly to a boolean
expression or a constant.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49BMAOWvkdSHxpUDnniqJcEcGq3_8dd_5wTR4xrQY8urA@mail.gmail.com
The IS DISTINCT FROM construct compares values acting as though NULL
were a normal data value, rather than "unknown". Semantically, "x IS
DISTINCT FROM y" yields true if the values differ or if exactly one is
NULL, and false if they are equal or both NULL. Unlike ordinary
comparison operators, it never returns NULL.
Previously, the planner only simplified this construct if all inputs
were constants, folding it to a constant boolean result. This patch
extends the optimization to cases where inputs are non-constant but
proven to be non-nullable. Specifically, "x IS DISTINCT FROM NULL"
folds to constant TRUE if "x" is known to be non-nullable. For cases
where both inputs are guaranteed not to be NULL, the expression
becomes semantically equivalent to "x <> y", and the DistinctExpr is
converted into an inequality OpExpr.
This transformation provides several benefits. It converts the
comparison into a standard operator, allowing the use of partial
indexes and constraint exclusion. Furthermore, if the clause is
negated (i.e., "IS NOT DISTINCT FROM"), it simplifies to an equality
operator. This enables the planner to generate better plans using
index scans, merge joins, hash joins, and EC-based qual deduction.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49BMAOWvkdSHxpUDnniqJcEcGq3_8dd_5wTR4xrQY8urA@mail.gmail.com
For binary upgrades from v16 or newer, pg_upgrade transfers the
files for pg_largeobject_metadata from the old cluster, as opposed
to using COPY or ordinary SQL commands to reconstruct its contents.
While this approach adds complexity, it can greatly reduce
pg_upgrade's runtime when there are many large objects.
Large objects with comments or security labels are one source of
complexity for this approach. During pg_upgrade, schema
restoration happens before files are transferred. Comments and
security labels are transferred in the former step, but the COMMENT
and SECURITY LABEL commands will fail if their corresponding large
objects do not exist. To deal with this, pg_upgrade first copies
only the rows of pg_largeobject_metadata that are needed to avoid
failures. Later, pg_upgrade overwrites those rows by replacing
pg_largeobject_metadata's files with its files in the old cluster.
Unfortunately, there's a subtle problem here. Simply put, there's
no guarantee that pg_upgrade will overwrite all of
pg_largeobject_metadata's files on the new cluster. For example,
the new cluster's version might more aggressively extend relations
or create visibility maps, and pg_upgrade's file transfer code is
not sophisticated enough to remove files that lack counterparts in
the old cluster. These extra files could cause problems
post-upgrade.
More fortunately, we can simultaneously fix the aforementioned
problem and further optimize binary upgrades for clusters with many
large objects. If we teach the COMMENT and SECURITY LABEL commands
to allow nonexistent large objects during binary upgrades,
pg_upgrade no longer needs to transfer pg_largeobject_metadata's
contents beforehand. This approach also allows us to remove the
associated dependency tracking from pg_dump, even for upgrades from
v12-v15 that use COPY to transfer pg_largeobject_metadata's
contents.
In addition to what is described in the previous paragraph, this
commit modifies the query in getLOs() to only retrieve LOs with
comments or security labels for upgrades from v12 or newer. We
have long assumed that such usage is rare, so this should reduce
pg_upgrade's memory usage and runtime in many cases. We might also
be able to remove the "upgrades from v12 or newer" restriction on
the recent batch of optimizations by adding special handling for
pg_largeobject_metadata's hidden OID column on older versions
(since this catalog previously used the now-removed WITH OIDS
feature), but that is left as a future exercise.
Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/3yd2ss6n7xywo6pmhd7jjh3bqwgvx35bflzgv3ag4cnzfkik7m%40hiyadppqxx6w
Clean up a few leftovers from when the deadlock checker was called
from signal handler. We stopped doing that in commit 6753333f55, in
year 2015.
- CheckDeadLock can return a return value directly to the caller,
there's no need to use a global variable for that.
- Remove outdated comments that claimed that CheckDeadLock "signals
ProcSleep".
- It should be OK to ereport() from DeadLockCheck now. I considered
getting rid of InitDeadLockChecking() and moving the workspace
allocations into DeadLockCheck, but it's still good to avoid doing
the allocations while we're holding all the partition locks. So just
update the comment to give that as the reason we do the allocations
up front.
They are not and never have been used by any known code -- apparently we
just cargo-culted them in commit 37484ad2aa (or their ancestor macros
anyway, which begat these functions in commit 34694ec888). Allegedly
they're also potentially dangerous; users are better off going through
HeapTupleSetHintBits instead.
Author: Andy Fan <zhihuifan1213@163.com>
Discussion: https://postgr.es/m/87sejogt4g.fsf@163.com
While the preceding commit prevented such attachments from occurring
in future, this one aims to prevent further abuse of any already-
created operator that exposes _int_matchsel to the wrong data types.
(No other contrib module has a vulnerable selectivity estimator.)
We need only check that the Const we've found in the query is indeed
of the type we expect (query_int), but there's a difficulty: as an
extension type, query_int doesn't have a fixed OID that we could
hard-code into the estimator.
Therefore, the bulk of this patch consists of infrastructure to let
an extension function securely look up the OID of a datatype
belonging to the same extension. (Extension authors have requested
such functionality before, so we anticipate that this code will
have additional non-security uses, and may soon be extended to allow
looking up other kinds of SQL objects.)
This is done by first finding the extension that owns the calling
function (there can be only one), and then thumbing through the
objects owned by that extension to find a type that has the desired
name. This is relatively expensive, especially for large extensions,
so a simple cache is put in front of these lookups.
Reported-by: Daniel Firer as part of zeroday.cloud
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Security: CVE-2026-2004
Backpatch-through: 14
Selectivity estimators come in two flavors: those that make specific
assumptions about the data types they are working with, and those
that don't. Most of the built-in estimators are of the latter kind
and are meant to be safely attachable to any operator. If the
operator does not behave as the estimator expects, you might get a
poor estimate, but it won't crash.
However, estimators that do make datatype assumptions can malfunction
if they are attached to the wrong operator, since then the data they
get from pg_statistic may not be of the type they expect. This can
rise to the level of a security problem, even permitting arbitrary
code execution by a user who has the ability to create SQL objects.
To close this hole, establish a rule that built-in estimators are
required to protect themselves against being called on the wrong type
of data. It does not seem practical however to expect estimators in
extensions to reach a similar level of security, at least not in the
near term. Therefore, also establish a rule that superuser privilege
is required to attach a non-built-in estimator to an operator.
We expect that this restriction will have little negative impact on
extensions, since estimators generally have to be written in C and
thus superuser privilege is required to create them in the first
place.
This commit changes the privilege checks in CREATE/ALTER OPERATOR
to enforce the rule about superuser privilege, and fixes a couple
of built-in estimators that were making datatype assumptions without
sufficiently checking that they're valid.
Reported-by: Daniel Firer as part of zeroday.cloud
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Security: CVE-2026-2004
Backpatch-through: 14
These data types are represented like full-fledged arrays, but
functions that deal specifically with these types assume that the
array is 1-dimensional and contains no nulls. However, there are
cast pathways that allow general oid[] or int2[] arrays to be cast
to these types, allowing these expectations to be violated. This
can be exploited to cause server memory disclosure or SIGSEGV.
Fix by installing explicit checks in functions that accept these
types.
Reported-by: Altan Birler <altan.birler@tum.de>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Security: CVE-2026-2003
Backpatch-through: 14
pgp_sym_decrypt() and pgp_pub_decrypt() will raise such errors, while
bytea variants will not. The existing "dat3" test decrypted to non-UTF8
text, so switch that query to bytea.
The long-term intent is for type "text" to always be valid in the
database encoding. pgcrypto has long been known as a source of
exceptions to that intent, but a report about exploiting invalid values
of type "text" brought this module to the forefront. This particular
exception is straightforward to fix, with reasonable effect on user
queries. Back-patch to v14 (all supported versions).
Reported-by: Paul Gerste (as part of zeroday.cloud)
Reported-by: Moritz Sanft (as part of zeroday.cloud)
Author: shihao zhong <zhong950419@gmail.com>
Reviewed-by: cary huang <hcary328@gmail.com>
Discussion: https://postgr.es/m/CAGRkXqRZyo0gLxPJqUsDqtWYBbgM14betsHiLRPj9mo2=z9VvA@mail.gmail.com
Backpatch-through: 14
Security: CVE-2026-2006
Change log_min_messages from being a single element to a comma-separated
list of type:level elements, with 'type' representing a process type,
and 'level' being a log level to use for that type of process. The list
must also have a freestanding level specification which is used for
process types not listed, which convenientely makes the whole thing
backwards-compatible.
Some choices made here could be contested; for instance, we use the
process type `backend` to affect regular backends as well as dead-end
backends and the standalone backend, and `autovacuum` means both the
launcher and the workers. I think it's largely sensible though, and it
can easily be tweaked if desired.
Author: Euler Taveira <euler@eulerto.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Tan Yang <332696245@qq.com>
Discussion: https://postgr.es/m/e85c6671-1600-4112-8887-f97a8a5d07b2@app.fastmail.com
A security patch changed them today, so close the coverage gap now.
Test that buffer overrun is avoided when pg_mblen*() requires more
than the number of bytes remaining.
This does not cover the calls in dict_thesaurus.c or in dict_synonym.c.
That code is straightforward. To change that code's input, one must
have access to modify installed OS files, so low-privilege users are not
a threat. Testing this would likewise require changing installed
share/postgresql/tsearch_data, which was enough of an obstacle to not
bother.
Security: CVE-2026-2006
Backpatch-through: 14
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
A corrupted string could cause code that iterates with pg_mblen() to
overrun its buffer. Fix, by converting all callers to one of the
following:
1. Callers with a null-terminated string now use pg_mblen_cstr(), which
raises an "illegal byte sequence" error if it finds a terminator in the
middle of the sequence.
2. Callers with a length or end pointer now use either
pg_mblen_with_len() or pg_mblen_range(), for the same effect, depending
on which of the two seems more convenient at each site.
3. A small number of cases pre-validate a string, and can use
pg_mblen_unbounded().
The traditional pg_mblen() function and COPYCHAR macro still exist for
backward compatibility, but are no longer used by core code and are
hereby deprecated. The same applies to the t_isXXX() functions.
Security: CVE-2026-2006
Backpatch-through: 14
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reported-by: Paul Gerste (as part of zeroday.cloud)
Reported-by: Moritz Sanft (as part of zeroday.cloud)
When converting multibyte to pg_wchar, the UTF-8 implementation would
silently ignore an incomplete final character, while the other
implementations would cast a single byte to pg_wchar, and then repeat
for the remaining byte sequence. While it didn't overrun the buffer, it
was surely garbage output.
Make all encodings behave like the UTF-8 implementation. A later change
for master only will convert this to an error, but we choose not to
back-patch that behavior change on the off-chance that someone is
relying on the existing UTF-8 behavior.
Security: CVE-2026-2006
Backpatch-through: 14
Author: Thomas Munro <thomas.munro@gmail.com>
Reported-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
While EUC_CN supports only 1- and 2-byte sequences (CS0, CS1), the
mb<->wchar conversion functions allow 3-byte sequences beginning SS2,
SS3.
Change pg_encoding_max_length() to return 3, not 2, to close a
hypothesized buffer overrun if a corrupted string is converted to wchar
and back again in a newly allocated buffer. We might reconsider that in
master (ie harmonizing in a different direction), but this change seems
better for the back-branches.
Also change pg_euccn_mblen() to report SS2 and SS3 characters as having
length 3 (following the example of EUC_KR). Even though such characters
would not pass verification, it's remotely possible that invalid bytes
could be used to compute a buffer size for use in wchar conversion.
Security: CVE-2026-2006
Backpatch-through: 14
Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
The code made a subtle assumption that the lower-cased version of a
string never has more characters than the original. That is not always
true. For example, in a database with the latin9 encoding:
latin9db=# select lower(U&'\00CC' COLLATE "lt-x-icu");
lower
-----------
i\x1A\x1A
(1 row)
In this example, lower-casing expands the single input character into
three characters.
The generate_trgm_only() function relied on that assumption in two
ways:
- It used "slen * pg_database_encoding_max_length() + 4" to allocate
the buffer to hold the lowercased and blank-padded string. That
formula accounts for expansion if the lower-case characters are
longer (in bytes) than the originals, but it's still not enough if
the lower-cased string contains more *characters* than the original.
- Its callers sized the output array to hold the trigrams extracted
from the input string with the formula "(slen / 2 + 1) * 3", where
'slen' is the input string length in bytes. (The formula was
generous to account for the possibility that RPADDING was set to 2.)
That's also not enough if one input byte can turn into multiple
characters.
To fix, introduce a growable trigram array and give up on trying to
choose the correct max buffer sizes ahead of time.
Backpatch to v18, but no further. In previous versions lower-casing was
done character by character, and thus the assumption that lower-casing
doesn't change the character length was valid. That was changed in v18,
commit fb1a18810f.
Security: CVE-2026-2007
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
The function assumed that if charlen == bytelen, there are no
multibyte characters in the string. That's sensible, but the callers
were a little careless in how they calculated the lengths. The callers
converted the string to lowercase before calling make_trigram(), and
the 'charlen' value was calculated *before* the conversion to
lowercase while 'bytelen' was calculated after the conversion. If the
lowercased string had a different number of characters than the
original, make_trigram() might incorrectly apply the fastpath and
treat all the bytes as single-byte characters, or fail to apply the
fastpath (which is harmless), or it might hit the "Assert(bytelen ==
charlen)" assertion. I'm not aware of any locale / character
combinations where you could hit that assertion in practice,
i.e. where a string converted to lowercase would have fewer characters
than the original, but it seems best to avoid making that assumption.
To fix, remove the 'charlen' argument. To keep the performance when
there are no multibyte characters, always try the fast path first, but
check the input for multibyte characters as we go. The check on each
byte adds some overhead, but it's close enough. And to compensate, the
find_word() function no longer needs to count the characters.
This fixes one small bug in make_trigrams(): in the multibyte
codepath, it peeked at the byte just after the end of the input
string. When compiled with IGNORECASE, that was harmless because there
is always a NUL byte or blank after the input string. But with
!IGNORECASE, the call from generate_wildcard_trgm() doesn't guarantee
that.
Backpatch to v18, but no further. In previous versions lower-casing was
done character by character, and thus the assumption that lower-casing
doesn't change the character length was valid. That was changed in v18,
commit fb1a18810f.
Security: CVE-2026-2007
Reviewed-by: Noah Misch <noah@leadboat.com>
pgp_pub_decrypt_bytea() was missing a safeguard for the session key
length read from the message data, that can be given in input of
pgp_pub_decrypt_bytea(). This can result in the possibility of a buffer
overflow for the session key data, when the length specified is longer
than PGP_MAX_KEY, which is the maximum size of the buffer where the
session data is copied to.
A script able to rebuild the message and key data that can trigger the
overflow is included in this commit, based on some contents provided by
the reporter, heavily editted by me. A SQL test is added, based on the
data generated by the script.
Reported-by: Team Xint Code as part of zeroday.cloud
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Noah Misch <noah@leadboat.com>
Security: CVE-2026-2005
Backpatch-through: 14
Looking again at commit 7cdb633c8, I wondered why we have hard-wired
"1034" for the OID of type aclitem[]. Some other entries in the same
array have numeric type OIDs as well. This seems to be a hangover
from years ago when not every built-in pg_type entry had an OID macro.
But since we made genbki.pl responsible for generating these macros,
there are macros available for all these array types, so there's no
reason not to follow the project policy of never writing numeric OID
constants in C code.
This thinko caused us to not substitute our own getopt() code,
which results in failing to parse long options for the postmaster
since Solaris' getopt() doesn't do what we expect. This can be seen
in the results of buildfarm member icarus, which is the only one
trying to build via meson on Solaris.
Per consultation with pgsql-release, it seems okay to fix this
now even though we're in release freeze. The fix visibly won't
affect any other platforms, and it can't break Solaris/meson
builds any worse than they're already broken.
Discussion: https://postgr.es/m/2471229.1770499291@sss.pgh.pa.us
Backpatch-through: 16
Commit 176dffdf7 added a NULL array pointer check before performing
a qsort in order to prevent undefined behavior when passing NULL
pointer and zero length. To head off future degenerate cases, check
that there are at least two elements to sort before proceeding with
insertion sort. This has the added advantage of allowing us to remove
four equivalent checks that guarded against recursion/iteration.
There might be a tiny performance penalty from unproductive
recursions, but we can buy that back by increasing the insertion sort
threshold. That is left for future work.
Discussion: https://postgr.es/m/CANWCAZZWvds_35nXc4vXD-eBQa_=mxVtqZf-PM_ps=SD7ghhJg@mail.gmail.com
Commit 7b378237a widened AclMode to 64 bits, which implies that
the alignment of AclItem is now determined by an int64 field.
That commit correctly set the typalign for SQL type aclitem to
'd', but it missed the hard-wired knowledge about _aclitem in
bootstrap.c. This doesn't seem to have caused any ill effects,
probably because we never try to fill a non-null value into
an aclitem[] column during bootstrap. Nonetheless, it's clearly
a gotcha waiting to happen, so fix it up.
In passing, also fix a couple of typanalyze functions that were
using hard-coded typalign constants when they could just as
easily use greppable TYPALIGN_xxx macros.
Noticed these while working on a patch to expand the set of
typalign values. I doubt we are going to pursue that path,
but these fixes still seem worth a quick commit.
Discussion: https://postgr.es/m/1127261.1769649624@sss.pgh.pa.us
This commit adjusts a few debugging macros to match the style of
those in pg_config_manual.h. Like commits 123661427b and
b4cbc106a6, these were discovered while reviewing Aleksander
Alekseev's proposed changes to pgindent.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aP-H6kSsGOxaB21k%40nathan
The main reason that libpq doesn't request protocol version 3.2 by
default is because other proxy/server implementations don't implement
the negotiation. This is a bit of a chicken-and-egg problem: We don't
bump the default version that libpq requests, but other implementations
may not be incentivized to implement version negotiation if their users
never run into issues.
One established practice to combat this is to flip Postel's Law on its
head, by sending parameters that the server cannot possibly support. If
the server fails the handshake instead of correctly negotiating, then
the problem is surfaced naturally. If the server instead claims to
support the bogus parameters, then we fail the connection to make the
lie obvious. This is called "grease" (or "greasing"), after the GREASE
mechanism in TLS that popularized the concept:
https://www.rfc-editor.org/rfc/rfc8701.html
This patch reserves 3.9999 as an explicitly unsupported protocol version
number and `_pq_.test_protocol_negotiation` as an explicitly unsupported
protocol extension. A later commit will send these by default in order
to stress-test the ecosystem during the beta period; that commit will
then be reverted before 19 RC1, so that we can decide what to do with
whatever data has been gathered.
The _pq_.test_protocol_negotiation change here is intentionally docs-
only: after its implementation is reverted, the parameter should remain
reserved.
Extracted/adapted from a patch by Jelte Fennema-Nio.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Co-authored-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/DDPR5BPWH1RJ.1LWAK6QAURVAY%40jeltef.nl
First, split the Protocol Versions table in two, and lead with the list
of versions that are supported today. Reserved and unsupported version
numbers go into the second table.
Second, in anticipation of a new (reserved) protocol extension, document
the extension negotiation process alongside version negotiation, and add
the corresponding tables for future extension parameter registrations.
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Discussion: https://postgr.es/m/DDPR5BPWH1RJ.1LWAK6QAURVAY%40jeltef.nl
This routine's internals directly used MyProcNumber to choose which
object ID to assign for the hash key of a backend's stats entry, while
the value to use is given as input argument of the function.
The original intention was to pass MyProcNumber as an argument of
pgstat_create_backend() when called in pgstat_bestart_final(),
pgstat_beinit() ensuring that MyProcNumber has been set, not use it
directly in the function. This commit addresses this inconsistency by
using the procnum given by the caller of pgstat_create_backend(), not
MyProcNumber.
This issue is not a cause of bugs currently. However, let's keep the
code in sync across all the branches where this code exists, as it could
matter in a future backpatch.
Oversight in 4feba03d8b.
Reported-by: Ryo Matsumura <matsumura.ryo@fujitsu.com>
Discussion: https://postgr.es/m/TYCPR01MB11316AD8150C8F470319ACCAEE866A@TYCPR01MB11316.jpnprd01.prod.outlook.com
Backpatch-through: 18
These errors are very unlikely going to show up, but in the event that
they happen, some incorrect information would have been provided:
- In pg_rewind, a stat() failure was reported as an open() failure.
- In pg_combinebackup, a check for the new directory of a tablespace
mapping was referred as the old directory.
- In pg_combinebackup, a failure in reading a source file when copying
blocks referred to the destination file.
The changes for pg_combinebackup affect v17 and newer versions. For
pg_rewind, all the stable branches are affected.
Author: Man Zeng <zengman@halodbtech.com>
Discussion: https://postgr.es/m/tencent_1EE1430B1E6C18A663B8990F@qq.com
Backpatch-through: 14
Provide a way to disable the use of posix_fallocate() for relation
files. It was introduced by commit 4d330a61bb. The new setting
file_extend_method=write_zeros can be used as a workaround for problems
reported from the field:
* BTRFS compression is disabled by the use of posix_fallocate()
* XFS could produce spurious ENOSPC errors in some Linux kernel
versions, though that problem is reported to have been fixed
The default is file_extend_method=posix_fallocate if available, as
before. The write_zeros option is similar to PostgreSQL < 16, except
that now it's multi-block.
Backpatch-through: 16
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reported-by: Dimitrios Apostolou <jimis@gmx.net>
Discussion: https://postgr.es/m/b1843124-fd22-e279-a31f-252dffb6fbf2%40gmx.net
synchronized_standby_slots is defined in guc_parameter.dat as part of
the REPLICATION_PRIMARY group and is listed under the "Primary Server"
section in postgresql.conf.sample. However, in the documentation
its description was previously placed under the "Sending Servers" section.
Since synchronized_standby_slots only takes effect on the primary server,
this commit moves its documentation to the "Primary Server" section to
match its behavior and other references.
Backpatch to v17 where synchronized_standby_slots was added.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Shinya Kato <shinya11.kato@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwE_LwgXgCrqd08OFteJqdERiF3noqOKu2vt7Kjk4vMiGg@mail.gmail.com
Backpatch-through: 17
Commit 29d0a77fa6 improved pg_upgrade to allow migrating logical slots
provided that all logical slots have caught up (i.e., they have no
pending decodable WAL records). Previously, this verification was done
by checking each slot individually, which could be time-consuming if
there were many logical slots to migrate.
This commit optimizes the check to avoid reading the same WAL stream
multiple times. It performs the check only for the slot with the
minimum confirmed_flush_lsn and applies the result to all other slots
in the same database. This limits the check to at most one logical
slot per database.
During the check, we identify the last decodable WAL record's LSN to
report any slots with unconsumed records, consistent with the existing
error reporting behavior. Additionally, the maximum
confirmed_flush_lsn among all logical slots on the database is used as
an early scan cutoff; finding a decodable WAL record beyond this point
implies that no slot has caught up.
Performance testing demonstrated that the execution time remains
stable regardless of the number of slots in the database.
Note that we do not distinguish slots based on their output plugins. A
hypothetical plugin might use a replication origin filter that filters
out changes from a specific origin. In such cases, we might get a
false positive (erroneously considering a slot caught up). However,
this is safe from a data integrity standpoint, such scenarios are
rare, and the impact of a false positive is minimal.
This optimization is applied only when the old cluster is version 19
or later.
Bump catalog version.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAD21AoBZ0LAcw1OHGEKdW7S5TRJaURdhEk3CLAW69_siqfqyAg@mail.gmail.com
This affects two command patterns, showing information about relations:
* oid2name -x -d DBNAME, applying to all relations on a database.
* oid2name -x -d DBNAME -t TABNAME [-t ..], applying to a subset of
defined relations on a database.
The relative path of a relation is added to the information provided,
using pg_relation_filepath().
Author: David Bidoc <dcbidoc@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Guillaume Lelarge <guillaume.lelarge@dalibo.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Mark Wong <markwkm@gmail.com>
Discussion: https://postgr.es/m/CABour1v2CU1wjjoM86wAFyezJQ3-+ncH43zY1f1uXeVojVN8Ow@mail.gmail.com
Instead of assigning the backend type in the Main function of each
postmaster child, do it right after fork(), by which time it is already
known by postmaster_child_launch(). This reduces the time frame during
which MyBackendType is incorrect.
Before this commit, ProcessStartupPacket would overwrite MyBackendType
to B_BACKEND for dead-end backends, which is quite dubious. Stop that.
We may now see MyBackendType == B_BG_WORKER before setting up
MyBgworkerEntry. As far as I can see this is only a problem if we try
to log a message and %b is in log_line_prefix, so we now have a constant
string to cover that case. Previously, it would print "unrecognized",
which seems strictly worse.
Author: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/e85c6671-1600-4112-8887-f97a8a5d07b2@app.fastmail.com
Commit 5f13999aa1 added a TAP test for GUC settings passed via the
CONNECTION string in logical replication, but the buildfarm member
sungazer reported test failures.
The test incorrectly used the subscriber's log file position as the
starting offset when reading the publisher's log. As a result, the test
failed to find the expected log message in the publisher's log and
erroneously reported a failure.
This commit fixes the test to use the publisher's own log file position
when reading the publisher's log.
Also, to avoid similar confusion in the future, this commit splits the single
$log_location variable into $log_location_pub and $log_location_sub,
clearly distinguishing publisher and subscriber log positions.
Backpatched to v15, where commit 5f13999aa1 introduced the test.
Per buildfarm member sungazer.
This issue was reported and diagnosed by Alexander Lakhin.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/966ec3d8-1b6f-4f57-ae59-fc7d55bc9a5a@gmail.com
Backpatch-through: 15
Mostly this involves checking for NULL pointer before doing operations
that add a non-zero offset.
The exception is an overflow warning in heap_fetch_toast_slice(). This
was caused by unneeded parentheses forcing an expression to be
evaluated to a negative integer, which then got cast to size_t.
Per clang 21 undefined behavior sanitizer.
Backpatch to all supported versions.
Co-authored-by: Alexander Lakhin <exclusion@gmail.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/777bd201-6e3a-4da0-a922-4ea9de46a3ee@gmail.com
Backpatch-through: 14
We can immediately make use of it in pg_signal_backend(), which
previously fetched the process type from the backend status array with
pgstat_get_backend_type_by_proc_number(). That was correct but felt a
little questionable to me: backend status should be for observability
purposes only, not for permission checks.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/b77e4962-a64a-43db-81a1-580444b3e8f5@iki.fi
Currently, when the argument of copyObject() is const-qualified, the
return type is also, because the use of typeof carries over all the
qualifiers. This is incorrect, since the point of copyObject() is to
make a copy to mutate. But apparently no code ran into it.
The new implementation uses typeof_unqual, which drops the qualifiers,
making this work correctly.
typeof_unqual is standardized in C23, but all recent versions of all
the usual compilers support it even in non-C23 mode, at least as
__typeof_unqual__. We add a configure/meson test for typeof_unqual
and __typeof_unqual__ and use it if it's available, else we use the
existing fallback of just returning void *.
Reviewed-by: David Geier <geidav.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/92f9750f-c7f6-42d8-9a4a-85a3cbe808f3%40eisentraut.org
A failure while closing pg_wal/summaries/ incorrectly generated a report
about pg_wal/archive_status/.
While at it, this commit adds #undefs for the macros used in
KillExistingWALSummaries() and KillExistingArchiveStatus() to prevent
those values from being misused in an incorrect function context.
Oversight in dc21234005.
Author: Tianchen Zhang <zhang_tian_chen@163.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/SE2P216MB2390C84C23F428A7864EE07FA19BA@SE2P216MB2390.KORP216.PROD.OUTLOOK.COM
Backpatch-through: 17
The pg_dump documentation had repetitive notes for the --schema,
--table, and --extension switches, noting that dependent database
objects are not automatically included in the dump. This commit removes
these notes and replaces them with a consolidated paragraph in the
"Notes" section.
pg_restore had a similar note for -t but lacked one for -n; do likewise.
Also, add a note to --extension in pg_dump to note that ancillary files
(such as shared libraries and control files) are not included in the
dump and must be present on the destination system.
Author: Florents Tselai <florents.tselai@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/284C4D55-4F90-4AA0-84C8-1E6A28DDF271@gmail.com
I don't understand how to reach errdetail_abort() with
MyProc->recoveryConflictPending set. If a recovery conflict signal is
received, ProcessRecoveryConflictInterrupt() raises an ERROR or FATAL
error to cancel the query or connection, and abort processing clears
the flag. The error message from ProcessRecoveryConflictInterrupt() is
very clear that the query or connection was terminated because of
recovery conflict.
The only way to reach it AFAICS is with a race condition, if the
startup process sends a recovery conflict signal when the transaction
has just entered aborted state for some other reason. And in that case
the detail would be misleading, as the transaction was already aborted
for some other reason, not because of the recovery conflict.
errdetail_abort() was the only user of the recoveryConflictPending
flag in PGPROC, so we can remove that and all the related code too.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/4cc13ba1-4248-4884-b6ba-4805349e7f39@iki.fi
When using ALTER TABLE ... ADD CONSTRAINT to add a not-null constraint
with an explicit name, we have to ensure that if the column is already
marked NOT NULL, the provided name matches the existing constraint name.
Failing to do so could lead to confusion regarding which constraint
object actually enforces the rule.
This patch adds a check to throw an error if the user tries to add a
named not-null constraint to a column that already has one with a
different name.
Reported-by: yanliang lei <msdnchina@163.com>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Co-authored-bu: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/19351-8f1c523ead498545%40postgresql.org
Two wait events are added to the COPY FROM/TO code:
* COPY_FROM_READ: reading data from a copy_file.
* COPY_TO_WRITE: writing data to a copy_file.
In the COPY code, copy_file can be set when processing a command through
the pipe mode (for the non-DestRemote case), the program mode or the
file mode, when processing fread() or fwrite() on it.
Author: Nikolay Samokhvalov <nik@postgres.ai>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAM527d_iDzz0Kqyi7HOfqa-Xzuq29jkR6AGXqfXLqA5PR5qsng@mail.gmail.com
This routine has an option to bypass an error if a WAL summary file is
opened for read but is missing (missing_ok=true). However, the code
incorrectly checked for EEXIST, that matters when using O_CREAT and
O_EXCL, rather than ENOENT, for this case.
There are currently only two callers of OpenWalSummaryFile() in the
tree, and both use missing_ok=false, meaning that the check based on the
errno is currently dead code. This issue could matter for out-of-core
code or future backpatches that would like to use missing_ok set to
true.
Issue spotted while monitoring this area of the code, after
a9afa021e9.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aYAf8qDHbpBZ3Rml@paquier.xyz
Backpatch-through: 17
Previously, when synchronous_standby_names was changed (for example,
by reducing the number of required synchronous standbys or modifying
the standby list), backends waiting for synchronous replication were not
released immediately, even if the new configuration no longer required them
to wait. They could remain blocked until additional messages arrived from
standbys and triggered their release.
This commit improves walsender so that backends waiting for synchronous
replication are released as soon as the updated configuration takes effect and
the new settings no longer require them to wait, by calling
SyncRepReleaseWaiters() when configuration changes are processed.
As part of this change, the duplicated code that handles configuration changes
in walsender has been refactored into a new helper function, which is now used
at the three existing call places.
Since this is an improvement rather than a bug fix, it is applied only to
the master branch.
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/CAOzEurSRii0tEYhu5cePmRcvS=ZrxTLEvxm3Kj0d7_uKGdM23g@mail.gmail.com
This commit introduces a new prompt escape %i for psql, which shows
whether the connected server is operating in hot standby mode. It
expands to standby if the server reports in_hot_standby = on, and
primary otherwise.
This is useful for distinguishing standby servers from primary ones
at a glance, especially when working with multiple connections in
replicated environments where libpq's multi-host connection strings
are used.
Author: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://www.postgresql.org/message-id/flat/016f6738-f9a9-4e98-bb5a-e1e4b9591d46@uni-muenster.de
The test relies on VACUUM being able to mark a page all-visible, but
this can fail when autovacuum in other sessions prevents the visibility
horizon from advancing. Making the test table temporary isolates its
horizon from other sessions, including catalog table vacuums, ensuring
reliable test behavior.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/2b09fba6-6b71-497a-96ef-a6947fcc39f6%40gmail.com
Separate att_align_nominal() into two macros, similarly to what
was already done with att_align_datum() and att_align_pointer().
The inner macro att_nominal_alignby() is really just TYPEALIGN(),
while att_align_nominal() retains its previous API by mapping
TYPALIGN_xxx values to numbers of bytes to align to and then
calling att_nominal_alignby(). In support of this, split out
tupdesc.c's logic to do that mapping into a publicly visible
function typalign_to_alignby().
Having done that, we can replace performance-critical uses of
att_align_nominal() with att_nominal_alignby(), where the
typalign_to_alignby() mapping is done just once outside the loop.
In most places I settled for doing typalign_to_alignby() once
per function. We could in many places pass the alignby value
in from the caller if we wanted to change function APIs for this
purpose; but I'm a bit loath to do that, especially for exported
APIs that extensions might call. Replacing a char typalign
argument by a uint8 typalignby argument would be an API change
that compilers would fail to warn about, thus silently breaking
code in hard-to-debug ways. I did revise the APIs of array_iter_setup
and array_iter_next, moving the element type attribute arguments to
the former; if any external code uses those, the argument-count
change will cause visible compile failures.
Performance testing shows that ExecEvalScalarArrayOp is sped up by
about 10% by this change, when using a simple per-element function
such as int8eq. I did not check any of the other loops optimized
here, but it's reasonable to expect similar gains.
Although the motivation for creating this patch was to avoid a
performance loss if we add some more typalign values, it evidently
is worth doing whether that patch lands or not.
Discussion: https://postgr.es/m/1127261.1769649624@sss.pgh.pa.us
Up to now we've used GNU-style local labels for branch targets
in s_lock.h's assembly blocks. But there's an alternative style,
which I for one didn't know about till recently: use regular
assembler labels, and insert a per-asm-block number in them
using %= to ensure they are distinct across multiple TAS calls
within one source file. gcc has had %= since gcc 2.0, and
I've verified that clang knows it too.
While the immediate motivation for changing this is that AIX's
assembler doesn't do local labels, it seems to me that this is a
superior solution anyway. There is nothing mnemonic about "1:",
while a regular label can convey something useful, and at least
to me it feels less error-prone. Therefore let's standardize on
this approach, also converting the one other usage in s_lock.h.
Discussion: https://postgr.es/m/399291.1769998688@sss.pgh.pa.us
A failing unlink() was reporting an incorrect error message, referring
to stat().
Author: Man Zeng <zengman@halodbtech.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/tencent_3BBE865C5F49D452360FF190@qq.com
Backpath-through: 17
The build generates four files based on the wait event contents stored
in wait_event_names.txt:
- wait_event_types.h
- pgstat_wait_event.c
- wait_event_funcs_data.c
- wait_event_types.sgml
The SGML file is generated as part of a documentation build, with its
data stored in doc/src/sgml/ for meson and configure. The three others
are handled differently for meson and configure:
- In configure, all the files are created in src/backend/utils/activity/.
A link to wait_event_types.h is created in src/include/utils/.
- In meson, all the files are created in src/include/utils/.
The two C files, pgstat_wait_event.c and wait_event_funcs_data.c, are
then included in respectively wait_event.c and wait_event_funcs.c,
without the "utils/" path.
For configure, this does not present a problem. For meson, this has to
be combined with a trick in src/backend/utils/activity/meson.build,
where include_directories needs to point to include/utils/ to make the
inclusion of the C files work properly, causing builds to pull in
PostgreSQL headers rather than system headers in some build paths, as
src/include/utils/ would take priority.
In order to fix this issue, this commit reworks the way the C/H files
are generated, becoming consistent with guc_tables.inc.c:
- For meson, basically nothing changes. The files are still generated
in src/include/utils/. The trick with include_directories is removed.
- For configure, the files are now generated in src/backend/utils/, with
links in src/include/utils/ pointing to the ones in src/backend/. This
requires extra rules in src/backend/utils/activity/Makefile so as a
make command in this sub-directory is able to work.
- The three files now fall under header-stamp, which is actually simpler
as guc_tables.inc.c does the same.
- wait_event_funcs_data.c and pgstat_wait_event.c are now included with
"utils/" in their path.
This problem has not been an issue in the buildfarm; it has been noted
with AIX and a conflict with float.h. This issue could, however, create
conflicts in the buildfarm depending on the environment with unexpected
headers pulled in, so this fix is backpatched down to where the
generation of the wait-event files has been introduced.
While on it, this commit simplifies wait_event_names.txt regarding the
paths of the files generated, to mention just the names of the files
generated. The paths where the files are generated became incorrect.
The path of the SGML path was wrong.
This change has been tested in the CI, down to v17. Locally, I have run
tests with configure (with and without VPATH), as well as meson, on the
three branches.
Combo oversight in fa88928470 and 1e68e43d3f.
Reported-by: Aditya Kamath <aditya.kamath1@ibm.com>
Discussion: https://postgr.es/m/LV8PR15MB64888765A43D229EA5D1CFE6D691A@LV8PR15MB6488.namprd15.prod.outlook.com
Backpatch-through: 17
Similarly to the preceding commit, 030_pager.pl was assuming
that patterns it looks for in interactive psql output would
appear by themselves on a line, but that assumption tends to
fall over in builds made --without-readline: the output we
get might have a psql prompt immediately followed by the
expected line of output.
For several of these tests, just checking for the pattern
followed by newline seems sufficient, because we could not
get a false match against the command echo, nor against the
unreplaced command output if the pager fails to be invoked
when expected. However, that's fairly scary for the test
that was relying on information_schema.referential_constraints:
"\d+" could easily appear at the end of a line in that view.
Let's get rid of that hazard by making a custom test view
instead of using information_schema.referential_constraints.
This test script is new in v19, so no need for back-patch.
Reported-by: Oleg Tselebrovskiy <o.tselebrovskiy@postgrespro.ru>
Author: Oleg Tselebrovskiy <o.tselebrovskiy@postgrespro.ru>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Soumya S Murali <soumyamurali.work@gmail.com>
Discussion: https://postgr.es/m/db6fdb35a8665ad3c18be01181d44b31@postgrespro.ru
BackgroundPsql needs to wait for all the output from an interactive
psql command to come back. To make sure that's happened, it issues
the command, then issues \echo and \warn psql commands that echo
a "banner" string (which we assume won't appear in the command's
output), then waits for the banner strings to appear. The hazard
in this approach is that the banner will also appear in the echoed
psql commands themselves, so we need to distinguish those echoes from
the desired output. Commit 8b886a4e3 tried to do that by positing
that the desired output would be directly preceded and followed by
newlines, but it turns out that that assumption is timing-sensitive.
In particular, it tends to fail in builds made --without-readline,
wherein the command echoes will be made by the pty driver and may
be interspersed with prompts issued by psql proper.
It does seem safe to assume that the banner output we want will be
followed by a newline, since that should be the last output before
things quiesce. Therefore, we can improve matters by putting quotes
around the banner strings in the \echo and \warn psql commands, so
that their echoes cannot include banner directly followed by newline,
and then checking for just banner-and-newline in the match pattern.
While at it, spruce up the pump() call in sub query() to look like
the neater version in wait_connect(), and don't die on timeout
until after printing whatever we got.
Reported-by: Oleg Tselebrovskiy <o.tselebrovskiy@postgrespro.ru>
Diagnosed-by: Oleg Tselebrovskiy <o.tselebrovskiy@postgrespro.ru>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Soumya S Murali <soumyamurali.work@gmail.com>
Discussion: https://postgr.es/m/db6fdb35a8665ad3c18be01181d44b31@postgrespro.ru
Backpatch-through: 14
For readability. It was a slight modularity violation to have fields
in PGShmemHeader that were only used by the allocator code in
shmem.c. And it was inconsistent that ShmemLock was nevertheless not
stored there. Moving all the allocator-related fields to a separate
struct makes it more consistent and modular, and removes the need to
allocate and pass ShmemLock separately via BackendParameters.
Merge InitShmemAccess() and InitShmemAllocation() into a single
function that initializes the struct when called from postmaster, and
when called from backends in EXEC_BACKEND mode, re-establishes the
global variables. That's similar to all the *ShmemInit() functions
that we have.
Co-authored-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAExHW5uNRB9oT4pdo54qAo025MXFX4MfYrD9K15OCqe-ExnNvg@mail.gmail.com
Previously, the CheckXidAlive check was performed within the table_scan*next*
functions. This caused the check to be executed for every fetched tuple, an
unnecessary overhead.
To fix, move the check to table_beginscan* so it is performed once per scan
rather than once per row.
Note: table_tuple_fetch_row_version() does not use a scan descriptor;
therefore, the CheckXidAlive check is retained in that function. The overhead
is unlikely to be relevant for the existing callers.
Reported-by: Andres Freund <andres@anarazel.de>
Author: Dilip Kumar <dilipbalaut@gmail.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Suggested-by: Amit Kapila <akapila@postgresql.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/tlpltqm5jjwj7mp66dtebwwhppe4ri36vdypux2zoczrc2i3mp%40dhv4v4nikyfg
In fcb9c977aa I included an assertion in BufferLockConditional() to detect if
a conditional lock acquisition is done on a buffer that we already have
locked. The assertion was added in the course of adding other assertions.
Unfortunately I failed to realize that some of our code relies on such lock
acquisitions to silently fail. E.g. spgist and nbtree may try to conditionally
lock an already locked buffer when acquiring a empty buffer.
LWLockAcquireConditional(), which was previously used to implement
ConditionalLockBuffer(), does not have such an assert.
Instead of just removing the assert, and relying on the lock acquisition to
fail due to the buffer already locked, this commit changes the behaviour of
conditional content lock acquisition to fail if the current backend has any
pre-existing lock on the buffer, even if the lock modes would not
conflict. The reason for that is that we currently do not have space to track
multiple lock acquisitions on a single buffer. Allowing multiple locks on the
same buffer by a backend also seems likely to lead to bugs.
There is only one non-self-exclusive conditional content lock acquisition, in
GetVictimBuffer(), but it only is used if the target buffer is not pinned and
thus can't already be locked by the current backend.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/90bd2cbb-49ce-4092-9f61-5ac2ab782c94@gmail.com
Commit 6ceef9408 was still one brick shy of a load, because it caused
any usage at all of PGIOAlignedBlock or PGAlignedXLogBlock to fail
under older g++. Notably, this broke "headerscheck --cplusplus".
We can permit references to these structs as abstract structs though;
only actual declaration of such a variable needs to be forbidden.
Discussion: https://www.postgresql.org/message-id/3119480.1769189606@sss.pgh.pa.us
The leaks were hard to reach in practice and the impact was low.
The callers provide a buffer the same number of bytes as the source
string (plus one for NUL terminator) as a starting size, and libc
never increases the number of characters. But, if the byte length of
one of the converted characters is larger, then it might need a larger
destination buffer. Previously, in that case, the working buffers
would be leaked.
Even in that case, the call typically happens within a context that
will soon be reset. Regardless, it's worth fixing to avoid such
assumptions, and the fix is simple so it's worth backporting.
Discussion: https://postgr.es/m/e2b7a0a88aaadded7e2d19f42d5ab03c9e182ad8.camel@j-davis.com
Backpatch-through: 18
Use the proper constant InvalidXLogRecPtr instead of literal 0 when
assigning XLogRecPtr variables and struct fields.
This improves code clarity by making it explicit that these are
invalid LSN values rather than ambiguous zero literals.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aRtd2dw8FO1nNX7k@ip-10-97-1-34.eu-west-3.compute.internal
cost_tidrangescan() was setting the disabled_nodes value correctly,
and then immediately resetting it to zero, due to poor code editing on
my part.
materialized_finished_plan correctly set matpath.parent to
zero, but forgot to also set matpath.parallel_workers = 0, causing
an access to uninitialized memory in cost_material. (This shouldn't
result in any real problem, but it makes valgrind unhappy.)
reparameterize_path was dereferencing a variable before verifying that
it was not NULL.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us> (issue #1)
Reported-by: Michael Paquier <michael@paquier.xyz> (issue #1)
Diagnosed-by: Lukas Fittl <lukas@fittl.com> (issue #1)
Reported-by: Zsolt Parragi <zsolt.parragi@percona.com> (issue #2)
Reported-by: Richard Guo <guofenglinux@gmail.com> (issue #3)
Discussion: http://postgr.es/m/CAN4CZFPvwjNJEZ_JT9Y67yR7C=KMNa=LNefOB8ZY7TKDcmAXOA@mail.gmail.com
Discussion: http://postgr.es/m/aXrnPgrq6Gggb5TG@paquier.xyz
In the psql prompt, %P prompt shows the current pipeline status. Unlike
most of the other options, its status was showing up in the output
generated even if psql was not connected to a database. This was
confusing, because without a connection a pipeline status makes no
sense.
Like the other options, %P is updated so as its data is now hidden
without an active connection.
Author: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/86EF76B5-6E62-404D-B9EC-66F4714D7D5F@gmail.com
Backpatch-through: 18
These have been fat-fingered in 0e80f3f88d and 302879bd68. The
error message for ndistinct had an incorrect grammar, while the one for
dependencies had finished with a period (incorrect based on the project
guidelines).
Discussion: https://postgr.es/m/aXrsjZQbVuB6236u@paquier.xyz
The test added in this commit copies the data of an ANALYZE run on one
relation to a secondary relation with the same attribute definitions and
extended statistics objects. Once the clone is done, the target and
origin should have the same extended statistics information, with no
differences.
This test would have been able to catch e3094679b9, for example, as we
expect the full range of statistics to be copied over, with no
differences generated between the results of an ANALYZE and the data
copied to the cloned relation.
Note that this new test should remain at the bottom of stats_import.sql,
so as any additions in the main relation and its clone are automatically
covered when copying their statistics, so as it would work as a sanity
check in the future.
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
The restore of extended statistics has some paths dedicated to
multirange types and expressions for all the stats kinds supported, and
we did not have coverage for the case where an extended stats object
uses a multirange attribute with or without an expression.
Extracted from a larger patch by the same author, with a couple of
tweaks from me regarding the format of the output generated, to make it
more readable to the eye.
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
The test added in commit 851f6649cc uses a backup taken from a node
created by the previous test to perform standby related checks. On
Windows, however, the standby failed to start with the following error:
FATAL: could not rename file "backup_label" to "backup_label.old": Permission denied
This occurred because some background sessions from the earlier test were
still active. These leftover processes continued accessing the parent
directory of the backup_label file, likely preventing the rename and
causing the failure. Ensuring that these sessions are cleanly terminated
resolves the issue in local testing.
Additionally, the has_restoring => 1 option has been removed, as it was
not required by the new test.
Reported-by: Robert Haas <robertmhaas@gmail.com>
Backpatch-through: 17
Discussion: https://postgr.es/m/CA+TgmobdVhO0ckZfsBZ0wqDO4qHVCwZZx8sf=EinafvUam-dsQ@mail.gmail.com
This commit adds support for the restore of extended statistics of the
kind "mcv", aka most-common values.
This format is different from n_distinct and dependencies stat types in
that it is the combination of three of the four different arrays from the
pg_stats_ext view which in turn require three different input parameters
on pg_restore_extended_statistics(). These are translated into three
input arguments for the function:
- "most_common_vals", acting as a leader of the others. It is a
2-dimension array, that includes the common values.
- "most_common_freqs", 1-dimension array of float8[], with a number of
elements that has to match with "most_common_vals".
- "most_common_base_freqs", 1-dimension array of float8[], with a number
of elements that has to match with "most_common_vals".
All three arrays are required to achieve the restore of this type of
extended statistics (if "most_common_vals" happens to be NULL in the
catalogs, the rest is NULL by design).
Note that "most_common_val_nulls" is not required in input, its data is
rebuilt from the decomposition of the "most_common_vals" array based on
its text[] representation. The initial versions of the patch provided
this option in input, but we do not require it and it simplifies a lot
the result.
Support in pg_dump is added down to v13 which is where the support for
this type of extended statistics has been added, when --statistics is
used. This means that upgrade and dumps can restore extended statistics
data transparently, like "dependencies", "ndistinct", attribute and
relation statistics. For MCV, the values are directly queried from the
relevant catalogs.
Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
This commit moves the separate global variables for replication origin
state into a single ReplOriginXactState struct. This groups logically
related variables, which improves code readability and simplifies
state management (e.g., resetting the state) by handling them as a
unit.
Author: Chao Li <lic@highgo.com>
Suggested-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2=pYvfRthXHTzSrOsf5_FfyY4zJyK4zV2v4W=yjUij1cA@mail.gmail.com
Factor out common logic for clearing replorigin_session_* variables
into a dedicated helper function, replorigin_xact_clear().
This removes duplicated assignments of these variables across multiple
call sites, and makes the intended scope of each reset explicit.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CAEoWx2=pYvfRthXHTzSrOsf5_FfyY4zJyK4zV2v4W=yjUij1cA@mail.gmail.com
The replication origin code was using inconsistent naming
conventions. Functions were typically prefixed with 'replorigin',
while typedefs and constants used "RepOrigin".
This commit unifies the naming convention by renaming RepOriginId to
ReplOriginId.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAD21AoBDgm3hDqUZ+nqu=ViHmkCnJBuJyaxG_yvv27BAi2zBmQ@mail.gmail.com
Each RelOptInfo now has a pgs_mask member which is a mask of acceptable
strategies. For most rels, this is populated from PlannerGlobal's
default_pgs_mask, which is computed from the values of the enable_*
GUCs at the start of planning.
For baserels, get_relation_info_hook can be used to adjust pgs_mask for
each new RelOptInfo, at least for rels of type RTE_RELATION. Adjusting
pgs_mask is less useful for other types of rels, but if it proves to
be necessary, we can revisit the way this hook works or add a new one.
For joinrels, two new hooks are added. joinrel_setup_hook is called each
time a joinrel is created, and one thing that can be done from that hook
is to manipulate pgs_mask for the new joinrel. join_path_setup_hook is
called each time we're about to add paths to a joinrel by considering
some particular combination of an outer rel, an inner rel, and a join
type. It can modify the pgs_mask propagated into JoinPathExtraData to
restrict strategy choice for that particular combination of rels.
To make joinrel_setup_hook work as intended, the existing calls to
build_joinrel_partition_info are moved later in the calling functions;
this is because that function checks whether the rel's pgs_mask includes
PGS_CONSIDER_PARTITIONWISE, so we want it to only be called after
plugins have had a chance to alter pgs_mask.
Upper rels currently inherit pgs_mask from the input relation. It's
unclear that this is the most useful behavior, but at the moment there
are no hooks to allow the mask to be set in any other way.
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
Commit 90eae926a fixed ON CONFLICT handling during REINDEX CONCURRENTLY
on partitioned tables by treating unparented indexes as potential
arbiters. However, there's a remaining race condition: when pg_inherits
records are swapped between consecutive calls to get_partition_ancestors(),
two different child indexes can appear to have the same parent, causing
duplicate entries in the arbiter list and triggering "invalid arbiter
index list" errors.
Note that this is not a new problem introduced by 90eae926a. The same
error could occur before that commit in a slightly different scenario:
an index is selected during planning, then index_concurrently_swap()
commits, and a subsequent call to get_partition_ancestors() uses a new
catalog snapshot that sees zero ancestors for that index.
Fix by tracking which parent indexes have already been processed. If a
subsequent call to get_partition_ancestors() returns a parent we've
already seen, treat that index as unparented instead, allowing it to be
matched via IsIndexCompatibleAsArbiter() like other concurrent reindex
scenarios.
Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/e5a8c1df-04e5-4343-85ef-5df2a7e3d90c@gmail.com
This commit fixes an issue with the restore of ndistinct and
dependencies statistics, causing the operation to fail when any of these
kinds included expressions.
In extended statistics, expressions use strictly negative attribute
numbers, decremented from -1. For example, let's imagine an object
defined as follows:
CREATE STATISTICS stats_obj (dependencies) ON lower(name), upper(name)
FROM tab_obj;
This object would generate dependencies stats using -1 and -2 as
attribute numbers, like that:
[{"attributes": [-1], "dependency": -2, "degree": 1.000000},
{"attributes": [-2], "dependency": -1, "degree": 1.000000}]
However, pg_restore_extended_stats() forgot to account for the number of
expressions defined in an extended statistics object. This would cause
the validation step of ndistinct and dependencies data to fail,
preventing a restore of their stats even if the input is valid.
This issue has come up due to an incorrect split of the patch set. Some
tests are included to cover this behavior.
Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aXl4bMfSTQUxM_yy@paquier.xyz
Commit 302879bd68 has added the ability to restore extended stats of
the type "dependencies", but it has forgotten the addition of a test to
verify that the value restored was actually set.
This test is the pg_dependencies equivalent of the test added for
pg_ndistinct in 0e80f3f88d.
Author: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/CADkLM=dZr_Ut3jKw94_BisyyDtNZPRJWeOALXVzcJz=ZFTAhvQ@mail.gmail.com
The oauth_validator tests missed the lessons of c89525d57 et al, so
certain combinations of command-line build order and `meson test`
options can result in
Command 'oauth_hook_client' not found in [...] at src/test/perl/PostgreSQL/Test/Utils.pm line 427.
Add the missing dependency on the test executable. This fixes, for
example,
$ ninja clean && ninja meson-test-prereq && PG_TEST_EXTRA=oauth meson test --no-rebuild
Reported-by: Jonathan Gonzalez V. <jonathan.abdiel@gmail.com>
Author: Jonathan Gonzalez V. <jonathan.abdiel@gmail.com>
Discussion: https://postgr.es/m/6e8f4f7c23faf77c4b6564c4b7dc5d3de64aa491.camel@gmail.com
Discussion: https://postgr.es/m/qh4c5tvkgjef7jikjig56rclbcdrrotngnwpycukd2n3k25zi2%4044hxxvtwmgum
Backpatch-through: 18
A race condition could cause a newly synced replication slot to become
invalidated between its initial sync and the checkpoint.
When syncing a replication slot to a standby, the slot's initial
restart_lsn is taken from the publisher's remote_restart_lsn. Because slot
sync happens asynchronously, this value can lag behind the standby's
current redo pointer. Without any interlocking between WAL reservation and
checkpoints, a checkpoint may remove WAL required by the newly synced
slot, causing the slot to be invalidated.
To fix this, we acquire ReplicationSlotAllocationLock before reserving WAL
for a newly synced slot, similar to commit 006dd4b2e5. This ensures that
if WAL reservation happens first, the checkpoint process must wait for
slotsync to update the slot's restart_lsn before it computes the minimum
required LSN.
However, unlike in ReplicationSlotReserveWal(), this lock alone cannot
protect a newly synced slot if a checkpoint has already run
CheckPointReplicationSlots() before slotsync updates the slot. In such
cases, the remote restart_lsn may be stale and earlier than the current
redo pointer. To prevent relying on an outdated LSN, we use the oldest
WAL location available if it is greater than the remote restart_lsn.
This ensures that newly synced slots always start with a safe, non-stale
restart_lsn and are not invalidated by concurrent checkpoints.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Vitaly Davydov <v.davydov@postgrespro.ru>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Backpatch-through: 17
Discussion: https://postgr.es/m/TY4PR01MB16907E744589B1AB2EE89A31F94D7A%40TY4PR01MB16907.jpnprd01.prod.outlook.com
This commit integrates the new pg_restore_extended_stats() function into
pg_dump, so as the data of extended statistics is detected and included
in dumps when the --statistics switch is specified. Currently, the same
extended stats kinds as the ones supported by the SQL function can be
dumped: "n_distinct" and "dependencies".
The extended statistics data can be dumped down to PostgreSQL 10, with
the following changes depending on the backend version dealt with:
- In v19 and newer versions, the format of pg_ndistinct and
pg_dependencies has changed, catalogs can be directly queried.
- In v18 and older versions, the format is translated to the new format
supported by the backend.
- In v14 and older versions, inherited extended statistics are not
supported.
- In v11 and older versions, the data for ndistinct and dependencies
was stored in pg_statistic_ext. These have been moved to pg_stats_ext
in v12.
- Extended Statistics have been introduced in v10, no support is needed
for versions older than that.
The extended statistics data is dumped if it can be found in the
catalogs. If the catalogs are empty, then no restore of the stats data
is attempted.
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
WalSndShutdown() previously called abort() after proc_exit(0) to
silence compiler warnings. This is no longer needed, because both
WalSndShutdown() and proc_exit() are declared pg_noreturn,
allowing the compiler to recognize that the function does not return.
Also there are already other functions, such as CheckpointerMain(),
that call proc_exit() without an abort(), and they do not produce warnings.
Therefore this abort() call in WalSndShutdown() is useless and
this commit removes it.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CAHGQGwHPX1yoixq+YB5rF4zL90TMmSEa3FpHURtqW3Jc5+=oSA@mail.gmail.com
We've assumed that touching the memory is sufficient for a page to be
located on one of the NUMA nodes. But a page may be moved to a swap
after we touch it, due to memory pressure.
We touch the memory before querying the status, but there is no
guarantee it won't be moved to the swap in the meantime. The touching
happens only on the first call, so later calls are more likely to be
affected. And the batching increases the window too.
It's up to the kernel if/when pages get moved to swap. We have to accept
ENOENT (-2) as a valid result, and handle it without failing. This patch
simply treats it as an unknown node, and returns NULL in the two
affected views (pg_shmem_allocations_numa and pg_buffercache_numa).
Hugepages cannot be swapped out, so this affects only regular pages.
Reported by Christoph Berg, investigation and fix by me. Backpatch to
18, where the two views were introduced.
Reported-by: Christoph Berg <myon@debian.org>
Discussion: 18
Backpatch-through: https://postgr.es/m/aTq5Gt_n-oS_QSpL@msg.df7cb.de
This commit adds support for the restore of extended statistics of the
kind "dependencies", for the following input data:
[{"attributes": [2], "dependency": 3, "degree": 1.000000},
{"attributes": [3], "dependency": 2, "degree": 1.000000}]
This relies on the existing routines of "dependencies" to cross-check
the input data with the definition of the extended statistics objects
for the attribute numbers. An input argument of type "pg_dependencies"
is required for this new option.
Thanks to the work done in 0e80f3f88d for the restore function and
e1405aa5e3 for the input handling of data type pg_dependencies, this
addition is straight-forward. This will be used so as it is possible to
transfer these statistics across dumps and upgrades, removing the need
for a post-operation ANALYZE for these kinds of statistics.
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
Encapsulating the cases that clear the visibility map after vacuum phase
I, when corruption is detected, into in a helper makes the code cleaner
and enables further refactoring in future commits.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/7ib3sa55sapwjlaz4sijbiq7iezna27kjvvvar4dpgkmadml6t%40gfpkkwmdnepx
lazy_scan_prune() takes a parameter from lazy_scan_heap() indicating
whether the page was marked all-visible in the VM at the time it was
last checked in find_next_unskippable_block(). This behavior is
historical, dating back to commit 608195a3a3, when we did not pin the
VM page until deciding we must read it. Now that the VM page is already
pinned, there is no meaningful benefit to relying on a cached VM status.
Removing this cached value simplifies the logic in both lazy_scan_heap()
and lazy_scan_prune(). It also clarifies future work that will set the
visibility map on-access: such paths will not have a cached value
available, which would make the logic harder to reason about. And
eliminating it enables us to detect and repair VM corruption on-access.
Along with removing the cached value and unconditionally checking the
visibility status of the heap page, this commit also moves the VM
corruption handling to occur first. This reordering should have no
performance impact, since the checks are inexpensive and performed only
once per page. It does, however, make the control flow easier to
understand. The new restructuring also makes it possible to set the VM
after fixing corruption (if pruning found the page all-visible).
Now that no callers of visibilitymap_set() use its return value, change
its (and visibilitymap_set_vmbits()) return type to void.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
lazy_scan_prune() previously had two separate cases that called
visibilitymap_set() after pruning and freezing. These branches were
nearly identical except that one attempted to avoid dirtying the heap
buffer. However, that situation can never occur — the heap buffer cannot
be clean at that point (and we would hit an assertion if it were).
In lazy_scan_prune(), when we change a previously all-visible page to
all-frozen and the page was recorded as all-visible in the visibility
map by find_next_unskippable_block(), the heap buffer will always be
dirty. Either we have just frozen a tuple and already dirtied the
buffer, or the buffer was modified between find_next_unskippable_block()
and heap_page_prune_and_freeze() and then pruned in
heap_page_prune_and_freeze().
Additionally, XLogRegisterBuffer() asserts that the buffer is dirty, so
attempting to add a clean heap buffer to the WAL chain would assert out
anyway.
Since the “clean heap buffer with already set VM” case is impossible,
the two visibilitymap_set() branches in lazy_scan_prune() can be merged.
Doing so makes the intent clearer and emphasizes that the heap buffer
must always be marked dirty before being added to the WAL chain.
This commit also adds a test case for vacuuming when no heap
modifications are required. Currently this ensures that the heap buffer
is marked dirty before it is added to the WAL chain, but if we later
remove the heap buffer from the VM-set WAL chain or pass it with the
REGBUF_NO_CHANGES flag, this test would guard that behavior.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
Discussion: https://postgr.es/m/flat/CAAKRu_ZWx5gCbeCf7PWCv8p5%3D%3Db7EEws0VD2wksDxpXCvCyHvQ%40mail.gmail.com
Modify two places creating GIN indexes in regression tests, so that the
build is parallel. This provides a basic test coverage, even if the
amounts of data are fairly small.
Reported-by: Kirill Reshke <reshkekirill@gmail.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/CALdSSPjUprTj+vYp1tRKWkcLYzdy=N=O4Cn4y_HoxNSqQwBttg@mail.gmail.com
When building a tuplesort during parallel GIN builds, the function
incorrectly looked up the default B-Tree operator, not the function
associated with the GIN opclass (through GIN_COMPARE_PROC).
Fixed by using the same logic as initGinState(), and the other place
in parallel GIN builds.
This could cause two types of issues. First, a data type might not have
a B-Tree opclass, in which case the PrepareSortSupportFromOrderingOp()
fails with an ERROR. Second, a data type might have both B-Tree and GIN
opclasses, defining order/equality in different ways. This could lead to
logical corruption in the index.
Backpatch to 18, where parallel GIN builds were introduced.
Discussion: https://postgr.es/m/73a28b94-43d5-4f77-b26e-0d642f6de777@iki.fi
Reported-by: Heikki Linnakangas <hlinnaka@iki.fi>
Backpatch-through: 18
Buildfarm member fairywren hit the Windows limitation on the length of a
file path. While there may be other things we should also do to prevent
this from happening, it's certainly the case that the length of this
test file name is much longer than others in the same directory, so make
it shorter.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: http://postgr.es/m/274e0a1a-d7d2-4bc8-8b56-dd09f285715e@gmail.com
Backpatch-through: 17
This fixes cases where a qualifier (const, in all cases here) was
dropped by a cast, but the cast was otherwise necessary or desirable,
so the straightforward fix is to add the qualifier into the cast.
Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/b04f4d3a-5e70-4e73-9ef2-87f777ca4aac%40eisentraut.org
The psql documentation for the \d and \d+ meta-commands lists objects
that may be shown, but previously the wording could be read as exhaustive
even though additional objects can also appear in the output.
This commit clarifies the description by adding phrasing such as "for example"
or "such as", making it clear that the listed objects are illustrative
rather than a complete list. While the change is small, it helps avoid
potential user confusion.
As this is a documentation clarification rather than a bug fix,
it is not backpatched.
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHut+Pt1DBtaUqfJftkkaQLJJJenYJBtb6Ec6s6vu82KEMh46A@mail.gmail.com
This code thought it was optimizing WindowAgg evaluation by getting rid
of duplicate WindowFuncs, but it turns out all it does today is lead to
cost-underestimations and makes it possible that optimize_window_clauses
could miss some of the WindowFuncs that must receive an updated winref.
The deduplication likely was useful when it was first added, but since
the projection code was changed in b8d7f053c, the list of WindowFuncs
gathered by find_window_functions isn't used during execution. Instead,
the expression evaluation code will process the node's targetlist to find
the WindowFuncs.
The reason the deduplication could cause issues for
optimize_window_clauses() is because if a WindowFunc is moved to another
WindowClause, the winref is adjusted to reference the new WindowClause.
If any duplicate WindowFuncs were discarded in find_window_functions()
then the WindowFuncLists may not include all the WindowFuncs that need
their winref adjusted. This could lead to an error message such as:
ERROR: WindowFunc with winref 2 assigned to WindowAgg with winref 1
The back-branches will receive a different fix so that the WindowAgg costs
are not affected.
Author: Meng Zhang <mza117jc@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAErYLFAuxmW0UVdgrz7iiuNrxGQnFK_OP9hBD5CUzRgjrVrz=Q@mail.gmail.com
Fix for commit a9bdb63bba. The previous plan of redefining alignas
didn't work, because it interfered with other C++ header files (e.g.,
LLVM). So now the new workaround is to just disable the affected
typedefs under the affected compilers. These are not typically used
in extensions anyway.
Discussion: https://www.postgresql.org/message-id/3119480.1769189606%40sss.pgh.pa.us
Like its cousin functions for the restore of relation and attribute
stats, pg_restore_extended_stats() needs to be run by a user that is the
database owner or has MAINTAIN privileges on the table whose stats are
restored. This commit adds a regression test ensuring that MAINTAIN is
required when calling the function. This test also checks that a
ShareUpdateExclusive lock is taken on the table whose stats are
restored.
This has been split from the commit that has introduced
pg_restore_extended_stats(), for clarity.
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
The tuple data upserted into pg_statistic_ext_data was missing an
initialization for the nulls flag of stxoid and stxdinherit. This would
cause an incorrect handling of the stats data restored.
This issue has been spotted by CatalogTupleCheckConstraints(),
translating to a NOT NULL constraint inconsistency, while playing more
with the follow-up portions of the patch set.
Oversight in 0e80f3f88d (mea culpa). Surprisingly, the buildfarm did
not complain yet.
Discussion: https://postgr.es/m/CADkLM=c7DY3Jv6ef0n_MGUJ1FyTMUoT697LbkST05nraVGNHYg@mail.gmail.com
This function closely mirror its relation and attribute counterparts,
but for extended statistics (i.e. CREATE STATISTICS) objects, being
able to restore extended statistics for an extended stats object. Like
the other functions, the goal of this feature is to ease the dump or
upgrade of clusters so as ANALYZE would not be required anymore after
these operations, stats being directly loaded into the target cluster
without any post-dump/upgrade computation.
The caller of this function needs the following arguments for the
extended stats to restore:
- The name of the relation.
- The schema name of the relation.
- The name of the extended stats object.
- The schema name of the extended stats object.
- If the stats are inherited or not.
- One or more extended stats kind with its data.
This commit adds only support for the restore of the extended statistics
kind "n_distinct", building the basic infrastructure for the restore
of more extended statistics kinds in follow-up commits, including MVC
and dependencies.
The support for "n_distinct" is eased in this commit thanks to the
previous work done particularly in commits 1f927cce44 and
44eba8f06e, that have added the input function for the type
pg_ndistinct, used as data type in input of this new restore function.
Bump catalog version.
Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
This commit adds a test for pg_restore_attribute_stats() with the
injection of statistics related to a multirange type. This case is
supported in statatt_get_type() since its introduction in ce207d2a79,
but there was no test in the main regression test suite to check for the
case where attribute stats is restored for a multirange type, as done by
multirange_typanalyze().
Author: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/CADkLM=c3JivzHNXLt-X_JicYknRYwLTiOCHOPiKagm2_vdrFUg@mail.gmail.com
Based on name of the macro, it was implied that it could be used for all
mmap() calls on portability grounds. However, its use is limited to
sysv_shmem.c, for CreateAnonymousSegment(). This commit removes the
declaration, reducing the confusion around it as a portability tweak,
being limited to SysV-style shared memory.
This macro has been introduced in b0fc0df936 for sysv_shmem.c,
originally. It has been moved to mem.h in 0ac5e5a7e1 a bit later.
Suggested by: Peter Eisentraut <peter@eisentraut.org>
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAExHW5vTWABxuM5fbQcFkGuTLwaxuZDEE2vtx2WuMUWk6JnF4g@mail.gmail.com
Discussion: https://postgr.es/m/12add41a-7625-4639-a394-a5563e349322@eisentraut.org
The intention of the work done in fb9f95502 was that these functions are
inlined. I noticed my compiler isn't doing this on -O2 (gcc version
15.2.0). Also, clang version 20.1.8 isn't inlining either. Fix by
marking both of these functions as pg_attribute_always_inline to avoid
leaving this up to the compiler's heuristics.
A quick test with a Seq Scan on a table with a single int column running
a query that filters all 1 million rows in the WHERE clause yields a
3.9% speedup on my Zen4 machine.
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvrL7Q41B=gv+3wc8+AJGKZugGegUbBo8FPQ+3+NGTPb+w@mail.gmail.com
This commit adds more tests to cover STORAGE MAIN and EXTENDED, checking
how these use TOAST tables. EXTENDED is already widely tested as the
default behavior, but there were no tests where the clause pattern is
directly specified. STORAGE MAIN and its interactions with TOAST was
not covered at all.
This hole in the tests has been noticed for STORAGE MAIN (inline
compressible varlenas), where I have managed to break the backend
without the tests able to notice the breakage while playing with the
varlena structures.
Reviewed-by: Nikhil Kumar Veldanda <veldanda.nikhilkumar17@gmail.com>
Discussion: https://postgr.es/m/aXMdX1UTHnzYPkHk@paquier.xyz
Builds with CLOBBER_CACHE_ALWAYS enabled are failing the new test
introduced in 1572ea96e6, checking the nesting level calculation in
the planner hook. The inner query of the function called twice is
registered as normalized, as such builds would register a PGSS entry in
the post-parse-analyse hook due to the cached plans requiring
revalidation.
A trick based on debug_discard_caches cannot work as far as I can, a
normalized query still being registered. This commit takes a different
approach with the addition of a DISCARD PLANS before the first function
call. This forces the use of a normalized query in the PGSS entry for
the inner query of the function with and without CLOBBER_CACHE_ALWAYS,
which should be enough to stabilize the test. Note that the test is
still checking what it should: when removing the nesting level
calculation in the planner hook of PGSS, one still gets a failure for
the PGSS entry of the inner query in the function, with "toplevel" being
flipped to true instead of false (it should be false, as a non-top-level
entry).
Per buildfarm members avocet and trilobite, at least.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/82dd02bb-4e0f-40ad-a60b-baa1763ff0bd@gmail.com
When executing a data-modifying CTE query containing MERGE and some
other DML operation on a table with statement-level AFTER triggers,
the transition tables passed to the triggers would fail to include the
rows affected by the MERGE.
The reason is that, when initializing a ModifyTable node for MERGE,
MakeTransitionCaptureState() would create a TransitionCaptureState
structure with a single "tcs_private" field pointing to an
AfterTriggersTableData structure with cmdType == CMD_MERGE. Tuples
captured there would then not be included in the sets of tuples
captured when executing INSERT/UPDATE/DELETE ModifyTable nodes in the
same query.
Since there are no MERGE triggers, we should only create
AfterTriggersTableData structures for INSERT/UPDATE/DELETE. Individual
MERGE actions should then use those, thereby sharing the same capture
tuplestores as any other DML commands executed in the same query.
This requires changing the TransitionCaptureState structure, replacing
"tcs_private" with 3 separate pointers to AfterTriggersTableData
structures, one for each of INSERT, UPDATE, and DELETE. Nominally,
this is an ABI break to a public structure in commands/trigger.h.
However, since this is a private field pointing to an opaque data
structure, the only way to create a valid TransitionCaptureState is by
calling MakeTransitionCaptureState(), and no extensions appear to be
doing that anyway, so it seems safe for back-patching.
Backpatch to v15, where MERGE was introduced.
Bug: #19380
Reported-by: Daniel Woelfel <dwwoelfel@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19380-4e293be2b4007248%40postgresql.org
Backpatch-through: 15
In preparation for a future change to libpq's default protocol version,
pin today's default (3.0) in the libpq_pipeline tests.
Patch by Jelte Fennema-Nio, with some additional refactoring of the
PQconnectdbParams() logic by me.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://postgr.es/m/DDPR5BPWH1RJ.1LWAK6QAURVAY%40jeltef.nl
InitializeShmemGUCs() always added 1 to the value calculated for
shared_memory_size_in_huge_pages, which is unnecessary if the
shared memory size is divisible by the huge page size.
CreateAnonymousSegment() neglected to check for overflow when
rounding up to a multiple of the huge page size.
These are arguably bugs, but they seem extremely unlikely to be
causing problems in practice, so no back-patch.
Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAO6_Xqq2vZbva0R9eQSY0p2kfksX2aP4r%3D%2BZ_q1HBYNU%3Dm8bBg%40mail.gmail.com
Previously, a WAL receiver freshly started would set its state to
WALRCV_STREAMING immediately at startup, before actually establishing a
replication connection.
This commit introduces a new state called WALRCV_CONNECTING, which is
the state used when the WAL receiver freshly starts, or when a restart
is requested, with a switch to WALRCV_STREAMING once the connection to
the upstream server has been established with COPY_BOTH, meaning that
the WAL receiver is ready to stream changes. This change is useful for
monitoring purposes, especially in environments with a high latency
where a connection could take some time to be established, giving some
room between the [re]start phase and the streaming activity.
From the point of view of the startup process, that flips the shared
memory state of the WAL receiver when it needs to be stopped, the
existing WALRCV_STREAMING and the new WALRCV_CONNECTING states have the
same semantics: the WAL receiver has started and it can be stopped.
Based on an initial suggestion from Noah Misch, with some input from me
about the design.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Rahila Syed <rahilasyed90@gmail.com>
Discussion: https://postgr.es/m/CABPTF7VQ5tGOSG5TS-Cg+Fb8gLCGFzxJ_eX4qg+WZ3ZPt=FtwQ@mail.gmail.com
ExecInitModifyTable() unconditionally required a ctid junk column even
when the target was a partitioned table. This led to spurious "could
not find junk ctid column" errors when all children were excluded and
only the dummy root result relation remained.
A partitioned table only appears in the result relations list when all
leaf partitions have been pruned, leaving the dummy root as the sole
entry. Assert this invariant (nrels == 1) and skip the ctid requirement.
Also adjust ExecModifyTable() to tolerate invalid ri_RowIdAttNo for
partitioned tables, which is safe since no rows will be processed in
this case.
Bug: #19099
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19099-e05dcfa022fe553d%40postgresql.org
Backpatch-through: 14
Commit f16241bef mistakenly supposed that INSERT...ON CONFLICT DO
UPDATE rejects partitioned target tables. (This may have been
accurate when the patch was written, but it was already obsolete
when committed.) Hence, there's an assertion that we can't see
ItemPointerIndicatesMovedPartitions() in that path, but the assertion
is triggerable.
Some other places throw error if they see a moved-across-partitions
tuple, but there seems no need for that here, because if we just retry
then we get the same behavior as in the update-within-partition case,
as demonstrated by the new isolation test. So fix by deleting the
faulty Assert. (The fact that this is the fix doubtless explains
why we've heard no field complaints: the behavior of a non-assert
build is fine.)
The TM_Deleted case contains a cargo-culted copy of the same Assert,
which I also deleted to avoid confusion, although I believe that one
is actually not triggerable.
Per our code coverage report, neither the TM_Updated nor the
TM_Deleted case were reached at all by existing tests, so this
patch adds tests for both.
Reported-by: Dmitry Koval <d.koval@postgrespro.ru>
Author: Joseph Koshakow <koshy44@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/f5fffe4b-11b2-4557-a864-3587ff9b4c36@postgrespro.ru
Backpatch-through: 14
In the spirit of commit 4b7e6c73b0 and following, which see for more
details; it appears to have been quite an uncontroversial C11 feature to
use and it makes the code nicer to read.
This commit changes the relopt_value struct.
Author: Peter Eisentraut <peter@eisentraut.org>
Author: Álvaro Herrera <alvherre@kurilemu.de>
Note: Yes, this was written twice independently.
Discussion: https://postgr.es/m/202601192106.zcdi3yu2gzti@alvherre.pgsql
When a range type is created, several construction functions are also
created, two for the range type and three for the multirange type.
These have an internal dependency, so they "belong" to the range type.
But there was no way to identify those functions when given a range
type. An upcoming patch needs access to the two- or possibly the
three-argument range constructor function for a given range type. The
only way to do that would be with fragile workarounds like matching
names and argument types. The correct way to do that kind of thing is
to record to the links in the system catalogs. This is what this
patch does, it records the OIDs of these five constructor functions in
the pg_range catalog. (Currently, there is no code that makes use of
this.)
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/7d63ddfa-c735-4dfe-8c7a-4f1e2a621058%40eisentraut.org
There were many PG_GETARG_* calls, mostly around gin, gist, spgist
code, that were commented out, presumably to indicate that the
argument was unused and to indicate that it wasn't forgotten or
miscounted. But keeping commented-out code updated with refactorings
and style changes is annoying. So this commit changes them to
#ifdef NOT_USED
blocks, which is a style already in use. That way, at least the
indentation and syntax highlighting works correctly, making some of
these blocks much easier to read.
An alternative would be to just delete that code, but there is some
value in making unused arguments explicit, and some of this arguably
serves as example code for index AM APIs.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: David Geier <geidav.pg@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/328e4371-9a4c-4196-9df9-1f23afc900df%40eisentraut.org
The uses of AssertVariableIsOfType in pg_upgrade are unnecessary
because the calls to upgrade_task_add_step() already check the
compatibility of the callback functions.
These were apparently copied from a previous coding style, but similar
removals were already done in commit 30b789eafe.
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/3d289481-7f76-409f-81c7-81824219cc75%40eisentraut.org
Just like we only support compiling with C11, we only support
compiling extensions with C++11 and up. Some compilers support C++11
but don't enable it by default. This detects if flags are needed to
enable C++11 support, in a similar way to how we check the same for
C11 support.
The C++ test extension module added by commit 476b35d4e3 confirmed
that C++11 is effectively required. (This was understood in mailing
list discussions but not recorded anywhere in the source code.)
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Co-authored-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/E1viDt1-001d7E-2I%40gemulon.postgresql.org
The possible values of pg_stat_wal_receiver.status have never been
documented. Note that the status "stopped" will never show up in this
view, hence there is no need to document it.
Issue noticed while discussing a patch that aims to add a new status to
WAL receiver.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/CABPTF7X_Jgmyk1FBVNf3tyAOKqU55LLpLMzWkGtEAb_jQWVN=g@mail.gmail.com
With LLVM >= 17, transform passes are provided as a string to
LLVMRunPasses. Only two strings were used: "default<O3>" and
"default<O0>,mem2reg".
With previous LLVM versions, an additional inline pass was added when
JIT inlining was enabled without optimization. With LLVM >= 17, the code
would go through llvm_inline, prepare the functions for inlining, but
the generated bitcode would be the same due to the missing inline pass.
This patch restores the previous behavior by adding an inline pass when
inlining is enabled but no optimization is done.
This fixes an oversight introduced by 76200e5e when support for LLVM 17
was added.
Backpatch-through: 14
Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Pierre Ducroquet <p.psql@pinaraf.info>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/CAO6_XqrNjJnbn15ctPv7o4yEAT9fWa-dK15RSyun6QNw9YDtKg%40mail.gmail.com
Commit bc2f348 introduced multi-line HEADER support for COPY. This commit
extends this capability to file_fdw, allowing multiple header lines to be
skipped.
Because CREATE/ALTER FOREIGN TABLE requires option values to be single-quoted,
this commit also updates defGetCopyHeaderOption() to accept integer values
specified as strings for HEADER option.
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: songjinzhou <tsinghualucky912@foxmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAOzEurT+iwC47VHPMS+uJ4WSzvOLPsZ2F2_wopm8M7O+CZa3Xw@mail.gmail.com
The error message reported for invalid values of the HEADER option in COPY
command previously used the term "non-negative integer", which is
discouraged by the Error Message Style Guide because it is ambiguous about
whether zero is allowed.
This commit improves the error message by replacing "non-negative integer"
there with "an integer value greater than or equal to zero" to make
the accepted values explicit.
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Steven Niu <niushiji@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwE86PcuPZbP=aurmW7Oo=eycF10gxjErWq4NmY-5TTX4Q@mail.gmail.com
This commit does the following:
* Removes TRY_POPCNT_X86_64. We now assume that the required CPUID
intrinsics are available when HAVE_X86_64_POPCNTQ is defined, as we
have done since v16 for meson builds when
USE_SSE42_CRC32C_WITH_RUNTIME_CHECK is defined and since v17 when
USE_AVX512_POPCNT_WITH_RUNTIME_CHECK is defined.
* Moves the MSVC check for HAVE_X86_64_POPCNTQ to configure-time.
This way, we set it for all relevant platforms in one place.
* Moves the #defines for USE_SSE2 and USE_NEON to c.h so that they
can be used elsewhere without including simd.h. Consequently, we
can remove the POPCNT_AARCH64 macro.
* Moves the #includes for pg_bitutils.h to below the system headers
in pg_popcount_{aarch64,x86}.c, since we no longer depend on macros
from pg_bitutils.h to decide which system headers to use.
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/aWf_InS1VrbeXAfP%40nathan
Since we now have several implementations of the popcount
functions, let's give them more descriptive names. This commit
replaces "slow" with "portable" and "fast" with "sse42". While the
POPCNT instruction is technically not part of SSE4.2, this naming
scheme is close enough in practice and is arguably easier to
understand than using "popcnt" instead.
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/aWf_InS1VrbeXAfP%40nathan
This moves the remaining x86-64-specific popcount implementations
in pg_bitutils.c to pg_popcount_x86.c.
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/aWf_InS1VrbeXAfP%40nathan
This is preparatory work for a follow-up commit that will move the
rest of the x86-64-specific popcount code to this file.
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/aWf_InS1VrbeXAfP%40nathan
Continuing to support this backwards-compatibility feature has
nontrivial costs; in particular it is potentially a security hazard
if an application somehow gets confused about which setting the
server is using. We changed the default to ON fifteen years ago,
which seems like enough time for applications to have adapted.
Let's remove support for the legacy string syntax.
We should not remove the GUC altogether, since client-side code will
still test it, pg_dump scripts will attempt to set it to ON, etc.
Instead, just prevent it from being set to OFF. There is precedent
for this approach (see commit de66987ad).
This patch does remove the related GUC escape_string_warning, however.
That setting does nothing when standard_conforming_strings is on,
so it's now useless. We could leave it in place as a do-nothing
setting to avoid breaking clients that still set it, if there are any.
But it seems likely that any such client is also trying to turn off
standard_conforming_strings, so it'll need work anyway.
The client-side changes in this patch are pretty minimal, because even
though we are dropping the server's support, most of our clients still
need to be able to talk to older server versions. We could remove
dead client code only once we disclaim compatibility with pre-v19
servers, which is surely years away. One change of note is that
pg_dump/pg_dumpall now set standard_conforming_strings = on in their
source session, rather than accepting the source server's default.
This ensures that literals in view definitions and such will be
printed in a way that's acceptable to v19+. In particular,
pg_upgrade will work transparently even if the source installation has
standard_conforming_strings = off. (However, pg_restore will behave
the same as before if given an archive file containing
standard_conforming_strings = off. Such an archive will not be safely
restorable into v19+, but we shouldn't break the ability to extract
valid data from it for use with an older server.)
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3279216.1767072538@sss.pgh.pa.us
From the user's point of view these are just Boolean values; from the
implementation side we can now distinguish an option that hasn't been
set. Reimplement the vacuum_truncate reloption using this type.
This could also be used for reloptions vacuum_index_cleanup and
buffering, but those additionally need a per-option "alias" for the
state where the variable is unset (currently the value "auto").
Author: Nikolay Shaplov <dhyan@nataraj.su>
Reviewed-by: Timur Magomedov <t.magomedov@postgrespro.ru>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/3474141.usfYGdeWWP@thinkpad-pgpro
This was introduced in the SJE patch (fc069a3a6), but it doesn't
do anything because pull_var_clause() never tests it. Apparently
it snuck in from somebody's private fork. Remove it again, but
only in HEAD -- seems best to let it be in v18.
Author: Alexander Pyhalov <a.pyhalov@postgrespro.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/70008c19d22e3dd1565ca57f8436c0ba@postgrespro.ru
We were using SnapshotAny to do some index checks, but that's wrong and
causes spurious errors when used on indexes created by CREATE INDEX
CONCURRENTLY. Fix it to use an MVCC snapshot, and add a test for it.
Backpatch of 6bd469d26a to branches 14-16. I previously misidentified
the bug's origin: it came in with commit 7f563c09f8 (pg11-era, not
5ae2087202 as claimed previously), so all live branches are affected.
Also take the opportunity to fix some comments that we failed to update
in the original commits and apply pgperltidy. In branch 14, remove the
unnecessary test plan specification (which would have need to have been
changed anyway; c.f. commit 549ec201d613.)
Diagnosed-by: Donghang Lin <donghanglin@gmail.com>
Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Backpatch-through: 17
Discussion: https://postgr.es/m/CANtu0ojmVd27fEhfpST7RG2KZvwkX=dMyKUqg0KM87FkOSdz8Q@mail.gmail.com
This commit adds tests to verify the computation of the nesting level
for two code paths: the planner hook and the ExecutorFinish() hook. The
nesting level is essential to save a correct "toplevel" status for the
added PGSS entries.
The author has noticed that removing the manipulations of nesting_level
in these two code paths did not cause the tests to complain, meaning
that we never had coverage for the assumptions taken by the code.
Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0uK1PSrgf52bWCtDpzaqbWt04o6ZA7zBm6UQyv7vyvf9w@mail.gmail.com
After commit 476b35d4e3, some buildfarm members are complaining about
not recognizing _Noreturn when building the new C++ module
test_cplusplusext. This is not a C++ feature, but it was gated like
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#define pg_noreturn _Noreturn
But apparently that was not sufficient. Some platforms define
__STDC_VERSION__ even in C++ mode. (In this particular case, it was
g++ on Solaris, but apparently this is also done by some other
platforms, and it is allowed by the C++ standard.) To fix, add a
... && !defined(__cplusplus)
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/CAGECzQR21OnnKiZO_1rLWO0-16kg1JBxnVq-wymYW0-_1cUNtg@mail.gmail.com
This change enhances the clarity and usefulness of error detail messages
generated during logical replication conflicts. The following improvements
have been made:
1. Eliminate redundant output: Avoid printing duplicate remote row and
replica identity values for the multiple_unique_conflicts conflict type.
2. Improve message structure: Append tuple values directly to the main
error message, separated by a colon (:), for better readability.
3. Simplify local row terminology: Remove the word 'existing' when
referring to the local row, as this is already implied by context.
4. General code refinements: Apply miscellaneous code cleanups to improve
how conflict detail messages are constructed and formatted.
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Discussion: https://postgr.es/m/CAHut+Psgkwy5-yGRJC15izecySGGysrbCszv_z93ess8XtCDOQ@mail.gmail.com
The test "squashing" was the last item of the REGRESS list, but
"cleanup" should be the second to last, dropping the extension.
"oldextversions" is the last item.
In passing, the REGRESS list is cleaned up to include one item per line,
so as diffs are minimized when adding new test files.
Noticed while playing with this area of the code.
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Man Zeng <zengman@halodbtech.com>
Discussion: https://postgr.es/m/aW6_Xc8auuu5iAPi@paquier.xyz
While we already test that our headers are valid C++ using
headerscheck, it turns out that the macros we define might still
expand to invalid C++ code. This adds a minimal test extension that
is compiled using C++ to test that it's actually possible to build and
run extensions written in C++. Future commits will improve C++
compatibility of some of our macros and add usage of them to this
extension make sure that they don't regress in the future.
The test module is for the moment disabled when using MSVC. In
particular, the use of designated initializers in PG_MODULE_MAGIC
would require C++20, for which we are currently not set up. (GCC and
Clang support it as extensions.) It is planned to fix this.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/flat/CAGECzQR21OnnKiZO_1rLWO0-16kg1JBxnVq-wymYW0-_1cUNtg@mail.gmail.com
This way we don't have to walk the entire process type array and
strcmp() the string with the names therein. The integer value can be
directly used as array index instead.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/202512090935.k3xrtr44hxkn@alvherre.pgsql
A recent BF failure showed that commit 7a485bd641 did not handle the case
where a sequence is dropped concurrently during sequence synchronization
on the subscriber. Previously, pg_get_sequence_data() would ERROR out if
the sequence was dropped concurrently. After 7a485bd641, it instead
returns NULL, which leads to an assertion failure on the subscriber.
To handle this change, update sequence synchronization to skip sequences
for which pg_get_sequence_data() returns NULL.
Author: vignesh C <vignesh21@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CALDaNm0FoGdt+1mzua0t-=wYdup5_zmFrvfNf-L=MGBnj9HAcg@mail.gmail.com
This addition is in the same spirit as 32e27bd320 for MVNDistinct and
MVDependencies, except that we were missing a free routine for the third
type of extended statistics, MCVList. I was not sure if we needed an
equivalent for MCVList, but after more review of the main patch set for
the import of extended statistics, it has become clear that we do.
This is introduced as its own change as this routine can be useful on
its own. This one is a piece that has not been written by Corey
Huinker, I have just noticed it by myself on the way.
Author: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
When IN/ANY clauses contain both constants and variable expressions, the
optimizer transforms them into separate structures: constants become
an array expression while variables become individual OR conditions.
This transformation was creating an overlap with the token locations,
causing pg_stat_statements query normalization to crash because it
could not calculate the amount of bytes remaining to write for the
normalized query.
This commit disables squashing for mixed IN list expressions when
constructing a scalar array op, by setting list_start and list_end
to -1 when both variables and non-variables are present. Some
regression tests are added to PGSS to verify these patterns.
Author: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0ts9qiONnHjjHxPxtePs22GBo4d3jZ_s2BQC59AN7XbAA@mail.gmail.com
Backpatch-through: 18
When faced with a relation containing more than 1 physical segment
(i.e. >1GB, with normal settings), the previous code could compute a
truncation block length greater than RELSEG_SIZE, which could lead to
restore failures of this form:
file "%s" has truncation block length %u in excess of segment size %u
The fix is simply to clamp the maximum computed truncation_block_length
to RELSEG_SiZE. I have also added some comments to clarify the logic.
The test case was written by Oleg Tkachenko, but I have rewritten its
comments.
Reported-by: Oleg Tkachenko <oatkachenko@gmail.com>
Diagnosed-by: Oleg Tkachenko <oatkachenko@gmail.com>
Co-authored-by: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Oleg Tkachenko <oatkachenko@gmail.com>
Reviewed-by: Amul Sul <sulamul@gmail.com>
Backpatch-through: 17
Discussion: http://postgr.es/m/00FEFC88-EA1D-4271-B38F-EB741733A84A@gmail.com
When checking a subquery's output expressions to see if it's safe to
push down an upper-level qual, check_output_expressions() previously
treated grouping Vars as opaque Vars. This implicitly assumed they
were stable and scalar.
However, a grouping Var's underlying expression corresponds to the
grouping clause, which may be volatile or set-returning. If an
upper-level qual references such an output column, pushing it down
into the subquery is unsafe. This can cause strange results due to
multiple evaluation of a volatile function, or introduce SRFs into
the subquery's WHERE/HAVING quals.
This patch teaches check_output_expressions() to look through grouping
Vars to their underlying expressions. This ensures that any
volatility or set-returning properties in the grouping clause are
detected, preventing the unsafe pushdown.
We do not need to recursively examine the Vars contained in these
underlying expressions. Even if they reference outputs from
lower-level subqueries (at any depth), those references are guaranteed
not to expand to volatile or set-returning functions, because
subqueries containing such functions in their targetlists are never
pulled up.
Backpatch to v18, where this issue was introduced.
Reported-by: Eric Ridge <eebbrr@gmail.com>
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/7900964C-F99E-481E-BEE5-4338774CEB9F@gmail.com
Backpatch-through: 18
This is pretty pro-forma for our purposes, as the only change
is a historical correction for pre-1976 DST laws in
Baja California. (Upstream made this release mostly to update
their leap-second data, which we don't use.) But with minor
releases coming up, we should be up-to-date.
Backpatch-through: 14
The code adding the WAL information included in a backup manifest is
cross-checked with the contents of the timeline history file of the end
timeline. A check based on the end timeline, when it fails, reported
the value of the start timeline in the error message. This error is
fixed to show the correct timeline number in the report.
This error report would be confusing for users if seen, because it would
provide an incorrect information, so backpatch all the way down.
Oversight in 0d8c9c1210.
Author: Man Zeng <zengman@halodbtech.com>
Discussion: https://postgr.es/m/tencent_0F2949C4594556F672CF4658@qq.com
Backpatch-through: 14
An assertion is used in this routine to check that a valid namespace OID
is given by the caller, but it was repeated twice: once at the top of
the routine and a second time multiple times in a switch/case. This
commit removes the assertions within the switch/case.
Thinko in commit 765cbfdc92.
Author: Man Zeng <zengman@halodbtech.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/tencent_40F8C1D82E2EE28065009AAA@qq.com
The function is part of the injection_points test module and only used
in tests. None of the current tests call it with a NULL argument, but
it is supposed to work.
Backpatch-through: 17
If auto-analyze kicks in at just the right moment, it can hold a
snapshot and prevent the VACUUM command in the test from removing the
deleted tuples. The test needs the tuples to be removed, otherwise no
half-dead page is generated. To fix, introduce a helper procedure to
wait for the removable cutoff to advance, like the one used in the
syscache-update-pruned test for similar purposes.
Thanks to Alexander Lakhin for reproducing and analyzing the test
failure, and Tom Lane for the report.
Discussion: https://www.postgresql.org/message-id/307198.1767408023@sss.pgh.pa.us
Some compilers, e.g. gcc with -Og or -O1, warn about the wait_event in
BufferLockAcquire() possibly being uninitialized. That can't actually happen,
as the switch() covers all legal lock mode values, but we still need to
silence the warning. We could add a default:, but we'd like to get a warning
if we were to get a new lock mode in the future. So just initialize
wait_event to 0.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/934395.1768518154@sss.pgh.pa.us
Issue fat-fingered in d756fa1019, noticed while doing more review of
the main patch set proposed. I have missed the fact that this can be
triggered by specifying an extended stats object that does not match
with the relation specified and already locked. Like the cases where
an object defined in input is missing, the code is changed to issue a
WARNING instead of a confusing cache lookup failure.
A regression test is added to cover this case.
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
Commit cbc127917e introduced tracking of unpruned relids to skip
processing of pruned partitions. PlannedStmt.unprunableRelids is
computed as the difference between PlannerGlobal.allRelids and
prunableRelids, but allRelids only contains RTE_RELATION entries.
This means non-relation RTEs (VALUES, subqueries, CTEs, etc.) are
never included in unprunableRelids, and consequently not in
es_unpruned_relids at runtime.
As a result, rowmarks attached to non-relation RTEs were incorrectly
skipped during executor initialization. This affects any DML statement
that has rowmarks on such RTEs, including MERGE with a VALUES or
subquery source, and UPDATE/DELETE with joins against subqueries or
CTEs. When a concurrent update triggers an EPQ recheck, the missing
rowmark leads to incorrect results.
Fix by restricting the es_unpruned_relids membership check to
RTE_RELATION entries only, since partition pruning only applies to
actual relations. Rowmarks for other RTE kinds are now always
processed.
Bug: #19355
Reported-by: Bihua Wang <wangbihua.cn@gmail.com>
Diagnosed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Diagnosed-by: Tender Wang <tndrwang@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/19355-57d7d52ea4980dc6@postgresql.org
Backpatch-through: 18
If a FATAL error occurs while holding a lock in a DSM segment (such
as a dshash lock) and the process is not in a transaction, a
segmentation fault can occur during process exit.
The problem sequence is:
1. Process acquires a lock in a DSM segment (e.g., via dshash)
2. FATAL error occurs outside transaction context
3. proc_exit() begins, calling before_shmem_exit callbacks
4. dsm_backend_shutdown() detaches all DSM segments
5. Later, on_shmem_exit callbacks run
6. ProcKill() calls LWLockReleaseAll()
7. Segfault: the lock being released is in unmapped memory
This only manifests outside transaction contexts because
AbortTransaction() calls LWLockReleaseAll() during transaction
abort, releasing locks before DSM cleanup. Background workers and
other non-transactional code paths are vulnerable.
Fix by calling LWLockReleaseAll() unconditionally at the start of
shmem_exit(), before any callbacks run. Releasing locks before
callbacks prevents the segfault - locks must be released before
dsm_backend_shutdown() detaches their memory. This is safe because
after an error, held locks are protecting potentially inconsistent
data anyway, and callbacks can acquire fresh locks if needed.
Also add a comment noting that LWLockReleaseAll() must be safe to
call before LWLock initialization (which it is, since
num_held_lwlocks will be 0), plus an Assert for the post-condition.
This fix aligns with the original design intent from commit
001a573a2, which noted that backends must clean up shared memory
state (including releasing lwlocks) before unmapping dynamic shared
memory segments.
Reported-by: Rahila Syed <rahilasyed90@gmail.com>
Author: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CAH2L28uSvyiosL+kaic9249jRVoQiQF6JOnaCitKFq=xiFzX3g@mail.gmail.com
Backpatch-through: 14
Commit 1e2fddfa33 changed OutputFsync() so that it always returns true.
However, pg_recvlogical.c still contained checks of its boolean return
value, which are now redundant.
This commit removes those checks and changes the type of return value of
OutputFsync() to void, simplifying the code.
Suggested-by: Yilin Zhang <jiezhilove@126.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwFeTymZQ7RLvMU6WuDGar8bUQCazg=VOfA-9GeBkg-FzA@mail.gmail.com
This commit adds a test to verify that data already received and flushed by
pg_recvlogical is not streamed again even after the connection is lost,
reestablished, and logical replication is restarted.
Author: Mircea Cadariu <cadariu.mircea@gmail.com>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwFeTymZQ7RLvMU6WuDGar8bUQCazg=VOfA-9GeBkg-FzA@mail.gmail.com
wait_for_file() waits for the contents of a specified file, starting at an
optional offset, to match a given regular expression. If no offset is
provided, the entire file is checked. The function times out after
$PostgreSQL::Test::Utils::timeout_default seconds. It returns the total
file length on success.
The existing wait_for_log() function contains almost identical logic, but
is limited to reading the cluster's log file. This commit also refactors
wait_for_log() to call wait_for_file() instead, avoiding code duplication.
This helper will be used by upcoming changes.
Suggested-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwFeTymZQ7RLvMU6WuDGar8bUQCazg=VOfA-9GeBkg-FzA@mail.gmail.com
Previously, when pg_recvlogical lost connection, reconnected, and restarted
replication, data that had already been flushed could be streamed again.
This happened because the replication start position used when restarting
replication was taken from the last standby status message, which could be
older than the position of the last flushed data. As a result, some flushed
data newer than the replication start position could exist and be re-sent.
This commit fixes the issue by ensuring all written data is flushed to disk
before restarting replication, and by using the last flushed position as
the replication start point. This prevents already flushed data from being
re-sent.
Additionally, previously when the --no-loop option was used, pg_recvlogical
could exit without flushing written data, potentially losing data. To fix
this issue, this commit also ensures all data is flushed to disk before
exiting due to --no-loop.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Reviewed-by: Yilin Zhang <jiezhilove@126.com>
Reviewed-by: Dewei Dai <daidewei1970@163.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwFeTymZQ7RLvMU6WuDGar8bUQCazg=VOfA-9GeBkg-FzA@mail.gmail.com
The test introduced in 639352d904 has added a direct pg_ctl command to
start a node, a method that is incompatible with the teardown() routine
used at the end of the test as the PID saved in the Cluster object would
prevent the node to be shut down. This can ultimately prevent the test
to perform its cleanup, failing on timeout.
Like pg_ctl's 001_start_stop or ssl_passphrase_callback's 001_testfunc,
this commit changes the test so a direct pg_ctl command is used to stop
the rogue node. That should be hopefully enough to cool down the
buildfarm.
Per report from buildfarm member fairywren, which is the only animal
that is showing this issue.
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/TY7PR01MB1455452AE9053DD2B77B74FEAF58CA@TY7PR01MB14554.jpnprd01.prod.outlook.com
This function is able to clear the data associated to an extended
statistics object, making things so as the object looks as
newly-created.
The caller of this function needs the following arguments for the
extended stats to clear:
- The name of the relation.
- The schema name of the relation.
- The name of the extended stats object.
- The schema name of the extended stats object.
- If the stats are inherited or not.
The first two parameters are especially important to ensure a consistent
lookup and ACL checks for the relation on which is based the extended
stats object that will be cleared, relying first on a RangeVar lookup
where permissions are checked without locking a relation, critical to
prevent denial-of-service attacks when using this kind of function (see
also 688dc6299a for a similar concern). The third to fifth arguments
give a way to target the extended stats records to clear.
This has been extracted from a larger patch by the same author, for a
piece which is again useful on its own. I have rewritten large portions
of it. The tests have been extended while discussing this piece,
resulting on what this commit includes. The intention behind this
feature is to add support for the import of extended statistics across
dumps and upgrades, this change building one piece that we will be able
to rely on for the rest of the changes.
Bump catalog version.
Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
This reverts commit f8d7f29b3e, plus parts of
subsequent commits fixing a typo in a parameter name.
Support for disowned lwlocks was added for the benefit of AIO, to be able to
have content locks "owned" by the AIO subsystem. But as of commit fcb9c977aa,
content locks do not use lwlocks anymore.
It does not seem particularly likely that we need this facility outside of the
AIO use-case, therefore remove the now unused functions.
I did choose to keep the comment added in the aforementioned commit about
lock->owner intentionally being left pointing to the last owner.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/cj5mcjdpucvw4a54hehslr3ctukavrbnxltvuzzhqnimvpju5e@cy3g3mnsefwz
As of commit fcb9c977aa, ForEachLWLockHeldByMe(), introduced in f4ece891fc,
is not used anymore, as content locks are now implemented in bufmgr.c. It
doesn't seem that likely that a new user of the functionality will appear all
that soon, making removal of the function seem like the most sensible path. It
can easily be added back if necessary.
Discussion: https://postgr.es/m/lneuyxqxamqoayd2ntau3lqjblzdckw6tjgeu4574ezwh4tzlg%40noioxkquezdw
Until now buffer content locks were implemented using lwlocks. That has the
obvious advantage of not needing a separate efficient implementation of
locks. However, the time for a dedicated buffer content lock implementation
has come:
1) Hint bits are currently set while holding only a share lock. This leads to
having to copy pages while they are being written out if checksums are
enabled, which is not cheap. We would like to add AIO writes, however once
many buffers can be written out at the same time, it gets a lot more
expensive to copy them, particularly because that copy needs to reside in
shared buffers (for worker mode to have access to the buffer).
In addition, modifying buffers while they are being written out can cause
issues with unbuffered/direct-IO, as some filesystems (like btrfs) do not
like that, due to filesystem internal checksums getting corrupted.
The solution to this is to require a new share-exclusive lock-level to set
hint bits and to write out buffers, making those operations mutually
exclusive. We could introduce such a lock-level into the generic lwlock
implementation, however it does not look like there would be other users,
and it does add some overhead into important code paths.
2) For AIO writes we need to be able to race-freely check whether a buffer is
undergoing IO and whether an exclusive lock on the page can be acquired. That
is rather hard to do efficiently when the buffer state and the lock state
are separate atomic variables. This is a major hindrance to allowing writes
to be done asynchronously.
3) Buffer locks are by far the most frequently taken locks. Optimizing them
specifically for their use case is worth the effort. E.g. by merging
content locks into buffer locks we will be able to release a buffer lock
and pin in one atomic operation.
4) There are more complicated optimizations, like long-lived "super pinned &
locked" pages, that cannot realistically be implemented with the generic
lwlock implementation.
Therefore implement content locks inside bufmgr.c. The lockstate is stored as
part of BufferDesc.state. The implementation of buffer content locks is fairly
similar to lwlocks, with a few important differences:
1) An additional lock-level share-exclusive has been added. This lock-level
conflicts with exclusive locks and itself, but not share locks.
2) Error recovery for content locks is implemented as part of the already
existing private-refcount tracking mechanism in combination with resowners,
instead of a bespoke mechanism as the case for lwlocks. This means we do
not need to add dedicated error-recovery code paths to release all content
locks (like done with LWLockReleaseAll() for lwlocks).
3) The lock state is embedded in BufferDesc.state instead of having its own
struct.
4) The wakeup logic is a tad more complicated due to needing to support the
additional lock-level
This commit unfortunately introduces some code that is very similar to the
code in lwlock.c, however the code is not equivalent enough to easily merge
it. The future wins that this commit makes possible seem worth the cost.
As of this commit nothing uses the new share-exclusive lock mode. It will be
used in a future commit. It seemed too complicated to introduce the lock-level
in a separate commit.
It's worth calling out one wart in this commit: Despite content locks not
being lwlocks anymore, they continue to use PGPROC->lw* - that seemed better
than duplicating the relevant infrastructure.
Another thing worth pointing out is that, after this change, content locks are
not reported as LWLock wait events anymore, but as new wait events in the
"Buffer" wait event class (see also 6c5c393b74). The old BufferContent lwlock
tranche has been removed.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
This is motivated by wanting to merge buffer content locks into
BufferDesc.state in a future commit, rather than having a separate lwlock (see
commit c75ebc657f for more details). As this change is rather mechanical, it
seems to make sense to split it out into a separate commit, for easier review.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
This patch reworks LISTEN/NOTIFY to avoid waking backends that have
no need to process the notification messages we just sent.
The primary change is to create a shared hash table that tracks
which processes are listening to which channels (where a "channel" is
defined by a database OID and channel name). This allows a notifying
process to accurately determine which listeners are interested,
replacing the previous weak approximation that listeners in other
databases couldn't be interested.
Secondly, if a listener is known not to be interested and is
currently stopped at the old queue head, we avoid waking it at all
and just directly advance its queue pointer past the notifications
we inserted.
These changes permit very significant improvements (integer multiples)
in NOTIFY throughput, as well as a noticeable reduction in latency,
when there are many listeners but only a few are interested in any
specific message. There is no improvement for the simplest case where
every listener reads every message, but any loss seems below the noise
level.
Author: Joel Jacobson <joel@compiler.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/6899c044-4a82-49be-8117-e6f669765f7e@app.fastmail.com
On restart, a replica can fail with an error like 'unexpected data
beyond EOF in block 200 of relation T/D/R'. These are the steps to
reproduce it:
- A relation has a size of 400 blocks.
- Blocks 201 to 400 are empty.
- Block 200 has two rows.
- Blocks 100 to 199 are empty.
- A restartpoint is done
- Vacuum truncates the relation to 200 blocks
- A FPW deletes a row in block 200
- A checkpoint is done
- A FPW deletes the last row in block 200
- Vacuum truncates the relation to 100 blocks
- The replica restarts
When the replica restarts:
- The relation on disk starts at 100 blocks, because all the
truncations were applied before restart.
- The first truncate to 200 blocks is replayed. It silently fails, but
it will still (incorrectly!) update the cache size to 200 blocks
- The first FPW on block 200 is applied. XLogReadBufferForRead relies
on the cached size and incorrectly assumes that the page already
exists in the file, and thus won't extend the relation.
- The online checkpoint record is replayed, calling smgrdestroyall
which causes the cached size to be discarded
- The second FPW on block 200 is applied. This time, the detected size
is 100 blocks, an extend is attempted. However, the block 200 is
already present in the buffer cache due to the first FPW. This
triggers the 'unexpected data beyond EOF'.
To fix, update the cached size in SmgrRelation with the current size
rather than the requested new size, when the requested new size is
greater.
Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Discussion: https://www.postgresql.org/message-id/CAO6_Xqrv-snNJNhbj1KjQmWiWHX3nYGDgAc=vxaZP3qc4g1Siw@mail.gmail.com
Backpatch-through: 14
Liujinyang reported the one in binaryheap.c, I then found and analyzed
the rest.
For future patches, we require git archaelogical analysis before we
accept patches of this nature.
Co-authored-by: liujinyang <21043272@qq.com>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/tencent_6B302BFCAF6F010E00AB5C2C0ECB7AA3F205@qq.com
We called io_uring_cqe_seen(..., cqe) before reading cqe->res. That allows the
completion to be reused, which in turn could lead to cqe->res being
overwritten. The window for that is very narrow and the likelihood of it
happening is very low, as we should never actually utilize all CQEs, but the
consequences would be bad.
This bug was reported to me privately.
Backpatch-through: 18
Discussion: https://postgr.es/m/bwo3e5lj2dgi2wzq4yvbyzu7nmwueczvvzioqsqo6azu6lm5oy@pbx75g2ach3p
When an autovacuum worker exits, the launcher needs to be notified
with SIGUSR2, so that it can rebalance and possibly launch a new
worker. The launcher must be notified only after the worker has
finished ProcKill(), so that the worker slot is available for a new
worker. Before this commit, the autovacuum worker was responsible for
that, which required a slightly complicated dance to pass the
launcher's PID from FreeWorkerInfo() to ProcKill() in a global
variable.
Simplify that by moving the responsibility of the signaling to the
postmaster. The postmaster was already doing it when it failed to fork
a worker process, so it seems logical to make it responsible for
notifying the launcher on worker exit too. That's also how the
notification on background worker exit is done.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: li carol <carol.li2025@outlook.com>
Discussion: https://www.postgresql.org/message-id/a5e27d25-c7e7-45d5-9bac-a17c8f462def@iki.fi
If a multixid with zero offset is left behind after a crash, and that
multixid later becomes the oldest multixid, truncation might try to
look up its offset and read the zero value. In the worst case, we
might incorrectly use the zero offset to truncate valid SLRU segments
that are still needed. I'm not sure if that can happen in practice, or
if there are some other lower-level safeguards or incidental reasons
that prevent the caller from passing an unwritten multixid as the
oldest multi. But better safe than sorry, so let's add an explicit
check for it.
In stable branches, we should perhaps do the same check for
'oldestOffset', i.e. the offset of the old oldest multixid (in master,
'oldestOffset' is gone). But if the old oldest multixid has an invalid
offset, the damage has been done already, and we would never advance
past that point. It's not clear what we should do in that case. The
check that this commit adds will prevent such an multixid with invalid
offset from becoming the oldest multixid in the first place, which
seems enough for now.
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: Discussion: https://www.postgresql.org/message-id/000301b2-5b81-4938-bdac-90f6eb660843@iki.fi
Backpatch-through: 14
With 64-bit multixact offsets, PerformMembersTruncation() doesn't need
the starting offset anymore. The 'oldestOffset' value that
TruncateMultiXact() calculates is no longer used for anything. Remove
it, and the code to calculate it.
'oldestOffset' was included in the WAL record as 'startTruncMemb',
which sounds nice if you e.g. look at the WAL with pg_waldump, but it
was also confusing because we didn't actually use the value for
determining what to truncate. Replaying the WAL would remove all
segments older than 'endTruncMemb', regardless of
'startTruncMemb'. The 'startTruncOff' stored in the WAL record was
similarly unnecessary even before 64-bit multixid offsets, it was
stored just for the sake of symmetry with 'startTruncMemb'. Remove
both from the WAL record, and rename the remaining 'endTruncOff' to
'oldestMulti' and 'endTruncMemb' to 'oldestOffset', for consistency
with the variable names used for them in other places.
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://www.postgresql.org/message-id/000301b2-5b81-4938-bdac-90f6eb660843@iki.fi
The initialization of PL/Python (the Python interpreter, the global
state, the plpy module) was arranged confusingly across different
functions with unclear and confusing boundaries. For example,
PLy_init_interp() said "Initialize the Python interpreter ..." but it
didn't actually do this, and PLy_init_plpy() said "initialize plpy
module" but it didn't do that either. After this change, all the
global initialization is called directly from _PG_init(), and the plpy
module initialization is all called from its registered initialization
function PyInit_plpy().
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: li carol <carol.li2025@outlook.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/f31333f1-fbb7-4098-b209-bf2d71fbd4f3%40eisentraut.org
This seems to have existed like this since Python 3 support was
added (commit dd4cd55c15), but it's unclear what this second call is
supposed to accomplish.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: li carol <carol.li2025@outlook.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/f31333f1-fbb7-4098-b209-bf2d71fbd4f3%40eisentraut.org
The comments "PyModule_AddObject does not add a refcount to the
object, for some odd reason" seem distracting. Arguably, this
behavior is expected, not odd. Also, the additional references
created by the existing code are apparently not necessary. But we
should clean up the reference in the error case, as suggested by the
Python documentation.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: li carol <carol.li2025@outlook.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/f31333f1-fbb7-4098-b209-bf2d71fbd4f3%40eisentraut.org
This code has been commented out since the first commit of plpython.
It doesn't seem worth keeping.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: li carol <carol.li2025@outlook.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/f31333f1-fbb7-4098-b209-bf2d71fbd4f3%40eisentraut.org
These routines are useful to perform some basic validation checks on
each object structure, working currently on attribute numbers for
non-expression and expression attnums. These checks could be extended
in the future.
Note that this code is not used yet in the tree, and that these
functions will become handy for an upcoming patch for the import of
extended statistics data. However, they are worth their own independent
change as they are actually useful by themselves, with at least the
extension code argument in mind (or perhaps I am just feeling more
pedantic today).
Extracted from a larger patch by the same author, with many adjustments
and fixes by me.
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
In CreateWorkExprContext(), maxBlockSize is initialized to
ALLOCSET_DEFAULT_MAXSIZE, and it then immediately reassigned,
thus the initialization is a redundant.
Author: Andreas Karlsson <andreas@proxel.se>
Reported-by: Chao Li <lic@highgo.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/83a14f3c-f347-4769-9c01-30030b31f1eb@gmail.com
This reverts commit f0f2c0c1ae.
The original problem that led to the use of pg_restrict was that MSVC
couldn't handle plain restrict, and defining it to something else
would conflict with its __declspec(restrict) that is used in system
header files. In C11 mode, this is no longer a problem, as MSVC
handles plain restrict. This led to the commit to replace pg_restrict
with restrict. But this did not take C++ into account. Standard C++
does not have restrict, so we defined it as something else (for
example, MSVC supports __restrict). But this then again conflicts
with __declspec(restrict) in system header files. So we have to
revert this attempt. The comments are updated to clarify that the
reason for this is now C++ only.
Reported-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/CAGECzQRoD7chJP1-dneSrhxUJv%2BBRcigoGOO4UwGzaShLot2Yw%40mail.gmail.com
Previously, the Python Limited API was disabled on MSVC due to build
failures caused by Meson not knowing to link against python3.lib
instead of python3XX.lib when using the Limited API.
This commit works around the Meson limitation by explicitly finding
and linking against python3.lib on MSVC, and removes the preprocessor
guard that was disabling the Limited API on MSVC in plpython.h.
This requires python3.lib to be present in the Python installation,
which is included when Python is installed.
Author: Bryan Green <dbryan.green@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/ee410de1-1e0b-4770-b125-eeefd4726a24%40eisentraut.org
This patch switches some code paths to use GetDatum() macros more in
line with the data types of the variables they manipulate. This set of
changes does not fix a problem, but it is always nice to be more
consistent across the board.
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Roman Khapov <rkhapov@yandex-team.ru>
Reviewed-by: Yuan Li <carol.li2025@outlook.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Man Zeng <zengman@halodbtech.com>
Discussion: https://postgr.es/m/CALdSSPidtC7j3MwhkqRj0K2hyp36ztnnjSt6qzGxQtiePR1dzw@mail.gmail.com
Commit 5b148706c5 exposed functionality that allows multiple processes to
use the same replication origin, enabling non-builtin logical replication
solutions to implement parallel apply for large transactions.
With this functionality, if two backends acquire the same replication
origin and one of them resets it first, the acquired_by flag is cleared
without acknowledging that another backend is still actively using the
origin. This can lead to the origin being unintentionally dropped. If the
shared memory for that dropped origin is later reused for a newly created
origin, the remaining backend that still holds a pointer to the old memory
may inadvertently advance the LSN of a completely different origin,
causing unpredictable behavior.
Although the underlying issue predates commit 5b148706c5, it did not
surface earlier because the internal parallel apply worker mechanism
correctly coordinated origin resets and drops.
This commit resolves the problem by introducing a reference counter for
replication origins. The reference count increases when a backend sets the
origin and decreases when it resets it. Additionally, the backend that
first acquires the origin will not release it until all other backends
using the origin have released it as well.
The patch also prevents dropping a replication origin when acquired_by is
zero but the reference counter is nonzero, covering the scenario where the
first session exits without properly releasing the origin.
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/TY4PR01MB169077EE72ABE9E55BAF162D494B5A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Discussion: https://postgr.es/m/CAMPB6wfe4zLjJL8jiZV5kjjpwBM2=rTRme0UCL7Ra4L8MTVdOg@mail.gmail.com
The test 002_save_fullpage.pl, checking --save-fullpage fails with
wal_consistency_checking enabled, due to the fact that the block saved
in the file has the same LSN as the LSN used in the file name. The test
required that the block LSN is stritly lower than file LSN. This commit
relaxes the check a bit, by allowing the LSNs to match.
While on it, the test name is reworded to include some information about
the file and block LSNs, which is useful for debugging.
Author: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/4226AED7-E38F-419B-AAED-9BC853FB55DE@yandex-team.ru
Backpatch-through: 16
This is in preparation to widening the buffer state to 64 bits, which in turn
is preparation for implementing content locks in bufmgr. This commit aims to
make the subsequent commits a bit easier to review, by separating out
reformatting etc from the actual changes.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/4csodkvvfbfloxxjlkgsnl2lgfv2mtzdl7phqzd4jxjadxm4o5@usw7feyb5bzf
Previously, a flag was set to indicate that a lock release should wake up
waiters. Since waking waiters is the default behavior in the majority of
cases, this logic has been inverted. The new LW_FLAG_WAKE_IN_PROGRESS flag is
now set iff wakeups are explicitly inhibited.
The motivation for this change is that in an upcoming commit, content locks
will be implemented independently of lwlocks, with the lock state stored as
part of BufferDesc.state. As all of a buffer's flags are cleared when the
buffer is invalidated, without this change we would have to re-add the
RELEASE_OK flag after clearing the flags; otherwise, the next lock release
would not wake waiters.
It seems good to keep the implementation of lwlocks and buffer content locks
as similar as reasonably possible.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/4csodkvvfbfloxxjlkgsnl2lgfv2mtzdl7phqzd4jxjadxm4o5@usw7feyb5bzf
RangeTblEntry.groupexprs was marked with the node attribute
query_jumble_ignore, causing a list of GROUP BY expressions to be
ignored during the query jumbling. For example, these two queries could
be grouped together within the same query ID:
SELECT count(*) FROM t GROUP BY a;
SELECT count(*) FROM t GROUP BY b;
However, as such queries use different GROUP BY clauses, they should be
split across multiple entries.
This fixes an oversight in 247dea89f7, that has introduced an RTE for
GROUP BY clauses. Query IDs are documented as being stable across minor
releases, but as this is a regression new to v18 and that we are still
early in its support cycle, a backpatch is exceptionally done as this
has broken a behavior that exists since query jumbling is supported in
core, since its introduction in pg_stat_statements.
The tests of pg_stat_statements are expanded to cover this area, with
patterns involving GROUP BY and GROUPING clauses.
Author: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxEy2W+tCqC7XuJ94r3ivWsM=onKJp94kRFx3hoARjBeFQ@mail.gmail.com
Backpatch-through: 18
Commit 9f8377f7a introduced the DEFAULT option for file_fdw but did not
update the documentation. This commit adds the missing description of
the DEFAULT option to the file_fdw documentation.
Backpatch to v16, where the DEFAULT option was introduced.
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAOzEurT_PE7QEh5xAdb7Cja84Rur5qPv2Fzt3Tuqi=NU0WJsbg@mail.gmail.com
Backpatch-through: 16
The test script added by commit e1c971945d failed to handle the case
of cache-clobbering builds (CLOBBER_CACHE_ALWAYS and
CATCACHE_FORCE_RELEASE) properly -- it would only exit a loop on
timeout, which is slow, and unfortunate because I (Álvaro) increased the
timeout for that loop to the complete default TAP test timeout, causing
the buildfarm to report the whole test run as a timeout failure. We can
be much quicker: exit the loop as soon as the backend is seen as waiting
on the injection point.
In this commit we still reduce the timeout (of that loop and a nearby
one just to be safe) to half of the default.
I (Álvaro) had also changed Mihail's "sleep(1)" to "sleep(0.1)", which
apparently turns a 1s sleep into a 0s sleep, because Perl -- probably
making this a busy loop. Use Time::HiRes::usleep instead, like we do in
other tests.
Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CADzfLwWOVyJygX6BFuyuhTKkJ7uw2e8OcVCDnf6iqnOFhMPE%2BA%40mail.gmail.com
There are two reasons for doing so:
1) It is generally faster to perform checks in a batched fashion and making
sequential scans faster is nice.
2) We would like to stop setting hint bits while pages are being written
out. The necessary locking becomes visible for page mode scans, if done for
every tuple. With batching, the overhead can be amortized to only happen
once per page.
There are substantial further optimization opportunities along these
lines:
- Right now HeapTupleSatisfiesMVCCBatch() simply uses the single-tuple
HeapTupleSatisfiesMVCC(), relying on the compiler to inline it. We could
instead write an explicitly optimized version that avoids repeated xid
tests.
- Introduce batched version of the serializability test
- Introduce batched version of HeapTupleSatisfiesVacuum
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/6rgb2nvhyvnszz4ul3wfzlf5rheb2kkwrglthnna7qhe24onwr@vw27225tkyar
To be able to guarantee that we can set the hint bit, acquire an exclusive
lock on the old buffer. This is required as a future commit will only allow
hint bits to be set with a new lock level, which is acquired as-needed in a
non-blocking fashion.
We need the hint bits, set in heapam_relation_copy_for_cluster() ->
HeapTupleSatisfiesVacuum(), to be set, as otherwise reform_and_rewrite_tuple()
-> rewrite_heap_tuple() will get confused. Specifically, rewrite_heap_tuple()
checks for HEAP_XMAX_INVALID in the old tuple to determine whether to check
the old-to-new mapping hash table.
It'd be better if we somehow could avoid setting hint bits on the old page. A
common reason to use VACUUM FULL is very bloated tables - rewriting most of
the old table during VACUUM FULL doesn't exactly help.
Reviewed-by: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/4wggb7purufpto6x35fd2kwhasehnzfdy3zdcu47qryubs2hdz@fa5kannykekr
Before this commit fsm_vacuum_page() modified the page without any lock on the
page. Historically that was kind of ok, as we didn't rely on the freespace to
really stay consistent and we did not have checksums. But these days pages are
checksummed and there are ways for FSM pages to be included in WAL records,
even if the FSM itself is still not WAL logged. If a FSM page ever were
modified while a WAL record referenced that page, we'd be in trouble, as the
WAL CRC could end up getting corrupted.
The reason to address this right now is a series of patches with the goal to
only allow modifications of pages with an appropriate lock level. Obviously
not having any lock is not appropriate :)
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/4wggb7purufpto6x35fd2kwhasehnzfdy3zdcu47qryubs2hdz@fa5kannykekr
Discussion: https://postgr.es/m/e6a8f734-2198-4958-a028-aba863d4a204@iki.fi
Doing this meant that those two headers, which are supposed to be
internal to their corresponding index AMs, were being included pretty
much universally, because tuplesort.h is included by execnodes.h which
is very widely used. Stop that, and fix fallout.
We also change indexing.h to no longer include execnodes.h (tuptable.h
is sufficient), and relscan.h to no longer include buf.h (pointless
since c2fe139c20).
Author: Mario González <gonzalemario@gmail.com>
Discussion: https://postgr.es/m/CAFsReFUcBFup=Ohv_xd7SNQ=e73TXi8YNEkTsFEE2BW7jS1noQ@mail.gmail.com
Some structs and enums related to parallel query instrumentation had
organically grown scattered across various files, and were causing
header pollution especially through execnodes.h. Create a single file
where they can live together.
This only moves the structs to the new file; cleaning up the pollution
by removing no-longer-necessary cross-header inclusion will be done in
future commits.
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Co-authored-by: Mario González <gonzalemario@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/202510051642.wwmn4mj77wch@alvherre.pgsql
Discussion: https://postgr.es/m/CAFsReFUr4KrQ60z+ck9cRM4WuUw1TCghN7EFwvV0KvuncTRc2w@mail.gmail.com
In many cases, the cast would silently drop a const qualifier. To
fix, drop the unnecessary cast and let the compiler check the types
and qualifiers. Add const to read-only local variables, preserving
the const qualifiers from the function signatures.
Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Co-authored-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/aUQHy/MmWq7c97wK%40ip-10-97-1-34.eu-west-3.compute.internal
Functions that dump table data receive their parameters through const
void * but were casting away const. Add const qualifiers to functions
that only read the table information.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aUQHy/MmWq7c97wK%40ip-10-97-1-34.eu-west-3.compute.internal
The dmetaphone() SQL function internally upper-cases the argument
string. It did this using the toupper() function. That way, it has a
dependency on the global LC_CTYPE locale setting, which we want to get
rid of.
The "double metaphone" algorithm specifically supports the "C with
cedilla" letter, so just using ASCII case conversion wouldn't work.
To fix that, use the passed-in collation and use the str_toupper()
function, which has full awareness of collations and collation
providers.
Note that this does not change the fact that this function only works
correctly with single-byte encodings. The change to str_toupper()
makes the case conversion multibyte-enabled, but the rest of the
function is still not ready.
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://www.postgresql.org/message-id/108e07a2-0632-4f00-984d-fe0e0d0ec726%40eisentraut.org
The parameter name used for the option value was named "number-of-jobs",
which was inconsistent with what all the other tools with an option
called --jobs use. This commit updates the parameter name to "njobs".
Author: Tatsuro Yamada <yamatattsu@gmail.com>
Discussion: https://postgr.es/m/CAOKkKFvHqA6Tny0RKkezWVfVV91nPJyj4OGtMi3C1RznDVXqrg@mail.gmail.com
Previously the instrumentation logic always converted to seconds, only for
many of the callers to do unnecessary division to get to milliseconds. As an
upcoming refactoring will split the Instrumentation struct, utilize instrtime
always to keep things simpler. It's also a bit faster to not have to first
convert to a double in functions like InstrEndLoop(), InstrAggNode().
Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAP53PkzZ3UotnRrrnXWAv=F4avRq9MQ8zU+bxoN9tpovEu6fGQ@mail.gmail.com
Reword publish_via_partition_root's opening paragraph. Describe its
behavior more clearly, and directly state that its default is false.
Per complaint by Peter Smith; final text of the patch made in
collaboration with Chao Li.
Author: Chao Li <li.evan.chao@gmail.com>
Author: Peter Smith <peter.b.smith@fujitsu.com>
Reported-by: Peter Smith <peter.b.smith@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAHut%2BPu7SpK%2BctOYoqYR3V4w5LKc9sCs6c_qotk9uTQJQ4zp6g%40mail.gmail.com
Backpatch-through: 14
This formerly said "unique constraint must ...", which was accurate
enough when it only applied to UNIQUE and PRIMARY KEY constraints.
However, now we use it for exclusion constraints too, and in that
case it's a tad confusing. Do what we already did in the errdetail
message: print the constraint_type, so that it looks like "UNIQUE
constraint ...", "EXCLUDE constraint ...", etc.
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxH6VhAf65Vghg4T2q315gY=Rt4BUfMyunkfRj0n2S9n-g@mail.gmail.com
Since commit bd15b7db48, pg_dump uses pg_get_sequence_data() (née
pg_sequence_read_tuple()) to gather all sequence data in a single
query as opposed to a query per sequence. Two related bugs have
been identified:
* If the user lacks appropriate privileges on the sequence, pg_dump
generates a setval() command with garbage values instead of
failing as expected.
* pg_dump can fail due to a concurrently dropped sequence, even if
the dropped sequence's data isn't part of the dump.
This commit fixes the above issues by 1) teaching
pg_get_sequence_data() to return nulls instead of erroring for a
missing sequence and 2) teaching pg_dump to fail if it tries to
dump the data of a sequence for which pg_get_sequence_data()
returned nulls. Note that pg_dump may still fail due to a
concurrently dropped sequence, but it should now only do so when
the sequence data is part of the dump. This matches the behavior
before commit bd15b7db48.
Bug: #19365
Reported-by: Paveł Tyślacki <pavel.tyslacki@gmail.com>
Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19365-6245240d8b926327%40postgresql.org
Discussion: https://postgr.es/m/2885944.1767029161%40sss.pgh.pa.us
Backpatch-through: 18
Commit 63d1b1cf7f replaced a direct nodeTag() comparison with the IsA()
macro in copy.c, but a similar direct comparison remained in define.c.
This commit replaces that comparison with IsA() for consistency.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwGjWGS89_sTx=sbPm0FQemyQQrfTKm=taUhAJFV5k-9cw@mail.gmail.com
This is important for Postgres extensions that are written in C++,
such as pg_duckdb, which uses PGXS as the build system currently. In
the autotools build, C++ is not coupled to LLVM. If the autotools
build is configured without --with-llvm, the C++ compiler and the
various flags get persisted into the Makefile.global.
Author: Tristan Partin <tristan@partin.io>
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/flat/D98JHQF7H2A8.VSE3I4CJBTAB%40partin.io
When creating a partition for a RANGE partitioned table, the reporting
of errors relating to converting the specified range values into
constant values for the partition key's type could display the name of a
previous partition key column when an earlier range was specified as
MINVALUE or MAXVALUE.
This was caused by the code not correctly incrementing the index that
tracks which partition key the foreach loop was working on after
processing MINVALUE/MAXVALUE ranges.
Fix by using foreach_current_index() to ensure the index variable is
always set to the List element being worked on.
Author: myzhen <zhenmingyang@yeah.net>
Reviewed-by: zhibin wang <killerwzb@gmail.com>
Discussion: https://postgr.es/m/273cab52.978.19b96fc75e7.Coremail.zhenmingyang@yeah.net
Backpatch-through: 14
In the wake of the previous commit, this script will fail
if executed via CREATE EXTENSION, so it's useless. Remove it,
but keep the delta scripts, allowing old (even very old)
versions of the btree_gist SQL objects to be upgraded to 1.9
via ALTER EXTENSION UPDATE after a pg_upgrade.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/2483812.1754072263@sss.pgh.pa.us
This patch completes the transition to making inet_ops be default for
inet/cidr columns, rather than btree_gist's opclasses. Once we do
that, though, pg_upgrade has a big problem. A dump from an older
version will see btree_gist's opclasses as being default, so it will
not mention the opclass explicitly in CREATE INDEX commands, which
would cause the restore to create the indexes using inet_ops. Since
that's not compatible with what's actually in the files, havoc would
ensue.
This isn't readily fixable, because the CREATE INDEX command strings
are built by the older server's pg_get_indexdef() function; pg_dump
hasn't nearly enough knowledge to modify those strings successfully.
Even if we cared to put in the work to make that happen in pg_dump,
it would be counterproductive because the end goal here is to get
people off of these opclasses. Allowing such indexes to persist
through pg_upgrade wouldn't advance that goal.
Therefore, this patch just adds code to pg_upgrade to detect indexes
that would be problematic and refuse to upgrade.
There's another issue too: even without any indexes to worry about,
pg_dump in binary-upgrade mode will reproduce the "CREATE OPERATOR
CLASS ... DEFAULT" commands for btree_gist's opclasses, and those
will fail because now we have a built-in opclass that provides a
conflicting default. We could ask users to drop the btree_gist
extension altogether before upgrading, but that would carry very
severe penalties. It would affect perfectly-valid indexes for other
data types, and it would drop operators that might be relied on in
views or other database objects. Instead, put a hack in DefineOpClass
to ignore the DEFAULT clauses for these opclasses when in
binary-upgrade mode. This will result in installing a version of
btree_gist that isn't quite the version it claims to be, but that can
be fixed by issuing ALTER EXTENSION UPDATE afterwards.
Since we don't apply that hack when not in binary-upgrade mode,
it is now impossible to install any version of btree_gist
less than 1.9 via CREATE EXTENSION.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/2483812.1754072263@sss.pgh.pa.us
btree_gist's gist_inet_ops and gist_cidr_ops opclasses are
fundamentally broken: they rely on an approximate representation of
the inet values and hence sometimes miss rows they should return.
We want to eventually get rid of them altogether, but as the first
step on that journey, we should mark them not-opcdefault.
To do that, roll up the preceding deltas since 1.2 into a new base
script btree_gist--1.9.sql. This will allow installing 1.9 without
going through a transient situation where gist_inet_ops and
gist_cidr_ops are marked as opcdefault; trying to create them that
way will fail if there's already a matching default opclass in the
core system. Additionally provide btree_gist--1.8--1.9.sql, so
that a database that's been pg_upgraded from an older version can
be migrated to 1.9.
I noted along the way that commit 57e3c5160 had missed marking the
gist_bool_ops support functions as PARALLEL SAFE. While that probably
has little harmful effect (since AFAIK we don't check that when
calling index support functions), this seems like a good time to make
things consistent.
Readers will also note that I removed the former habit of installing
some opclass operators/functions with ALTER OPERATOR FAMILY, instead
just rolling them all into the CREATE OPERATOR CLASS steps. The
comment in btree_gist--1.2.sql that it's necessary to use ALTER for
pg_upgrade reproducibility has been obsolete since we invented the
amadjustmembers infrastructure. Nowadays, gistadjustmembers will
force all operators and non-required support functions to have "soft"
opfamily dependencies, regardless of whether they are installed by
CREATE or ALTER.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/2483812.1754072263@sss.pgh.pa.us
The only user-visible change is the fix in the "malformed
pg_dependencies" error detail. That one is new in commit e1405aa5e3,
so no backpatching required.
When repurposing a standby to a logical replica, pg_createsubscriber
uses for the new replica a set of configuration parameters saved into
postgresql.auto.conf, to force recovery patterns when the physical
replica is promoted.
While not wrong in practice, this approach can cause issues when forcing
again recovery on a logical replica or its base backup as the recovery
parameters are not reset on the target server once pg_createsubscriber
is done with the node.
This commit aims at improving the situation, by changing the way
recovery parameters are saved on the target node. Instead of writing
all the configuration to postgresql.auto.conf, this file now uses an
include_if_exists, that points to a pg_createsubscriber.conf. This new
file contains all the recovery configuration, and is renamed to
pg_createsubscriber.conf.disabled when pg_createsubscriber exits. This
approach resets the recovery parameters, and offers the benefit to keep
a trace of the setup used when the target node got promoted, for
debugging purposes. If pg_createsubscriber.conf cannot be renamed
(unlikely scenario), a warning is issued to inform users that a manual
intervention may be required to reset this configuration.
This commit includes a test case to demonstrate the problematic case: a
standby node created from a base backup of what was the target node of
pg_createsubscriber does not get confused when started. If removing
this new logic, the test fails with the standby not able to start due
to an incorrect recovery target setup, where the startup process fails
quickly with a FATAL.
I have provided the design idea for the patch, that Alyona has written
(with some code adjustments from me). This could be considered as a
bug, but after discussion this is put into the bucket for improvements.
Redesigning pg_createsubscriber would not be acceptable in the stable
branches anyway.
Author: Alyona Vinter <dlaaren8@gmail.com>
Reviewed-by: Ilyasov Ian <ianilyasov@outlook.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andrey Rudometov <unlimitedhikari@gmail.com>
Discussion: https://postgr.es/m/CAGWv16K6L6Pzm99i1KiXLjFWx2bUS3DVsR6yV87-YR9QO7xb3A@mail.gmail.com
This commit adds tab completion support for the keywords "pstdin" and
"pstdout" in the \copy command. "pstdin" is now suggested after FROM,
and "pstdout" is suggested after TO, alongside filenames and other
keywords.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20251231183953.95132e171e43abd5e9b78084@sraoss.co.jp
This commit does the following to get tests passing for
MSVC/AArch64:
* Implements spin_delay() with an ISB instruction (like we do for
gcc/clang on AArch64).
* Sets USE_ARMV8_CRC32C unconditionally. Vendor-supported versions
of Windows for AArch64 require at least ARMv8.1, which is where CRC
extension support became mandatory.
* Implements S_UNLOCK() with _InterlockedExchange(). The existing
implementation for MSVC uses _ReadWriteBarrier() (a compiler
barrier), which is insufficient for this purpose on non-TSO
architectures.
There are likely other changes required to take full advantage of
the hardware (e.g., atomics/arch-arm.h, simd.h,
pg_popcount_aarch64.c), but those can be dealt with later.
Author: Niyas Sait <niyas.sait@linaro.org>
Co-authored-by: Greg Burd <greg@burd.me>
Co-authored-by: Dave Cramer <davecramer@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Tested-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/A6152C7C-F5E3-4958-8F8E-7692D259FF2F%40greg.burd.me
Discussion: https://postgr.es/m/CAFPTBD-74%2BAEuN9n7caJ0YUnW5A0r-KBX8rYoEJWqFPgLKpzdg%40mail.gmail.com
Fix comments that incorrectly described transformations performed by the
"Avoid extra index searches through preprocessing" mechanism introduced
by commit b3f1a13f.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-By: Chao Li <li.evan.chao@gmail.com>
Reviewed-By: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/20251230190145.c3c88c5eb0f88b136adda92f@sraoss.co.jp
Backpatch-through: 18
The USER and IN GROUP clauses of CREATE ROLE are deprecated, and
commit 8e78f0a1 removed them from the CREATE ROLE syntax syntax
synopsis in the docs. However, previously CREATE USER and
CREATE GROUP docs still listed these clauses.
Since CREATE USER is equivalent to CREATE ROLE ... WITH LOGIN and
CREATE GROUP is equivalent to CREATE ROLE, their documented syntax
synopsis should match CREATE ROLE to avoid confusion.
Therefore this commit removes the deprecated USER and IN GROUP
clauses from the CREATE USER and CREATE GROUP syntax synopsis
in the docs.
Author: Japin Li <japinli@hotmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/MEAPR01MB3031C30E72EF16CFC08C8565B687A@MEAPR01MB3031.ausprd01.prod.outlook.com
Commit c7eab0e97 changed the default password_encryption setting to
'scram-sha-256', so update the example for creating a user with an
assigned password.
In addition, commit 08951a7c9 added new options that in turn pass
default tokens NOBYPASSRLS and NOREPLICATION to the CREATE ROLE
command, so fix this omission as well for v16 and later.
Reported-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/cff1ea60-c67d-4320-9e33-094637c2c4fb%40iki.fi
Backpatch-through: 14
During catcache searches, the most-recently searched entries are kept at
the head of the list to speed up subsequent searches, keeping the
"freshest" entries at its beginning. A rehash of the catcache was doing
the opposite: fresh entries were moved to the tail of the newly-created
buckets, causing a rehash to slow down a bit.
When a rehash is done, this commit switches the code to use
dlist_push_tail() instead of dlist_push_head(), so as fresh entries are
kept at the head of the lists, not their tail.
Author: ChangAo Chen <cca5507@qq.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/tencent_9EA10D8512B5FE29E7323F780A0749768708@qq.com
GUC is a commonly used term but previously appeared only
in the acronym documentation. This commit adds glossary and
documentation index entries for GUC to make it easier to find
and understand.
Author: Robert Treat <rob@xzilla.net>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CABV9wwPQnkeo_G6-orMGnHPK9SXGVWm7ajJPzsbE6944tDx=hQ@mail.gmail.com
This new identifier type provides support for 64-bit unsigned values,
to be used in catalogs, like OIDs. An advantage of a new data type is
that it becomes easier to grep for it in the code when assigning this
type to a catalog attribute, linking it to dedicated APIs and internal
structures.
The following operators are added in this commit, with dedicated tests:
- Casts with integer types and OID.
- btree and hash operators
- min/max functions.
- C type with related macros and defines, named around "Oid8".
This has been mentioned as useful on its own on the thread to add
support for 64-bit TOAST values, so as it becomes possible to attach
this data type to the TOAST code and catalog definitions. However, as
this concept can apply to many more areas, it is implemented as its own
independent change. This is based on a discussion with Andres Freund
and Tom Lane.
Bump catalog version.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Nikhil Kumar Veldanda <veldanda.nikhilkumar17@gmail.com>
Discussion: https://postgr.es/m/1891064.1754681536@sss.pgh.pa.us
In a7f107df2 I changed subplan param evaluation to happen within the
containing expression. As part of that, ExecInitSubPlanExpr() was changed to
evaluate parameters via a new EEOP_PARAM_SET expression step. These parameters
were temporarily stored into ExprState->resvalue/resnull, with some reasoning
why that would be fine. Unfortunately, that analysis was wrong -
ExecInitSubscriptionRef() evaluates the input array into "resv"/"resnull",
which will often point to ExprState->resvalue/resnull. This means that the
EEOP_PARAM_SET, if inside an array subscript, would overwrite the input array
to array subscript.
The fix is fairly simple - instead of evaluating into
ExprState->resvalue/resnull, store the temporary result of the subplan in the
subplan's return value.
Bug: #19370
Reported-by: Zepeng Zhang <redraiment@gmail.com>
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Diagnosed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/19370-7fb7a5854b7618f1@postgresql.org
Backpatch-through: 18
The new test 002_worker_terminate relies on the generation of a LOG
entry to check that a worker has been started, but missed the fact that
a node set with log_error_verbosity = verbose would add an error code.
The regexp used for the matching check did not take this case into
account, making the test fail on a timeout. The regexp is now fixed to
handle the verbose case correctly.
Per buildfarm member prion, that uses log_error_verbosity = verbose.
The error was reproducible by setting this GUC the same way in the test.
Oversight in f1e251be80.
All the code paths updated here have been using index_close() to
close a relation that was opened with relation_open(), in pgstattuple
and pageinspect. index_close() does the same thing as relation_close(),
so there is no harm, but being inconsistent could lead to issues if the
internals of these close() functions begin to introduce some specific
logic in the future.
In passing, this commit adds some comments explaining why we are using
relation_open() instead of index_open() in a few places, which is due to
the fact that partitioned indexes are not allowed in these functions.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Discussion: https://postgr.es/m/aUKamYGiDKO6byp5@ip-10-97-1-34.eu-west-3.compute.internal
Background workers gain a new flag, called BGWORKER_INTERRUPTIBLE, that
offers the possibility to terminate the workers when these are connected
to a database that is involved in one of the following commands:
ALTER DATABASE RENAME TO
ALTER DATABASE SET TABLESPACE
CREATE DATABASE
DROP DATABASE
This is useful to give background workers the same behavior as backends
and autovacuum workers, which are stopped when these commands are
executed. The default behavior, that exists since 9.3, is still to
never terminate bgworkers connected to the database involved in any of
these commands. The new flag has to be set to terminate the workers.
A couple of tests are added to worker_spi to track the commands that
impact the termination of the workers. There is a test case for a
non-interruptible worker, additionally, that relies on an injection
point to make the wait time in CountOtherDBBackends() reduced from 5s to
0.3s for faster test runs. The tests rely on the contents of the server
logs to check if a worker has been started or terminated:
- LOG generated by worker_spi_main() at startup, once connection to
database is done.
- FATAL in bgworker_die() when terminated.
A couple of tests run in the CI have showed that this method is stable
enough. The safe_psql() calls that scan pg_stat_activity could be
replaced with some poll_query_until() for more stability, if the current
method proves to be an issue in the buildfarm.
Author: Aya Iwata <iwata.aya@fujitsu.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Ryo Matsumura <matsumura.ryo@fujitsu.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/OS7PR01MB11964335F36BE41021B62EAE8EAE4A@OS7PR01MB11964.jpnprd01.prod.outlook.com
Since commit 1462aad2e4, which introduced the ability to modify the
two_phase property of a slot, the comments above ReplicationSlotCreate
have become outdated. We have now added a cautionary note in the comments
above ReplicationSlotAlter explaining when it is safe to modify the
two_phase property of a slot.
Author: Daniil Davydov <3danissimo@gmail.com>
Author: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/CAJDiXggZXQZ7bD0QcTizDt6us9aX6ZKK4dWxzgb5x3+TsVHjqQ@mail.gmail.com
When processing the "publish" options of an ALTER PUBLICATION command,
we call SplitIdentifierString() to split the options into a List of
strings. Since SplitIdentifierString() modifies the delimiter
character and puts NULs in their place, this would overwrite the memory
of the AlterPublicationStmt. Later in AlterPublicationOptions(), the
modified AlterPublicationStmt is copied for event triggers, which would
result in the event trigger only seeing the first "publish" option
rather than all options that were specified in the command.
To fix this, make a copy of the string before passing to
SplitIdentifierString().
Here we also adjust a similar case in the pgoutput plugin. There's no
known issues caused by SplitIdentifierString() here, so this is being
done out of paranoia.
Thanks to Henson Choi for putting together an example case showing the
ALTER PUBLICATION issue.
Author: sunil s <sunilfeb26@gmail.com>
Reviewed-by: Henson Choi <assam258@gmail.com>
Reviewed-by: zengman <zengman@halodbtech.com>
Backpatch-through: 14
Commit d926462d81 restored the behavior of passing GUC settings from
the CONNECTION string to the publisher's walsender, allowing per-connection
configuration.
This commit adds a TAP test to verify that behavior works correctly.
Since commit d926462d81 was recently applied and backpatched to v15,
this follow-up commit is also backpatched accordingly.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Discussion: https://postgr.es/m/CAHGQGwGYV+-abbKwdrM2UHUe-JYOFWmsrs6=QicyJO-j+-Widw@mail.gmail.com
Backpatch-through: 15
Prior to v15, GUC settings supplied in the CONNECTION clause of
CREATE SUBSCRIPTION were correctly passed through to
the publisher's walsender. For example:
CREATE SUBSCRIPTION mysub
CONNECTION 'options=''-c wal_sender_timeout=1000'''
PUBLICATION ...
would cause wal_sender_timeout to take effect on the publisher's walsender.
However, commit f3d4019da5 changed the way logical replication
connections are established, forcing the publisher's relevant
GUC settings (datestyle, intervalstyle, extra_float_digits) to
override those provided in the CONNECTION string. As a result,
from v15 through v18, GUC settings in the CONNECTION string were
always ignored.
This regression prevented per-connection tuning of logical replication.
For example, using a shorter timeout for walsender connecting
to a nearby subscriber and a longer one for walsender connecting
to a remote subscriber.
This commit restores the intended behavior by ensuring that
GUC settings in the CONNECTION string are again passed through
and applied by the walsender, allowing per-connection configuration.
Backpatch to v15, where the regression was introduced.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Discussion: https://postgr.es/m/CAHGQGwGYV+-abbKwdrM2UHUe-JYOFWmsrs6=QicyJO-j+-Widw@mail.gmail.com
Backpatch-through: 15
The old code would set *opid = InvalidOid to determine if the
get_opclass_opfamily_and_input_type() worked or not. This means more
moving parts that what's really needed here. Let's just fail
immediately if the get_opclass_opfamily_and_input_type() lookup fails.
Author: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Neil Chen <carpenter.nail.cz@gmail.com>
Discussion: https://postgr.es/m/CA+renyXOrjLacP_nhqEQUf2W+ZCoY2q5kpQCfG05vQVYzr8b9w@mail.gmail.com
The comment claimed *strat got set to InvalidStrategy when the function
lookup fails. This isn't true; an ERROR is raised when that happens.
Author: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Neil Chen <carpenter.nail.cz@gmail.com>
Discussion: https://postgr.es/m/CA+renyXOrjLacP_nhqEQUf2W+ZCoY2q5kpQCfG05vQVYzr8b9w@mail.gmail.com
Backpatch-through: 18
Update pg_rewind documentation to reflect the change that data checksums are
now enabled by default during initdb.
Backpatch to v18, where data checksums were changed to be enabled by default.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/TY4PR01MB16907D62F3A0A377B30FDBEA794B2A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Backpatch-through: 18
It's been almost a year since we last did this, and upstream has
been busy. They've added stemmers for Polish and Esperanto,
and also deprecated their old Dutch stemmer in favor of the
Kraaij-Pohlmann algorithm. (The "dutch" stemmer is now the
latter, and "dutch_porter" is the old algorithm.)
Upstream also decided to rename their internal header "header.h"
to something less generic: "snowball_runtime.h". Seems like a good
thing, but it complicates this patch a bit because we were relying on
interposing our own version of "header.h" to control system header
inclusion order. (We're partially failing at that now, because now the
generated stemmer files include <stddef.h> before snowball_runtime.h.
I think that'll be okay, but if the buildfarm complains then we'll
have to do more-extensive editing of the generated files.)
I realized that we weren't documenting the available stemmers in
any user-visible place, except indirectly through sample \dFd output.
That's incomplete because we only provide built-in dictionaries for
the recommended stemmers for each language, not alternative stemmers
such as dutch_porter. So I added a list to the documentation.
I did not do anything with the stopword lists. If those are still
available from snowballstem.org, they are mighty well hidden.
Discussion: https://postgr.es/m/1185975.1767569534@sss.pgh.pa.us
Previously the ulimit -p 256 was needed to increase the limit on
openbsd. However, sometimes the limit actually was too low, causing
"could not fork new process for connection: Resource temporarily unavailable"
errors. Most commonly on netbsd, but also on openbsd.
The ulimit on openbsd couldn't trivially be increased with ulimit, because of
hitting the hard limit.
Instead of increasing the limit in the CI script, the CI image generation now
increases the limits: https://github.com/anarazel/pg-vm-images/pull/129
Backpatch-through: 18
When the standby is passed as a PostgreSQL::Test::Cluster instance,
use the WAIT FOR LSN command on the standby server to implement
wait_for_catchup() for replay, write, and flush modes. This is more
efficient than polling pg_stat_replication on the upstream, as the
WAIT FOR LSN command uses a latch-based wakeup mechanism.
The optimization applies when:
- The standby is passed as a Cluster object (not just a name string)
- The mode is 'replay', 'write', or 'flush' (not 'sent')
- The standby is in recovery
For 'sent' mode, when the standby is passed as a string (e.g., a
subscription name for logical replication), or when the standby has
been promoted, the function falls back to the original polling-based
approach using pg_stat_replication on the upstream.
Discussion: https://postgr.es/m/CABPTF7UiArgW-sXj9CNwRzUhYOQrevLzkYcgBydmX5oDes1sjg%40mail.gmail.com
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@kurilemu.de>
Update psql tab completion to support the optional MODE option in the
WAIT FOR LSN command. After specifying an LSN value, completion now offers
both MODE and WITH keywords. The MODE option specifies which LSN type to wait
for. In particular, it controls whether the wait is evaluated from the
standby or primary perspective.
When MODE is specified, the completion suggests the valid mode values:
standby_replay, standby_write, standby_flush, and primary_flush.
Discussion: https://postgr.es/m/CABPTF7UiArgW-sXj9CNwRzUhYOQrevLzkYcgBydmX5oDes1sjg%40mail.gmail.com
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@kurilemu.de>
This commit extends the WAIT FOR LSN command with an optional MODE option in
the WITH clause that specifies which LSN type to wait for:
WAIT FOR LSN '<lsn>' [WITH (MODE '<mode>', ...)]
where mode can be:
- 'standby_replay' (default): Wait for WAL to be replayed to the specified
LSN,
- 'standby_write': Wait for WAL to be written (received) to the specified
LSN,
- 'standby_flush': Wait for WAL to be flushed to disk at the specified LSN,
- 'primary_flush': Wait for WAL to be flushed to disk on the primary server.
The default mode is 'standby_replay', matching the original behavior when MODE
is not specified. This follows the pattern used by COPY and EXPLAIN
commands, where options are specified as string values in the WITH clause.
Modes are explicitly named to distinguish between primary and standby
operations:
- Standby modes ('standby_replay', 'standby_write', 'standby_flush') can only
be used during recovery (on a standby server),
- Primary mode ('primary_flush') can only be used on a primary server.
The 'standby_write' and 'standby_flush' modes are useful for scenarios where
applications need to ensure WAL has been received or persisted on the standby
without necessarily waiting for replay to complete. The 'primary_flush' mode
allows waiting for WAL to be flushed on the primary server.
This commit also includes includes:
- Documentation updates for the new syntax and mode descriptions,
- Test coverage for all four modes, including error cases and concurrent
waiters,
- Wakeup logic in walreceiver for standby write/flush waiters,
- Wakeup logic in WAL writer for primary flush waiters.
Discussion: https://postgr.es/m/CABPTF7UiArgW-sXj9CNwRzUhYOQrevLzkYcgBydmX5oDes1sjg%40mail.gmail.com
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@kurilemu.de>
Add support for waiting on WAL write and flush LSNs in addition to the
existing replay LSN wait type. This provides the foundation for
extending the WAIT FOR command with MODE parameter.
Key changes are following.
- Add WAIT_LSN_TYPE_STANDBY_WRITE and WAIT_LSN_TYPE_STANDBY_FLUSH to
WaitLSNType.
- Add GetCurrentLSNForWaitType() to retrieve the current LSN for each wait
type.
- Add new wait events WAIT_EVENT_WAIT_FOR_WAL_WRITE and
WAIT_EVENT_WAIT_FOR_WAL_FLUSH for pg_stat_activity visibility.
- Update WaitForLSN() to use GetCurrentLSNForWaitType() internally.
Discussion: https://postgr.es/m/CABPTF7UiArgW-sXj9CNwRzUhYOQrevLzkYcgBydmX5oDes1sjg%40mail.gmail.com
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@kurilemu.de>
Replace ERRCODE_UNDEFINED_TABLE with ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE
for the case where we don't find a parent-child relationship between the
partitioned table and its partition. In this case, tables are present, but
they are not in a prerequisite state (no relationship).
Discussion: https://postgr.es/m/CAHewXNmBM%2B5qbrJMu60NxPn%2B0y-%3D2wXM-QVVs3xRp8NxFvDb9A%40mail.gmail.com
Author: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
A few places were accessing &ProcGlobal->allProcs directly, so adjust
them to use the accessor macro instead.
Author: Maksim Melnikov <m.melnikov@postgrespro.ru>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/80621c00-aba6-483c-88b1-a845461d1165@postgrespro.ru
7d854bdc5b has removed two symbols from pg_config.h.in. This file is
automatically generated. The correct cleanup needs to be done in the
build scripts, instead. autoheader produces now a consistent
pg_config.h.in, without the symbols that were removed in the previous
commit.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1193764.1767573683@sss.pgh.pa.us
This change is a cocktail of harmonization of function argument names,
grammar typos, renames for better consistency and unused code (see
ltree). All of these have been spotted by the author.
Author: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/b2c0d0b7-3944-487d-a03d-d155851958ff@gmail.com
We must tell init about each role name we plan to connect as,
else SSPI auth fails. Similar to previous patches such as
da44d71e7.
Oversight in f3c9e341c, per buildfarm member drongo.
This patch mostly just fills in the field, although a few error
reports in resolve_unique_index_expr() are adjusted to use it.
The next commit will add more uses.
catversion bump out of an abundance of caution: I'm not sure
IndexElem can appear in stored rules, but I'm not sure it can't
either.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Co-authored-by: jian he <jian.universality@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxH3OgXF1hrzGAaWyNtye2jHEmk9JbtrtGv-KJK6tsGo5w@mail.gmail.com
Discussion: https://postgr.es/m/202512121327.f2zimsr6guso@alvherre.pgsql
This fixes a poorly written integer comparison function which was
performing subtraction in an attempt to return a negative value when
a < b and a positive value when a > b, and 0 when the values were equal.
Unfortunately that didn't always work correctly due to two's complement
having the INT_MIN 1 further from zero than INT_MAX. This could result
in an overflow and cause the comparison function to return an incorrect
result, which would result in the binary search failing to find the
value being searched for.
This could cause poor selectivity estimates when the statistics stored
the value of INT_MAX (2147483647) and the value being searched for was
large enough to result in the binary search doing a comparison with that
INT_MAX value.
Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2ng1Ot5LoKbVU-Dh---dFTUZWJRH8wv2chBu29fnNDMaQ@mail.gmail.com
Backpatch-through: 14
Change "function" to "function or procedure" in
PreventInTransactionBlock, and improve grammar of ExecWaitStmt's
complaint about having an active snapshot.
Author: Pavel Stehule <pavel.stehule@gmail.com>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Marcos Pegoraro <marcos@f10.com.br>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAFj8pRCveWPR06bbad9GnMb0Kcr6jnXPttv9XOaOB+oFCD1Tsg@mail.gmail.com
Add a new "location" column to the pg_available_extensions and
pg_available_extension_versions views, exposing the directory where
the extension is located.
The default system location is shown as '$system', the same value
that can be used to configure the extension_control_path GUC.
User-defined locations are only visible for super users, otherwise
'<insufficient privilege>' is returned as a column value, the same
behaviour that we already use in pg_stat_activity.
I failed to resist the temptation to do a little extra editorializing of
the TAP test script.
Catalog version bumped.
Author: Matheus Alcantara <mths.dev@pm.me>
Reviewed-By: Chao Li <li.evan.chao@gmail.com>
Reviewed-By: Rohit Prasad <rohit.prasad@arm.com>
Reviewed-By: Michael Banck <mbanck@gmx.net>
Reviewed-By: Manni Wood <manni.wood@enterprisedb.com>
Reviewed-By: Euler Taveira <euler@eulerto.com>
Reviewed-By: Quan Zongliang <quanzongliang@yeah.net>
Commit f54af9f267 added a check for
io_uring_queue_init_mem(). However, it used the macro name
HAVE_LIBURING_QUEUE_INIT_MEM in both meson.build and the C code, while
the Autotools build script defined HAVE_IO_URING_QUEUE_INIT_MEM. As a
result, the optimization was never enabled in builds configured with
Autotools, as the C code checked for the wrong macro name.
This commit changes the macro name to HAVE_IO_URING_QUEUE_INIT_MEM in
meson.build and the C code. This matches the actual function
name (io_uring_queue_init_mem), following the standard HAVE_<FUNCTION>
convention.
Backpatch to 18, where the macro was introduced.
Bug: #19368
Reported-by: Evan Si <evsi@amazon.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19368-016d79a7f3a1c599@postgresql.org
Backpatch-through: 18
This <note> was originally written to describe the double levels
of de-backslashing encountered when a backslash-aware string
literal is used to hold the text representation of a composite
value. It still made sense when we switched to mostly using E'...'
syntax for that type of literal. However, commit f77de4b0c mangled
it completely by changing the example literal to be SQL-standard.
The extra pass of de-backslashing described in the text doesn't
actually occur with the example as written, unless you happen to
be using standard_conforming_strings = off.
We could restore this <note> to self-consistency by reverting the
change from f77de4b0c, but on the whole I judge that its time has
passed. standard_conforming_strings = off is nearly obsolete,
and may soon be fully so. But without that, the behavior isn't
so complicated as to justify a discursive note. I observe that
the nearby section about array I/O syntax has no equivalent text,
although that syntax is equally subject to this issue.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2998401.1767038920@sss.pgh.pa.us
Discussion: https://postgr.es/m/3279216.1767072538@sss.pgh.pa.us
jit_profiling_support=true captures profile data for Linux perf. On
other platforms, LLVMCreatePerfJITEventListener() returns NULL and the
attempt to register the listener would crash.
Fix by ignoring the setting in that case. The documentation already
says that it only has an effect if perf support is present, and we
already did the same for older LLVM versions that lacked support.
No field reports, unsurprisingly for an obscure developer-oriented
setting. Noticed in passing while working on commit 1a28b4b4.
Backpatch-through: 14
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGJgB6gvrdDohgwLfCwzVQm%3DVMtb9m0vzQn%3DCwWn-kwG9w%40mail.gmail.com
Up to now, index amhandlers were expected to produce a new, palloc'd
struct on each call. That requires palloc/pfree overhead, and creates
a risk of memory leaks if the caller fails to pfree, and the time
taken to fill such a large structure isn't nil. Moreover, we were
storing these things in the relcache, eating several hundred bytes for
each cached index. There is not anything in these structs that needs
to vary at runtime, so let's change the definition so that an
amhandler can return a pointer to a "static const" struct of which
there's only one copy per index AM. Mark all the core code's
IndexAmRoutine pointers const so that we catch anyplace that might
still try to change or pfree one.
(This is similar to the way we were already handling TableAmRoutine
structs. This commit does fix one comment that was infelicitously
copied-and-pasted into tableamapi.c.)
This commit needs to be called out in the v19 release notes as an API
change for extension index AMs. An un-updated AM will still work
(as of now, anyway) but it risks memory leaks and will be slower than
necessary.
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAEoWx2=vApYk2LRu8R0DdahsPNEhWUxGBZ=rbZo1EXE=uA+opQ@mail.gmail.com
This commit adds the total memory allocated during vacuum, the number
of times the dead items storage was reset, and the configured memory
limit. This helps users understand how much memory VACUUM required,
and such information can be used to avoid multiple index scans.
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAHza6qcPitBCkyiKJosDTt3bmxMvzZOTONoebwCkBZrr3rk65Q%40mail.gmail.com
Previously, ReplicationSlotsComputeRequiredXmin() computed the oldest
xmin across all slots without holding ProcArrayLock (when
already_locked is false), acquiring the lock just before updating the
replication slot xmin.
This could lead to a race condition: if a backend created a new slot
and updates the global replication slot xmin, another backend
concurrently running ReplicationSlotsComputeRequiredXmin() could
overwrite that update with an invalid or stale value. This happens
because the concurrent backend might have computed the aggregate xmin
before the new slot was accounted for, but applied the update after
the new slot had already updated the global value.
In the reported failure, a walsender for an apply worker computed
InvalidTransactionId as the oldest xmin and overwrote a valid
replication slot xmin value computed by a walsender for a tablesync
worker. Consequently, the tablesync worker computed a transaction ID
via GetOldestSafeDecodingTransactionId() effectively without
considering the replication slot xmin. This led to the error "cannot
build an initial slot snapshot as oldest safe xid %u follows
snapshot's xmin %u", which was an assertion failure prior to commit
240e0dbacd.
To fix this, we acquire ReplicationSlotControlLock in exclusive mode
during slot creation to perform the initial update of the slot
xmin. In ReplicationSlotsComputeRequiredXmin(), we hold
ReplicationSlotControlLock in shared mode until the global slot xmin
is updated in ProcArraySetReplicationSlotXmin(). This prevents
concurrent computations and updates of the global xmin by other
backends during the initial slot xmin update process, while still
permitting concurrent calls to ReplicationSlotsComputeRequiredXmin().
Backpatch to all supported versions.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Pradeep Kumar <spradeepkumar29@gmail.com>
Reviewed-by: Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1L8wYcyTPxNzPGkhuO52WBGoOZbT0A73Le=ZUWYAYmdfw@mail.gmail.com
Backpatch-through: 14
We currently require LLVM 14, so these probes for LLVM 9 functions
always succeeded. Even when the features aren't enabled in an LLVM
build, dummy functions are defined (a problem for a later commit).
The whole PGAC_CHECK_LLVM_FUNCTIONS macro and Meson equivalent are
removed, because we switched to testing LLVM_VERSION_MAJOR at compile
time in subsequent work and these were the last holdouts. That suits
the nature of LLVM API evolution better, and also allows for strictly
mechanical pruning in future commits like 820b5af7 and 972c2cd2. They
advanced the minimum LLVM version but failed to spot these.
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGJgB6gvrdDohgwLfCwzVQm%3DVMtb9m0vzQn%3DCwWn-kwG9w%40mail.gmail.com
This new function exposes at SQL level some information related to
multixacts, not available until now. This data is useful for monitoring
purposes, especially for workloads that make a heavy use of multixacts:
- num_mxids, number of MultiXact IDs in use.
- num_members, number of member entries in use.
- members_size, bytes used by num_members in pg_multixact/members/.
- oldest_multixact: oldest MultiXact still needed.
This patch has been originally proposed when MultiXactOffset was still
32 bits, to monitor wraparound. This part is not relevant anymore since
bd8d9c9bdf that has widen MultiXactOffset to 64 bits. The monitoring
of disk space usage for the members is still relevant.
Some tests are added to check this function, in the shape of one
isolation test with concurrent transactions that take a ROW SHARE lock,
and some SQL tests for pg_read_all_stats. Some documentation is added
to explain some patterns that can come from the information provided by
the function.
Bump catalog version.
Author: Naga Appani <nagnrik@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Discussion: https://postgr.es/m/CA+QeY+AAsYK6WvBW4qYzHz4bahHycDAY_q5ECmHkEV_eB9ckzg@mail.gmail.com
This function calculates in bytes the storage taken between two
multixact offsets. This will be used in an upcoming patch, introduced
separately here as this piece can be useful on its own.
Author: Naga Appani <nagnrik@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aUyTvZMq2CLgNEB4@paquier.xyz
This routine returned a number of members as a MultiXactOffset,
calculated based on the difference between the next-to-be-assigned
offset and the oldest offset. However, this number is not actually an
offset but a number.
This type confusion comes from the original implementation of
MultiXactMemberFreezeThreshold(), in 53bb309d2d. The number of
members is now defined as a uint64, large enough for MultiXactOffset.
This change will be used in a follow-up patch.
Reviewed-by: Naga Appani <nagnrik@gmail.com>
Discussion: https://postgr.es/m/aUyTvZMq2CLgNEB4@paquier.xyz
REL_18_STABLE and master have commit ee485912, so they always use the
newer LLVM opaque pointer functions. Drop -Wno-deprecated-declarations
(commit a56e7b660) for code under jit/llvm in those branches, to catch
any new deprecation warnings that arrive in future version of LLVM.
Older branches continued to use functions marked deprecated in LLVM 14
and 15 (ie switched to the newer functions only for LLVM 16+), as a
precaution against unforeseen compatibility problems with bitcode
already shipped. In those branches, the comment about warning
suppression is updated to explain that situation better. In theory we
could suppress warnings only for LLVM 14 and 15 specifically, but that
isn't done here.
Backpatch-through: 14
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1407185.1766682319%40sss.pgh.pa.us
estimate_hash_bucket_stats is defined to return zero to *mcv_freq if
it cannot obtain a value for the frequency of the most common value.
Its sole caller final_cost_hashjoin ignored this provision and would
blindly believe the zero value, resulting in computing zero for the
largest bucket size. In consequence, the safety check that intended
to prevent the largest bucket from exceeding get_hash_memory_limit()
was ineffective, allowing very silly plans to be chosen if statistics
were missing.
After fixing final_cost_hashjoin to disregard zero results for
mcv_freq, a second problem appeared: some cases that should use hash
joins failed to. This is because estimate_hash_bucket_stats was
unaware of the fact that ANALYZE won't store MCV statistics if it
doesn't find any multiply-occurring values. Thus the lack of an MCV
stats entry doesn't necessarily mean that we know nothing; we may
well know that the column is unique. The former coding returned zero
for *mcv_freq in this case, which was pretty close to correct, but now
final_cost_hashjoin doesn't believe it and disables the hash join.
So check to see if there is a HISTOGRAM stats entry; if so, ANALYZE
has in fact run for this column and must have found it to be unique.
In that case report the MCV frequency as 1 / rows, instead of claiming
ignorance.
Reporting a more accurate *mcv_freq in this case can also affect the
bucket-size skew adjustment further down in estimate_hash_bucket_stats,
causing hash-join cost estimates to change slightly. This affects
some plan choices in the core regression tests. The first diff in
join.out corresponds to a case where we have no stats and should not
risk a hash join, but the remaining changes are caused by producing
a better bucket-size estimate for unique join columns. Those are all
harmless changes so far as I can tell.
The existing behavior was introduced in commit 4867d7f62 in v11.
It appears from the commit log that disabling the bucket-size safety
check in the absence of statistics was intentional; but we've now seen
a case where the ensuing behavior is bad enough to make that seem like
a poor decision. In any case the lack of other problems with that
safety check after several years helps to justify enforcing it more
strictly. However, we won't risk back-patching this, in case any
applications are depending on the existing behavior.
Bug: #19363
Reported-by: Jinhui Lai <jinhui.lai@qq.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/2380165.1766871097@sss.pgh.pa.us
Discussion: https://postgr.es/m/19363-8dd32fc7600a1153@postgresql.org
This patch causes one postgres_fdw test case to revert to the plan
it used before aa86129e1, i.e., using a remote sort in preference to
local sort. That decision is actually a coin-flip because cost_sort()
will give the same answer on both sides, so that the plan choice comes
down to little more than roundoff error. In consequence, the test
output can change as a result of even minor changes in nearby costs,
as we saw in aa86129e1 (compare also b690e5fac and 4b14e1871).
b690e5fac's solution to stabilizing the adjacent test case was to
disable sorting locally, and here we extend that to the currently-
problematic case. Without this, the following patch would cause this
plan choice to change back in this same way, for even less apparent
reason.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/2551253.1766952956@sss.pgh.pa.us
Mkvcbuild.pm scrapes Makefile contents, but couldn't understand the
change made by commit bec2a0aa. Revealed by BF animal hamerkop in
branch REL_16_STABLE.
1. It used += instead of =, which didn't match the pattern that
Mkvcbuild.pm looks for. Drop the +.
2. Mkvcbuild.pm doesn't link PROGRAM executables with libpgport. Apply
a local workaround to REL_16_STABLE only (later branches dropped
Mkvcbuild.pm).
Backpatch-through: 16
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/175163.1766357334%40sss.pgh.pa.us
When looking up statistical data about an expression, we failed to
look through PlaceHolderVar nodes, treating them as opaque. This
could prevent us from matching an expression to base columns, index
expressions, or extended statistics, as examine_variable() relies on
strict structural matching.
As a result, queries involving PlaceHolderVar nodes often fell back to
default selectivity estimates, potentially leading to poor plan
choices.
This patch updates examine_variable() to strip PlaceHolderVars before
analysis. This is safe during estimation because PlaceHolderVars are
transparent for the purpose of statistics lookup: they do not alter
the value distribution of the underlying expression.
To minimize performance overhead on this hot path, a lightweight
walker first checks for the presence of PlaceHolderVars. The more
expensive mutator is invoked only when necessary.
There is one ensuing plan change in the regression tests, which is
expected and demonstrates the fix: the rowcount estimate becomes much
more accurate with this patch.
Back-patch to v18. Although this issue exists before that, changes in
this version made it common enough to notice. Given the lack of field
reports for older versions, I am not back-patching further.
Reported-by: Haowu Ge <gehaowu@bitmoe.com>
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/62af586c-c270-44f3-9c5e-02c81d537e3d.gehaowu@bitmoe.com
Backpatch-through: 18
When pulling up a subquery, we may need to wrap its targetlist items
in PlaceHolderVars to enforce separate identity or as a result of
outer joins. However, this causes any upper-level WHERE clauses
referencing these outputs to contain PlaceHolderVars, which prevents
indxpath.c from recognizing that they could be matched to index
columns or index expressions, potentially affecting the planner's
ability to use indexes.
To fix, explicitly strip PlaceHolderVars from index operands. A
PlaceHolderVar appearing in a relation-scan-level expression is
effectively a no-op. Nevertheless, to play it safe, we strip only
PlaceHolderVars that are not marked nullable.
The stripping is performed recursively to handle cases where
PlaceHolderVars are nested or interleaved with other node types. To
minimize performance impact, we first use a lightweight walker to
check for the presence of strippable PlaceHolderVars. The expensive
mutator is invoked only if a candidate is found, avoiding unnecessary
memory allocation and tree copying in the common case where no
PlaceHolderVars are present.
Back-patch to v18. Although this issue exists before that, changes in
this version made it common enough to notice. Given the lack of field
reports for older versions, I am not back-patching further.
Reported-by: Haowu Ge <gehaowu@bitmoe.com>
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/62af586c-c270-44f3-9c5e-02c81d537e3d.gehaowu@bitmoe.com
Backpatch-through: 18
This change makes more readable code diffs when adding new items or
removing old items, while ensuring that lines do not get excessively
long. Some SUBDIRS, PROGRAMS and REGRESS lists are split.
Note that there are a few more REGRESS lists that could be split,
particularly in contrib/.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Co-Authored-By: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Man Zeng <zengman@halodbtech.com>
Discussion: https://postgr.es/m/DF6HDGB559U5.3MPRFCWPONEAE@jeltef.nl
The correct spelling is Beijing, fix in regression test
and docs.
Author: JiaoShuntian <jiaoshuntian@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/ebfa3ec2-dc3c-4adb-be2a-4a882c2e85a7@gmail.com
Presumably, the C type MsgType was meant to hold the protocol message
type in the pre-version-3 era, but this was never fully developed even
then, and the name is pretty confusing nowadays. It has only one
vestigial use for cancel requests that we can get rid of. Since a
cancel request is indicated by a special protocol version number, we
can use the ProtocolVersion type, which MsgType was based on.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/505e76cb-0ca2-4e22-ba0f-772b5dc3f230%40eisentraut.org
The variable_is_guc_list_quote function need to know about all
GUC_QUOTE variables, this adds oauth_validator_libraries which
was missing. Backpatch to v18 where OAuth was introduced.
Author: ChangAo Chen <cca5507@qq.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/tencent_03D4D2A5C0C8DCE0CD1DB4D945858E15420A@qq.com
Backpatch-through: 18
pg_stat_get_backend_activity() calls pgstat_clip_activity() to ensure
that the reported query string is correctly truncated when it finishes
with an incomplete multi-byte sequence. However, the result returned by
the function was not what pgstat_clip_activity() generated, but the
non-truncated, original, contents from PgBackendStatus.st_activity_raw.
Oversight in 54b6cd589a, so backpatch all the way down.
Author: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2mDzwc48q2EK9tSXS6iJMJ35wvxNQnHX+rXjy5VgLvJQw@mail.gmail.com
Backpatch-through: 14
This change has the advantage of removing some weird type casts, caused
by offset calculations based on pgoff_t but saved as int (on older
branches we use off_t, which could be 4 or 8 bytes depending on the
environment). These are safe currently because capped by
MAX_PHYSICAL_FILESIZE, but we would run into problems when to make
MAX_PHYSICAL_FILESIZE larger or allow callers of these routines to use a
larger physical max size on demand.
While on it, this improves BufFileDumpBuffer() so as we do not use an
offset for "availbytes". It is not a file offset per-set, but a number
of available bytes.
This change should lead to no functional changes.
Author: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aUStrqoOCDRFAq1M@paquier.xyz
Many of the operations done for attribute stats in attribute_stats.c
share the same logic as extended stats, as done by a patch under
discussion to add support for extended stats import and export. All the
pieces necessary for extended statistics are moved to stats_utils.c,
which is the file where common facilities are shared for stats files.
The following renames are done:
* get_attr_stat_type() -> statatt_get_type()
* init_empty_stats_tuple() -> statatt_init_empty_tuple()
* set_stats_slot() -> statatt_set_slot()
* get_elem_stat_type() -> statatt_get_elem_type()
While on it, this commit adds more documentation for all these
functions, describing more their internals and the dependencies that
have been implied for attribute statistics. The same concepts apply to
extended statistics, at some degree.
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Yu Wang <wangyu_runtime@163.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
If there are any SRFs in a PathTarget, we must separate it into
SRF-computing and SRF-free targets. This is because the executor can
only handle SRFs that appear at the top level of the targetlist of a
ProjectSet plan node.
If we find a subexpression that matches an expression already computed
in the previous plan level, we should treat it like a Var and should
not split it again. setrefs.c will later replace the expression with
a Var referencing the subplan output.
However, when processing the grouping target for grouping sets, the
planner can fail to recognize that an expression is already computed
in the scan/join phase. The root cause is a mismatch in the
nullingrels bits. Expressions in the grouping target carry the
grouping nulling bit in their nullingrels to indicate that they can be
nulled by the grouping step. However, the corresponding expressions
in the scan/join target do not have these bits.
As a result, the exact match check in list_member() fails, leading the
planner to incorrectly believe that the expression needs to be
re-evaluated from its arguments, which are often not available in the
subplan. This can lead to planner errors such as "variable not found
in subplan target list".
To fix, ignore the grouping nulling bit when checking whether an
expression from the grouping target is available in the pre-grouping
input target. This aligns with the matching logic in setrefs.c.
Backpatch to v18, where this issue was introduced.
Bug: #19353
Reported-by: Marian MULLER REBEYROL <marian.muller@serli.com>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/19353-aaa179bba986a19b@postgresql.org
Backpatch-through: 18
Commit 8a3e4011 introduced tab completion for the ONLY option of
VACUUM and ANALYZE, along with some code simplification using
MatchAnyN. However, it caused a regression in tab completion for
VACUUM option values. For example, neither ON nor OFF was suggested
after "VACUUM (VERBOSE". In addition, the ONLY keyword was not
suggested immediately after a completed option list.
Backpatch to v18.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Discussion: https://postgr.es/m/20251223021509.19bba68ecbbc70c9f983c2b4@sraoss.co.jp
Backpatch-through: 18
Commit 67c209 removed the WARNING for insufficient wal_level from the
expected output, but the WARNING may still appear on buildfarm members
that run with wal_level=minimal.
To avoid unstable test output depending on wal_level, this commit the
test to use ALTER PUBLICATION for verifying the same behavior,
ensuring the output remains consistent regardless of the wal_level
setting.
Per buildfarm member thorntail.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Discussion: https://postgr.es/m/TY4PR01MB16907680E27BAB146C8EB1A4294B2A@TY4PR01MB16907.jpnprd01.prod.outlook.com
The pg_overexplain documentation previously used the <literal> tag for
some file names, struct names, and commands. Update the markup to
use the more appropriate tags: <filename>, <structname>, and <command>.
Backpatch to v18, where pg_overexplain was introduced.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Shixin Wang <wang-shi-xin@outlook.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwEyYUzz0LjBV_fMcdwU3wgmu0NCoT+JJiozPa8DG6eeog@mail.gmail.com
Backpatch-through: 18
CREATE SUBSCRIPTION with copy_data=true and origin='none' previously
failed when the publisher was running a version earlier than PostgreSQL 19,
even though this combination should be supported.
The failure occurred because the command issued a query calling
pg_get_publication_sequences function on the publisher. That function
does not exist before PG19 and the query is only needed for logical
replication sequence synchronization, which is supported starting in PG19.
This commit fixes this issue by skipping that query when the
publisher runs a version earlier than PG19.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwEx4twHtJdiPWTyAXJhcBPLaH467SH2ajGSe-41m65giA@mail.gmail.com
The retain_dead_tuples subscription option is supported only when
the publisher runs PostgreSQL 19 or later. However, it could previously
be enabled even when the publisher was running an earlier version.
This was caused by check_pub_dead_tuple_retention() comparing
the publisher server version against 19000 instead of 190000.
Fix this typo so that the version check correctly enforces the PG19+
requirement.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwEx4twHtJdiPWTyAXJhcBPLaH467SH2ajGSe-41m65giA@mail.gmail.com
Update the logical replication documentation to explicitly outline the
privilege requirements for each publication syntax. This will ensure users
understand the necessary permissions when creating or managing
publications.
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Discussion: https://postgr.es/m/CANhcyEXODen4U0XLk0aAwFTwGxjAfE9eRaynREenLp-JBSaFHw@mail.gmail.com
Currently, the function expr_is_nonnullable() checks only Const and
Var expressions to determine if an expression is non-nullable. This
patch extends the detection logic to handle more expression types.
This can enable several downstream optimizations, such as reducing
NullTest quals to constant truth values (e.g., "COALESCE(var, 1) IS
NULL" becomes FALSE) and converting "COUNT(expr)" to the more
efficient "COUNT(*)" when the expression is proven non-nullable.
This breaks a test case in test_predtest.sql, since we now simplify
"ARRAY[] IS NULL" to constant FALSE, preventing it from weakly
refuting a strict ScalarArrayOpExpr ("x = any(ARRAY[])"). To ensure
the refutation logic is still exercised as intended, wrap the array
argument in opaque_array().
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49UhPBjm+NRpxerjaeuFKyUZJ_AjM3NBcSYK2JgZ6VTEQ@mail.gmail.com
We break ROW(...) IS [NOT] NULL into separate tests on its component
fields. During this breakdown, we can improve efficiency by utilizing
expr_is_nonnullable() to detect fields that are provably non-nullable.
If a component field is proven non-nullable, it affects the outcome
based on the test type. For an IS NULL test, a single non-nullable
field refutes the whole NullTest, reducing it to constant FALSE. For
an IS NOT NULL test, the check for that specific field is guaranteed
to succeed, so we can discard it from the list of component tests.
This extends the existing optimization logic, which previously only
handled Const fields, to support any expression that can be proven
non-nullable.
In passing, update the existing constant folding of NullTests to use
expr_is_nonnullable() instead of var_is_nonnullable(), enabling it to
benefit from future improvements to that function.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49UhPBjm+NRpxerjaeuFKyUZJ_AjM3NBcSYK2JgZ6VTEQ@mail.gmail.com
The COALESCE function returns the first of its arguments that is not
null. When an argument is proven non-null, if it is the first
non-null-constant argument, the entire COALESCE expression can be
replaced by that argument. If it is a subsequent argument, all
following arguments can be dropped, since they will never be reached.
Currently, we perform this simplification only for Const arguments.
This patch extends the simplification to support any expression that
can be proven non-nullable.
This can help avoid the overhead of evaluating unreachable arguments.
It can also lead to better plans when the first argument is proven
non-nullable and replaces the expression, as the planner no longer has
to treat the expression as non-strict, and can also leverage index
scans on the resulting expression.
There is an ensuing plan change in generated_virtual.out, and we have
to modify the test to ensure that it continues to test what it is
intended to.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49UhPBjm+NRpxerjaeuFKyUZJ_AjM3NBcSYK2JgZ6VTEQ@mail.gmail.com
The logical replication parallel apply worker could incorrectly advance
the origin progress during an error or failed apply. This behavior risks
transaction loss because such transactions will not be resent by the
server.
Commit 3f28b2fcac addressed a similar issue for both the apply worker and
the table sync worker by registering a before_shmem_exit callback to reset
origin information. This prevents the worker from advancing the origin
during transaction abortion on shutdown. This patch applies the same fix
to the parallel apply worker, ensuring consistent behavior across all
worker types.
As with 3f28b2fcac, we are backpatching through version 16, since parallel
apply mode was introduced there and the issue only occurs when changes are
applied before the transaction end record (COMMIT or ABORT) is received.
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 16
Discussion: https://postgr.es/m/TY4PR01MB169078771FB31B395AB496A6B94B4A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Discussion: https://postgr.es/m/TYAPR01MB5692FAC23BE40C69DA8ED4AFF5B92@TYAPR01MB5692.jpnprd01.prod.outlook.com
This one was missed in 8f1791c61, because the machines that
detected those issues don't compile this function.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1324889.1764886170@sss.pgh.pa.us
This commit updates pg_visibility_map_summary() to use the
visibilitymap_count() API, replacing its own counting mechanism. This
simplifies the function and improves performance by leveraging the
vectorized implementation introduced in commit 41c51f0c68.
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAEze2WgPu-EYYuYQimy=AHQHGa7w8EvLVve5DM5eGMR6zh-7sw@mail.gmail.com
Previously logical decoding required wal_level to be set to 'logical'
at server start. This meant that users had to incur the overhead of
logical-level WAL logging even when no logical replication slots were
in use.
This commit adds functionality to automatically control logical
decoding availability based on logical replication slot presence. The
newly introduced module logicalctl.c allows logical decoding to be
dynamically activated when needed when wal_level is set to
'replica'.
When the first logical replication slot is created, the system
automatically increases the effective WAL level to maintain
logical-level WAL records. Conversely, after the last logical slot is
dropped or invalidated, it decreases back to 'replica' WAL level.
While activation occurs synchronously right after creating the first
logical slot, deactivation happens asynchronously through the
checkpointer process. This design avoids a race condition at the end
of recovery; a concurrent deactivation could happen while the startup
process enables logical decoding at the end of recovery, but WAL
writes are still not permitted until recovery fully completes. The
checkpointer will handle it after recovery is done. Asynchronous
deactivation also avoids excessive toggling of the logical decoding
status in workloads that repeatedly create and drop a single logical
slot. On the other hand, this lazy approach can delay changes to
effective_wal_level and the disabling logical decoding, especially
when the checkpointer is busy with other tasks. We chose this lazy
approach in all deactivation paths to keep the implementation simple,
even though laziness is strictly required only for end-of-recovery
cases. Future work might address this limitation either by using a
dedicated worker instead of the checkpointer, or by implementing
synchronous waiting during slot drops if workloads are significantly
affected by the lazy deactivation of logical decoding.
The effective WAL level, determined internally by XLogLogicalInfo, is
allowed to change within a transaction until an XID is assigned. Once
an XID is assigned, the value becomes fixed for the remainder of the
transaction. This behavior ensures that the logging mode remains
consistent within a writing transaction, similar to the behavior of
GUC parameters.
A new read-only GUC parameter effective_wal_level is introduced to
monitor the actual WAL level in effect. This parameter reflects the
current operational WAL level, which may differ from the configured
wal_level setting.
Bump PG_CONTROL_VERSION as it adds a new field to CheckPoint struct.
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAD21AoCVLeLYq09pQPaWs+Jwdni5FuJ8v2jgq-u9_uFbcp6UbA@mail.gmail.com
After waiting for a concurrent updater to finish, heap_lock_tuple()
followed the update chain to lock all tuple versions. However, when
stepping from the initial tuple to the next one, it failed to check
that the next tuple's XMIN matches the initial tuple's XMAX. That's an
important check whenever following an update chain, and the recursive
part that follows the chain did it, but the initial step missed it.
Without the check, if the updating transaction aborts, the updated
tuple is vacuumed away and replaced by an unrelated tuple, the
unrelated tuple might get incorrectly locked.
Author: Jasper Smit <jasper.smit@servicenow.com>
Discussion: https://www.postgresql.org/message-id/CAOG+RQ74x0q=kgBBQ=mezuvOeZBfSxM1qu_o0V28bwDz3dHxLw@mail.gmail.com
Backpatch-through: 14
Since ce0fdbfe97, a replication slot and an origin are created by each
tablesync worker, whose information is stored in both a catalog and
shared memory (once the origin is set up in the latter case). The
transaction where the origin is created is the same as the one that runs
the initial COPY, with the catalog state of the origin becoming visible
for other sessions only once the COPY transaction has committed. The
catalog state is coupled with a state in shared memory, initialized at
the same time as the origin created in the catalogs. Note that the
transaction doing the initial data sync can take a long time, time that
depends on the amount of data to transfer from a publication node to its
subscriber node.
Now, when a DROP SUBSCRIPTION is executed, all its workers are stopped
with the origins removed. The removal of each origin relies on a
catalog lookup. A worker still running the initial COPY would fail its
transaction, with the catalog state of the origin rolled back while the
shared memory state remains around. The session running the DROP
SUBSCRIPTION should be in charge of cleaning up the catalog and the
shared memory state, but as there is no data in the catalogs the shared
memory state is not removed. This issue would leave orphaned origin
data in shared memory, leading to a confusing state as it would still
show up in pg_replication_origin_status. Note that this shared memory
data is sticky, being flushed on disk in replorigin_checkpoint at
checkpoint. This prevents other origins from reusing a slot position
in the shared memory data.
To address this problem, the commit moves the creation of the origin at
the end of the transaction that precedes the one executing the initial
COPY, making the origin immediately visible in the catalogs for other
sessions, giving DROP SUBSCRIPTION a way to know about it. A different
solution would have been to clean up the shared memory state using an
abort callback within the tablesync worker. The solution of this commit
is more consistent with the apply worker that creates an origin in a
short transaction.
A test is added in the subscription test 004_sync.pl, which was able to
display the problem. The test fails when this commit is reverted.
Reported-by: Tenglong Gu <brucegu@amazon.com>
Reported-by: Daisuke Higuchi <higudai@amazon.com>
Analyzed-by: Michael Paquier <michael@paquier.xyz>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aUTekQTg4OYnw-Co@paquier.xyz
Backpatch-through: 14
off_t was previously used for offsets, which is 4 bytes on Windows,
hence limiting the backend code to a hard limit for files longer than
2GB. This leads to some simplification in these files, removing some
casts based on long, also 4 bytes on Windows.
This commit removes one comment introduced in db3c4c3a2d, not relevant
anymore as pgoff_t is a safe 8-byte alternative on Windows.
This change is surprisingly not invasive, as the callers of
BufFileTell(), BufFileSeek() and BufFileTruncateFileSet() (worker.c,
tuplestore.c, etc.) track offsets in local structures that just to
switch from off_t to pgoff_t for the most part.
The file is still relying on a maximum file size of
MAX_PHYSICAL_FILESIZE (1GB). This change allows the code to make this
maximum potentially larger in the future, or larger on a per-demand
basis.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aUStrqoOCDRFAq1M@paquier.xyz
Previously, only the first option in a parenthesized option list was
suggested by tab completion. This commit enhances tab completion for
both COPY TO and COPY FROM commands to suggest options after each
comma.
Also add completion for HEADER and FREEZE option value candidates.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20250605100835.b396f9d656df1018f65a4556@sraoss.co.jp
Correct the referenced location of the RangeTblEntry definition
in the pg_overexplain documentation.
Backpatched to v18, where pg_overexplain was introduced.
Author: Julien Tachoires <julien@tachoires.me>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/20251218092319.tht64ffmcvzqdz7u@poseidon.home.virt
Backpatch-through: 18
nbtree index-only scans of an index that uses btree/name_ops as one of
its index column's input opclasses are no longer at any risk of reading
past the end of currTuples. We're no longer reliant on such scans being
able to at least read from the start of markTuples storage (which uses
space from the same allocation as currTuples) to avoid a segfault:
StoreIndexTuple (from nodeIndexonlyscan.c) won't actually read past the
end of a cstring datum from a name_ops index. In other words, we
already have the "special-case treatment for name_ops" that the removed
comment supposed we could avoid by relying on markTuples in this way.
Oversight in commit a63224be49, which added special case handling of
name_ops cstrings to StoreIndexTuple, but missed these comments.
An unused variable caused a compiler warning on BF animal fairywren, an
snprintf() call was redundant, and some buffer sizes were inconsistent.
Per code review from Tom Lane.
The Makefile's test ifeq ($(PORTNAME), win32) never succeeded due to a
circularity, so only Meson builds were actually compiling the new test
code, partially explaining why CI didn't tell us about the warning
sooner (the other problem being that CompilerWarnings only makes
world-bin, a problem for another commit). Simplify.
Backpatch-through: 16, like commit c507ba55
Author: Bryan Green <dbryan.green@gmail.com>
Co-authored-by: Thomas Munro <tmunro@gmail.com>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1086088.1765593851%40sss.pgh.pa.us
The main optimization is for LockBufHdr() to delay initializing
SpinDelayStatus, similar to what LWLockWaitListLock already did. The
initialization is sufficiently expensive & buffer header lock acquisitions are
sufficiently frequent, to make it worthwhile to instead have a fastpath (via a
likely() branch) that does not initialize the SpinDelayStatus.
While LWLockWaitListLock() already the aforementioned optimization, it did not
use likely(), and inspection of the assembly shows that this indeed leads to
worse code generation (also observed in a microbenchmark). Fix that by adding
the likely().
While the LockBufHdr() improvement is a small gain on its own, it mainly is
aimed at preventing a regression after a future commit, which requires
additional locking to set hint bits.
While touching both, also make the comments more similar to each other.
Reviewed-by: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
In the wake of commit db6a4a985, remove most use of 'md5' from the
example configuration file. The only remainder is an example exception
for a client that doesn't support SCRAM.
Author: Mikael Gustavsson <mikael.gustavsson@smhi.se>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://postgr.es/m/176595607507.978865.11597773194269211255@wrigleys.postgresql.org
Discussion: https://postgr.es/m/4ed268473fdb4cf9b0eced6c8019d353@smhi.se
Backpatch-through: 18
Previously, if memory context logging was triggered repeatedly and
rapidly while a previous request was still being processed, it could
result in recursive calls to ProcessLogMemoryContextInterrupt().
This could lead to infinite recursion and potentially crash the process.
This commit adds a guard to prevent such recursion.
If ProcessLogMemoryContextInterrupt() is already in progress and
logging memory contexts, subsequent calls will exit immediately,
avoiding unintended recursive calls.
While this scenario is unlikely in practice, it's not impossible.
This change adds a safety check to prevent such failures.
Back-patch to v14, where memory context logging was introduced.
Reported-by: Robert Haas <robertmhaas@gmail.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Artem Gavrilov <artem.gavrilov@percona.com>
Discussion: https://postgr.es/m/CA+TgmoZMrv32tbNRrFTvF9iWLnTGqbhYSLVcrHGuwZvCtph0NA@mail.gmail.com
Backpatch-through: 14
All the code paths updated here have been using relation_close() to
close a relation that has already been opened with table_open() or
index_open(), where a relkind check is enforced.
table_close() and index_open() do the same thing as relation_close(), so
there was no harm, but being inconsistent could lead to issues if the
internals of these close() functions begin to introduce some logic
specific to each relkind in the future.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aUKamYGiDKO6byp5@ip-10-97-1-34.eu-west-3.compute.internal
Commit 0decd5e89d missed
DO_SUBSCRIPTION_REL, leading to assertion failures. In the unlikely use
case of diffing "pg_dump --binary-upgrade" output, spurious diffs were
possible. As part of fixing that, align the DumpableObject naming and
sort order with DO_PUBLICATION_REL. The overall effect of this commit
is to change sort order from (subname, srsubid) to (rel, subname).
Since DO_SUBSCRIPTION_REL is only for --binary-upgrade, accept that
larger-than-usual dump order change. Back-patch to v17, where commit
9a17be1e24 introduced DO_SUBSCRIPTION_REL.
Reported-by: vignesh C <vignesh21@gmail.com>
Author: vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/CALDaNm2x3rd7C0_HjUpJFbxpAqXgm=QtoKfkEWDVA8h+JFpa_w@mail.gmail.com
Backpatch-through: 17
Operations on unlogged relations should not be WAL-logged. The
brin_initialize_empty_new_buffer() function didn't get the memo.
The function is only called when a concurrent update to a brin page
uses up space that we're just about to insert to, which makes it
pretty hard to hit. If you do manage to hit it, a full-page WAL record
is erroneously emitted for the unlogged index. If you then crash,
crash recovery will fail on that record with an error like this:
FATAL: could not create file "base/5/32819": File exists
Author: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/CALdSSPhpZXVFnWjwEBNcySx_vXtXHwB2g99gE6rK0uRJm-3GgQ@mail.gmail.com
Backpatch-through: 14
Commit 0d2d4a0ec3 introduced a test that verifies replication slot
synchronization to a standby server via SQL API. However, the test did not
configure synchronized_standby_slots. Without this setting, logical
failover slots can advance beyond the physical replication slot, causing
intermittent synchronization failures.
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Discussion: https://postgr.es/m/TY4PR01MB16907DF70205308BE918E0D4494ABA@TY4PR01MB16907.jpnprd01.prod.outlook.com
This change has been suggested by the two authors listed in this commit,
both of them providing an incomplete solution (David's formula relied on
a "bytea *", while Bertrand's did not use palloc_array()). The solution
provided in this commit uses GBT_VARKEY instead of the inconsistent
bytea for the allocation size, with a palloc_array().
The change related to Vsrt is one I am flipping to a more consistent
style, in passing.
Author: David Geier <geidav.pg@gmail.com>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
Discussion: https://postgr.es/m/aTrG3Fi4APtfiCvQ@ip-10-97-1-34.eu-west-3.compute.internal
4ba012a8ed defined the "header" (pointer to the stats data) of
from_serialized_data() as a const, even though it is fine (and
expected!) for the callback to modify the shared memory entry when
loading the stats at startup.
While on it, this commit updates the callback to_serialized_data() in
the test module test_custom_stats to make the data extracted from the
"header" parameter a const since it should never be modified: the stats
are written to disk and no modifications are expected in the shared
memory entry.
This clarifies the API contract of these new callbacks.
Reported-By: Peter Eisentraut <peter@eisentraut.org>
Author: Michael Paquier <michael@paquier.xyz>
Co-authored-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/d87a93b0-19c7-4db6-b9c0-d6827e7b2da1@eisentraut.org
Commit e0f373ee4 fixed up races in Cluster::connect_fails when using
log_like. t/002_client.pl didn't get the memo, though, because it
doesn't use Test::Cluster to perform its custom hook tests. (This is
probably not an issue at the moment, since the log check is only done
after authentication success and not failure, but there's no reason to
wait for someone to hit it.)
Introduce the fix, based on debug2 logging, to its use of log_check() as
well, and move the logic into the test() helper so that any additions
don't need to continually duplicate it.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAOYmi%2BmrGg%2Bn_X2MOLgeWcj3v_M00gR8uz_D7mM8z%3DdX1JYVbg%40mail.gmail.com
Backpatch-through: 18
The definition of PGoauthBearerRequest uses a temporary SOCKTYPE macro
to hide the difference between Windows and Berkeley socket handles,
since we don't surface pgsocket in our public API. This macro doesn't
need to escape the header, because implementers will choose the correct
socket type based on their platform, so I #undef'd it immediately after
use.
I didn't namespace that helper, though, so if anyone else needs a
SOCKTYPE macro, libpq-fe.h will now unhelpfully get rid of it. This
doesn't seem too far-fetched, given its proximity to existing POSIX
macro names.
Add a PQ_ prefix to avoid collisions, update and improve the surrounding
documentation, and backpatch.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAOYmi%2BmrGg%2Bn_X2MOLgeWcj3v_M00gR8uz_D7mM8z%3DdX1JYVbg%40mail.gmail.com
Backpatch-through: 18
I originally used just "regress-VERSION.mo", but that seems too
generic considering that some packagers will put this file into
a system-wide directory. Per suggestion from Christoph Berg.
Reported-by: Christoph Berg <myon@debian.org>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/aULSW7Xqx5MqDW_1@msg.df7cb.de
The test is very sensitive to how backends start and exit, because it
tests dead-end backends which occur when all the connection slots are
in use. The test failed occasionally in the CI, when the backend that
was launched for the raw_connect_works() check lingered for a while,
and exited only later during the test. When it exited, it released a
connection slot, when the test expected all the slots to be in use at
that time.
The 002_connection_limits.pl test had a similar issue: if the backend
launched for safe_psql() in the test initialization lingers around, it
uses up a connection slot during the test, messing up the test's
connection counting. I haven't seen that in the CI, but when I added a
"sleep(1);" to proc_exit(), the test failed.
To make the tests more robust, restart the server to ensure that the
lingering backends doesn't interfere with the later test steps.
In the passing, fix a bogus test name.
Report and analysis by Jelte Fennema-Nio, Andres Freund, Thomas Munro.
Discussion: https://www.postgresql.org/message-id/CAGECzQSU2iGuocuP+fmu89hmBmR3tb-TNyYKjCcL2M_zTCkAFw@mail.gmail.com
Backpatch-through: 18
Allow pg_createsubscriber to reuse existing publications instead of
failing when they already exist on the publisher.
Previously, pg_createsubscriber would fail if any specified publication
already existed. Now, existing publications are reused as-is with their
current configuration, and non-existing publications are created
automatically with FOR ALL TABLES.
This change provides flexibility when working with mixed scenarios of
existing and new publications. Users should verify that existing
publications have the desired configuration before reusing them, and can
use --dry-run with verbose mode to see which publications will be reused
and which will be created.
Only publications created by pg_createsubscriber are cleaned up during
error cleanup operations. Pre-existing publications are preserved unless
'--clean=publications' is explicitly specified, which drops all
publications.
This feature would be helpful for pub-sub configurations where users want
to subscribe to a subset of tables from the publisher.
Author: Shubham Khanna <khannashubham1197@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: tianbing <tian_bing_0531@163.com>
Discussion: https://postgr.es/m/CAHv8Rj%2BsxWutv10WiDEAPZnygaCbuY2RqiLMj2aRMH-H3iZwyA%40mail.gmail.com
This change makes pgstat_report_vacuum() more consistent with
pgstat_report_analyze(), that also uses a Relation. This enforces a
policy that callers of this routine should open and lock the relation
whose statistics are updated before calling this routine. We will
unlikely have a lot of callers of this routine in the tree, but it seems
like a good idea to imply this requirement in the long run.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aUEA6UZZkDCQFgSA@ip-10-97-1-34.eu-west-3.compute.internal
strlcpy() ensures that a target string is zero-terminated, so there is
no need to enforce it a second time in this code. This simplification
could have been done in 0eb23285a2.
Author: Feilong Meng <feelingmeng@foxmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/tencent_771178777C5BC17FCB7F7A1771CD1FFD5708@qq.com
ICU still depends on libc for compatibility with certain historical
behavior for single-byte encodings. Make the dependency explicit by
holding a locale_t object when required.
We should consider a better solution in the future, such as decoding
the text to UTF-32 and using u_tolower(). That would be a behavior
change and require additional infrastructure though; so for now, just
avoid the global LC_CTYPE dependency.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
Previously, libc's tolower() was always used for lowercasing
identifiers, regardless of the database locale (though only characters
beyond 127 in single-byte encodings were affected). Refactor to allow
each provider to supply its own implementation of identifier
downcasing.
For historical compatibility, when using a single-byte encoding, ICU
still relies on tolower().
One minor behavior change is that, before the database default locale
is initialized, it uses ASCII semantics to downcase the
identifiers. Previously, it would use the postmaster's LC_CTYPE
setting from the environment. While that could have some effect during
GUC processing, for example, it would have been fragile to rely on the
environment setting anyway. (Also, it only matters when the encoding
is single-byte.)
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
Previously, ltree_prefix_eq_ci() used lowercasing with the default
collation; while ltree_crc32_sz() used tolower() directly. These were
equivalent only if the default collation provider was libc and the
encoding was single-byte.
Change both to use casefolding with the default collation.
Backpatch through 18, where the casefolding APIs were introduced. The
bug exists in earlier versions, but would require some adaptation.
A REINDEX is required for ltree indexes where the database default
collation is not libc.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Backpatch-through: 18
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
Discussion: https://postgr.es/m/01fc00fd66f641b9693d4f9f1af0ccf44cbdfbdf.camel@j-davis.com
Previously, the API for ltree_strncasecmp() took two inputs but only
one length (that of the smaller input). It truncated the larger input
to that length, but that could break a multibyte sequence.
Change the API to be a check for prefix equality (possibly
case-insensitive) instead, which is all that's needed by the
callers. Also, provide the lengths of both inputs.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/5f65b85740197ba6249ea507cddf609f84a6188b.camel%40j-davis.com
Backpatch-through: 14
We already do this in CreateParallelContext, InitializeParallelDSM, and
LaunchParallelWorkers. I suspect the reason why the matching logic was
omitted from ReinitializeParallelDSM is that I failed to realize that
any memory allocation was happening here -- but shm_mq_attach does
allocate, which could result in a shm_mq_handle being allocated in a
shorter-lived context than the ParallelContext which points to it.
That could result in a crash if the shorter-lived context is freed
before the parallel context is destroyed. As far as I am currently
aware, there is no way to reach a crash using only code that is
present in core PostgreSQL, but extensions could potentially trip
over this. Fixing this in the back-branches appears low-risk, so
back-patch to all supported versions.
Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Co-authored-by: Jeevan Chalke <jeevan.chalke@enterprisedb.com>
Backpatch-through: 14
Discussion: http://postgr.es/m/CAKZiRmwfVripa3FGo06=5D1EddpsLu9JY2iJOTgbsxUQ339ogQ@mail.gmail.com
Further research shows that the reason commit 7db6809ce failed
is that recent glibc versions short-circuit translation attempts
when LC_MESSAGES is 'C.<encoding>', not only when it's 'C'.
There seems no way around that, so we'll have to live with only
testing NLS when a suitable real locale is installed.
However, something can still be salvaged: it still seems like a
good idea to verify that the PRI* macros work as-expected even when
we can't check their translations (see f8715ec86 for motivation).
Hence, adjust the test to always run the ereport calls, and tweak
the parameter values in hopes of detecting any cases where there's
confusion about the actual widths of the parameters.
Discussion: https://postgr.es/m/1991599.1765818338@sss.pgh.pa.us
heap_page_prune_and_freeze() fills in PruneState->deadoffsets, the array
of OffsetNumbers of dead tuples. It is returned to the caller in the
PruneFreezeResult. To avoid having two copies of the array, the
PruneState saves only a pointer to the array. This was a bit unusual and
confusing, so add a clarifying comment.
Author: Melanie Plageman <melanieplageman@gmail.com>
Suggested-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2=jiD1nqch4JQN+odAxZSD7mRvdoHUGJYN2r6tQG_66yQ@mail.gmail.com
The const qualification of the presult argument to prune_freeze_setup()
is later cast away, so it was not correct. Remove it and add a comment
explaining that presult should not be modified.
Author: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/fb97d0ae-a0bc-411d-8a87-f84e7e146488%40eisentraut.org
Commit 119fc30 moved CompareType to cmptype.h but the mention in
the docs still refered to primnodes.h
Author: Daisuke Higuchi <higuchi.daisuke11@gmail.com>
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAEVT6c8guXe5P=L_Un5NUUzCgEgbHnNcP+Y3TV2WbQh-xjiwqA@mail.gmail.com
Backpatch-through: 18
In the wake of commit b45242fd3, bytea_sortsupport() still called out
to varstr_sortsupport(). Treating bytea as a kind of text/varchar
required varstr_sortsupport() to allow for the possibility of
NUL bytes, but only for C collation. This was confusing. For
better separation of concerns, create an independent sortsupport
implementation in bytea.c.
The heuristics for bytea_abbrev_abort() remain the same as for
varstr_abbrev_abort(). It's possible that the bytea case warrants
different treatment, but that is left for future investigation.
In passing, adjust some strange looking comparisons in
varstr_abbrev_abort().
Author: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAJ7c6TP1bAbEhUJa6+rgceN6QJWMSsxhg1=mqfSN=Nb-n6DAKg@mail.gmail.com
This commit provides test coverage for dc7c77f825, where the redo
record and the checkpoint record finish on different WAL segments with
the start of recovery able to detect that the redo record is missing.
This test uses a wait injection point done in the critical section of a
checkpoint, method that requires not one but actually two wait injection
points to avoid any memory allocations within the critical section of
the checkpoint:
- Checkpoint run with a background psql.
- One first wait point is run by the checkpointer before the critical
section, allocating the shared memory required by the DSM registry for
the wait machinery in the library injection_points.
- First point is woken up.
- Second wait point is loaded before the critical section, allocating
the memory to build the path to the library loaded, then run in the
critical section once the checkpoint redo record has been logged.
- WAL segment is switched while waiting on the second point.
- Checkpoint completes.
- Stop cluster with immediate mode.
- The segment that includes the redo record is removed.
- Start, recovery fails as the redo record cannot be found.
The error message introduced in dc7c77f825 is now reduced to a FATAL,
meaning that the information is still provided while being able to use a
test for it. Nitin has provided a basic version of the test, that I
have enhanced to make it portable with two points. Without
dc7c77f825, the cluster crashes in this test, not on a PANIC but due
to the pointer dereference at the beginning of recovery, failure
mentioned in the other commit.
Author: Nitin Jadhav <nitinjadhavpostgres@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAMm1aWaaJi2w49c0RiaDBfhdCL6ztbr9m=daGqiOuVdizYWYaA@mail.gmail.com
This commit adds an extra check at the beginning of recovery to ensure
that the redo record of a checkpoint exists before attempting WAL
replay, logging a PANIC if the redo record referenced by the checkpoint
record could not be found. This is the same level of failure as when a
checkpoint record is missing. This check is added when a cluster is
started without a backup_label, after retrieving its checkpoint record.
The redo LSN used for the check is retrieved from the checkpoint record
successfully read.
In the case where a backup_label exists, the startup process already
fails if the redo record cannot be found after reading a checkpoint
record at the beginning of recovery.
Previously, the presence of the redo record was not checked. If the
redo and checkpoint records were located on different WAL segments, it
would be possible to miss a entire range of WAL records that should have
been replayed but were just ignored. The consequences of missing the
redo record depend on the version dealt with, these becoming worse the
older the version used:
- On HEAD, v18 and v17, recovery fails with a pointer dereference at the
beginning of the redo loop, as the redo record is expected but cannot be
found. These versions are good students, because we detect a failure
before doing anything, even if the failure is misleading in the shape of
a segmentation fault, giving no information that the redo record is
missing.
- In v16 and v15, problems show at the end of recovery within
FinishWalRecovery(), the startup process using a buggy LSN to decide
from where to start writing WAL. The cluster gets corrupted, still it
is noisy about it.
- v14 and older versions are worse: a cluster gets corrupted but it is
entirely silent about the matter. The redo record missing causes the
startup process to skip entirely recovery, because a missing record is
the same as not redo being required at all. This leads to data loss, as
everything is missed between the redo record and the checkpoint record.
Note that I have tested that down to 9.4, reproducing the issue with a
version of the author's reproducer slightly modified. The code is wrong
since at least 9.2, but I did not look at the exact point of origin.
This problem has been found by debugging a cluster where the WAL segment
including the redo segment was missing due to an operator error, leading
to a crash, based on an investigation in v15.
Requesting archive recovery with the creation of a recovery.signal or
a standby.signal even without a backup_label would mitigate the issue:
if the record cannot be found in pg_wal/, the missing segment can be
retrieved with a restore_command when checking that the redo record
exists. This was already the case without this commit, where recovery
would re-fetch the WAL segment that includes the redo record. The check
introduced by this commit makes the segment to be retrieved earlier to
make sure that the redo record can be found.
On HEAD, the code will be slightly changed in a follow-up commit to not
rely on a PANIC, to include a test able to emulate the original problem.
This is a minimal backpatchable fix, kept separated for clarity.
Reported-by: Andres Freund <andres@anarazel.de>
Analyzed-by: Andres Freund <andres@anarazel.de>
Author: Nitin Jadhav <nitinjadhavpostgres@gmail.com>
Discussion: https://postgr.es/m/20231023232145.cmqe73stvivsmlhs@awork3.anarazel.de
Discussion: https://postgr.es/m/CAMm1aWaaJi2w49c0RiaDBfhdCL6ztbr9m=daGqiOuVdizYWYaA@mail.gmail.com
Backpatch-through: 14
This reverts commit 7db6809ced.
That doesn't seem to work with recent (last couple of years)
glibc, and the reasons are obscure. I can't let the farm stay
this broken for long.
Now that the prior commits have fixed missing OAuth translations, pull
the bespoke usage of libpq_gettext() for OAUTHBEARER parsing into
oauth_json_set_error() itself, and make that a gettext trigger as well,
to better match what the other sites are doing. Add an _internal()
variant to handle the existing untranslated case.
Suggested-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/0EEBCAA8-A5AC-4E3B-BABA-0BA7A08C361B%40yesql.se
Backpatch-through: 18
Some error messages are generated when OAuth multiplexer operations fail
unexpectedly in the client. Álvaro pointed out that these are both
difficult to translate idiomatically (as they use internal terminology
heavily) and of dubious translation value to end users (since they're
going to need to get developer help anyway). The response parsing engine
has a similar issue.
Remove these from the translation files by introducing internal variants
of actx_error() and oauth_parse_set_error().
Suggested-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CAOYmi%2BkQQ8vpRcoSrA5EQ98Wa3G6jFj1yRHs6mh1V7ohkTC7JA%40mail.gmail.com
Backpatch-through: 18
Several strings that should have been translated as they passed through
libpq_gettext were not actually being pulled into the translation files,
because I hadn't directly wrapped them in one of the GETTEXT_TRIGGERS.
Move the responsibility for calling libpq_gettext() to the code that
sets actx->errctx. Doing the same in report_type_mismatch() would result
in double-translation, so mark those strings with gettext_noop()
instead. And wrap two ternary operands with gettext_noop(), even though
they're already in one of the triggers, since xgettext sees only the
first.
Finally, fe-auth-oauth.c was missing from nls.mk, so none of that file
was being translated at all. Add it now.
Original patch by Zhijie Hou, plus suggested tweaks by Álvaro Herrera
and small additions by me.
Reported-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Co-authored-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/TY4PR01MB1690746DB91991D1E9A47F57E94CDA%40TY4PR01MB16907.jpnprd01.prod.outlook.com
Backpatch-through: 18
This commit adds a new "void *arg" parameter to
GetNamedDSMSegment() that is passed to the initialization callback
function. This is useful for reusing an initialization callback
function for multiple DSM segments.
Author: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAN4CZFMjh8TrT9ZhWgjVTzBDkYZi2a84BnZ8bM%2BfLPuq7Cirzg%40mail.gmail.com
This removes a never-used CacheInvalidateHeapTupleInplace() parameter.
It adds README content about inplace update visibility in logical
decoding. It rewrites other comments.
Back-patch to v18, where commit 243e9b40f1
first appeared. Since this removes a CacheInvalidateHeapTupleInplace()
parameter, expect a v18 ".abi-compliance-history" edit to follow. PGXN
contains no calls to that function.
Reported-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reported-by: Ilyasov Ian <ianilyasov@outlook.com>
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Surya Poondla <s_poondla@apple.com>
Discussion: https://postgr.es/m/CA+renyU+LGLvCqS0=fHit-N1J-2=2_mPK97AQxvcfKm+F-DxJA@mail.gmail.com
Backpatch-through: 18
I had supposed that the majority of machines with gettext installed
would have most language locales installed, but at least in the
buildfarm it turns out less than half have es_ES installed. So
depending on that to run the test now seems like a bad idea. But it
turns out that gettext can be persuaded to "translate" even in the C
locale, as long as you fake out its short-circuit logic by spelling
the locale name like "C.UTF-8" or similar. (Many thanks to Bryan
Green for correcting my misconceptions about that.) Quick testing
suggests that that spelling is accepted by most platforms, though
again the buildfarm may show that "most" isn't "all".
Hence, remove the es_ES dependency and instead create a "C" message
catalog. I've made the test unconditionally set lc_messages to
'C.UTF-8'. That approach might need adjustment depending on what
the buildfarm shows, but let's keep it simple until proven wrong.
While at it, tweak the test so that we run the various ereport's
even when !ENABLE_NLS. This is useful to verify that the macros
provided by <inttypes.h> are compatible with snprintf.c, as we
now know is worth questioning.
Discussion: https://postgr.es/m/1991599.1765818338@sss.pgh.pa.us
Previously, like_fixed_prefix() used char-at-a-time logic, which
forced it to be too conservative for case-insensitive matching.
Introduce like_fixed_prefix_ci(), and use that for case-insensitive
pattern prefixes. It uses multibyte and locale-aware logic, along with
the new pg_iswcased() API introduced in 630706ced0.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
Late-model gcc with -fsanitize=undefined enabled issues warnings
about uses of PageGetItemId() when it can't prove that the
offsetNumber is > 0. The call sites where this happens are
checking that the offnum is <= PageGetMaxOffsetNumber(page), so
it seems reasonable to add an explicit check that offnum >= 1 too.
While at it, rearrange the code to be less contorted and avoid
duplicate checks on PageGetMaxOffsetNumber. Maybe the compiler
would optimize away the duplicate logic or maybe not, but the
existing coding has little to recommend it anyway.
There are multiple instances of this identical coding pattern in
heapam.c and heapam_xlog.c. Current gcc only complains about two
of them, but I fixed them all in the name of consistency.
Potentially this could be back-patched in the name of silencing
warnings; but I think enabling UBSAN is mainly something people
would do on HEAD, so for now it seems not worth the trouble.
Discussion: https://postgr.es/m/1699806.1765746897@sss.pgh.pa.us
The workload to generate multixids before upgrade is very slow on
buildfarm members running with JIT enabled. The workload runs a lot of
small queries, so it's unsurprising that JIT makes it slower. On my
laptop it nevertheless runs in under 10 s even with JIT enabled, while
some buildfarm members have been hitting the 180 s timeout. That seems
extreme, but I suppose it's still expected on very slow and busy
buildfarm animals. The timeout applies to the BackgroundPsql sessions
as whole rather than the individual queries.
Bump up the timeout to avoid the test failures. Add periodic progress
reports to the test output so that we get a better picture of just how
slow the test is.
In the passing, also fix comments about how many multixids and members
the workload generates. The comments were written based on 10 parallel
connections, but it actually uses 20.
Discussion: https://www.postgresql.org/message-id/b7faf07c-7d2c-4f35-8c43-392e057153ef@gmail.com
In the server, check explicitly for multixids with zero members. We
used to have an assertion for it, but commit d4b7bde418 replaced it
with more extensive runtime checks, but it missed the original case of
zero members.
In the upgrade code, a negative length never makes sense, so better
check for it explicitly. Commit d4b7bde418 added a similar sanity
check to the corresponding server code on master, and in backbranches,
the 'length' is passed to palloc which would fail with "invalid memory
alloc request size" error. Clarify the comments on what kind of
invalid entries are tolerated by the upgrade code and which ones are
reported as fatal errors.
Coverity complained about 'length' in the upgrade code being
tainted. That's bogus because we trust the data on disk at least to
some extent, but hopefully this will silence the complaint. If not,
I'll dismiss it manually.
Discussion: https://www.postgresql.org/message-id/7b505284-c6e9-4c80-a7ee-816493170abc@iki.fi
We have tried to stabilize them several times already, but they are very
flaky -- apparently there's some intrinsic instability that's hard to
solve with the isolationtester framework. They are very noisy in CI
runs (whereas buildfarm has not registered any such failures). They may
need to be rewritten completely. In the meantime just comment them out
in Makefile/meson.build, leaving the spec files around.
Per complaint from Andres Freund.
Discussion: https://postgr.es/m/202512112014.icpomgc37zx4@alvherre.pgsql
HAVE__STATIC_ASSERT was really a test for GCC statement expressions,
as needed for StaticAssertExpr() now that _Static_assert could be
assumed to be available through our C11 requirement. This
artificially prevented Visual Studio from being able to use
static_assert() in other contexts.
Instead, make a new test for HAVE_STATEMENT_EXPRESSIONS, and use that
to control only whether StaticAssertExpr() uses fallback code, not the
other variants. This improves the quality of failure messages in the
(much more common) other variants under Visual Studio.
Also get rid of the two separate implementations for C++, since the C
implementation is also also valid as C++11. While it is a stretch to
apply HAVE_STATEMENT_EXPRESSIONS tested with $CC to a C++ compiler,
the previous C++ coding assumed that the C++ compiler had them
unconditionally, so it isn't a new stretch. In practice, the C and
C++ compilers are very likely to agree, and if a combination is ever
reported that falsifies this assumption we can always reconsider that.
Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGKvr0x_oGmQTUkx%3DODgSksT2EtgCA6LmGx_jQFG%3DsDUpg%40mail.gmail.com
Coverity complained that offset cannot be 0 here because there's an
explicit check for "offset == 0" earlier in the function, but it
didn't see the possibility that offset could've wrapped around to 0.
The code is correct, but clarify the comment about it.
The same code exists in backbranches in the server
GetMultiXactIdMembers() function and in 'master' in the pg_upgrade
GetOldMultiXactIdSingleMember function. In backbranches Coverity
didn't complain about it because the check was merely an assertion,
but change the comment in all supported branches for consistency.
Per Tom Lane's suggestion.
Discussion: https://www.postgresql.org/message-id/1827755.1765752936@sss.pgh.pa.us
Previously, pg_sync_replication_slots() would finish without synchronizing
slots that didn't meet requirements, rather than failing outright. This
could leave some failover slots unsynchronized if required catalog rows or
WAL segments were missing or at risk of removal, while the standby
continued removing needed data.
To address this, the function now waits for the primary slot to advance to
a position where all required data is available on the standby before
completing synchronization. It retries cyclically until all failover slots
that existed on the primary at the start of the call are synchronized.
Slots created after the function begins are not included. If the standby
is promoted during this wait, the function exits gracefully and the
temporary slots will be removed.
Author: Ajin Cherian <itsajin@gmail.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Yilin Zhang <jiezhilove@126.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAFPTHDZAA%2BgWDntpa5ucqKKba41%3DtXmoXqN3q4rpjO9cdxgQrw%40mail.gmail.com
I have fat-fingered an error message related to an offset while
switching the code to use pgoff_t. Let's switch to the same error
message used in the rest of the tree for similar failures with fseeko(),
instead.
Per buildfarm members running macos: longfin, sifaka and indri.
gist_page_items() opens its target relation with index_open(), but
closed it using relation_close() instead of index_close(). This was
harmless because index_close() and relation_close() do the exact same
work, still inconsistent with the rest of the code tree as routines
opening and closing a relation based on a relkind are expected to match,
at least in name.
Author: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2=bL41WWcD-4Fxx-buS2Y2G5=9PjkxZbHeFMR6Uy2WNvw@mail.gmail.com
This commit builds upon 4ba012a8ed, giving an example of what can be
achieved with the new callbacks. This provides coverage for the new
pgstats APIs, while serving as a reference template.
Note that built-in stats kinds could use them, we just don't have a
use-case there yet.
Author: Sami Imseih <samimseih@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0s9SDOu+Z6veoJCHWk+kDeTktAtC-KY9fQ9Z6BJdDUirQ@mail.gmail.com
Cumulative stats kinds gain the capability to write additional per-entry
data when flushing the stats at shutdown, and read this data when
loading back the stats at startup. This can be fit for example in the
case of variable-length data (like normalized query strings), so as it
becomes possible to link the shared memory stats entries to data that is
stored in a different area, like a DSA segment.
Three new optional callbacks are added to PgStat_KindInfo, available to
variable-numbered stats kinds:
* to_serialized_data: writes auxiliary data for an entry.
* from_serialized_data: reads auxiliary data for an entry.
* finish: performs actions after read/write/discard operations. This is
invoked after processing all the entries of a kind, allowing extensions
to close file handles and clean up resources.
Stats kinds have the option to store this data in the existing pgstats
file, but can as well store it in one or more additional files whose
names can be built upon the entry keys. The new serialized callbacks
are called once an entry key is read or written from the main stats
file. A file descriptor to the main pgstats file is available in the
arguments of the callbacks.
Author: Sami Imseih <samimseih@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0s9SDOu+Z6veoJCHWk+kDeTktAtC-KY9fQ9Z6BJdDUirQ@mail.gmail.com
The current list from the buildfarm includes quite a few typedef
names that it used to miss. The reason is a bit obscure, but it
seems likely to have something to do with our recent increased
use of palloc_object and palloc_array. In any case, this makes
the relevant struct declarations be much more nicely formatted,
so I'll take it. Install the current list and re-run pgindent
to update affected code.
Syncing with the current list also removes some obsolete
typedef names and fixes some alphabetization errors.
Discussion: https://postgr.es/m/1681301.1765742268@sss.pgh.pa.us
There doesn't seem to be any great reason why this has been a macro
rather than a typedef. But doing it like that means our buildfarm
typedef tooling doesn't capture the name as a typedef. That would
result in pgindent glitches, except that we've seemingly kept it
in typedefs.list manually. That's obviously error-prone, so let's
convert it to a typedef now.
Discussion: https://postgr.es/m/1681301.1765742268@sss.pgh.pa.us
While commit 5b275a6e1 fixed a few unhappy buildfarm animals,
it looks like the remainder simply don't have any es_ES locale
at all. There's little point in running the test in that case,
so minimize the number of variant expected-files by bailing out.
Also emit a log entry so that it's possible to tell from buildfarm
postmaster logs which case occurred.
Possibly, the scope of this testing could be improved by providing
additional translations. But I think it's likely that the failing
animals have no non-C locales installed at all.
In passing, update typedefs.list so that koel doesn't think
regress.c is misformatted.
Discussion: https://postgr.es/m/E1vUpNU-000kcQ-1D@gemulon.postgresql.org
The private refcount entry for a buffer is often looked up repeatedly for the
same buffer, e.g. to pin and then unpin a buffer. Benchmarking shows that it's
worthwhile to have a one-entry cache for that case. With that cache in place,
it's worth splitting GetPrivateRefCountEntry() into a small inline
portion (for the cache hit case) and an out-of-line helper for the rest.
This is helpful for some workloads today, but becomes more important in an
upcoming patch that will utilize the private refcount infrastructure to also
store whether the buffer is currently locked, as that increases the rate of
lookups substantially.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/6rgb2nvhyvnszz4ul3wfzlf5rheb2kkwrglthnna7qhe24onwr@vw27225tkyar
This makes lookups faster, due to allowing auto-vectorized lookups. It is also
beneficial for an upcoming patch, independent of auto-vectorization, as the
upcoming patch wants to track more information for each pinned buffer, making
the existing loop, iterating over an array of PrivateRefCountEntry, more
expensive due to increasing its size.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
While CI testing in advance of commit 8c498479d suggested that all
Unix-ish platforms would accept 'es_ES.UTF-8', the buildfarm has
a different opinion. Let's dynamically select something that works,
if possible.
Discussion: https://postgr.es/m/E1vUpNU-000kcQ-1D@gemulon.postgresql.org
Coverity complained about this, not without reason:
OldMultiXactReader *state = state = pg_malloc(sizeof(*state));
(I'm surprised this is even legal C ... why is "state" in-scope
in its initialization expression?)
While at it, convert to use our newly-preferred "pg_malloc_object"
macro instead of an explicit sizeof().
We've never actually had a formal test for this facility.
It seems worth adding one now, mainly because we are starting
to depend on gettext() being able to handle the PRI* macros
from <inttypes.h>, and it's not all that certain that that
works everywhere. So the test goes to a bit of effort to
check all the PRI* macros we are likely to use.
(This effort has indeed found one problem already, now fixed
in commit f8715ec86.)
Discussion: https://postgr.es/m/3098752.1765221796@sss.pgh.pa.us
Discussion: https://postgr.es/m/292844.1765315339@sss.pgh.pa.us
Change WAIT_LSN_TYPE_COUNT from an enum sentinel to a macro definition,
in a similar way to IOObject, IOContext, and BackendType enums. Remove
explicit enum value assignments well.
Author: Xuneng Zhou <xunengzhou@gmail.com>
f2e4cc4279 and 4b3d173629 implement ALTER TABLE ... MERGE/SPLIT
PARTITION(s) commands. In several places, these commits use palloc(),
where we should use palloc_object() and palloc_array(). This commit
provides appropriate usage of palloc_object() and palloc_array().
Reported-by: Man Zeng <zengman@halodbtech.com>
Discussion: https://postgr.es/m/tencent_3661BB522D5466B33EA33666%40qq.com
This new DDL command splits a single partition into several partitions. Just
like the ALTER TABLE ... MERGE PARTITIONS ... command, new partitions are
created using the createPartitionTable() function with the parent partition
as the template.
This commit comprises a quite naive implementation which works in a single
process and holds the ACCESS EXCLUSIVE LOCK on the parent table during all
the operations, including the tuple routing. This is why the new DDL command
can't be recommended for large, partitioned tables under high load. However,
this implementation comes in handy in certain cases, even as it is. Also, it
could serve as a foundation for future implementations with less locking and
possibly parallelism.
Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru
Author: Dmitry Koval <d.koval@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Co-authored-by: Tender Wang <tndrwang@gmail.com>
Co-authored-by: Richard Guo <guofenglinux@gmail.com>
Co-authored-by: Dagfinn Ilmari Mannsaker <ilmari@ilmari.org>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Co-authored-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Robert Haas <rhaas@postgresql.org>
Reviewed-by: Stephane Tachoires <stephane.tachoires@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Daniel Gustafsson <dgustafsson@postgresql.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
This new DDL command merges several partitions into a single partition of the
target table. The target partition is created using the new
createPartitionTable() function with the parent partition as the template.
This commit comprises a quite naive implementation which works in a single
process and holds the ACCESS EXCLUSIVE LOCK on the parent table during all
the operations, including the tuple routing. This is why this new DDL
command can't be recommended for large partitioned tables under a high load.
However, this implementation comes in handy in certain cases, even as it is.
Also, it could serve as a foundation for future implementations with less
locking and possibly parallelism.
Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru
Author: Dmitry Koval <d.koval@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Co-authored-by: Tender Wang <tndrwang@gmail.com>
Co-authored-by: Richard Guo <guofenglinux@gmail.com>
Co-authored-by: Dagfinn Ilmari Mannsaker <ilmari@ilmari.org>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Co-authored-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Robert Haas <rhaas@postgresql.org>
Reviewed-by: Stephane Tachoires <stephane.tachoires@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Daniel Gustafsson <dgustafsson@postgresql.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
The reference to the test module test_custom_stats should have been
added under the section "Custom Cumulative Statistics", but the section
"Injection Points" has been updated instead, reversing the references
for both test modules.
d52c24b0f8 has removed a paragraph that was correct, and 31280d96a6
has added a paragraph that was incorrect.
Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0s4heX926+ZNh63u12gLd9jgauU6yiirKc7xGo1G01PXQ@mail.gmail.com
In commit b61aa76e4 I added an assumption in jsonb_object_agg_finalfn
that it'd be okay to apply uniqueifyJsonbObject repeatedly to a
JsonbValue. I should have studied that code more closely first,
because in skip_nulls mode it removed leading nulls by changing the
"pairs" array start pointer. This broke the data structure's
invariants in two ways: pairs no longer references a repalloc-able
chunk, and the distance from pairs to the end of its array is less
than parseState->size. So any subsequent addition of more pairs is
at high risk of clobbering memory and/or causing repalloc to crash.
Unfortunately, adding more pairs is exactly what will happen when the
aggregate is being used as a window function.
Fix by rewriting uniqueifyJsonbObject to not do that. The prior
coding had little to recommend it anyway.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/ec5e96fb-ee49-4e5f-8a09-3f72b4780538@gmail.com
When relptr.h was added (commit fbc1c12a94), there was no check for
HAVE_TYPEOF, so it used HAVE__BUILTIN_TYPES_COMPATIBLE_P, which
already existed (commit ea473fb2de) and which was thought to cover
approximately the same compilers. But the guarded code can also work
without HAVE__BUILTIN_TYPES_COMPATIBLE_P, and we now have a check for
HAVE_TYPEOF (commit 4cb824699e), so let's fix this up to use the
correct logic.
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/CA%2BhUKGL7trhWiJ4qxpksBztMMTWDyPnP1QN%2BLq341V7QL775DA%40mail.gmail.com
It's as pointless as ASC/DESC and NULLS FIRST/LAST are, so reject all of
them in the same way. While at it, normalize the others' error messages
to have less translatable strings. Add tests for these errors.
Noticed while reviewing recent INSERT ON CONFLICT patches.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/202511271516.oiefpvn3z27m@alvherre.pgsql
Before this commit, when multixid wraparound happens,
MultiXactState->nextMXact goes to 0, which is invalid. All the readers
need to deal with that possibility and skip over the 0. That's
error-prone and we've missed it a few times in the past. This commit
changes the responsibility so that all the writers of
MultiXactState->nextMXact skip over the zero already, and readers can
trust that it's never 0.
We were already doing that for MultiXactState->oldestMultiXactId; none
of its writers would set it to 0. ReadMultiXactIdRange() was
nevertheless checking for that possibility. For clarity, remove that
check.
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Maxim Orlov <orlovmg@gmail.com>
Discussion: https://www.postgresql.org/message-id/3624730d-6dae-42bf-9458-76c4c965fb27@iki.fi
The fix for concurrent index operations in bc32a12e0d started
considering indexes that are not yet marked indisvalid as arbiters for
INSERT ON CONFLICT. For partitioned tables, this leads to including
indexes that may not exist in partitions, causing a trivially
reproducible "invalid arbiter index list" error to be thrown because of
failure to match the index. To fix, it suffices to ignore !indisvalid
indexes on partitioned tables. There should be no risk that the set of
indexes will change for concurrent transactions, because in order for
such an index to be marked valid, an ALTER INDEX ATTACH PARTITION must
run which requires AccessExclusiveLock.
Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/17622f79-117a-4a44-aa8e-0374e53faaf0%40gmail.com
It's not far-fetched that we'd try to read a multixid with an invalid
offset in case of bugs or corruption. Or if you call
pg_get_multixact_members() after a crash that left behind invalid but
unused multixids. Better to get a somewhat descriptive error message
if that happens.
Discussion: https://www.postgresql.org/message-id/3624730d-6dae-42bf-9458-76c4c965fb27@iki.fi
Previously, c.h made <assert.h> only available in frontends (#ifdef
FRONTEND), which was probably reasonable, because the only thing it
would give you is assert(), which you generally shouldn't use in the
backend. But with C11, <assert.h> also makes available
static_assert(), which would be useful everywhere. So this patch
moves <assert.h> to the commonly available header files in c.h and
fixes a small complication in regcustom.h that resulted from that.
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CA%2BhUKGKvr0x_oGmQTUkx%3DODgSksT2EtgCA6LmGx_jQFG%3DsDUpg%40mail.gmail.com
This is the last batch of changes that have been suggested by the
author, this part covering the non-trivial changes. Some of the changes
suggested have been discarded as they seem to lead to more instructions
generated, leaving the parts that can be qualified as in-place
replacements.
Similar work has been done in 1b105f9472, 0c3c5c3b06 and
31d3847a37.
Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
The code over-allocated the memory required for os_page_status, relying
on uint64 for its element size instead of an int, hence doubling what
was required. This could mean quite a lot of memory if dealing with a
lot of NUMA pages.
Oversight in ba2a3c2302.
Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
Backpatch-through: 18
Previously, during a promotion, only the slot synchronization worker was
signaled to shut down. The backend executing slot synchronization via the
pg_sync_replication_slots() SQL function was not signaled, allowing it to
complete its synchronization cycle before exiting.
An upcoming patch improves pg_sync_replication_slots() to wait until
replication slots are fully persisted before finishing. This behaviour
requires the backend to exit promptly if a promotion occurs.
This patch ensures that, during promotion, a signal is also sent to the
backend running pg_sync_replication_slots(), allowing it to be interrupted
and exit immediately.
Author: Ajin Cherian <itsajin@gmail.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAFPTHDZAA%2BgWDntpa5ucqKKba41%3DtXmoXqN3q4rpjO9cdxgQrw%40mail.gmail.com
Make it clear why _bt_killitems sorts the scan's so->killedItems[]
array. Also add an assertion to the _bt_killitems loop (that iterates
through this array) to verify it accesses tuples in leaf page order.
Follow-up to commit bfb335df58.
Author: Peter Geoghegan <pg@bowt.ie>
Suggested-by: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAGnEboirgArezZDNeFrR8FOGvKF-Xok333s2iVwWi65gZf8MEA@mail.gmail.com
An array of LLVMBasicBlockRef is allocated with the size used for an
element being "LLVMBasicBlockRef *" rather than "LLVMBasicBlockRef".
LLVMBasicBlockRef is a type that refers to a pointer, so this did not
directly cause a problem because both should have the same size, still
it is incorrect.
This issue has been spotted while reviewing a different patch, and
exists since 2a0faed9d7, so backpatch all the way down.
Discussion: https://postgr.es/m/CA+hUKGLngd9cKHtTUuUdEo2eWEgUcZ_EQRbP55MigV2t_zTReg@mail.gmail.com
Backpatch-through: 14
Although clang claims to be compatible with gcc's printf format
archetypes, this appears to be a falsehood: it likes __syslog__
(which gcc does not, on most platforms) and doesn't accept
gnu_printf. This means that if you try to use gcc with clang++
or clang with g++, you get compiler warnings when compiling
printf-like calls in our C++ code. This has been true for quite
awhile, but it's gotten more annoying with the recent appearance
of several buildfarm members that are configured like this.
To fix, run separate probes for the format archetype to use with the
C and C++ compilers, and conditionally define PG_PRINTF_ATTRIBUTE
depending on __cplusplus.
(We could alternatively insist that you not mix-and-match C and
C++ compilers; but if the case works otherwise, this is a poor
reason to insist on that.)
No back-patch for now, but we may want to do that if this
patch survives buildfarm testing.
Discussion: https://postgr.es/m/986485.1764825548@sss.pgh.pa.us
Always return TIDs in descending order when returning groups of TIDs
from an nbtree posting list tuple during nbtree backwards scans. This
makes backwards scans tend to require fewer buffer hits, since the scan
is less likely to repeatedly pin and unpin the same heap page/buffer
(we'll get exactly as many buffer hits as we get with a similar forwards
scan case).
Commit 0d861bbb, which added nbtree deduplication, originally did things
this way to avoid interfering with _bt_killitems's approach to setting
LP_DEAD bits on posting list tuples. _bt_killitems makes a soft
assumption that it can always iterate through posting lists in ascending
TID order, finding corresponding killItems[]/so->currPos.items[] entries
in that same order. This worked out because of the prior _bt_readpage
backwards scan behavior. If we just changed the backwards scan posting
list logic in _bt_readpage, without altering _bt_killitems itself, it
would break its soft assumption.
Avoid that problem by sorting the so->killedItems[] array at the start
of _bt_killitems. That way the order that dead items are saved in from
btgettuple can't matter; so->killedItems[] will always be in the same
order as so->currPos.items[] in the end. Since so->currPos.items[] is
now always in leaf page order, regardless of the scan direction used
within _bt_readpage, and since so->killedItems[] is always in that same
order, the _bt_killitems loop can continue to make a uniform assumption
about everything being in page order. In fact, sorting like this makes
the previous soft assumption about item order into a hard invariant.
Also deduplicate the so->killedItems[] array after it is sorted. That
way there's no risk of the _bt_killitems loop becoming confused by a
duplicate dead item/TID. This was possible in cases that involved a
scrollable cursor that encountered the same dead TID more than once
(within the same leaf page/so->currPos context). This doesn't come up
very much in practice, but it seems best to be as consistent as possible
about how and when _bt_killitems will LP_DEAD-mark index tuples.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Mircea Cadariu <cadariu.mircea@gmail.com>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=Wut2pKvbW-u3hJ_LXwsYeiXHiW8oN1GfbKPavcGo8Ow@mail.gmail.com
It's only useful for an ILIKE optimization for the libc provider using
a single-byte encoding and a non-C locale, but it creates significant
internal complexity.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
The test seemed to incorrectly think that query_safe() takes an
argument that describes what the query does, similar to e.g.
command_ok(). Until commit bd8d9c9bdf the extra arguments were
harmless and were just ignored, but when commit bd8d9c9bdf introduced
a new optional argument to query_safe(), the extra arguments started
clashing with that, causing the test to fail.
Backpatch to v17, that's the oldest branch where the test exists. The
extra arguments didn't cause any trouble on the older branches, but
they were clearly bogus anyway.
1. The test initially focuses on the "parent" table, then switches to
the "part" table, and goes back to the "parent" table. That seems a
little weird, so move the tests around so that all the commands on the
"parent" table are done first, followed by the "part" table.
2. ALTER TABLE ALTER COLUMN SET EXPRESSION was not tested, so add
that.
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/CACJufxFDi7fnwB-8xXd_ExML7-7pKbTaK4j46AJ=4-14DXvtVg@mail.gmail.com
Phase I vacuum gives the page a once-over after pruning and freezing to
check that the values of all_visible and all_frozen agree with the
result of heap_page_is_all_visible(). This is meant to keep the logic in
phase I for determining visibility in sync with the logic in phase III.
Rewrite the assertion to avoid an Assert(false).
Suggested by Andres Freund.
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/mhf4vkmh3j57zx7vuxp4jagtdzwhu3573pgfpmnjwqa6i6yj5y%40sy4ymcdtdklo
These functions took a ResourceOwner argument, but only checked if it
was NULL, and then used CurrentResourceOwner for the actual work.
Surely the intention was to use the passed-in resource owner. All
current callers passed CurrentResourceOwner or NULL, so this has no
consequences at the moment, but it's an accident waiting to happen for
future caller and extensions.
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAEze2Whnfv8VuRZaohE-Af+GxBA1SNfD_rXfm84Jv-958UCcJA@mail.gmail.com
Backpatch-through: 17
pthread_exit() is added to the list of symbols allowed when building
libpq. This has been reported as possible when libpq is statically
linked to libcrypto, where pthread_exit() could be called.
Reported-by: Torsten Rupp <torsten.rupp@gmx.net>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/19095-6d8256d0c37d4be2@postgresql.org
Buildfarm members skimmer and crake have reported that pg_upgrade
running from v18 fails due to the changes of d52c24b0f8, with the
expectations that the objects removed in the test module
injection_points should still be present post upgrades, but the test
module does not have them anymore.
The origin of the issue is that the following test modules depend on
injection_points, but they do not drop the extension once the tests
finish, leaving its traces in the dumps used for the upgrades:
- gin, down to v17
- typcache, down to v18
- nbtree, HEAD-only
Test modules have no upgrade requirements, as they are used only for..
Tests, so there is no point in keeping them around.
An alternative solution would be to drop the databases created by these
modules in AdjustUpgrade.pm, but the solution of this commit to drop the
extension is simpler. Note that there would be a catch if using a
solution based on AdjustUpgrade.pm as the database name used for the
test runs differs between configure and meson:
- configure relies on USE_MODULE_DB for the database name unicity, that
would build a database name based on the *first* entry of REGRESS, that
lists all the SQL tests.
- meson relies on a "name" field.
For example, for the test module "gin", the regression database is named
"regression_gin" under meson, while it is more complex for configure, as
of "contrib_regression_gin_incomplete_splits". So a AdjustUpgrade.pm
would need a set of DROP DATABASE IF EXISTS to solve this issue, to cope
with each build system.
The failure has been caused by d52c24b0f8, and the problem can happen
with upgrade dumps from v17 and v18 to HEAD. This problem is not
currently reachable in the back-branches, but it could be possible that
a future change in injection_points in stable branches invalidates this
theory, so this commit is applied down to v17 in the test modules that
matter.
Per discussion with Tom Lane and Heikki Linnakangas.
Discussion: https://postgr.es/m/2899652.1765167313@sss.pgh.pa.us
Backpatch-through: 17
The warning was showing up in the early stages of the meson build, when
the contents of Makefile.global is generated based on the configuration
of meson for PGXS.
NM is added to pgxs_empty. This declaration is only used internally for
the libpq sanity check, so there is no point in exposing it in PGXS.
Oversight in 4a8e6f43a6.
Reported-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/4423e01f-1e52-4f47-a6ca-05cc8081c888@eisentraut.org
A comment in tuplesort.c was claiming that the code was defining
INITIAL_MEMTUPSIZE so that it *does not* exceed
ALLOCSET_SEPARATE_THRESHOLD, but the code actually ensures that we
purposefully *do* exceed ALLOCSET_SEPARATE_THRESHOLD for the initial
allocation of the tuples array, as per reasons detailed in the
commentary of grow_memtuples().
Also, there's not much need to repeat the mention about
ALLOCSET_SEPARATE_THRESHOLD in each location where INITIAL_MEMTUPSIZE is
used, so remove those comments.
Author: ChangAo Chen <cca5507@qq.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Discussion: https://postgr.es/m/tencent_6FA14F85D6B5B5291532D6789E07F4765C08%40qq.com
The idea is to encourage more the use of these new routines across the
tree, as these offer stronger type safety guarantees than palloc().
This batch of changes includes most of the trivial changes suggested by
the author for src/backend/.
A total of 334 files are updated here. Among these files, 48 of them
have their build change slightly; these are caused by line number
changes as the new allocation formulas are simpler, shaving around 100
lines of code in total.
Similar work has been done in 0c3c5c3b06 and 31d3847a37.
Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
PostgreSQL's src/port/open.c has always set bInheritHandle = TRUE
when opening files on Windows, making all file descriptors inheritable
by child processes. This meant the O_CLOEXEC flag, added to many call
sites by commit 1da569ca1f (v16), was silently ignored.
The original commit included a comment suggesting that our open()
replacement doesn't create inheritable handles, but it was a mis-
understanding of the code path. In practice, the code was creating
inheritable handles in all cases.
This hasn't caused widespread problems because most child processes
(archive_command, COPY PROGRAM, etc.) operate on file paths passed as
arguments rather than inherited file descriptors. Even if a child
wanted to use an inherited handle, it would need to learn the numeric
handle value, which isn't passed through our IPC mechanisms.
Nonetheless, the current behavior is wrong. It violates documented
O_CLOEXEC semantics, contradicts our own code comments, and makes
PostgreSQL behave differently on Windows than on Unix. It also creates
potential issues with future code or security auditing tools.
To fix, define O_CLOEXEC to _O_NOINHERIT in master, previously used by
O_DSYNC. We use different values in the back branches to preserve
existing values. In pgwin32_open_handle() we set bInheritHandle
according to whether O_CLOEXEC is specified, for the same atomic
semantics as POSIX in multi-threaded programs that create processes.
Backpatch-through: 16
Author: Bryan Green <dbryan.green@gmail.com>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com> (minor adjustments)
Discussion: https://postgr.es/m/e2b16375-7430-4053-bda3-5d2194ff1880%40gmail.com
This new option instructs vacuumdb to print, but not execute, the
VACUUM and ANALYZE commands that would've been sent to the server.
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CADkLM%3DckHkX7Of5SrK7g0LokPUwJ%3Dkk8JU1GXGF5pZ1eBVr0%3DQ%40mail.gmail.com
This commit refactors the code for marking a ParallelSlot as idle
to a new static inline function. This can be used to mark a slot
that was obtained via ParallelSlotGetIdle() but that we don't
intend to actually use for a query as idle again.
This is preparatory work for a follow-up commit that will add a
--dry-run option to vacuumdb.
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com
Discussion: https://postgr.es/m/CADkLM%3DckHkX7Of5SrK7g0LokPUwJ%3Dkk8JU1GXGF5pZ1eBVr0%3DQ%40mail.gmail.com
Presently, the "echo" and "quiet" variables are carted around to
various functions, which is a bit tedious. To simplify things,
this commit moves them into the vacuumingOptions struct and removes
the related function parameters. While at it, remove some
redundant initialization code in vacuumdb's main() function.
This is preparatory work for a follow-up commit that will add a
--dry-run option to vacuumdb.
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CADkLM%3DckHkX7Of5SrK7g0LokPUwJ%3Dkk8JU1GXGF5pZ1eBVr0%3DQ%40mail.gmail.com
The new column, started_by, indicates the initiator of the
analyze ('manual' or 'autovacuum'), helping users and monitoring tools
to better understand ANALYZE behavior.
Bump catalog version.
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Yu Wang <wangyu_runtime@163.com>
Discussion: https://postgr.es/m/CAA5RZ0suoicwxFeK_eDkUrzF7s0BVTaE7M%2BehCpYcCk5wiECpw%40mail.gmail.com
The new columns, mode and started_by, indicate the vacuum
mode ('normal', 'aggressive', or 'failsafe') and the initiator of the
vacuum ('manual', 'autovacuum', or 'autovacuum_wraparound'),
respectively. This allows users and monitoring tools to better
understand VACUUM behavior.
Bump catalog version.
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Yu Wang <wangyu_runtime@163.com>
Discussion: https://postgr.es/m/CAOzEurQcOY-OBL_ouEVfEaFqe_md3vB5pXjR_m6L71Dcp1JKCQ@mail.gmail.com
POSIX has for a long time defined the "j" length modifier for
printf conversions as meaning the size of intmax_t or uintmax_t.
We got away without supporting that so far, because we were not
using intmax_t anywhere. However, commit e6be84356 re-introduced
upstream's use of intmax_t and PRIdMAX into zic.c. It emerges
that on some platforms (at least FreeBSD and macOS), <inttypes.h>
defines PRIdMAX as "jd", so that snprintf.c falls over if that is
used. (We hadn't noticed yet because it would only be apparent
if bad data is fed to zic, resulting in an error report, and even
then the only visible symptom is a missing line number in the
error message.)
We could revert that decision from our copy of zic.c, but
on the whole it seems better to update snprintf.c to support
this standard modifier. There might well be extensions,
now or in future, that expect it to work.
I did this in the lazy man's way of translating "j" to either
"l" or "ll" depending on a compile-time sizeof() check, just
as was done long ago to support "z" for size_t. One could
imagine promoting intmax_t to have full support in snprintf.c,
for example converting fmtint()'s value argument and internal
arithmetic to use [u]intmax_t not [unsigned] long long. But
that'd be more work and I'm hesitant to do it anyway: if there
are any platforms out there where intmax_t is actually wider
than "long long", this would doubtless result in a noticeable
speed penalty to snprintf(). Let's not go there until we have
positive evidence that there's a reason to, and some way to
measure what size of penalty we're taking.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3210703.1765236740@sss.pgh.pa.us
This eliminates MultiXactOffset wraparound and the 2^32 limit on the
total number of multixid members. Multixids are still limited to 2^31,
but this is a nice improvement because 'members' can grow much faster
than the number of multixids. On such systems, you can now run longer
before hitting hard limits or triggering anti-wraparound vacuums.
Not having to deal with MultiXactOffset wraparound also simplifies the
code and removes some gnarly corner cases.
We no longer need to perform emergency anti-wraparound freezing
because of running out of 'members' space, so the offset stop limit is
gone. But you might still not want 'members' to consume huge amounts
of disk space. For that reason, I kept the logic for lowering vacuum's
multixid freezing cutoff if a large amount of 'members' space is
used. The thresholds for that are roughly the same as the "safe" and
"danger" thresholds used before, 2 billion transactions and 4 billion
transactions. This keeps the behavior for the freeze cutoff roughly
the same as before. It might make sense to make this smarter or
configurable, now that the threshold is only needed to manage disk
usage, but that's left for the future.
Add code to pg_upgrade to convert multitransactions from the old to
the new format, rewriting the pg_multixact SLRU files. Because
pg_upgrade now rewrites the files, we can get rid of some hacks we had
put in place to deal with old bugs and upgraded clusters. Bump catalog
version for the pg_multixact/offsets format change.
Author: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Discussion: https://www.postgresql.org/message-id/CACG%3DezaWg7_nt-8ey4aKv2w9LcuLthHknwCawmBgEeTnJrJTcw@mail.gmail.com
The description of deferrable constraints in create_table.sgml states
that deferrable constraints cannot be used as conflict arbitrators in
an INSERT with an ON CONFLICT DO UPDATE clause, but in fact this
restriction applies to all ON CONFLICT clauses, not just those with DO
UPDATE. Fix this, and while at it, change the word "arbitrators" to
"arbiters", to match the terminology used elsewhere.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWsybvZP3ce8rGcVNx-QHuDOJZDz8y=p1SzqHwjRXyV4Q@mail.gmail.com
Backpatch-through: 14
query_is_distinct_for() is intended to determine whether a query never
returns duplicates of the specified columns. For queries using
grouping sets, if there are no grouping expressions, the query may
contain one or more empty grouping sets. The goal is to detect
whether there is exactly one empty grouping set, in which case the
query would return a single row and thus be distinct.
The previous logic in query_is_distinct_for() was incomplete because
the check was insufficiently thorough and could return false when it
could have returned true. It failed to consider cases where the
DISTINCT clause is used on the GROUP BY, in which case duplicate empty
grouping sets are removed, leaving only one. It also did not
correctly handle all possible structures of GroupingSet nodes that
represent a single empty grouping set.
To fix, add a check for the groupDistinct flag, and expand the query's
groupingSets tree into a flat list, then verify that the expanded list
contains only one element.
No backpatch as this could result in plan changes.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAMbWs480Z04NtP8-O55uROq2Zego309+h3hhaZhz6ztmgWLEBw@mail.gmail.com
Similar to the issue with constraint and statistics expressions fixed
in 317c117d6, index expressions and predicate can also suffer from
incorrect reduction of NullTest clauses during const-simplification,
due to unfixed varnos and the use of a NULL root. It has been
reported that this issue can cause the planner to fail to pick up a
partial index that it previously matched successfully.
Because we need to cache the const-simplified index expressions and
predicate in the relcache entry, we cannot fix the Vars before
applying eval_const_expressions. To ensure proper reduction of
NullTest clauses, this patch runs eval_const_expressions a second time
-- after the Vars have been fixed and with a valid root.
It could be argued that the additional call to eval_const_expressions
might increase planning time, but I don't think that's a concern. It
only runs when index expressions and predicate are present; it is
relatively cheap when run on small expression trees (which is
typically the case for index expressions and predicate), and it runs
on expressions that have already been const-simplified once, making
the second pass even cheaper. In return, in cases like the one
reported, it allows the planner to match and use partial indexes,
which can lead to significant execution-time improvements.
Bug: #19007
Reported-by: Bryan Fox <bryfox@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/19007-4cc6e252ed8aa54a@postgresql.org
Previously, the slotsync worker relied on SIGINT for graceful shutdown
during promotion. However, SIGINT is also used by the LOCK_TIMEOUT handler
to cancel queries. Since the slotsync worker can lock catalog tables while
parsing libpq tuples, this overlap caused it to ignore LOCK_TIMEOUT
signals and potentially wait indefinitely on locks.
This patch replaces the slotsync worker's SIGINT handler with
StatementCancelHandler to correctly process query-cancel interrupts.
Additionally, the startup process now uses SIGUSR1 to signal the slotsync
worker to stop during promotion. The worker exits after detecting that the
shared memory flag stopSignaled is set.
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 17, here it was introduced
Discussion: https://postgr.es/m/TY4PR01MB169078F33846E9568412D878C94A2A@TY4PR01MB16907.jpnprd01.prod.outlook.com
There were a number of useless casts in format arguments, either
where the input to the cast was already in the right type, or
seemingly uselessly casting between types instead of just using the
right format placeholder to begin with.
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/07fa29f9-42d7-4aac-8834-197918cbbab6%40eisentraut.org
The idea is to encourage more the use of these new routines across the
tree, as these offer stronger type safety guarantees than palloc().
The following paths are included in this batch, treating all the areas
proposed by the author for the most trivial changes, except src/backend
(by far the largest batch):
src/bin/
src/common/
src/fe_utils/
src/include/
src/pl/
src/test/
src/tutorial/
Similar work has been done in 31d3847a37.
The code compiles the same before and after this commit, with the
following exceptions due to changes in line numbers because some of the
new allocation formulas are shorter:
blkreftable.c
pgfnames.c
pl_exec.c
Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
After my recent commit 7902a47c20, Nathan noticed that
pg_atomic_unlocked_write_u64() was not accurately described by the comments
for the 32bit version. Turns out the 32bit version has suffered from
copy-and-paste-itis since its introduction. Fix.
Reported-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/aTGt7q4Jvn97uGAx@nathan
Plus a similar fix to the README.
Backpatch as far back as the sgml issue exists. The README issue does
exist in v14, but that seems unlikely to harm anyone.
Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ed3db7ea-55b4-4809-86af-81ad3bb2c7d3@gmail.com
Backpatch-through: 15
This commit refactors the sanity check done by libpq to ensure that
there is no exit() reference in the build, moving the check from a
standalone Makefile rule to a perl script.
Platform-specific checks are now part of the script, avoiding most of
the duplication created by the introduction of this check for meson, but
not all of them:
- Solaris and Windows skipped in the script.
- Whitelist of symbols is in the script.
- nm availability, with its path given as an option of the script. Its
execution is checked in the script.
- Check is disabled if coverage reports are enabled. This part is not
pushed down to the script.
- Check is disabled for static builds of libpq. This part is filtered
out in each build script.
A trick is required for the stamp file, in the shape of an optional
argument that can be given to the script. Meson expects the stamp in
output and uses this argument, generating the stamp file in the script.
Meson is able to handle the removal of the stamp file internally when
libpq needs to be rebuilt and the check done again.
This refactoring piece has come up while discussing the addition of more
items in the symbols considered as acceptable.
This sanity check has never been run by meson since its introduction in
dc227eb82e, so it is possible that this fails in some of the buildfarm
members. At least the CI is happy with it, but let's see how it goes.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Co-authored-by: VASUKI M <vasukim1992002@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/19095-6d8256d0c37d4be2@postgresql.org
The argument of isspace() (like other <ctype.h> functions)
must be cast to unsigned char to ensure portable results.
Per NetBSD buildfarm members. Oversight in 636c1914b.
Make _bt_readpage pass down the current scan direction to various
utility functions within its pstate variable. Also have _bt_readpage
work off of a local copy of scan->ignore_killed_tuples within its
per-tuple loop (rather than using scan->ignore_killed_tuples directly).
Testing has shown that this significantly benefits large range scans,
which are naturally able to take full advantage of the pstate.startikey
optimization added by commit 8a510275. Running a pgbench script with a
"SELECT abalance FROM pgbench_accounts WHERE aid BETWEEN ..." query
shows an increase in transaction throughput of over 5%. There also
appears to be a small performance benefit when running pgbench's
built-in select-only script.
Follow-up to commit 65d6acbc.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzmwMwcwKFgaf+mYPwiz3iL4AqpXnwtW_O0vqpWPXRom9Q@mail.gmail.com
Quite a bit of code within nbtutils.c is only called by _bt_readpage.
Move _bt_readpage and all of the nbtutils.c functions it depends on into
a new .c file, nbtreadpage.c. Also reorder some of the functions within
the new file for clarity.
This commit has no functional impact. It is strictly mechanical.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzmwMwcwKFgaf+mYPwiz3iL4AqpXnwtW_O0vqpWPXRom9Q@mail.gmail.com
Currently, we use special values that are otherwise invalid for each
option to indicate "option was not given". Replace that with separate
boolean variables for each option. It seems more clear to be explicit.
We were already doing that for the -m option, because there were no
invalid values for nextMulti that we could use (since commit
94939c5f3a).
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/81adf5f3-36ad-4bcd-9ba5-1b95c7b7a807@iki.fi
The strtoul() function that we used to parse many of the options
accepts negative values, and silently wraps them to the equivalent
unsigned values. For example, -1 becomes 0xFFFFFFFF, on platforms
where unsigned long is 32 bits wide. Also, on platforms where
"unsigned long" is 64 bits wide, we silently casted values larger than
UINT32_MAX to the equivalent 32-bit value. Both of those behaviors
seem undesirable, so tighten up the parsing to reject them.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/81adf5f3-36ad-4bcd-9ba5-1b95c7b7a807@iki.fi
When parse.pl processes braces, it does not take into account that
braces could also be their own token if single quoted ('{', '}').
This is not currently used but a future patch wants to make use of it.
This fixes that by using lookaround assertions to detect the quotes.
To make sure all Perl versions in play support this and to avoid
surprises later on, let's give this a spin on the buildfarm now. It
can exist independently of future work.
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/a855795d-e697-4fa5-8698-d20122126567@eisentraut.org
The code in BootStrapXLOG() and in pg_test_fsync.c tried to align WAL
buffers in complicated ways. Also, they still used XLOG_BLCKSZ for
the alignment, even though that should now be PG_IO_ALIGN_SIZE. This
can now be simplified and made more consistent by using
PGAlignedXLogBlock, either directly in BootStrapXLOG() and using
alignas in pg_test_fsync.c.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/f462a175-b608-44a1-b428-bdf351e914f4%40eisentraut.org
This test module acts as a replacement that existed prior to
d52c24b0f8 in the test module injection_points. It uses a more
flexible structure than its ancestor:
- Two libraries are built, one for fixed-sized stats and one for
variable-sized stats.
- No GUCs required. The stats are enabled only if one or both libraries
are loaded with shared_preload_libraries.
- Same kind IDs reserved: 25 (variable-sized) and 26 (fixed-sized)
The goal of this redesign is to be able to easier extend the code
coverage provided by this module for other changes that are currently
under discussion, and injection_points was not suited for these.
Injection points are also now widely used in the tree now, so extending
more the test coverage for custom pgstats in the test module
injection_points would be a riskier long-term move.
The new code is mostly a copy of what existed previously in the test
module injection_points, with the same callbacks defined for fixed-sized
and variable-sized stats, but a simpler overall structure in terms of
the stats counters updated.
The test coverage should remain the same as previously: one TAP test is
used to check data reports, crash recovery and clean restart scenarios.
Tests are added for the manual reset of fixed-sized stats, something
not tested until now.
Author: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAA5RZ0sJgO6GAwgFxmzg9MVP=rM7Us8KKcWpuqxe-f5qxmpE0g@mail.gmail.com
A race condition could cause a newly created replication slot to become
invalidated between WAL reservation and a checkpoint.
Previously, if the required WAL was removed, we retried the reservation
process. However, the slot could still be invalidated before the retry if
the WAL was not yet removed but the checkpoint advanced the redo pointer
beyond the slot's intended restart LSN and computed the minimum LSN that
needs to be preserved for the slots.
The fix is to acquire an exclusive lock on ReplicationSlotAllocationLock
during WAL reservation to serialize WAL reservation and checkpoint's
minimum restart_lsn computation. This ensures that, if WAL reservation
occurs first, the checkpoint waits until restart_lsn is updated before
removing WAL. If the checkpoint runs first, subsequent WAL reservations
pick a position at or after the latest checkpoint's redo pointer.
We can't use the same fix for branch 17 and prior because commit
2090edc6f3 changed to compute to the minimum restart_LSN among slot's at
the beginning of checkpoint (or restart point). The fix for 17 and prior
branches is under discussion and will be committed separately.
Reported-by: suyu.cmj <mengjuan.cmj@alibaba-inc.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Vitaly Davydov <v.davydov@postgrespro.ru>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/5e045179-236f-4f8f-84f1-0f2566ba784c.mengjuan.cmj@alibaba-inc.com
The test module injection_points has been used as a landing spot to
provide coverage for the custom pgstats APIs, for both fixed-sized and
variable-sized stats kinds. Some recent work related to pgstats is
proving that this structure makes the implementation of new tests
harder.
This commit removes the code related to pgstats from injection_points,
and an equivalent will be reintroduced as a separate test module in a
follow-up commit. This removal is done in its own commit for clarity.
Using injection_points for this test coverage was perhaps not the best
way to design things, but this was good enough while working on the
first flavor of the custom pgstats APIs. Using a new test module will
make easier the introduction of new tests, and we will not need to worry
about the impact of new changes related to custom pgstats could have
with the internals of injection_points.
Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0sJgO6GAwgFxmzg9MVP=rM7Us8KKcWpuqxe-f5qxmpE0g@mail.gmail.com
The error details updated in this commit can be reached in the
regression tests. They did not follow the project style, and they
should be written them as full sentences.
Some of the errors are switched to use an elog(), for cases that involve
paths that cannot be reached based on the previous state of the parser
processing the input data (array start, object end, etc.). The error
messages for these cases use now a more consistent style across the
board, with the state of the parser reported for debugging.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Michael Paquier <michael@paquier.xyz>
Co-authored-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/1353179.1764901790@sss.pgh.pa.us
find_variable() and its subroutines transiently scribble on the
passed-in "name" string, even though we've declared that "const".
The string is in fact temporary, so this is not very harmful,
but it's confusing and will produce compiler warnings with
late-model gcc. Rearrange the code so that instead of modifying
the given string, we make temporary copies of the parts that we
need separated out. (I used loc_alloc so that the copies are
short-lived and don't need to be freed explicitly.)
This code is poorly structured and confusing, to the point where
my first attempt to fix it was wrong. It is also under-tested,
allowing the broken v1 patch to nonetheless pass regression.
I'll restrain myself from rewriting it completely, and just add
some comments and more test cases.
We will probably want to back-patch this once gcc 15.2 becomes
more widespread, but for now just put it in master.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1324889.1764886170@sss.pgh.pa.us
The general case for converting to a JSONB numeric value is to run the
source datatype's output function and then numeric_in, but we can do
substantially better than that for integer and numeric source values.
This patch improves the speed of jsonb_agg by 30% for integer input,
and nearly 2X for numeric input.
Sadly, the obvious idea of using float4_numeric and float8_numeric
to speed up those cases doesn't work: they are actually slower than
the generic coerce-via-I/O method, and not by a small amount.
They might round off differently than this code has historically done,
too. Leave that alone pending possible changes in those functions.
We can also do better than the existing code for text/varchar/bpchar
source data; this optimization is similar to one that already exists
in the json_agg() code. That saves 20% or so for such inputs.
Also make a couple of other minor improvements, such as not giving
JSONTYPE_CAST its own special case outside the switch when it could
perfectly well be handled inside, and not using dubious string hacking
to detect infinity and NaN results.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/1060917.1753202222@sss.pgh.pa.us
The various variants of jsonb_agg() operate as follows,
for each aggregate input value:
1. Build a JsonbValue tree representation of the input value.
2. Flatten the JsonbValue tree into a Jsonb in on-disk format.
3. Iterate through the Jsonb, building a JsonbValue that is part
of the aggregate's state stored in aggcontext, but is otherwise
identical to what phase 1 built.
This is very slightly less silly than it sounds, because phase 1
involves calling non-JSONB code such as datatype output functions,
which are likely to leak memory, and we don't want to leak into the
aggcontext. Nonetheless, phases 2 and 3 are accomplishing exactly
nothing that is useful if we can make phase 1 put the JsonbValue
tree where we need it. We could probably do that with a bunch of
MemoryContextSwitchTo's, but what seems more robust is to give
pushJsonbValue the responsibility of building the JsonbValue tree
in a specified non-current memory context. The previous patch
created the infrastructure for that, and this patch simply makes
the aggregate functions use it and then rips out phases 2 and 3.
For me, this makes jsonb_agg() with a text column as input run
about 2X faster than before. It's not yet on par with json_agg(),
but this removes a whole lot of the difference.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/1060917.1753202222@sss.pgh.pa.us
Instead of passing "JsonbParseState **" to pushJsonbValue(),
pass a pointer to a JsonbInState, which will contain the
parseState stack pointer as well as other useful fields.
Also, instead of returning a JsonbValue pointer that is often
meaningless/ignored, return the top-level JsonbValue pointer
in the "result" field of the JsonbInState.
This involves a lot of (mostly mechanical) edits, but I think
the results are notationally cleaner and easier to understand.
Certainly the business with sometimes capturing the result of
pushJsonbValue() and sometimes not was bug-prone and incapable of
mechanical verification. In the new arrangement, JsonbInState.result
remains null until we've completed a valid sequence of pushes, so
that an incorrect sequence will result in a null-pointer dereference,
not mistaken use of a partial result.
However, this isn't simply an exercise in prettier notation.
The real reason for doing it is to provide a mechanism whereby
pushJsonbValue() can be told to construct the JsonbValue tree
in a context that is not CurrentMemoryContext. That happens
when a non-null "outcontext" is specified in the JsonbInState.
No callers exercise that option in this patch, but the next
patch in the series will make use of it.
I tried to improve the comments in this area too.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/1060917.1753202222@sss.pgh.pa.us
pg_type.typlen says 12 for the size of timetz, but sizeof(TimeTzADT)
will be 16 on most platforms due to alignment padding. Using the
sizeof number is no problem for usages such as palloc'ing a result
datum, but in usages such as datumCopy we really ought to match
what pg_type says. Add a macro TIMETZ_TYPLEN so that we have a
symbolic way to write that rather than hard-coding "12".
I cannot find any place where we've needed this so far, but an
upcoming patch requires it.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/2329959.1765047648@sss.pgh.pa.us
The SQL standard says that corr() and friends should return NULL in
the mathematically-undefined case where all the inputs in one of
the columns have the same value. We were checking that by seeing
if the sums Sxx and Syy were zero, but that approach is very
vulnerable to roundoff error: if a sum is close to zero but not
exactly that, we'd come out with a pretty silly non-NULL result.
Instead, directly track whether the inputs are all equal by
remembering the common value in each column. Once we detect
that a new input is different from before, represent that by
storing NaN for the common value. (An objection to this scheme
is that if the inputs are all NaN, we will consider that they
were not all equal. But under IEEE float arithmetic rules,
one NaN is never equal to another, so this behavior is arguably
correct. Moreover it matches what we did before in such cases.)
Then, leave the sums at their exact value of zero for as long
as we haven't detected different input values.
This solution requires the aggregate transition state to contain
8 float values not 6, which is not problematic, and it seems to add
less than 1% to the aggregates' runtime, which seems acceptable.
While we're here, improve corr()'s final function to cope with
overflow/underflow in the final calculation, and to clamp its
result to [-1, 1] in case of roundoff error.
Although this is arguably a bug fix, it requires a catversion bump
due to the change in aggregates' initial states, so it can't be
back-patched.
Patch written by me, but many of the ideas are due to Dean Rasheed,
who also did a deal of testing.
Bug: #19340
Reported-by: Oleg Ivanov <o15611@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Co-authored-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/19340-6fb9f6637f562092@postgresql.org
Previously, the 027_stream_regress test reported the full contents of
regression.diffs upon a test failure, when the standby and the primary
were still alive. If a test fails quite badly, the amount of
information reported can be really high, bloating the reports in the
buildfarm, the CI, or even local runs.
In most cases, we have noticed that having all this information is not
necessary when attempting to identify the source of a problem in this
test. This commit changes the situation by including the head and tail
of regression.diffs in the reports generated on failure rather than its
full contents, building upon b93f4e2f98 to optionally control the size
of the reports with the new environment variable
PG_TEST_FILE_READ_LINES.
This will perhaps require some more tuning, but the hope is to reduce
some of the buildfarm report bloat while making the information good
enough to deduce what is happening when something is going wrong, be it
in the buildfarm or some tests run in the CI, at least.
Suggested-by: Andres Freund <andres@anarazel.de>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAN55FZ1D6KXvjSs7YGsDeadqCxNF3UUhjRAfforzzP0k-cE=bA@mail.gmail.com
This function reads the lines from a file and filters its contents to
report its head and tail contents. The amount of contents to read from
a file can be tuned by the environment variable PG_TEST_FILE_READ_LINES,
that can be used to override the default of 50 lines. If the file whose
content is read has less lines than two times PG_TEST_FILE_READ_LINES,
the whole file is returned.
This will be used in a follow-up commit to limit the amount of
information reported by some of the TAP tests on failure, where we have
noticed that the contents reported by the buildfarm can be heavily
bloated in some cases, with the head and tail contents of a report being
able to provide enough information to be useful for debugging.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAN55FZ1D6KXvjSs7YGsDeadqCxNF3UUhjRAfforzzP0k-cE=bA@mail.gmail.com
Due to an off-by-one error, the code failed to find matches at the
end of the haystack. Fix by rewriting the loop.
While at it, fix a comment that claimed that the function could find
a zero-length match. Such a match could send a caller into an endless
loop. However, zero-length matches only make sense with an empty
search string, and that case is explicitly excluded by all callers.
To make sure it stays that way, add an Assert and a comment.
Bug: #19341
Reported-by: Adam Warland <adam.warland@infor.com>
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19341-1d9a22915edfec58@postgresql.org
Backpatch-through: 18
apply_scanjoin_target_to_paths wants to avoid useless work and
platform-specific dependencies by throwing away the path list created
prior to applying the final scan/join target and constructing a whole
new one using the final scan/join target. However, this is only valid
when we'll consider all the same strategies after the pathlist reset
as before.
After resetting the path list, we reconsider Append and MergeAppend
paths with the modified target list; therefore, it's only valid for a
partitioned relation. However, what the previous coding missed is that
it cannot be a partitioned join relation, because that also has paths
that are not Append or MergeAppend paths and will not be reconsidered.
Thus, before this patch, we'd sometimes choose a partitionwise strategy
with a higher total cost than cheapest non-partitionwise strategy,
which is not good.
We had a surprising number of tests cases that were relying on this
bug to work as they did. A big part of the reason for this is that row
counts in regression test cases tend to be low, which brings the cost
of partitionwise and non-partitionwise strategies very close together,
especially for merge joins, where the real and perceived advantages of
a partitionwise approach are minimal. In addition, one test case
included a row-count-inflating join. In such cases, a partitionwise
join can easily be a loser on cost, because the total number of tuples
passing through an Append node is much higher than it is with a
non-partitionwise strategy. That test case is adjusted by adding
additional join clauses to avoid the row count inflation.
Although the failure of the planner to choose the lowest-cost path is a
bug, we generally do not back-patch fixes of this type, because planning
is not an exact science and there is always a possibility that some user
will end up with a plan that has a lower estimated cost but actually
runs more slowly. Hence, no backpatch here, either.
The code change here is exactly what was originally proposed by
Ashutosh, but the changes to the comments and test cases have been
very heavily rewritten by me, helped along by some very useful advice
from Richard Guo.
Reported-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Author: Robert Haas <rhaas@postgresql.org>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Arne Roland <arne.roland@malkut.net>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: http://postgr.es/m/CAExHW5toze58+jL-454J3ty11sqJyU13Sz5rJPQZDmASwZgWiA@mail.gmail.com
Newest versions of gcc are able to detect cases where code implicitly
casts away const by assigning the result of strchr() or a similar
function applied to a "const char *" value to a target variable
that's just "char *". This of course creates a hazard of not getting
a compiler warning about scribbling on a string one was not supposed
to, so fixing up such cases is good.
This patch fixes a dozen or so places where we were doing that.
Most are trivial additions of "const" to the target variable,
since no actually-hazardous change was occurring. There is one
place in ecpg.trailer where we were indeed violating the intention
of not modifying a string passed in as "const char *". I believe
that's harmless not a live bug, but let's fix it by copying the
string before modifying it.
There is a remaining trouble spot in ecpg/preproc/variable.c,
which requires more complex surgery. I've left that out of this
commit because I want to study that code a bit more first.
We probably will want to back-patch this once compilers that detect
this pattern get into wider circulation, but for now I'm just
going to apply it to master to see what the buildfarm says.
Thanks to Bertrand Drouvot for finding a couple more spots than
I had.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/1324889.1764886170@sss.pgh.pa.us
The idea is to encourage more the use of these new routines across the
tree, as these offer stronger type safety guarantees than palloc(). In
an ideal world, palloc() would then act as an internal routine of these
flavors, whose footprint in the tree is minimal.
The patch sent by the author is very large, and this chunk of changes
represents something like 10% of the overall patch submitted.
The code compiled is the same before and after this commit, using
objdump to do some validation with a difference taken in-between. There
are some diffs, which are caused by changes in line numbers because some
of the new allocation formulas are shorter, for the following files:
trgm_regexp.c, xpath.c and pg_walinspect.c.
Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
Corey Huinker has come up with a recipe that is more compact and more
pleasant to the eye for extended stats because we know that all of them
are 1-dimension JSON arrays. This commit switches the extended stats
tests to use replace() instead of jsonb_pretty(), splitting the data so
as one line is used for each item in the extended stats object.
This results in the removal of a good chunk of test output, that is now
easier to debug with one line used for each item in a stats object.
This patch has not been provided by Corey. This is some post-commit
cleanup work that I have noticed as good enough to do on its own while
reviewing the rest of the patch set Corey has posted.
Discussion: https://postgr.es/m/CADkLM=csMd52i39Ye8-PUUHyzBb3546eSCUTh-FBQ7bzT2uZ4Q@mail.gmail.com
Commit 76b78721ca introduced two new columns in pg_stat_replication_slots
to improve monitoring of slot synchronization. One of these columns was
named slotsync_skip_at, which is inconsistent with the naming convention
used for similar columns in other system views.
Columns that store timestamps of the most recent event typically use the
'last_' in the column name (e.g., last_autovacuum, checksum_last_failure).
Renaming slotsync_skip_at to slotsync_last_skip aligns with this pattern,
making the purpose of the column clearer and improving overall consistency
across the views.
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Michael Banck <mbanck@gmx.net>
Discussion: https://postgr.es/m/20251128091552.GB13635@p46.dedyn.io;lightning.p46.dedyn.io
Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com
Some of the buildfarm members with some old gcc versions have been
complaining about an always-true test for a NULL pointer caused by a
combination of SOFT_ERROR_OCCURRED() and a local ErrorSaveContext
variable.
These warnings are taken care of by removing SOFT_ERROR_OCCURRED(),
switching to a direct variable check, like 56b1e88c80.
Oversights in e1405aa5e3 and 44eba8f06e.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1341064.1764895052@sss.pgh.pa.us
This commit adds the version information of a node initialized by
Cluster.pm, that may vary depending on the install_path given by the
test. The code was written so as the node information, that includes
the version number, was dumped before the version number was set.
This is particularly useful for the pg_upgrade TAP tests, that may mix
several versions for cross-version runs. The TAP infrastructure also
allows mixing nodes with different versions, so this information can be
useful for out-of-core tests.
Backpatch down to v15, where Cluster.pm and the pg_upgrade TAP tests
have been introduced.
Author: Potapov Alexander <a.potapov@postgrespro.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/e59bb-692c0a80-5-6f987180@170377126
Backpatch-through: 15
Adjust the prune_freeze_setup() parameter types of new_relfrozen_xid and
new_relmin_mxid to prevent misleading Coverity analysis.
heap_page_prune_and_freeze() compared these values against NULL when
passing them to prune_freeze_setup(), causing Coverity to assume they
could be NULL and flag a possible null-pointer dereference later, even
though it occurs inside a directly related conditional.
Reported-by: Coverity
Author: Melanie Plageman <melanieplageman@gmail.com>
The key is the first member of PrivateRefCountEntry, which has type
Buffer. This commit changes the key size from sizeof(int32) to
sizeof(Buffer). This appears to be an oversight in commit
4b4b680c3d, but it's of no consequence because Buffer has been a
signed 32-bit integer for a long time.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aS77DTpl0fOkIKSZ%40ip-10-97-1-34.eu-west-3.compute.internal
We were using SnapshotAny to do some index checks, but that's wrong and
causes spurious errors when used on indexes created by CREATE INDEX
CONCURRENTLY. Fix it to use an MVCC snapshot, and add a test for it.
This problem came in with commit 5ae2087202, which introduced
uniqueness check. Backpatch to 17.
Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Backpatch-through: 17
Discussion: https://postgr.es/m/CANtu0ojmVd27fEhfpST7RG2KZvwkX=dMyKUqg0KM87FkOSdz8Q@mail.gmail.com
Currently, headerscheck and cpluspluscheck are very slow, and they
defeat use of ccache. This fixes that, and now they are much faster.
The problem was that the test files are created in a randomly-named
directory (`mktemp -d /tmp/$me.XXXXXX`), and this directory is
mentioned on the compiler command line, which is part of the cache
key.
The solution is to create the test files in the build directory. For
example, for src/include/storage/ipc.h, we generate
tmp_headerscheck_c/src_include_storage_ipc_h.c (or .cpp)
Now ccache works. (And it's also a bit easier to debug everything
with this naming.)
(The subdirectory is used to keep the cleanup trap simple.)
The observed speedup on Cirrus CI for headerscheck plus cpluspluscheck
is from about 1min 20s to only 20s. In local use, the speedups are
similar.
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/b49e74d4-3cf9-4d1c-9dce-09f75e55d026%40eisentraut.org
The assertion checking MyProcNumber used MaxBackends as the upper
bound, but the procInfos array is allocated with size
MaxBackends + NUM_AUXILIARY_PROCS. This inconsistency would cause
a false assertion failure if an auxiliary process calls WaitForLSN().
Author: Xuneng Zhou <xunengzhou@gmail.com>
In an upcoming patch more wait events will be added to the wait event
class (for buffer locking), making the current name too
specific. Alternatively we could introduce a dedicated wait event class for
those, but it seems somewhat confusing to have a BUFFERPIN and a BUFFER wait
event class.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
Commit 1eccb9315 added a test case that will cause a large number
of evaluations of a plpgsql function. With -DCLOBBER_CACHE_ALWAYS,
that takes an unreasonable amount of time (hours) because the
function's cache entries are repeatedly deleted and rebuilt.
That doesn't add any useful test coverage --- other test cases
already exercise plpgsql well enough --- and it's not part of what
this test intended to cover. We can get the same planner coverage,
if not more, by making the test directly invoke numeric_lt().
Reported-by: Tomas Vondra <tomas@vondra.me>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/baf1ae02-83bd-4f5d-872a-1d04f11a9073@vondra.me
With this commit, the next multixid's offset will always be set on the
offsets page, by the time that a backend might try to read it, so we
no longer need the waiting mechanism with the condition variable. In
other words, this eliminates "corner case 2" mentioned in the
comments.
The waiting mechanism was broken in a few scenarios:
- When nextMulti was advanced without WAL-logging the next
multixid. For example, if a later multixid was already assigned and
WAL-logged before the previous one was WAL-logged, and then the
server crashed. In that case the next offset would never be set in
the offsets SLRU, and a query trying to read it would get stuck
waiting for it. Same thing could happen if pg_resetwal was used to
forcibly advance nextMulti.
- In hot standby mode, a deadlock could happen where one backend waits
for the next multixid assignment record, but WAL replay is not
advancing because of a recovery conflict with the waiting backend.
The old TAP test used carefully placed injection points to exercise
the old waiting code, but now that the waiting code is gone, much of
the old test is no longer relevant. Rewrite the test to reproduce the
IPC/MultixactCreation hang after crash recovery instead, and to verify
that previously recorded multixids stay readable.
Backpatch to all supported versions. In back-branches, we still need
to be able to read WAL that was generated before this fix, so in the
back-branches this includes a hack to initialize the next offsets page
when replaying XLOG_MULTIXACT_CREATE_ID for the last multixid on a
page. On 'master', bump XLOG_PAGE_MAGIC instead to indicate that the
WAL is not compatible.
Author: Andrey Borodin <amborodin@acm.org>
Reviewed-by: Dmitry Yurichev <dsy.075@yandex.ru>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Ivan Bykov <i.bykov@modernsys.ru>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/172e5723-d65f-4eec-b512-14beacb326ce@yandex.ru
Backpatch-through: 14
These were removed in 5dee7a603f, but that was too optimistic, per
buildfarm member prion as reported by Tom Lane. Mea (Álvaro's) culpa.
Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Discussion: https://postgr.es/m/570630.1764737028@sss.pgh.pa.us
The majority of cases already used "restartpoint" with just a few
instances of "restart point". Changing the latter spelling to the
former ensures consistency in the user facing documentation. Code
comments are not affected by this since it is not worth the churn
to change anything there.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/0F6E38D0-649F-4489-B2C1-43CD937E6636@yesql.se
The comment for the Pointer type said 'XXX Pointer arithmetic is done
with this, so it can't be void * under "true" ANSI compilers.'. This
has been fixed in the previous commit 756a436893. This now changes
the definition of the type from char * to void *, as envisaged by that
comment.
Extension code that relies on using Pointer for pointer arithmetic
will need to make changes similar to commit 756a436893, but those
changes would be backward compatible.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org
The comment for the Pointer type says 'XXX Pointer arithmetic is done
with this, so it can't be void * under "true" ANSI compilers.'. This
fixes that. Change from Pointer to use char * explicitly where
pointer arithmetic is needed. This makes the meaning of the code
clearer locally and removes a dependency on the actual definition of
the Pointer type. (The definition of the Pointer type is not changed
in this commit.)
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org
Both dsa_get_total_size() and dsa_get_total_size_from_handle() take
an exclusive lock just to read a variable. This commit reduces the
lock level to LW_SHARED in those functions.
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/aS8fMzWs9e8iHxk2%40nathan
amcheck incorrectly reported the following error if there were any
half-dead pages in the index:
ERROR: mismatch between parent key and child high key in index
"amchecktest_id_idx"
It's expected that a half-dead page does not have a downlink in the
parent level, so skip the test.
Reported-by: Konstantin Knizhnik <knizhnik@garret.ru>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Discussion: https://www.postgresql.org/message-id/33e39552-6a2a-46f3-8b34-3f9f8004451f@garret.ru
Backpatch-through: 14
To increase our test coverage in general, and because I will add onto
this in the next commit to also test amcheck with incomplete splits.
This is copied from the similar test we had for GIN indexes. B-tree's
incomplete splits work similarly to GIN's, so with small changes, the
same test works for B-tree too.
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/abd65090-5336-42cc-b768-2bdd66738404@iki.fi
Presently, this view reports NULL for the size of DSAs and dshash
tables because 1) the current backend might not be attached to them
and 2) the registry doesn't save the pointers to the dsa_area or
dshash_table in local memory. Also, the view doesn't show
partially-initialized entries to avoid ambiguity, since those
entries would report a NULL size as well.
This commit introduces a function that looks up the size of a DSA
given its handle (transiently attaching to the control segment if
needed) and teaches pg_dsm_registry_allocations to use it to show
the size of successfully-initialized DSA and dshash entries.
Furthermore, the view now reports partially-initialized entries
with a NULL size.
Reviewed-by: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aSeEDeznAsHR1_YF%40nathan
This idea (implemented in commits and bc32a12e0d and 9e8fa05d34) of
using notices to detect that a session is sleeping was unreliable, so
simplify the concurrency controller session to just look at
pg_stat_activity for a process sleeping on the injection point we want
it to hit. This change allows us to remove a secondary injection point
and the alternative expected output files.
Reproduced by Alexander Lakhin following a report in buildfarm member
skink (which runs the server under valgrind).
Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/3e302c96-cdd2-45ec-af84-03dbcdccde4a@gmail.com
When planning queries with ON CONFLICT on partitioned tables, the
indexes to consider as arbiters for each partition are determined based
on those found in the parent table. However, it's possible for an index
on a partition to be reindexed, and in that case, the auxiliary indexes
created on the partition must be considered as arbiters as well; failing
to do that may result in spurious "duplicate key" errors given
sufficient bad luck.
We fix that in this commit by matching every index that doesn't have a
parent to each initially-determined arbiter index. Every unparented
matching index is considered an additional arbiter index.
Closely related to the fixes in bc32a12e0d and 2bc7e886fc, and for
identical reasons, not backpatched (for now) even though it's a
longstanding issue.
Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CANtu0ojXmqjmEzp-=aJSxjsdE76iAsRgHBoK0QtYHimb_mEfsg@mail.gmail.com
This removes some casts where the input already has the same type as
the type specified by the cast. Their presence could cause risks of
hiding actual type mismatches in the future or silently discarding
qualifiers. It also improves readability. Same kind of idea as
7f798aca1d and ef8fe69360. (This does not change all such
instances, but only those hand-picked by the author.)
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/aSQy2JawavlVlEB0%40ip-10-97-1-34.eu-west-3.compute.internal
Instead of complicated pointer arithmetic, overlay a uint32 array and
just access the array members. That's safe thanks to
XLogRecGetBlockData() returning a MAXALIGNed buffer.
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/aSQy2JawavlVlEB0%40ip-10-97-1-34.eu-west-3.compute.internal
One could do more work here to eliminate the Windows difference
described in the comment, but that can be a separate project. The
purpose of this change is to update comments that might confusingly
indicate that C99 is not required.
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/170308e6-a7a3-4484-87b2-f960bb564afa%40eisentraut.org
This commit updates two functions that convert "timestamptz" to
"timestamp", and vice-versa, to use the soft error reporting rather than
a their own logic to do the same. These are now named as follows:
- timestamp2timestamptz_safe()
- timestamptz2timestamp_safe()
These functions were suffixed with "_opt_overflow", previously.
This shaves some code, as it is possible to detect how a timestamp[tz]
overflowed based on the returned value rather than a custom state. It
is optionally possible for the callers of these functions to rely on the
error generated internally by these functions, depending on the error
context.
Similar work has been done in d03668ea05 and 4246a977ba.
Reviewed-by: Amul Sul <sulamul@gmail.com>
Discussion: https://postgr.es/m/aS09YF2GmVXjAxbJ@paquier.xyz
The regex mechanism scans through the first "max_chr" character values
to cache character property ranges (isalpha, etc.). For single-byte
encodings, there's no sense in scanning beyond UCHAR_MAX; but for
UTF-8 it makes sense to cache higher code point values (though not all
of them; only up to MAX_SIMPLE_CHR).
Prior to 5a38104b36, the logic about how many character values to scan
was based on the pg_regex_strategy, which was dependent on the
provider. Commit 5a38104b36 preserved that logic exactly, allowing
different providers to define the "max_chr".
Now, change it to depend only on the encoding and whether
ctype_is_c. For this specific calculation, distinguishing between
providers creates more complexity than it's worth.
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
When REINDEX CONCURRENTLY is processing the index that supports a
constraint, there are periods during which multiple indexes match the
constraint index's definition. Those must all be included in the set of
inferred index for INSERT ON CONFLICT, in order to avoid spurious
"duplicate key" errors.
To fix, we set things up to match all indexes against attributes,
expressions and predicates of the constraint index, then return all
indexes that match those, rather than just the one constraint index.
This is more onerous than before, where we would just test the named
constraint for validity, but it's not more onerous than processing
"conventional" inference (where a list of attribute names etc is given).
This is closely related to the misbehaviors fixed by bc32a12e0d, for a
different situation. We're not backpatching this one for now either,
for the same reasons.
Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CANtu0ojXmqjmEzp-=aJSxjsdE76iAsRgHBoK0QtYHimb_mEfsg@mail.gmail.com
This one is almost a textbook example of an aliasing violation, and it
is straightforward to fix, so clean it up. (The warning only shows up
if you remove the -fno-strict-aliasing option.) Also, move the code
after the error checking. Doesn't make a difference technically, but
it seems strange to do actions before errors are checked.
Reported-by: Tatsuo Ishii <ishii@postgresql.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/20240724.155525.366150353176322967.ishii%40postgresql.org
This split exists for most of the other RMGRs, and makes cleaner the
separation between the WAL code, the redo code and the record
description code (already in its own file) when it comes to the sequence
RMGR. The redo and masking routines are moved to a new file,
sequence_xlog.c. All the RMGR routines are now located in a new header,
sequence_xlog.h.
This separation is useful for a different patch related to sequences
that I have been working on, where it makes a refactoring of sequence.c
easier if its RMGR routines and its core routines are split.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/aSfTxIWjiXkTKh1E@paquier.xyz
This commit changes some functions related to the data types date and
timestamp to use the soft error reporting rather than a custom boolean
flag called "overflow", used to let the callers of these functions know
if an overflow happens.
This results in the removal of some boilerplate code, as it is possible
to rely on an error context rather than a custom state, with the
possibility to use the error generated inside the functions updated
here, if necessary.
These functions were suffixed with "_opt_overflow". They are now
renamed to use "_safe" as suffix.
This work is similar to 4246a977ba.
Author: Amul Sul <sulamul@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAAJ_b95HEmFyzHZfsdPquSHeswcopk8MCG1Q_vn4tVkZ+xxofw@mail.gmail.com
42473b3b3 added prosupport infrastructure to allow simplification of
Aggrefs during constant-folding. In some cases the context->root that's
given to eval_const_expressions_mutator() can be NULL. 42473b3b3 failed
to take that into account, which could result in a crash.
To fix, add a check and only call simplify_aggref() when the PlannerInfo
is set.
Author: David Rowley <dgrowleyml@gmail.com>
Reported-by: Birler, Altan <altan.birler@tum.de>
Discussion: https://postgr.es/m/132d4da23b844d5ab9e352d34096eab5@tum.de
We have some limited ability to detect redundant and contradictory
conditions involving an nbtree row comparison key following commits
f09816a0 and bd3f59fd: we can do so in simple cases involving IS NULL
and IS NOT NULL keys on a row compare key's first column. We can
likewise determine that a scan's qual is unsatisfiable given a row
compare whose first subkey's arg is NULL. Update obsolete comments that
claimed that we merely copied row compares into the output key array
"without any editorialization".
Also update another _bt_preprocess_keys header comment paragraph: add a
parenthetical remark that points out that preprocessing will generate a
skip array for the preceding example qual. That will ultimate lead to
preprocessing marking the example's lower-order y key required -- which
is exactly what the example supposes cannot happen. Keep the original
comment, though, since it accurately describes the mechanical rules that
determine which keys get marked required in the absence of skip arrays
(which can occasionally still matter). This fixes an oversight in
commit 92fe23d9, which added the nbtree skip scan optimization.
Author: Peter Geoghegan <pg@bowt.ie>
Backpatch-through: 18
Formerly, when updating an auto-updatable view, or a relation with
rules, if the original query had any data-modifying CTEs, the rewriter
would rewrite those CTEs multiple times as RewriteQuery() recursed
into the product queries. In most cases that was harmless, because
RewriteQuery() is mostly idempotent. However, if the CTE involved
updating an always-generated column, it would trigger an error because
any subsequent rewrite would appear to be attempting to assign a
non-default value to the always-generated column.
This could perhaps be fixed by attempting to make RewriteQuery() fully
idempotent, but that looks quite tricky to achieve, and would probably
be quite fragile, given that more generated-column-type features might
be added in the future.
Instead, fix by arranging for RewriteQuery() to rewrite each CTE
exactly once (by tracking the number of CTEs already rewritten as it
recurses). This has the advantage of being simpler and more efficient,
but it does make RewriteQuery() dependent on the order in which
rewriteRuleAction() joins the CTE lists from the original query and
the rule action, so care must be taken if that is ever changed.
Reported-by: Bernice Southey <bernice.southey@gmail.com>
Author: Bernice Southey <bernice.southey@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/CAEDh4nyD6MSH9bROhsOsuTqGAv_QceU_GDvN9WcHLtZTCYM1kA@mail.gmail.com
Backpatch-through: 14
There was a pg_isblank() function that claimed to be a replacement for
the standard isblank() function, which was thought to be "not very
portable yet". We can now assume that it's portable (it's in C99).
But pg_isblank() actually diverged from the standard isblank() by also
accepting '\r', while the standard one only accepts space and tab.
This was added to support parsing pg_hba.conf under Windows. But the
hba parsing code now works completely differently and already handles
line endings before we get to pg_isblank(). The other user of
pg_isblank() is for ident protocol message parsing, which also handles
'\r' separately. So this behavior is now obsolete and confusing.
To improve clarity, I separated those concerns. The ident parsing now
gets its own function that hardcodes the whitespace characters
mentioned by the relevant RFC. pg_isblank() is now static in hba.c
and is a wrapper around the standard isblank(), with some extra logic
to ensure robust treatment of non-ASCII characters.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/170308e6-a7a3-4484-87b2-f960bb564afa%40eisentraut.org
Introduce a new column, slotsync_skip_reason, in the pg_replication_slots
view. This column records the reason why the last slot synchronization was
skipped. It is primarily relevant for logical replication slots on standby
servers where the 'synced' field is true. The value is NULL when
synchronization succeeds.
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com
This commit introduces three new functions for marking shared buffers as
dirty by using the functions introduced in 9660906dbd:
* pg_buffercache_mark_dirty() for one shared buffer.
- pg_buffercache_mark_dirt_relation() for all the shared buffers in a
relation.
* pg_buffercache_mark_dirty_all() for all the shared buffers in pool.
The "_all" and "_relation" flavors are designed to address the
inefficiency of repeatedly calling pg_buffercache_mark_dirty() for each
individual buffer, which can be time-consuming when dealing with with
large shared buffers pool.
These functions are intended as developer tools and are available only
to superusers. There is no need to bump the version of pg_buffercache,
4b203d499c having done this job in this release cycle.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Aidar Imamov <a.imamov@postgrespro.ru>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Joseph Koshakow <koshy44@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Yuhang Qiu <iamqyh@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw@mail.gmail.com
The solution used in 0ca3b1697 to determine the Parallel TID Range
Scan's start location was to modify the signature of
table_block_parallelscan_startblock_init() to allow the startblock
to be passed in as a parameter. This allows the scan limits to be
adjusted before that function is called so that the limits are picked up
when the parallel scan starts. The commit made it so the call to
table_block_parallelscan_startblock_init uses the HeapScanDesc's
rs_startblock to pass the startblock to the parallel scan. That all
works ok for Parallel TID Range scans as the HeapScanDesc rs_startblock
gets set by heap_setscanlimits(), but for Parallel Seq Scans, initscan()
does not initialize rs_startblock, and that results in passing an
uninitialized value to table_block_parallelscan_startblock_init() as
noted by the buildfarm member skink, running Valgrind.
To fix this issue, make it so initscan() sets the rs_startblock for
parallel scans unless we're doing a rescan. This makes it so
table_block_parallelscan_startblock_init() will be called with the
startblock set to InvalidBlockNumber, and that'll allow the syncscan
code to find the correct start location (when enabled). For Parallel
TID Range Scans, this InvalidBlockNumber value will be overwritten in
the call to heap_setscanlimits().
initscan() is a bit light on documentation on what's meant to get
initialized where for parallel scans. From what I can tell, it looks like
it just didn't matter prior to 0ca3b1697 that rs_startblock was left
uninitialized for parallel scans. To address the light documentation,
I've also added some comments to mention that the syncscan location for
parallel scans is figured out in table_block_parallelscan_startblock_init.
I've also taken the liberty to adjust the if/else if/else code in
initscan() to make it clearer which parts apply to parallel scans and
which parts are for the serial scans.
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvqALm+k7FyfdQdCw1yF_8HojvR61YRrNhwRQPE=zSmnQA@mail.gmail.com
This commit introduces new internal bufmgr routines for marking shared
buffers as dirty:
* MarkDirtyUnpinnedBuffer()
* MarkDirtyRelUnpinnedBuffers()
* MarkDirtyAllUnpinnedBuffers()
These functions provide an efficient mechanism to respectively mark one
buffer, all the buffers of a relation, or the entire shared buffer pool
as dirty, something that can be useful to force patterns for the
checkpointer. MarkDirtyUnpinnedBufferInternal(), an extra routine, is
used by these three, to mark as dirty an unpinned buffer.
They are intended as developer tools to manipulate buffer dirtiness in
bulk, and will be used in a follow-up commit.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Aidar Imamov <a.imamov@postgrespro.ru>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Joseph Koshakow <koshy44@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Yuhang Qiu <iamqyh@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw@mail.gmail.com
Normally, if a WHERE clause is implied by the predicate of a partial
index, we drop that clause from the set of quals used with the index,
since it's redundant to test it if we're scanning that index.
However, if it's a hash index (or any !amoptionalkey index), this
could result in dropping all available quals for the index's first
key, preventing us from generating an indexscan.
It's fair to question the practical usefulness of this case. Since
hash only supports equality quals, the situation could only arise
if the index's predicate is "WHERE indexkey = constant", implying
that the index contains only one hash value, which would make hash
a really poor choice of index type. However, perhaps there are
other !amoptionalkey index AMs out there with which such cases are
more plausible.
To fix, just don't filter the candidate indexquals this way if
the index is !amoptionalkey. That's a bit hokey because it may
result in testing quals we didn't need to test, but to do it
more accurately we'd have to redundantly identify which candidate
quals are actually usable with the index, something we don't know
at this early stage of planning. Doesn't seem worth the effort.
Reported-by: Sergei Glukhov <s.glukhov@postgrespro.ru>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/e200bf38-6b45-446a-83fd-48617211feff@postgrespro.ru
Backpatch-through: 14
The documentation for CREATE/ALTER PUBLICATION previously showed:
[ ONLY ] table_name [ * ] [ ( column_name [, ... ] ) ] [ WHERE ( expression ) ] [, ... ]
to indicate that the table/column specification could be repeated.
However, placing [, ... ] directly after a multi-part construct was
misleading and made it unclear which portion was repeatable.
This commit introduces a new term, table_and_columns, to represent:
[ ONLY ] table_name [ * ] [ ( column_name [, ... ] ) ] [ WHERE ( expression ) ]
and updates the synopsis to use:
table_and_columns [, ... ]
which clearly identifies the repeatable element.
Backpatched to v15, where the misleading syntax was introduced.
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHut+PtsyvYL3KmA6C8f0ZpXQ=7FEqQtETVy-BOF+cm9WPvfMQ@mail.gmail.com
Backpatch-through: 15
Two of the isolation tests introduce by commit bc32a12e0d had a
problem under CATCACHE_FORCE_RELEASE, as evidenced by buildfarm member
prion. An injection point is hit ahead of what the test spec expects,
so a session goes to sleep and there's no one there to wait it up. Fix
in the simplest possible way, which is to conditionally wake the process
up if it's waiting. An alternative output file is necessary to cover
both cases.
This suggests a couple of possible improvements to the injection points
infrastructure: a conditional wakeup (doing nothing if no one is
sleeping, as opposed to throwing an error), as well as a way to attach
to a point in "deactivated" mode, activated later.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Discussion: https://postgr.es/m/202511261817.fyixgtt3hqdr@alvherre.pgsql
They were already using pg_attribute_aligned. This replaces that with
alignas and moves that into the required syntactic position. This
ends up making these three atomics implementations appear a bit more
consistent, but shouldn't change anything otherwise.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/46f05236-d4d4-4b4e-84d4-faa500f14691%40eisentraut.org
transformJsonFuncExpr() used exprType()/exprLocation() on the
possibly coerced path expression, which could be NULL when
coercion to jsonpath failed, leading to "cache lookup failed
for type 0" errors.
Preserve the original expression node so that type and location
in the "must be of type jsonpath" error are reported correctly.
Add regression tests to cover these cases.
Reported-by: Jian He <jian.universality@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/CACJufxHunVg81JMuNo8Yvv_hJD0DicgaVN2Wteu8aJbVJPBjZA@mail.gmail.com
Backpatch-through: 17
In v14, bb437f995 added support for scanning for ranges of TIDs using a
dedicated executor node for the purpose. Here, we allow these scans to
be parallelized. The range of blocks to scan is divvied up similarly to
how a Parallel Seq Scans does that, where 'chunks' of blocks are
allocated to each worker and the size of those chunks is slowly reduced
down to 1 block per worker by the time we're nearing the end of the
scan. Doing that means workers finish at roughly the same time.
Allowing TID Range Scans to be parallelized removes the dilemma from the
planner as to whether a Parallel Seq Scan will cost less than a
non-parallel TID Range Scan due to the CPU concurrency of the Seq Scan
(disk costs are not divided by the number of workers). It was possible
the planner could choose the Parallel Seq Scan which would result in
reading additional blocks during execution than the TID Scan would have.
Allowing Parallel TID Range Scans removes the trade-off the planner
makes when choosing between reduced CPU costs due to parallelism vs
additional I/O from the Parallel Seq Scan due to it scanning blocks from
outside of the required TID range. There is also, of course, the
traditional parallelism performance benefits to be gained as well, which
likely doesn't need to be explained here.
Author: Cary Huang <cary.huang@highgo.ca>
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com>
Reviewed-by: Steven Niu <niushiji@gmail.com>
Discussion: https://postgr.es/m/18f2c002a24.11bc2ab825151706.3749144144619388582@highgo.ca
This adds SupportRequestSimplifyAggref to allow pg_proc.prosupport
functions to receive an Aggref and allow them to determine if there is a
way that the Aggref call can be optimized.
Also added is a support function to allow transformation of COUNT(ANY)
into COUNT(*). This is possible to do when the given "ANY" cannot be
NULL and also that there are no ORDER BY / DISTINCT clauses within the
Aggref. This is a useful transformation to do as it is common that
people write COUNT(1), which until now has added unneeded overhead.
When counting a NOT NULL column. The overheads can be worse as that
might mean deforming more of the tuple, which for large fact tables may
be many columns in.
It may be possible to add prosupport functions for other aggregates. We
could consider if ORDER BY could be dropped for some calls, e.g. the
ORDER BY is quite useless in MAX(c ORDER BY c).
There is a little bit of passing fallout from adjusting
expr_is_nonnullable() to handle Const which results in a plan change in
the aggregates.out regression test. Previously, nothing was able to
determine that "One-Time Filter: (100 IS NOT NULL)" was always true,
therefore useless to include in the plan.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/CAApHDvqGcPTagXpKfH=CrmHBqALpziThJEDs_MrPqjKVeDF9wA@mail.gmail.com
If DSM registry entry initialization fails, backends could try to
use an uninitialized DSM segment, DSA, or dshash table (since the
entry is still added to the registry). To fix, restructure the
code so that the registry retries initialization as needed. This
commit also modifies pg_get_dsm_registry_allocations() to leave out
partially-initialized entries, as they shouldn't have any allocated
memory.
DSM registry entry initialization shouldn't fail often in practice,
but retrying was deemed better than leaving entries in a
permanently failed state (as was done by commit 1165a933aa, which
has since been reverted).
Suggested-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/E1vJHUk-006I7r-37%40gemulon.postgresql.org
Backpatch-through: 17
Previously, the caller needed to check ctype_is_c first for some
routines and not others. Now, the APIs consistently work, and the
caller can just check ctype_is_c for optimization purposes.
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
pgperltidy requires a particular version of perltidy, but the version
wasn't checked like how pgindent checks the underlying indent binary.
Fix by checking the version of perltidy and error out if an incorrect
version is used.
Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/1209850.1764092152@sss.pgh.pa.us
This reverts commit 1165a933aa (and the corresponding commits on
the back-branches). In a follow-up commit, we'll teach the
registry to retry entry initialization instead of leaving it in a
permanently failed state.
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/E1vJHUk-006I7r-37%40gemulon.postgresql.org
Backpatch-through: 17
Refactor the setup and planning phases of pruning and freezing into
helpers. This streamlines heap_page_prune_and_freeze() and makes it more
clear when the examination of tuples ends and page modifications begin.
No code change beyond what was required to extract the code into helper
functions.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/mhf4vkmh3j57zx7vuxp4jagtdzwhu3573pgfpmnjwqa6i6yj5y%40sy4ymcdtdklo
ssl_passphrase_command_supports_reload was not covered by the SSL
testsuite, and connection tests after unlocking secrets with the
passphrase was also missing. This adds test coverage for reloads
of passphrase commands as well as connection attempts which tests
the different codepaths for Windows and non-EXEC_BACKEND builds.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/5F301096-921A-427D-8EC1-EBAEC2A35082@yesql.se
There is no straightforward way to determine if a cluster is running
in EXEC_BACKEND mode or not, which is useful for tests to know. This
adds a GUC debug_exec_backend similar to debug_assertions which will
be true when the server is running in EXEC_BACKEND mode.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/5F301096-921A-427D-8EC1-EBAEC2A35082@yesql.se
When running on Windows (or EXEC_BACKEND) the SSL configuration will
be reloaded on each backend start, so the passphrase command will be
reloaded along with it. This implies that passphrase command reload
must be enabled on Windows for connections to work at all. Document
this since it wasn't mentioned explicitly, and will there add markup
for parameter value to match the rest of the docs.
Backpatch to all supported versions.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/5F301096-921A-427D-8EC1-EBAEC2A35082@yesql.se
Backpatch-through: 14
The code comment said, "It is expected that this routine will
eventually be replaced with the C99 hypot() function.", so let's do
that now.
This function is tested via the geometry regression test, so if it is
faulty on any platform, it will show up there.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/170308e6-a7a3-4484-87b2-f960bb564afa%40eisentraut.org
Response padding from the oauth_validator abuse tests was adding a
couple megabytes to the test logs. We don't need the buildfarm to hold
onto that, and we don't need to read it when debugging; truncate it.
Reported-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/202511251218.zfs4nu2qnh2m%40alvherre.pgsql
Backpatch-through: 18
The test failed because it assumed that a newly created logical
replication slot could be synced to the standby by the slotsync worker.
However, the presence of an existing physical slot caused the new logical
slot to use a non-latest xmin. On the standby, the DDL had already been
replayed, advancing xmin, which led to the slotsync worker failing to sync
the lagging logical slot.
To resolve this, we moved the slot sync statistics tests to run after the
tests that do not require the newly created slot to be sync-ready.
As per buildfarm.
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OSCPR01MB14966FE0BFB6C212298BFFEDEF5D1A@OSCPR01MB14966.jpnprd01.prod.outlook.com
pg_dependencies is used as data type for the contents of dependencies
extended statistics. This new input function consumes the format that
has been established by e76defbcf0 for the output function of
pg_dependencies, enforcing some sanity checks for:
- Checks for the input object, which should be a one-dimension array
with correct attributes and values.
- The key names: "attributes", "dependency", "degree". All are
required, other key names are blocked.
- Value types for each key: "attributes" requires an array of integers,
"dependency" an attribute number, "degree" a float.
- List of attributes. In this case, it is possible that some
dependencies are not listed in the statistics data, as items with a
degree of 0 are discarded when building the statistics. This commit
includes checks for simple scenarios, like duplicated attributes, or
overlapping values between the list of "attributes" and the "dependency"
value. Even if the input function considers the input as valid, a value
still needs to be cross-checked with the attributes defined in a
statistics object at import.
- Based on the discussion, the checks on the values are loose, as there
is also an argument for potentially stats injection. For example,
"degree" should be defined in [0.0,1.0], but a check is not enforced.
This is required for a follow-up patch that aims to implement the import
of extended statistics. Some tests are added to check the code paths of
the JSON parser checking the shape of the pg_dependencies inputs, with
91% of code coverage reached. The tests are located in their own new
test file, for clarity.
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Yuefei Shi <shiyuefei1004@gmail.com>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
pg_ndistinct is used as data type for the contents of ndistinct extended
statistics. This new input function consumes the format that has been
established by 1f927cce44 for the output function of pg_ndistinct,
enforcing some sanity checks for:
- Checks for the input object, which should be a one-dimension array
with correct attributes and values.
- The key names: "attributes", "ndistinct". Both are required, other
key names are blocked.
- Value types for each key: "attributes" requires an array of integers,
and "ndistinct" an integer.
- List of attributes. Note that this enforces a check so as an
attribute list has to be a subset of the longest attribute list found.
This does not enforce that a full group of attribute sets exist, based
on how the groups are generated when the ndistinct objects are
generated, making the list of ndistinct items a bit loose. Note a check
would still be required at import to see if the attributes listed match
with the attribute numbers set in the definition of a statistics object.
- Based on the discussion, the checks on the values are loose, as there
is also an argument for potentially stats injection. The relation and
attribute level stats follow the same line of argument for the values.
This is required for a follow-up patch that aims to implement the import
of extended statistics. Some tests are added to check the code paths of
the JSON parser checking the shape of the pg_ndistinct inputs, with 90%
of code coverage reached. The tests are located in their own new test
file, for clarity.
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Yuefei Shi <shiyuefei1004@gmail.com>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
heap_page_prune_and_freeze() requires the caller to initialize
PruneFreezeParams->cutoffs so that the function can correctly evaluate
whether tuples should be frozen. This requirement previously existed
only in comments and was easy to miss, especially after “cutoffs” was
converted from a direct function parameter to a field of the newly
introduced PruneFreezeParams struct (added in 1937ed7062). Adding an
assert makes this requirement explicit and harder to violate.
Also, fix a minor typo while we're at it.
Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/0AC177F5-5E26-45EE-B273-357C51212AC5%40gmail.com
This bloats the regression log files for no reason.
Backpatch to 18; no further only because it fails to apply cleanly.
(It's just whitespace change that conflicts, but I don't think this
warrants more effort than this.)
Discussion: https://postgr.es/m/202511251218.zfs4nu2qnh2m@alvherre.pgsql
Previously, gen_guc_tables.pl would emit "Use of uninitialized value"
warnings if required fields were missing in guc_parameters.dat (for
example, when an integer or real GUC omitted the 'max' value). The
resulting error messages were unclear and did not identify which GUC
entry was problematic.
Add explicit validation of required fields depending on the parameter
type, and fail with a clear and specific message such as:
guc_parameters.dat:1909: error: entry "max_index_keys" of type "int" is missing required field "max"
No changes to generated guc_tables.c.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/CAEoWx2%3DoP4LgHi771_OKhPPUS7B-CTqCs%3D%3DuQcNXWrwBoAm5Vg%40mail.gmail.com
The issue occurred because the replication slot was not released in the
slotsync worker when a slot synchronization cycle was skipped. This skip
happened because the required WAL was not received and flushed on the
standby server. As a result, in the next cycle, when attempting to acquire
the slot, an assertion failure was triggered.
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Discussion: https://postgr.es/m/CAA4eK1KMwYUYy=oAVHu9mam+vX50ixxfhO4_C=kgQC8VCQHEfw@mail.gmail.com
This patch adds two new columns to the pg_stat_replication_slots view:
slotsync_skip_count - the total number of times a slotsync operation was
skipped.
slotsync_skip_at - the timestamp of the most recent skip.
These additions provide better visibility into replication slot
synchronization behavior.
A future patch will introduce the slotsync_skip_reason column in
pg_replication_slots to capture the reason for skip.
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com
This comment should probably have been moved to pg_locale_libc.c in
commit 66ac94cdc7 (2024), but upon closer examination it was already
completely obsolete then.
The first part of the comment has been obsolete since commit
85feb77aa0 (2017), which required that the system provides the
required wide-character functions.
The second part has been obsolete since commit e9931bfb75 (2024),
which eliminated code paths depending on the global LC_CTYPE setting.
Discussion: https://www.postgresql.org/message-id/flat/170308e6-a7a3-4484-87b2-f960bb564afa%40eisentraut.org
This commit renames write_chunk and read_chunk to respectively
pgstat_write_chunk() and pgstat_read_chunk(), along with the *_s
convenience macros.
These are made available for plug-ins, so as any code that decides to
write and/or read stats data can rely on a single code path for this
work.
Extracted from a larger patch by the same author.
Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0s9SDOu+Z6veoJCHWk+kDeTktAtC-KY9fQ9Z6BJdDUirQ@mail.gmail.com
Accidentally the code in LWLockWakeup() checked the list of to-be-woken up
processes to see if LW_FLAG_HAS_WAITERS should be unset. That means that
HAS_WAITERS would not get unset immediately, but only during the next,
unnecessary, call to LWLockWakeup().
Luckily, as the code stands, this is just a small efficiency issue.
However, if there were (as in a patch of mine) a case in which LWLockWakeup()
would not find any backend to wake, despite the wait list not being empty,
we'd wrongly unset LW_FLAG_HAS_WAITERS, leading to potentially hanging.
While the consequences in the backbranches are limited, the code as-is
confusing, and it is possible that there are workloads where the additional
wait list lock acquisitions hurt, therefore backpatch.
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
Backpatch-through: 14
We've long had a practice of making views temporary by default if they
reference any temporary tables. However the implementation was pretty
incomplete, in that it only searched for RangeTblEntry references to
temp relations. Uses of temporary types, regclass constants, etc
were not detected even though the dependency mechanism considers them
grounds for dropping the view. Thus a view not believed to be temp
could silently go away at session exit anyhow.
To improve matters, replace the ad-hoc isQueryUsingTempRelation()
logic with use of the dependency-based infrastructure introduced by
commit 572c40ba9. This is complete by definition, and it's less code
overall.
While we're at it, we can also extend the warning NOTICE (or ERROR
in the case of a materialized view) to mention one of the temp
objects motivating the classification of the view as temp, as was
done for functions in 572c40ba9.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Discussion: https://postgr.es/m/19cf6ae1-04cd-422c-a760-d7e75fe6cba9@uni-muenster.de
Group the PG_PROTOCOL() codes, add a comment to AuthRequest now that the
AUTH_REQ codes live in a different header, and make some small
adjustments to spacing and comment style for the sake of scannability.
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CAOYmi%2B%3D6zg4oXXOQtifrVao_YKiujTDa3u6bxnU08r0FsSig4g%40mail.gmail.com
Commit 600086f47 added (several bespoke copies of) size_t addition with
overflow checks to libpq. Move this to common/int.h, along with
its subtraction and multiplication counterparts.
pg_neg_size_overflow() is intentionally omitted; I'm not sure we should
add SSIZE_MAX to win32_port.h for the sake of a function with no
callers.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAOYmi%2B%3D%2BpqUd2MUitvgW1pAJuXgG_TKCVc3_Ek7pe8z9nkf%2BAg%40mail.gmail.com
Previously, we would only consider indexes marked indisvalid as usable
for INSERT ON CONFLICT. But that's problematic during CREATE INDEX
CONCURRENTLY and REINDEX CONCURRENTLY, because concurrent transactions
would end up with inconsistents lists of inferred indexes, leading to
deadlocks and spurious errors about unique key violations (because two
transactions are operating on different indexes for the speculative
insertion tokens). Change this function to return indexes even if
invalid. This fixes the spurious errors and deadlocks.
Because such indexes might not be complete, we still need uniqueness to
be verified in a different way. We do that by requiring that at least
one index marked valid is part of the set of indexes returned. It is
that index that is going to help ensure that the inserted tuple is
indeed unique.
This does not fix similar problems occurring with partitioned tables or
with named constraints. These problems will be fixed in follow-up
commits.
We have no user report of this problem, even though it exists in all
branches. Because of that and given that the fix is somewhat tricky, I
decided not to backpatch for now.
Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CANtu0ogv+6wqRzPK241jik4U95s1pW3MCZ3rX5ZqbFdUysz7Qw@mail.gmail.com
index-killtuples test depends on the contrib modules btree_gin and
btree_gist, which would not be installed in a temporary installation
with an execution of the main isolation test suite like this one:
make -C src/test/isolation/ check
src/test/isolation/ should not depend on contrib/, and EXTRA_INSTALL has
no effect in this case as this test suite uses its own Makefile rules.
This commit moves index-killtuples into its new module, called "index",
whose name looks like the best fit there can be as it depends on more
than one index AM. btree_gin and btree_gist are now pulled in the
temporary installation with EXTRA_INSTALL. The test is renamed to
"killtuples", for simplicity.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Suggested-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aKJsWedftW7UX1WM@paquier.xyz
This replaces some uses of pg_attribute_aligned() with the standard
alignas() for cases where extended alignment (larger than max_align_t)
is required.
This patch stipulates that all supported compilers must support
alignments up to PG_IO_ALIGN_SIZE, but that seems pretty likely.
We can then also desupport the case where direct I/O is disabled
because pg_attribute_aligned is not supported.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/46f05236-d4d4-4b4e-84d4-faa500f14691%40eisentraut.org
ba2a3c2302 has added a way to check if a buffer is spread across
multiple pages with some NUMA information, via a new view
pg_buffercache_numa that depends on pg_buffercache_numa_pages(), a SQL
function. These can only be queried when support for libnuma exists,
generating an error if not.
However, it can be useful to know how shared buffers and OS pages map
when NUMA is not supported or not available. This commit expands the
capabilities around pg_buffercache_numa:
- pg_buffercache_numa_pages() is refactored as an internal function,
able to optionally process NUMA. Its SQL definition prior to this
commit is still around to ensure backward-compatibility with v1.6.
- A SQL function called pg_buffercache_os_pages() is added, able to work
with or without NUMA.
- The view pg_buffercache_numa is redefined to use
pg_buffercache_os_pages().
- A new view is added, called pg_buffercache_os_pages. This ignores
NUMA for its result processing, for a better efficiency.
The implementation is done so as there is no code duplication between
the NUMA and non-NUMA views/functions, relying on one internal function
that does the job for all of them. The module is bumped to v1.7.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/Z/fFA2heH6lpSLlt@ip-10-97-1-34.eu-west-3.compute.internal
The comment incorrectly indicated that indexcollations[] stored
collations for both key columns and INCLUDE columns, but in reality it
only has elements for the key columns. canreturn[] didn't get a mention,
so add that while we're here.
Author: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAEG8a3LwbZgMKOQ9CmZarX5DEipKivdHp5PZMOO-riL0w%3DL%3D4A%40mail.gmail.com
Backpatch-through: 14
We don't have an official concept of temporary functions. (You can
make one explicitly in pg_temp, but then you have to explicitly
schema-qualify it on every call.) However, until now we were quite
laissez-faire about whether a non-temporary function could depend on
a temporary object, such as a temp table or view. If one does,
it will silently go away at end of session, due to the automatic
DROP ... CASCADE on the session's temporary objects. People have
complained that that's surprising; however, we can't really forbid
it because other people (including our own regression tests) rely
on being able to do it. Let's compromise by emitting a NOTICE
at CREATE FUNCTION time. This is somewhat comparable to our
ancient practice of emitting a NOTICE when forcing a view to
become temp because it depends on temp tables.
Along the way, refactor recordDependencyOnExpr() so that the
dependencies of an expression can be combined with other
dependencies, instead of being emitted separately and perhaps
duplicatively.
We should probably make the implementation of temp-by-default
views use the same infrastructure used here, but that's for
another patch. It's unclear whether there are any other object
classes that deserve similar treatment.
Author: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19cf6ae1-04cd-422c-a760-d7e75fe6cba9@uni-muenster.de
This request allows a support function to replace a function call
appearing in FROM (typically a set-returning function) with an
equivalent SELECT subquery. The subquery will then be subject
to the planner's usual optimizations, potentially allowing a much
better plan to be generated. While the planner has long done this
automatically for simple SQL-language functions, it's now possible
for extensions to do it for functions outside that group.
Notably, this could be useful for functions that are presently
implemented in PL/pgSQL and work by generating and then EXECUTE'ing
a SQL query.
Author: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/09de6afa-c33d-4d94-a5cb-afc6cea0d2bb@illuminatedcomputing.com
The existing range_minus function raises an exception when the range is
"split", because then the result can't be represented by a single range.
For example '[0,10)'::int4range - '[4,5)' would be '[0,4)' and '[5,10)'.
This commit adds new set-returning functions so that callers can get
results even in the case of splits. There is no risk of an exception for
multiranges, but a set-returning function lets us handle them the same
way we handle ranges.
Both functions return zero results if the subtraction would give an
empty range/multirange.
The main use-case for these functions is to implement UPDATE/DELETE FOR
PORTION OF, which must compute the application-time of "temporal
leftovers": the part of history in an updated/deleted row that was not
changed. To preserve the untouched history, we will implicitly insert
one record for each result returned by range/multirange_minus_multi.
Using a set-returning function will also let us support user-defined
types for application-time update/delete in the future.
Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/ec498c3d-5f2b-48ec-b989-5561c8aa2024%40illuminatedcomputing.com
There was no easy way to run specific tests in the meson based builds.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: postgr.es/m/CAExHW5tK-QqayUN0%2BN3MF5bjV6vLKDCkRuGwoDJwc7vGjwCygQ%40mail.gmail.com
Makes the code a little simpler.
The old implementation accepted trailing whitespace, but that was
unnecessary. Firstly, its sibling function for parsing decimals,
strtodouble(), does not accept trailing whitespace. Secondly, none of
the callers can pass a string with trailing whitespace to it.
In the passing, check specifically for ERANGE before printing the "out
of range" error. On some systems, strtoul() and strtod() return EINVAL
on an empty or all-spaces string, and "invalid input syntax" is more
appropriate for that than "out of range". For the existing
strtodouble() function this is purely academical because it's never
called with errorOK==false, but let's be tidy. (Perhaps we should
remove the dead codepaths altogether, but I'll leave that for another
day.)
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Yuefei Shi <shiyuefei1004@gmail.com>
Reviewed-by: Neil Chen <carpenter.nail.cz@gmail.com>
Discussion: https://www.postgresql.org/message-id/861dd5bd-f2c9-4ff5-8aa0-f82bdb75ec1f@iki.fi
This reverts changes done in PostgreSQL over the upstream code to
avoid relying on C99 <stdint.h> and <inttypes.h>.
In passing, there were a few other minor and cosmetic changes that I
left in to improve alignment with upstream, including some C11 feature
use (_Noreturn).
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/9ad2749f-77ab-4ecb-a321-1ca915480b05%40eisentraut.org
This changes a few union members that only existed to ensure
alignments and replaces them with the C11 alignas specifier.
This change only uses fundamental alignments (meaning approximately
alignments of basic types), which all C11 compilers must support.
There are opportunities for similar changes using extended alignments,
for example in PGIOAlignedBlock, but these are not necessarily
supported by all compilers, so they are kept as a separate change.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/46f05236-d4d4-4b4e-84d4-faa500f14691%40eisentraut.org
Commit 4bea91f added support for "COPY table TO" with partitioned
tables. This commit enhances initial table synchronization in logical
replication to use "COPY table TO" for partitioned tables if possible,
instead of "COPY (SELECT ...) TO" variant, improving performance.
Author: Ajin Cherian <itsajin@gmail.com>
Discussion: https://postgr.es/m/CAFPTHDY=w+xmEof=yyjhbDzaLxhBkoBzKcksEofXcT6EcjMbtQ@mail.gmail.com
This conforms more closely with the style of other struct initializers
in the code base. Initializing multiple fields on a single line is
unpopular in part because pgindent won't permit a space after the comma
before the next field's period.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reported-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://postgr.es/m/87see87fnq.fsf%40wibble.ilmari.org
This generates a more accurate code count because 'make distclean'
doesn't always remove build files.
Author: idea from David Rowley
Discussion: https://postgr.es/m/aR4hoOotVHB7TXo5@momjian.us
Backpatch-through: master
During pruning and freezing in phase I of vacuum, we delay clearing
all_visible and all_frozen in the presence of dead items. This allows
opportunistic freezing if the page would otherwise be fully frozen,
since those dead items are later removed in vacuum phase III.
To move the VM update into the same WAL record that
prunes and freezes tuples, we must know whether the page will
be marked all-visible/all-frozen before emitting WAL. Previously we
waited until after emitting WAL to update all_visible/all_frozen to
their correct values.
The only barrier to updating these flags immediately after deciding
whether to opportunistically freeze was that while emitting WAL for a
record freezing tuples, we use the pre-corrected value of all_frozen to
compute the snapshot conflict horizon. By determining the conflict
horizon earlier, we can update the flags immediately after making the
opportunistic freeze decision.
This is required to set the VM in the XLOG_HEAP2_PRUNE_VACUUM_SCAN
record emitted by pruning and freezing.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
Previously, we relied on all_visible and all_frozen being used together
to ensure that all_frozen was correct, but it is better to keep both
fields updated.
Future changes will separate their usage, so we should not depend on
all_visible for the validity of all_frozen.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
heap_page_prune_and_freeze() had accumulated an unwieldy number of input
parameters and upcoming work to handle VM updates in this function will
add even more.
Introduce a new PruneFreezeParams struct to group the function’s input
parameters, improving readability and maintainability.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/yn4zp35kkdsjx6wf47zcfmxgexxt4h2og47pvnw2x5ifyrs3qc%407uw6jyyxuyf7
A set of wording improvements and spelling fixes.
Author: Oleg Sibiryakov <o.sibiryakov@postgrespro.ru>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/e62bedb5-c26f-4d37-b4ed-ce9b55f1e980@postgrespro.ru
When running in Docker, the container may not have privileges needed by
get_mempolicy(). This is called by numa_available() in libnuma, but
versions prior to 2.0.19 did not expect that. The numa_available() call
seemingly succeeds, but then we get unexpected failures when trying to
query status of pages:
postgres =# select * from pg_shmem_allocations_numa;
ERROR: XX000: failed NUMA pages inquiry status: Operation not
permitted
LOCATION: pg_get_shmem_allocations_numa, shmem.c:691
The best solution is to call get_mempolicy() first, and proceed to
numa_available() only when it does not fail with EPERM. Otherwise we'd
need to treat older libnuma versions as insufficient, which seems a bit
too harsh, as this only affects containerized systems.
Fix by me, based on suggestions by Christoph. Backpatch to 18, where the
NUMA functions were introduced.
Reported-by: Christoph Berg <myon@debian.org>
Reviewed-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/aPDZOxjrmEo_1JRG@msg.df7cb.de
Backpatch-through: 18
The upstream timezone code uses a bool variable as an array subscript.
Back when PostgreSQL's bool was char, this would have caused a warning
from gcc -Wchar-subscripts, which is included in -Wall. But this has
been obsolete since probably commit d26a810ebf, but certainly since
bool is now the C standard bool. So we can remove this deviation from
the upstream code, to make future code merges simpler.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/9ad2749f-77ab-4ecb-a321-1ca915480b05%40eisentraut.org
Commit 792353f7d5 updated the pg_dump and pg_dumpall documentation to
clarify which statistics are not included in their output. The pg_upgrade
documentation contained a nearly identical description, but it was not updated
at the same time.
This commit updates the pg_upgrade documentation to match those changes.
Backpatch to v18, where commit 792353f7d5 was backpatched to.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Bruce Momjian <bruce@momjian.us>
Discussion: https://postgr.es/m/CAHGQGwFnfgdGz8aGWVzgFCFwoWQU7KnFFjmxinf4RkQAkzmR+w@mail.gmail.com
Backpatch-through: 18
This commit updates encode() and decode() so that when an invalid encoding
is specified, their error message includes a HINT listing all valid encodings.
This helps users quickly see which encodings are supported without needing
to consult the documentation.
Author: Shinya Sugamoto <shinya34892@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAAe3y+99sfPv8UDF1VM-rC1i5HBdqxUh=2HrbJJFm2+i=1OwOw@mail.gmail.com
MSVCRT predated C99 and invented non-standard placeholders for 64-bit
numbers, and then later used them in standard macros when C99
<inttypes.h> arrived. The macros just use %lld etc when building with
UCRT, so there should be no way for our interposed sprintf.c code to
receive the pre-standard kind these days. Time to drop the code that
parses them.
That code was in fact already dead when commit 962da900 landed, as we'd
disclaimed MSVCRT support a couple of weeks earlier in commit 1758d424,
but patch development overlapped and the history of these macros hadn't
been investigated.
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/4d8b1a67-aab2-4429-b44b-f03988095939%40eisentraut.org
If both sides of the operator have most-common-value statistics,
eqjoinsel wants to check which MCVs have matches on the other side.
Formerly it did this with a dumb compare-all-the-entries loop,
which had O(N^2) behavior for long MCV lists. When that code was
written, twenty-plus years ago, that seemed tolerable; but nowadays
people frequently use much larger statistics targets, so that the
O(N^2) behavior can hurt quite a bit.
To add insult to injury, when asked for semijoin semantics, the
entire comparison loop was done over, even though we frequently
know that it will yield exactly the same results.
To improve matters, switch to using a hash table to perform the
matching. Testing suggests that depending on the data type, we may
need up to about 100 MCVs on each side to amortize the extra costs
of setting up the hash table and performing hash-value computations;
so continue to use the old looping method when there are fewer MCVs
than that.
Also, refactor so that we don't repeat the matching work unless
we really need to, which occurs only in the uncommon case where
eqjoinsel_semi decides to truncate the set of inner MCVs it
considers. The refactoring also got rid of the need to use the
presented operator's commutator. Real-world operators that are
using eqjoinsel should pretty much always have commutators, but
at the very least this saves a few syscache lookups.
Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Co-authored-by: David Geier <geidav.pg@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20ea8bf5-3569-4e46-92ef-ebb2666debf6@tantorlabs.com
The fix for bug #19055 (commit b0cc0a71e) allowed CTE references in
sub-selects within aggregate functions to affect the semantic levels
assigned to such aggregates. It turns out this broke some related
cases, leading to assertion failures or strange planner errors such
as "unexpected outer reference in CTE query". After experimenting
with some alternative rules for assigning the semantic level in
such cases, we've come to the conclusion that changing the level
is more likely to break things than be helpful.
Therefore, this patch undoes what b0cc0a71e changed, and instead
installs logic to throw an error if there is any reference to a
CTE that's below the semantic level that standard SQL rules would
assign to the aggregate based on its contained Var and Aggref nodes.
(The SQL standard disallows sub-selects within aggregate functions,
so it can't reach the troublesome case and hence has no rule for
what to do.)
Perhaps someone will come along with a legitimate query that this
logic rejects, and if so probably the example will help us craft
a level-adjustment rule that works better than what b0cc0a71e did.
I'm not holding my breath for that though, because the previous
logic had been there for a very long time before bug #19055 without
complaints, and that bug report sure looks to have originated from
fuzzing not from real usage.
Like b0cc0a71e, back-patch to all supported branches, though
sadly that no longer includes v13.
Bug: #19106
Reported-by: Kamil Monicz <kamil@monicz.dev>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19106-9dd3668a0734cd72@postgresql.org
Backpatch-through: 14
The previous commit updated this file to use spaces instead of
tabs. This commit adds a test to ensure that no new tabs are
added.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aReNUKdMgKxLqmq7%40nathan
This file is written for 8-space tabs, since we expect that most
users who edit their configuration files use 8-space tabs.
However, most of PostgreSQL is written for 4-space tabs, and at
least one popular web interface defaults to 4-space tabs. Rather
than trying to standardize on a particular tab width for this file,
let's just switch to spaces.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aReNUKdMgKxLqmq7%40nathan
This commit updates .editorconfig and .gitattributes in preparation
for a follow-up commit that will modify this file to use spaces
instead of tabs.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aReNUKdMgKxLqmq7%40nathan
We need separate pairing heaps for different WaitLSNType's, because there
might be waiters for different LSN's at the same time. However, one process
can wait only for one type of LSN at a time. So, no need for inHeap
and heapNode fields to be arrays.
Discussion: https://postgr.es/m/CAPpHfdsBR-7sDtXFJ1qpJtKiohfGoj%3DvqzKVjWxtWsWidx7G_A%40mail.gmail.com
Author: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
This patch renames the sync_error_count column to sync_table_error_count
in the pg_stat_subscription_stats view. The new name makes the purpose
explicit now that a separate column exists to track sequence
synchronization errors.
Additionally, the column seq_sync_error_count is renamed to
sync_seq_error_count to maintain a consistent naming pattern, making it
easier for users to group, and query synchronization related counters.
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CALDaNm3WwJmz=-4ybTkhniB-Nf3qmFG9Zx1uKjyLLoPF5NYYXA@mail.gmail.com
If you go back as far as the RHEL7 era, <sys/auxv.h> does not provide
the HWCAPxxx macros needed with elf_aux_info or getauxval, so you need
to get those from the kernel header <asm/hwcap.h> instead. We knew
that for the 32-bit case but failed to extrapolate to the 64-bit case.
Oversight in commit aac831caf.
Reported-by: GaoZengqi <pgf00a@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAFmBtr3Av62-jBzdhFkDHXJF9vQmNtSnH2upwODjnRcsgdTytw@mail.gmail.com
Backpatch-through: 18
Remove bogus stripping of RelabelTypes: that can result in building
an output SAOP tree with incorrect exposed exprType for the operands,
which might confuse polymorphic operators. Moreover it demonstrably
prevents folding some OR-trees to SAOPs when the RHS expressions
have different base types that were coerced to the same type by
RelabelTypes.
Reduce prohibition on type_is_rowtype to just disallow type RECORD.
We need that because otherwise we would happily fold multiple RECORD
Consts into a RECORDARRAY Const even if they aren't the same record
type. (We could allow that perhaps, if we checked that they all have
the same typmod, but the case doesn't seem worth that much effort.)
However, there is no reason at all to disallow the transformation
for named composite types, nor domains over them: as long as we can
find a suitable array type we're good.
Remove some assertions that seem rather out of place (it's not
this code's duty to verify that the RestrictInfo structure is
sane). Rewrite some comments.
The issues with RelabelType stripping seem severe enough to
back-patch this into v18 where the code was introduced.
Author: Tender Wang <tndrwang@gmail.com>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAHewXN=aH7GQBk4fXU-WaEeVmQWUmBAeNyBfJ3VKzPphyPKUkQ@mail.gmail.com
Backpatch-through: 18
The documentation did not previously mention the default values for
the --fsync-interval and --plugin options, even though pg_recvlogical --help
shows them. This omission made it harder for users to understand
the tool's behavior from the documentation alone.
This commit adds the missing default value descriptions for both options
to the pg_recvlogical documentation.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://postgr.es/m/CAHGQGwFqssPBjkWMFofGq32e_tANOeWN-cM=6biAP3nnFUXMRw@mail.gmail.com
PostgreSQL 18 deprecated password_encryption='md5', but the
comments for this GUC in the sample configuration file did
not mention the deprecation. Update comments with a notice
to make as many users as possible aware of it. Also add a
comment to the related md5_password_warnings GUC while there.
Author: Michael Banck <mbanck@gmx.net>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Backpatch-through: 18
The existing format of pg_dependencies uses a single-object JSON
structure, with each key value embedding all the knowledge about the
set attributes tracked, like:
{"1 => 5": 1.000000, "5 => 1": 0.423130}
While this is a very compact format, it is confusing to read and it is
difficult to manipulate the values within the object, particularly when
tracking multiple attributes.
The new output format introduced in this commit is a JSON array of
objects, with:
- A key named "degree", with a float value.
- A key named "attributes", with an array of attribute numbers.
- A key named "dependency", with an attribute number.
The values use the same underlying type as previously when printed, with
a new output format that shows now as follows:
[{"degree": 1.000000, "attributes": [1], "dependency": 5},
{"degree": 0.423130, "attributes": [5], "dependency": 1}]
This new format will become handy for a follow-up set of changes, so as
it becomes possible to inject extended statistics rather than require an
ANALYZE, like in a dump/restore sequence or after pg_upgrade on a new
cluster.
This format has been suggested by Tomas Vondra. The key names are
defined in the header introduced by 1f927cce44, to ease the
integration of frontend-specific changes that are still under
discussion. (Again a personal note: if anybody comes up with better
name for the keys, of course feel free.)
The bulk of the changes come from the regression tests, where
jsonb_pretty() is now used to make the outputs generated easier to
parse.
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
The existing format of pg_ndistinct uses a single-object JSON structure
where each key is itself a comma-separated list of attnums, like:
{"3, 4": 11, "3, 6": 11, "4, 6": 11, "3, 4, 6": 11}
While this is a very compact format, it is confusing to read and it is
difficult to manipulate the values within the object.
The new output format introduced in this commit is an array of objects,
with:
- A key named "attributes", that contains an array of attribute numbers.
- A key named "ndistinct", represented as an integer.
The values use the same underlying type as previously when printed, with
a new output format that shows now as follows:
[{"ndistinct": 11, "attributes": [3,4]},
{"ndistinct": 11, "attributes": [3,6]},
{"ndistinct": 11, "attributes": [4,6]},
{"ndistinct": 11, "attributes": [3,4,6]}]
This new format will become handy for a follow-up set of changes, so as
it becomes possible to inject extended statistics rather than require an
ANALYZE, like in a dump/restore sequence or after pg_upgrade on a new
cluster.
This format has been suggested by Tomas Vondra. The key names are
defined in a new header, to ease with the integration of
frontend-specific changes that are still under discussion. (Personal
note: I am not specifically wedded to these key names, but if there are
better name suggestions for this release, feel free.)
The bulk of the changes come from the regression tests, where
jsonb_pretty() is now used to make the outputs generated easier to
parse.
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
Until d2ea2d310d, the PS_USE_PS_STRINGS
option was used on the GNU/Hurd. As this option got removed and
PS_USE_CLOBBER_ARGV appears to work fine nowadays on the Hurd, define
this one to re-enable process title changes on this platform.
In the 14 and 15 branches, the existing test for __hurd__ (added 25
years ago by commit 209aa77d, removed in 16 by the above commit) is left
unchanged for now as it was activating slightly different code paths and
would need investigation by a Hurd user.
Author: Michael Banck <mbanck@debian.org>
Discussion: https://postgr.es/m/CA%2BhUKGJMNGUAqf27WbckYFrM-Mavy0RKJvocfJU%3DJ2XcAZyv%2Bw%40mail.gmail.com
Backpatch-through: 16
Likewise for MemSetAligned.
"long" wasn't the most suitable type for these macros as with MSVC in
64-bit builds, sizeof(long) == 4, which is narrower than the processor's
word size, therefore these macros had to perform twice as many loops as
they otherwise might.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CAApHDvoGFjSA3aNyVQ3ivbyc4ST=CC5L-_VjEUQ92HbE2Cxovg@mail.gmail.com
"long" is 32 bits on Windows 64-bit. Switch to a datatype that's 64-bit
on all platforms. While we're there, use an unsigned type as these
fields count things that have occurred, of which it's not possible to
have negative numbers of.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CAApHDvoGFjSA3aNyVQ3ivbyc4ST=CC5L-_VjEUQ92HbE2Cxovg@mail.gmail.com
This new test, added in 009_log_temp_files, checks that the temporary
files created by a WITH HOLD cursor are dropped at the end of the
transaction where the transaction has been created.
The portal's executor is shutdown in PersistHoldablePortal(), after for
example some forced detoast, so as the cursor data can be accessed
without requiring a snapshot.
Author: Mircea Cadariu <cadariu.mircea@gmail.com>
Discussion: https://postgr.es/m/0a666d28-9080-4239-90d6-f6345bb43468@gmail.com
When instrumenting a MERGE command containing both WHEN NOT MATCHED BY
SOURCE and WHEN NOT MATCHED BY TARGET actions using EXPLAIN ANALYZE, a
concurrent update of the target relation could lead to an Assert
failure in show_modifytable_info(). In a non-assert build, this would
lead to an incorrect value for "skipped" tuples in the EXPLAIN output,
rather than a crash.
This could happen if the concurrent update caused a matched row to no
longer match, in which case ExecMerge() treats the single originally
matched row as a pair of not matched rows, and potentially executes 2
not-matched actions for the single source row. This could then lead to
a state where the number of rows processed by the ModifyTable node
exceeds the number of rows produced by its source node, causing
"skipped_path" in show_modifytable_info() to be negative, triggering
the Assert.
Fix this in ExecMergeMatched() by incrementing the instrumentation
tuple count on the source node whenever a concurrent update of this
kind is detected, if both kinds of merge actions exist, so that the
number of source rows matches the number of actions potentially
executed, and the "skipped" tuple count is correct.
Back-patch to v17, where support for WHEN NOT MATCHED BY SOURCE
actions was introduced.
Bug: #19111
Reported-by: Dilip Kumar <dilipbalaut@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Discussion: https://postgr.es/m/19111-5b06624513d301b3@postgresql.org
Backpatch-through: 17
WaitLSNWakeup() incorrectly returned early when called with
InvalidXLogRecPtr (meaning "wake all waiters"), because the fast-path
check compared minWaitedLSN > 0 without validating currentLSN first.
This caused WAIT FOR LSN commands to wait indefinitely during standby
promotion until random signals woke them.
Add an XLogRecPtrIsValid() check before the comparison so that
InvalidXLogRecPtr bypasses the fast-path and wakes all waiters immediately.
Discussion: https://postgr.es/m/CABPTF7UieOYbOgH3EnQCasaqcT1T4N6V2wammwrWCohQTnD_Lw%40mail.gmail.com
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
All GUCs in postgresql.conf.sample should be set to the default
value and be commented out. This syntax was however not tested
for, making omissions easy to miss. Add a test which check all
lines for syntax.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/19727040-3EE4-4719-AF4F-2548544113D7@yesql.se
All settings in this file should be commented out. In addition to
fixing that, also fix the indentation for this line.
Oversight in commit c758119e5b.
Reported-by: Daniel Gustafsson <daniel@yesql.se>
Author: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/19727040-3EE4-4719-AF4F-2548544113D7%40yesql.se
Backpatch-through: 18
Commit 5e4fcbe531 added a check_rights parameter to this function
for use by ALTER TABLE commands that re-create statistics objects.
However, we intentionally ignore check_rights when verifying
relation ownership because this function's lookup could return a
different answer than the caller's. This commit adds a note to
this effect so that we remember it down the road.
Reviewed-by: Noah Misch <noah@leadboat.com>
Backpatch-through: 14
Path expansion might expose characters like spaces which would cause
command failure, so double-quote the examples. While %f doesn't need
quoting since it uses a fixed character set, it is best to be
consistent.
Discussion: https://postgr.es/m/aROPCQCfvKp9Htk4@momjian.us
Backpatch-through: master
Previously, when pgbench ran a custom script that triggered retriable errors
(e.g., deadlocks) followed by multiple \syncpipeline commands in pipeline mode,
the following assertion failure could occur:
Assertion failed: (res == ((void*)0)), function discardUntilSync, file pgbench.c, line 3594.
The issue was that discardUntilSync() assumed a pipeline sync result
(PGRES_PIPELINE_SYNC) would always be followed by either another sync result
or NULL. This assumption was incorrect: when multiple sync requests were sent,
a sync result could instead be followed by another result type. In such cases,
discardUntilSync() mishandled the results, leading to the assertion failure.
This commit fixes the issue by making discardUntilSync() correctly handle cases
where a pipeline sync result is followed by other result types. It now continues
discarding results until another pipeline sync followed by NULL is reached.
Backpatched to v17, where support for \syncpipeline command in pgbench was
introduced.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/20251111105037.f3fc554616bc19891f926c5b@sraoss.co.jp
Backpatch-through: 17
This reverts commit 1fd981f053, based on concerns that the logging
improvements do not justify the protocol breakage of dropping an unnamed
portal once its execution has completed.
It seems unlikely that one would try to send an execute or describe
message after the portal has been used, but if they do such
post-completion messages would not be able to process as the previous
versions. Let's revert this change for now so as we keep compatibility
and consider a different solution.
The tests added by 76bba03312 track the pre-1fd981f05369 behavior, and
are still valid.
Discussion: https://postgr.es/m/CA+TgmoYFJyJNQw3RT7veO3M2BWRE9Aw4hprC5rOcawHZti-f8g@mail.gmail.com
Much of the "Replication Slot" chapter applies to physical and logical
slots, but it was sloppy in mentioning mostly physical slots. This
patch clarified which parts of the text apply to which slot types.
This chapter is referenced from the logical slot/subscriber chapter, so
it needs to do double duty.
Backpatch-through: master
Also mention that logical replication slots are created by default when
subscriptions are created. This should clarify the text.
Backpatch-through: master
Previously it was not clear that "physical" replication slots were being
discussed, and that they needed to be created on the primary and not the
standby.
Backpatch-through: master
While the underlying getaddrinfo call accepts a null pointer for
the hintp parameter, pg_getaddrinfo_all does not. Document this
difference with a comment to make it clear.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Sergey Tatarintsev <s.tatarintsev@postgrespro.ru>
Discussion: https://postgr.es/m/1e5efc94-407e-40b8-8b10-4d25f823c6d7@postgrespro.ru
On the CREATE POLICY page, the "Policies Applied by Command Type"
table was missing MERGE ... THEN DELETE and some of the policies
applied during INSERT ... ON CONFLICT and MERGE. Fix that, and try to
improve readability by listing the various MERGE cases separately,
rather than together with INSERT/UPDATE/DELETE. Mention COPY ... TO
along with SELECT, since it behaves in the same way. In addition,
document which policy violations cause errors to be thrown, and which
just cause rows to be silently ignored.
Also, a paragraph above the table states that INSERT ... ON CONFLICT
DO UPDATE only checks the WITH CHECK expressions of INSERT policies
for rows appended to the relation by the INSERT path, which is
incorrect -- all rows proposed for insertion are checked, regardless
of whether they end up being inserted. Fix that, and also mention that
the same applies to INSERT ... ON CONFLICT DO NOTHING.
In addition, in various other places on that page, clarify how the
different types of policy are applied to different commands, and
whether or not errors are thrown when policy checks do not pass.
Backpatch to all supported versions. Prior to v17, MERGE did not
support RETURNING, and so MERGE ... THEN INSERT would never check new
rows against SELECT policies. Prior to v15, MERGE was not supported at
all.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Viktor Holmberg <v@viktorh.net>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWqnfeChjK=n1V_dYZT4rt4mnq+ybf9c0qXDYTVMsy8pg@mail.gmail.com
Backpatch-through: 14
Issue introduced by 84fb27511d. I have missed this diff while
adding pgoff_t to the typedef list of pgindent, while addressing a
separate indentation issue.
Per buildfarm member koel.
PostgreSQL's Windows port has never been able to handle files larger
than 2GB due to the use of off_t for file offsets, only 32-bit on
Windows. This causes signed integer overflow at exactly 2^31 bytes when
trying to handle files larger than 2GB, for the routines touched by this
commit.
Note that large files are forbidden by ./configure (3c6248a828) and
meson (recent change, see 79cd66f28c). This restriction also exists
in v16 and older versions for the now-dead MSVC scripts.
The code base already defines pgoff_t as __int64 (64-bit) on Windows for
this purpose, and some function declarations in headers use it, but many
internals still rely on off_t. This commit switches more routines to
use pgoff_t, offering more portability, for areas mainly related to file
extensions and storage.
These are not critical for WAL segments yet, which have currently a
maximum size allowed of 1GB (well, this opens the door at allowing a
larger size for them). This matters more for segment files if we want
to lift the large file restriction in ./configure and meson in the
future, which would make sense to remove once/if all traces of off_t are
gone from the tree. This can additionally matter for out-of-core code
that may want files larger than 2GB in places where off_t is four bytes
in size.
Note that off_t is still used in other parts of the tree like
buffile.c, WAL sender/receiver, base backup, pg_combinebackup, etc.
These other code paths can be addressed separately, and their update
will be required if we want to remove the large file restriction in the
future. This commit is a good first cut in itself towards more
portability, hopefully.
On Unix-like systems, pgoff_t is defined as off_t, so this change only
affects Windows behavior.
Author: Bryan Green <dbryan.green@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/0f238ff4-c442-42f5-adb8-01b762c94ca1@gmail.com
pg_logical_slot_get_changes_guts() previously assigned InvalidXLogRecPtr to
the local variable upto_nchanges, which is of type int32, not XLogRecPtr.
While this caused no functional issue since InvalidXLogRecPtr is defined as 0,
it was semantically incorrect.
This commit fixes the issue by updating pg_logical_slot_get_changes_guts()
to set upto_nchanges to 0 instead of InvalidXLogRecPtr.
No backpatch is needed, as the previous behavior was harmless.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Steven Niu <niushiji@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwHKHuR5NGnGxU3+ebz7cbC1ZAR=AgG4Bueq==Lj6iX8Sw@mail.gmail.com
This comment seems to refer to some stuff that was removed during
development in 2005.
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/aRJFDxKJLFE_1Iai%40nathan
Since this is a test module, leaking a couple of LWLock tranches is
fine, but we want to discourage that pattern in third-party code.
This commit teaches the module to create only one tranche and to
store its ID in shared memory for use by other backends.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/dd36d384-55df-4fc2-825c-5bc56c950fa9%40gmail.com
If DSM entry initialization fails, backends could try to use an
uninitialized DSM segment, DSA, or dshash table (since the entry is
still added to the registry). To fix, keep track of whether
initialization completed, and ERROR if a backend tries to attach to
an uninitialized entry. We could instead retry initialization as
needed, but that seemed complicated, error prone, and unlikely to
help most cases. Furthermore, such problems probably indicate a
coding error.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/dd36d384-55df-4fc2-825c-5bc56c950fa9%40gmail.com
Backpatch-through: 17
Before we started to freeze async notify entries (commit 8eeb4a0f7c),
no one looked at the 'xid' on an entry with invalid 'dboid'. But now
we might actually need to freeze it later. Initialize them with
InvalidTransactionId to begin with, to avoid that work later.
Álvaro pointed this out in review of commit 8eeb4a0f7c, but I forgot
to include this change there.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://www.postgresql.org/message-id/202511071410.52ll56eyixx7@alvherre.pgsql
Backpatch-through: 14
Previous commit fixed a bug where VACUUM would truncate the CLOG
that's still needed to check the commit status of XIDs in the async
notify queue, but as mentioned in the commit message, it wasn't a full
fix. If a backend is executing asyncQueueReadAllNotifications() and
has just made a local copy of an async SLRU page which contains old
XIDs, vacuum can concurrently truncate the CLOG covering those XIDs,
and the backend still gets an error when it calls
TransactionIdDidCommit() on those XIDs in the local copy. This commit
fixes that race condition.
To fix, hold the SLRU bank lock across the TransactionIdDidCommit()
calls in NOTIFY processing.
Per Tom Lane's idea. Backpatch to all supported versions.
Reviewed-by: Joel Jacobson <joel@compiler.org>
Reviewed-by: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>
Discussion: https://www.postgresql.org/message-id/2759499.1761756503@sss.pgh.pa.us
Backpatch-through: 14
The async notification queue contains the XID of the sender, and when
processing notifications we call TransactionIdDidCommit() on the
XID. But we had no safeguards to prevent the CLOG segments containing
those XIDs from being truncated away. As a result, if a backend didn't
for some reason process its notifications for a long time, or when a
new backend issued LISTEN, you could get an error like:
test=# listen c21;
ERROR: 58P01: could not access status of transaction 14279685
DETAIL: Could not open file "pg_xact/000D": No such file or directory.
LOCATION: SlruReportIOError, slru.c:1087
To fix, make VACUUM "freeze" the XIDs in the async notification queue
before truncating the CLOG. Old XIDs are replaced with
FrozenTransactionId or InvalidTransactionId.
Note: This commit is not a full fix. A race condition remains, where a
backend is executing asyncQueueReadAllNotifications() and has just
made a local copy of an async SLRU page which contains old XIDs, while
vacuum concurrently truncates the CLOG covering those XIDs. When the
backend then calls TransactionIdDidCommit() on those XIDs from the
local copy, you still get the error. The next commit will fix that
remaining race condition.
This was first reported by Sergey Zhuravlev in 2021, with many other
people hitting the same issue later. Thanks to:
- Alexandra Wang, Daniil Davydov, Andrei Varashen and Jacques Combrink
for investigating and providing reproducable test cases,
- Matheus Alcantara and Arseniy Mukhin for review and earlier proposed
patches to fix this,
- Álvaro Herrera and Masahiko Sawada for reviews,
- Yura Sokolov aka funny-falcon for the idea of marking transactions
as committed in the notification queue, and
- Joel Jacobson for the final patch version. I hope I didn't forget
anyone.
Backpatch to all supported versions. I believe the bug goes back all
the way to commit d1e027221d, which introduced the SLRU-based async
notification queue.
Discussion: https://www.postgresql.org/message-id/16961-25f29f95b3604a8a@postgresql.org
Discussion: https://www.postgresql.org/message-id/18804-bccbbde5e77a68c2@postgresql.org
Discussion: https://www.postgresql.org/message-id/CAK98qZ3wZLE-RZJN_Y%2BTFjiTRPPFPBwNBpBi5K5CU8hUHkzDpw@mail.gmail.com
Backpatch-through: 14
Previously, if async notify processing encountered an error, we would
report the error to the client and advance our read position past the
offending entry to prevent trying to process it over and over
again. Trying to continue after an error has a few problems however:
- We have no way of telling the client that a notification was
lost. They get an ERROR, but that doesn't tell you much. As such,
it's not clear if keeping the connection alive after losing a
notification is a good thing. Depending on the application logic,
missing a notification could cause the application to get stuck
waiting, for example.
- If the connection is idle, PqCommReadingMsg is set and any ERROR is
turned into FATAL anyway.
- We bailed out of the notification processing loop on first error
without processing any subsequent notifications. The subsequent
notifications would not be processed until another notify interrupt
arrives. For example, if there were two notifications pending, and
processing the first one caused an ERROR, the second notification
would not be processed until someone sent a new NOTIFY.
This commit changes the behavior so that any ERROR while processing
async notifications is turned into FATAL, causing the client
connection to be terminated. That makes the behavior more consistent
as that's what happened in idle state already, and terminating the
connection is a clear signal to the application that it might've
missed some notifications.
The reason to do this now is that the next commits will change the
notification processing code in a way that would make it harder to
skip over just the offending notification entry on error.
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>
Discussion: https://www.postgresql.org/message-id/fedbd908-4571-4bbe-b48e-63bfdcc38f64@iki.fi
Backpatch-through: 14
Explicitly document that privileges are transferred along with the
ownership. Backpatch to all supported versions since this behavior
has always been present.
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Josef Šimánek <josef.simanek@gmail.com>
Reported-by: Gilles Parc <gparc@free.fr>
Discussion: https://postgr.es/m/2023185982.281851219.1646733038464.JavaMail.root@zimbra15-e2.priv.proxad.net
Backpatch-through: 14
This creates a src/backend/catalog/pg_tablespace.c supporting file
containing a new function get_tablespace_location(), which lets the code
underlying pg_tablespace_location() be reused for other purposes.
Author: Manni Wood <manni.wood@enterprisedb.com>
Author: Nishant Sharma <nishant.sharma@enterprisedb.com>
Reviewed-by: Vaibhav Dalvi <vaibhav.dalvi@enterprisedb.com>
Reviewed-by: Ian Lawrence Barwick <barwick@gmail.com>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CAKWEB6rmnmGKUA87Zmq-s=b3Scsnj02C0kObQjnbL2ajfPWGEw@mail.gmail.com
The range for commit_siblings was incorrectly listed as starting on 1
instead of 0 in the sample configuration file. Backpatch down to all
supported branches.
Author: Man Zeng <zengman@halodbtech.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/tencent_53B70BA72303AE9C6889E78E@qq.com
Backpatch-through: 14
In order to make the errorhandling code in backend libpq be thread-
safe the global variable used by the certificate verification call-
back need to be replaced with passing private data.
This moves the threadsafety needle a little but forwards, the call
to strerror_r also needs to be replaced with the error buffer made
thread local. This is left as future work for when add the thread
primitives required for this to the tree.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/353226C7-97A1-4507-A380-36AA92983AE6@yesql.se
Instead of having to write a semicolon inside the macro argument, we can
insert a semicolon with another macro layer. This no longer gives
pg_bsd_indent indigestion, so we can remove the digestive aids that had
to be installed in the pgindent Perl script.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/202511111134.njrwf5w5nbjm@alvherre.pgsql
Backpatch-through: 18
pg_resetwal didn't accept multixid 0 or multixact offset UINT32_MAX,
but they are both valid values that can appear in the control file.
That caused pg_upgrade to fail if you tried to upgrade a cluster
exactly at multixid or offset wraparound, because pg_upgrade calls
pg_resetwal to restore multixid/offset on the new cluster to the
values from the old cluster. To fix, allow those values in
pg_resetwal.
Fixes bugs #18863 and #18865 reported by Dmitry Kovalenko.
Backpatch down to v15. Version 14 has the same bug, but the patch
doesn't apply cleanly there. It could be made to work but it doesn't
seem worth the effort given how rare it is to hit this problem with
pg_upgrade, and how few people are upgrading to v14 anymore.
Author: Maxim Orlov <orlovmg@gmail.com>
Discussion: https://www.postgresql.org/message-id/CACG%3DezaApSMTjd%3DM2Sfn5Ucuggd3FG8Z8Qte8Xq9k5-%2BRQis-g@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/18863-72f08858855344a2@postgresql.org
Discussion: https://www.postgresql.org/message-id/18865-d4c66cf35c2a67af@postgresql.org
Backpatch-through: 15
Add documentation describing sequence synchronization support in logical
replication. It explains how sequence changes are synchronized from the
publisher to the subscriber, the configuration requirements, and provide
examples illustrating setup and usage.
Additionally, document the pg_get_sequence_data() function, which allows
users to query sequence details on the publisher to determine when to
refresh corresponding sequences on the subscriber.
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
This new file is named pg_dependencies.c and includes all the code
directly related to the data type pg_dependencies, extracted from the
extended statistics code.
Some patches are under discussion to change its input and output
functions, and this separation makes the follow-up changes cleaner by
separating the logic related to the data type and the functional
dependencies statistics core logic in dependencies.c.
Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aQ2k8--a0FfwSwX9@paquier.xyz
This new file is named pg_ndistinct.c and includes all the code directly
related to the data type pg_ndistinct, extracted from the extended
statistics code.
Some patches are under discussion to change its input and output
functions, and this separation makes the follow-up changes cleaner by
separating the logic related to the data type and the multivariate
ndistinct coefficient core logic in mvdistinct.c.
Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aQ2k8--a0FfwSwX9@paquier.xyz
The synopsis for the ALTER PUBLICATION ... DROP ... command incorrectly
implied that a column list and WHERE clause could be specified as part of
the publication object. However, these options are not allowed for
DROP operations, making the documentation misleading.
This commit corrects the synopsis to clearly show only the valid forms
of publication objects.
Backpatched to v15, where the incorrect synopsis was introduced.
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHut+PsPu+47Q7b0o6h1r-qSt90U3zgbAHMHUag5o5E1Lo+=uw@mail.gmail.com
Backpatch-through: 15
Previously we had both in code and comments. Keep the more common and
accepted variant.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/5EBF1771-0566-4D08-9F9B-CDCDEF4BDC98@gmail.com
The maximum limits for point name, library name, function name and
private area size were not kept track of in the tests. The new function
introduced in 16a2f70695 gives a way to trigger them. This is not
critical but cheap to cover.
While on it, this commit cleans up some of the tests introduced by
16a2f70695 for NULL inputs by using more consistent argument values.
The coverage does not change, but it makes the whole less confusing with
argument values that are correct based their position in the SQL
function called.
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/aRE7zhu6wOA29gFf@paquier.xyz
Previously, error messages for oversized injection point names, libraries,
and functions showed buffer sizes (64, 128, 128) instead of the usable
character limits (63, 127, 127) as it did not count for the
zero-terminated byte, which was confusing. These messages are adjusted
to show better the reality.
The limit enforced for the private area was also too strict by one byte,
as specifying a zone worth exactly INJ_PRIVATE_MAXLEN should be able to
work because three is no zero-terminated byte in this case.
This is a stylistic change (well, mostly, a private_area size of exactly
1024 bytes can be defined with this change, something that nobody seem
to care about based on the lack of complaints). However, this is a
testing facility let's keep the logic consistent across all the branches
where this code exists, as there is an argument in favor of out-of-core
extensions that use injection points.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CABPTF7VxYp4Hny1h+7ejURY-P4O5-K8WZg79Q3GUx13cQ6B2kg@mail.gmail.com
Backpatch-through: 17
A similar check existed in the MSVC scripts that have been removed in
v17 by 1301c80b21, but nothing of the kind was checked in meson when
building with a 4-byte off_t.
This commit adds a check to fail the builds when trying to use a
relation file size higher than 1GB when off_t is 4 bytes, like
./configure, rather than detecting these failures at runtime because the
code is not able to handle large files in this case.
Backpatch down to v16, where meson has been introduced.
Discussion: https://postgr.es/m/aQ0hG36IrkaSGfN8@paquier.xyz
Backpatch-through: 16
If you run pg_controldata on a cluster that has been initialized with
different PG_CONTROL_VERSION than what the pg_controldata program has
been compiled with, pg_controldata will still try to interpret the
control file, but the result is likely to be somewhat nonsensical. How
nonsensical it is depends on the differences between the versions. If
sizeof(ControlFileData) differs between the versions, the CRC will not
match and you get a warning of that, but otherwise you get no
warning.
Looking back at recent PG_CONTROL_VERSION updates, all changes that
would mess up the printed values have also changed
sizeof(ControlFileData), but there's no guarantee of that in future
versions.
Add an explicit check and warning for version number mismatch before
the CRC check. That way, you get a more clear warning if you use the
pg_controldata binary from wrong version, and if we change the control
file in the future in a way that doesn't change
sizeof(ControlFileData), this ensures that you get a warning in that
case too.
Discussion: https://www.postgresql.org/message-id/2afded89-f9f0-4191-84d8-8b8668e029a1@iki.fi
guc_var_compare() is invoked from qsort() on an array of struct
config_generic, but the function accesses these directly as
strings (char *). This relies on the name being the first field, so
this works. But we can write this more clearly by using the struct
and then accessing the field through the struct. Before the
reorganization of the GUC structs (commit a13833c35f), the old code
was probably more convenient, but now we can write this more clearly
and correctly.
After this change, it is no longer required that the name is the first
field in struct config_generic, so remove that comment.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/2c961fa1-14f6-44a2-985c-e30b95654e8d%40eisentraut.org
Commit 3e0ae46d90 added a field to ControlFileData and bumped
CATALOG_VERSION_NO, but CATALOG_VERSION_NO is not the right version
number for ControlFileData changes. Bumping either one will force an
initdb, but PG_CONTROL_VERSION is more accurate. Bump
PG_CONTROL_VERSION now.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/1874404.1762787779@sss.pgh.pa.us
This omission allowed table owners to create statistics in any
schema, potentially leading to unexpected naming conflicts. For
ALTER TABLE commands that require re-creating statistics objects,
skip this check in case the user has since lost CREATE on the
schema. The addition of a second parameter to CreateStatistics()
breaks ABI compatibility, but we are unaware of any impacted
third-party code.
Reported-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Security: CVE-2025-12817
Backpatch-through: 13
Several functions could overflow their size calculations, when presented
with very large inputs from remote and/or untrusted locations, and then
allocate buffers that were too small to hold the intended contents.
Switch from int to size_t where appropriate, and check for overflow
conditions when the inputs could have plausibly originated outside of
the libpq trust boundary. (Overflows from within the trust boundary are
still possible, but these will be fixed separately.) A version of
add_size() is ported from the backend to assist with code that performs
more complicated concatenation.
Reported-by: Aleksey Solovev (Positive Technologies)
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Security: CVE-2025-12818
Backpatch-through: 13
It seems plausible that someone might want to experiment with
different values. The pressing reason though is that I'm reviewing a
patch that requires pg_upgrade to manipulate SLRU files. That patch
needs to access SLRU_PAGES_PER_SEGMENT from pg_upgrade code, and
slru.h, where SLRU_PAGES_PER_SEGMENT is currently defined, cannot be
included from frontend code. Moving it to pg_config_manual.h makes it
accessible.
Now that it's a little more likely that someone might change
SLRU_PAGES_PER_SEGMENT, add a cluster compatibility check for it.
Bump catalog version because of the new field in the control file.
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://www.postgresql.org/message-id/c7a4ea90-9f7b-4953-81be-b3fcb47db057@iki.fi
While there are many tests related to relation rewrites, nothing existed
to check how the cumulative statistics behave in such cases for
relations.
A different patch is under discussion to move the relation statistics to
be tracked on a per-relfilenode basis, so as these could be rebuilt
during crash recovery. This commit gives us a way to check (and perhaps
change) the existing behaviors for several rewrite scenarios, mixing
transactions, sub-transactions, two-phase commit and VACUUM.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aQ3X20hbqoThQXgp@ip-10-97-1-34.eu-west-3.compute.internal
This new function is able to take in input more data than the existing
injection_point_attach():
- A library name.
- A function name.
- Some private data.
This gives more flexibility for tests so as these would not need to
reinvent a wrapper for InjectionPointAttach() when attaching a callback
from a library other than "injection_points". injection_point_detach()
can be used with both versions of injection_point_attach().
Author: Rahila Syed <rahilasyed.90@gmail.com>
Reviewed-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAH2L28sOG2b_TKkZU51dy+pWJtny1mqDmeFiFoUASGa0X0iiKQ@mail.gmail.com
generic-gcc.h maps our read and write barriers to C11 acquire and
release fences using compiler builtins, for platforms where we don't
have our own hand-rolled assembler. This is apparently enough for GCC,
but the C11 memory model is only defined in terms of atomic accesses,
and our barriers for non-atomic, non-volatile accesses were not always
respected under Clang's stricter interpretation of the standard.
This explains the occasional breakage observed on new RISC-V + Clang
animal greenfly in lock-free PgAioHandle manipulation code containing a
repeating pattern of loads and read barriers. The problem can also be
observed in code generated for MIPS and LoongAarch, though we aren't
currently testing those with Clang, and on x86, though we use our own
assembler there. The scariest aspect is that we use the generic version
on very common ARM systems, but it doesn't seem to reorder the relevant
code there (or we'd have debugged this long ago).
Fix by inserting an explicit compiler barrier. It expands to an empty
assembler block declared to have memory side-effects, so registers are
flushed and reordering is prevented. In those respects this is like the
architecture-specific assembler versions, but the compiler is still in
charge of generating the appropriate fence instruction. Done for write
barriers on principle, though concrete problems have only been observed
with read barriers.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Tested-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/d79691be-22bd-457d-9d90-18033b78c40a%40gmail.com
Backpatch-through: 13
The documentation stated that the directory specified by --file
must not exist, but pg_dump does allow for empty directories to
be specified and used.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Bruce Momjian <bruce@momjian.us>
Discussion: https://postgr.es/m/534AA60D-CF6B-432F-9882-E9737B33D1B7@gmail.com
This commit adds the --continue-on-error option, allowing pgbench clients
to continue running even when SQL statements fail for reasons other than
serialization or deadlock errors. Without this option (by default),
the clients aborts in such cases, which was the only available behavior
previously.
This option is useful for benchmarks using custom scripts that may
raise errors, such as unique constraint violations, where users want
pgbench to complete the run despite individual statement failures.
Author: Rintaro Ikeda <ikedarintarof@oss.nttdata.com>
Co-authored-by: Yugo Nagata <nagata@sraoss.co.jp>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Stepan Neretin <slpmcf@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/44334231a4d214fac382a69cceb7d9fc@oss.nttdata.com
This warning was disabled in meson.build (warning 4273). If you
enable it, it looks like this:
../src/backend/utils/misc/ps_status.c(27): warning C4273: '__p__environ': inconsistent dll linkage
C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt\stdlib.h(1158): note: see previous definition of '__p__environ'
The declaration in ps_status.c was:
#if !defined(WIN32) || defined(_MSC_VER)
extern char **environ;
#endif
The declaration in the OS header file is:
_DCRTIMP char*** __cdecl __p__environ (void);
#define _environ (*__p__environ())
So it is evident that this could be problematic.
The old declaration was required by the old MSVCRT library, but we
don't support that anymore with MSVC.
To fix, disable the re-declaration in ps_status.c, and also in some
other places that use the same code pattern but didn't trigger the
warning.
Then we can also re-enable the warning (delete the disablement in
meson.build).
Reviewed-by: Bryan Green <dbryan.green@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/bf060644-47ff-441b-97cf-c685d0827757@eisentraut.org
This commit adds a new column, seq_sync_error_count, to the
pg_stat_subscription_stats view. This counter tracks the number of errors
encountered by the sequence synchronization worker during operation.
Since a single worker handles the synchronization of all sequences, this
value may reflect errors from multiple sequences. This addition improves
observability of sequence synchronization behavior and helps monitor
potential issues during replication.
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
The following parameters can only be set at server start because
their context is PGC_POSTMASTER, but this information was missing
or incorrectly documented. This commit adds or corrects
that information for the following parameters:
* debug_io_direct
* dynamic_shared_memory_type
* event_source
* huge_pages
* io_max_combine_limit
* max_notify_queue_pages
* shared_memory_type
* track_commit_timestamp
* wal_decode_buffer_size
Backpatched to all supported branches.
Author: Karina Litskevich <litskevichkarina@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwGfPzcin-_6XwPgVbWTOUFVZgHF5g9ROrwLUdCTfjy=0A@mail.gmail.com
Backpatch-through: 13
If these parameters are set without units, the values are interpreted
as blocks. This detail was previously missing from the documentation,
so this commit adds it.
Backpatch to v17 where io_combine_limit was added.
Author: Karina Litskevich <litskevichkarina@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CACiT8iZCDkz1bNYQNQyvGhXWJExSnJULRTYT894u4-Ti7Yh6jw@mail.gmail.com
Backpatch-through: 17
The prior commit made it legal to modify BufferDesc.state while the buffer
header spinlock is held. This allows us to replace the CAS loop
inUnpinBufferNoOwner() with an atomic sub. This improves scalability
significantly. See the prior commits for more background.
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
Until now BufferDesc.state was not allowed to be modified while the buffer
header spinlock was held. This meant that operations like unpinning buffers
needed to use a CAS loop, waiting for the buffer header spinlock to be
released before updating.
The benefit of that restriction is that it allowed us to unlock the buffer
header spinlock with just a write barrier and an unlocked write (instead of a
full atomic operation). That was important to avoid regressions in
48354581a4. However, since then the hottest buffer header spinlock uses have
been replaced with atomic operations (in particular, the most common use of
PinBuffer_Locked(), in GetVictimBuffer() (formerly in BufferAlloc()), has been
removed in 5e89985928).
This change will allow, in a subsequent commit, to release buffer pins with a
single atomic-sub operation. This previously was not possible while such
operations were not allowed while the buffer header spinlock was held, as an
atomic-sub would not have allowed a race-free check for the buffer header lock
being held.
Using atomic-sub to unpin buffers is a nice scalability win, however it is not
the primary motivation for this change (although it would be sufficient). The
primary motivation is that we would like to merge the buffer content lock into
BufferDesc.state, which will result in more frequent changes of the state
variable, which in some situations can cause a performance regression, due to
an increased CAS failure rate when unpinning buffers. The regression entirely
vanishes when using atomic-sub.
Naively implementing this would require putting CAS loops in every place
modifying the buffer state while holding the buffer header lock. To avoid
that, introduce UnlockBufHdrExt(), which can set/add flags as well as the
refcount, together with releasing the lock.
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
A couple of ereports were making use of StringInfos as temporary storage
for the portions of the WARNING message. One was doing this to avoid
having 2 separate ereports. This was all fairly unnecessary and
resulted in more code rather than less code.
Refactor out the additional StringInfos and make
check_publications_origin_tables() use 2 ereports.
In passing, adjust pubnames to become a stack-allocated StringInfoData to
avoid having to palloc the temporary StringInfoData. This follows on
from the efforts made in 6d0eba662.
Author: Mats Kindahl <mats.kindahl@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/0b381b02-cab9-41f9-a900-ad6c8d26c1fc%40gmail.com
Now that commit 06edbed478 has introduced XLogRecPtrIsValid(), we can
use that instead of:
- XLogRecPtrIsInvalid()
- direct comparisons with InvalidXLogRecPtr
- direct comparisons with literal 0
This makes the code more consistent.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aQB7EvGqrbZXrMlg@ip-10-97-1-34.eu-west-3.compute.internal
XLogRecPtrIsInvalid() is inconsistent with the affirmative form of
macros used for other datatypes, and leads to awkward double negatives
in a few places. This commit introduces XLogRecPtrIsValid(), which
allows code to be written more naturally.
This patch only adds the new macro. XLogRecPtrIsInvalid() is left in
place, and all existing callers remain untouched. This means all
supported branches can accept hypothetical bug fixes that use the new
macro, and at the same time any code that compiled with the original
formulation will continue to silently compile just fine.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/aQB7EvGqrbZXrMlg@ip-10-97-1-34.eu-west-3.compute.internal
... and remove the list of \pset options from the general \? output.
That list was getting out of hand, both for developers to keep up to
date as well as for users to read.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/202511041638.dm4qukcxfjto@alvherre.pgsql
Stored generated columns are not yet computed when the filtering
happens, so we need to prohibit them to avoid incorrect behavior.
Virtual generated columns currently error out ("unexpected virtual
generated column reference"). They could probably work if we expand
them in the right place, but for now let's keep them consistent with
the stored variant. This doesn't change the behavior, it only gives a
nicer error message.
Co-authored-by: jian he <jian.universality@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CACJufxHb8YPQ095R_pYDr77W9XKNaXg5Rzy-WP525mkq+hRM3g@mail.gmail.com
Before commit e25626677f, spinlocks were implemented using semaphores
on some platforms (--disable-spinlocks). That made it necessary to
initialize semaphores early, before any spinlocks could be used. Now
that we don't support --disable-spinlocks anymore, we can allocate the
shared memory needed for semaphores the same way as other shared
memory structures. Since the semaphores are used only in the PGPROC
array, move the semaphore shmem size estimation and initialization
calls to ProcGlobalShmemSize() and InitProcGlobal().
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAExHW5seSZpPx-znjidVZNzdagGHOk06F+Ds88MpPUbxd1kTaA@mail.gmail.com
Test failure on buildfarm member prion:
The test failed due to an unexpected LOCATION: line appearing between the
WARNING and ERROR messages. This occurred because the prion machine uses
log_error_verbosity = verbose, which includes additional context in error
messages. The test was originally checking for both WARNING and ERROR
messages in sequence sync, but the extra LOCATION: line disrupted this
pattern. To make the test robust across different verbosity settings, it
now only checks for the presence of the WARNING message after the test,
which is sufficient to validate the intended behavior.
Failure to sync sequences with quoted names:
The previous implementation did not correctly quote sequence names when
querying remote information, leading to failures when quoted sequence
names were used. This fix ensures that sequence names are properly quoted
during remote queries, allowing sequences with quoted identifiers to be
synced correctly.
Author: Vignesh C <vignesh21@gmail.com>
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CALDaNm0WcdSCoNPiE-5ek4J2dMJ5o111GPTzKCYj9G5i=ONYtQ@mail.gmail.com
Discussion: https://postgr.es/m/CAOzEurQOSN=Zcp9uVnatNbAy=2WgMTJn_DYszYjv0KUeQX_e_A@mail.gmail.com
Since OpenBSD core dumps do not embed executable paths, the script now
searches for the corresponding binary manually within the specified
directory before invoking LLDB. This is imperfect but should find the
right executable in practice, as needed for meaningful backtraces.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ36R74TZ8RKsFueYwLxGKDAm3LU2FHM_ZUCSB6imd3vYA@mail.gmail.com
Backpatch-through: 18
Like relation_stats.c, these structures are used to track the argument
number, names and types of pg_restore_attribute_stats() and
pg_clear_attribute_stats().
Extracted from a larger patch by the same author, reworded by me for
consistency with relation_stats.c.
Author: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
If sizeof off_t is 4, then configure will print a line saying just "0"
after the test. This is the output of the following "expr" command.
If we are using expr just for the exit code, the output should be sent
to /dev/null, as is done elsewhere.
The previous code had a set of warnings to disable on MSVC. But some
of these weren't actually enabled by default anyway, only in higher
MSVC warning levels (/W, maps to meson warning_level). I rearranged
this so that it is clearer in what MSVC warning level a warning would
have been enabled. Furthermore, sort them numerically within the
levels.
Moreover, we can add a few warning types to the default set, to get a
similar set of warnings that we get by default with gcc or clang (the
equivalents of -Wswitch and -Wformat).
Reviewed-by: Bryan Green <dbryan.green@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/bf060644-47ff-441b-97cf-c685d0827757@eisentraut.org
03d40e4b5 added code to provide better row estimates for when a UNION
query ended up only with a single child due to other children being
found to be dummy rels. In that case, ordinarily it would be ok to call
estimate_num_groups() on the targetlist of the only child path, however
that's not safe to do if the UNION child is the result of some other set
operation as we generate targetlists containing Vars with varno==0 for
those, which estimate_num_groups() can't handle. This could lead to:
ERROR: XX000: no relation entry for relid 0
Fix this by avoiding doing this when the only child is the result of
another set operation. In that case we'll fall back on the
assume-all-rows-are-unique method.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/cfbc99e5-9d44-4806-ba3c-f36b57a85e21@gmail.com
postgres_fdw supports EvalPlanQual testing by using the infrastructure
provided by the core with the RecheckForeignScan callback routine (cf.
commits 5fc4c26db and 385f337c9), but there has been no test coverage
for that, except that recent commit 12609fbac, which fixed an issue in
commit 385f337c9, added a test case to exercise only a code path added
by that commit to the core infrastructure. So let's add test cases to
exercise other code paths as well at this time.
Like commit 12609fbac, back-patch to all supported branches.
Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
Author: Etsuro Fujita <etsuro.fujita@gmail.com>
Discussion: https://postgr.es/m/CAPmGK15%2B6H%3DkDA%3D-y3Y28OAPY7fbAdyMosVofZZ%2BNc769epVTQ%40mail.gmail.com
Backpatch-through: 13
Use uppercase SQL keywords consistently throughout the documentation to
ease reading. Also add whitespace in a couple of places where it
improves readability.
Author: Erik Wienhold <ewie@ewie.name>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/82eb512b-8ed2-46be-b311-54ffd26978c4%40ewie.name
Various places that were using StringInfo but didn't need that
StringInfo to exist beyond the scope of the function were using
makeStringInfo(), which allocates both a StringInfoData and the buffer it
uses as two separate allocations. It's more efficient for these cases to
use a StringInfoData on the stack and initialize it with initStringInfo(),
which only allocates the string buffer. This also simplifies the cleanup,
in a few cases.
Author: Mats Kindahl <mats.kindahl@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/4379aac8-26f1-42f2-a356-ff0e886228d3@gmail.com
If any shell command fails, the whole script should fail. To avoid
future omissions, add this even for single-command scripts that use su
with heredoc syntax, as they might be extended or copied-and-pasted.
Extracted from a larger patch that wanted to use #error during
compilation, leading to the diagnosis of this problem.
Reviewed-by: Tristan Partin <tristan@partin.io> (earlier version)
Discussion: https://postgr.es/m/DDZP25P4VZ48.3LWMZBGA1K9RH%40partin.io
Backpatch-through: 15
We've successfully used libsanitizer for awhile with the undefined
and alignment sanitizers, but with some other sanitizers (at least
thread and hwaddress) it crashes due to internal recursion before
it's fully initialized itself. It turns out that that's due to the
"__ubsan_default_options" hack installed by commit f686ae82f, and we
can fix it by ensuring that __ubsan_default_options is built without
any sanitizer instrumentation hooks.
Reported-by: Emmanuel Sibi <emmanuelsibi.mec@gmail.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Diagnosed-by: Emmanuel Sibi <emmanuelsibi.mec@gmail.com>
Fix-suggested-by: Jacob Champion <jacob.champion@enterprisedb.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/F7543B04-E56C-4D68-A040-B14CCBAD38F1@gmail.com
Discussion: https://postgr.es/m/dbf77bf7-6e54-ed8a-c4ae-d196eeb664ce@gmail.com
Backpatch-through: 16
This section introduces temporal tables, with a focus on Application
Time (which we support) and only a brief mention of System Time (which
we don't). It covers temporal primary keys, unique constraints, and
temporal foreign keys. We will document temporal update/delete and
periods as we add those features.
This commit also adds glossary entries for temporal table, application
time, and system time.
Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://www.postgresql.org/message-id/flat/ec498c3d-5f2b-48ec-b989-5561c8aa2024@illuminatedcomputing.com
WAIT FOR is to be used on standby and specifies waiting for
the specific WAL location to be replayed. This option is useful when
the user makes some data changes on primary and needs a guarantee to see
these changes are on standby.
WAIT FOR needs to wait without any snapshot held. Otherwise, the snapshot
could prevent the replay of WAL records, implying a kind of self-deadlock.
This is why separate utility command seems appears to be the most robust
way to implement this functionality. It's not possible to implement this as
a function. Previous experience shows that stored procedures also have
limitation in this aspect.
Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO+BBjcirozJ6nYbOW8Q@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/flat/CABPTF7UNft368x-RgOXkfj475OwEbp%2BVVO-wEXz7StgjD_%3D6sw%40mail.gmail.com
Author: Kartyshov Ivan <i.kartyshov@postgrespro.ru>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Implement a new facility that allows processes to wait for WAL to reach
specific LSNs, both on primary (waiting for flush) and standby (waiting
for replay) servers.
The implementation uses shared memory with per-backend information
organized into pairing heaps, allowing O(1) access to the minimum
waited LSN. This enables fast-path checks: after replaying or flushing
WAL, the startup process or WAL writer can quickly determine if any
waiters need to be awakened.
Key components:
- New xlogwait.c/h module with WaitForLSNReplay() and WaitForLSNFlush()
- Separate pairing heaps for replay and flush waiters
- WaitLSN lightweight lock for coordinating shared state
- Wait events WAIT_FOR_WAL_REPLAY and WAIT_FOR_WAL_FLUSH for monitoring
This infrastructure can be used by features that need to wait for WAL
operations to complete.
Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO+BBjcirozJ6nYbOW8Q@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/flat/CABPTF7UNft368x-RgOXkfj475OwEbp%2BVVO-wEXz7StgjD_%3D6sw%40mail.gmail.com
Author: Kartyshov Ivan <i.kartyshov@postgrespro.ru>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
The existing pairingheap_allocate() uses palloc(), which allocates
from process-local memory. For shared memory use cases, the pairingheap
structure must be allocated via ShmemAlloc() or embedded in a shared
memory struct. Add pairingheap_initialize() to initialize an already-
allocated pairingheap structure in-place, enabling shared memory usage.
Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO+BBjcirozJ6nYbOW8Q@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/flat/CABPTF7UNft368x-RgOXkfj475OwEbp%2BVVO-wEXz7StgjD_%3D6sw%40mail.gmail.com
Author: Kartyshov Ivan <i.kartyshov@postgrespro.ru>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
In generate_orderedappend_paths(), the function does not handle the
case where the paths in total_subpaths and fractional_subpaths are
identical. This situation is not uncommon, and as a result, it may
generate two exactly identical ordered append paths.
Fix by checking whether total_subpaths and fractional_subpaths contain
the same paths, and skipping creation of the ordered append path for
the fractional case when they are identical.
Given the lack of field complaints about this, I'm a bit hesitant to
back-patch, but let's clean it up in HEAD.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4-OYsgA75tGGiBARt87G0y_z_GBTSLrzudcJxAzndYkYw@mail.gmail.com
In generate_orderedappend_paths(), there is an assumption that a child
relation's row estimate is always greater than zero. There is an
Assert verifying this assumption, and the estimate is also used to
convert an absolute tuple count into a fraction.
However, this assumption is not always valid -- for example, upper
relations can have their row estimates unset, resulting in a value of
zero. This can cause an assertion failure in debug builds or lead to
the tuple fraction being computed as infinity in production builds.
To fix, use the row estimate from the cheapest_total path to compute
the tuple fraction. The row estimate in this path should already have
been forced to a valid value.
In passing, update the comment for generate_orderedappend_paths() to
note that the function also considers the cheapest-fractional case
when not all tuples need to be retrieved. That is, it collects all
the cheapest fractional paths and builds an ordered append path for
each interesting ordering.
Backpatch to v18, where this issue was introduced.
Bug: #19102
Reported-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Discussion: https://postgr.es/m/19102-93480667e1200169@postgresql.org
Backpatch-through: 18
The test introduced by 17b2d5ec75 verifies that a WAL receiver
survives across a timeline jump by searching the server logs for
termination messages. However, it called restart() before the timeline
switch, which kills the WAL receiver and may log the exact message being
checked, hence failing the test. As TAP tests reuse the same log file
across restarts, a rotate_logfile() is used before the restart so as the
log matching check is not impacted by log entries generated by a
previous shutdown.
Recent changes to file handle inheritance altered I/O timing enough to
make this fail consistently while testing another patch.
While on it, this adds an extra check based on a PID comparison. This
test may lead to false positives as it could be possible that the WAL
receiver has processed a timeline jump before the initial PID is
grabbed, but it should be good enough in most cases.
Like 17b2d5ec75, backpatch down to v13.
Author: Bryan Green <dbryan.green@gmail.com>
Co-authored-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/9d00b597-d64a-4f1e-802e-90f9dc394c70@gmail.com
Backpatch-through: 13
This patch introduces sequence synchronization. Sequences that are synced
will have 2 states:
- INIT (needs [re]synchronizing)
- READY (is already synchronized)
A new sequencesync worker is launched as needed to synchronize sequences.
A single sequencesync worker is responsible for synchronizing all
sequences. It begins by retrieving the list of sequences that are flagged
for synchronization, i.e., those in the INIT state. These sequences are
then processed in batches, allowing multiple entries to be synchronized
within a single transaction. The worker fetches the current sequence
values and page LSNs from the remote publisher, updates the corresponding
sequences on the local subscriber, and finally marks each sequence as
READY upon successful synchronization.
Sequence synchronization occurs in 3 places:
1) CREATE SUBSCRIPTION
- The command syntax remains unchanged.
- The subscriber retrieves sequences associated with publications.
- Published sequences are added to pg_subscription_rel with INIT
state.
- Initiate the sequencesync worker to synchronize all sequences.
2) ALTER SUBSCRIPTION ... REFRESH PUBLICATION
- The command syntax remains unchanged.
- Dropped published sequences are removed from pg_subscription_rel.
- Newly published sequences are added to pg_subscription_rel with INIT
state.
- Initiate the sequencesync worker to synchronize only newly added
sequences.
3) ALTER SUBSCRIPTION ... REFRESH SEQUENCES
- A new command introduced for PG19 by f0b3573c3a.
- All sequences in pg_subscription_rel are reset to INIT state.
- Initiate the sequencesync worker to synchronize all sequences.
- Unlike "ALTER SUBSCRIPTION ... REFRESH PUBLICATION" command,
addition and removal of missing sequences will not be done in this
case.
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
Previously, unnamed portals were kept until the next Bind message or the
end of the transaction. This could cause temporary files to persist
longer than expected and make logging not reflect the actual SQL
responsible for the temporary file.
This patch changes exec_execute_message() to drop unnamed portals
immediately after execution to completion at the end of an Execute
message, making their removal more aggressive. This forces temporary
file cleanups to happen at the same time as the completion of the portal
execution, with statement logging correctly reflecting to which
statements these temporary files were attached to (see the diffs in the
TAP test updated by this commit for an idea).
The documentation is updated to describe the lifetime of unnamed
portals, and test cases are updated to verify temporary file removal and
proper statement logging after unnamed portal execution. This changes
how unnamed portals are handled in the protocol, hence no backpatch is
done.
Author: Frédéric Yhuel <frederic.yhuel@dalibo.com>
Co-Authored-by: Sami Imseih <samimseih@gmail.com>
Co-Authored-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0tTrTUoEr3kDXCuKsvqYGq8OOHiBwoD-dyJocq95uEOTQ%40mail.gmail.com
The comment for ChangeVarNodes() refers to a parameter named
change_RangeTblRef, which does not exist in the code.
The comment for ChangeVarNodesExtended() contains an extra space,
while the comment for replace_relid_callback() has an awkward line
break and a typo.
This patch fixes these issues and revises some sentences for smoother
wording.
Oversights in commits ab42d643c and fc069a3a6.
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAMbWs480j16HC1JtjKCgj5WshivT8ZJYkOfTyZAM0POjFomJkg@mail.gmail.com
Backpatch-through: 18
These assertions may prove to become useful to make sure that no process
other than the startup process calls the routines where these checks are
added, as we expect that these do not interfere with a WAL receiver
switched to a "stopping" state by a startup process.
The assumption that only the startup process can use this code has
existed for many years, without a check enforcing it.
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/aQmGeVLYl51y1m_0@paquier.xyz
First, the assertions in assign_io_method() were the wrong way round. Second,
the lengthof() assertion checked the length of io_method_options, which is the
wrong array to check and is always longer than pgaio_method_ops_table.
While add it, add a static assert to ensure pgaio_method_ops_table and
io_method_options stay in sync.
Per coverity and Tom Lane.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 18
In 2a0faed9d7, which added JIT compilation support for expressions, I
accidentally used sizeof(LLVMBasicBlockRef *) instead of
sizeof(LLVMBasicBlockRef) as part of computing the size of an allocation. That
turns out to have no real negative consequences due to LLVMBasicBlockRef being
a pointer itself (and thus having the same size). It still is wrong and
confusing, so fix it.
Reported by coverity.
Backpatch-through: 13
Allow pg_newlocale_from_collation(C_COLLATION_OID) to work even if
there's no catalog access, which some extensions expect.
Not known to be a bug without extensions involved, but backport to 18.
Also corrects an issue in master with dummy_c_locale (introduced in
commit 5a38104b36) where deterministic was not set. That wasn't a bug,
but could have been if that structure was used more widely.
Reported-by: Alexander Kukushkin <cyberdemn@gmail.com>
Reviewed-by: Alexander Kukushkin <cyberdemn@gmail.com>
Discussion: https://postgr.es/m/CAFh8B=nj966ECv5vi_u3RYij12v0j-7NPZCXLYzNwOQp9AcPWQ@mail.gmail.com
Backpatch-through: 18
This commit adds CHECK_FOR_INTERRUPTS to the shared buffer iteration
loops in EvictRelUnpinnedBuffers and EvictAllUnpinnedBuffers. These
functions, used by pg_buffercache's pg_buffercache_evict_relation and
pg_buffercache_evict_all, can now be interrupted during long-running
operations.
Backpatch to version 18, where these functions and their corresponding
pg_buffercache functions were introduced.
Author: Yuhang Qiu <iamqyh@gmail.com>
Discussion: https://postgr.es/m/8DC280D4-94A2-4E7B-BAB9-C345891D0B78%40gmail.com
Backpatch-through: 18
03d40e4b5 allowed dummy UNION [ALL] children to be removed from the plan
by checking for is_dummy_rel(). That commit neglected to still account
for the relids from the dummy rel so that the correct UPPERREL_SETOP
RelOptInfo could be found and used for adding the Paths to.
Not doing this could result in processing of subsequent UNIONs using the
same RelOptInfo as a previously processed UNION, which could result in
add_path() freeing old Paths that are needed by the previous UNION.
The same fix was independently submitted (2 mins later) by Richard Guo.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/bee34aec-659c-46f1-9ab7-7bbae0b7616c@gmail.com
Commit a95e3d84c0 added ActiveSnapshot push+pop when processing
work-items (BRIN autosummarization), but forgot to handle the case of
a transaction failing during the run, which drops the snapshot untimely.
Fix by making the pop conditional on an element being actually there.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Backpatch-through: 13
Discussion: https://postgr.es/m/202511041648.nofajnuddmwk@alvherre.pgsql
The parallel GIN builds perform "freezing" of TID lists when merging
chunks built earlier. This means determining what part of the list can
no longer change, depending on the last received chunk. The frozen part
can be evicted from memory and written out.
The code attempted to freeze items right before merging the old and new
TID list, after already attempting to trim the current buffer. That
means part of the data may get frozen based on the new TID list, but
will be trimmed later (on next loop). This increases memory usage.
This inverts the order, so that we freeze data first (before trimming).
The benefits are likely relatively small, but it's also virtually free
with no other downsides.
Discussion: https://postgr.es/m/CAHLJuCWDwn-PE2BMZE4Kux7x5wWt_6RoWtA0mUQffEDLeZ6sfA@mail.gmail.com
This commit enhances tab completion for both COPY FROM and COPY TO
commands to suggest STDIN and STDOUT, respectively.
To make suggesting both file names and keywords easier, it introduces
a new COMPLETE_WITH_FILES_PLUS() macro.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20250605100835.b396f9d656df1018f65a4556@sraoss.co.jp
Debian Trixie CI images are generated now [1], so use them with the
following changes:
- detect_stack_use_after_return=0 option is added to the ASAN_OPTIONS
because ASAN uses a "shadow stack" to track stack variable lifetimes
and this confuses Postgres' stack depth check [2].
- Perl is updated to the newer version (perl5.40-i386-linux-gnu).
- LLVM-14 is no longer default installation, no need to force using
LLVM-16.
- Switch MinGW CC/CXX to x86_64-w64-mingw32ucrt-* to fix build failure
from missing _iswctype_l in mingw-w64 v12 headers.
[1] https://github.com/anarazel/pg-vm-images/commit/35a144793f
[2] https://postgr.es/m/20240130212304.q66rquj5es4375ab%40awork3.anarazel.de
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ1_B1usTskAv+AYt1bA7abVd9YH6XrUUSbr-2Z0d5Wd8w@mail.gmail.com
Backpatch: 15-, where CI support was added
When building intermediate TID lists during parallel GIN builds, split
the sorted lists into smaller chunks, to limit the amount of memory
needed when merging the chunks later.
The leader may need to keep in memory up to one chunk per worker, and
possibly one extra chunk (before evicting some of the data). The code
processing item pointers uses regular palloc/repalloc calls, which means
it's subject to the MaxAllocSize (1GB) limit.
We could fix this by allowing huge allocations, but that'd require
changes in many places without much benefit. Larger chunks do not
actually improve performance, so the memory usage would be wasted.
Fixed by limiting the chunk size to not hit MaxAllocSize. Each worker
gets a fair share.
This requires remembering the number of participating workers, in a
place that can be accessed from the callback. Luckily, the bs_worker_id
field in GinBuildState was unused, so repurpose that.
Report by Greg Smith, investigation and fix by me. Batchpatched to 18,
where parallel GIN builds were introduced.
Reported-by: Gregory Smith <gregsmithpgsql@gmail.com>
Discussion: https://postgr.es/m/CAHLJuCWDwn-PE2BMZE4Kux7x5wWt_6RoWtA0mUQffEDLeZ6sfA@mail.gmail.com
Backpatch-through: 18
We have never had a SET syntax that allows setting a GUC_LIST_INPUT
parameter to be an empty list. A locution such as
SET search_path = '';
doesn't mean that; it means setting the GUC to contain a single item
that is an empty string. (For search_path the net effect is much the
same, because search_path ignores invalid schema names and '' must be
invalid.) This is confusing, not least because configuration-file
entries and the set_config() function can easily produce empty-list
values.
We considered making the empty-string syntax do this, but that would
foreclose ever allowing empty-string items to be valid in list GUCs.
While there isn't any obvious use-case for that today, it feels like
the kind of restriction that might hurt someday. Instead, let's
accept the forbidden-up-to-now value NULL and treat that as meaning an
empty list. (An objection to this could be "what if we someday want
to allow NULL as a GUC value?". That seems unlikely though, and even
if we did allow it for scalar GUCs, we could continue to treat it as
meaning an empty list for list GUCs.)
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrei Klychkov <andrew.a.klychkov@gmail.com>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Discussion: https://postgr.es/m/CA+mfrmwsBmYsJayWjc8bJmicxc3phZcHHY=yW5aYe=P-1d_4bg@mail.gmail.com
A generated column may end up being part of the partition key
expression, if it's specified as an expression e.g. "(<generated
column name>)" or if the partition key expression contains a whole-row
reference, even though we do not allow a generated column to be part
of partition key expression. Fix this hole.
Co-authored-by: jian he <jian.universality@gmail.com>
Co-authored-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Discussion: https://www.postgresql.org/message-id/flat/CACJufxF%3DWDGthXSAQr9thYUsfx_1_t9E6N8tE3B8EqXcVoVfQw%40mail.gmail.com
It's possible to define BRIN indexes on functions that require a
snapshot to run, but the autosummarization feature introduced by commit
7526e10224 fails to provide one. This causes autovacuum to leave a
BRIN placeholder tuple behind after a failed work-item execution, making
such indexes less efficient. Repair by obtaining a snapshot prior to
running the task, and add a test to verify this behavior.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reported-by: Giovanni Fabris <giovanni.fabris@icon.it>
Reported-by: Arthur Nascimento <tureba@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/202511031106.h4fwyuyui6fz@alvherre.pgsql
Previously, passwordFromFile() returned NULL for valid cases (like no
matching password found) and actual errors (two out-of-memory paths).
This made it impossible for its sole caller, pqConnectOptions2(), to
distinguish between these scenarios and fail the connection
appropriately should an out-of-memory error occur.
This patch extends passwordFromFile() to be able to detect both valid
and failure cases, with an error string given back to the caller of the
function.
Out-of-memory failures unlikely happen in the field, so no backpatch is
done.
Author: Joshua Shanks <jjshanks@gmail.com>
Discussion: https://postgr.es/m/CAOxqWDfihFRmhNVdfu8epYTXQRxkCHSOrg+=-ij2c_X3gW=o3g@mail.gmail.com
When the startup process switches from streaming to archive as WAL
source, we avoid calling ShutdownWalRcv() if the WAL receiver is not
streaming, based on WalRcvStreaming(). WALRCV_STOPPING is a state set
by ShutdownWalRcv(), called only by the startup process, meaning that it
should not be possible to reach this state while in
WaitForWALToBecomeAvailable().
This commit adds an assertion to make sure that a WAL receiver is never
in a WALRCV_STOPPING state should the startup process attempt to reset
InstallXLogFileSegmentActive.
Idea suggested by Noah Misch.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/19093-c4fff49a608f82a0@postgresql.org
This has come up as useful as an alternative of WalRcvStreaming(), to be
able to do sanity checks based on the state of a WAL receiver. This
will be used in a follow-up commit.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/19093-c4fff49a608f82a0@postgresql.org
Commit b4f584f9d2 (affecting v15~, later backpatched down to 13 as of
3635a0a35a) introduced an unconditional WAL receiver shutdown when
switching from streaming to archive WAL sources. This causes problems
during a timeline switch, when a WAL receiver enters WALRCV_WAITING
state but remains alive, waiting for instructions.
The unconditional shutdown can break some monitoring scenarios as the
WAL receiver gets repeatedly terminated and re-spawned, causing
pg_stat_wal_receiver.status to show a "streaming" instead of "waiting"
status, masking the fact that the WAL receiver is waiting for a new TLI
and a new LSN to be able to continue streaming.
This commit changes the WAL receiver behavior so as the shutdown becomes
conditional, with InstallXLogFileSegmentActive being always reset to
prevent the regression fixed by b4f584f9d2: only terminate the WAL
receiver when it is actively streaming (WALRCV_STREAMING,
WALRCV_STARTING, or WALRCV_RESTARTING). When in WALRCV_WAITING state,
just reset InstallXLogFileSegmentActive flag to allow archive
restoration without killing the process. WALRCV_STOPPED and
WALRCV_STOPPING are not reachable states in this code path. For the
latter, the startup process is the one in charge of setting
WALRCV_STOPPING via ShutdownWalRcv(), waiting for the WAL receiver to
reach a WALRCV_STOPPED state after switching walRcvState, so
WaitForWALToBecomeAvailable() cannot be reached while a WAL receiver is
in a WALRCV_STOPPING state.
A regression test is added to check that a WAL receiver is not stopped
on timeline jump, that fails when the fix of this commit is reverted.
Reported-by: Ryan Bird <ryanzxg@gmail.com>
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/19093-c4fff49a608f82a0@postgresql.org
Backpatch-through: 13
The new wal_fpi_bytes counter calculates the total amount of full page
images inserted in WAL records, in bytes. This commit adds this
information to VACUUM and ANALYZE logs alongside the existing counters,
building upon f9a09aa295.
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aQMMSSlFXy4Evxn3@paquier.xyz
The order in this list was previously pretty random and had grown
organically over time. This made it unnecessarily cumbersome to
maintain these lists, as there was no clear guidelines about where to
put new entries. Also, after the merger of the type-specific GUC
structs, the list still reflected the previous type-specific
super-order.
By using alphabetical order, the place for new entries becomes clear,
and often related entries will be listed close together.
This patch reorders the existing entries in guc_parameters.dat, and it
also augments the generation script to error if an entry is found at
the wrong place.
Note: The order is actually checked after lower-casing, to handle the
likes of "DateStyle".
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/flat/8fdfb91e-60fb-44fa-8df6-f5dea47353c9@eisentraut.org
We've been nibbling away at removing uses of "long" for a long time,
since its width is platform-dependent. Here's one more: change the
remaining "long" fields in Plan nodes to Cardinality, since the three
surviving examples all represent group-count estimates. The upstream
planner code was converted to Cardinality some time ago; for example
the corresponding fields in Path nodes are type Cardinality, as are
the arguments of the make_foo_path functions. Downstream in the
executor, it turns out that these all feed to the table-size argument
of BuildTupleHashTable. Change that to "double" as well, and fix it
so that it safely clamps out-of-range values to the uint32 limit of
simplehash.h, as was not being done before.
Essentially, this is removing all the artificial datatype-dependent
limitations on these values from upstream processing, and applying
just one clamp at the moment where we're forced to do so by the
datatype choices of simplehash.h.
Also, remove BuildTupleHashTable's misguided attempt to enforce
work_mem/hash_mem_limit. It doesn't have enough information
(particularly not the expected tuple width) to do that accurately,
and it has no real business second-guessing the caller's choice.
For all these plan types, it's really the planner's responsibility
to not choose a hashed implementation if the hashtable is expected
to exceed hash_mem_limit. The previous patch improved the
accuracy of those estimates, and even if BuildTupleHashTable had
more information it should arrive at the same conclusions.
Reported-by: Jeff Janes <jeff.janes@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com
For several types of plan nodes that use TupleHashTables, the
planner estimated the expected size of the table as basically
numEntries * (MAXALIGN(dataWidth) + MAXALIGN(SizeofHeapTupleHeader)).
This is pretty far off, especially for small data widths, because
it doesn't account for the overhead of the simplehash.h hash table
nor for any per-tuple "additional space" the plan node may request.
Jeff Janes noted a case where the estimate was off by about a factor
of three, even though the obvious hazards such as inaccurate estimates
of numEntries or dataWidth didn't apply.
To improve matters, create functions provided by the relevant executor
modules that can estimate the required sizes with reasonable accuracy.
(We're still not accounting for effects like allocator padding, but
this at least gets the first-order effects correct.)
I added functions that can estimate the tuple table sizes for
nodeSetOp and nodeSubplan; these rely on an estimator for
TupleHashTables in general, and that in turn relies on one for
simplehash.h hash tables. That feels like kind of a lot of mechanism,
but if we take any short-cuts we're violating modularity boundaries.
The other places that use TupleHashTables are nodeAgg, which took
pains to get its numbers right already, and nodeRecursiveunion.
I did not try to improve the situation for nodeRecursiveunion because
there's nothing to improve: we are not making an estimate of the hash
table size, and it wouldn't help us to do so because we have no
non-hashed alternative implementation. On top of that, our estimate
of the number of entries to be hashed in that module is so suspect
that we'd likely often choose the wrong implementation if we did have
two ways to do it.
Reported-by: Jeff Janes <jeff.janes@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com
Add comments explaining when and where it is safe for nbtree to treat
row compare keys as if they were simple scalar inequality keys on the
row's most significant column. This is particularly important within
_bt_advance_array_keys, which deals with required inequality keys in a
general and uniform way, without any special handling for row compares.
Also spell out the implications of _bt_check_rowcompare's approach of
_conditionally_ evaluating lower-order row compare subkeys, particularly
when one of its lower-order subkeys might see NULL index tuple values
(these may or may not affect whether the qual as a whole is satisfied).
The behavior in this area isn't particularly intuitive, so these issues
seem worth going into.
In passing, add a few more defensive/documenting row comparison related
assertions to _bt_first and _bt_check_rowcompare.
Follow-up to commits bd3f59fd and ec986020.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Reviewed-By: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wznwkak_K7pcAdv9uH8ZfNo8QO7+tHXOaCUddMeTfaCCFw@mail.gmail.com
Backpatch-through: 18
_bt_first reliably uses the same equality key (on each index column) for
initial positioning purposes as the one that _bt_checkkeys can use to
end the scan following commit f09816a0. _bt_first no longer applies its
own independent rules to determine which initial positioning key to use
on each column (for equality and inequality keys alike). Preprocessing
is now fully in control of determining which keys start and end each
scan, ensuring that _bt_first and _bt_checkkeys have symmetric behavior.
Remove obsolete comments that described why _bt_first was expected to
use at least one of the available required equality keys for initial
positioning purposes. The rules in this area are now maximally strict
and uniform, so there's no reason to draw attention to equality keys.
Any column with a required equality key cannot have a redundant required
inequality key (nor can it have a redundant required equality key).
Oversight in commit f09816a0, which removed similar comments from
_bt_first, but missed these comments.
Author: Peter Geoghegan <pg@bowt.ie>
Backpatch-through: 18
The C standard says that the second and third arguments of a
conditional operator shall be both void type or both not-void
type. The Windows version of INTERRUPTS_PENDING_CONDITION()
got this wrong. It's pretty harmless because the result of
the operator is ignored anyway, but apparently recent versions
of MSVC have started issuing a warning about it. Silence the
warning by casting the dummy zero to void.
Reported-by: Christian Ullrich <chris@chrullrich.net>
Author: Bryan Green <dbryan.green@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/cc4ef8db-f8dc-4347-8a22-e7ebf44c0308@chrullrich.net
Backpatch-through: 13
subpath(ltree,offset,len) now correctly errors when given an offset
less than -n, where n is the number of labels in the given ltree.
There was a duplicate block of code that allowed an offset as low
as -2n. The documentation says no such thing, so this must have
been a copy-and-paste error in the original ltree patch.
While here, avoid redundant calculation of "end" and write
LTREE_MAX_LEVELS rather than its hard-coded value.
Author: Marcus Gartner <m.a.gartner@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAAUGV_SvBO9gWYbaejb9nhe-mS9FkNP4QADNTdM3wdRhvLobwA@mail.gmail.com
The original messages were confusing in dry-run mode in that they state
that something is being done, when in reality it isn't. Use alternative
wording in that case, to make the distinction clear.
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/CAHut+PsvQJQnQO0KT0S2oegenkvJ8FUuY-QS5syyqmT24R2xFQ@mail.gmail.com
The code updates the system identifier, then runs pg_walreset; if the
latter fails, it complains about the former, which makes no sense.
Change the error message to complain about the right thing.
Noticed while reviewing a patch touching nearby code.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Backpatch-through: 17
Several functions in the codebase accept "Datum *" parameters but do
not modify the pointed-to data. These have been updated to take
"const Datum *" instead, improving type safety and making the
interfaces clearer about their intent. This change helps the compiler
catch accidental modifications and better documents immutability of
arguments.
Most of "Datum *" parameters have a pairing "bool *isnull" parameter,
they are constified as well.
No functional behavior is changed by this patch.
Author: Chao Li <lic@highgo.com>
Discussion: https://www.postgresql.org/message-id/flat/CAEoWx2msfT0knvzUa72ZBwu9LR_RLY4on85w2a9YpE-o2By5HQ@mail.gmail.com
The test introduced by this commit checks that a reload of
primary_conninfo leads to a WAL receiver restarted, by looking at the
request generated in the server logs. This is something for what there
was no coverage.
This has come up for a different patch, while discussing a regression
where a WAL receiver should not be stopped while waiting for a new
position to stream, like at the end of a timeline. In the case of the
other patch, we want to check that this log entry is not generated, but
if the error message is reworded the test would become silently broken.
The test of this commit ensures that we at least keep track the log
message format, for a supported scenario.
Extracted from a larger patch by the same author.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/aQKlC1v2_MXGV6_9@paquier.xyz
For all extant uses of TupleHashTables, execGrouping.c itself does
nothing with the "tablecxt" except to allocate new hash entries in it,
and the callers do nothing with it except to reset the whole context.
So this is an ideal use-case for a BumpContext, and the hash tables
are frequently big enough for the savings to be significant.
(Commit cc721c459 already taught nodeAgg.c this idea, but neglected
the other callers of BuildTupleHashTable.)
While at it, let's clean up some ill-advised leftovers from rebasing
TupleHashTables on simplehash.h:
* Many comments and variable names were based on the idea that the
tablecxt holds the whole TupleHashTable, whereas now it only holds the
hashed tuples (plus any caller-defined "additional storage"). Rename
to names like tuplescxt and tuplesContext, and adjust the comments.
Also adjust the memory context names to be like "<Foo> hashed tuples".
* Make ResetTupleHashTable() reset the tuplescxt rather than relying
on the caller to do so; that was fairly bizarre and seems like a
recipe for leaks. This is less efficient in the case where nodeAgg.c
uses the same tuplescxt for several different hashtables, but only
microscopically so because mcxt.c will short-circuit the extra resets
via its isReset flag. I judge the extra safety and intellectual
cleanliness well worth those few cycles.
* Remove the long-obsolete "allow_jit" check added by ac88807f9;
instead, just Assert that metacxt and tuplescxt are different.
We need that anyway for this definition of ResetTupleHashTable() to
be safe.
There is a side issue of the extent to which this change invalidates
the planner's estimates of hashtable memory consumption. However,
those estimates are already pretty bad, so improving them seems like
it can be a separate project. This change is useful to do first to
establish consistent executor behavior that the planner can expect.
A loose end not addressed here is that the "entrysize" calculation
in BuildTupleHashTable seems wrong: "sizeof(TupleHashEntryData) +
additionalsize" corresponds neither to the size of the simplehash
entries nor to the total space needed per tuple. It's questionable
why BuildTupleHashTable is second-guessing its caller's nbuckets
choice at all, since the original source of the number should have had
more information. But that all seems wrapped up with the planner's
estimation logic, so let's leave it for the planned followup patch.
Reported-by: Jeff Janes <jeff.janes@gmail.com>
Reported-by: David Rowley <dgrowleyml@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com
Discussion: https://postgr.es/m/2268409.1761512111@sss.pgh.pa.us
The original is pretty baroque for no apparent reason; arguably, commit
2f9661311b should have done this. Noted while reviewing related code
for bug #18984. This is cosmetic (though I'm surprised that my compiler
generates shorter assembly this way), so no backpatch.
Discussion: https://postgr.es/m/18984-0f4778a6599ac3ae@postgresql.org
The docs for max_protocol_version suggested PQprotocolVersion()
instead of PQfullProtocolVersion() to find out the exact protocol
version. Since PQprotocolVersion() only returns the major protocol
version, that is bad advice.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Shinya Kato <shinya11.kato@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAGECzQSKFxQsYAgr11PhdOr-RtPZEdAXZnHx6U3avLuk3xQaTQ%40mail.gmail.com
The new wal_fpi_bytes counter calculates the total amount of full page
images inserted in WAL records, in bytes. This commit exposes this
information in EXPLAIN (ANALYZE, WAL) alongside the existing counters,
for both the text and JSON/YAML outputs, building upon f9a09aa295.
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discusssion: https://postgr.es/m/CAOzEurQtZEAfg6P0kU3Wa-f9BWQOi0RzJEMPN56wNTOmJLmfaQ@mail.gmail.com
This commit reverts 818fefd8fd, that has been introduced to address a
an instability in some of the TAP tests due to the presence of random
standby snapshot WAL records, when slots are invalidated by
InvalidatePossiblyObsoleteSlot().
Anyway, this commit had also the consequence of introducing a behavior
regression. After 818fefd8fd, the code may determine that a slot needs
to be invalidated while it may not require one: the slot may have moved
from a conflicting state to a non-conflicting state between the moment
when the mutex is released and the moment when we recheck the slot, in
InvalidatePossiblyObsoleteSlot(). Hence, the invalidations may be more
aggressive than they actually have to.
105b2cb336 has tackled the test instability in a way that should be
hopefully sufficient for the buildfarm, even for slow members:
- In v18, the test relies on an injection point that bypasses the
creation of the random records generated for standby snapshots,
eliminating the random factor that impacted the test. This option was
not available when 818fefd8fd was discussed.
- In v16 and v17, the problem was bypassed by disallowing a slot to
become active in some of the scenarios tested.
While on it, this commit adds a comment to document that it is fine for
a recheck to use xmin and LSN values stored in the slot, without storing
and reusing them across multiple checks.
Reported-by: "suyu.cmj" <mengjuan.cmj@alibaba-inc.com>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/f492465f-657e-49af-8317-987460cb68b0.mengjuan.cmj@alibaba-inc.com
Backpatch-through: 16
RIGHT_SEMI joins rely on the HEAP_TUPLE_HAS_MATCH flag to guarantee
that only the first match for each inner tuple is considered.
However, in a parallel hash join, the inner relation is stored in a
shared global hash table that can be probed by multiple workers
concurrently. This allows different workers to inspect and set the
match flags of the same inner tuples at the same time.
If two workers probe the same inner tuple concurrently, both may see
the match flag as unset and emit the same tuple, leading to duplicate
output rows and violating RIGHT_SEMI join semantics.
For now, we disable parallel plans for RIGHT_SEMI joins. In the long
term, it may be possible to support parallel execution by performing
atomic operations on the match flag, for example using a CAS or
similar mechanism.
Backpatch to v18, where RIGHT_SEMI join was introduced.
Bug: #19094
Reported-by: Lori Corbani <Lori.Corbani@jax.org>
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19094-6ed410eb5b256abd@postgresql.org
Backpatch-through: 18
Because long is 32-bit on 64-bit Windows, it isn't a good datatype to
store the difference between 2 pointers. The under-sized type could
overflow and lead to scary warnings in MEMORY_CONTEXT_CHECKING builds,
such as:
WARNING: problem in alloc set ExecutorState: bad single-chunk %p in block %p
However, the problem lies only in the code running the check, not from
an actual memory accounting bug.
Fix by using "Size" instead of "long". This means using an unsigned
type rather than the previous signed type. If the block's freeptr was
corrupted, we'd still catch that if the unsigned type wrapped. Unsigned
allows us to avoid further needless complexities around comparing signed
and unsigned types.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 13
Discussion: https://postgr.es/m/CAApHDvo-RmiT4s33J=aC9C_-wPZjOXQ232V-EZFgKftSsNRi4w@mail.gmail.com
Two or more constants can have the same location. We handled this
correctly for non squashed constants, but failed to do it if squashed
(resulting in out-of-bounds memory access), because the code structure
became broken by commit 0f65f3eec4: we failed to update 'last_loc'
correctly when skipping these squashed constants.
The simplest fix seems to be to get rid of 'last_loc' altogether -- in
hindsight, it's quite pointless. Also, when ignoring a constant because
of this, make sure to fulfill fill_in_constant_lengths's duty of setting
its length to -1.
Lastly, we can use == instead of <= because the locations have been
sorted beforehand, so the < case cannot arise.
Co-authored-by: Sami Imseih <samimseih@gmail.com>
Co-authored-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reported-by: Konstantin Knizhnik <knizhnik@garret.ru>
Backpatch-through: 18
Discussion: https://www.postgresql.org/message-id/2b91e358-0d99-43f7-be44-d2d4dbce37b3%40garret.ru
Previously, we'd fill all fields except ccbin, and only later obtain and
detoast ccbin, with hypothetical failures being possible. If ccbin is
null (rare catalog corruption I have never witnessed) or its a corrupted
toast entry, we leak a tiny bit of memory in CacheMemoryContext from
having strdup'd the constraint name. Repair these by only attempting to
fill the struct once ccbin has been detoasted.
Author: Ranier Vilela <ranier.vf@gmail.com>
Discussion: https://postgr.es/m/CAEudQAr=i3_Z4GvmediX900+sSySTeMkvuytYShhQqEwoGyvhA@mail.gmail.com
Instead of having five separate GUC structs, one for each type, with
the generic part contained in each of them, flip it around and have
one common struct, with the type-specific part has a subfield.
The very original GUC design had type-specific structs and
type-specific lists, and the membership in one of the lists defined
the type. But now the structs themselves know the type (from the
.vartype field), and they are all loaded into a common hash table at
run time, and so this original separation no longer makes sense. It
creates a bunch of inconsistencies in the code about whether the
type-specific or the generic struct is the primary struct, and a lot
of casting in between, which makes certain assumptions about the
struct layouts.
After the change, all these casts are gone and all the data is
accessed via normal field references. Also, various code is
simplified because only one kind of struct needs to be processed.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/flat/8fdfb91e-60fb-44fa-8df6-f5dea47353c9@eisentraut.org
MSVC in C11 mode supports the standard restrict qualifier, so we don't
need the workaround naming pg_restrict anymore.
Even though restrict is in C99 and should be supported by all
supported compilers, we keep the configure test and the hardcoded
redirection to __restrict, because that will also work in C++ in all
supported compilers. (restrict is not part of the C++ standard.)
For backward compatibility for extensions, we keep a #define of
pg_restrict around, but our own code doesn't use it anymore.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/0e3d8644-c01d-4374-86ea-9f0a987981f0%40eisentraut.org
XLogRecordAssemble() may be called multiple times before inserting a
record in XLogInsertRecord(), and the amount of FPIs generated inside
a record whose insertion is attempted multiple times may vary.
The logic added in f9a09aa295 touched directly pgWalUsage in
XLogRecordAssemble(), meaning that it could be possible for pgWalUsage
to be incremented multiple times for a single record. This commit
changes the code to use the same logic as the number of FPIs added to a
record, where XLogRecordAssemble() returns this information and feeds it
to XLogInsertRecord(), updating pgWalUsage only when a record is
inserted.
Reported-by: Shinya Kato <shinya11.kato@gmail.com>
Discussion: https://postgr.es/m/CAOzEurSiSr+rusd0GzVy8Bt30QwLTK=ugVMnF6=5WhsSrukvvw@mail.gmail.com
The new %S substitution shows the current value of search_path.
Note that this only works when connected to Postgres v18 or newer,
since search_path was first marked as GUC_REPORT in commit
28a1121fd9. On older versions that don't report search_path, %S is
replaced with a question mark.
Suggested-by: Lauri Siltanen <lauri.siltanen@gmail.com>
Author: Florents Tselai <florents.tselai@gmail.com>
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CANsM767JhTKCRagTaq5Lz52fVwLPVkhSpyD1C%2BOrridGv0SO0A%40mail.gmail.com
I have never seen this be a problem in practice, but it came up when
purposely corrupting catalog contents to study the fix for a nearby bug:
we'd try to decrement relchecks, but since it's zero we error out and
fail to drop the constraint. The fix is to downgrade the error to
warning, skip decrementing the counter, and otherwise proceed normally.
Given lack of field complaints, no backpatch.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/202508291058.q2zscdcs64fj@alvherre.pgsql
Some recent changes were made to remove the explicit dependency on
btree indexes in some parts of the code. One of these changes was
made in commit 9ef1851685, which allows non-btree indexes to be used
in get_actual_variable_range(). A follow-up commit ee1ae8b99f fixes
the cases where an index doesn’t have a sortopfamily as this is a
prerequisite to be used in get_actual_variable_range().
However, it was found that indexes that have amcanorder = true but do
not allow index-only-scans (amcanreturn returns false or is NULL) will
pass all of the conditions, while they should be rejected since
get_actual_variable_range() uses the index-only-scan machinery in
get_actual_variable_endpoint(). Such an index might cause errors like
ERROR: no data returned for index-only scan
during query planning.
The fix is to add a check in get_actual_variable_range() to reject
indexes that do not allow index-only scans.
Author: Maxime Schoemans <maxime.schoemans@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/20ED852A-C2D9-41EB-8671-8C8B9D418BE9%40enterprisedb.com
This new counter, called "wal_fpi_bytes", tracks the total amount in
bytes of full page images (FPIs) generated in WAL. This data becomes
available globally via pg_stat_wal, and for backend statistics via
pg_stat_get_backend_wal().
Previously, this information could only be retrieved with pg_waldump or
pg_walinspect, which may not be available depending on the environment,
and are expensive to execute. It offers hints about how much FPIs
impact the WAL generated, which could be a large percentage for some
workloads, as well as the effects of wal_compression or page holes.
Bump catalog version.
Bump PGSTAT_FILE_FORMAT_ID, due to the addition of wal_fpi_bytes in
PgStat_WalCounters.
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAOzEurQtZEAfg6P0kU3Wa-f9BWQOi0RzJEMPN56wNTOmJLmfaQ@mail.gmail.com
Extend logicalrep_worker_stop, logicalrep_worker_wakeup, and
logicalrep_worker_find to accept a worker type argument. This change
enables differentiation between logical replication worker types, such as
apply workers and table sync workers. While preserving existing behavior,
it lays the groundwork for upcoming patch to add sequence synchronization
workers.
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
Two tests are changed in this commit:
- libpq's 006_service
- ldap's 003_ldap_connection_param_lookup
CRLF translation is already handled by the text mode, so there should be
need for any specific logic. See also 1c6d462939, msys perl being one
case where the translation mattered.
Note: This is first applied on HEAD, and backpatch will follow once the
buildfarm has provided an opinion about this commit.
Author: Jacob Champion <jacob.champion@enterprisedb.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/aPsh39bxwYKvUlAf@paquier.xyz
Backpatch-through: 13
The existing RLS tests focus on the outcomes of various testing
scenarios, rather than the exact policies applied. This sometimes
makes it hard to see why a particular result occurred (e.g., which
policy failed), or to construct a test that fails a particular policy
check without an earlier check failing. These new tests issue NOTICE
messages to show the actual policies applied for each command type,
including the different paths through INSERT ... ON CONFLICT and
MERGE, making it easier to verify the expected behaviour.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Viktor Holmberg <v@viktorh.net>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWqnfeChjK=n1V_dYZT4rt4mnq+ybf9c0qXDYTVMsy8pg@mail.gmail.com
This type is just char * underneath, it provides no real value, no
type safety, and just makes the code one level more mysterious. It is
more idiomatic to refer to blobs of memory by a combination of void *
and size_t, so change it to that.
Also, since this type hides the pointerness, we can't apply qualifiers
to what is pointed to, which requires some unconstify nonsense. This
change allows fixing that.
Extension code that uses the Item type can change its code to use
void * to be backward compatible.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/flat/c75cccf5-5709-407b-a36a-2ae6570be766@eisentraut.org
Previously, the check_hook for synchronized_standby_slots attempted to
validate that each specified slot existed and was physical. However, these
checks were not performed during server startup. As a result, if users
configured non-existent slots before startup, the misconfiguration would
go undetected initially. This could later cause parallel query failures,
as newly launched workers would detect the issue and raise an ERROR.
This patch improves the check_hook by validating the syntax and format of
slot names. Validation of slot existence and type is deferred to the WAL
sender process, aligning with the behavior of the check_hook for
primary_slot_name.
Reported-by: Fabrice Chapuis <fabrice636861@gmail.com>
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
Reviewed-by: Rahila Syed <rahilasyed90@gmail.com>
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/CAA5-nLCeO4MQzWipCXH58qf0arruiw0OeUc1+Q=Z=4GM+=v1NQ@mail.gmail.com
Ensure that the target table on the subscriber exists before executing any
DML intended for replication.
Currently, if the table is missing, the replication logic keeps retrying
until the table is eventually created by the test. Although this behaviour
does not cause test failures, since the table is created after the INSERT
is published and replication eventually succeeds, however, it introduces
unnecessary looping and delays.
Author: Grem Snoort <grem.snoort@gmail.com>
Discussion: https://postgr.es/m/CANV9Qw5HD7=Fp4nox2O7DoVctHoabRRVW9Soo4A=QipqW5B=Tg@mail.gmail.com
When dealing with ResultRelInfos for partitions, there are cases where
there are mixed requirements for the ri_RootResultRelInfo. There are
cases when the partition itself requires a NULL ri_RootResultRelInfo and
in the same query, the same partition may require a ResultRelInfo with
its parent set in ri_RootResultRelInfo. This could cause the column
mapping between the partitioned table and the partition not to be done
which could result in crashes if the column attnums didn't match
exactly.
The fix is simple. We now check that the ri_RootResultRelInfo matches
what the caller passed to ExecGetTriggerResultRel() and only return a
cached ResultRelInfo when the ri_RootResultRelInfo matches what the
caller wants, otherwise we'll make a new one.
Author: David Rowley <dgrowleyml@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Reported-by: Dmitry Fomin <fomin.list@gmail.com>
Discussion: https://postgr.es/m/7DCE78D7-0520-4207-822B-92F60AEA14B4@gmail.com
Backpatch-through: 15
When testing 128/32-bit division in the test_int128 test module, make
sure that we don't attempt to divide by zero.
While at it, use pg_prng_int64() and pg_prng_int32() to generate the
random numbers required for the tests, rather than pg_prng_uint64().
This eliminates the need for any implicit or explicit type casts.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/da4wqngd6gwz5s4yf5y5f75xj7gpja62l4dbp6w4j3vs7fcjue@upvolcu4e6o2
This commit makes the way WAL segments are handled from the source to
the target server slightly smarter: the copy of the WAL segments is now
skipped if these have been created before the point where source and
target have diverged (the WAL segment where the point of divergence
exists is still copied), because we know that such segments exist on
both the target and source. Note that the on-disk size of the WAL
segments on the source and target need to match. Hence, only the
segments generated after the point of divergence are now copied. A
segment existing on the source but not the target is copied.
Previously, all the WAL segments were just copied in full. This change
can make the rewind operation cheaper in some configurations, especially
for setups where some WAL retention causes many segments to remain on
the source server even after the promotion of a standby used as source
to rewind a previous primary.
A TAP test is added to track these new behaviors. The file map printed
with --debug now includes all the information related to WAL segments,
to be able to track if these are copied or skipped, and the test relies
on the debug output generated.
Author: John Hsu <johnhyvr@gmail.com>
Author: Justin Kwan <justinpkwan@outlook.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Discussion: https://postgr.es/m/181b4c6fa9c.b8b725681941212.7547232617810891479@viggy28.dev
This commit enhances psql's tab completion support for large objects:
- Completes \lo_export <oid> with a file name
- Completes GRANT/REVOKE ... LARGE with OBJECT
- Completes ALTER DEFAULT PRIVILEGES GRANT/REVOKE ... LARGE with OBJECTS
Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/87y0syikki.fsf@wibble.ilmari.org
Commit 65281391a caused some additional error context lines to
appear in the output of one test case. That's fine, but we missed
updating the expected output. Do it now.
While here, add some missing test-output subdirectories to
contrib/sepgsql/.gitignore, so that we don't get git warnings
after running the tests.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1613232.1761255361@sss.pgh.pa.us
Backpatch-through: 18
The project Git server hasn't supported cloning with the Git protocol
in a very long time, but the documentation never got the memo. Remove
the mention of using the Git protocol, and while there wrap a mention
of Git in <productname> tags.
Backpatch down to all supported versions.
Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Gurjeet Singh <gurjeet@singh.im>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Gurjeet Singh <gurjeet@singh.im>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CABwTF4WMiMb-KT2NRcib5W0C8TQF6URMb+HK9a_=rnZnY8Q42w@mail.gmail.com
Backpatch-through: 13
If we're trying to perform check_function_bodies validation
of a PL/Python trigger function, we create a new PLyProcedure,
but we don't put it into the PLy_procedure_cache hash table.
(Doing so would be useless, since we don't have the relation
OID that is part of the cache key for a trigger function, so
we could not make an entry that would be found by later uses.)
However, we didn't think through what to do instead, with the
result that the PLyProcedure was simply leaked.
It would take a pretty large number of CREATE FUNCTION operations
for this to amount to a serious problem, but it's easy to see the
memory bloat if you do CREATE OR REPLACE FUNCTION in a loop.
To fix, have PLy_procedure_get delete the new PLyProcedure
and return NULL if it's not going to cache the PLyProcedure.
I considered making plpython3_validator do the cleanup instead,
which would be more natural. But then plpython3_validator would
have to know the rules under which PLy_procedure_get returns a
non-cached PLyProcedure, else it risks deleting something that's
pointed to by a cache entry. On the whole it seems more robust
to deal with the case inside PLy_procedure_get.
Found by the new version of Coverity (nice catch!). In the end
I feel this fix is more about satisfying Coverity than about
fixing a real-world problem, so I'm not going to back-patch.
These two functions expect there to be room to insert another item
in the FreePageBtree's array, but their assertions were too weak
to guarantee that. This has little practical effect granting that
the callers are not buggy, but it seems to be misleading late-model
Coverity into complaining about possible array overrun.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/799984.1761150474@sss.pgh.pa.us
Backpatch-through: 13
Commit c6f7f11d8 intended to prevent leaking any PyObject reference
counts in edge cases (such as out-of-memory during string
construction), but actually it introduced a leak in the normal case.
Repeating an error-trapping operation often enough would lead to
session-lifespan memory bloat. The problem is that I failed to
think about the fact that PyObject_GetAttrString() increments the
refcount of the returned PyObject, so that simply walking down the
list of error frame objects causes all but the first one to have
their refcount incremented.
I experimented with several more-or-less-complex ways around that,
and eventually concluded that the right fix is simply to drop the
newly-obtained refcount as soon as we walk to the next frame
object in PLy_traceback. This sounds unsafe, but it's perfectly
okay because the caller holds a refcount on the first frame object
and each frame object holds a refcount on the next one; so the
current frame object can't disappear underneath us.
By the same token, we can simplify the caller's cleanup back to
simply dropping its refcount on the first object. Cleanup of
each frame object will lead in turn to the refcount of the next
one going to zero.
I also added a couple of comments explaining why PLy_elog_impl()
doesn't try to free the strings acquired from PLy_get_spi_error_data()
or PLy_get_error_data(). That's because I got here by looking at a
Coverity complaint about how those strings might get leaked. They
are not leaked, but in testing that I discovered this other leak.
Back-patch, as c6f7f11d8 was. It's a bit nervous-making to be
putting such a fix into v13, which is only a couple weeks from its
final release; but I can't see that leaving a recently-introduced
leak in place is a better idea.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1203918.1761184159@sss.pgh.pa.us
Backpatch-through: 13
This patch adds support for a new SQL command:
ALTER SUBSCRIPTION ... REFRESH SEQUENCES
This command updates the sequence entries present in the
pg_subscription_rel catalog table with the INIT state to trigger
resynchronization.
In addition to the new command, the following subscription commands have
been enhanced to automatically refresh sequence mappings:
ALTER SUBSCRIPTION ... REFRESH PUBLICATION
ALTER SUBSCRIPTION ... ADD PUBLICATION
ALTER SUBSCRIPTION ... DROP PUBLICATION
ALTER SUBSCRIPTION ... SET PUBLICATION
These commands will perform the following actions:
Add newly published sequences that are not yet part of the subscription.
Remove sequences that are no longer included in the publication.
This ensures that sequence replication remains aligned with the current
state of the publication on the publisher side.
Note that the actual synchronization of sequence data/values will be
handled in a subsequent patch that introduces a dedicated sequence sync
worker.
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
isRelDataFile() is renamed to getFileContentType(), extended so as it
becomes able to detect more file patterns than only relation files. The
new file name pattern that can be detected is WAL files.
This refactoring has been suggested by Robert Haas. This will be used
in a follow-up patch where we are looking at improving how WAL files are
processed by pg_rewind. As of this change, WAL files are still handled
the same way as previously, always copied from the source to the target
server.
Extracted from a larger patch by the same authors.
Author: John Hsu <johnhyvr@gmail.com>
Author: Justin Kwan <justinpkwan@outlook.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Discussion: https://postgr.es/m/181b4c6fa9c.b8b725681941212.7547232617810891479@viggy28.dev
Commit 883a95646a introduced overflow entries in the replication lag tracker
to fix an issue where lag columns in pg_stat_replication could stall when
the replay LSN stopped advancing.
This commit adds comments clarifying the purpose and behavior of overflow
entries to improve code readability and understanding.
Since commit 883a95646a was recently applied and backpatched to all
supported branches, this follow-up commit is also backpatched accordingly.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CABPTF7VxqQA_DePxyZ7Y8V+ErYyXkmwJ1P6NC+YC+cvxMipWKw@mail.gmail.com
Backpatch-through: 13
The "else" code block having single statement with comments on a
separate line should have been surrounded by braces.
Reported-by: Chao Li <lic@highgo.com>
Suggested-by: David Rowley <dgrowleyml@gmail.com>
Author: Tatsuo Ishii <ishii@postgresql.org>
Discussion: https://postgr.es/m/20251020.125847.997839131426057290.ishii%40postgresql.org
These are listing which other tests one of the tests in the subsequent
group depends on. A couple of comments were located with unrelated
tests.
In passing, fix a small grammatical issue.
Noticed in passing while working on something else.
Author: David Rowley <dgrowleyml@gmail.com>
When JIT deformed tuples (controlled via the jit_tuple_deforming GUC),
types narrower than sizeof(Datum) would be zero-extended up to Datum
width. This wasn't the same as what fetch_att() does in the standard
tuple deforming code. Logically the values are the same when fetching
via the DatumGet*() marcos, but negative numbers are not the same in
binary form.
In the report, the problem was manifesting itself with:
ERROR: could not find memoization table entry
in a query which had a "Cache Mode: binary" Memoize node. However, it's
currently unclear what else is affected. Anything that uses
datum_image_eq() or datum_image_hash() on a Datum from a tuple deformed by
JIT could be affected, but it may not be limited to that.
The fix for this is simple: use signed extension instead of zero
extension.
Many thanks to Emmanuel Touzery for reporting this issue and providing
steps and backup which allowed the problem to easily be recreated.
Reported-by: Emmanuel Touzery <emmanuel.touzery@plandela.si>
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/DB8P194MB08532256D5BAF894F241C06393F3A@DB8P194MB0853.EURP194.PROD.OUTLOOK.COM
Backpatch-through: 13
We had several places that used cast-to-unsigned-int as a substitute
for properly checking for overflow. Coverity has started objecting
to that practice as likely introducing Y2038 bugs. An extra
comparison is surely not much compared to the cost of time(NULL), nor
is this coding practice particularly readable. Let's do it honestly,
with explicit logic covering the cases of first-time-through and
clock-went-backwards.
I don't feel a need to back-patch though: our released versions
will be out of support long before 2038, and besides which I think
the code would accidentally work anyway for another 70 years or so.
The result of PLyUnicode_AsString is already palloc'd,
so pstrdup'ing it is just a waste of time and memory.
More importantly it might confuse people about whether
that's necessary. Doesn't seem important enough to
back-patch, but we should fix it. Spotted by Coverity.
Coverity complained that the list of table names returned by
retrieve_objects() was leaked, and it's right. Potentially this
is quite a lot of memory; however, it's not entirely clear how much
we gain by cleaning it up, since in many operating modes we're going
to need the list until the program is about to exit. Still, it will
win in some cases, so fix the leak.
vacuuming.c is new in v19, and this patch doesn't apply at all cleanly
to the predecessor code in v18. I'm not excited enough about the
issue to devise a back-patch.
One code path forgot to free the separately-malloc'd filename
part of a struct rfile. Another place freed the filename but
forgot the struct rfile itself. These seem worth fixing because
with a large backup we could be dealing with many files.
Coverity found the bug in make_rfile(). I found the other one
by manual inspection.
This small function is only used in one place, and it fails to
handle quoted table names (although the table name portion of the
input should never be quoted in current usage). In addition to
removing make_temptable_name_n() in favor of open-coding it where
needed, this commit ensures the "diff" table name is properly
quoted in order to future-proof this area a bit.
Author: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Shinya Kato <shinya11.kato@gmail.com>
Discussion: https://postgr.es/m/CAJ7c6TO3a5q2NKRsjdJ6sLf8isVe4aMaaX1-Hj2TdHdhFw8zRA%40mail.gmail.com
Previously, if primary_slot_name was set to an invalid slot name and
the configuration file was reloaded, both the postmaster and all other
backend processes reported a WARNING. With many processes running,
this could produce a flood of duplicate messages. The problem was that
the GUC check hook for primary_slot_name reported errors at WARNING
level via ereport().
This commit changes the check hook to use GUC_check_errdetail() and
GUC_check_errhint() for error reporting. As with other GUC parameters,
this causes non-postmaster processes to log the message at DEBUG3,
so by default, only the postmaster's message appears in the log file.
Backpatch to all supported versions.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/CAHGQGwFud-cvthCTfusBfKHBS6Jj6kdAPTdLWKvP2qjUX6L_wA@mail.gmail.com
Backpatch-through: 13
Previously it was mistakenly assumed that there's only one window
function argument which needs to be processed by WinGetFuncArgInFrame
or WinGetFuncArgInPartition when IGNORE NULLS option is specified. To
eliminate the limitation, WindowObject->notnull_info is modified from
"uint8 *" to "uint8 **" so that WindowObject->notnull_info could store
pointers to "uint8 *" which holds NOT NULL info corresponding to each
window function argument. Moreover, WindowObject->num_notnull_info is
changed from "int" to "int64 *" so that WindowObject->num_notnull_info
could store the number of NOT NULL info corresponding to each function
argument. Memories for these data structures will be allocated when
WinGetFuncArgInFrame or WinGetFuncArgInPartition is called. Thus no
memory except the pointers is allocated for function arguments which
do not call these functions
Also fix the set mark position logic in WinGetFuncArgInPartition to
not raise a "cannot fetch row before WindowObject's mark position"
error in IGNORE NULLS case.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Tatsuo Ishii <ishii@postgresql.org>
Discussion: https://postgr.es/m/2952409.1760023154%40sss.pgh.pa.us
Previously, when the replay LSN reported in feedback messages from a standby
stopped advancing, for example, due to a recovery conflict, the write_lag and
flush_lag columns in pg_stat_replication would initially update but then stop
progressing. This prevented users from correctly monitoring replication lag.
The problem occurred because when any LSN stopped updating, the lag tracker's
cyclic buffer became full (the write head reached the slowest read head).
In that state, the lag tracker could no longer compute round-trip lag values
correctly.
This commit fixes the issue by handling the slowest read entry (the one
causing the buffer to fill) as a separate overflow entry and freeing space
so the write and other read heads can continue advancing in the buffer.
As a result, write_lag and flush_lag now continue updating even if the reported
replay LSN remains stalled.
Backpatch to all supported versions.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwGdGQ=1-X-71Caee-LREBUXSzyohkoQJd4yZZCMt24C0g@mail.gmail.com
Backpatch-through: 13
5983a4cff added CompactAttribute for storing commonly used fields from
FormData_pg_attribute. 5983a4cff didn't go to the trouble of adjusting
every location where we can use CompactAttribute rather than
FormData_pg_attribute, so here we change the remaining ones.
There are some locations where I've left the code using
FormData_pg_attribute. These are mostly in the ALTER TABLE code. Using
CompactAttribute here seems more risky as often the TupleDesc is being
changed and those changes may not have been flushed to the
CompactAttribute yet.
I've also left record_recv(), record_send(), record_cmp(), record_eq()
and record_image_eq() alone as it's not clear to me that accessing the
CompactAttribute is a win here due to the FormData_pg_attribute still
having to be accessed for most cases. Switching the relevant parts to
use CompactAttribute would result in having to access both for common
cases. Careful benchmarking may reveal that something can be done to
make this better, but in absence of that, the safer option is to leave
these alone.
In ReorderBufferToastReplace(), there was a check to skip attnums < 0
while looping over the TupleDesc. Doing this is redundant since
TupleDescs don't store < 0 attnums. Removing that code allows us to
move to using CompactAttribute.
The change in validateDomainCheckConstraint() just moves fetching the
FormData_pg_attribute into the ERROR path, which is cold due to calling
errstart_cold() and results in code being moved out of the common path.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAApHDvrMy90o1Lgkt31F82tcSuwRFHq3vyGewSRN=-QuSEEvyQ@mail.gmail.com
If a data block to be skipped over is less than 4kB, just read the
data instead of using fseeko(). Experimentation shows that this
avoids useless kernel calls --- possibly quite a lot of them, at
least with current glibc --- while not incurring any extra I/O,
since libc will read 4kB at a time anyway. (There may be platforms
where the default buffer size is different from 4kB, but this
change seems unlikely to hurt in any case.)
We don't expect short data blocks to be common in the wake of
66ec01dc4 and related commits. But older pg_dump files may well
contain very short data blocks, and that will likely be a case
to be concerned with for a long time.
While here, do a little bit of other cleanup in _skipData.
Make "buflen" be size_t not int; it can't really exceed the
range of int, but comparing size_t and int variables is just
asking for trouble. Also, when we initially allocate a buffer
for reading skipped data into, make sure it's at least 4kB to
reduce the odds that we'll shortly have to realloc it bigger.
Author: Dimitrios Apostolou <jimis@gmx.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2edb7a57-b225-3b23-a680-62ba90658fec@gmx.net
This commit adds a note to RELEASE_CHANGES to remind us to create
an .abi-compliance-history file for new major versions.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David E. Wheeler <david@justatheory.com>
Discussion: https://postgr.es/m/aPJ03E2itovDBcKX%40nathan
Previously, tsearch used the database's CTYPE setting, which only
matches the database default collation if the locale provider is libc.
Note that tsearch types (tsvector and tsquery) are not collatable
types. The locale affects parsing the original text, which is a lossy
process, so a COLLATE clause on the already-parsed value would not
make sense.
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/0151ad01239e2cc7b3139644358cf8f7b9622ff7.camel@j-davis.com
Currently there's no bug, because we have no code path where we
invalidate relcache entries where it'd cause a problem. But it's more
robust to do it this way in case we introduce such a path later, as some
Postgres forks reportedly already have.
Author: Daniil Davydov <3danissimo@gmail.com>
Reviewed-by: Stepan Neretin <slpmcf@gmail.com>
Discussion: https://postgr.es/m/CAJDiXgj3FNzAhV+jjPqxMs3jz=OgPohsoXFj_fh-L+nS+13CKQ@mail.gmail.com
A BlockNumber (32-bit) might not be large enough to add bo_pagesPerRange
to when the table contains close to 2^32 pages. At worst, this could
result in a cancellable infinite loop during the BRIN index scan with
power-of-2 pagesPerRange, and slow (inefficient) BRIN index scans and
scanning of unneeded heap blocks for non power-of-2 pagesPerRange.
Backpatch to all supported versions.
Author: sunil s <sunilfeb26@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAOG6S4-tGksTQhVzJM19NzLYAHusXsK2HmADPZzGQcfZABsvpA@mail.gmail.com
Backpatch-through: 13
The comment fixed in this commit described the function as dealing with
database blocks, but in reality it processes shared memory allocations.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aH4DDhdiG9Gi0rG7@ip-10-97-1-34.eu-west-3.compute.internal
Backpatch-through: 18
67a54b9e8 taught the planner to push down HAVING clauses even when
grouping sets are present, as long as the clause does not reference
any columns that are nullable by the grouping sets. However, there
was an oversight: if any empty grouping sets are present, the
aggregation node can produce a row that did not come from the input,
and pushing down a HAVING clause in this case may cause us to fail to
filter out that row.
Currently, non-degenerate HAVING clauses are not pushed down when
empty grouping sets are present, since the empty grouping sets would
nullify the vars they reference. However, degenerate (variable-free)
HAVING clauses are not subject to this restriction and may be
incorrectly pushed down.
To fix, explicitly check for the presence of empty grouping sets and
retain degenerate clauses in HAVING when they are present. This
ensures that we don't emit a bogus aggregated row. A copy of each
such clause is also put in WHERE so that query_planner() can use it in
a gating Result node.
To facilitate this check, this patch expands the groupingSets tree of
the query to a flat list of grouping sets before applying the HAVING
pushdown optimization. This does not add any additional planning
overhead, since we need to do this expansion anyway.
In passing, make a small tweak to preprocess_grouping_sets() by
reordering its initial operations a bit.
Backpatch to v18, where this issue was introduced.
Reported-by: Yuhang Qiu <iamqyh@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/0879D9C9-7FE2-4A20-9593-B23F7A0B5290@gmail.com
Backpatch-through: 18
pgwin32_unsetenv() (compatibility routine of unsetenv() on Windows)
lacks the input validation that its sibling pgwin32_setenv() has.
Without these checks, calling unsetenv() with incorrect names crashes on
WIN32. However, invalid names should be handled, failing on EINVAL.
This commit adds the same checks as setenv() to fail with EINVAL for a
"name" set to NULL, an empty string, or if '=' is included in the value,
per POSIX requirements.
Like 7ca37fb040, backpatch down to v14. pgwin32_unsetenv() is defined
on REL_13_STABLE, but with the branch going EOL soon and the lack of
setenv() there for WIN32, nothing is done for v13.
Author: Bryan Green <dbryan.green@gmail.com>
Discussion: https://postgr.es/m/b6a1e52b-d808-4df7-87f7-2ff48d15003e@gmail.com
Backpatch-through: 14
Previously, COPY TO command didn't support directly specifying
partitioned tables so users had to use COPY (SELECT ...) TO variant.
This commit adds direct COPY TO support for partitioned
tables, improving both usability and performance. Performance tests
show it's faster than the COPY (SELECT ...) TO variant as it avoids
the overheads of query processing and sending results to the COPY TO
command.
When used with partitioned tables, COPY TO copies the same rows as
SELECT * FROM table. Row-level security policies of the partitioned
table are applied in the same way as when executing COPY TO on a plain
table.
Author: jian he <jian.universality@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Melih Mutlu <m.melihmutlu@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CACJufxEZt%2BG19Ors3bQUq-42-61__C%3Dy5k2wk%3DsHEFRusu7%3DiQ%40mail.gmail.com
The revised logic in 001_ssltests.pl would fail if openssl
doesn't work or if Perl is a 32-bit build, because it had
already overwritten $serialno with something inappropriate
to use in the eventual match. We could go back to the
previous code layout, but it seems best to introduce a
separate variable for the output of openssl.
Per failure on buildfarm member mamba, which has a 32-bit Perl.
Commit d9572c4e3b added extension support and made pg_dump attempt to
dump security labels on extensions. However, security labels on extensions
are not actually supported, so this code was unnecessary.
This commit removes it.
Suggested-by: Jian He <jian.universality@gmail.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxF8=z0v=888NKKEoTHQ+Jc4EXutFi91BF0fFjgFsZT6JQ@mail.gmail.com
Previously, attempting to use pg_checksums on a cluster with a control
file whose version does not match with what thetool is able to support
would lead to the following error:
pg_checksums: error: pg_control CRC value is incorrect
This is confusing, because it would look like the control file is
corrupted. However, the contents of the control file are correct,
pg_checksums not being able to understand how the past control file is
shaped.
This commit adds a check based on PG_VERSION, using the facility added
by cd0be131ba, using the same error message as some of the other
frontend tools. A note is added in the documentation about the major
version requirement.
Author: Michael Banck <mbanck@gmx.net>
Discussion: https://postgr.es/m/68f1ff21.170a0220.2c9b5f.4df5@mx.google.com
Our configure script intended to ensure this, but it supposed that
expr(1) would report an error for integer overflow. Maybe that
was true when the code was written (commit 3c6248a82 of 2008-05-02),
but all the modern expr's I tried will deliver bigger-than-int32
results without complaint. Moreover, if you use --with-segsize-blocks
then there's no check at all.
Ideally we'd add a test in configure itself to check that the value
fits in int, but to do that we'd need to suppose that test(1) handles
bigger-than-int32 numbers correctly. Probably modern ones do, but
that's an assumption I could do without; and I'm not too trusting
about meson either. Instead, let's install a static assertion, so
that even people who ignore all the compiler warnings you get from
such values will be forced to confront the fact that it won't work.
This has been hazardous for awhile, but given that we hadn't heard
a complaint about it till now, I don't feel a need to back-patch.
Reported-by: Casey Shobe <casey.allen.shobe@icloud.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/C5DC82D6-C76D-4E8F-BC2E-DF03EFC4FA24@icloud.com
It emerges that zlib's configuration logic is not robust enough
to guarantee that the macro will have the same ideas about struct
field layout as the library itself does, leading to corruption of
zlib's state struct followed by unintelligible failure messages.
This hazard has existed for a long time, but we'd not noticed
for several reasons:
(1) We only use gzgetc() when trying to read a manually-compressed
TOC file within a directory-format dump, which is a rarely-used
scenario that we weren't even testing before 20ec99589.
(2) No corruption actually occurs unless sizeof(long) is different
from sizeof(off_t) and the platform is big-endian.
(3) Some platforms have already fixed the configuration instability,
at least sufficiently for their environments.
Despite (3), it seems foolish to assume that the problem isn't
going to be present in some environments for a long time to come.
Hence, avoid relying on this macro. We can just #undef it and
fall back on the underlying function of the same name.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2122679.1760846783@sss.pgh.pa.us
Backpatch-through: 13
Coverity complains that the return value from gettuple_eval_partition
(stored in variable "datum") in a do..while loop in
WinGetFuncArgInPartition is overwritten when exiting the while
loop. This commit tries to fix the issue by changing the
gettuple_eval_partition call to:
(void) gettuple_eval_partition()
explicitly stating that we discard the return value. We are just
interested in whether we are inside or outside of partition, NULL or
NOT NULL here.
Also enhance some comments for easier code reading.
Reported-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aPCOabSE4VfJLaky%40paquier.xyz
We must tell init about each role name we plan to connect as,
else SSPI auth fails. Similar to previous patches such as
14793f471, 973542866.
Oversight in 208927e65, per buildfarm member drongo.
(Although that was back-patched to v13, the test script
only exists in v16 and up.)
This removes a few static functions and replaces them with 2 functions
which aim to be more reusable. The upper planner's pathkey requirements
can be simplified down to operations which require pathkeys in the same
order as the pathkeys for the given operation, and operations which can
make use of a Path's pathkeys in any order.
Here we also add some short-circuiting to truncate_useless_pathkeys(). At
any point we discover that all pathkeys are useful to a single operation,
we can stop checking the remaining operations as we're not going to be
able to find any further useful pathkeys - they're all possibly useful
already. Adjusting this seems to warrant trying to put the checks
roughly in order of least-expensive-first so that the short-circuits
have the most chance of skipping the more expensive checks.
In passing clean up has_useful_pathkeys() as it seems to have grown a
redundant check for group_pathkeys. This isn't needed as
standard_qp_callback will set query_pathkeys if there's any requirement
to have group_pathkeys. All this code does is waste run-time effort and
take up needless space.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAApHDvpbsEoTksvW5901MMoZo-hHf78E5up3uDOfkJnxDe_WAw@mail.gmail.com
It is possible to have a non-inherited not-null constraint on an
inherited column, but we were failing to preserve such constraints
during pg_upgrade where the source is 17 or older, because of a bug in
the pg_dump query for it. Oversight in commit 14e87ffa5c. Fix that
query. In passing, touch-up a bogus nearby comment introduced by the
same commit.
In version 17, make the regression tests leave a table in this
situation, so that this scenario is tested in the cross-version upgrade
tests of 18 and up.
Author: Dilip Kumar <dilipbalaut@gmail.com>
Reported-by: Andrew Bille <andrewbille@gmail.com>
Bug: #19074
Backpatch-through: 18
Discussion: https://postgr.es/m/19074-ae2548458cf0195c@postgresql.org
This fixes an unlikely issue when fetching GROUPING SET results from
their internally stored hash tables. It was possible in rare cases that
the hash iterator would be set up incorrectly which could result in a
crash.
This was introduced in 4d143509c, so backpatch to v18.
Many thanks to Yuri Zamyatin for reporting and helping to debug this
issue.
Bug: #19078
Reported-by: Yuri Zamyatin <yuri@yrz.am>
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/19078-dfd62f840a2c0766@postgresql.org
Backpatch-through: 18
Switch to using the English word here rather than using a verbified
function name.
The full word still fits within a single comment line, so it's probably
better just to use that instead of trying to shorten it, which might
cause confusion.
Author: Rafia Sabih <rafia.pghackers@gmail.com>
Discussion: https://postgr.es/m/CA+FpmFe7LnRF2NA_QfARjkSWme4mNt+Udwbh2Yb=zZm35Ji31w@mail.gmail.com
Commit a1b4f289be improved the hashjoin sizing to also consider the
memory used by BufFiles for batches. The code however had multiple
issues, making it ineffective or not working as expected in some cases.
* The amount of memory needed by buffers was calculated using uint32,
so it would overflow for nbatch >= 262144. If this happened the loop
would exit prematurely and the memory usage would not be reduced.
The nbatch overflow is fixed by reworking the condition to not use a
multiplication at all, so there's no risk of overflow. An explicit
cast was added to a similar calculation in ExecHashIncreaseBatchSize.
* The loop adjusting the nbatch value used hash_table_bytes to calculate
the old/new size, but then updated only space_allowed. The consequence
is the total memory usage was not reduced, but all the memory saved by
reducing the number of batches was used for the internal hash table.
This was fixed by using only space_allowed. This is also more correct,
because hash_table_bytes does not account for skew buckets.
* The code was also doubling multiple parameters (e.g. the number of
buckets for hash table), but was missing overflow protections.
The loop now checks for overflow, and terminates if needed. It'd be
possible to cap the value and continue the loop, but it's not worth
the complexity. And the overflow implies the in-memory hash table is
already very large anyway.
While at it, rework the comment explaining how the memory balancing
works, to make it more concise and easier to understand.
The initial nbatch overflow issue was reported by Vaibhav Jain. The
other issues were noticed by me and Melanie Plageman. Fix by me, with a
lot of review and feedback by Melanie.
Backpatch to 18, where the hashjoin memory balancing was introduced.
Reported-by: Vaibhav Jain <jainva@google.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/CABa-Az174YvfFq7rLS+VNKaQyg7inA2exvPWmPWqnEn6Ditr_Q@mail.gmail.com
pg_prewarm() currently checks for SELECT privileges on the target
relation. However, indexes do not have access rights of their own,
so a role may be denied permission to prewarm an index despite
having the SELECT privilege on its parent table. This commit fixes
this by locking the parent table before the index (to avoid
deadlocks) and checking for SELECT on the parent table. Note that
the code is largely borrowed from
amcheck_lock_relation_and_check().
An obvious downside of this change is the extra AccessShareLock on
the parent table during prewarming, but that isn't expected to
cause too much trouble in practice.
Author: Ayush Vatsa <ayushvatsa1810@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/CACX%2BKaMz2ZoOojh0nQ6QNBYx8Ak1Dkoko%3DD4FSb80BYW%2Bo8CHQ%40mail.gmail.com
Backpatch-through: 13
The SSL tests for pg_stat_ssl tries to exactly match the serial
from the certificate by extracting it with the openssl binary.
If that fails due to the binary not being available, a fallback
match is used, but the attempt to execute a missing binary adds
a warning to the output which can confuse readers for a failure
in the test. Fix by only attempting if the openssl binary was
found by autoconf/meson.
Backpatch down to v16 where commit c8e4030d1b made the test
use the OPENSSL variable from autoconf/meson instead of a hard-
coded value.
Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/aNPSp1-RIAs3skZm@msg.df7cb.de
Backpatch-through: 16
The URL for "Sorting and Searching Algorithms: A Cookbook"
by Thomas Niemann has started returning 404, and since we
refer to the page for license terms this replaces the now
defunct link with one to the copy on archive.org.
Author: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/6DED3DEF-875E-4D1D-8F8F-7353D5AF7B79@gmail.com
The TAP tests whose ok() calls are changed in this commit were relying
on perl operators, rather than equivalents available in Test::More. For
example, rather than the following:
ok($data =~ qr/expr/m, "expr matching");
ok($data !~ qr/expr/m, "expr not matching");
The new test code uses this equivalent:
like($data, qr/expr/m, "expr matching");
unlike($data, qr/expr/m, "expr not matching");
A huge benefit of the new formulation is that it is possible to know
about the values we are checking if a failure happens, making debugging
easier, should the test runs happen in the buildfarm, in the CI or
locally.
This change leads to more test code overall as perltidy likes to make
the code pretty the way it is in this commit.
Author: Sadhuprasad Patro <b.sadhu@gmail.com>
Discussion: https://postgr.es/m/CAFF0-CHhwNx_Cv2uy7tKjODUbeOgPrJpW4Rpf1jqB16_1bU2sg@mail.gmail.com
Improve the documentation of pg_stat_replication to explain when
the backend_xmin column becomes NULL. This happens when
a replication slot is used (the xmin is then shown in pg_replication_slots)
or when hot_standby_feedback is disabled.
Author: Renzo Dani <arons7@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CA+XOKQAMXzskpdUmj2sg03_5fmiXc2Gs0r3TX1_rmcFcqh+=xQ@mail.gmail.com
042_low_level_backup compared the result of a query two times with a
comparison operator based on an integer, while the result should be
compared with a string.
The outcome of the tests is currently not impacted by this change.
However, it could be possible that the tests fail to detect future
issues if the query results become different, for some reason.
Oversight in 99b4a63bef.
Author: Sadhuprasad Patro <b.sadhu@gmail.com>
Discussion: https://postgr.es/m/CAFF0-CHhwNx_Cv2uy7tKjODUbeOgPrJpW4Rpf1jqB16_1bU2sg@mail.gmail.com
Backpatch-through: 17
040_pg_createsubscriber has been calling safe_psql(), that returns the
result of a SQL query, with ok() without checking the result generated
(in this case 't', for a number of publications).
The outcome of the tests is currently not impacted by this change.
However, it could be possible that the test fails to detect future
issues if the query results become different.
The test is rewritten so as the number of publications is checked. This
is not the fix suggested originally by the author, but this is more
reliable in the long run.
Oversight in e5aeed4b80.
Author: Sadhuprasad Patro <b.sadhu@gmail.com>
Discussion: https://postgr.es/m/CAFF0-CHhwNx_Cv2uy7tKjODUbeOgPrJpW4Rpf1jqB16_1bU2sg@mail.gmail.com
Backpatch-through: 18
The original formulation failed to take into account the fact that for
the PGXS case, the source dir is not $(top_srcdir), so it ended up not
doing anything. Handle it explicitly.
Author: Ryo Matsumura <matsumura.ryo@fujitsu.com>
Reviewed-by: Bryan Green <dbryan.green@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/TYCPR01MB113164770FB0B0BE6ED21E68EE8DCA@TYCPR01MB11316.jpnprd01.prod.outlook.com
Add a test case to cover pg_dump with --compress=none. This
brings the coverage of compress_none.c up from about 64% to 90%,
in particular covering the new code added in a previous patch.
Include compression of toc.dat in manually-compressed test cases.
We would have found the bug fixed in commit a239c4a0c much sooner
if we'd done this. As far as I can tell, this doesn't reduce test
coverage at all, since there are other tests of directory format
that still use an uncompressed toc.dat.
Widen the wide row used to verify correct (de) compression.
Commit 1a05c1d25 advises us (not without reason) to ensure that
this test case fully fills DEFAULT_IO_BUFFER_SIZE, so that loops
within the compression logic will iterate completely. To follow
that advice with the proposed DEFAULT_IO_BUFFER_SIZE of 128K,
we need something close to this. This does indeed increase the
reported code coverage by a few lines.
While here, fix a glitch that I noticed in testing: the
$glob_patterns tests were incapable of failing, because glob()
will return 'foo' as 'foo' whether there is a matching file or
not. (Indeed, the stanza just above that one relies on that.)
In my testing, this patch adds approximately as much runtime
as was saved by the previous patch, so that it's about a wash
compared to the old code. However, we get better test coverage.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
Add a new test script 006_pg_dump_compress.pl, containing just
the pg_dump tests specifically concerned with compression, and
remove those tests from 002_pg_dump.pl. We can also drop some
infrastructure in 002_pg_dump.pl that was used only for these tests.
The point of this is to avoid the cost of running these test
cases over and over in all the scenarios (runs) that 002_pg_dump.pl
exercises. We don't learn anything more about the behavior of the
compression code that way, and we expend significant amounts of
time, since one of these test cases is quite large and due to get
larger.
The intent of this specific patch is to provide exactly the same
coverage as before, except that I went back to using --no-sync
in all the test runs moved over to 006_pg_dump_compress.pl.
I think that avoiding that had basically been cargo-culted into
these test cases as a result of modeling them on the
defaults_custom_format test case; again, doing that over and over
isn't going to teach us anything new.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
After commit fe8192a95, compress_zstd.c tends to produce data block
sizes around 128K, and we don't really have any control over that
unless we want to overrule ZSTD_CStreamOutSize(). Which seems like
a bad idea. But let's try to align the other compression modes to
produce block sizes roughly comparable to that, so that pg_restore's
skip-data performance isn't enormously different for different modes.
gzip compression can be brought in line simply by setting
DEFAULT_IO_BUFFER_SIZE = 128K, which this patch does. That
increases some unrelated buffer sizes, but none of them seem
problematic for modern platforms.
lz4's idea of appropriate block size is highly nonlinear:
if we just increase DEFAULT_IO_BUFFER_SIZE then the output
blocks end up around 200K. I found that adjusting the slop
factor in LZ4State_compression_init was a not-too-ugly way
of bringing that number roughly into line.
With compress = none you get data blocks the same sizes as the
table rows, which seems potentially problematic for narrow tables.
Introduce a layer of buffering to make that case match the others.
Comments in compress_io.h and 002_pg_dump.pl suggest that if
we increase DEFAULT_IO_BUFFER_SIZE then we need to increase the
amount of data fed through the tests in order to improve coverage.
I've not done that here, leaving it for a separate patch.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
This information appears to have been unused since commit
c5b7ba4e67. We could not find any references in third-party code,
either.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aO_CyFRpbVMtgJWM%40nathan
To support the upcoming addition of a sequence synchronization worker,
this patch extracts common synchronization logic shared by table sync
workers and the new sequence sync worker into a dedicated file. This
modularization improves code reuse, maintainability, and clarity in the
logical workers framework.
Author: vignesh C <vignesh21@gmail.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
EvalPlanQualStart() failed to propagate es_partition_directory into
the child EState used for EPQ rechecks. When execution time partition
pruning ran during the EPQ scan, executor code dereferenced a NULL
partition directory and crashed.
Previously, propagating es_partition_directory into the EPQ EState was
unnecessary because CreatePartitionPruneState(), which sets it on
demand, also initialized the exec-pruning context. After commit
d47cbf474, CreatePartitionPruneState() now initializes only the init-
time pruning context, leaving exec-pruning context initialization to
ExecInitNode(). Since EvalPlanQualStart() runs only ExecInitNode() and
not CreatePartitionPruneState(), it can encounter a NULL
es_partition_directory. Other executor fields initialized during
CreatePartitionPruneState() are already copied into the child EState
thanks to commit 8741e48e5d, but es_partition_directory was missed.
Fix by borrowing the parent estate's es_partition_directory in
EvalPlanQualStart(), and by clearing that field in EvalPlanQualEnd()
so the parent remains responsible for freeing the directory.
Add an isolation test permutation that triggers EPQ with execution-
time partition pruning, the case that reproduces this crash.
Bug: #19078
Reported-by: Yuri Zamyatin <yuri@yrz.am>
Diagnosed-by: David Rowley <dgrowleyml@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Co-authored-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/19078-dfd62f840a2c0766@postgresql.org
Backpatch-through: 18
Per report from buildfarm member prion. The CI does not use this
parameter, and this buildfarm member sets log_error_verbosity to
"verbose". This would generate extra LOCATION entries in the logs,
causing the regexps of the test to fail.
Trying to support log_error_verbosity=verbose in the test would mean to
tweak all the regexps used in the test to detect an optional set of
LOCATION lines, at least. This would not improve the coverage, and
forcing the GUC value is simpler.
Oversight in 76bba03312.
Discussion: https://postgr.es/m/aPBaNNGiYT3xMBN1@paquier.xyz
Temporary file usage is sometimes attributed to the wrong query in the logs
output. One identified reason is that unnamed portal cleanup (and
consequently temp file logging) happens during the next BIND message as
a, after debug_query_string has already been updated to the new query.
Dropping an unnamed portal in the next BIND message is a rather old
protocol behavior (fe19e56c57, also mentioned in the docs).
log_temp_files is a bit newer than that, as of be8a431881.
This commit adds tests to track which query is displayed next to the
temporary file(s) removed when a portal is dropped, and in some cases if
a query is displayed or not. We have not concluded how to improve the
situation yet; these tests will at least help in checking what changes
in the logs depending on the proposal discussed and how it affects the
scenarios tracked by this new test.
Author: Sami Imseih <samimseih@gmail.com>
Author: Frédéric Yhuel <frederic.yhuel@dalibo.com>
Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/3d07ee43-8855-42db-97e0-bad5db82d972@dalibo.com
This commit adjusts RangeVarCallbackForReindexIndex() to handle an
extremely unlikely race condition involving concurrent OID reuse.
In short, if REINDEX INDEX is executed at the same time that the
index is re-created with the same name and OID but a different
parent table OID, we might lock the wrong parent table. To fix,
simply detect when this happens and emit an ERROR. Unfortunately,
we can't gracefully handle this situation because we will have
already locked the index, and we must lock the parent table before
the index to avoid deadlocks.
While at it, I've replaced all but one early return in this
callback function with ERRORs that should be unreachable. While I
haven't verified the presence of a live bug, the checks in question
appear to be unnecessary, and the early returns seem prone to
breaking the parent table locking code in subtle ways. If nothing
else, this simplifies the code a bit.
This is a bug fix and could be back-patched, but given the presumed
rarity of the race condition and the lack of reports, I'm not going
to bother.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/Z8zwVmGzXyDdkAXj%40nathan
Per-character pg_locale_t APIs. Useful for tsearch parsing and
potentially other places.
Significant overlap with the regc_wc_isalpha() and related functions
in regc_pg_locale.c, but this change leaves those intact for
now.
Discussion: https://postgr.es/m/0151ad01239e2cc7b3139644358cf8f7b9622ff7.camel@j-davis.com
Presently, these functions look up the relation's OID, lock it, and
then check privileges. Not only does this approach provide no
guarantee that the locked relation matches the arguments of the
lookup, but it also allows users to briefly lock relations for
which they do not have privileges, which might enable
denial-of-service attacks. This commit adjusts these functions to
use RangeVarGetRelidExtended(), which is purpose-built to avoid
both of these issues. The new RangeVarGetRelidCallback function is
somewhat complicated because it must handle both tables and
indexes, and for indexes, we must check privileges on the parent
table and lock it first. Also, it needs to handle a couple of
extremely unlikely race conditions involving concurrent OID reuse.
A downside of this change is that the coding doesn't allow for
locking indexes in AccessShare mode anymore; everything is locked
in ShareUpdateExclusive mode. Per discussion, the original choice
of lock levels was intended for a now defunct implementation that
used in-place updates, so we believe this change is okay.
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/Z8zwVmGzXyDdkAXj%40nathan
Backpatch-through: 18
The log output functionality of log_autovacuum_min_duration applies to
both VACUUM and ANALYZE, so it is not possible to separate the VACUUM
and ANALYZE log output thresholds. Logs are likely to be output only for
VACUUM and not for ANALYZE.
Therefore, we decided to separate the threshold for log output of VACUUM
by autovacuum (log_autovacuum_min_duration) and the threshold for log
output of ANALYZE by autovacuum (log_autoanalyze_min_duration).
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Kasahara Tatsuhito <kasaharatt@oss.nttdata.com>
Discussion: https://www.postgresql.org/message-id/flat/CAOzEurQtfV4MxJiWT-XDnimEeZAY+rgzVSLe8YsyEKhZcajzSA@mail.gmail.com
If inside an EPQ recheck, ExecScanFetch would run the recheck method
function for foreign/custom joins even if they aren't descendant nodes
in the EPQ recheck plan tree, which is problematic at least in the
foreign-join case, because such a foreign join isn't guaranteed to have
an alternative local-join plan required for running the recheck method
function; in the postgres_fdw case this could lead to a segmentation
fault or an assert failure in an assert-enabled build when running the
recheck method function.
Even if inside an EPQ recheck, any scan nodes that aren't descendant
ones in the EPQ recheck plan tree should be normally processed by using
the access method function; fix by modifying ExecScanFetch so that if
inside an EPQ recheck, it runs the recheck method function for
foreign/custom joins that are descendant nodes in the EPQ recheck plan
tree as before and runs the access method function for foreign/custom
joins that aren't.
This fix also adds to postgres_fdw an isolation test for an EPQ recheck
that caused issues stated above.
Oversight in commit 385f337c9.
Reported-by: Kristian Lejao <kristianlejao@gmail.com>
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Co-authored-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Discussion: https://postgr.es/m/CAD21AoBpo6Gx55FBOW+9s5X=nUw3Xpq64v35fpDEKsTERnc4TQ@mail.gmail.com
Backpatch-through: 13
This essentially reverts commit 866566a690, which installed
safeguards against loading plpython2 and plpython3 into the same
process. We don't support plpython2 anymore, so this is obsolete.
The Python and PL/Python initialization now happens again in
_PG_init() rather than the first time a PL/Python call handler is
invoked. (Often, these will be very close together.)
I kept the separate PLy_initialize() function introduced by
866566a690 to keep _PG_init() a bit modular.
Reviewed-by: Mario González Troncoso <gonzalemario@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/9eb9feb6-1df3-4f0c-a0dc-9bcf35273111%40eisentraut.org
This patch replaces ALTER SUBSCRIPTION REFRESH with
ALTER SUBSCRIPTION REFRESH PUBLICATION in comments and error messages to
improve clarity and support future extensibility. The change aligns with
upcoming addition REFRESH SEQUENCES for sequence synchronization.
Author: vignesh C <vignesh21@gmail.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
pg_createsubscriber is documented as requiring the same major version as
the target clusters. Attempting to use this tool on a cluster where the
control file version read does not match with the version compiled with
would lead to the following error message:
pg_createsubscriber: error: control file appears to be corrupt
This is confusing as the control file is correct: only the version
expected does not match. This commit integrates pg_createsubscriber
with the facility added by cd0be131ba, where the contents of
PG_VERSION are read and compared with the value of PG_MAJORVERSION_NUM
expected by the tool. This puts pg_createsubscriber in line with the
documentation, with a better error message when the version does not
match.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aONDWig0bIGilixs@paquier.xyz
pg_resetwal's custom logic to retrieve the version number of a data
folder's PG_VERSION can be replaced by the facility introduced in
cd0be131ba. This removes some code.
One thing specific to pg_resetwal is that the first line of PG_VERSION
is read and reported in the error report generated when the major
version read does not match with the version pg_resetwal has been
compiled with. The new logic preserves this property, without changes
to neither the error message nor the data used in the error report.
Note that as a chdir() is done within the data folder before checking the
data of PG_VERSION, get_pg_version() needs to be tweaked to look for
PG_VERSION in the current folder.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aOiirvWJzwdVCXph@paquier.xyz
pg_combinebackup's custom logic to retrieve the version number of a data
folder's PG_VERSION can be replaced by the facility introduced in
cd0be131ba. This removes some code.
One thing specific to this tool is that backend versions older than v10
are not supported. The new code does the same checks as the previous
code.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aOiirvWJzwdVCXph@paquier.xyz
This reverts commit 74ac377d75.
The previous change contained a misconception about how publications
are cleaned up on the subscriber. The newly added log message could
confuse users, particularly when running pg_createsubscriber with
--dry-run - users would see a "dropping publication" message
immediately followed by a "no publications found" message.
Discussion: https://postgr.es/m/CAHut+Pu7xz1LqNvyQyvSHrV0Sw6D=e6T-Jm=gh1MRJrkuWGyBQ@mail.gmail.com
This function only requires a few fields from LVRelState, so pass them
in individually.
This change allows calling heap_page_is_all_visible() from code such as
pruneheap.c, which does not have access to an LVRelState.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/2wk7jo4m4qwh5sn33pfgerdjfujebbccsmmlownybddbh6nawl%40mdyyqpqzxjek
These functions appeared prominently in a profile of a patch that sets
the visibility map on-access. Inline them to remove call overhead and
make them cheaper to use in hot paths.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/2wk7jo4m4qwh5sn33pfgerdjfujebbccsmmlownybddbh6nawl%40mdyyqpqzxjek
After scanning the line pointers on a heap page during the first phase
of vacuum, we use the information collected to decide whether to use
the assembled freeze plans.
Move this decision logic into a helper function to improve readability.
While here, rename a PruneState member and disambiguate some local
variables in heap_page_prune_and_freeze().
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/2wk7jo4m4qwh5sn33pfgerdjfujebbccsmmlownybddbh6nawl%40mdyyqpqzxjek
When specifying --clean=publication to pg_createsubscriber, it drops
all existing publications with a log message "dropping all existing
publications in database "testdb"". Add a new log message "no
publications found" when there are no publications to drop, making the
progress more transparent to users.
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAHut+Ptm+WJwbbYXhC0s6FP_98KzZCR=5CPu8F8N5uV8P7BpqA@mail.gmail.com
The present coding of dblink's get_rel_from_relname() predates the
introduction of RangeVarGetRelidExtended(), which provides a way to
check permissions before locking the relation. This commit adjusts
get_rel_from_relname() to use that function.
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/aOgmi6avE6qMw_6t%40nathan
add323da40 altered xl_heap_prune, changing the WAL format, but
neglected to bump XLOG_PAGE_MAGIC. Do so now.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reported-by: Kirill Reshke <reshkekirill@gmail.com>
Reported-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aO3Gw6hCAZFUd5ab%40paquier.xyz
Previously WinCheckAndInitializeNullTreatment() used elog() to emit an
error message. ereport() should be used instead because it's a
user-facing error. Also use existing get_func_name() to get a
function's name, rather than own implementation.
Moreover add an assertion to validate winobj parameter, just like
other window function API.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Tatsuo Ishii <ishii@postgresql.org>
Reviewed-by: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/2952409.1760023154%40sss.pgh.pa.us
Unsurprisingly, this shaves code. get_major_server_version() can be
replaced by the new routine added by cd0be131ba, with the contents of
PG_VERSION stored in an allocated buffer instead of a fixed-sized one.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aOiirvWJzwdVCXph@paquier.xyz
get_pg_version() is able to return a version number, that can be used
for comparisons based on PG_VERSION_NUM. A macro is added to convert
the result to a major version number, to work with PG_MAJORVERSION_NUM.
It is possible to pass to the routine an optional argument, where the
contents retrieved from PG_VERSION are saved. This requirement matters
for some of the frontend code (one example: pg_upgrade wants that for
tablespace paths with a version number strictly older than v10).
This will be used by a set of follow-up patches, to be consumed in
various frontend tools that duplicate a logic similar to do what this
new routine does, like:
- pg_resetwal
- pg_combinebackup
- pg_createsubscriber
- pg_upgrade
This routine supports both the post-v10 version number and the older
flavor (aka 9.6), as required at least by pg_upgrade.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aOiirvWJzwdVCXph@paquier.xyz
The version number calculated by read_pg_version_file() is multiplied
once by 10000, to be able to do comparisons based on PG_VERSION_NUM or
equivalents with a minor version included. However, the version number
given sync_pgdata() was multiplied by 10000 a second time, leading to an
overestimated number.
This issue was harmless (still incorrect) as pg_combinebackup does not
support versions of Postgres older than v10, and sync_pgdata() only
includes a version check due to the rename of pg_xlog/ to pg_wal/. This
folder rename happened in the development cycle of v10. This would
become a problem if in the future sync_pgdata() is changed to have more
version-specific checks.
Oversight in dc21234005, so backpatch down to v17.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aOil5d0y87ZM_wsZ@paquier.xyz
Backpatch-through: 17
Instead of emitting a separate XLOG_HEAP2_VISIBLE WAL record for each
page that becomes all-visible in vacuum's third phase, specify the VM
changes in the already emitted XLOG_HEAP2_PRUNE_VACUUM_CLEANUP record.
Visibility checks are now performed before marking dead items unused.
This is safe because the heap page is held under exclusive lock for the
entire operation.
This reduces the number of WAL records generated by VACUUM phase III by
up to 50%.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
log_error() would probably fail completely if used, and would
certainly print garbage for anything that needed to be interpolated
into the message, because it was failing to use the correct printing
subroutine for a va_list argument.
This bug likely went undetected because the error cases this code
is used for are rarely exercised - they only occur when Windows
security API calls fail catastrophically (out of memory, security
subsystem corruption, etc).
The FRONTEND variant can be fixed just by calling vfprintf()
instead of fprintf(). However, there was no va_list variant
of write_stderr(), so create one by refactoring that function.
Following the usual naming convention for such things, call
it vwrite_stderr().
Author: Bryan Green <dbryan.green@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAF+pBj8goe4fRmZ0V3Cs6eyWzYLvK+HvFLYEYWG=TzaM+tWPnw@mail.gmail.com
Backpatch-through: 13
There was some confusion around how to adjust the n_distinct estimates
for partitioned tables. Here we try and clarify that
n_distinct_inherited needs to be adjusted rather than n_distinct.
Also fix some slightly misleading text which was talking about table
size rather than table rows, fix a grammatical error, and adjust some
text which indicated that ANALYZE was performing calculations based on
the n_distinct settings. Really it's the query planner that does this
and ANALYZE only stores the overridden n_distinct estimate value in
pg_statistic.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/CAApHDvrL7a-ZytM1SP8Uk9nEw9bR2CPzVb+uP+bcNj=_q-ZmVw@mail.gmail.com
I was distressed to find that reading an LZ4-compressed toc.dat
file was hundreds of times slower than it ought to be. On
investigation, the blame mostly affixes to LZ4Stream_read_overflow's
habit of memmove'ing all the remaining buffered data after each read
operation. Since reading a TOC file tends to involve a lot of small
(even one-byte) decompression calls, that amounts to an O(N^2) cost.
This could have been fixed with a minimal patch, but to my
eyes LZ4Stream_read_internal and LZ4Stream_read_overflow are
badly-written spaghetti code; in particular the eol_flag logic
is inefficient and duplicative. I chose to throw the code
away and rewrite from scratch. This version is about sixty
lines shorter as well as not having the performance issue.
Fortunately, AFAICT the only way to get to this problem is to
manually LZ4-compress the toc.dat and/or blobs.toc files within a
directory-style archive; in the main data files, we read blocks
that are large enough that the O(N^2) behavior doesn't manifest.
Few people do that, which likely explains the lack of field
complaints. Otherwise this performance bug might be considered
bad enough to warrant back-patching.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
Both of these modules dumped each bit of output that they got from
the underlying compression library as a separate "data block" in
the emitted archive file. In the case of zstd this'd frequently
result in block sizes well under 100 bytes; lz4 is a little better
but still produces blocks around 300 bytes, at least in the test
case I tried. This bloats the archive file a little bit compared
to larger block sizes, but the real problem is that when pg_restore
has to skip each data block rather than seeking directly to some
target data, tiny block sizes are enormously inefficient.
Fix both modules so that they fill their allocated buffer reasonably
well before dumping a data block. In the case of lz4, also delete
some redundant logic that caused the lz4 frame header to be emitted
as a separate data block. (That saves little, but I see no reason
to expend extra code to get worse results.)
I fixed the "stream API" code too. In those cases, feeding small
amounts of data to fwrite() probably doesn't have any meaningful
performance consequences. But it seems like a bad idea to leave
the two sets of code doing the same thing in two different ways.
In passing, remove unnecessary "extra paranoia" check in
_ZstdWriteCommon. _CustomWriteFunc (the only possible referent
of cs->writeF) already protects itself against zero-length writes,
and it's really a modularity violation for _ZstdWriteCommon to know
that the custom format disallows empty data blocks. Also, fix
Zstd_read_internal to do less work when passed size == 0.
Reported-by: Dimitrios Apostolou <jimis@gmx.net>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
pg_dump expects a read request of zero bytes to be a no-op; see for
example ReadStr(). Gzip_read got this wrong and falsely supposed
that the resulting gzret == 0 indicated an error. We could complicate
that error-checking logic some more, but it seems best to just fall
out immediately when passed size == 0.
This bug breaks the nominally-supported case of manually gzip'ing
the toc.dat file within a directory-style dump, so back-patch to v16
where this code came in. (Prior branches already have a short-circuit
for size == 0 before their only gzread call.)
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
Backpatch-through: 16
Since protocol version 3.2 the CancelRequest does not have a fixed size
length anymore. The protocol docs still listed the length field to be a
constant number though. This fixes that.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reported-by: Dmitry Igrishin <dmitigr@gmail.com>
Backpatch-through: 18
Remove a variable that is no longer in use following commit 9a2e2a28.
It's not immediately clear why there were no compiler warnings about
this oversight.
Author: Peter Geoghegan <pg@bowt.ie>
Backpatch-through: 18
In commit a45c78e32 I removed the only regression test case that
reaches this function, because it turns out that we only use it
if reading an LZ4-compressed blobs.toc file in a directory dump,
and that is a state that has to be created manually. That seems
like a bad thing to not test, not so much for LZ4Stream_gets()
itself as because it means the squirrely eol_flag logic in
LZ4Stream_read_internal() is not tested.
The reason for the change was that I thought the lz4 program did not
have any way to perform compression without explicit specification
of the output file name. However, it turns out that the syntax
synopsis in its man page is a lie, and if you read enough of the
man page you find out that with "-m" it will do what's needful.
So restore the manual compression step in that test case.
Noted while testing some proposed changes in pg_dump's compression
logic.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
Backpatch-through: 17
Commit 71f4c8c6f7 (which implemented DETACH CONCURRENTLY) added code
to create a separate table constraint when a table is detached
concurrently, identical to the partition constraint, on the theory that
such a constraint was needed in case the optimizer had constructed any
query plans that depended on the constraint being there. However, that
theory was apparently bogus because any such plans would be invalidated.
For hash partitioning, those constraints are problematic, because their
expressions reference the OID of the parent partitioned table, to which
the detached table is no longer related; this causes all sorts of
problems (such as inability of restoring a pg_dump of that table, and
the table no longer working properly if the partitioned table is later
dropped).
We'd like to get rid of all those constraints. In fact, for branch
master, do that -- no longer create any substitute constraints.
However, out of fear that some users might somehow depend on these
constraints for other partitioning strategies, for stable branches
(back to 14, which added DETACH CONCURRENTLY), only do it for hash
partitioning.
(If you repeatedly DETACH CONCURRENTLY and then ATTACH a partition, then
with this constraint addition you don't need to scan the table in the
ATTACH step, which presumably is good. But if users really valued this
feature, they would have requested that it worked for non-concurrent
DETACH also.)
Author: Haiyang Li <mohen.lhy@alibaba-inc.com>
Reported-by: Fei Changhong <feichanghong@qq.com>
Reported-by: Haiyang Li <mohen.lhy@alibaba-inc.com>
Backpatch-through: 14
Discussion: https://postgr.es/m/18371-7fef49f63de13f02@postgresql.org
Discussion: https://postgr.es/m/19070-781326347ade7c57@postgresql.org
Introduced by my (Álvaro's) commit 9e4f914b5e, which was itself
backpatched to pg10, though only pg15 and up contain the problem
because of commit 9c08aea6a3.
This isn't a particularly significant leak, but given the fix is
trivial, we might as well backpatch to all branches where it applies, so
do that.
Author: Nathan Bossart <nathandbossart@gmail.com>
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/x4odfdlrwvsjawscnqsqjpofvauxslw7b4oyvxgt5owoyf4ysn@heafjusodrz7
An assertion in _bt_killitems expected the scan's currPos state to
contain a valid LSN, saved from when currPos's page was initially read.
The assertion failed to account for the fact that even logged relations
can have leaf pages with an invalid LSN when built with wal_level set to
"minimal". Remove the faulty assertion.
Oversight in commit e6eed40e (though note that the assertion was
backpatched to stable branches before 18 by commit 7c319f54).
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Matthijs van der Vleuten <postgresql@zr40.nl>
Bug: #19082
Discussion: https://postgr.es/m/19082-628e62160dbbc1c1@postgresql.org
Backpatch-through: 13
An error happening while a slot data is saved on disk in
SaveSlotToPath() could cause a state.tmp file (temporary file holding
the slot state data, renamed to its permanent name at the end of the
function) to remain around after it has been created. This temporary
file is created with O_EXCL, meaning that if an existing state.tmp is
found, its creation would fail. This would prevent the slot data to be
saved, requiring a manual intervention to remove state.tmp before being
able to save again a slot. Possible scenarios where this temporary file
could remain on disk is for example a ENOSPC case (no disk space) while
writing, syncing or renaming it. The bug reports point to a write
failure as the principal cause of the problems.
Using O_TRUNC has been argued back in 2019 as a potential solution to
discard any temporary file that could exist. This solution was rejected
as O_EXCL can also act as a safety measure when saving the slot state,
crash recovery offering cleanup guarantees post-crash. This commit uses
the alternative approach that has been suggested by Andres Freund back
in 2019. When the temporary state file cannot be written, synced,
closed or renamed (note: not when created!), an unlink() is used to
remove the temporary state file while holding the in-progress I/O
LWLock, so as any follow-up attempts to save a slot's data would not
choke on an existing file that remained around because of a previous
failure.
This problem has been reported a few times across the years, going back
to 2019, but for some reason I have never come back to do something
about it and it has been forgotten. A recent report has reminded me
that this was still a problem.
Reported-by: Kevin K Biju <kevinkbiju@gmail.com>
Reported-by: Sergei Kornilov <sk@zsrv.org>
Reported-by: Grigory Smolkin <g.smolkin@postgrespro.ru>
Discussion: https://postgr.es/m/CAM45KeHa32soKL_G8Vk38CWvTBeOOXcsxAPAs7Jt7yPRf2mbVA@mail.gmail.com
Discussion: https://postgr.es/m/3559061693910326@qy4q4a6esb2lebnz.sas.yp-c.yandex.net
Discussion: https://postgr.es/m/08bbfab1-a61d-3750-fc18-4ab2c1aa7f09@postgrespro.ru
Backpatch-through: 13
test_random_operations() did not check the result returned by
bms_is_member() in its last phase, when checking that the contents of
the bitmap match with what is expected. This was impacting the
reliability of the function and the coverage it could provide.
This commit improves the whole function, adding more checks based on
bms_is_member(), using a bitmap and a secondary array that tracks the
members added by random additions and deletions.
While on it, more comments are added to document the internals of the
function.
Reported-by: Ranier Vilela <ranier.vf@gmail.com>
Author: Greg Burd <greg@burd.me>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAEudQAq_zOSA2NUQSWePTGV_=90Uw0WcXxGOWnN-vwF046OOqA@mail.gmail.com
Instead of emitting a separate WAL XLOG_HEAP2_VISIBLE record for setting
bits in the VM, specify the VM block changes in the
XLOG_HEAP2_MULTI_INSERT record.
This halves the number of WAL records emitted by COPY FREEZE.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
The processing of the PARALLEL option for VACUUM was not quite
following what the DefElem code had intended. defGetInt32() already has
code to handle missing parameters and returns a perfectly good error
message for when that happens.
Here we get rid of the ExecVacuum() error:
ERROR: parallel option requires a value between 0 and N
and leave defGetInt32() handle it, which will give:
ERROR: parallel requires an integer value
defGetInt32() was already handling the non-integer parameter case, so it
may as well handle the missing parameter case too.
Additionally, parameterize the option name to make translator work easier,
and also use errhint_internal() rather than errhint() for the
BUFFER_USAGE_LIMIT option since there isn't any work for a translator to
do for "%s".
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAApHDvovH14tNWB+WvP6TSbfi7-=TysQ9h5tQ5AgavwyWRWKHA@mail.gmail.com
An error context callback function might leak some memory into
ErrorContext, since those functions are run with ErrorContext as
current context. In the case where the elevel is ERROR, this is
no problem since the code level that catches the error should do
FlushErrorState to clean up, and that will reset ErrorContext.
However, if the elevel is less than ERROR then no such cleanup occurs.
In principle, repeated leaks while emitting log messages or client
notices could accumulate arbitrarily much leaked data, if no ERROR
occurs in the session.
To fix, let errfinish() perform an ErrorContext reset if it is
at the outermost error nesting level. (If it isn't, we'll delay
cleanup until the outermost nesting level is exited.)
The only actual leakage of this sort that I've been able to observe
within our regression tests was recently introduced by commit
f727b63e8. While it seems plausible that there are other such
leaks not reached in the regression tests, the lack of field
reports suggests that they're not a big problem. Accordingly,
I won't take the risk of back-patching this now. We can always
back-patch later if we get field reports of leaks.
Reported-by: Andres Freund <andres@anarazel.de>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/jngsjonyfscoont4tnwi2qoikatpd5hifsg373vmmjvugwiu6g@m6opxh7uisgd
While pgoutput caches relation synchronization information in
RelationSyncCache that resides in CacheMemoryContext, each entry's
information (such as row filter expressions and column lists) is
stored in the entry's private memory context (entry_cxt in
RelationSyncEntry), which is a descendant memory context of the
decoding context. If a logical decoding invoked via SQL functions like
pg_logical_slot_get_binary_changes fails with an error, subsequent
logical decoding executions could access already-freed memory of the
entry's cache, resulting in a crash.
With this change, it's ensured that RelationSyncCache is cleaned up
even in error cases by using a memory context reset callback function.
Backpatch to 15, where entry_cxt was introduced for column filtering
and row filtering.
While the backbranches v13 and v14 have a similar issue where
RelationSyncCache persists even after an error when pgoutput is used
via SQL API, we decided not to backport this fix. This decision was
made because v13 is approaching its final minor release, and we won't
have an chance to fix any new issues that might arise. Additionally,
since using pgoutput via SQL API is not a common use case, the risk
outwights the benefit. If we receive bug reports, we can consider
backporting the fixes then.
Author: vignesh C <vignesh21@gmail.com>
Co-authored-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/CALDaNm0x-aCehgt8Bevs2cm=uhmwS28MvbYq1=s2Ekf0aDPkOA@mail.gmail.com
Backpatch-through: 15
Some of the buildfarm is still unhappy with WinGetFuncArgInPartition
even after 2273fa32b. While it seems to be just very old compilers,
we can suppress the warnings and arguably make the code more readable
by not initializing these variables till closer to where they are
used. While at it, make a couple of cosmetic comment improvements.
The creation of a replication slot done in a specific database on a
publisher was logged twice, with the second log not mentioning the
database where the slot creation happened. This commit removes the
information logged after a slot has been successfully created, moving
the information about the publisher from the second to the first log.
Note that failing a slot creation is also logged, so there is no loss of
information.
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHut+Pv7qDvLbDgc9PQGhULT3rPXTxdu_=w+iW-kMs+zPADR+w@mail.gmail.com
This patch adds support for the ALL SEQUENCES clause in publications,
enabling synchronization/replication of all sequences that is useful for
upgrades.
Publications can now include all sequences via FOR ALL SEQUENCES.
psql enhancements:
\d shows publications for a given sequence.
\dRp indicates if a publication includes all sequences.
ALL SEQUENCES can be combined with ALL TABLES, but not with other options
like TABLE or TABLES IN SCHEMA. We can extend support for more granular
clauses in future.
The view pg_publication_sequences provides information about the mapping
between publications and sequences.
This patch enables publishing of sequences; subscriber-side support will
be added in upcoming patches.
Author: vignesh C <vignesh21@gmail.com>
Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
SQL/JSON functions such as JSON_VALUE could fail with "unrecognized
node type" errors when a DEFAULT clause contained an explicit COLLATE
expression. That happened because assign_collations_walker() could
invoke exprSetCollation() on a JsonBehavior expression whose DEFAULT
still contained a CollateExpr, which exprSetCollation() does not
handle.
For example:
SELECT JSON_VALUE('{"a":1}', '$.c' RETURNING text
DEFAULT 'A' COLLATE "C" ON EMPTY);
Fix by validating in transformJsonBehavior() that the DEFAULT
expression's collation matches the enclosing JSON expression’s
collation. In exprSetCollation(), replace the recursive call on the
JsonBehavior expression with an assertion that its collation already
matches the target, since the parser now enforces that condition.
Reported-by: Jian He <jian.universality@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CACJufxHVwYYSyiVQ6o+PsRX6zQ7rAFinh_fv1kCfTsT1xG4Zeg@mail.gmail.com
Backpatch-through: 17
truncate_useless_pathkeys() seems to have neglected to account for
PathKeys that might be useful for WindowClause evaluation. Modify it so
that it properly accounts for that.
Making this work required adjusting two things:
1. Change from checking query_pathkeys to check sort_pathkeys instead.
2. Add explicit check for window_pathkeys
For #1, query_pathkeys gets set in standard_qp_callback() according to the
sort order requirements for the first operation to be applied after the
join planner is finished, so this changes depending on which upper
planner operations a particular query needs. If the query has window
functions and no GROUP BY, then query_pathkeys gets set to
window_pathkeys. Before this change, this meant PathKeys useful for the
ORDER BY were not accounted for in queries with window functions.
Because of #1, #2 is now required so that we explicitly check to ensure
we don't truncate away PathKeys useful for window functions.
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvrj3HTKmXoLMbUjTO=_MNMxM=cnuCSyBKidAVibmYPnrg@mail.gmail.com
Previously StrategyGetBuffer() acquired the buffer header spinlock for every
buffer, whether it was reusable or not. If reusable, it'd be returned, with
the lock held, to GetVictimBuffer(), which then would pin the buffer with
PinBuffer_Locked(). That's somewhat violating the spirit of the guidelines for
holding spinlocks (i.e. that they are only held for a few lines of consecutive
code) and necessitates using PinBuffer_Locked(), which scales worse than
PinBuffer() due to holding the spinlock. This alone makes it worth changing
the code.
However, the main reason to change this is that a future commit will make
PinBuffer_Locked() slower (due to making UnlockBufHdr() slower), to gain
scalability for the much more common case of pinning a pre-existing buffer. By
pinning the buffer with a single atomic operation, iff the buffer is reusable,
we avoid any potential regression for miss-heavy workloads. There strictly are
fewer atomic operations for each potential buffer after this change.
The price for this improvement is that freelist.c needs two CAS loops and
needs to be able to set up the resource accounting for pinned buffers. The
latter is achieved by exposing a new function for that purpose from bufmgr.c,
that seems better than exposing the entire private refcount infrastructure.
The improvement seems worth the complexity.
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
We're planning to merge buffer content locks into BufferDesc.state. To reduce
the size of that patch, centralize calls to BufferDescriptorGetContentLock().
The biggest part of the change is in assertions, by introducing
BufferIsLockedByMe[InMode]() (and removing BufferIsExclusiveLocked()). This
seems like an improvement even without aforementioned plans.
Additionally replace some direct calls to LWLockAcquire() with calls to
LockBuffer().
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
BM_PERMANENT is defined as 1U<<31, which is a negative number when interpreted
as a signed integer. Unfortunately the mask variable in BufferSync() was
signed. This has been wrong for a long time, but failed to fail, due to
integer conversion rules.
However, in an upcoming patch the width of the state variable will be
increased, with the wrong signedness leading to never flushing permanent
buffers - luckily caught in a test.
It seems better to fix this separately, instead of doing so as part of a
large, otherwise mechanical, patch.
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
While testing a new potential use for ReadRecentBuffer(), Andres
reported that it scales badly when called concurrently for the same
buffer by many backends. Instead of a naive (but wrong) coding with
PinBuffer(), it used the spinlock, so that it could be careful to pin
only if the buffer was valid and holding the expected block, to avoid
breaking invariants in eg GetVictimBuffer(). Unfortunately that made it
less scalable than PinBuffer(), which uses compare-exchange instead.
We can fix that by giving PinBuffer() a new skip_if_not_valid mode that
doesn't pin invalid buffers. It might occasionally skip when it
shouldn't due to the unlocked read of the header flags, but that's
unlikely and perfectly acceptable for an opportunistic optimisation
routine, and it can only succeed when it really should due to the
compare-exchange loop.
Note that this fixes ReadRecentBuffer()'s failure to bump the usage
count. While this could be seen as a bug, there currently aren't cases
affected by this in core, so it doesn't seem worth backpatching that portion.
Author: Thomas Munro <thomas.munro@gmail.com>
Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/20230627020546.t6z4tntmj7wmjrfh%40awork3.anarazel.de
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
This commit introduces a new column mem_exceeded_count to the
pg_stat_replication_slots view. This counter tracks how often the
memory used by logical decoding exceeds the logical_decoding_work_mem
limit. The new statistic helps users determine whether exceeding the
logical_decoding_work_mem limit is a rare occurrences or a frequent
issue, information that wasn't available through existing statistics.
Bumps catversion.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/978D21E8-9D3B-40EA-A4B1-F87BABE7868C@yesql.se
In the same spirit as 3bf905692, assume that all compilers we still
support provide the NAN macro, and get rid of workarounds for that.
The C standard allows implementations to omit NAN if the underlying
float arithmetic lacks quiet (non-signaling) NaNs. However, we've
required that feature for years: the workarounds only supported
lack of the macro, not lack of the functionality. I put in a
compile-time #error if there's no macro, just for clarity.
Also fix up the copies of these functions in ecpglib, and leave
a breadcrumb for the next hacker who touches them.
History of the hacks being removed here can be found in commits
1bc2d544b, 4d17a2146, cec8394b5.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1952095.1759764279@sss.pgh.pa.us
Extensions can stash data computed at plan time into this list using
planner_shutdown_hook (or perhaps other mechanisms) and then access
it from any code that has access to the PlannedStmt (such as explain
hooks), allowing for extensible debugging and instrumentation of
plans.
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
These hooks allow plugins to get control at the earliest point at
which the PlannerGlobal object is fully initialized, and then just
before it gets destroyed. This is useful in combination with the
extendable plan state facilities (see extendplan.h) and perhaps for
other purposes as well.
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
This allows extensions to have access to any data they've stored
in the ExplainState during planning. Unfortunately, it won't help
with EXPLAIN EXECUTE is used, but since that case is less common,
this still seems like an improvement.
Since planner() has quite a few arguments now, also add some
documentation of those arguments and the return value.
Author: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
In the current planner architecture, the separation between the
scan/join planning phase and the post-scan/join phase means that
aggregation steps are not visible when constructing the join tree,
limiting the planner's ability to exploit aggregation-aware
optimizations. To implement eager aggregation, we collect information
about aggregate functions in the targetlist and HAVING clause, along
with grouping expressions from the GROUP BY clause, and store it in
the PlannerInfo node. During the scan/join planning phase, this
information is used to evaluate each base or join relation to
determine whether eager aggregation can be applied. If applicable, we
create a separate RelOptInfo, referred to as a grouped relation, to
represent the partially-aggregated version of the relation and
generate grouped paths for it.
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths in this step.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
is currently not supported.
To further limit planning time, we currently adopt a strategy where
partial aggregation is pushed only to the lowest feasible level in the
join tree where it provides a significant reduction in row count.
This strategy also helps ensure that all grouped paths for the same
grouped relation produce the same set of rows, which is important to
support a fundamental assumption of the planner.
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators. This is essential to ensure that an
aggregated row from the partial aggregation matches the other side of
the join if and only if each row in the partial group does. This
ensures that all rows within the same partial group share the same
"destiny", which is crucial for maintaining correctness.
One restriction is that we cannot push partial aggregation down to a
relation that is in the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final paths will compete in the
usual way with paths built from regular planning.
The patch was originally proposed by Antonin Houska in 2017. This
commit reworks various important aspects and rewrites most of the
current code. However, the original patch and reviews were very
useful.
Author: Richard Guo <guofenglinux@gmail.com>
Author: Antonin Houska <ah@cybertec.at> (in an older version)
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me> (in an older version)
Reviewed-by: Andy Fan <zhihuifan1213@163.com> (in an older version)
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> (in an older version)
Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
This patch reuses the existing aggtransspace in pg_aggregate to
signal that an aggregate's transition state can grow unboundedly. If
aggtransspace is set to a negative value, it now indicates that the
transition state may consume unpredictable or large amounts of memory,
such as in aggregates like array_agg or string_agg that accumulate
input rows.
This information can be used by the planner to avoid applying
memory-sensitive optimizations (e.g., eager aggregation) when there is
a risk of excessive memory usage during partial aggregation.
Bump catalog version.
Per idea from Robert Haas, though applied differently than originally
suggested.
Discussion: https://postgr.es/m/CA+TgmoYbkvYwLa+1vOP7RDY7kO2=A7rppoPusoRXe44VDOGBPg@mail.gmail.com
The following information is added in the description of some GIN
records:
- In INSERT_LISTPAGE, the number of tuples and the right link block.
- In UPDATE_META_PAGE, the number of tuples, the previous tail block,
and the right link block.
- In SPLIT, the left and right children blocks.
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CALdSSPgnAt5L=D_xGXRXLYO5FK1H31_eYEESxdU1n-r4g+6GqA@mail.gmail.com
It is possible to call pg_stat_reset_single_function_counters() for a
single function, but the reset time was missing the system view showing
its statistics. Like all the fields of pg_stat_user_functions, the GUC
track_functions needs to be enabled to show the statistics about
function executions.
Bump catalog version.
Bump PGSTAT_FILE_FORMAT_ID, as a result of the new field added to
PgStat_StatFuncEntry.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aONjnsaJSx-nEdfU@paquier.xyz
Fix several issues pointed out by Coverity (reported by Tome Lane).
- In row_is_in_frame(), return value of window_gettupleslot() was not
checked.
- WinGetFuncArgInPartition() tried to derefference "isout" pointer
even if it could be NULL in some places.
Besides the issues, I also fixed a compiler warning reported by Álvaro
Herrera.
Moreover, in WinGetFuncArgInPartition refactor the do...while loop so
that the codes inside the loop simpler. Also simplify the case when
abs_pos < 0.
Author: Tatsuo Ishii <ishii@postgresql.org>
Reviewed-by: Paul Ramsey <pramsey@cleverelephant.ca>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/1686755.1759679957%40sss.pgh.pa.us
Discussion: https://postgr.es/m/202510051612.gw67jlc2iqpw%40alvherre.pgsql
The INFINITY macro is always defined per C99 standard, so this should
mean we can now get rid of the workaround code for when that macro isn't
defined.
Also, delete the (now unneeded) #pragma code which was disabling a
compiler warning in MSVC. There was a comment explaining why the #pragma
was placed outside the function body to work around a MSVC compiler bug,
but the link explaining that was dead, as reported by jian he.
Author: David Rowley <dgrowleyml@gmail.com>
Reported-by: jian he <jian.universality@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxGARYETnNwtCK7QC0zE_7gq-tfN0mME=gT5rTNtC=VSHQ@mail.gmail.com
Instead, use the new mechanism that allows planner extensions to store
private state inside a PlannerInfo, treating GEQO as an in-core planner
extension. This is a useful test of the new facility, and also buys
back a few bytes of storage.
To make this work, we must remove innerrel_is_unique_ext's hack of
testing whether join_search_private is set as a proxy for whether
the join search might be retried. Add a flag that extensions can
use to explicitly signal their intentions instead.
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
Extension that make extensive use of planner hooks may want to
coordinate their efforts, for example to avoid duplicate computation,
but that's currently difficult because there's no really good way to
pass data between different hooks.
To make that easier, allow for storage of extension-managed private
state in PlannerGlobal, PlannerInfo, and RelOptInfo, along very
similar lines to what we have permitted for ExplainState since commit
c65bc2e1d1.
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
Seems Apple's version of "wc -l" puts spaces before the number.
(I wonder why the cfbot didn't find this.) While here, make
the failure case log what it got, to aid debugging future issues.
Per buildfarm.
We try to use the pager only when more than a screenful's worth of
data is to be printed. However, the code in print.c that's concerned
with counting the number of lines that will be needed missed a lot of
edge cases:
* While plain aligned mode accounted for embedded newlines in column
headers and table cells, unaligned and vertical output modes did not.
* In particular, since vertical mode repeats the headers for each
record, we need to account for embedded newlines in the headers for
each record.
* Multi-line table titles were not accounted for.
* tuples_only mode (where headers aren't printed) wasn't accounted
for.
* Footers were accounted for as one line per footer, again missing
the possibility of multi-line footers. (In some cases such as
"\d+" on a view, there can be many lines in a footer.) Also,
we failed to account for the default footer.
To fix, move the entire responsibility for counting lines into
IsPagerNeeded (or actually, into a new subroutine count_table_lines),
and then expand the logic as appropriate. Also restructure to make it
perhaps a bit easier to follow. It's still only completely accurate
for ALIGNED/WRAPPED/UNALIGNED formats, but the other formats are not
typically used with interactive output.
Arrange to not run count_table_lines at all unless we will use
its result, and teach it to quit early as soon as it's proven
that the output is long enough to require use of the pager.
When dealing with large tables this should save a noticeable
amount of time, since pg_wcssize() isn't exactly cheap.
In passing, move the "flog" output step to the bottom of printTable(),
rather than running it when we've already opened the pager in some
modes. In principle it shouldn't interfere with the pager because
flog should always point to a non-interactive file; but it seems silly
to risk any interference, especially when the existing positioning
seems to have been chosen with the aid of a dartboard.
Also add a TAP test to exercise pager mode. Up to now, we have had
zero test coverage of these code paths, because they aren't reached
unless isatty(stdout). We do have the test infrastructure to improve
that situation, though. Following the lead of 010_tab_completion.pl,
set up an interactive psql and feed it some test cases. To detect
whether it really did invoke the pager, point PSQL_PAGER to "wc -l".
The test is skipped if that utility isn't available.
Author: Erik Wienhold <ewie@ewie.name>
Test-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2dd2430f-dd20-4c89-97fd-242616a3d768@ewie.name
Previously, subqueries were given names only after they were planned,
which makes it difficult to use information from a previous execution of
the query to guide future planning. If, for example, you knew something
about how you want "InitPlan 2" to be planned, you won't know whether
the subquery you're currently planning will end up being "InitPlan 2"
until after you've finished planning it, by which point it's too late to
use the information that you had.
To fix this, assign each subplan a unique name before we begin planning
it. To improve consistency, use textual names for all subplans, rather
than, as we did previously, a mix of numbers (such as "InitPlan 1") and
names (such as "CTE foo"), and make sure that the same name is never
assigned more than once.
We adopt the somewhat arbitrary convention of using the type of sublink
to set the plan name; for example, a query that previously had two
expression sublinks shown as InitPlan 2 and InitPlan 1 will now end up
named expr_1 and expr_2. Because names are assigned before rather than
after planning, some of the regression test outputs show the numerical
part of the name switching positions: what was previously SubPlan 2 was
actually the first one encountered, but we finished planning it later.
We assign names even to subqueries that aren't shown as such within the
EXPLAIN output. These include subqueries that are a FROM clause item or
a branch of a set operation, rather than something that will be turned
into an InitPlan or SubPlan. The purpose of this is to make sure that,
below the topmost query level, there's always a name for each subquery
that is stable from one planning cycle to the next (assuming no changes
to the query or the database schema).
Author: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: http://postgr.es/m/3641043.1758751399@sss.pgh.pa.us
When either inputs of an INTERSECT [ALL] operator are proven not to return
any results (a dummy rel), then mark the entire INTERSECT operation as
dummy.
Likewise, if an EXCEPT [ALL] operation's left input is proven empty, then
mark the entire operation as dummy.
With EXCEPT ALL, we can easily handle the right input being dummy as
we can return the left input without any processing. That can lead to
significant performance gains during query execution. We can't easily
handle dummy right inputs for EXCEPT (without ALL), as that would require
deduplication of the left input. Wiring up those Paths is likely more
complex than it's worth as the gains during execution aren't that great,
so let's leave that one to be handled by the normal Path generation code.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAApHDvri53PPF76c3M94_QNWbJfXjyCnjXuj_2=LYM-0m8WZtw@mail.gmail.com
The prior code, added in 03d40e4b5 attempted to use the targetlist of the
first UNION child when all UNION children were proven as dummy rels.
That's not going to work when some operation atop of the Result node must
find target entries within the Result's targetlist. This could have been
something as simple as trying to sort the results of the UNION operation,
which would lead to:
ERROR: could not find pathkey item to sort
Instead, use the top-level UNION's targetlist and fix the varnos in
setrefs.c. Because set operation targetlists always use varno==0, we
can rewrite those to become varno==1, i.e. use the Vars from the first
UNION child. This does result in showing Vars from relations that are
not present in the final plan, but that's no different to what we see
when normal base relations are proven dummy.
Without this fix it would be possible to see the following error in
EXPLAIN VERBOSE when all UNION inputs were proven empty.
ERROR: bogus varno: 0
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvrUASy9sfULMEsM2udvZJP6AoBRCZvHYXYxZTy2tX9FYw@mail.gmail.com
Previously, we attempted to form a posting list tuple even when
ginCompressPostingList() failed to compress the posting list due to
its size. While there was no functional failure, it always wasted one
GinFormTuple() call when item pointers didn't fit in a posting list
tuple.
This commit ensures that a GIN index tuple is formed only when all
item pointers in the posting list are successfully compressed.
Author: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAE7r3M+C=jcpTD93f_RBHrQp3C+=TAXFs+k4tTuZuuxboK8AvA@mail.gmail.com
The hex_encode() and hex_decode() functions serve as the workhorses
for hexadecimal data for bytea's text format conversion functions,
and some workloads are sensitive to their performance. This commit
adds new implementations that use routines from port/simd.h, which
testing indicates are much faster for larger inputs. For small or
invalid inputs, we fall back on the existing scalar versions.
Since we are using port/simd.h, these optimizations apply to both
x86-64 and AArch64.
Author: Nathan Bossart <nathandbossart@gmail.com>
Co-authored-by: Chiranmoy Bhattacharya <chiranmoy.bhattacharya@fujitsu.com>
Co-authored-by: Susmitha Devanga <devanga.susmitha@fujitsu.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/aLhVWTRy0QPbW2tl%40nathan
This patch enhances the pg_get_sequence_data function to include the
page-level LSN (Log Sequence Number) of the sequence. This additional
metadata will be used by upcoming patches to support synchronization
of sequences during logical replication.
By exposing the LSN, we enable more accurate tracking of sequence
changes, which is essential for maintaining consistency across
replicated nodes.
Author: vignesh C <vignesh21@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
All the other structures describe the list of blocks used, and in the
case of a GIN_INSERT_LISTPAGE record block 0 refers to a list page with
the items added to it.
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CALdSSPgk=9WRoXhZy5fdk+T1hiau7qbL_vn94w_L1N=gtEdbsg@mail.gmail.com
The WAL records XLOG_GIN_INSERT and XLOG_GIN_VACUUM_DATA_LEAF_PAGE
included some information about the blocks added to the record.
This information is already provided by XLogRecGetBlockRefInfo() with
much more details about the blocks included in each record, like the
compression information, for example. This commit removes the block
information that existed in the record descriptions specific to GIN.
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CALdSSPgk=9WRoXhZy5fdk+T1hiau7qbL_vn94w_L1N=gtEdbsg@mail.gmail.com
It is possible to call pg_stat_reset_single_table_counters() on a
relation (index or table) but the reset time was missing from the system
views showing their statistics.
This commit adds the reset time as an attribute of pg_stat_all_tables,
pg_stat_all_indexes, and other relations related to them.
Bump catalog version.
Bump PGSTAT_FILE_FORMAT_ID, as a result of the new field added to
PgStat_StatTabEntry.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aN8l182jKxEq1h9f@paquier.xyz
stats.sql is already doing some tests coverage on index statistics, by
retrieving for example idx_scan and friends in pg_stat_all_tables.
pg_stat_reset_single_table_counters() is supported for an index for a
long time, but the case was never covered.
This commit closes the gap, by using this reset function on an index,
cross-checking the contents of pg_stat_all_indexes.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aN8l182jKxEq1h9f@paquier.xyz
On Windows, this code did not handle error conditions correctly at
all, since it looked at "errno" which is not used for socket-related
errors on that platform. This resulted, for example, in failure
to connect to a PostgreSQL server with GSSAPI enabled.
We have a convention for dealing with this within libpq, which is to
use SOCK_ERRNO and SOCK_ERRNO_SET rather than touching errno directly;
but the GSSAPI code is a relative latecomer and did not get that memo.
(The equivalent backend code continues to use errno, because the
backend does this differently. Maybe libpq's approach should be
rethought someday.)
Apparently nobody tries to build libpq with GSSAPI support on Windows,
or we'd have heard about this before, because it's been broken all
along. Back-patch to all supported branches.
Author: Ning Wu <ning94803@gmail.com>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAFGqpvg-pRw=cdsUpKYfwY6D3d-m9tw8WMcAEE7HHWfm-oYWvw@mail.gmail.com
Backpatch-through: 13
This adjusts UNION planning so that the planner produces more optimal
plans when one or more of the UNION's subqueries have been proven to be
empty (a dummy rel).
If any of the inputs are empty, then that input can be removed from the
Append / MergeAppend. Previously, a const-false "Result" node would
appear to represent this. Removing empty inputs has a few extra
benefits when only 1 union child remains as it means the Append or
MergeAppend can be removed in setrefs.c, making the plan slightly faster
to execute. Also, we can provide better n_distinct estimates by looking
at the sole remaining input rel's statistics.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAApHDvri53PPF76c3M94_QNWbJfXjyCnjXuj_2=LYM-0m8WZtw@mail.gmail.com
bms_union() causes a new set to be allocated. What this caller needs is
members added to an existing set. bms_add_members() is the tool for
that job.
This is just a matter of fixing an inefficiency due to surplus memory
allocations. No bugs being fixed.
The only other place I found that might be valid to apply this change is
in markNullableIfNeeded(), but I opted not to do that due to the risk to
reward ratio not looking favorable. The risk being that there *could* be
another pointer pointing to the Bitmapset.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAApHDvoCcoS-p5tZNJLTxFOKTYNjqVh7Dwf+5ikDUBwnvWftRw@mail.gmail.com
Presently, the SIMD implementation of this function uses unsigned
saturating subtraction to find bytes less than or equal to the
given value, which is a workaround for the lack of unsigned
comparison instructions on some architectures. However, Neon
offers vminvq_u8(), which returns the minimum (unsigned) value in
the vector. This commit adds a Neon-specific implementation that
uses vminvq_u8() to optimize vector8_has_le() on AArch64.
In passing, adjust the SSE2 implementation to use vector8_min() and
vector8_eq() to find values less than or equal to the given value.
This was the only use of vector8_ssub(), so it has been removed.
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/aNHDNDSHleq0ogC_%40nathan
Make some use of anonymous unions, which are allowed as of C11, as
examples and encouragement for future code, and to test compilers.
This commit changes the DSMRegistryEntry struct.
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/aNKsDg0fJwqhZdXX%40nathan
This function is marked as strict, so we can safely remove the checks
checking for NULL input parameters.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAApHDvqiN0+mbooUOSCDALc=GoM8DmTbCdvwnCwak6Wb2O1ZJA@mail.gmail.com
In similar vein to commit ccc8194e42, a reset instance of a shared
memory TID store happened to occupy the same private memory as the old
one for the entry point, since the chunk freed after the last round
of index vacuuming was put on the context's freelist. The failure
to update the vacrel->dead_items pointer was evident by nudging the
system to allocate memory in a different area. This was not discovered
at the time of the earlier commit since our regression tests didn't
cover multiple index passes with parallel vacuum.
Backpatch to v17, when TidStore came in.
Author: Kevin Oommen Anish <kevin.o@zohocorp.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Tested-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/199a07cbdfc.7a1c4aac25838.1675074408277594551%40zohocorp.com
Backpatch-through: 17
The comment incorrectly references the defunct function
BufFileOpenShared(), which was replaced in commit dcac5e7ac.
This patch updates the comment to refer to the current function
BufFileOpenFileSet().
Author: Zhang Mingli <zmlpostgres@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/1cb48b4c-54ab-40cc-b355-0b3c2af6d3f7@Spark
Currently, pgbench aborts when a COPY response is received in
readCommandResponse(). However, as PQgetResult() returns an empty
result when there is no asynchronous result, through getCopyResult(),
the logic done at the end of readCommandResponse() for the error path
leads to an infinite loop.
This commit forcefully exits the COPY state with PQendcopy() before
moving to the error handler when fiding a COPY state, avoiding the
infinite loop. The COPY protocol is not supported by pgbench anyway, as
an error is assumed in this case, so giving up is better than having the
tool be stuck forever. pgbench was interruptible in this state.
A TAP test is added to check that an error happens if trying to use
COPY.
Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Discussion: https://postgr.es/m/CAO6_XqpHyF2m73ifV5a=5jhXxH2chk=XrgefY+eWWPe2Eft3=A@mail.gmail.com
Backpatch-through: 13
Add IGNORE NULLS/RESPECT NULLS option (null treatment clause) to lead,
lag, first_value, last_value and nth_value window functions. If
unspecified, the default is RESPECT NULLS which includes NULL values
in any result calculation. IGNORE NULLS ignores NULL values.
Built-in window functions are modified to call new API
WinCheckAndInitializeNullTreatment() to indicate whether they accept
IGNORE NULLS/RESPECT NULLS option or not (the API can be called by
user defined window functions as well). If WinGetFuncArgInPartition's
allowNullTreatment argument is true and IGNORE NULLS option is given,
WinGetFuncArgInPartition() or WinGetFuncArgInFrame() will return
evaluated function's argument expression on specified non NULL row (if
it exists) in the partition or the frame.
When IGNORE NULLS option is given, window functions need to visit and
evaluate same rows over and over again to look for non null rows. To
mitigate the issue, 2-bit not null information array is created while
executing window functions to remember whether the row has been
already evaluated to NULL or NOT NULL. If already evaluated, we could
skip the evaluation work, thus we could get better performance.
Author: Oliver Ford <ojford@gmail.com>
Co-authored-by: Tatsuo Ishii <ishii@postgresql.org>
Reviewed-by: Krasiyan Andreev <krasiyan@gmail.com>
Reviewed-by: Andrew Gierth <andrew@tao11.riddles.org.uk>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Fetter <david@fetter.org>
Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Reviewed-by: "David G. Johnston" <david.g.johnston@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/flat/CAGMVOdsbtRwE_4+v8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A@mail.gmail.com
test_bms_make_singleton is defined as STRICT and only takes a single
parameter, so there is no need to check that parameter for NULL as a
NULL input won't ever reach there.
Author: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/BC483901-9587-4076-B20F-9A414C66AB78@yesql.se
This fixes a typo in the sql/expected test files and removes a
leftover comment from test_bitmapset.c from when the functions
invoked bms_free.
Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/978D21E8-9D3B-40EA-A4B1-F87BABE7868C@yesql.se
Move the checks out of the Makefile into a perl script that can be
called from both the Makefile and meson.build. The set of files checked
is simplified, so it is just all the sgml and xsl files found in
docs/src/sgml directory tree.
Along the way make some adjustments to .cirrus.tasks.yml to support this
better in CI.
Also ensure that the checks are part of the Makefile's html target.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Co-Author: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CAN55FZ3BnM+0twT-ZWL8As9oBEte_b+SBU==cz6Hk8JUCM_5Wg@mail.gmail.com
Oversight in 2c03216d83, when the redo code of GIN got refactored for
the new WAL format where block information has been standardized, as the
payload data got tracked for each block after the change, and not in the
whole record. This is just a cleanup.
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CALdSSPgnAt5L=D_xGXRXLYO5FK1H31_eYEESxdU1n-r4g+6GqA@mail.gmail.com
pgstattuple checks the state of the pages retrieved for gist and hash
using some check functions from each index AM, respectively
gistcheckpage() and _hash_checkpage(). When these are called, they
would fail when bumping on data that is found as incorrect (like opaque
area size not matching, or empty pages), contrary to btree that simply
discards these cases and continues to aggregate data.
Zero pages can happen after a crash, with these AMs being able to do an
internal cleanup when these are seen. Also, sporadic failures are
annoying when doing for example a large-scale diagnostic query based on
pgstattuple with a join of pg_class, as it forces one to use tricks like
quals to discard hash or gist indexes, or use a PL wrapper able to catch
errors.
This commit changes the reports generated for btree, gist and hash to
be more user-friendly;
- When seeing an empty page, report it as free space. This new rule
applies to gist and hash, and already applied to btree.
- For btree, a check based on the size of BTPageOpaqueData is added.
- For gist indexes, gistcheckpage() is not called anymore, replaced by a
check based on the size of GISTPageOpaqueData.
- For hash indexes, instead of _hash_getbuf_with_strategy(), use a
direct call to ReadBufferExtended(), coupled with a check based on
HashPageOpaqueData. The opaque area size check was already used.
- Pages that do not match these criterias are discarded from the stats
reports generated.
There have been a couple of bug reports over the years that complained
about the current behavior for hash and gist, as being not that useful,
with nothing being done about it. Hence this change is backpatched down
to v13.
Reported-by: Noah Misch <noah@leadboat.com>
Author: Nitin Motiani <nitinmotiani@google.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Discussion: https://postgr.es/m/CAH5HC95gT1J3dRYK4qEnaywG8RqjbwDdt04wuj8p39R=HukayA@mail.gmail.com
Backpatch-through: 13
Some macOS machines are having trouble with 002_inline, which executes
the JSON parser test executables hundreds of times in a nested loop.
Both developer machines and buildfarm critters have shown excessive test
durations, upwards of 20 seconds.
Push the innermost loop of 002_inline, which iterates through differing
chunk sizes, down into the test executable. (I'd eventually like to push
all of the JSON unit tests down into C, but this is an easy win in the
short term.) Testers have reported a speedup between 4-9x.
Reported-by: Robert Haas <robertmhaas@gmail.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Tested-by: Andrew Dunstan <andrew@dunslane.net>
Tested-by: Tom Lane <tgl@sss.pgh.pa.us>
Tested-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CA%2BTgmobKoG%2BgKzH9qB7uE4MFo-z1hn7UngqAe9b0UqNbn3_XGQ%40mail.gmail.com
Backpatch-through: 17
Newer compilers warned about
extern int _CRT_glob = 0;
which is indeed a mysterious C construction, as it combines "extern"
and an initialization. It turns out that according to the C standard,
the "extern" is ignored here, so we can remove it to resolve the
warnings. But then we also need to add a real extern
declaration (without initializer) to satisfy
-Wmissing-variable-declarations.
(Note that this code is only active on MinGW.)
Discussion: https://www.postgresql.org/message-id/1053279b-da01-4eb4-b7a3-da6b5d8f73d1%40eisentraut.org
Two macros are added in this module, to cut duplicated patterns:
- PG_ARG_GETBITMAPSET(), for input argument handling, with knowledge
about NULL.
- PG_RETURN_BITMAPSET_AS_TEXT(), that generates a text result from a
Bitmapset.
These changes limit the code so as the SQL functions are now mostly
wrappers of the equivalent C function. Functions that use integer input
arguments still need some NULL handling, like bms_make_singleton().
A NULL input is translated to "<>", which is what nodeToString()
generates. Some of the tests are able to generate this result.
Per discussion, the calls of bms_free() are removed. These may be
justified if the functions are used in a rather long-lived memory
context, but let's keep the code minimal for now. These calls used NULL
checks, which were also not necessary as NULL is an input authorized by
bms_free().
Some of the tests existed to cover behaviors related to the SQL
functions for NULL inputs. Most of them are still relevant, as the
routines of bitmapset.c are able to handle such cases.
The coverage reports of bitmapset.c and test_bitmapset.c remain the
same after these changes, with 300 lines of C code removed.
Author: David Rowley <dgrowleyml@gmail.com>
Co-authored-by: Greg Burd <greg@burd.me>
Discussion: https://postgr.es/m/CAApHDvqghMnm_zgSNefto9oaEJ0S-3Cgb3gdsV7XvLC-hMS02Q@mail.gmail.com
pgbench uses readCommandResponse() to process server responses.
When readCommandResponse() encounters an error during a call to
PQgetResult() to fetch the current result, it attempts to report it
with an additional error message from PQerrorMessage(). However,
previously, this extra error message could be lost or become incorrect.
The cause was that after fetching the current result (and detecting
an error), readCommandResponse() called PQgetResult() again to
peek at the next result. This second call could overwrite the libpq
connection's error message before the original error was reported,
causing the error message retrieved from PQerrorMessage() to be
lost or overwritten.
This commit fixes the issue by updating readCommandResponse()
to use PQresultErrorMessage() instead of PQerrorMessage()
to retrieve the error message generated when the PQgetResult()
for the current result causes an error, ensuring the correct message
is reported.
Backpatch to all supported versions.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/20250925110940.ebacc31725758ec47d5432c6@sraoss.co.jp
Backpatch-through: 13
This reverts commit efcd5199d8.
I rebased my patch series incorrectly. This patch contained unrelated
parts from another patch, which made the overall build fail. Revert
for now and reconsider.
Stop including utils/relcache.h in access/genam.h, and stop including
htup_details.h in nodes/tidbitmap.h. Both these files (genam.h and
tidbitmap.h) are widely used in other header files, so it's in our best
interest that they remain as lean as reasonable.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/202509291356.o5t6ny2hoa3q@alvherre.pgsql
During recovery, XLogNeedsFlush() checks the minimum recovery LSN point
instead of the flush LSN point. The same condition checks are used when
updating the minimum recovery point in UpdateMinRecoveryPoint(), but are
written in reverse order.
This commit makes the order of the checks consistent between
XLogNeedsFlush() and UpdateMinRecoveryPoint(), improving the code
clarity. Note that the second check (as ordered by this commit) relies
on InRecovery, which is true only in the startup process. So this makes
XLogNeedsFlush() cheaper in the startup process with the first check
acting as a shortcut while doing crash recovery, where
LocalMinRecoveryPoint is an invalid LSN.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Discussion: https://postgr.es/m/aMIHNRTP6Wj6vw1s%40paquier.xyz
Contrary to its siblings for the archiver, the bgwriter and the
checkpointer stats, pgstat_report_inj_fixed() can be called
concurrently. This was causing an assertion failure, while messing up
with the stats.
This code is aimed at being a template for extension developers, so it
is not a critical issue, but let's be correct. This module has also
been useful for some benchmarking, at least for me, and that was how I
have discovered this issue.
Oversight in f68cd847fa.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Discussion: https://postgr.es/m/aNnXbAXHPFUWPIz2@paquier.xyz
Backpatch-through: 18
GROUP BY ALL is a form of GROUP BY that adds any TargetExpr that does
not contain an aggregate or window function into the groupClause of
the query, making it exactly equivalent to specifying those same
expressions in an explicit GROUP BY list.
This feature is useful for certain kinds of data exploration. It's
already present in some other DBMSes, and the SQL committee recently
accepted it into the standard, so we can be reasonably confident in
the syntax being stable. We do have to invent part of the semantics,
as the standard doesn't allow for expressions in GROUP BY, so they
haven't specified what to do with window functions. We assume that
those should be treated like aggregates, i.e., left out of the
constructed GROUP BY list.
In passing, wordsmith some existing documentation about GROUP BY,
and update some neglected synopsis entries in select_into.sgml.
Author: David Christensen <david@pgguru.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAHM0NXjz0kDwtzoe-fnHAqPB1qA8_VJN0XAmCgUZ+iPnvP5LbA@mail.gmail.com
Neighbor get_statistics_object_oid() ignores objects in pg_temp, as has
been the standard for non-relation, non-type namespace searches since
CVE-2007-2138. Hence, most operations that name a statistics object
correctly decline to map an unqualified name to a statistics object in
pg_temp. StatisticsObjIsVisibleExt() did not. Consequently,
pg_statistics_obj_is_visible() wrongly returned true for such objects,
psql \dX wrongly listed them, and getObjectDescription()-based ereport()
and pg_describe_object() wrongly omitted namespace qualification. Any
malfunction beyond that would depend on how a human or application acts
on those wrong indications. Commit
d99d58cdc8 introduced this. Back-patch to
v13 (all supported versions).
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/20250920162116.2e.nmisch@google.com
Backpatch-through: 13
This commit expands the set of tests added by 00c3d87a5c, to bring the
coverage of bitmapset.c close to 100% by addressing a lot of corner
cases (most of these relate to word counts and reallocations).
Some of the functions of this module also have their own idea of the
result to return depending on the input values given. These are
specific to the module, still let's add more coverage for all of them.
Some comments are made more consistent in the tests, while on it.
Author: Greg Burd <greg@burd.me>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aNR-gsGmLnMaNT5i@paquier.xyz
For UNION, EXCEPT and INTERSECT, we were not very good at estimating the
PathTarget.width for the set operation. Since the targetlist of the set
operation is made up of Vars with varno==0, this would result in
get_expr_width() applying a default estimate based on the Var's type
rather than taking width estimates from any relation's statistics.
Here we attempt to improve the situation by looking at the width estimates
for the set operation child paths and calculating the average width of the
relevant child paths weighted over the estimated number of rows. For
UNION and INTERSECT, the relevant paths to look at are *all* child paths.
For EXCEPT, since we don't return rows from the right-hand child (only
possibly remove left-hand rows matching those), we use only the left-hand
child for width estimates.
This also adjusts the hashed-UNION Path's PathTarget to use the same
PathTarget as its Append subpath. Both PathTargets will be the same and
are void of any resjunk columns, per generate_append_tlist(). Making
the AggPath use the same PathTarget saves having to adjust the "width"
of the AggPath's PathTarget too.
This was reported as a bug by sunw.fnst, but it's not something we ever
claimed to do properly. Plus, if we were to adjust this in back
branches, plans could change as the estimated input sizes to Sorts and
Hash Aggregates could go up or down. Plan choices aren't something we
want to destabilize in stable versions.
Reported-by: sunw.fnst <936739278@qq.com>
Author: David Rowley <drowleyml@gmail.com>
Discussion: https://postgr.es/m/tencent_34CF8017AB81944A4C08DD089D410AB6C306@qq.com
This serves as coverage for the tracking of entry count added by
7bd2975fa9 as built-in variable-sized stats kinds have no need for it,
at least not yet.
A new function, called injection_points_stats_count(), is added to the
module. It is able to return the number of entries. This has been
useful when doing some benchmarking to check the sanity of the counts.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aMPKWR81KT5UXvEr@paquier.xyz
Stats kinds can set a new option called "track_entry_count" (disabled by
default, available for variable-numbered stats) that will make pgstats
track the number of entries that exist in its shared hashtable.
As there is only one code path where a new entry is added, and one code
path where entries are freed, the count tracking is straight-forward in
its implementation. Reads of these counters are optimistic, and may
change across two calls. The counter is incremented when an entry is
created (not when reused), and is decremented when an entry is freed
from the hashtable (marked for drop with its refcount reaching 0), which
is something that pgstats decides internally.
A first use case of this facility would be pg_stat_statements, where we
need to be able to cap the number of entries that would be stored in the
shared hashtable, based on its "max" GUC. The module currently relies
on hash_get_num_entries(), which offers a cheap way to count how many
entries are in its hash table, but we cannot do that in pgstats for
variable-sized stats kinds as a single hashtable is used for all the
stats kinds. Independently of PGSS, this is useful for other custom
stats kinds that want to cap, control, or track the number of entries
they have, without depending on a potentially expensive sequential scan
to know the number of entries while holding an extra exclusive lock.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Keisuke Kuroda <keisuke.kuroda.3862@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aMPKWR81KT5UXvEr@paquier.xyz
transformPLAssignStmt contained many lines cribbed directly from
transformSelectStmt. I had supposed that we could manage to keep
the two copies in sync, but the bug just fixed in 7504d2be9 shows
that that hope was foolish. Let's refactor so there's just one copy.
The main stumbling block to doing this is that transformPLAssignStmt
has a chunk of custom code that has to run after transformTargetList
but before we potentially modify the tlist further during analysis
of ORDER BY and GROUP BY. Rather than make transformSelectStmt fully
aware of PLAssignStmt processing, I put that code into a callback
function. It still feels a little bit ugly, but it's not too awful,
and surely it's better than a hundred lines of duplicated code.
The steps involved in processing a PLAssignStmt remain exactly
the same as before, just in different places.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/31027.1758919078@sss.pgh.pa.us
Because we failed to do this, DISTINCT in GROUP BY DISTINCT would be
ignored in PL/pgSQL assignment statements. It's not surprising that
no one noticed, since such statements will throw an error if the query
produces more than one row. That eliminates most scenarios where
advanced forms of GROUP BY could be useful, and indeed makes it hard
even to find a simple test case. Nonetheless it's wrong.
This is directly the fault of be45be9c3 which added the groupDistinct
field, but I think much of the blame has to fall on c9d529848, in
which I incautiously supposed that we'd manage to keep two copies of
a big chunk of parse-analysis logic in sync. As a follow-up, I plan
to refactor so that there's only one copy. But that seems useful
only in master, so let's use this one-line fix for the back branches.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/31027.1758919078@sss.pgh.pa.us
Backpatch-through: 14
It had always been intended that this already works correctly as
pg_unreachable() uses __assume(0) on MSVC, and that directs the compiler
in a way so it knows that a given function won't return. However, with
ereport_domain(), it didn't work...
It's now understood that the failure to determine that elog(ERROR) does not
return comes from the inability of the MSVC compiler to detect the "const
int elevel_" is the same as the "elevel" macro parameter. MSVC seems to be
unable to make the "if (elevel_ >= ERROR) branch constantly-true when the
macro is used with any elevel >= ERROR, therefore the pg_unreachable() is
seen to only be present in a *conditional* branch rather than present
*unconditionally*.
While there seems to be no way to force the compiler into knowing that
elevel_ is equal to elevel within the ereport_domain() macro, there is a
way in C11 to determine if the elevel parameter is a compile-time
constant or not. This is done via some hackery using the _Generic()
intrinsic function, which gives us functionality similar to GCC's
__builtin_constant_p(), albeit only for integers.
Here we define pg_builtin_integer_constant_p() for this purpose.
Callers can check for availability via HAVE_PG_BUILTIN_INTEGER_CONSTANT_P.
ereport_domain() has been adjusted to use
pg_builtin_integer_constant_p() instead of __builtin_constant_p().
It's not quite clear at this stage if this now allows us to forego doing
the likes of "return NULL; /* keep compiler quiet */" as there may be other
compilers in use that have similar struggles. It's just a matter of time
before someone commits a function that does not "return" a value after
an elog(ERROR). Let's make time and lack of complaints about said commit
be the judge of if we need to continue the "/* keep compiler quiet */"
palaver.
Author: David Rowley <drowleyml@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CAApHDvom02B_XNVSkvxznVUyZbjGAR+5myA89ZcbEd3=PA9UcA@mail.gmail.com
This allows these routines to be reused by a future utility heavily
based on vacuumdb.
I made a few relatively minor changes from the original, most notably:
- objfilter was declared as an enum but the values are bit-or'ed, and
individual bits are tested throughout the code. We've discussed this
coding pattern in other contexts and stayed away from it, on the
grounds that the values so generated aren't really true values of the
enum. This commit changes it to be a bits32 with a few #defines for
the flag definitions, the way we do elsewhere. Also, instead of being
a global variable, it's now in the vacuumingOptions struct.
- Two booleans, analyze_only (in vacuumingOptions) and analyze_in_stages
(passed around as a separate boolean argument), are really determining
what "mode" the program runs in -- it's either vacuum, or one of those
two modes. I have three adjectives for them: inconsistent,
unergonomic, unorthodox. Removing these and replacing them with a
mode enum to be kept in vacuumingOptions makes the code structure easier
to understand in a couple of places, and it'll be useful for the new
mode we add next, so do that.
Reviewed-by: Antonin Houska <ah@cybertec.at>
Discussion: https://postgr.es/m/202508301750.cbohxyy2pcce@alvherre.pgsql
Use our established coding pattern to reduce maintenance pain when
adding other per-process-type characteristics.
Like PG_KEYWORD, PG_CMDTAG, PG_RMGR.
To keep the strings translatable, the relevant makefile now also scans
src/include for this specific file. I didn't want to have it scan all
.h files, as then gettext would have to scan all header files. I didn't
find any way to affect the meson behavior in this respect though.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Co-authored-by: Jonathan Gonzalez V. <jonathan.abdiel@gmail.com>
Discussion: https://postgr.es/m/202507151830.dwgz5nmmqtdy@alvherre.pgsql
When running pgbench with --verbose-errors option and a custom script that
triggered retriable errors (e.g., serialization errors) in pipeline mode,
an assertion failure could occur:
Assertion failed: (sql_script[st->use_file].commands[st->command]->type == 1), function commandError, file pgbench.c, line 3062.
The failure happened because pgbench assumed these errors would only occur
during SQL commands, but in pipeline mode they can also happen during
\endpipeline meta command.
This commit fixes the assertion failure by adjusting the assertion check to
allow such errors during either SQL commands or \endpipeline.
Backpatch to v15, where the assertion check was introduced.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwGWQMOzNkQs-LmpDHdNC0h8dmAuUMRvZrEntQi5a-b=Kg@mail.gmail.com
This improves the stability of VACUUM when processing btree indexes,
which was previously able to trigger an assertion failure in
_bt_lock_subtree_parent() when an error was previously thrown outside
the scope of _bt_split() when splitting a btree page. VACUUM would
consider the index as in a corrupted state as the right page would not
be zeroed for the error thrown (allocation failure is one pattern).
In a non-assert build, VACUUM is able to succeed, reporting what it sees
as a corruption while attempting to fix the index. This would manifest
as a LOG message, as of:
LOG: failed to re-find parent key in index "idx" for deletion target
page N
CONTEXT: while vacuuming index "idx" of relation "public.tab"
This commit improves the code to rely on two PGAlignedBlocks that are
used as a temporary space for the left and right pages. The main change
concerns the right page, whose contents are now copied into the
"temporary" PGAlignedBlock page while its original space is zeroed. Its
contents are moved from the PGAlignedBlock page back to the page once we
enter in the critical section used for the split. This simplifies the
split logic, as it is not necessary to zero the right page before
throwing an error anymore. Hence errors can now be thrown outside the
split code. For the left page, this shaves one allocation, with
PageGetTempPage() being previously used.
The previous logic originates from commit 8fa30f906b, at a point where
PGAlignedBlock did not exist yet. This could be argued as something
that should be backpatched, but the lack of complaints indicates that it
may not be necessary.
Author: Konstantin Knizhnik <knizhnik@garret.ru>
Discussion: https://postgr.es/m/566dacaf-5751-47e4-abc6-73de17a5d42a@garret.ru
The comment claimed that a TABLESPACE reference was added to the
resulting string, but that's not true. Looks like the comment was
copied from pg_get_indexdef_string() without being adjusted correctly.
Reported-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxHwVPgeu8o9D8oUeDQYEHTAZGt-J5uaJNgYMzkAW7MiCA@mail.gmail.com
... and find_window_run_conditions.
This seems to have been around and unused ever since the Run Condition
feature was added in 9d9c02ccd. Let's remove it to clean things up a
bit.
Author: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/DD26NJ0Y34ZS.2ZOJPHSY12PFI@gmail.com
While most tab completions in match_previous_words() use
COMPLETE_WITH* macros to wrap rl_completion_matches(), some direct
calls to rl_completion_matches() still remained.
This commit introduces COMPLETE_WITH_FILES and COMPLETE_WITH_GENERATOR
macros to replace these direct calls, enhancing both code consistency
and readability.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20250605100835.b396f9d656df1018f65a4556@sraoss.co.jp
I noticed the surprising behavior that pg_sleep(0.001) will sleep
for 2ms not the expected 1ms. Apparently the float8 calculation of
time-to-sleep is managing to produce something a hair over 1, which
ceil() rounds up to 2, and then WaitLatch() faithfully waits 2ms.
It could be that this works as-expected for some ranges of current
timestamp but not others, which would account for not having seen
it before. In any case, let's try to avoid it by removing the
float arithmetic in the delay calculation. We're stuck with the
declared input type being float8, but we can convert that to integer
microseconds right away, and then work strictly with integral values.
There might still be roundoff surprises for certain input values,
but at least the behavior won't be time-varying.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/3879137.1758825752@sss.pgh.pa.us
The functions test_stat_func() and test_stat_func2() had empty
function bodies, so that they took very little time to run. This made
it possible that on machines with relatively low timer resolution the
functions could return before the clock advanced, making the test fail
(as seen on buildfarm members fruitcrow and hamerkop).
To avoid that, pg_sleep for 10us during the functions. As far as we
can tell, all current hardware has clock resolution much less than
that. (The current implementation of pg_sleep will round it up to
1ms anyway, but someday that might get improved.)
Author: Michael Banck <mbanck@gmx.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/68d413a3.a70a0220.24c74c.8be9@mx.google.com
Backpatch-through: 15
If we already have an extension_state array but see a new extension_id
much larger than the highest the extension_id we've previously seen,
the old code might have failed to expand the array to a large enough
size, leading to disaster. Also, if we don't have an extension array
at all and need to create one, we should make sure that it's big enough
that we don't have to resize it instantly.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/2949591.1758570711@sss.pgh.pa.us
Backpatch-through: 18
These were omitted from build dependencies and also tab/nbsp
checks, with the result that "make" did nothing after modifying
a func/*.sgml file.
Oversight in 4e23c9ef6. AFAICT we don't need any comparable
changes in meson.build, or at least I don't see it doing anything
special for the pre-existing ref/*.sgml files.
When defining an injection point there is no need to wrap the definition
with USE_INJECTION_POINT guards, the INJECTION_POINT macro is available
in all builds. Remove to make the code consistent.
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/OSCPR01MB14966C8015DEB05ABEF2CE077F51FA@OSCPR01MB14966.jpnprd01.prod.outlook.com
Backpatch-through: 17
... which silently propagates a lot of headers into many places
via pgstat.h, as evidenced by the variety of headers that this patch
needs to add to seemingly random places. Add a minimum of typedefs to
conflict.h to be able to remove execnodes.h, and fix the fallout.
Backpatch to 18, where conflict.h first appeared.
Discussion: https://postgr.es/m/202509191927.uj2ijwmho7nv@alvherre.pgsql
This commit updates the pgbench documentation to list \gset and \aset
as separate terms for easier reading. It also clarifies that \gset raises
an error if the query returns zero or multiple rows, and explains how to
detect cases where the query with \aset returned no rows.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/20250626180125.5b896902a3d0bcd93f86c240@sraoss.co.jp
Previously, vacuumdb --analyze-only issued VACUUM (ONLY_DATABASE_STATS)
at the end. Since --analyze-only is meant to update optimizer statistics only,
this extra VACUUM command is unnecessary.
This commit prevents vacuumdb --analyze-only from running that redundant
VACUUM command.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwEqHGa-k=wbRMucUVihHVXk4NQkK94GNN=ym9cQ5HBSHg@mail.gmail.com
f83d709760 incorrectly refers to a XLOG_HEAP2_PRUNE_FREEZE WAL record
opcode. No such code exists. The relevant opcodes are
XLOG_HEAP2_PRUNE_ON_ACCESS, XLOG_HEAP2_PRUNE_VACUUM_SCAN, and
XLOG_HEAP2_PRUNE_VACUUM_CLEANUP. Correct it.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/yn4zp35kkdsjx6wf47zcfmxgexxt4h2og47pvnw2x5ifyrs3qc%407uw6jyyxuyf7
Without this, rebuilds can malfunction unless --enable-depend is used.
Historically we've expected that you can get away without
--enable-depend as long as you manually clean after changing *.h
files; the makefiles are supposed to handle other sorts of
dependencies. So add this one.
Follow-on to 635998965, so no need for back-patch.
Discussion: https://postgr.es/m/3121329.1758650878@sss.pgh.pa.us
We were already doing a short (1-second) pg_test_timing run during
check-world and buildfarm runs. But we weren't doing anything
with the result except for a basic regex-based sanity check.
Collecting that output from buildfarm runs is seeming very
attractive though, because it would help us determine what sort
of timing resolution is available on supported platforms.
It's not very long, so let's just note it verbatim in the TAP log.
Discussion: https://postgr.es/m/3321785.1758728271@sss.pgh.pa.us
This commit corrects several issues in function comments:
* The parameter "rel" was incorrectly referred to as "relation" in the comments
for table_tuple_delete(), table_tuple_update(), and table_tuple_lock().
* In table_tuple_delete(), "changingPart" was listed as an output parameter
in the comments but is actually input.
* In table_tuple_update(), "slot" was listed as an input parameter
in the comments but is actually output.
* The comment for "update_indexes" in table_tuple_update() was mis-indented.
* The comments for heap_lock_tuple() incorrectly referenced a non-existent
"tid" parameter.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2nB6Ay8g=KEn7L3qbYX_4+sLk9XOMkV0XZqHR4cTY8ZvQ@mail.gmail.com
Format validation and element extraction for intermediate line
strings were inconsistent in their handling of tab delimiters,
which resulted in an unclear error when multiple tab characters
were used as a delimiter. This fixes it by using captures from
the validation regex instead of a separate split() to avoid the
inconsistency. Also, it ensures that \t+ is used consistently
when inspecting the strings.
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/20250729.135638.1148639539103758555.horikyota.ntt@gmail.com
Mappings for 18 characters have changed, affecting 36 code points. This
is a break in compatibility, but these characters are rarely used.
U+E5E5 (Private Use Area) was previously mapped to \xA3A0. This code
point now maps to \x65356535. Attempting to convert \xA3A0 will now
raise an error.
Separate from the 2022 update, the following mappings were previously
swapped, and subsequently corrected in 2000 and later versions:
* U+E7C7 (Private Use Area) now maps to \x8135F437
* U+1E3F (Latin Small Letter M with Acute) now maps to \xA8BC
The 2022 standard mentions the following policy changes, but they
have no effect in our implementation:
66 new ideographs are now required, but these are mapped
algorithmically so were already handled by utf8_and_gb18030.c.
Nine CJK compatibility ideographs are no longer required, but
implementations may retain them, as does the source we use from
the Unicode Consortium.
Release notes: Compatibility section
For further details, see:
https://www.unicode.org/L2/L2022/22274-disruptive-changes.pdfhttps://ken-lunde.medium.com/the-gb-18030-2022-standard-3d0ebaeb4132
Author: Chao Li <lic@highgo.com>
Author: Zheng Tao <taoz@highgo.com>
Discussion: https://postgr.es/m/966d9fc.169.198741fe60b.Coremail.jiaoshuntian%40highgo.com
Previously, the parallel apply worker used SIGINT to receive a graceful
shutdown signal from the leader apply worker. However, SIGINT is also used
by the LOCK_TIMEOUT handler to trigger a query-cancel interrupt. This
overlap caused the parallel apply worker to miss LOCK_TIMEOUT signals,
leading to incorrect behavior during lock wait/contention.
This patch resolves the conflict by switching the graceful shutdown signal
from SIGINT to SIGUSR2.
Reported-by: Zane Duffield <duffieldzane@gmail.com>
Diagnosed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 16, where it was introduced
Discussion: https://postgr.es/m/CACMiCkXyC4au74kvE2g6Y=mCEF8X6r-Ne_ty4r7qWkUjRE4+oQ@mail.gmail.com
The macros doing conversions of/from "text" from/to Bitmapset were using
arbitrary casts with Datum, something that is not fine since
2a600a93c7.
These macros do not actually need casts with Datum, as they are given
already "text" and Bitmapset data in input. They are updated to use
cstring_to_text() and text_to_cstring(), fixing the compiler warnings
reported by the buildfarm. Note that appending a -m32 to gcc to trigger
32-bit builds was enough to reproduce the warnings here.
While on it, outer parenthesis are added to TEXT_TO_BITMAPSET(), and
inner parenthesis are removed from BITMAPSET_TO_TEXT(), to make these
macros more consistent with the style used in the tree, based on
suggestions by Tom Lane.
Oversights in commit 00c3d87a5c.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Greg Burd <greg@burd.me>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/3027069.1758606227@sss.pgh.pa.us
Result nodes now include an RTI set, which is only non-NULL when they
have no subplan, and is taken from the relid set of the RelOptInfo that
the Result is generating. ExplainPreScanNode now takes notice of these
RTIs, which means that a few things get schema-qualified in the
regression tests that previously did not. This makes the output more
consistent between cases where some part of the plan tree is replaced by
a Result node and those where this does not happen.
Likewise, pg_overexplain's EXPLAIN (RANGE_TABLE) now displays the RTIs
stored in a Result node just as it already does for other RTI-bearing
node types.
Result nodes also now include a result_reason, which tells us something
about why the Result node was inserted. Using that information, EXPLAIN
now emits, where relevant, a "Replaces" line describing the origin of
a Result node.
The purpose of these changes is to allow code that inspects a Plan
tree to understand the origin of Result nodes that appear therein.
Discussion: http://postgr.es/m/CA+TgmoYeUZePZWLsSO+1FAN7UPePT_RMEZBKkqYBJVCF1s60=w@mail.gmail.com
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Remove stray whitespace in xref tag.
This was found due to a regression in xmllint 2.15.0 which flagged
this as an error, and at the time of this commit no fix for xmllint
has shipped.
Author: Erik Wienhold <ewie@ewie.name>
Discussion: https://postgr.es/m/f4c4661b-4e60-4c10-9336-768b7b55c084@ewie.name
Backpatch-through: 17
Bitmapset has a complex set of APIs, defined in bitmapset.h, and it can
be hard to test edge cases with the backend core code only.
This test module is aimed at closing the gap, and implements a set of
SQL functions that act as wrappers of the low-level C functions of the
same names. These functions rely on text as data type for the input and
the output as Bitmapset as a node has support for these. An extra
function, named test_random_operations(), can be used to stress bitmaps
with random member values and a defined number of operations potentially
useful for other purposes than only tests.
The coverage increases from 85.2% to 93.4%. It should be possible to
cover more code paths, but at least it's a beginning.
Author: Greg Burd <greg@burd.me>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/7BD1ABDB-B03A-464A-9BA9-A73B55AD8A1F@getmailspring.com
The package for the UUID library may be named "uuid" or "ossp-uuid", and
meson.build has been using a single call of dependency() with multiple
names, something only supported since meson 0.60.0.
The minimum version of meson supported by Postgres is 0.57.2 on HEAD,
since f039c22441, and 0.54 on stable branches down to 16.
Author: Oreo Yang <oreo.yang@hotmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/OS3P301MB01656E6F91539770682B1E77E711A@OS3P301MB0165.JPNP301.PROD.OUTLOOK.COM
Backpatch-through: 16
This adds support for base64url encoding and decoding, a base64
variant which is safe to use in filenames and URLs. base64url
replaces '+' in the base64 alphabet with '-' and '/' with '_',
thus making it safe for URL addresses and file systems.
Support for base64url was originally suggested by Przemysław Sztoch.
Author: Florents Tselai <florents.tselai@gmail.com>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: David E. Wheeler <david@justatheory.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Chao Li (Evan) <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/70f2b6a8-486a-4fdb-a951-84cef35e22ab@sztoch.pl
The lossy-counting algorithm that ANALYZE uses to identify most-common
array elements has a notion of cutoff frequency: elements with
frequency greater than that are guaranteed to be collected, elements
with smaller frequencies are not. In cases where we find fewer MCEs
than the stats target would permit us to store, the cutoff frequency
provides valuable additional information, to wit that there are no
non-MCEs with frequency greater than that. What the selectivity
estimation functions actually use the "minfreq" entry for is as a
ceiling on the possible frequency of non-MCEs, so using the cutoff
rather than the lowest stored MCE frequency provides a tighter bound
and more accurate estimates.
Therefore, instead of redundantly storing the minimum observed MCE
frequency, store the cutoff frequency when there are fewer tracked
values than we want. (When there are more, then of course we cannot
assert that no non-stored elements are above the cutoff frequency,
since we're throwing away some that are; so we still use the
minimum stored frequency in that case.)
Notably, this works even when none of the values are common enough
to be called MCEs. In such cases we previously stored nothing in
the STATISTIC_KIND_MCELEM pg_statistic slot, which resulted in the
selectivity functions falling back to default estimates. So in that
case we want to construct a STATISTIC_KIND_MCELEM entry that contains
no "values" but does have "numbers", to wit the three extra numbers
that the MCELEM entry type defines. A small obstacle is that
update_attstats() has traditionally stored a null, not an empty array,
when passed zero "values" for a slot. That gives rise to an MCELEM
entry that get_attstatsslot() will spit up on. The least risky
solution seems to be to adjust update_attstats() so that it will emit
a non-null (but possibly empty) array when the passed stavalues array
pointer isn't NULL, rather than conditioning that on numvalues > 0.
In other existing cases I don't believe that that changes anything.
For consistency, handle the stanumbers array the same way.
In passing, improve the comments in routines that use
STATISTIC_KIND_MCELEM data. Particularly, explain why we use
minfreq / 2 not minfreq as the estimate for non-MCE values.
Thanks to Matt Long for the suggestion that we could apply this
idea even when there are more than zero MCEs.
Reported-by: Mark Frost <FROSTMAR@uk.ibm.com>
Reported-by: Matt Long <matt@mattlong.org>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/PH3PPF1C905D6E6F24A5C1A1A1D8345B593E16FA@PH3PPF1C905D6E6.namprd15.prod.outlook.com
Commit a391ff3c3, which added the ability for a function's support
function to provide a custom selectivity estimate for "WHERE f(...)",
unintentionally removed the possibility of applying expression
statistics after finding there's no applicable support function.
That happened because we no longer fell through to boolvarsel()
as before. Refactor to do so again, putting the 0.3333333 default
back into boolvarsel() where it had been (cf. commit 39df0f150).
I surely wouldn't have made this error if 39df0f150 had included
a test case, so add one now. At the time we did not have the
"extended statistics" infrastructure, but we do now, and it is
also unable to work in this scenario because of this error.
So make use of that for the test case.
This is very clearly a bug fix, but I'm afraid to put it into
released branches because of the likelihood of altering plan
choices, which we avoid doing in minor releases. So, master only.
Reported-by: Frédéric Yhuel <frederic.yhuel@dalibo.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/a8b99dce-1bfb-4d97-af73-54a32b85c916@dalibo.com
Initially this was to fix the "catched" typo, but I (David) wasn't quite
clear on what the previous comment meant about being "effective". I
expect this means efficiency, so I've reworded the comment to indicate
that.
While this is only a comment fixup, for the sake of possibly minimizing
possible future backpatching pain, I've opted to backpatch to 18 since
this code is new to that version and the release isn't out the door yet.
Author: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAHewXNmSYWPud1sfBvpKbCJeRkWeZYuqatxtV9U9LvAFXBEiBw@mail.gmail.com
Backpatch-through: 18
Commit 216a784829 introduced parallel apply workers, allowing multiple
processes to share a replication origin. To support this,
replorigin_session_setup() was extended to accept a pid argument
identifying the process using the origin.
This commit exposes that capability through the SQL interface function
pg_replication_origin_session_setup() by adding an optional pid parameter.
This enables multiple processes to coordinate replication using the same
origin when using SQL-level replication functions.
This change allows the non-builtin logical replication solutions to
implement parallel apply for large transactions.
Additionally, an existing internal error was made user-facing, as it can
now be triggered via the exposed SQL API.
Author: Doruk Yilmaz <doruk@mixrank.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/CAMPB6wfe4zLjJL8jiZV5kjjpwBM2=rTRme0UCL7Ra4L8MTVdOg@mail.gmail.com
Discussion: https://postgr.es/m/CAE2gYzyTSNvHY1+iWUwykaLETSuAZsCWyryokjP6rG46ZvRgQA@mail.gmail.com
When deciding which code path to use depending on the state of recovery,
XLogFlush() and XLogNeedsFlush() have been relying on different
criterias:
- XLogFlush() relied on XLogInsertAllowed().
- XLogNeedsFlush() relied on RecoveryInProgress().
Currently, the checkpointer is allowed to insert WAL records while
RecoveryInProgress() returns true for an end-of-recovery checkpoint,
where XLogInsertAllowed() matters. Using RecoveryInProgress() in
XLogNeedsFlush() did not really matter for its existing callers, as the
checkpointer only called XLogFlush(). However, a feature under
discussion, by Melanie Plageman, needs XLogNeedsFlush() to be able to
work in more contexts, the end-of-recovery checkpoint being one.
This commit changes XLogNeedsFlush() to use XLogInsertAllowed() instead
of RecoveryInProgress(), making the checks in both routines more
consistent. While on it, an assertion based on XLogNeedsFlush() is
added at the end of XLogFlush(), triggered when flushing a physical
position (not for the normal recovery patch that checks for updates of
the minimum recovery point). This assertion would fail for example in
the recovery test 015_promotion_pages if XLogNeedsFlush() is changed to
use RecoveryInProgress(). This should be hopefully enough to ensure
that the checks done in both routines remain consistent.
Author: Melanie Plageman <melanieplageman@gmail.com>
Co-authored-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAAKRu_a1vZRZRWO3_jv_X13RYoqLRVipGO0237g5PKzPa2YX6g@mail.gmail.com
Commit bb3ec16e14 moved partition pruning metadata into PlannedStmt.
At executor startup this metadata is used to initialize the EState
fields es_part_prune_infos, es_part_prune_states, and
es_part_prune_results. EvalPlanQualStart() failed to copy those
fields into the child EState, causing NULL dereference when Append
ran partition pruning during a recheck. This can occur with DELETE
or UPDATE on partitioned tables that use runtime pruning, e.g. with
generic plans.
Fix by copying all partition pruning state into the EPQ estate.
Add an isolation test that reproduces the crash with concurrent
UPDATE and DELETE on a partitioned table, where the DELETE session
hits the crash during its EPQ recheck after the UPDATE commits.
Bug: #19056
Reported-by: Fei Changhong <feichanghong@qq.com>
Diagnozed-by: Fei Changhong <feichanghong@qq.com>
Author: David Rowley <dgrowleyml@gmail.com>
Co-authored-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/19056-a677cef9b54d76a0%40postgresql.org
This change is a tighter rework of 7d85d87f4d, which tried to improve
the code so as it would work should PgStat_HashKey gain new fields that
create padding bytes. However, the previous change is proving to not be
enough as some code paths of pgstats do not pass PgStat_HashKey by
reference (valgrind would warn when padding is added to the structure,
through a new field).
Per discussion, let's document and check that PgStat_HashKey has no
padding rather than try to complicate the code of pgstats so as it is
able to work around that.
This removes a couple of memset(0) calls that should not be required.
While on it, this commit adds a static assertion checking that no
padding is introduced in the structure, by checking that the size of
PgStat_HashKey matches with the sum of the size of all its fields.
The object ID part of the hash key is already 8 bytes, which should be
plenty enough already. A comment is added to discourage the addition of
new fields.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0t9omat+HVSakJXwTMWvhpYFcAZb41RPWKwrKFUgmAFBQ@mail.gmail.com
When shared memory is re-initialized after a crash, the named
LWLock tranche request array that was copied to shared memory will
no longer be accessible. To fix, save the pointer to the original
array in postmaster's local memory, and switch to it when
re-initializing the LWLock-related shared memory.
Oversight in commit ed1aad15e0. Per buildfarm member batta.
Reported-by: Michael Paquier <michael@paquier.xyz>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aMoejB3iTWy1SxfF%40paquier.xyz
Discussion: https://postgr.es/m/f8ca018f-3479-49f6-a92c-e31db9f849d7%40gmail.com
Previously, pg_restore did not skip security labels on publications or
subscriptions even when --no-publications or --no-subscriptions was specified.
As a result, it could issue SECURITY LABEL commands for objects that were
never created, causing those commands to fail.
This commit fixes the issue by ensuring that security labels on publications
and subscriptions are also skipped when the corresponding options are used.
Backpatch to all supported versions.
Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CACJufxHCt00pR9h51AVu6+yPD5J7JQn=7dQXxqacj0XyDhc-fA@mail.gmail.com
Backpatch-through: 13
StrategyInitialize() calls InitBufTable() with maximum number of entries that
the buffer lookup table can ever have. Thus there should not be any need to
allocate more element after initialization. Hence mark the hash table as fixed
sized.
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAExHW5v0jh3F_wj86yC=qBfWk0uiT94qy=Z41uzAHLHh0SerRA@mail.gmail.com
If an aggregate function call contains a sub-select that has
an RTE referencing a CTE outside the aggregate, we must treat
that reference like a Var referencing the CTE's query level
for purposes of determining the aggregate's level. Otherwise
we might reach the nonsensical conclusion that the aggregate
should be evaluated at some query level higher than the CTE,
ending in a planner error or a broken plan tree that causes
executor failures.
Bug: #19055
Reported-by: BugForge <dllggyx@outlook.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19055-6970cfa8556a394d@postgresql.org
Backpatch-through: 13
Commit 2a600a93 made Datum 8 bytes wide everywhere. It was no longer
appropriate to use TypeSizeT on 32 bit systems, and JIT compilation
would fail with various type check errors. Introduce a separate
LLVMTypeRef with the name TypeDatum. TypeSizeT is still used in some
places for actual size_t values.
Reported-by: Dmitry Mityugov <d.mityugov@postgrespro.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Tested-by: Dmitry Mityugov <d.mityugov@postgrespro.ru>
Discussion: https://postgr.es/m/0a9f0be59171c2e8f1b3bc10f4fcf267%40postgrespro.ru
The pending entry was not used when incrementing its data, directly
manipulating the shared memory pointer, without even locking it. This
could mean losing statistics under concurrent activity. The flush
callback was a no-op.
This code serves as a base template for extensions for the custom
cumulative statistics, so let's be clean and use a pending entry for the
incrementations, whose data is then flushed to the corresponding entry
in the shared hashtable when all the stats are reported, in its own
flush callback.
Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0v0U0yhPbY+bqChomkPbyUrRQ3rQXnZf_SB-svDiQOpgQ@mail.gmail.com
Backpatch-through: 18
The shared memory size was calculated based on an offset of io_handles,
which is itself a pointer included in the structure. We tend to
overestimate the shared memory size overall, so this was unlikely an
issue in practice, but let's be correct and use the full size of the
structure in the calculation, so as the pointer for io_handles is
included.
Oversight in da7226993f.
Author: Madhukar Prasad <madhukarprasad@google.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAKi+wrbC2dTzh_vKJoAZXV5wqTbhY0n4wRNpCjJ=e36aoo0kFw@mail.gmail.com
Backpatch-through: 18
The EvalPlanQual recheck for TID Range Scan wasn't rechecking the TID qual
still passed after following update chains. This could result in tuples
being updated or deleted by plans using TID Range Scans where the ctid of
the new (updated) tuple no longer matches the clause of the scan. This
isn't desired behavior, and isn't consistent with what would happen if the
chosen plan had used an Index or Seq Scan, and that could lead to hard to
predict behavior for scans that contain TID quals and other quals as the
planner has freedom to choose TID Range or some other non-TID scan method
for such queries, and the chosen plan could change at any moment.
Here we fix this by properly implementing the recheck function for TID
Range Scans.
Backpatch to 14, where TID Range Scans were added
Reported-by: Sophie Alpert <pg@sophiebits.com>
Author: Sophie Alpert <pg@sophiebits.com>
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/4a6268ff-3340-453a-9bf5-c98d51a6f729@app.fastmail.com
Backpatch-through: 14
The EvalPlanQual recheck for TID Scan wasn't rechecking the TID qual
still passed after following update chains. This could result in tuples
being updated or deleted by plans using TID Scans where the ctid of the
new (updated) tuple no longer matches the clause of the scan. This isn't
desired behavior, and isn't consistent with what would happen if the
chosen plan had used an Index or Seq Scan, and that could lead to hard to
predict behavior for scans that contain TID quals and other quals as the
planner has freedom to choose TID or some other scan method for such
queries, and the chosen plan could change at any moment.
Here we fix this by properly implementing the recheck function for TID
Scans.
Backpatch to 13, oldest supported version
Reported-by: Sophie Alpert <pg@sophiebits.com>
Author: Sophie Alpert <pg@sophiebits.com>
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/4a6268ff-3340-453a-9bf5-c98d51a6f729@app.fastmail.com
Backpatch-through: 13
This reverts commit 98fc31d649.
That change allowed DROP OWNED BY to drop grants of the target
role to other roles, arguing that nobody would need those
privileges anymore. But that's not so: if you're not superuser,
you still need admin privilege on the target role so you can
drop it.
It's not clear whether or how the dependency-based approach
to solving the original problem can be adapted to keep these
grants. Since v18 release is fast approaching, the sanest
thing to do seems to be to revert this patch for now. The
race-condition problem is low severity and not worth taking
risks for.
I didn't force a catversion bump in 98fc31d64, so I won't do
so here either.
Reported-by: Dipesh Dhameliya <dipeshdhameliya125@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CABgZEgczOFicCJoqtrH9gbYMe_BV3Hq8zzCBRcMgmU6LRsihUA@mail.gmail.com
Backpatch-through: 18
The COMMENT should depend on the separately-dumped constraint, not the
domain. Sufficient restore parallelism might fail the COMMENT command
by issuing it before the constraint exists. Back-patch to v13, like
commit 0858f0f96e.
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/20250913020233.fa.nmisch@google.com
Backpatch-through: 13
Up to now we've contented ourselves with a one-size-fits-all error
hint when we fail to find any match to a function or procedure call.
That was mostly okay in the beginning, but it was never great, and
since the introduction of named arguments it's really not adequate.
We at least ought to distinguish "function name doesn't exist" from
"function name exists, but not with those argument names". And the
rules for named-argument matching are arcane enough that some more
detail seems warranted if we match the argument names but the call
still doesn't work.
This patch creates a framework for dealing with these problems:
FuncnameGetCandidates and related code will now pass back a bitmask of
flags showing how far the match succeeded. This allows a considerable
amount of granularity in the reports. The set-bits-in-a-bitmask
approach means that when there are multiple candidate functions, the
report will reflect the match(es) that got the furthest, which seems
correct. Also, we can avoid mentioning "maybe add casts" unless
failure to match argument types is actually the issue.
Extend the same return-a-bitmask approach to OpernameGetCandidates.
The issues around argument names don't apply to operator syntax,
but it still seems worth distinguishing between "there is no
operator of that name" and "we couldn't match the argument types".
While at it, adjust these messages and related ones to more strictly
separate "detail" from "hint", following our message style guidelines'
distinction between those.
Reported-by: Dominique Devienne <ddevienne@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/1756041.1754616558@sss.pgh.pa.us
In our previous discussions around making our regression tests
pass in FIPS mode, we concluded that we didn't need to support
the different error message wording observed with pre-3.0 OpenSSL.
However there are still a few LTS distributions soldiering along
with such versions, and now we have some in the buildfarm.
So let's add the variant expected-files needed to make them happy.
This commit only covers the core regression tests. Previous
discussion suggested that we might need some adjustments in
contrib as well, but it's not totally clear to me what those
would be. Rather than work it out from first principles,
I'll wait to see what the buildfarm shows.
Back-patch to v17 which is the oldest branch that claims
to support this case.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/443709.1757876535@sss.pgh.pa.us
Backpatch-through: 17
JsonConstructorExpr can produce non-NULL output with a NULL input, so
it should be treated as a non-strict construct. Failing to do so can
lead to incorrect query behavior.
For example, in the reported case, when pulling up a subquery that is
under an outer join, if the subquery's target list contains a
JsonConstructorExpr that uses subquery variables and it is mistakenly
treated as strict, it will be pulled up without being wrapped in a
PlaceHolderVar. As a result, the expression will be evaluated at the
wrong place and will not be forced to null when the outer join should
do so.
Back-patch to v16 where JsonConstructorExpr was introduced.
Bug: #19046
Reported-by: Runyuan He <runyuan@berkeley.edu>
Author: Tender Wang <tndrwang@gmail.com>
Co-authored-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/19046-765b6602b0a8cfdf@postgresql.org
Backpatch-through: 16
Previously we built the .map files for GB18030 (version 2000) from an
XML file. The 2022 version for this encoding is only available as a
Unicode Character Mapping (UCM) file, so as preparatory refactoring
switch to this format as the source for building version 2000.
As we do with most input files for the conversion mappings, download
the file on demand. In order to generate the same mappings we have
now, we must download from a previous upstream commit, rather than
the head since the latter contains a correction not present in our
current .map files.
The XML file is still used by EUC_CN, so we cannot delete it from our
repository. GB18030 is a superset of EUC_CN, so it may be possible to
build EUC_CN from the same UCM file, but that is left for future work.
Author: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/966d9fc.169.198741fe60b.Coremail.jiaoshuntian%40highgo.com
Fix for commit 3c86223c99. That commit moved the typedef of pg_int64
from postgres_ext.h to libpq-fe.h, because the only remaining place
where it might be used is libpq users, and since the type is obsolete,
the intent was to limit its scope.
The problem is that if someone builds an extension against an
older (pre-PG18) server version and a new (PG18) libpq, they might get
two typedefs, depending on include file order. This is not allowed
under C99, so they might get warnings or errors, depending on the
compiler and options. The underlying types might also be
different (e.g., long int vs. long long int), which would also lead to
errors. This scenario is plausible when using the standard Debian
packaging, which provides only the newest libpq but per-major-version
server packages.
The fix is to undo that part of commit 3c86223c99. That way, the
typedef is in the same header file across versions. At least, this is
the safest fix doable before PostgreSQL 18 releases.
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/25144219-5142-4589-89f8-4e76948b32db%40eisentraut.org
Previously, pg_dump incorrectly queried pg_seclabel to retrieve security labels
for subscriptions, which are stored in pg_shseclabel as they are global objects.
This could result in security labels for subscriptions not being dumped.
This commit fixes the issue by updating pg_dump to query the pg_seclabels view,
which aggregates entries from both pg_seclabel and pg_shseclabel.
While querying pg_shseclabel directly for subscriptions was an alternative,
using pg_seclabels is simpler and sufficient.
In addition, pg_dump is updated to dump security labels on event triggers,
which were previously omitted.
Backpatch to all supported versions.
Author: Jian He <jian.universality@gmail.com>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CACJufxHCt00pR9h51AVu6+yPD5J7JQn=7dQXxqacj0XyDhc-fA@mail.gmail.com
Backpatch-through: 13
This commit addresses two sources of instability in the 035_conflicts
test:
Unstable VACUUM usage:
The test previously relied on VACUUM to remove a deleted column, which can
be unreliable due to concurrent background writer or checkpoint activity
that may lock the page containing the deleted tuple. Since the test
already verifies that replication_slot.xmin has advanced sufficiently to
confirm the feature's correctness, so, the VACUUM step is removed to
improve test stability.
Timing-sensitive retention resumption check:
The test includes a check to confirm that retention of conflict-relevant
information resumes after setting max_retention_duration to 0. However, in
some cases, the apply worker resumes retention immediately after the
inactive slot is removed from synchronized_standby_slots, even before
max_retention_duration is updated. This can happen if remote changes are
applied in under 1ms, causing the test to timeout while waiting for a
later log position. To ensure consistent behavior, this commit delays the
removal of synchronized_standby_slots until after max_retention_duration
is set to 0.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/TY4PR01MB16907805DE4816E53C54708159414A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Commit 7202d72787 added in passing some const qualifiers, but the one
on the postmaster_child_launch() startup_data argument was incorrect,
because the function itself modifies the pointed-to data. This is
hidden from the compiler because of casts. The qualifiers on the
functions called by postmaster_child_launch() are still correct.
Previously, pg_restore did not skip comments on policies even when
--no-policies was specified. As a result, it could issue COMMENT commands
for policies that were never created, causing those commands to fail.
This commit fixes the issue by ensuring that comments on policies
are also skipped when --no-policies is used.
Backpatch to v18, where --no-policies was added in pg_restore.
Author: Jian He <jian.universality@gmail.com>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CACJufxHCt00pR9h51AVu6+yPD5J7JQn=7dQXxqacj0XyDhc-fA@mail.gmail.com
Backpatch-through: 18
Previously, pg_restore did not skip comments on publications or subscriptions
even when --no-publications or --no-subscriptions was specified. As a result,
it could issue COMMENT commands for objects that were never created,
causing those commands to fail.
This commit fixes the issue by ensuring that comments on publications and
subscriptions are also skipped when the corresponding options are used.
Backpatch to all supported versions.
Author: Jian He <jian.universality@gmail.com>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CACJufxHCt00pR9h51AVu6+yPD5J7JQn=7dQXxqacj0XyDhc-fA@mail.gmail.com
Backpatch-through: 13
Add logic to _bt_set_startikey that determines whether row compare keys
are guaranteed to be satisfied by every tuple on a page that is about to
be read by _bt_readpage. This works in essentially the same way as the
existing scalar inequality logic. Testing has shown that the new logic
improves performance to about the same degree as the existing scalar
inequality logic (compared to the unoptimized case). In other words,
the new logic makes many row compare scans significantly faster.
Note that the new row compare inequality logic is only effective when
the same individual row member is the deciding subkey for all tuples on
the page (obviously, all tuples have to satisfy the row compare, too).
This is what makes the new row compare logic very similar to the
existing logic for scalar inequalities. Note, in particular, that this
makes it safe to ignore whether all row compare members are against
either ASC or DESC index attributes (i.e. it doesn't matter if
individual subkeys don't all use the same inequality strategy).
Also stop refusing to set pstate.startikey to an offset beyond any
nonrequired key (don't add logic that'll do that for an individual row
compare subkey, either). We can fully rely on our firstchangingattnum
tests instead. This will do the right thing when a page has a group of
tuples with NULLs in a lower-order attribute that makes the tuples fail
to satisfy a row compare key -- we won't incorrectly conclude that all
tuples must satisfy the row compare, just because firsttup and lasttup
happen to. Our firstchangingattnum test prevents that from happening.
(Note that the original "avoid evaluating nbtree scan keys" mechanism
added by commit e0b1ee17 couldn't support row compares due to issues
with tuples that contain NULLs in a lower-order subkey's attribute.
That original mechanism relied on requiredness markings, which the
replacement _bt_set_startikey mechanism never really needed.)
Follow up to commit 8a510275, which added the _bt_set_startikey
optimization. _bt_set_startikey is now feature complete; there's no
remaining kind of nbtree scan key that it still doesn't support.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznL6Z3H_GTQze9d8T_Ls=cYbnd-_9f-Jo7aYgTGRUD58g@mail.gmail.com
fmgr.h defined some types such as fmNodePtr which is just Node *, but
it made its own types to avoid having to include various header files.
With C11, we can now instead typedef the original names without fear
of conflicts.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/10d32190-f31b-40a5-b177-11db55597355@eisentraut.org
There are a number of forward declarations that use struct but not the
customary typedef, because that could have led to repeat typedefs,
which was not allowed. This is now allowed in C11, so we can update
these to provide the typedefs as well, so that the later uses of the
types look more consistent.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/10d32190-f31b-40a5-b177-11db55597355@eisentraut.org
This commit resumes automatic retention of conflict-relevant data for a
subscription. Previously, retention would stop if the apply process failed
to advance its xmin (oldest_nonremovable_xid) within the configured
max_retention_duration and user needs to manually re-enable
retain_dead_tuples option. With this change, retention will resume
automatically once the apply worker catches up and begins advancing its
xmin (oldest_nonremovable_xid) within the configured threshold.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
If extensions of equal names were installed in different directories
in the path, the views pg_available_extensions and
pg_available_extension_versions would show all of them, even though
only the first one was actually reachable by CREATE EXTENSION. To
fix, have those views skip extensions found later in the path if they
have names already found earlier.
Also add a bit of documentation that only the first extension in the
path can be used.
Reported-by: Pierrick <pierrick.chovelon@dalibo.com>
Discussion: https://www.postgresql.org/message-id/flat/8f5a0517-1cb8-4085-ae89-77e7454e27ba%40dalibo.com
The TimescaleDB extension expects to be able to change an nbtree scan's
keys across rescans. The issue arises in the extension's implementation
of loose index scan. This is arguably a misuse of the index AM API,
though apparently it worked until recently. It stopped working when the
skipScan flag was added to BTScanOpaqueData by commit 8a510275, though.
The flag wouldn't reliably track whether the scan (actually, the current
rescan) has any skip arrays, leading to confusion in _bt_set_startikey.
nbtree preprocessing will now defensively initialize the scan's skipScan
flag in all cases, including the case where _bt_preprocess_array_keys
returns early due to the (re)scan not using arrays. While nbtree isn't
obligated to support this use case (at least not according to my reading
of the index AM API), it still seems like a good idea to be consistent
here, on general robustness grounds.
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Natalya Aksman <natalya@timescale.com>
Discussion: https://postgr.es/m/CAJumhcirfMojbk20+W0YimbNDkwdECvJprQGQ-XqK--ph09nQw@mail.gmail.com
Backpatch-through: 18
Commit e3ffc3e91 fixed the translation of character classes in
SIMILAR TO regular expressions. Unfortunately the fix broke a corner
case: if there is an escape character right after the opening bracket
(for example in "[\q]"), a closing bracket right after the escape
sequence would not be seen as closing the character class.
There were two more oversights: a backslash or a nested opening bracket
right at the beginning of a character class should remove the special
meaning from any following caret or closing bracket.
This bug suggests that this code needs to be more readable, so also
rename the variables "charclass_depth" and "charclass_start" to
something more meaningful, rewrite an "if" cascade to be more
consistent, and improve the commentary.
Reported-by: Dominique Devienne <ddevienne@gmail.com>
Reported-by: Stephan Springl <springl-psql@bfw-online.de>
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAFCRh-8NwJd0jq6P=R3qhHyqU7hw0BTor3W0SvUcii24et+zAw@mail.gmail.com
Backpatch-through: 13
pg_regress has a --no-locale option that forces the temporary database to
have C locale. However, currently, locale C only exists in the 'builtin'
locale provider. This makes 'pg_regress --no-locale' fail when the default
locale provider is not 'builtin'. This commit makes 'pg_regress --no-locale'
specify both LOCALE='C' and LOCALE_PROVIDER='builtin'.
Discussion: https://postgr.es/m/b54921f95e23b4391b1613e9053a3d58%40postgrespro.ru
Author: Oleg Tselebrovskiy <o.tselebrovskiy@postgrespro.ru>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
It's not quite clear to me why this didn't show up in my local
check-world testing, but some of the buildfarm evidently runs
this test with a different database name. Adjust the test
so that that doesn't affect the reported error messages.
If the database or user had no entry in pg_db_role_setting,
RESET silently did nothing --- including not checking the
validity of the given GUC name. This is quite inconsistent
and surprising, because you *would* get such an error if there
were any pg_db_role_setting entry, even though it contains
values for unrelated GUCs.
While this is clearly a bug, changing it in stable branches seems
unwise. The effect will be that some ALTER commands that formerly
were no-ops will now be errors, and people don't like that sort of
thing in minor releases.
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/30783e-68c28a00-9-41004480@130449754
Commit a0b99fc12 caused pg_event_trigger_dropped_objects()
to not fill the object_name field for schemas, which it
should have; and caused it to fill the object_name field
for default values, which it should not have.
In addition, triggers and RLS policies really should behave
the same way as we're making column defaults do; that is,
they should have is_temporary = true if they belong to a
temporary table.
Fix those things, and upgrade event_trigger.sql's woefully
inadequate test coverage of these secondary output columns.
As before, back-patch only to v15.
Reported-by: Sergey Shinderuk <s.shinderuk@postgrespro.ru>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/bd7b4651-1c26-4d30-832b-f942fabcb145@postgrespro.ru
Backpatch-through: 15
This unblocks rejection of that syntax. One copy was a misspelling of
"SET TABLESPACE pg_default" that instead made no persistent changes.
The other copy just needed to populate a DATABASEOID syscache entry.
This slightly raises database.sql test coverage of catcache.c, while
dbcommands.c coverage remains the same.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1802710.1757608564@sss.pgh.pa.us
A recently added nbtree preprocessing step failed to account for the
fact that DESC columns already had their B-Tree strategy number commuted
at this point in preprocessing. As a result, preprocessing could output
a set of scan keys where one or more keys had the correct strategy
number, but used the wrong comparison routine.
To fix, make the faulty code path that looks up a more restrictive
replacement operator/comparison routine commute its requested inequality
strategy (while outputting the transformed strategy number as before).
This makes the final transformed scan key comport with the approach
preprocessing has always used to deal with DESC columns (which is
described by comments above _bt_fix_scankey_strategy).
Oversight in commit commit b3f1a13f, which made nbtree preprocessing
perform transformations on skip array inequalities that can reduce the
total number of index searches.
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Natalya Aksman <natalya@timescale.com>
Discussion: https://postgr.es/m/19049-b7df801e71de41b2@postgresql.org
Backpatch-through: 18
Users of logical decoding can encounter an unexpected change of
CurrentResourceOwner and CurrentMemoryContext. The problem is that,
unlike other call sites of RollbackAndReleaseCurrentSubTransaction(), in
reorderbuffer.c we fail to restore the original values of these global
variables after being clobbered by subtransaction abort. This patch
saves the values prior to the call and restores them eventually.
In addition, logical.c and logicalfuncs.c had a hack to restore resource
owner, presumably because of lack of this restore. Remove that.
Instead, because the test coverage here is not very consistent, add an
Assert() to ensure that the resowner is kept identical; this would make
it easy to detect other cases of bugs were we fail to restore resowner
properly. This could be removed later.
This is arguably an old bug, but there appears to be no reason to
backpatch it and it's risky to do so, so refrain for now.
Author: Antonin Houska <ah@cybertec.at>
Reported-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/119497.1756892972@localhost
Its size was ~3.8GB before, which sometimes was not enough. OpenBSD CI task
often were failing due to no space left on device. Increase the RAM disk size
to ~4.6 GB.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ2XVVPJRJmGB2DsL3gOrOinWh=HWvj6GO1cHzJ=6LwTag@mail.gmail.com
Backpatch-through: 18, where openbsd was added to CI
It was defining yyscan_t as a macro while the rest of the code uses a
typedef with #ifdef guards around it. The latter is also what the
flex generated code uses. So it seems best to make it look like those
other places for consistency.
The old way also had a potential for conflict if some code included
multiple headers providing yyscan_t. exprscan.l includes
#include "fe_utils/psqlscan_int.h"
#include "pgbench.h"
and fe_utils/psqlscan_int.h contains
#ifndef YY_TYPEDEF_YY_SCANNER_T
#define YY_TYPEDEF_YY_SCANNER_T
typedef void *yyscan_t;
#endif
which was then followed by pgbench.h
#define yyscan_t void *
and then the generated code in exprscan.c
#ifndef YY_TYPEDEF_YY_SCANNER_T
#define YY_TYPEDEF_YY_SCANNER_T
typedef void* yyscan_t;
#endif
This works, but if the #ifdef guard in psqlscan_int.h is removed, this
fails.
We want to move toward allowing repeat typedefs, per C11, but for that
we need to make sure they are all the same.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/10d32190-f31b-40a5-b177-11db55597355@eisentraut.org
If someone is stuck behind a lock for more than a second, that is
almost always a problem that is worth a log entry.
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-By: Michael Banck <mbanck@gmx.net>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Christoph Berg <myon@debian.org>
Reviewed-By: Stephen Frost <sfrost@snowman.net>
Discussion: https://postgr.es/m/b8b8502915e50f44deb111bc0b43a99e2733e117.camel%40cybertec.at
Per discussion, this compiler suite is no longer maintained, and
it has not been able to compile PostgreSQL since at least PostgreSQL
17.
This removes all the remaining support code for this compiler.
Note that the Solaris operating system continues to be supported, but
using GCC as the compiler.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/a0f817ee-fb86-483a-8a14-b6f7f5991b6e%40eisentraut.org
Clang 21 shows some new compiler warnings, for example:
warning: variable 'dstsize' is uninitialized when passed as a const pointer argument here [-Wuninitialized-const-pointer]
The fix is to initialize the variables when they are defined. This is
similar to, for example, the existing situation in gistKeyIsEQ().
Discussion: https://www.postgresql.org/message-id/flat/6604ad6e-5934-43ac-8590-15113d6ae4b1%40eisentraut.org
The typedef Relids (Bitmapset *) is intended to represent set of
relation identifiers, but was incorrectly used in several places to
store sets of attribute numbers. This is my oversight in e2debb643.
Fix that by replacing such usages with Bitmapset * to reflect the
correct semantics.
Author: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAEG8a3LJhp_xriXf39iCz0TsK+M-2biuhDhpLC6Baxw8+ZYT3A@mail.gmail.com
hashdesc.c was missing a couple of fields in its record descriptions, as
of:
- is_prev_bucket_same_wrt for SQUEEZE_PAGE.
- procid for INIT_META_PAGE.
- old_bucket_flag and new_bucket_flag for SPLIT_ALLOCATE_PAGE.
The author has noted the first hole, and I have spotted the others while
double-checking this area of the code. Note that the only data missing
now are the offsets stored in VACUUM_ONE_PAGE. We could perhaps add
them, if somebody sees value in this data, even if it makes the output
larger. These are discarded here.
Author: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/CALdSSPjc-OVwtZH0Xrkvg7n=2ZwdbMJzqrm_ed_CfjiAzuKVGg@mail.gmail.com
In EXEC_BACKEND builds, GetNamedLWLockTranche() can segfault when
called outside of the postmaster process, as it might access
NamedLWLockTrancheRequestArray, which won't be initialized. Given
the lack of reports, this is apparently unusual, presumably because
it is usually called from a shmem_startup_hook like this:
mystruct = ShmemInitStruct(..., &found);
if (!found)
{
mystruct->locks = GetNamedLWLockTranche(...);
...
}
This genre of shmem_startup_hook evades the aforementioned
segfaults because the struct is initialized in the postmaster, so
all other callers skip the !found path.
We considered modifying the documentation or requiring
GetNamedLWLockTranche() to be called from the postmaster, but
ultimately we decided to simply move the request array to shared
memory (and add it to the BackendParameters struct), thereby
allowing calls outside postmaster on all platforms. Since the main
shared memory segment is initialized after accepting LWLock tranche
requests, postmaster builds the request array in local memory first
and then copies it to shared memory later.
Given the lack of reports, back-patching seems unnecessary.
Reported-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0v1_15QPg5Sqd2Qz5rh_qcsyCeHHmRDY89xVHcy2yt5BQ%40mail.gmail.com
pg_event_trigger_dropped_objects() would report a column default
object with is_temporary = false, even if it belongs to a
temporary table. This seems clearly wrong, so adjust it to
report the table's temp-ness.
While here, refactor EventTriggerSQLDropAddObject to make its
handling of namespace objects less messy and avoid duplication
of the schema-lookup code. And add some explicit test coverage
of dropped-object reports for dependencies of temp tables.
Back-patch to v15. The bug exists further back, but the
GetAttrDefaultColumnAddress function this patch depends on does not,
and it doesn't seem worth adjusting it to cope with the older code.
Author: Antoine Violin <violin.antuan@gmail.com>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAFjUV9x3-hv0gihf+CtUc-1it0hh7Skp9iYFhMS7FJjtAeAptA@mail.gmail.com
Backpatch-through: 15
The comment mistakenly had "the others" for "the other", but this
commit also reorders the comment so it matches the macros below. Now we
describe the levels in increasing strictness. In addition, it seems
easier to follow if we introduce one level at a time, rather than
describing two, followed by "the other" (and then jumping back to one of
the first two). Finally, reword the sentence about the purpose of the
macros, which was slightly off-point.
Author: Paul Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Rustam ALLAKOV <rustamallakov@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CA+renyUp=xja80rBaB6NpY3RRdi750y046x28bo_xg29zKY72Q@mail.gmail.com
This commit adds an isolation test showing that temporal foreign keys do
not permit referential integrity violations under concurrency, like
fk-snapshot-2. You can show that the test fails by passing false for
detectNewRows to ri_PerformCheck in ri_restrict.
Author: Paul Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Rustam ALLAKOV <rustamallakov@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CA+renyUp=xja80rBaB6NpY3RRdi750y046x28bo_xg29zKY72Q@mail.gmail.com
This commit adds a missing isolation test for (non-PERIOD) foreign keys.
With REPEATABLE READ, one transaction can insert a referencing row while
another deletes the referenced row, and both see a valid state. But
after they have committed, the table violates referential integrity.
If the INSERT precedes the DELETE, we use a crosscheck snapshot to see
the just-added row, so that the DELETE can raise a foreign key error.
You can see the table violate referential integrity if you change
ri_restrict to pass false for detectNewRows to ri_PerformCheck.
A crosscheck snapshot is not needed when the DELETE comes first, because
the INSERT's trigger takes a FOR KEY SHARE lock that sees the row now
marked for deletion, waits for that transaction to commit, and raises a
serialization error. I (Paul) added a test for that too though.
We already have a similar test (in ri-triggers.spec) for SERIALIZABLE
snapshot isolation showing that you can implement foreign keys with just
pl/pgSQL, but that test does nothing to validate ri_triggers.c. We also
have tests (in fk-snapshot.spec) for other concurrency scenarios, but
not this one: we test concurrently deleting both the referencing and
referenced row, when the constraint activates a cascade/set null action.
But those tests don't exercise ri_restrict, and the consequence of
omitting a crosscheck comparison is different: a serialization failure,
not a referential integrity violation.
Author: Paul Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Rustam ALLAKOV <rustamallakov@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CA+renyUp=xja80rBaB6NpY3RRdi750y046x28bo_xg29zKY72Q@mail.gmail.com
The Sun Studio compiler complains about an empty declaration here.
Note for future historians: This does not mean that this compiler is
still of current interest for anyone using PostgreSQL. But we can let
this small fix be its parting gift.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/a0f817ee-fb86-483a-8a14-b6f7f5991b6e%40eisentraut.org
The test assumes that a backend will execute COMMIT PREPARED on the
publisher and hit the injection point commit-after-delay-checkpoint within
the commit critical section. This should cause the apply worker on the
subscriber to wait for the transaction to complete.
However, the test does not guarantee that the injection point is actually
triggered, creating a race condition where the apply worker may proceed
prematurely during COMMIT PREPARED.
This commit resolves the issue by explicitly waiting for the injection
point to be hit before continuing with the test, ensuring consistent and
reliable behavior.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Discussion: https://postgr.es/m/TY4PR01MB1690751D1CA8C128B0770EC6F9409A@TY4PR01MB16907.jpnprd01.prod.outlook.com
hash_xlog.h included descriptions for the blocks used in WAL records
that were was not completely consistent with how the records are
generated, with one block missing for SQUEEZE_PAGE, and inconsistent
descriptions used for block 0 in VACUUM_ONE_PAGE and MOVE_PAGE_CONTENTS.
This information was incorrect since c11453ce0a, cross-checking the
logic for the record generation.
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CALdSSPj1j=a1d1hVA3oabRFz0hSU3KKrYtZPijw4UPUM7LY9zw@mail.gmail.com
Backpatch-through: 13
GucSource_Names was documented as being in guc.c, but since 0a20ff54f5
it is located in guc_tables.c. The reference to the location of
GucSource_Names is important, as GucSource needs to be kept in sync with
GucSource_Names.
Author: David G. Johnston <david.g.johnston@gmail.com>
Discussion: https://postgr.es/m/CAKFQuwYPgAHWPYjPzK7iXzhSZ6MKR8w20_Nz7ZXpOvx=kZbs7A@mail.gmail.com
Backpatch-through: 16
If sizeof(Pointer) is 4 then sizeof(SortItem) will be 12, so that
if data->numrows is odd then we placed the values array at a location
that's not a multiple of 8. That was fine when sizeof(Datum) was also
4, but in the wake of commit 2a600a93c it makes some alignment-picky
machines unhappy. (You need a 32-bit machine that nonetheless expects
8-byte alignment of 8-byte quantities, which is an odd-seeming
combination but it does exist outside the Intel universe.)
To fix, MAXALIGN the space allocated to the SortItem array.
In passing, let's make the "len" variable be Size not int,
just for paranoia's sake.
This code was arguably not too safe even before 2a600a93c, but at
present I don't see a strong argument for back-patching.
Reported-by: Tomas Vondra <tomas@vondra.me>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/87036018-8d70-40ad-a0ac-192b07bd7b04@vondra.me
Instead of building a separate memory context that's used just
for running hash functions, make the hash functions run in the
per-tuple context of the node's innerecontext. This saves a
little space at runtime, and it avoids needing to reset two
contexts instead of one inside buildSubPlanHash's main loop.
This largely reverts commit 133924e13. That's safe to do now
because bf6c614a2 decoupled the evaluation context used by
TupleHashTableMatch from that used for hash function evaluation,
so that there's no longer a risk of resetting the innerecontext
too soon.
Per discussion of bug #19040, although this is not directly
a fix for that.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Haiyang Li <mohen.lhy@alibaba-inc.com>
Reviewed-by: Fei Changhong <feichanghong@qq.com>
Discussion: https://postgr.es/m/19040-c9b6073ef814f48c@postgresql.org
If the hash functions used for hashing tuples leaked any memory,
we failed to clean that up, resulting in query-lifespan memory
leakage in queries using hashed subplans. One way that could
happen is if the values being hashed require de-toasting, since
most of our hash functions don't trouble to clean up de-toasted
inputs.
Prior to commit bf6c614a2, this leakage was largely masked
because TupleHashTableMatch would reset hashtable->tempcxt
(via execTuplesMatch). But it doesn't do that anymore, and
that's not really the right place for this anyway: doing it
there could reset the tempcxt many times per hash lookup,
or not at all. Instead put reset calls into ExecHashSubPlan
and buildSubPlanHash. Along the way to that, rearrange
ExecHashSubPlan so that there's just one place to call
MemoryContextReset instead of several.
This amounts to accepting the de-facto API spec that the caller
of the TupleHashTable routines is responsible for resetting the
tempcxt adequately often. Although the other callers seem to
get this right, it was not documented anywhere, so add a comment
about it.
Bug: #19040
Reported-by: Haiyang Li <mohen.lhy@alibaba-inc.com>
Author: Haiyang Li <mohen.lhy@alibaba-inc.com>
Reviewed-by: Fei Changhong <feichanghong@qq.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19040-c9b6073ef814f48c@postgresql.org
Backpatch-through: 13
autoconf builds have compiled this file with -ftree-vectorize since
commit 8870917623, but meson builds seem to have missed the memo.
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/aL85CeasM51-0D1h%40nathan
Backpatch-through: 16
All the calls replaced by this commit use 4-byte integers for their
variables used in input of my_log2(). Hence, the limit against
too-large inputs does not really apply. Thresholds are also applied, as
of:
- In nodeAgg.c, the number of partitions is limited by
HASHAGG_MAX_PARTITIONS.
- In nodeHash.c, ExecChooseHashTableSize() caps its maximum number of
buckets based on HashJoinTuple and palloc() allocation limit.
- In worker.c, the number of subxacts tracked by ApplySubXactData uses
uint32, making pg_ceil_log2_64() safe to use directly.
Several approaches have been discussed, like an integration with
thresholds in pg_bitutils.h, but it was found confusing. This uses
Dean's idea, which gives a simpler result than what I came up with to be
able to remove dynahash.h. dynahash.h will be removed in a follow-up
commit, removing some duplication with the ceil log2 routines.
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAEZATCUJPQD_7sC-wErak2CQGNa6bj2hY-mr8wsBki=kX7f2_A@mail.gmail.com
The startup process does not process shared invalidation messages, only
sending them, and never calls AtEOXact_SMgr() which clean up any
unpinned SMgrRelations. Hence, it is never able to free SMgrRelations
on a periodic basis, bloating its hashtable over time.
Like the checkpointer and the bgwriter, this commit takes a conservative
approach by freeing periodically SMgrRelations when replaying a
checkpoint record, either online or shutdown, so as the startup process
has a way to perform a periodic cleanup.
Issue caused by 21d9c3ee4e, so backpatch down to v17.
Author: Jingtang Zhang <mrdrivingduck@gmail.com>
Reviewed-by: Yuhang Qiu <iamqyh@gmail.com>
Discussion: https://postgr.es/m/28C687D4-F335-417E-B06C-6612A0BD5A10@gmail.com
Backpatch-through: 17
This section claims that each backend executes the
shmem_startup_hook shortly after attaching to shared memory, which
is true for EXEC_BACKEND builds, but not for others. This commit
adds this important detail.
Oversight in commit 964152c476.
Reported-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0vEGT1eigGbVt604LkXP6mUPMwPMxQoRCbFny44w%2B9EUQ%40mail.gmail.com
Backpatch-through: 17
Currently, test_slru's shmem_startup_hook unconditionally generates
new LWLock tranche IDs. This is fine on non-EXEC_BACKEND builds,
where only the postmaster executes this hook, but on EXEC_BACKEND
builds, every backend executes it, too. To fix, only generate the
tranche IDs in the postmaster process by checking the
IsUnderPostmaster variable.
This is arguably a bug fix and could be back-patched, but since the
damage is limited to some extra unused tranche IDs in a test
module, I'm not going to bother.
Reported-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0vaAuonaf12CeDddQJu5xKL%2B6xVyS%2B_q1%2BcH%3D33JXV82w%40mail.gmail.com
This adds 3 new variants of the random() function:
random(min date, max date) returns date
random(min timestamp, max timestamp) returns timestamp
random(min timestamptz, max timestamptz) returns timestamptz
Each returns a random value x in the range min <= x <= max.
Author: Damien Clochard <damien@dalibo.info>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/f524d8cab5914613d9e624d9ce177d3d@dalibo.info
Address a potential SIGSEGV that may occur when the tablesync worker
attempts to locate a deleted row while applying changes. This situation
arises during conflict detection for update-deleted scenarios.
To prevent this crash, ensure that the operation is errored out early if
the leader apply worker is unavailable. Since the leader worker maintains
the necessary conflict detection metadata, proceeding without it serves no
purpose and risks reporting incorrect conflict type.
In the passing, improve a nearby comment.
Reported by Tom Lane as per Coverity
Author: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/334468.1757280992@sss.pgh.pa.us
Commit fd6ec93bf8 and other previous work established the
principle that when an error is potentially reachable in case of on-disk
corruption but is not expected to be reached otherwise,
ERRCODE_DATA_CORRUPTED should be used. This allows log monitoring
software to search for evidence of corruption by filtering on the error
code.
Enhance the existing log messages emitted when the heap page is found to
be inconsistent with the VM by adding this error code.
Suggested-by: Andrey Borodin <x4mmm@yandex-team.ru>
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/87DD95AA-274F-4F4F-BAD9-7738E5B1F905%40yandex-team.ru
Commit 161a3e8b68 taught pg_upgrade to use COPY for large object
metadata for upgrades from v12 and newer, which is much faster to
restore than the proper large object commands. For upgrades from
v16 and newer, we can take this a step further and transfer the
large object metadata files as if they were user tables. We can't
transfer the files from older versions because the aclitem data
type (needed by pg_largeobject_metadata.lomacl) changed its storage
format in v16 (see commit 7b378237aa). Note that this commit is
essentially a revert of commit 12a53c732c.
There are a couple of caveats. First, we still need to COPY the
corresponding pg_shdepend rows for large objects. Second, we need
to COPY anything in pg_largeobject_metadata with a comment or
security label, else restoring those will fail. This means that an
upgrade in which every large object has a comment or security label
won't gain anything from this commit, but it should at least avoid
making those unusual use-cases any worse.
pg_upgrade must also take care to transfer the relfilenodes of
pg_largeobject_metadata and its index, as was done for
pg_largeobject in commits d498e052b4 and bbe08b8869.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aJ3_Gih_XW1_O2HF%40nathan
f83d709760 refactored xl_heap_prune and added an unused member,
reason. While PruneReason is used when constructing this WAL record to
set the WAL record definition, it doesn't need to be stored in a
separate field in the record. Remove it.
We won't backport this, since modifying an exposed struct definition to
remove an unused field would do more harm than good.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/tvvtfoxz5ykpsctxjbzxg3nldnzfc7geplrt2z2s54pmgto27y%40hbijsndifu45
rte->alias should point only to a user-written alias, but in these
cases that principle was violated. Fixing this causes some regression
test output changes: wherever rte->alias previously had a value and
is now NULL, rte->eref is now set to a generated name rather than to
rte->alias; and the scheme used to generate eref names differs from
what we were doing for aliases.
The upshot is that instead of "*SELECT*" or "*SELECT* %d",
EXPLAIN will now emit "unnamed_subquery" or "unnamed_subquery_%d".
But that's a reasonable descriptor, and we were already producing
that in yet other cases, so this seems not too objectionable.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Co-authored-by: Robert Haas <rhaas@postgresql.org>
Discussion: https://postgr.es/m/CA+TgmoYSYmDA2GvanzPMci084n+mVucv0bJ0HPbs6uhmMN6HMg@mail.gmail.com
Previously, heap_xlog_visible() called visibilitymap_pin() even after
getting a buffer from XLogReadBufferForRedoExtended() -- which returns a
pinned buffer containing the specified block of the visibility map.
This would just have resulted in visibilitymap_pin() returning early
since the specified page was already present and pinned, but it was
confusing extraneous code, so remove it. It doesn't seem worth
backporting, though.
It appears to be an oversight in 2c03216.
While we are at it, remove two VM-related redundant asserts in the COPY
FREEZE code path. visibilitymap_set() already asserts that
PD_ALL_VISIBLE is set on the heap page and checks that the vmbuffer
contains the bits corresponding to the specified heap block, so callers
do not also need to check this.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reported-by: Melanie Plageman <melanieplageman@gmail.com>
Reported-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CALdSSPhu7WZd%2BEfQDha1nz%3DDC93OtY1%3DUFEdWwSZsASka_2eRQ%40mail.gmail.com
A test has been added to ensure that conflict-relevant data is not
prematurely removed when a concurrent prepared transaction is being
committed on the publisher.
This test introduces an injection point that simulates the presence of a
prepared transaction in the commit phase, validating that the system
correctly delays conflict slot advancement until the transaction is fully
committed.
Additionally, the test serves as a safeguard for developers, ensuring that
the acquisition of the commit timestamp does not occur before marking
DELAY_CHKPT_IN_COMMIT in RecordTransactionCommitPrepared.
Reported-by: Robert Haas <robertmhaas@gmail.com>
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OS9PR01MB16913F67856B0DA2A909788129400A@OS9PR01MB16913.jpnprd01.prod.outlook.com
A new pgstats entry is created as a two-step process:
- The entry is looked at in the shared hashtable of pgstats, and is
inserted if not found.
- When not found and inserted, its fields are then initialized. This
part include a DSA chunk allocation for the stats data of the new entry.
As currently coded, if the DSA chunk allocation fails due to an
out-of-memory failure, an ERROR is generated, leaving in the pgstats
shared hashtable an inconsistent entry due to the first step, as the
entry has already been inserted in the hashtable. These broken entries
can then be found by other backends, crashing them.
There are only two callers of pgstat_init_entry(), when loading the
pgstats file at startup and when creating a new pgstats entry. This
commit changes pgstat_init_entry() so as we use dsa_allocate_extended()
with DSA_ALLOC_NO_OOM, making it return NULL on allocation failure
instead of failing. This way, a backend failing an entry creation can
take appropriate cleanup actions in the shared hashtable before throwing
an error. Currently, this means removing the entry from the shared
hashtable before throwing the error for the allocation failure.
Out-of-memory errors unlikely happen in the wild, and we do not bother
with back-patches when these are fixed, usually. However, the problem
dealt with here is a degree worse as it breaks the shared memory state
of pgstats, impacting other processes that may look at an inconsistent
entry that a different process has failed to create.
Author: Mikhail Kot <mikhail.kot@databricks.com>
Discussion: https://postgr.es/m/CAAi9E7jELo5_-sBENftnc2E8XhW2PKZJWfTC3i2y-GMQd2bcqQ@mail.gmail.com
Backpatch-through: 15
This commit fixes three issues:
1) When a disabled subscription is created with retain_dead_tuples set to true,
the launcher is not woken up immediately, which may lead to delays in creating
the conflict detection slot.
Creating the conflict detection slot is essential even when the subscription is
not enabled. This ensures that dead tuples are retained, which is necessary for
accurately identifying the type of conflict during replication.
2) Conflict-related data was unnecessarily retained when the subscription does
not have a table.
3) Conflict-relevant data could be prematurely removed before applying
prepared transactions on the publisher that are in the commit critical section.
This issue occurred because the backend executing COMMIT PREPARED was not
accounted for during the computation of oldestXid in the commit phase on
the publisher. As a result, the subscriber could advance the conflict
slot's xmin without waiting for such COMMIT PREPARED transactions to
complete.
We fixed this issue by identifying prepared transactions that are in the
commit critical section during computation of oldestXid in commit phase.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OS9PR01MB16913DACB64E5721872AA5C02943BA@OS9PR01MB16913.jpnprd01.prod.outlook.com
Discussion: https://postgr.es/m/OS9PR01MB16913F67856B0DA2A909788129400A@OS9PR01MB16913.jpnprd01.prod.outlook.com
This commit allows to log the raw parse tree in the same way we
currently log the parse tree, rewritten tree, and plan tree.
To avoid unnecessary log noise for users not interested in this
detail, a new GUC option, "debug_print_raw_parse", has been added.
When starting the PostgreSQL process with "-d N", and N is 3 or higher,
debug_print_raw_parse is enabled automatically, alongside
debug_print_parse.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Tatsuo Ishii <ishii@postgresql.org>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2mcO0Gpo4vd8kPMAFWeJLSp0MeUUnaLdE1x0tSVd-VzUw%40mail.gmail.com
This set of changes removes the list of available buffers and instead simply
uses the clock-sweep algorithm to find and return an available buffer. This
also removes the have_free_buffer() function and simply caps the
pg_autoprewarm process to at most NBuffers.
While on the surface this appears to be removing an optimization it is in fact
eliminating code that induces overhead in the form of synchronization that is
problematic for multi-core systems.
The main reason for removing the freelist, however, is not the moderate
improvement in scalability, but that having the freelist would require
dedicated complexity in several upcoming patches. As we have not been able to
find a case benefiting from the freelist...
Author: Greg Burd <greg@burd.me>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/70C6A5B5-2A20-4D0B-BC73-EB09DD62D61C@getmailspring.com
Add an assert to visibilitymap_set() that the provided heap buffer is
exclusively locked, which is expected.
Also, enhance the debug logging message to specify which VM flags were
set.
Based on a related suggestion by Kirill Reshke on an in-progress
patchset.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CALdSSPhAU56g1gGVT0%2BwG8RrSWE6qW8TOfNJS1HNAWX6wPgbFA%40mail.gmail.com
When executing a MERGE UPDATE action, if there is more than one
concurrent update of the target row, the lock-and-retry code would
sometimes incorrectly identify the latest version of the target tuple,
leading to incorrect results.
This was caused by using the ctid field from the TM_FailureData
returned by table_tuple_lock() in a case where the result was TM_Ok,
which is unsafe because the TM_FailureData struct is not guaranteed to
be fully populated in that case. Instead, it should use the tupleid
passed to (and updated by) table_tuple_lock().
To reduce the chances of similar errors in the future, improve the
commentary for table_tuple_lock() and TM_FailureData to make it
clearer that table_tuple_lock() updates the tid passed to it, and most
fields of TM_FailureData should not be relied on in non-failure cases.
An exception to this is the "traversed" field, which is set in both
success and failure cases.
Reported-by: Dmitry <dsy.075@yandex.ru>
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/1570d30e-2b95-4239-b9c3-f7bf2f2f8556@yandex.ru
Backpatch-through: 15
SlruRecentlyUsed() is an inline function since 53c2a97a92, not a
macro. The description of long_segment_names was missing at the top of
SimpleLruInit(), part forgotten in 4ed8f0913b.
Author: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://postgr.es/m/aLpBLMOYwEQkaleF@jrouhaud
Backpatch-through: 17
This commit changes some functions related to the data type numeric to
use the soft error reporting rather than a custom boolean flag (called
"have_error") that callers of these functions could rely on to bypass
the generation of ERROR reports, letting the callers do their own error
handling (timestamp, jsonpath and numeric_to_char() require them).
This results in the removal of some boilerplate code that was required
to handle both the ereport() and the "have_error" code paths bypassing
ereport(), unifying everything under the soft error reporting facility.
While on it, some duplicated error messages are removed. The function
upgraded in this commit were suffixed with "_opt_error" in their names.
They are renamed to "_safe" instead.
This change relies on d9f7f5d32f, that has introduced the soft error
reporting infrastructure.
Author: Amul Sul <sulamul@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAAJ_b96No5h5tRuR+KhcC44YcYUCw8WAHuLoqqyyop8_k3+JDQ@mail.gmail.com
pg_lsn includes pg_lsn_in_internal() for the purpose of parsing a LSN
position for the GUC recovery_target_lsn (21f428ebde). It relies on a
boolean called "have_error" that would be set when the LSN parsing
fails, then let its callers handle any errors.
d9f7f5d32f has added support for soft error reporting. This commit
removes some boilerplate code and switches the routine to use soft error
reporting directly, giving to the callers of pg_lsn_in_internal()
the possibility to be fed the error message generated on failure.
pg_lsn_in_internal() routine is renamed to pg_lsn_in_safe(), for
consistency with other similar routines that are given an escontext.
Author: Amul Sul <sulamul@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAAJ_b96No5h5tRuR+KhcC44YcYUCw8WAHuLoqqyyop8_k3+JDQ@mail.gmail.com
Commit 38b602b028 modified this function to allocate enough space
for MAX_NAMED_TRANCHES (256) requests, which is likely far more
than most clusters need. This commit reverts that change so that
it first allocates enough space for only 16 requests and resizes
the array when necessary. While at it, remove the check for too
many tranches from this function. We can now rely on
InitializeLWLocks() to do that check via its calls to
LWLockNewTrancheId() for the named tranches.
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/aLmzwC2dRbqk14y6%40nathan
When executing a MERGE, check that the target relation supports all
actions mentioned in the MERGE command. Specifically, check that it
has a REPLICA IDENTITY if it publishes updates or deletes and the
MERGE command contains update or delete actions. Failing to do this
can silently break replication.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Tested-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/OS3PR01MB57180C87E43A679A730482DF94B62@OS3PR01MB5718.jpnprd01.prod.outlook.com
Backpatch-through: 15
If an INSERT has an ON CONFLICT DO UPDATE clause, the executor must
check that the target relation supports UPDATE as well as INSERT. In
particular, it must check that the target relation has a REPLICA
IDENTITY if it publishes updates. Formerly, it was not doing this
check, which could lead to silently breaking replication.
Fix by adding such a check to CheckValidResultRel(), which requires
adding a new onConflictAction argument. In back-branches, preserve ABI
compatibility by introducing a wrapper function with the original
signature.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Tested-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/OS3PR01MB57180C87E43A679A730482DF94B62@OS3PR01MB5718.jpnprd01.prod.outlook.com
Backpatch-through: 13
The counters saved from pgWalUsage, used for the difference calculations
when flushing the backend WAL stats, are updated when calling
pgstat_flush_backend() under PGSTAT_BACKEND_FLUSH_WAL, and not
pgstat_report_wal(). The comment updated in this commit referenced the
latter, but it is perfectly OK to flush the backend stats independently
of the WAL stats.
Noticed while looking at this area of the code, introduced by
76def4cdd7 as a copy-pasto.
Backpatch-through: 18
There are many places in this test program that need to consume a
PGresult while checking that its PQresultStatus is as-expected, or
related tasks such as checking that PQgetResult has nothing more to
return. These tasks were open-coded in a rather inconsistent way,
leading to some outright bugs, some memory leakage, and frequent
inconsistencies about what would be reported in event of an error.
Invent a few helper functions to standardize the behavior and
reduce code duplication. Also, rename the one pre-existing helper
function from confirm_query_canceled to consume_query_cancel, per
Álvaro's suggestion that "confirm" is a poor choice of verb for a
function that will discard the PGresult.
While at it, clean up assorted other places that were leaking
PGresults or even server connections. This is pure neatnik-ism,
since the test doesn't run long enough for those leaks to be of
any real-world concern.
While this fixes some things that are clearly bugs, it's only
a test program, and none of the bugs seem serious enough to
justify back-patching.
Bug: #18960
Reported-by: Dmitry Kovalenko <d.kovalenko@postgrespro.ru>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/18960-09cd4a5100152e58@postgresql.org
There are two ways for shared libraries to allocate their own
LWLock tranches. One way is to call RequestNamedLWLockTranche() in
a shmem_request_hook, which requires the library to be loaded via
shared_preload_libraries. The other way is to call
LWLockNewTrancheId(), which is not subject to the same
restrictions. However, LWLockNewTrancheId() does require each
backend to store the tranche's name in backend-local memory via
LWLockRegisterTranche(). This API is a little cumbersome and leads
to things like unhelpful pg_stat_activity.wait_event values in
backends that haven't loaded the library.
This commit moves these LWLock tranche names to shared memory, thus
eliminating the need for each backend to call
LWLockRegisterTranche(). Instead, the tranche name must be
provided to LWLockNewTrancheId(), which immediately makes the name
available to all backends. Since the tranche name array is
append-only, lookups can ordinarily avoid locking as long as their
local copy of the LWLock counter is greater than the requested
tranche ID.
One downside of this approach is that we now have a hard limit on
both the length of tranche names (NAMEDATALEN-1 bytes) and the
number of dynamically-allocated tranches (256). Besides a limit of
NAMEDATALEN-1 bytes for tranche names registered via
RequestNamedLWLockTranche(), no such limits previously existed. We
could avoid these new limits by using dynamic shared memory, but
the complexity involved didn't seem worth it. We briefly
considered making the tranche limit user-configurable but
ultimately decided against that, too. Since there is still a lot
of time left in the v19 development cycle, it's possible we will
revisit this choice.
Author: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAA5RZ0vvED3naph8My8Szv6DL4AxOVK3eTPS0qXsaKi%3DbVdW2A%40mail.gmail.com
Meson's "auto" feature mode silently disables features with missing
prerequisites, which is nice for development but can lead to false
positives in the CI (such as my commit b0635bfda, which broke OAuth
detection on OpenBSD). Use an explicit feature list in the Cirrus config
instead; this mirrors the --with-XXX experience of Autoconf.
While we're here, also move common configuration options into a single
variable, MESON_COMMON_PG_CONFIG_ARGS, as suggested by Peter. The
resulting hierarchy is as follows:
MESON_COMMON_PG_CONFIG_ARGS "global" Meson configuration options
MESON_COMMON_FEATURES the default set of CI features, to be used
unless there's a specific reason not to
MESON_FEATURES per-OS feature configuration, overriding
the above
The current exceptions to the use of MESON_COMMON_FEATURES are
- SanityCheck, which uses almost no dependencies;
- Windows - VS, whose feature list has diverged significantly from the
others; and
- Linux, which continues to use 'auto' features so that autodetection is
still tested in the CI. (Options shared between 64- and 32-bit builds
can go into LINUX_MESON_FEATURES instead.)
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Suggested-by: Jacob Champion <jacob.champion@enterprisedb.com>
Suggested-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/flat/CAN55FZ0aO8d_jkyRijcGP8qO%3DXH09qG%3Dpw0ZZDvB4LMzuXYU1w%40mail.gmail.com
Commit 6359989654 had it so that the parameter "debug_discard_caches"
did not exist unless DISCARD_CACHES_ENABLED was defined (typically via
enabling asserts). This was a mistake, it did not correspond to the
prior setup. Several tests use this parameter, so they were now
failing if you did not have asserts enabled.
Store the information in guc_tables.c in a .dat file similar to the
catalog data in src/include/catalog/, and generate a part of
guc_tables.c from that. The goal is to make it easier to edit that
information, and to be able to make changes to the downstream data
structures more easily. (Essentially, those are the same reasons as
for the original adoption of the .dat format.)
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: David E. Wheeler <david@justatheory.com>
Discussion: https://www.postgresql.org/message-id/flat/dae6fe89-1e0c-4c3f-8d92-19d23374fb10%40eisentraut.org
SubPlan nodes are typically built very early, before any RelOptInfos
have been constructed for the parent query level. As a result, the
simple_rel_array in the parent root has not yet been initialized.
Currently, during cost estimation of a SubPlan's testexpr, we may call
examine_variable() to look up statistical data about the expressions.
This can lead to "no relation entry for relid" errors.
To fix, pass root as NULL to cost_qual_eval() in cost_subplan(), since
the root does not yet contain enough information to safely consult
statistics.
One exception is SubPlan nodes built for the initplans of MIN/MAX
aggregates from indexes. In this case, having a NULL root is safe
because testexpr will be NULL. Additionally, an initplan will by
definition not consult anything from the parent plan.
Backpatch to all supported branches. Although the reported call path
that triggers this error is not reachable prior to v17, there's no
guarantee that other code paths -- especially in extensions -- could
not encounter the same issue when cost_qual_eval() is called with a
root that lacks a valid simple_rel_array. The test case is not
included in pre-v17 branches though.
Bug: #19037
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19037-3d1c7bb553c7ce84@postgresql.org
Backpatch-through: 13
PQtrace() was generating its output for non-printable characters without
casting the characters printed with unsigned char, leading to some extra
"\xffffff" generated in the output due to the fact that char may be
signed.
Oversights introduced by commit 198b3716db, so backpatch down to v14.
Author: Ran Benita <ran@unusedvar.com>
Discussion: https://postgr.es/m/a3383211-4539-459b-9d51-95c736ef08e0@app.fastmail.com
Backpatch-through: 14
SLRU bank locks are referred as "bank locks" or "SLRU bank locks" in the
code comments. The comments updated in this commit use the latter term.
Oversight in 53c2a97a92, that has replaced the single ControlLock by
the bank control locks.
Author: Julien Rouhaud <julien.rouhaud@free.fr>
Discussion: https://postgr.es/m/aLUT2UO8RjJOzZNq@jrouhaud
Backpatch-through: 17
COPY TO does not support a WHERE clause, and currently fails with the error:
ERROR: WHERE clause not allowed with COPY TO
Since the intended behavior can be achieved by using
COPY (SELECT ... WHERE ...) TO, this commit adds a HINT
to the error message:
HINT: Try the COPY (SELECT ... WHERE ...) TO variant.
This makes the error more informative and helps users
quickly find the alternative usage.
Author: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Discussion: https://postgr.es/m/3520c224c5ffac0113aef84a9179f37e@oss.nttdata.com
Note that this doesn't require bumping SLOT_VERSION because we
require sizeof(bool) == 1, thanks to commit 97525bc5c8.
Overight in commit ddd5f4f54a.
Discussion: Ranier Vilela <ranier.vf@gmail.com>
Previously, duplicate labels in CREATE TYPE AS ENUM were caught by
the unique index on pg_enum, resulting in a generic error message.
While this was evidently intentional, it's not terribly user-friendly,
nor consistent with the ALTER TYPE cases which take more care with
such errors. This patch adds an explicit check to produce a more
user-friendly and descriptive error message.
A potential objection to this implementation is that it adds O(N^2)
work to the creation operation. However, quick testing finds that
that's pretty negligible below 1000 enum labels, and tolerable even
at 10000. So it doesn't really seem worth being smarter.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20250704000402.37e605ab0c59c300965a17ee@sraoss.co.jp
This change replaces seven functions definitions by macros, reducing a
bit some repetitive patterns in the code. An interesting side effect is
that this removes an inconsistency in the naming of SLRU increment
functions with the field names.
This change is similar to 850f4b4c8c, 8018ffbf58 or 83a1a1b566.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aLHA//gr4dTpDHHC@ip-10-97-1-34.eu-west-3.compute.internal
This commit introduces a new subscription parameter,
max_retention_duration, aimed at mitigating excessive accumulation of dead
tuples when retain_dead_tuples is enabled and the apply worker lags behind
the publisher.
When the time spent advancing a non-removable transaction ID exceeds the
max_retention_duration threshold, the apply worker will stop retaining
conflict detection information. In such cases, the conflict slot's xmin
will be set to InvalidTransactionId, provided that all apply workers
associated with the subscription (with retain_dead_tuples enabled) confirm
the retention duration has been exceeded.
To ensure retention status persists across server restarts, a new column
subretentionactive has been added to the pg_subscription catalog. This
prevents unnecessary reactivation of retention logic after a restart.
The conflict detection slot will not be automatically re-initialized
unless a new subscription is created with retain_dead_tuples = true, or
the user manually re-enables retain_dead_tuples.
A future patch will introduce support for automatic slot re-initialization
once at least one apply worker confirms that the retention duration is
within the configured max_retention_duration.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
Several statements need to reference the current connection's current
database name and current port value. Until now, this has been
accomplished by creating dynamic SQL statements inside of a DO block,
which is not as easy to parse. It also takes away some of the
granularity of any error messages that might occur, making debugging
harder.
By capturing the connection-specific settings into psql variables, it
becomes possible to write simpler SQL statements for the FDW objects.
This eliminates most of DO blocks used in this test, making it a bit
more readable and shorter.
Author: Author: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/CADkLM=cpUiJ3QF7aUthTvaVMmgQcm7QqZBRMDLhBRTR+gJX-Og@mail.gmail.com
Constraint expressions and statistics expressions loaded from the
system catalogs need to be run through const-simplification, because
the planner will be comparing them to similarly-processed qual
clauses. Without this step, the planner may fail to detect valid
matches.
Currently, NullTest clauses in these expressions may not be reduced
correctly during const-simplification. This happens because their Var
nodes do not yet have the correct varno when eval_const_expressions is
applied. Since eval_const_expressions relies on varno to reduce
NullTest quals, incorrect varno can cause problems.
Additionally, for statistics expressions, eval_const_expressions is
called with root set to NULL, which also inhibits NullTest reduction.
This patch fixes the issue by ensuring that Vars are updated to have
the correct varno before const-simplification, and that a valid root
is passed to eval_const_expressions when needed.
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/19007-4cc6e252ed8aa54a@postgresql.org
A proposed patch would place a limit of NAMEDATALEN-1 (i.e., 63)
bytes on the names of dynamically-allocated LWLock tranches, but
GetNamedDSA() and GetNamedDSHash() may register tranches with
longer names. This commit lowers the maximum DSM registry entry
name length to NAMEDATALEN-1 bytes and modifies GetNamedDSHash() to
create only one tranche, thereby allowing us to keep the DSM
registry's tranche names below NAMEDATALEN bytes.
Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/aKzIg1JryN1qhNuy%40nathan
Show the requested lock level and the object being waited on,
in the same format we use for deadlock reports and similar errors.
This is particularly helpful for debugging lock-timeout errors,
since otherwise the user has very little to go on about which
lock timed out. The performance cost of setting up the callback
should be negligible compared to the other tracing support already
present in WaitOnLock.
As in the deadlock-report case, we just show numeric object OIDs,
because it seems too scary to try to perform catalog lookups
in this context.
Reported-by: Steve Baldwin <steve.baldwin@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1602369.1752167154@sss.pgh.pa.us
Compression in pg_dump is abstracted using an API with multiple
implementations which can be selected at runtime by the user.
The API and its implementations have evolved over time, notable
commits include bf9aa490db, e9960732a9, 84adc8e20, and 0da243fed.
The errorhandling defined by the API was however problematic and
the implementations had a few bugs and/or were not following the
API specification. This commit modifies the API to ensure that
callers can perform errorhandling efficiently and fixes all the
implementations such that they all implement the API in the same
way. A full list of the changes can be seen below.
* write_func:
- Make write_func throw an error on all error conditions. All
callers of write_func were already checking for success and
calling pg_fatal on all errors, so we might as well make the
API support that case directly with simpler errorhandling as
a result.
* open_func:
- zstd: move stream initialization from the open function to
the read and write functions as they can have fatal errors.
Also ensure to dup the file descriptor like none and gzip.
- lz4: Ensure to dup the file descriptor like none and gzip.
* close_func:
- zstd: Ensure to close the file descriptor even if closing
down the compressor fails, and clean up state allocation on
fclose failures. Make sure to capture errors set by fclose.
- lz4: Ensure to close the file descriptor even if closing
down the compressor fails, and instead of calling pg_fatal
log the failures using pg_log_error. Make sure to capture
errors set by fclose.
- none: Make sure to catch errors set by fclose.
* read_func / gets_func:
- Make read_func unconditionally return the number of read
bytes instead of making it optional per implementation.
- lz4: Make sure to call throw an error and not return -1
- gzip: gzread returning zero cannot be assumed to indicate
EOF as it is documented to return zero for some types of
errors.
- lz4, zstd: Convert the _read_internal helper functions to
not call pg_fatal on errors to be able to handle gets_func
returning NULL on error.
* getc_func:
- zstd: Use an unsigned char rather than an int to read char
into.
* LZ4Stream_init:
- Make sure to not switch to inited state until we know that
initialization succeeded and reset errno just in case.
On top of these changes there are minor comment cleanups and
improvements as well as an attempt to consistently reset errno
in codepaths where it is inspected.
This work was initiated by a report of API misuse, which turned
into a larger body of work. As this is an internal API these
changes can be backpatched into all affected branches.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Evgeniy Gorbanev <gorbanyoves@basealt.ru>
Discussion: https://postgr.es/m/517794.1750082166@sss.pgh.pa.us
Backpatch-through: 16
Using the LWLockCounter requires first calculating its address in
shared memory like this:
LWLockCounter = (int *) ((char *) MainLWLockArray - sizeof(int));
Commit 82e861fbe1 started this trend in order to fix EXEC_BACKEND
builds, but it could also be fixed by adding it to the
BackendParameters struct. The current approach is somewhat
difficult to follow, so this commit switches to the latter. While
at it, swap around the code in LWLockShmemSize() to match the order
of assignments in CreateLWLocks() for added readability.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aLDLnan9gNCS9fHx%40nathan
Newer gcc versions will emit warnings about missing extern
declarations if certain header files are compiled by themselves.
Add the "extern" declarations needed to quiet that.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/1127775.1754417387@sss.pgh.pa.us
It's possible that if the only live partition is concurrently dropped
and try_table_open() fails, that the bms_del_member() will pfree the
live_parts Bitmapset. Since the bms_del_member() call does not assign
the result back to the live_parts local variable, the while loop could
segfault as that variable would still reference the pfree'd Bitmapset.
Backpatch to 15. 52f3de874 was backpatched to 14, but there's no
bms_del_member() there due to live_parts not yet existing in RelOptInfo in
that version. Technically there's no bug in version 15 as
bms_del_member() didn't pfree when the set became empty prior to
00b41463c (from v16). Applied to v15 anyway to keep the code similar and
to avoid the bad coding pattern.
Author: Bernd Reiß <bd_reiss@gmx.at>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/6b88f27a-c45c-4826-8e37-d61a04d90182@gmx.at
Backpatch-through: 15
I think the error message for a different condition was inadvertently
copied.
This problem seems to have been introduced by commit a4d75c86bf.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reported-by: jian he <jian.universality@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 14
Discussion: https://postgr.es/m/CACJufxEZ48toGH0Em_6vdsT57Y3L8pLF=DZCQ_gCii6=C3MeXw@mail.gmail.com
Ignore src/include/port/win32/sys/resource.h. At least on macOS,
including this results in warnings and errors because of duplication
with system headers:
../src/include/port/win32/sys/resource.h:10:9: warning: 'RUSAGE_CHILDREN' redefined
../src/include/port/win32/sys/resource.h:16:1: error: redefinition of struct or union 'struct rusage'
Since we are also not checking similar system-replacement headers for
Windows, it makes sense to exclude this one, too.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/1127775.1754417387%40sss.pgh.pa.us
Otherwise, headerscheck will fail if the ICU headers are in a location
not reached by the normal CFLAGS/CPPFLAGS:
../src/include/utils/pg_locale.h:21:10: fatal error: unicode/ucol.h: No such file or directory
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/1127775.1754417387%40sss.pgh.pa.us
BufferGetPage() already returns type Page, so casting it to Page
doesn't achieve anything. A sizable number of call sites does this
casting; remove that.
This was already done inconsistently in the code in the first import
in 1996 (but didn't exist in the pre-1995 code), and it was then
apparently just copied around.
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/CALdSSPgFhc5=vLqHdk-zCcnztC0zEY3EU_Q6a9vPEaw7FkE9Vw@mail.gmail.com
For a child relation, we should not assume that its parent's
unique-ified relation (or unique-ified path in v18) always exists. In
cases where all RHS columns that need to be unique-ified are equated
to constants, the unique-ified relation/path for the parent table is
not built, as there are no columns left to unique-ify. Failing to
account for this can result in a SIGSEGV crash during planning.
This patch checks whether the parent's unique-ified relation or path
exists and skips unique-ification of the child relation if it does
not.
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49MOdLW2c+qbLHHBt8VBu=4ONpM91D19=AWeW93eFUF6A@mail.gmail.com
Backpatch-through: 18
Previously, we used LW_EXCLUSIVE in several places despite only reading
WalSummarizerCtl fields. This patch reduces the lock level to LW_SHARED
where we are only reading the shared fields.
Backpatch to 17, where wal summarization was introduced.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/CAD21AoDdKhf_9oriEYxY-JCdF+Oe_muhca3pcdkMEdBMzyHyKw@mail.gmail.com
Backpatch-through: 17
One mechanism we have for implementing semi-joins is to de-duplicate
the output of the RHS and then treat the join as a plain inner join.
Initial construction of the join's SpecialJoinInfo identifies the
RHS columns that need to be de-duplicated, but later we may find that
some of those don't need to be handled explicitly, either because
they're known to be constant or because they are redundant with some
previous column.
Up to now, while sort-based de-duplication handled such cases well,
hash-based de-duplication didn't: we'd still hash on all of the
originally-identified columns. This is probably not a very big
deal performance-wise, but in the wake of commit a3179ab69 it can
cause planner errors. That happens when join elimination causes
recalculation of variables' attr_needed bitmapsets, and we decide
that a variable mentioned in a semijoin clause doesn't need to be
propagated up to the join level anymore.
There are a number of ways we could slice the blame for this, but the
only fix that doesn't result in pessimizing plans for loosely-related
cases is to be more careful about not hashing columns we don't
actually need to de-duplicate. We can install that consideration
into create_unique_paths in master, or the predecessor code in
create_unique_path in v18, without much refactoring.
(As follow-up work, it might be a good idea to look at more-invasive
refactoring, in hopes of preventing other bugs in this area. But
with v18 release so close, there's not time for that now, nor would
we be likely to want to put such refactoring into v18 anyway.)
Reported-by: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Diagnosed-by: Richard Guo <guofenglinux@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/1fd1a421-4609-4d46-a1af-ab74d5de504a@tantorlabs.ru
Backpatch-through: 18
This has been done historically because of get_database_name (which
since commit cb98e6fb8f belongs in lsyscache.c/h, so let's move it
there) and get_database_oid (which is in the right place, but whose
declaration should appear in pg_database.h rather than dbcommands.h).
Clean this up.
Also, xlogreader.h and stringinfo.h are no longer needed by dbcommands.h
since commit f1fd515b39, so remove them.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/202508191031.5ipojyuaswzt@alvherre.pgsql
During an investigation into rather odd aio related errors on macos, observed
by Alexander and Konstantin, we started to wonder if bitfield access is
related to the error. At the moment it looks like it is related, we cannot
reproduce the failures when replacing the bitfields. In addition, the problem
can only be reproduced with some compiler [versions] and not everyone has been
able to reproduce the issue.
The observed problem is that, very rarely, PgAioHandle->{state,target} are in
an inconsistent state, after having been checked to be in a valid state not
long before, triggering an assertion failure. Unfortunately, this could be
caused by wrong compiler code generation or somehow of missing memory barriers
- we don't really know. In theory there should not be any concurrent write
access to the handle in the state the bug is triggered, as the handle was idle
and is just being initialized.
Separately from the bug, we observed that at least gcc and clang generate
rather terrible code for the bitfield access. Even if it's not clear if the
observed assertion failure is actually caused by the bitfield somehow, the bad
code generation alone is sufficient reason to stop using bitfields.
Therefore, replace the enum bitfields with uint8s and instead cast in each
switch statement.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reported-by: Konstantin Knizhnik <knizhnik@garret.ru>
Discussion: https://postgr.es/m/1500090.1745443021@sss.pgh.pa.us
Backpatch-through: 18
Commit d31bbfb659 lost some test coverage, because the situation
being tested, a concurrent DROP, cannot happen anymore. Put the test
coverage back with a bit of a trick, by deleting directly from the
catalog table.
Co-authored-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/flat/bf72b82c-124d-4efa-a484-bb928e9494e4@eisentraut.org
Commit d31bbfb659 removed the comment at objectNamesToOids() that
there is no locking, because that commit added locking. But to fix
all the problems, we'd still need a stronger lock. So put the comment
back with more a detailed explanation.
Co-authored-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/flat/bf72b82c-124d-4efa-a484-bb928e9494e4@eisentraut.org
This patch fixes a bug in how 'load_external_function' handles
'$libdir/ prefixes in module paths.
Previously, 'load_external_function' would unconditionally strip
'$libdir/' from the beginning of the 'filename' string. This caused
an issue when the path was nested, such as "$libdir/nested/my_lib".
Stripping the prefix resulted in a path of "nested/my_lib", which
would fail to be found by the expand_dynamic_library_name function
because the original '$libdir' macro was removed.
To fix this, the code now checks for the presence of an additional
directory separator ('/' or '\') after the '$libdir/' prefix. The
prefix is only stripped if the remaining string does not contain a
separator. This ensures that simple filenames like '"$libdir/my_lib"'
are correctly handled, while nested paths are left intact for
'expand_dynamic_library_name' to process correctly.
Reported-by: Dilip Kumar <dilipbalaut@gmail.com>
Co-authored-by: Matheus Alcantara <matheusssilv97@gmail.com>
Co-authored-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAFiTN-uKNzAro4tVwtJhF1UqcygfJ%2BR%2BRL%3Db-_ZMYE3LdHoGhA%40mail.gmail.com
When checking for expression indexes that may be affected by a Unicode
update during upgrade, check for a few more functions. Specifically,
check for documented regexp functions, as well as the new CASEFOLD()
function.
Also, fully-qualify references to pg_catalog.text and
pg_catalog.regtype.
Discussion: https://postgr.es/m/399b656a3abb0c9283538a040f72199c0601525c.camel@j-davis.com
Backpatch-through: 18
Followup to 4e1e41733 and 52ecd05ae. oauth-utils.c uses
pthread_sigmask(), requiring -pthread on Debian bullseye at minimum.
Reported-by: Christoph Berg <myon@debian.org>
Tested-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/aK4PZgC0wuwQ5xSK%40msg.df7cb.de
Backpatch-through: 18
When vacuumdb's --missing-stats-only option is used, the catalog
query for retrieving the list of relations to process must read
pg_statistic and pg_statistic_ext_data. However, those catalogs
can only be read by superusers by default, so --missing-stats-only
is effectively superuser-only. This is unfortunate, but since the
option is primarily intended for use by administrators after
running pg_upgrade, let's just live with it for v18. This commit
adds a note about the aforementioned privilege requirements to the
documentation for --missing-stats-only.
We first tried to improve matters by modifying the query to read
the pg_stats and pg_stats_ext system views instead. While that is
indeed more lenient from a privilege standpoint, it is also
borderline incomprehensible. pg_stats shows rows for which the
user has the SELECT privilege on the corresponding column, and
pg_stats_ext shows rows for tables the user owns. Meanwhile,
ANALYZE requires either MAINTAIN on the table or, for non-shared
relations, ownership of the database. But even if the privilege
discrepancies were tolerable, the performance impact was not.
Ultimately, the modified query was substantially more expensive, so
we abandoned the idea.
For v19, perhaps we could introduce a simple, inexpensive way to
discover which relations are missing statistics, such as a system
function or view with similar privilege requirements to ANALYZE.
Unfortunately, it is far too late for anything like that in v18.
Reviewed-by: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwHh43suEfss1wvBsk7vqiou%3DUY0zcy8HGyE5hBp%2BHZ7SQ%40mail.gmail.com
Backpatch-through: 18
Commit 4b754d6c1 introduced the concept of an excludeOnly scan key,
which cannot select matching index entries but can reject
non-matching tuples, for example a tsquery such as '!term'. There are
poorly-documented assumptions that such scan keys do not appear as the
first scan key. ginNewScanKey did nothing to ensure that, however,
with the result that certain GIN index searches could go into an
infinite loop while apparently-equivalent queries with the clauses in
a different order were fine.
Fix by teaching ginNewScanKey to place all excludeOnly scan keys
after all not-excludeOnly ones. So far as we know at present,
it might be sufficient to avoid the case where the very first
scan key is excludeOnly; but I'm not very convinced that there
aren't other dependencies on the ordering.
Bug: #19031
Reported-by: Tim Wood <washwithcare@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19031-0638148643d25548@postgresql.org
Backpatch-through: 13
The CHECK_FOR_INTERRUPTS call in gingetbitmap turns out to be
inadequate to prevent a long uninterruptible loop, because
we now know a case where looping occurs within scanGetItem.
While the next patch will fix the bug that caused that, it
seems foolish to assume that no similar patterns are possible.
Let's do the CFI within scanGetItem's retry loop, instead.
This demonstrably allows canceling out of the loop exhibited
in bug #19031.
Bug: #19031
Reported-by: Tim Wood <washwithcare@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19031-0638148643d25548@postgresql.org
Backpatch-through: 13
The Self-Join Elimination SJE feature messes up keeping and removing RowMark's
in remove_self_joins_one_group(). That didn't lead to user-level error,
because the planned RowMark is only used to reference a rtable entry in later
execution stages. An RTE entry for keeping and removing relations is
identical and refers to the same relation OID.
To reduce confusion and prevent future issues, this commit cleans up the code
and fixes the incorrect behaviour. Furthermore, it includes sanity checks in
setrefs.c on existing non-null RTE and RelOptInfo entries for each RowMark.
Discussion: https://postgr.es/m/18c6bd6c-6d2a-419a-b0da-dfedef34b585%40gmail.com
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Backpatch-through: 18
Rename inner and outer to rrel and krel, respectively, to highlight their
connection to r and k indexes. For the same reason, rename imark and omark
to rmark and kmark.
Discussion: https://postgr.es/m/18c6bd6c-6d2a-419a-b0da-dfedef34b585%40gmail.com
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Backpatch-through: 18
This is a follow-up for commit c2c2c7e225. It further clarifies the
following in the initcap function documentation:
* Document that title case is used for digraphs in specific locales,
* Reference particular ICU function used,
* Add note about the purpose of the function.
Discussion: https://postgr.es/m/804cc10ef95d4d3b298e76b181fd9437%40postgrespro.ru
Author: Oleg Tselebrovskiy <o.tselebrovskiy@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
This changes configure and meson.build to require at least C11,
instead of the previous C99. The installation documentation is
updated accordingly.
configure.ac previously used AC_PROG_CC_C99 to activate C99. But
there is no AC_PROG_CC_C11 in Autoconf 2.69, because it's too
old. (Also, post-2.69, the AC_PROG_CC_Cnn macros were deprecated and
AC_PROG_CC activates the last supported C mode.) We could update the
required Autoconf version, but that might be a separate project that
no one wants to undertake at the moment. Instead, we open-code the
test for C11 using some inspiration from later Autoconf versions. But
instead of writing an elaborate test program, we keep it simple and
just check __STDC_VERSION__, which should be good enough in practice.
In meson.build, we update the existing C99 test to C11, but again we
just check for __STDC_VERSION__.
This also removes the separate option for the conforming preprocessor
on MSVC, added by commit 8fd9bb1d96, since that is activated
automatically in C11 mode.
Note, we don't use the "official" way to set the C standard in Meson
using the c_std project option, because that is impossible to use
correctly (see <https://github.com/mesonbuild/meson/issues/14717>).
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/01a69441-af54-4822-891b-ca28e05b215a@eisentraut.org
To better record the internal behaviors of oauth-curl.c, add a unit test
suite for the socket and timer handling code. This is all based on TAP
and driven by our existing Test::More infrastructure.
This commit is a replay of 1443b6c0e, which was reverted due to
buildfarm failures. Compared with that, this version protects the build
targets in the Makefile with a with_libcurl conditional, and it tweaks
the code style in 001_oauth.pl.
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CAOYmi+nDZxJHaWj9_jRSyf8uMToCADAmOfJEggsKW-kY7aUwHA@mail.gmail.com
Discussion: https://postgr.es/m/CAOYmi+m=xY0P_uAzAP_884uF-GhQ3wrineGwc9AEnb6fYxVqVQ@mail.gmail.com
libpq-oauth uses floor() but did not link against libm. Since libpq
itself uses -lm, nothing in the buildfarm has had problems with
libpq-oauth yet, and it seems difficult to hit a failure in practice.
But commit 1443b6c0e attempted to add an executable based on
libpq-oauth, which ran into link-time failures with Clang due to this
omission. It seems prudent to fix this for both the module and the
executable simultaneously so that no one trips over it in the future.
This is a Makefile-only change. The Meson side already pulls in libm,
through the os_deps dependency.
Discussion: https://postgr.es/m/CAOYmi%2Bn6ORcmV10k%2BdAs%2Bp0b9QJ4bfpk0WuHQaF5ODXxM8Y36A%40mail.gmail.com
Backpatch-through: 18
v17 introduced the MAINTAIN ON TABLES privilege. That changed the
applicable "baseacls" reaching buildACLCommands(). That yielded
spurious TestUpgradeXversion diffs. Change to use a TYPES privilege.
Types have the same one privilege in all supported versions, so they
avoid the problem. Per buildfarm. Back-patch to v13, like that commit.
Discussion: https://postgr.es/m/20250823144505.88.nmisch@google.com
Backpatch-through: 13
Commit 0decd5e89d missed DO_DEFAULT_ACL,
leading to assertion failures, potential dump order instability, and
spurious schema diffs. Back-patch to v13, like that commit.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/d32aaa8d-df7c-4f94-bcb3-4c85f02bea21@gmail.com
Backpatch-through: 13
This reverts commit bc22dc0e0d.
It appears that conditional variables are not suitable for use inside
critical sections. If WaitLatch()/WaitEventSetWaitBlock() face postmaster
death, they exit, releasing all locks instead of PANIC. In certain
situations, this leads to data corruption.
Reported-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/B3C69B86-7F82-4111-B97F-0005497BB745%40yandex-team.ru
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Yura Sokolov <y.sokolov@postgrespro.ru>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Backpatch-through: 18
Statistics aren't created for virtual generated columns, so
"vacuumdb --missing-stats-only" always chooses to analyze tables
that have them. To fix, modify vacuumdb's query for retrieving
relations that are missing statistics to exclude those columns.
Oversight in commit edba754f05.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/20250820104226.8ba51e43164cd590b863ce41%40sraoss.co.jp
Backpatch-through: 18
The protocol documentation states that the maximum length of a cancel
key is 256 bytes. This starts checking for that limit in libpq.
Otherwise third party backend implementations will probably start
using more bytes anyway. We also start requiring that a protocol 3.0
connection does not send a longer cancel key, to make sure that
servers don't start breaking old 3.0-only clients by accident. Finally
this also restricts the minimum key length to 4 bytes (both in the
protocol spec and in the libpq implementation).
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Jacob Champion <jchampion@postgresql.org>
Discussion: https://www.postgresql.org/message-id/df892f9f-5923-4046-9d6f-8c48d8980b50@iki.fi
Backpatch-through: 18
In most cases, if an out-of-memory situation happens, we attach the
error message to the connection and report it at the next
PQgetResult() call. However, there are a few cases, while processing
messages that are not associated with any particular query, where we
handled failed allocations differently and not very nicely:
- If we ran out of memory while processing an async notification,
getNotify() either returned EOF, which stopped processing any
further data until more data was received from the server, or
silently dropped the notification. Returning EOF is problematic
because if more data never arrives, e.g. because the connection was
used just to wait for the notification, or because the next
ReadyForQuery was already received and buffered, it would get stuck
forever. Silently dropping a notification is not nice either.
- (New in v18) If we ran out of memory while receiving BackendKeyData
message, getBackendKeyData() returned EOF, which has the same issues
as in getNotify().
- If we ran out of memory while saving a received a ParameterStatus
message, we just skipped it. A later call to PQparameterStatus()
would return NULL, even though the server did send the status.
Change all those cases to terminate the connnection instead. Our
options for reporting those errors are limited, but it seems better to
terminate than try to soldier on. Applications should handle
connection loss gracefully, whereas silently missing a notification,
parameter status, or cancellation key could cause much weirder
problems.
This also changes the error message on OOM while expanding the input
buffer. It used to report "cannot allocate memory for input buffer",
followed by "lost synchronization with server: got message type ...".
The "lost synchronization" message seems unnecessary, so remove that
and report only "cannot allocate memory for input buffer". (The
comment speculated that the out of memory could indeed be caused by
loss of sync, but that seems highly unlikely.)
This evolved from a more narrow patch by Jelte Fennema-Nio, which was
reviewed by Jacob Champion.
Somewhat arbitrarily, backpatch to v18 but no further. These are
long-standing issues, but we haven't received any complaints from the
field. We can backpatch more later, if needed.
Co-authored-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Jacob Champion <jchampion@postgresql.org>
Discussion: https://www.postgresql.org/message-id/df892f9f-5923-4046-9d6f-8c48d8980b50@iki.fi
Backpatch-through: 18
Commit 1585ff7387 changed GetTransactionSnapshot() to throw an error
if it's called during logical decoding, instead of returning the
historic snapshot. I made that change for extra protection, because a
historic snapshot can only be used to access catalog tables while
GetTransactionSnapshot() is usually called when you're executing
arbitrary queries. You might get very subtle visibility problems if
you tried to use the historic snapshot for arbitrary queries.
There's no built-in code in PostgreSQL that calls
GetTransactionSnapshot() during logical decoding, but it turns out
that the pglogical extension does just that, to evaluate row filter
expressions. You would get weird results if the row filter runs
arbitrary queries, but it is sane as long as you don't access any
non-catalog tables. Even though there are no checks to enforce that in
pglogical, a typical row filter expression does not access any tables
and works fine. Accessing tables marked with the user_catalog_table =
true option is also OK.
To fix pglogical with row filters, and any other extensions that might
do similar things, revert GetTransactionSnapshot() to return a
historic snapshot during logical decoding.
To try to still catch the unsafe usage of historic snapshots, add
checks in heap_beginscan() and index_beginscan() to complain if you
try to use a historic snapshot to scan a non-catalog table. We're very
close to the version 18 release however, so add those new checks only
in master.
Backpatch-through: 18
Reported-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://www.postgresql.org/message-id/20250809222338.cc.nmisch@google.com
Reduce from ShareLock to ShareUpdateExclusivelock. Validation during
ALTER DOMAIN ... ADD CONSTRAINT keeps using ShareLock.
Example:
create domain d1 as int;
create table t (a d1);
alter domain d1 add constraint cc10 check (value > 10) not valid;
begin;
alter domain d1 validate constraint cc10;
-- another session
insert into t values (8);
Now we should still be able to perform DML operations on table t while
the domain constraint is being validated. The equivalent works
already on table constraints.
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CACJufxHz92A88NLRTA2msgE2dpXpE-EoZ2QO61od76-6bfqurA%40mail.gmail.com
This code was relying on "long", which is signed 8 bytes everywhere
except on Windows where it is 4 bytes, that could potentially expose it
to overflows, even if the current uses in the code are fine as far as I
know. This code is now able to rely on the same sizeof() variable
everywhere, with int64. long was used for sizes, partition counts and
entry counts.
Some callers of the dynahash.c routines used long declarations, that can
be cleaned up to use int64 instead. There was one shortcut based on
SIZEOF_LONG, that can be removed. long is entirely removed from
dynahash.c and hsearch.h.
Similar work was done in b1e5c9fa9a.
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aKQYp-bKTRtRauZ6@paquier.xyz
Temporary relations may share the same RelFileNumber with a permanent
relation, or other temporary relations associated with other sessions.
Being able to uniquely identify a temporary relation would require
RelidByRelfilenumber() to know about the proc number of the temporary
relation it wants to identify, something it is not designed for since
its introduction in f01d1ae3a1.
There are currently three callers of RelidByRelfilenumber():
- autoprewarm.
- Logical decoding, reorder buffer.
- pg_filenode_relation(), that attempts to find a relation OID based on
a tablespace OID and a RelFileNumber.
This makes the situation problematic particularly for the first two
cases, leading to the possibility of random ERRORs due to
inconsistencies that temporary relations can create in the cache
maintained by RelidByRelfilenumber(). The third case should be less of
an issue, as I suspect that there are few direct callers of
pg_filenode_relation().
The window where the ERRORs are happen is very narrow, requiring an OID
wraparound to create a lookup conflict in RelidByRelfilenumber() with a
temporary table reusing the same OID as another relation already cached.
The problem is easier to reach in workloads with a high OID consumption
rate, especially with a higher number of temporary relations created.
We could get pg_filenode_relation() and RelidByRelfilenumber() to work
with temporary relations if provided the means to identify them with an
optional proc number given in input, but the years have also shown that
we do not have a use case for it, yet. Note that this could not be
backpatched if pg_filenode_relation() needs changes. It is simpler to
ignore temporary relations.
Reported-by: Shenhao Wang <wangsh.fnst@fujitsu.com>
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-By: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-By: Takamichi Osumi <osumi.takamichi@fujitsu.com>
Reviewed-By: Michael Paquier <michael@paquier.xyz>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Reported-By: Shenhao Wang <wangsh.fnst@fujitsu.com>
Discussion: https://postgr.es/m/bbaaf9f9-ebb2-645f-54bb-34d6efc7ac42@fujitsu.com
Backpatch-through: 13
The result of pgaio_io_get_id() was being assigned to a mix of int and
uint32 variables. This fixes it to use int consistently, which seems
the most correct. Also change the queue empty special value in
method_worker.c to -1 from UINT32_MAX.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/70c784b3-f60b-4652-b8a6-75e5f051243e%40eisentraut.org
Replication slot synchronization (sync_replication_slots = on)
requires wal_level to be logical. This commit prevents the server
from starting if sync_replication_slots is enabled but wal_level
is set to minimal or replica.
Failing early during startup helps users catch invalid configurations
immediately, which is important because changing wal_level requires
a server restart.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Discussion: https://postgr.es/m/CAH0PTU_pc3oHi__XESF9ZigCyzai1Mo3LsOdFyQA4aUDkm01RA@mail.gmail.com
This is similar to 19c6e92b13, in order to keep the style used in the
scripts consistent for the option names and values used in commands.
The places updated in this commit have been added recently in
71ea0d6795.
These changes are cosmetic; there is no need for a backpatch.
The description of this GUC provides a list of the situations where
full-page writes are generated. However, it is not completely exact,
mentioning only the cases where full_page_writes=on or base backups. It
is possible to generate full-page writes in more situations than these
two, making the description confusing as it implies that no other cases
exist.
The description is slightly reworded to take into account that other
cases are possible, without mentioning them directly to minimize the
maintenance burden should FPWs be generated in more contexts in the
future.
Author: Jingtang Zhang <mrdrivingduck@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/CAPsk3_CtAYa_fy4p6=x7qtoutrdKvg1kGk46D5fsE=sMt2546g@mail.gmail.com
Backpatch-through: 13
If we error out during execution of a SQL-language function, we will
often leave behind non-null pointers in its SQLFunctionCache's cplan
and eslist fields. This is problematic if the SQLFunctionCache is
re-used, because those pointers will point at resources that were
released during error cleanup. This problem escaped detection so far
because ordinarily we won't re-use an FmgrInfo+SQLFunctionCache struct
after a query error. However, in the rather improbable case that
someone implements an opclass support function in SQL language, there
will be long-lived FmgrInfos for it in the relcache, and then the
problem is reachable after the function throws an error.
To fix, add a flag to SQLFunctionCache that tracks whether execution
escapes out of fmgr_sql, and clear out the relevant fields during
init_sql_fcache if so. (This is going to need more thought if we ever
try to share FMgrInfos across threads; but it's very far from being
the only problem such a project will encounter, since many functions
regard fn_extra as being query-local state.)
This broke at commit 0313c5dc6; before that we did not try to re-use
SQLFunctionCache state across calls. Hence, back-patch to v18.
Bug: #19026
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19026-90aed5e71d0c8af3@postgresql.org
Backpatch-through: 18
In refuseDupeIndexAttach(), change from
errdetail("Another index is already attached for partition \"%s\"."...)
to
errdetail("Another index \"%s\" is already attached for partition \"%s\"."...)
so we can easily understand which index is already attached for
partition \"%s\".
Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/CACJufxGBfykJ_1ztk9T%2BL_gLmkOSOF%2BmL9Mn4ZPydz-rh%3DLccQ%40mail.gmail.com
Some replication slot manipulations (logical decoding via SQL,
advancing) were failing an assertion when releasing a slot in
single-user mode, because active_pid was not set in a ReplicationSlot
when its slot is acquired.
ReplicationSlotAcquire() has some logic to be able to work with the
single-user mode. This commit sets ReplicationSlot->active_pid to
MyProcPid, to let the slot-related logic fall-through, considering the
single process as the one holding the slot.
Some TAP tests are added for various replication slot functions with the
single-user mode, while on it, for slot creation, drop, advancing, copy
and logical decoding with multiple slot types (temporary, physical vs
logical). These tests are skipped on Windows, as direct calls of
postgres --single would fail on permission failures. There is no
platform-specific behavior that needs to be checked, so living with this
restriction should be fine. The CI is OK with that, now let's see what
the buildfarm tells.
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Mutaamba Maasha <maasha@gmail.com>
Discussion: https://postgr.es/m/OSCPR01MB14966ED588A0328DAEBE8CB25F5FA2@OSCPR01MB14966.jpnprd01.prod.outlook.com
Backpatch-through: 13
vacuumdb should follow the behavior of the underlying VACUUM and ANALYZE
commands. When --analyze-only is used, it ought to analyze regular tables,
materialized views, and partitioned tables, just as ANALYZE (with no explicit
target tables) does. Otherwise, it should only process regular tables and
materialized views, since VACUUM skips partitioned tables when no targets
are given.
Previously, vacuumdb --analyze-only skipped partitioned tables. This was
inconsistent, and also inconvenient after pg_upgrade, where --analyze-only
is typically used to gather missing statistics.
This commit fixes the behavior so that vacuumdb --analyze-only also processes
partitioned tables. As a result, both vacuumdb --analyze-only and
ANALYZE (with no explicit targets) now analyze regular tables,
partitioned tables, and materialized views, but not foreign tables.
Because this is a nontrivial behavior change, it is applied only to master.
Reported-by: Zechman, Derek S <Derek.S.Zechman@snapon.com>
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Co-authored-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CO1PR04MB8281387B9AD9DE30976966BBC045A%40CO1PR04MB8281.namprd04.prod.outlook.com
This comment mentions that pg_buffercache locks all buffer
partitions simultaneously, but it hasn't done so since v10.
Oversight in commit 6e654546fb.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/aKTuAHVEuYCUmmIy%40nathan
This commit adds CHECK_FOR_INTERRUPTS to loops iterating over shared
buffers in several pg_buffercache functions, allowing them to be
interrupted during long-running operations.
Backpatch to all supported versions. Add CHECK_FOR_INTERRUPTS to the
loop in pg_buffercache_pages() in all supported branches, and to
pg_buffercache_summary() and pg_buffercache_usage_counts() in version
16 and newer.
Author: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDcejeLx7WunFT3DX6XKh1KshvGKa8F5au8xVhqVvvQPRw@mail.gmail.com
Backpatch-through: 13
This commit updates the pgoutput documentation with the following changes:
- Specify the data type for each pgoutput option.
- Clarify the relationship between proto_version and options such as
streaming and two_phase.
- Add a note on the use of pg_logical_slot_peek_changes and
pg_logical_slot_get_changes with pgoutput.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/CAHGQGwFJTbygdhhjR_iP4Oem=Lo1xsptWWOq825uoW+hG_Lfnw@mail.gmail.com
Previously, the documentation for pgoutput was located in the section on
the logical streaming replication protocol, and there was no index entry
for it. As a result, users had difficulty finding information about pgoutput.
This commit moves the pgoutput documentation under the logical decoding
section and adds an index entry, making it easier for users to locate and
access this information.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/CAHGQGwFJTbygdhhjR_iP4Oem=Lo1xsptWWOq825uoW+hG_Lfnw@mail.gmail.com
This just includes a link to the bki documentation, to help people get
started.
Before commit 372728b0d4, there was a README at
src/backend/catalog/README, but then this was moved to the SGML
documentation. So this effectively puts back a link to what was
moved. But src/include/catalog/ is probably a better location,
because that's where all the interesting files are.
Co-authored-by: Florents Tselai <florents.tselai@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+v5N400GJFJ9RyXAX7hFKbtF7vVQGvWdFWEfcSQmvVhi9xfrA@mail.gmail.com
The DROP SUBSCRIPTION command performs several operations: it stops the
subscription workers, removes subscription-related entries from system
catalogs, and deletes the replication slot on the publisher server.
Previously, this command acquired an AccessExclusiveLock on
pg_subscription before initiating these steps.
However, while holding this lock, the command attempts to connect to the
publisher to remove the replication slot. In cases where the connection is
made to a newly created database on the same server as subscriber, the
cache-building process during connection tries to acquire an
AccessShareLock on pg_subscription, resulting in a self-deadlock.
To resolve this issue, we reduce the lock level on pg_subscription during
DROP SUBSCRIPTION from AccessExclusiveLock to RowExclusiveLock. Earlier,
the higher lock level was used to prevent the launcher from starting a new
worker during the drop operation, as a restarted worker could become
orphaned.
Now, instead of relying on a strict lock, we acquire an AccessShareLock on
the specific subscription being dropped and re-validate its existence
after acquiring the lock. If the subscription is no longer valid, the
worker exits gracefully. This approach avoids the deadlock while still
ensuring that orphan workers are not created.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/18988-7312c868be2d467f@postgresql.org
This provides a single entry point to access some information about the
state of MultiXacts, able to return some data about multixacts offsets
and counts. Originally this function was only able to return some
information about the number of multixacts and multixact members,
extended here to provide some data about the oldest multixact ID in use
and the oldest offset, if known.
This change has been proposed in a patch that aims at providing more
monitoring capabilities for multixacts, and it is useful on its own.
GetMultiXactInfo() is added to multixact.h, becoming available for
out-of-core code.
Extracted from a larger patch by the same author.
Author: Naga Appani <nagnrik@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA+QeY+AAsYK6WvBW4qYzHz4bahHycDAY_q5ECmHkEV_eB9ckzg@mail.gmail.com
This pointer was not used after its last update. This variable
assignment was most likely a vestige artifact of the earlier versions of
the patch set that have led to 5891c7a8ed.
This pointer update is useless, so let's remove it. It removes one call
to pgstat_dsa_init_size(), making the code slightly easier to grasp.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aKLsu2sdpnyeuSSc@ip-10-97-1-34.eu-west-3.compute.internal
Now that the only call to relation_has_unique_index_for() that
supplied an exprlist and oprlist has been removed, the loop handling
those lists is effectively dead code. This patch removes that loop
and simplifies the function accordingly.
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4-EBnaRvEs7frTLbsXiweSTUXifsteF-d3rvv01FKO86w@mail.gmail.com
There are two implementation techniques for semijoins: one uses the
JOIN_SEMI jointype, where the executor emits at most one matching row
per left-hand side (LHS) row; the other unique-ifies the right-hand
side (RHS) and then performs a plain inner join.
The latter technique currently has some drawbacks related to the
unique-ification step.
* Only the cheapest-total path of the RHS is considered during
unique-ification. This may cause us to miss some optimization
opportunities; for example, a path with a better sort order might be
overlooked simply because it is not the cheapest in total cost. Such
a path could help avoid a sort at a higher level, potentially
resulting in a cheaper overall plan.
* We currently rely on heuristics to choose between hash-based and
sort-based unique-ification. A better approach would be to generate
paths for both methods and allow add_path() to decide which one is
preferable, consistent with how path selection is handled elsewhere in
the planner.
* In the sort-based implementation, we currently pay no attention to
the pathkeys of the input subpath or the resulting output. This can
result in redundant sort nodes being added to the final plan.
This patch improves semijoin planning by creating a new RelOptInfo for
the RHS rel to represent its unique-ified version. It then generates
multiple paths that represent elimination of distinct rows from the
RHS, considering both a hash-based implementation using the cheapest
total path of the original RHS rel, and sort-based implementations
that either exploit presorted input paths or explicitly sort the
cheapest total path. All resulting paths compete in add_path(), and
those deemed worthy of consideration are added to the new RelOptInfo.
Finally, the unique-ified rel is joined with the other side of the
semijoin using a plain inner join.
As a side effect, most of the code related to the JOIN_UNIQUE_OUTER
and JOIN_UNIQUE_INNER jointypes -- used to indicate that the LHS or
RHS path should be made unique -- has been removed. Besides, the
T_Unique path now has the same meaning for both semijoins and upper
DISTINCT clauses: it represents adjacent-duplicate removal on
presorted input. This patch unifies their handling by sharing the
same data structures and functions.
This patch also removes the UNIQUE_PATH_NOOP related code along the
way, as it is dead code -- if the RHS rel is provably unique, the
semijoin should have already been simplified to a plain inner join by
analyzejoins.c.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4-EBnaRvEs7frTLbsXiweSTUXifsteF-d3rvv01FKO86w@mail.gmail.com
This test was the only one named following the convention used for
alternate output files. This was a little bit confusing when looking at
the diffs of the test, because one would think that the diffs are based
on an uncommon case, as alternate outputs are usually used for uncommon
configuration scenarios.
create_sequence_1 was the only test in the tree using such a name, and
it had no alternate output.
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/aKLY6wCa_OInr3kY@paquier.xyz
Two header declarations were related to SQL-callable functions, that
should have been cleaned up in df9133fa63. Some more includes can be
removed on closer inspection, so let's clean up these as well, while on
it.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/345438.1755524834@sss.pgh.pa.us
This existed in a semi broken stated from be0a66666 until 296cba276.
Recent discussion has questioned the value of having this at all as it
only outputs static information from various of the hash table's
properties when the hash table is created.
Author: Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/OSCPR01MB1496650D03FA0293AB9C21416F534A@OSCPR01MB14966.jpnprd01.prod.outlook.com
Previously this was being output to stderr. This commit adjusts things
to use elog(DEBUG4). Here we also adjust the format of the message to
add the hash table name and also put the message on a single line. This
should make grepping the logs for this information easier.
Also get rid of the global hash table statistics. This seems very dated
and didn't fit very well with trying to put all the statistics for a
specific hash table on a single log line.
The main aim here is to allow it so we can have at least one buildfarm
member build with HASH_STATISTICS to help prevent future changes from
breaking things in that area. ca3891251 recently fixed some issues
here.
In passing, switch to using uint64 data types rather than longs for the
usage counters. The long type is 32 bits on some platforms we support.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAApHDvoccvJ9CG5zx+i-EyCzJbcL5K=CzqrnL_YN59qaL5hiaw@mail.gmail.com
Apparently the only Test::More function this script uses is
BAIL_OUT, so this omission just results in the wrong error
output appearing in the cases where it bails out.
Seems to have been an oversight in commit 9f899562d which
split Kerberos.pm out of another script.
Author: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACG=ezY1Dp-S94b78nN0ZuaBGGcMUB6_nF-VyYUwPt1ArFqmGA@mail.gmail.com
Backpatch-through: 17
Input with zero length can result in a buffer underflow when
accessing *(num + (len - 1)), as (len - 1) would produce a negative
index. Add an assertion for zero-length input to prevent it.
This was found by ALT Linux Team.
Reviewing the call sites shows that get_th() currently cannot be
applied to an empty string: it is always called on a string containing
a number we've just printed. Therefore, an assertion rather than a
user-facing error message is sufficient.
Co-authored-by: Alexander Kuznetsov <kuznetsovam@altlinux.org>
Discussion: https://www.postgresql.org/message-id/flat/e22df993-cdb4-4d0a-b629-42211ebed582@altlinux.org
A patch is under discussion to add more SQL capabilities related to
multixacts, and this move avoids bloating the file more than necessary.
This affects pg_get_multixact_members(). A side effect of this move is
the requirement to add mxstatus_to_string() to multixact.h.
Extracted from a larger patch by the same author, tweaked by me.
Author: Naga Appani <nagnrik@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CA+QeY+AAsYK6WvBW4qYzHz4bahHycDAY_q5ECmHkEV_eB9ckzg@mail.gmail.com
Move the test for compiler options for C99 earlier in meson.build,
before we make use of the compiler for other tests. That way, if any
command-line options are needed, subsequent tests will also use them.
This is at the moment a theoretical problem, but it seems better to
get this correct. It also matches the order in the Autoconf-based
build more closely.
Discussion: https://www.postgresql.org/message-id/flat/01a69441-af54-4822-891b-ca28e05b215a@eisentraut.org
init_params() sets up "last_value" and "is_called" for a sequence
relation holdind its metadata, based on the sequence properties in
pg_sequences. "log_cnt" is the third property that can be updated in
this routine for FormData_pg_sequence_data, tracking when WAL records
should be generated for a sequence after nextval() iterations. This
routine is called when creating or altering a sequence.
This commit refactors init_params() to not depend anymore on
FormData_pg_sequence_data, removing traces of it in sequence.c, making
easier the manipulation of metadata related to sequences. The knowledge
about "log_cnt" is replaced with a more general "reset_state" flag, to
let the caller know if the sequence state should be reset. In the case
of in-core sequences, this relates to WAL logging. We still need to
depend on FormData_pg_sequence.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/ZWlohtKAs0uVVpZ3@paquier.xyz
This test was failing because MD5 computations are not supported in
these environments. This switches the test to rely on sha256() instead,
providing the same coverage while avoiding the failure.
Oversight in f57e214d1c. Per buildfarm members gecko, molamola,
shikra and froghopper.
Discussion: https://postgr.es/m/aKJijS2ZRfRZiYb0@paquier.xyz
Commit c5b7ba4e6 changed things so that the ri_RootResultRelInfo field
of this struct is set for both partitions and inheritance children and
used for tuple routing and transition capture (before that commit, it
was only set for partitions to route tuples into), but failed to update
these comments.
Author: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAPmGK14NF5CcdCmTZpxrvpvBiT0y4EqKikW1r_wAu1CEHeOmUA%40mail.gmail.com
Backpatch-through: 14
This test exercises the corner case in toast_save_datum() where CLUSTER
operations encounter duplicated TOAST references, reusing the existing
TOAST data instead of creating redundant copies.
During table rewrites like CLUSTER, both live and recently-dead versions
of a row may reference the same TOAST value. When copying the second or
later version of such a row, the system checks if a TOAST value already
exists in the new TOAST table using toastrel_valueid_exists(). If
found, toast_save_datum() sets data_todo = 0 so as redundant data is not
stored, ensuring only one copy of the TOAST value exists in the new
table.
The test relies on a combination of UPDATE, CLUSTER, and checks of the
TOAST values used before and after the relation rewrite, to make sure
that the same values are reused across the rewrite.
This is a continuation of 69f75d6714 to make sure that this corner
case keeps working should we mess with this area of the code.
Author: Nikhil Kumar Veldanda <veldanda.nikhilkumar17@gmail.com>
Discussion: https://postgr.es/m/CAFAfj_E+kw5P713S8_jZyVgQAGVFfzFiTUJPrgo-TTtJJoazQw@mail.gmail.com
When extracting a timestamp from a UUIDv7, a conversion from
milliseconds to microseconds was using the incorrect constant
NS_PER_US instead of US_PER_MS. Although both constants have the same
value, this fix improves code clarity by using the semantically
correct constant.
Backpatch to v18, where UUIDv7 was introduced.
Author: Erik Nordström <erik@tigerdata.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CACAa4V+i07eaP6h4MHNydZeX47kkLPwAg0sqe67R=M5tLdxNuQ@mail.gmail.com
Backpatch-through: 18
Recent changes to src/tools/ci/README triggered warnings like
src/tools/ci/README:88: leftover conflict marker
Raise conflict-marker-size in .gitattributes to avoid these.
Add TAP tests that tests the LDAP Lookup of Connection Parameters
functionality in libpq. Prior to this commit, LDAP test coverage only
existed for the server-side authentication functionality and for
connection service file with parameters directly specified in the
file. The tests included here test a pg_service.conf that contains a
link to an LDAP system that contains all of the connection parameters.
Author: Andrew Jackson <andrewjackson947@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAKK5BkHixcivSCA9pfd_eUp7wkLRhvQ6OtGLAYrWC%3Dk7E76LDQ%40mail.gmail.com
bms_prev_member() could attempt to access memory outside of the words[]
array in cases where the prevbit was a number < -1 or > a->nwords *
BITS_PER_BITMAPWORD + 1.
Here we add the Asserts to help draw attention to bogus callers so we're
more likely to catch them during development.
In passing, fix wording of bms_prev_member's header comment which talks
about how we expect the callers to ensure only valid prevbit values are
used.
Author: Greg Burd <greg@burd.me>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2000A717-1FFE-4031-827B-9330FB2E9065%40getmailspring.com
The SQL test added in this commit check a specific code path that had no
coverage until now. When a TOAST datum is rewritten, toast_save_datum()
has a dedicated path to make sure that a new value is allocated if it
does not exist on the TOAST table yet.
This test uses a trick with PLAIN and EXTERNAL storage, with a tuple
large enough to be toasted and small enough to fit on a page. It is
initially stored in plain more, and the rewrite forces the tuple to be
stored externally. The key point is that there is no value allocated
during the initial insert, and that there is one after the rewrite. A
second pattern checked is the reuse of the same value across rewrites,
using \gset.
A set of patches under discussion is messing up with this area of the
code, so this makes sure that such rewrite cases remain consistent
across the board.
Author: Nikhil Kumar Veldanda <veldanda.nikhilkumar17@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAFAfj_E+kw5P713S8_jZyVgQAGVFfzFiTUJPrgo-TTtJJoazQw@mail.gmail.com
Handle 'ci-os-only' occurrences in the .cirrus.star file instead of
.cirrus.tasks.yml file. Now, 'ci-os-only' occurrences are controlled
from one central place instead of dealing with them in each task.
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/20240413021221.hg53rvqlvldqh57i%40awork3.anarazel.de
Backpatch: 15-, where CI support was added
We do not want to trigger some tasks by default, to avoid using too many
compute credits. These tasks have to be manually triggered to be run. But
e.g. for cfbot we do have sufficient resources, so we always want to start
those tasks.
With this commit, an individual repository can be configured to trigger
them automatically using an environment variable defined under
"Repository Settings", for example:
REPO_CI_AUTOMATIC_TRIGGER_TASKS="mingw netbsd openbsd"
This will enable cfbot to turn them on by default when running tests for the
Commitfest app.
Backpatch this back to PG 15, even though PG 15 does not have any manually
triggered task. Keeping the CI infrastructure the same seems advantageous.
Author: Andres Freund <andres@anarazel.de>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/20240413021221.hg53rvqlvldqh57i%40awork3.anarazel.de
Backpatch-through: 16
Doing that seems rather random and unnecessary. This commit removes
those and fixes fallout, which is pretty minimal. We do need to add a
forward declaration of struct TM_IndexDeleteOp (whose full definition
appears in tableam.h) so that _bt_delitems_delete_check()'s declaration
can use it.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/202508051109.lzk3lcuzsaxo@alvherre.pgsql
Make sure the memory allocated by make_absolute_path() is freed
when SelectConfigFiles() fails. Since all the callers will exit
immediately in that case, there's no practical gain here, but
silencing Valgrind leak complaints seems useful. In any case,
it was inconsistent that only one of the failure exits did this.
Author: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAJ7c6TMByXE8dc7zDvDWTQjk6o-XXAdRg_RAg5CBaUOgFPV3LQ%40mail.gmail.com
This function uses an argument named "maxsize" that is only used in
assertions, being set once outside the assertion area. Recent gcc
versions with -Wunused-but-set-parameter complain about a warning when
building without assertions enabled, because of that.
In order to fix this issue, PG_USED_FOR_ASSERTS_ONLY is added to the
function argument of SerializeClientConnectionInfo(), which is the first
time we are doing so in the tree. The CI is fine with the change, but
let's see what the buildfarm has to say on the matter.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Jacob Champion <jchampion@postgresql.org>
Discussion: https://postgr.es/m/pevajesswhxafjkivoq3yvwxga77tbncghlf3gq5fvchsvfuda@6uivg25sb3nx
Backpatch-through: 16
Commit 2633dae2e4 standardized LSN formatting but mistakenly changed
the logical snapshot filename format in SnapBuildSnapshotExists() from
"%X-%X.snap" to "%08X-%08X.snap". Other code still used the original
"%X-%X.snap" format, causing the replication slot synchronization worker
to fail to find existing snapshot files and produce excessive log messages.
This commit restores the original "%X-%X.snap" format
in SnapBuildSnapshotExists() to resolve the issue.
Author: Shveta Malik <shveta.malik@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwHuHPB-ucAk_Tq3uSs4Fdziu1Jp_AA_RD3m5Ycky7m48w@mail.gmail.com
Remove conditionally-compiled code for the other case.
Replace uses of FLOAT8PASSBYVAL with constant "true", mainly because
it was quite confusing in cases where the type we were dealing with
wasn't float8.
I left the associated pg_control and Pg_magic_struct fields in place.
Perhaps we should get rid of them, but it would save little, so it
doesn't seem worth thinking hard about the compatibility implications.
I just labeled them "vestigial" in places where that seemed helpful.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/1749799.1752797397@sss.pgh.pa.us
Remove conditionally-compiled code for smaller Datum widths,
and simplify comments that describe cases no longer of interest.
I also fixed up a few more places that were not using
DatumGetIntXX where they should, and made some cosmetic
adjustments such as using sizeof(int64) not sizeof(Datum)
in places where that fit better with the surrounding code.
One thing I remembered while preparing this part is that SP-GiST
stores pass-by-value prefix keys as Datums, so that the on-disk
representation depends on sizeof(Datum). That's even more
unfortunate than the existing commentary makes it out to be,
because now there is a hazard that the change of sizeof(Datum)
will break SP-GiST indexes on 32-bit machines. It appears that
there are no existing SP-GiST opclasses that are actually
affected; and if there are some that I didn't find, the number
of installations that are using them on 32-bit machines is
doubtless tiny. So I'm proceeding on the assumption that we
can get away with this, but it's something to worry about.
(gininsert.c looks like it has a similar problem, but it's okay
because the "tuples" it's constructing are just transient data
within the tuplesort step. That's pretty poorly documented
though, so I added some comments.)
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/1749799.1752797397@sss.pgh.pa.us
This patch makes sizeof(Datum) be 8 on all platforms including
32-bit ones. The objective is to allow USE_FLOAT8_BYVAL to be true
everywhere, and in consequence to remove a lot of code that is
specific to pass-by-reference handling of float8, int8, etc. The
code for abbreviated sort keys can be simplified similarly. In this
way we can reduce the maintenance effort involved in supporting 32-bit
platforms, without going so far as to actually desupport them. Since
Datum is strictly an in-memory concept, this has no impact on on-disk
storage, though an initdb or pg_upgrade will be needed to fix affected
catalog entries.
We have required platforms to support [u]int64 for ages, so this
breaks no supported platform. We can expect that this change will
make 32-bit builds a bit slower and more memory-hungry, although being
able to use pass-by-value handling of 8-byte types may buy back some
of that. But we stopped optimizing for 32-bit cases a long time ago,
and this seems like just another step on that path.
This initial patch simply forces the correct type definition and
USE_FLOAT8_BYVAL setting, and cleans up a couple of minor compiler
complaints that ensued. This is sufficient for testing purposes.
In the wake of a bunch of Datum-conversion cleanups by Peter
Eisentraut, this now compiles cleanly with gcc on a 32-bit platform.
(I'd only tested the previous version with clang, which it turns out
is less picky than gcc about width-changing coercions.)
There is a good deal of now-dead code that I'll remove in separate
follow-up patches.
A catversion bump is required because this affects initial catalog
contents (on 32-bit machines) in two ways: pg_type.typbyval changes
for some built-in types, and Const nodes in stored views/rules will
now have 8 bytes not 4 for pass-by-value types.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/1749799.1752797397@sss.pgh.pa.us
Currently the pdb file for libpq and some other libraries are named the same
for the static and shared libraries. That has been the case for a long time,
but recently started failing, after an image update started using a newer
ninja version. The issue is not itself caused by ninja, but just made visible,
as the newer version optimizes the build order and builds the shared libpq
earlier than the static library. Previously both static and shared libraries
were built at the same time, which prevented msvc from detecting the issue.
When using /DEBUG:FASTLINK pdb files cannot be updated, triggering the error.
We were using /DEBUG:FASTLINK due to running out of memory in the past, but
that was when using container based CI images, rather than full VMs.
This isn't really the correct fix (that'd be to deconflict the pdb file
names), but we'd like to get CI to become green again, and a proper fix (in
meson) will presumably take longer.
Suggested-by: Andres Freund <andres@anarazel.de>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAN55FZ1RuBhJmPWs3Oi%3D9UoezDfrtO-VaU67db5%2B0_uy19uF%2BA%40mail.gmail.com
Backpatch-through: 16
Previously our tests did not exercise kill_prior_tuples for hash and gist. For
gist some related paths were reached, but gist's implementation seems to not
work if all the dead tuples are on one page (or something like that). The
coverage for other index types was rather incidental.
Thus add an explicit test ensuring kill_prior_tuples works at all.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/lxzj26ga6ippdeunz6kuncectr5gfuugmm2ry22qu6hcx6oid6@lzx3sjsqhmt6
It turns out that on some platforms (at least current macOS, NetBSD,
OpenBSD) semget(2) will return EINVAL if there is a pre-existing
semaphore set with the same key and too few semaphores. Our code
expects EEXIST in that case and treats EINVAL as a hard failure,
resulting in failure during initdb or postmaster start.
POSIX does document EINVAL for too-few-semaphores-in-set, and is
silent on its priority relative to EEXIST, so this behavior arguably
conforms to spec. Nonetheless it's quite problematic because EINVAL
is also documented to mean that nsems is greater than the system's
limit on the number of semaphores per set (SEMMSL). If that is
where the problem lies, retrying would just become an infinite loop.
To resolve this contradiction, retry after EINVAL, but also install a
loop limit that will make us give up regardless of the specific errno
after trying 1000 different keys. (1000 is a pretty arbitrary number,
but it seems like it should be sufficient.) I like this better than
the previous infinite-looping behavior, since it will also keep us out
of trouble if (say) we get EACCES due to a system-level permissions
problem rather than anything to do with a specific semaphore set.
This problem has only been observed in the field in PG 17, which uses
a higher nsems value than other branches (cf. 38da05346, 810a8b1c8).
That makes it possible to get the failure if a new v17 postmaster
has a key collision with an existing postmaster of another branch.
In principle though, we might see such a collision against a semaphore
set created by some other application, in which case all branches are
vulnerable on these platforms. Hence, backpatch.
Reported-by: Gavin Panella <gavinpanella@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CALL7chmzY3eXHA7zHnODUVGZLSvK3wYCSP0RmcDFHJY8f28Q3g@mail.gmail.com
Backpatch-through: 13
Set body indent to 0 to make use of the horizontal space better (and
some reviewers thought it was also more readable).
Add some left and right margin to the warning boxes, otherwise they
drift too far off the page in combination with the above change.
Author: Noboru Saito <noborusai@gmail.com>
Reviewed-by: Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com>
Reviewed-by: Florents Tselai <florents.tselai@gmail.com>
Reviewed-by: Tatsuo Ishii <ishii@postgresql.org>
Discussion: https://www.postgresql.org/message-id/flat/CAAM3qnLyMUD79XF+SqAVwWCwURCF3hyuFY9Ki9Csbqs-zMwwnw@mail.gmail.com
The tests fixed in this commit were changing the sampling setting of a
foreign server, but then were analyzing a local table instead of a
foreign table, meaning that the test was not running for its original
purpose.
This commit changes the ANALYZE commands to analyze the foreign table,
and changes the foreign table definition to point to a valid remote
table. Attempting to analyze the foreign table "analyze_ftable" would
have failed before this commit, because "analyze_rtable1" is not defined
on the remote side.
Issue introduced by 8ad51b5f44.
Author: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/CADkLM=cpUiJ3QF7aUthTvaVMmgQcm7QqZBRMDLhBRTR+gJX-Og@mail.gmail.com
Backpatch-through: 16
fb9f955025 optimized code generation by using specialized variants of
ExecSeqScan* for [not] having a qual, projection etc. This allowed the
compiler to optimize the code out the code for qual / projection. However, as
observed by David Rowley at the time, the compiler couldn't prove the
opposite, i.e. that the qual etc *are* present.
By using pg_assume(), introduced in d65eb5b1b8, we can tell the compiler that
the relevant variables are non-null.
This reduces the code size to a surprising degree and seems to lead to a small
but reproducible performance gain.
Reviewed-by: Amit Langote <amitlangote09@gmail.com> Discussion:
https://postgr.es/m/CA+HiwqFk-MbwhfX_kucxzL8zLmjEt9MMcHi2YF=DyhPrSjsBEA@mail.gmail.com
Without using stamp files, meson lists the generated headers as the dependency
for every .c file, bloating build.ninja by more than 2x. Processing all the
dependencies also increases the time to generate build.ninja.
The immediate benefit is that this makes re-configuring and clean builds a bit
faster. The main motivation however is that I have other patches that
introduce additional build targets that further would increase the size of
build.ninja, making re-configuring more noticeably slower.
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/cgkdgvzdpinkacf4v33mky7tbmk467oda5dd4dlmucjjockxzi@xkqfvjoq4uiy
A malicious server could inject psql meta-commands into plain-text
dump output (i.e., scripts created with pg_dump --format=plain,
pg_dumpall, or pg_restore --file) that are run at restore time on
the machine running psql. To fix, introduce a new "restricted"
mode in psql that blocks all meta-commands (except for \unrestrict
to exit the mode), and teach pg_dump, pg_dumpall, and pg_restore to
use this mode in plain-text dumps.
While at it, encourage users to only restore dumps generated from
trusted servers or to inspect it beforehand, since restoring causes
the destination to execute arbitrary code of the source superusers'
choice. However, the client running the dump and restore needn't
trust the source or destination superusers.
Reported-by: Martin Rakhmanov
Reported-by: Matthieu Denais <litezeraw@gmail.com>
Reported-by: RyotaK <ryotak.mail@gmail.com>
Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Security: CVE-2025-8714
Backpatch-through: 13
Maliciously-crafted object names could achieve SQL injection during
restore. CVE-2012-0868 fixed this class of problem at the time, but
later work reintroduced three cases. Commit
bc8cd50fef (back-patched to v11+ in
2023-05 releases) introduced the pg_dump case. Commit
6cbdbd9e8d (v12+) introduced the two
pg_dumpall cases. Move sanitize_line(), unchanged, to dumputils.c so
pg_dumpall has access to it in all supported versions. Back-patch to
v13 (all supported versions).
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Backpatch-through: 13
Security: CVE-2025-8715
Commit e2d4ef8de8 (the fix for CVE-2017-7484) added security checks
to the selectivity estimation functions to prevent them from running
user-supplied operators on data obtained from pg_statistic if the user
lacks privileges to select from the underlying table. In cases
involving inheritance/partitioning, those checks were originally
performed against the child RTE (which for plain inheritance might
actually refer to the parent table). Commit 553d2ec271 then extended
that to also check the parent RTE, allowing access if the user had
permissions on either the parent or the child. It turns out, however,
that doing any checks using the child RTE is incorrect, since
securityQuals is set to NULL when creating an RTE for an inheritance
child (whether it refers to the parent table or the child table), and
therefore such checks do not correctly account for any RLS policies or
security barrier views. Therefore, do the security checks using only
the parent RTE. This is consistent with how RLS policies are applied,
and the executor's ACL checks, both of which use only the parent
table's permissions/policies. Similar checks are performed in the
extended stats code, so update that in the same way, centralizing all
the checks in a new function.
In addition, note that these checks by themselves are insufficient to
ensure that the user has access to the table's data because, in a
query that goes via a view, they only check that the view owner has
permissions on the underlying table, not that the current user has
permissions on the view itself. In the selectivity estimation
functions, there is no easy way to navigate from underlying tables to
views, so add permissions checks for all views mentioned in the query
to the planner startup code. If the user lacks permissions on a view,
a permissions error will now be reported at planner-startup, and the
selectivity estimation functions will not be run.
Checking view permissions at planner-startup in this way is a little
ugly, since the same checks will be repeated at executor-startup.
Longer-term, it might be better to move all the permissions checks
from the executor to the planner so that permissions errors can be
reported sooner, instead of creating a plan that won't ever be run.
However, such a change seems too far-reaching to be back-patched.
Back-patch to all supported versions. In v13, there is the added
complication that UPDATEs and DELETEs on inherited target tables are
planned using inheritance_planner(), which plans each inheritance
child table separately, so that the selectivity estimation functions
do not know that they are dealing with a child table accessed via its
parent. Handle that by checking access permissions on the top parent
table at planner-startup, in the same way as we do for views. Any
securityQuals on the top parent table are moved down to the child
tables by inheritance_planner(), so they continue to be checked by the
selectivity estimation functions.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Backpatch-through: 13
Security: CVE-2025-8713
The internal queue of buffers could become corrupted in a rare edge case
that failed to invalidate an entry, causing a stale buffer to be
"forwarded" to StartReadBuffers(). This is a simple fix for the
immediate problem.
A small API change might be able to remove this and related fragility
entirely, but that will have to wait a bit.
Defect in commit ed0b87ca.
Bug: 19006
Backpatch-through: 18
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/19006-80fcaaf69000377e%40postgresql.org
Fix a couple more places where an explicit Datum conversion
is needed (not clear how we missed these in ff89e182d and
previous commits).
Replace the minority usage "(Datum) NULL" with "(Datum) 0".
The former depends on the assumption that Datum is the same
width as Pointer, the latter doesn't. Anyway consistency
is a good thing.
This is, I believe, the last of the notational mop-up needed
before we can consider changing Datum to uint64 everywhere.
It's also important cleanup for more aggressive ideas such
as making Datum a struct.
Discussion: https://postgr.es/m/1749799.1752797397@sss.pgh.pa.us
Discussion: https://postgr.es/m/8246d7ff-f4b7-4363-913e-827dadfeb145@eisentraut.org
Add various missing conversions from and to Datum. The previous code
mostly relied on implicit conversions or its own explicit casts
instead of using the correct DatumGet*() or *GetDatum() functions.
We think these omissions are harmless. Some actual bugs that were
discovered during this process have been committed
separately (80c758a2e1, fd2ab03fea).
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/8246d7ff-f4b7-4363-913e-827dadfeb145%40eisentraut.org
before checking ->has_scram_keys. MyProcPort is NULL in background
workers. So this could crash for example if a background worker
accessed a suitable configured foreign table.
Author: Alexander Pyhalov <a.pyhalov@postgrespro.ru>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/27b29a35-9b96-46a9-bc1a-914140869dac%40gmail.com
Commit 1443b6c0e introduced buildfarm breakage for Autoconf animals,
which expect to be able to run `make installcheck` on the libpq-oauth
directory even if libcurl support is disabled. Some other Meson animals
complained of a missing -lm link as well.
Since this is the day before a freeze, revert for now and come back
later.
Discussion: https://postgr.es/m/CAOYmi%2BnCkoh3zB%2BGkZad44%3DFNskwUg6F1kmuxqQZzng7Zgj5tw%40mail.gmail.com
To better record the internal behaviors of oauth-curl.c, add a unit test
suite for the socket and timer handling code. This is all based on TAP
and driven by our existing Test::More infrastructure.
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://postgr.es/m/CAOYmi+nDZxJHaWj9_jRSyf8uMToCADAmOfJEggsKW-kY7aUwHA@mail.gmail.com
Tracking down the bugs that led to the addition of comb_multiplexer()
and drain_timer_events() was difficult, because an inefficient flow is
not visibly different from one that is working properly. To help
maintainers notice when something has gone wrong, track the number of
calls into the flow as part of debug mode, and print the total when the
flow finishes.
A new test makes sure the total count is less than 100. (We expect
something on the order of 10.) This isn't foolproof, but it is able to
catch several regressions in the logic of the prior two commits, and
future work to add TLS support to the oauth_validator test server should
strengthen it as well.
Backpatch-through: 18
Discussion: https://postgr.es/m/CAOYmi+nDZxJHaWj9_jRSyf8uMToCADAmOfJEggsKW-kY7aUwHA@mail.gmail.com
In a case similar to the previous commit, an expired timer can remain
permanently readable if Curl does not remove the timeout itself. Since
that removal isn't guaranteed to happen in real-world situations,
implement drain_timer_events() to reset the timer before calling into
drive_request().
Moving to drain_timer_events() happens to fix a logic bug in the
previous caller of timer_expired(), which treated an error condition as
if the timer were expired instead of bailing out.
The previous implementation of timer_expired() gave differing results
for epoll and kqueue if the timer was reset. (For epoll, a reset timer
was considered to be expired, and for kqueue it was not.) This didn't
previously cause problems, since timer_expired() was only called while
the timer was known to be set, but both implementations now use the
kqueue logic.
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/CAOYmi+nDZxJHaWj9_jRSyf8uMToCADAmOfJEggsKW-kY7aUwHA@mail.gmail.com
If Curl needs to switch the direction of a socket's registration (e.g.
from CURL_POLL_IN to CURL_POLL_OUT), it expects the old registration to
be discarded. For epoll, this happened via EPOLL_CTL_MOD, but for
kqueue, the old registration would remain if it was not explicitly
removed by Curl.
Explicitly remove the opposite-direction event during registrations. (If
that event doesn't exist, we'll just get an ENOENT, which will be
ignored by the same code that handles CURL_POLL_REMOVE.) A few
assertions are also added to strengthen the relationship between the
number of events added, the number of events pulled off the queue, and
the lengths of the kevent arrays.
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/CAOYmi+nDZxJHaWj9_jRSyf8uMToCADAmOfJEggsKW-kY7aUwHA@mail.gmail.com
If a socket is added to the kqueue, becomes readable/writable, and
subsequently becomes non-readable/writable again, the kqueue itself will
remain readable until either the socket registration is removed, or the
stale event is cleared via a call to kevent().
In many simple cases, Curl itself will remove the socket registration
quickly, but in real-world usage, this is not guaranteed to happen. The
kqueue can then remain stuck in a permanently readable state until the
request ends, which results in pointless wakeups for the client and
wasted CPU time.
Implement comb_multiplexer() to call kevent() and unstick any stale
events that would cause unnecessary callbacks. This is called right
after drive_request(), before we return control to the client to wait.
Suggested-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/CAOYmi+nDZxJHaWj9_jRSyf8uMToCADAmOfJEggsKW-kY7aUwHA@mail.gmail.com
The code used
return (Selectivity) 0.0;
where
PG_RETURN_FLOAT8(0.0);
would be correct.
On 64-bit systems, these are pretty much equivalent, but on 32-bit
systems, PG_RETURN_FLOAT8() correctly produces a pointer, but the old
wrong code would return a null pointer, possibly leading to a crash
elsewhere.
We think this code is actually not reachable because bqarr_in won't
accept an empty query, and there is no other function that will
create query_int values. But better be safe and not let such
incorrect code lie around.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/8246d7ff-f4b7-4363-913e-827dadfeb145%40eisentraut.org
This function is called from ATExecAttachPartition/ATExecAddInherit,
which prevent tables with row-level triggers with transition tables from
becoming partitions or inheritance children, to check if there is such a
trigger on the given table, but failed to check if a found trigger is
row-level, causing the caller functions to needlessly prevent a table
with only a statement-level trigger with transition tables from becoming
a partition or inheritance child. Repair.
Oversight in commit 501ed02cf.
Author: Etsuro Fujita <etsuro.fujita@gmail.com>
Discussion: https://postgr.es/m/CAPmGK167mXzwzzmJ_0YZ3EZrbwiCxtM1vogH_8drqsE6PtxRYw%40mail.gmail.com
Backpatch-through: 13
Previously, pg_dump --filter could misinterpret invalid object types
in the filter file as valid ones. For example, the invalid object type
"table-data" (likely a typo for the valid "table_data") could be
mistakenly recognized as "table", causing pg_dump to succeed
when it should have failed.
This happened because pg_dump identified keywords as sequences of
ASCII alphabetic characters, treating non-alphabetic characters
(like hyphens) as keyword boundaries. As a result, "table-data" was
parsed as "table".
To fix this, pg_dump --filter now treats keywords as strings of
non-whitespace characters, ensuring invalid types like "table-data"
are correctly rejected.
Back-patch to v17, where the --filter option was introduced.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Srinath Reddy <srinath2133@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAHGQGwFzPKUwiV5C-NLBqz1oK1+z9K8cgrF+LcxFem-p3_Ftug@mail.gmail.com
Backpatch-through: 17
Commit 9e6104c66 disallowed transition tables on foreign tables, but
failed to account for cases where a foreign table is a child table of a
partitioned/inherited table on which transition tables exist, leading to
incorrect transition tuples collected from such foreign tables for
queries on the parent table triggering transition capture. This
occurred not only for inherited UPDATE/DELETE but for partitioned INSERT
later supported by commit 3d956d956, which should have handled it at
least for the INSERT case, but didn't.
To fix, modify ExecAR*Triggers to throw an error if the given relation
is a foreign table requesting transition capture. Also, this commit
fixes make_modifytable so that in case of an inherited UPDATE/DELETE
triggering transition capture, FDWs choose normal operations to modify
child foreign tables, not DirectModify; which is needed because they
would otherwise skip the calls to ExecAR*Triggers at execution, causing
unexpected behavior.
Author: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CAPmGK14QJYikKzBDCe3jMbpGENnQ7popFmbEgm-XTNuk55oyHg%40mail.gmail.com
Backpatch-through: 13
Dropping twice a pgstats entry should not happen, and the error report
generated was missing the "generation" counter (tracking when an entry
is reused) that has been added in 818119afcc.
Like d92573adcb, backpatch down to v15 where this information is
useful to have, to gather more information from instances where the
problem shows up. A report has shown that this error path has been
reached on a standby based on 17.3, for a relation stats entry and an
OID close to wraparound.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/CAN4RuQvYth942J2+FcLmJKgdpq6fE5eqyFvb_PuskxF2eL=Wzg@mail.gmail.com
Backpatch-through: 15
libpq-oauth was missing from the installed_targets list, so
$ ninja clean && ninja install-quiet
failed with the error message
ERROR: File 'src/interfaces/libpq-oauth/libpq-oauth.a' could not be found
It seems a little odd to have to tell Meson what's missing, since it
clearly knows how to build that file during regular installation. But
the "quiet" variant we've created must use --no-rebuild, to avoid
spawning concurrent ninja processes that would step on each other.
Reported-by: Andres Freund <andres@anarazel.de>
Backpatch-through: 18
Discussion: https://postgr.es/m/hbpqdwxkfnqijaxzgdpvdtp57s7gwxa5d6sbxswovjrournlk6%404jnb2gzan4em
Although the "Floating-Point Types" section says that "float" data
type is taken to mean "double precision", this information was not
reflected in the data type table that lists all data type aliases.
Reported-by: alexander.kjall@hafslund.no
Author: Euler Taveira <euler@eulerto.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/175456294638.800.12038559679827947313@wrigleys.postgresql.org
Backpatch-through: 13
This adds a few more functions to int128.h, allowing more of numeric.c
to use 128-bit integers on all platforms.
Specifically, int64_div_fast_to_numeric() and the following aggregate
functions can now use 128-bit integers for improved performance on all
platforms, rather than just platforms with native support for int128:
- SUM(int8)
- AVG(int8)
- STDDEV_POP(int2 or int4)
- STDDEV_SAMP(int2 or int4)
- VAR_POP(int2 or int4)
- VAR_SAMP(int2 or int4)
In addition to improved performance on platforms lacking native
128-bit integer support, this significantly simplifies this numeric
code by allowing a lot of conditionally compiled code to be deleted.
A couple of numeric functions (div_var_int64() and sqrt_var()) still
contain conditionally compiled 128-bit integer code that only works on
platforms with native 128-bit integer support. Making those work more
generally would require rolling our own higher precision 128-bit
division, which isn't supported for now.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWgBMc9ZwKMYqQpaQz2X6gaamYRB+RnMsUNcdMcL2Mj_w@mail.gmail.com
Small touch-up on commits 25505082f0 and 50fd428b2b. Fix the
formatting of the example messages in the documentation and adjust the
wording to match the code.
Use Min(NBuffers, MAX_CHECKPOINT_REQUESTS) instead of NBuffers in
CheckpointerShmemSize() to match the actual array size limit set in
CheckpointerShmemInit(). This prevents wasting shared memory when
NBuffers > MAX_CHECKPOINT_REQUESTS. Also, fix the comment.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1439188.1754506714%40sss.pgh.pa.us
Author: Xuneng Zhou <xunengzhou@gmail.com>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Recent ICU versions have added U_SHOW_CPLUSPLUS_HEADER_API, and we need
to set this to zero as well to hide the ICU C++ APIs from pg_locale.h
Per discussion, we want cpluspluscheck to work cleanly in backbranches,
so backpatch both this and its predecessor commit ed26c4e25a to all
supported versions.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1115793.1754414782%40sss.pgh.pa.us
Backpatch-through: 13
In the non-native code in int128_add_int64_mul_int64(), use signed
64-bit integer multiplication instead of unsigned multiplication for
the first three product terms. This simplifies the code needed to add
each product term to the result, leading to more compact and efficient
code. The actual performance gain is quite modest, but it seems worth
it to improve the code's readability.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWgBMc9ZwKMYqQpaQz2X6gaamYRB+RnMsUNcdMcL2Mj_w@mail.gmail.com
On platforms without native 128-bit integer support, simplify the test
for carry in int128_add_uint64() by noting that the low-part addition
is unsigned integer arithmetic, which is just modular arithmetic.
Therefore the test for carry can simply be written as "new value < old
value" (i.e., a test for modular wrap-around). This can then be made
branchless so that on modern compilers it produces the same machine
instructions as native 128-bit addition, making it significantly
simpler and faster.
Similarly, the test for carry in int128_add_int64() can be written in
much the same way, but with an extra term to compensate for the sign
of the value being added. Again, on modern compilers this leads to
branchless code, often identical to the native 128-bit integer
addition machine code.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWgBMc9ZwKMYqQpaQz2X6gaamYRB+RnMsUNcdMcL2Mj_w@mail.gmail.com
Commit d85ce012f9 has added some new error handling code to
date_trunc() of timestamp, timestamptz, and interval with infinite
values.
However, the new test cases added by that commit did not actually test
all of the new code, missing coverage for the following cases:
1) For timestamp without time zone:
1-1) infinite value with valid unit
1-2) infinite value with unsupported unit
1-3) finite value with unsupported unit, for a code path older than
d85ce012f9.
2) For timestamp with time zone, without a time zone specified for the
truncation:
2-1) infinite value with valid unit
2-2) infinite value with unsupported unit
2-3) finite value with unsupported unit, for a code path older than
d85ce012f9.
3) For timestamp with time zone, with a time zone specified for the
truncation:
3-1) infinite value with valid unit.
3-2) infinite value with unsupported unit.
This commit also provides coverage for the bug fixed in 2242b26ce4,
through cases 2-1) and 3-1), when using an infinite value with a valid
unit, with[out] the optional time zone parameter used for the
truncation.
Author: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/2d320b6f-b4af-4fbc-9eec-5d0fa15d187b@eisentraut.org
Discussion: https://postgr.es/m/4bf60a84-2862-4a53-acd5-8eddf134a60e@eisentraut.org
Backpatch-through: 18
The code used a PG_RETURN_TIMESTAMPTZ() where the return type is
TimestampTz and not a Datum.
On 64-bit systems, there is no effect since this just ends up casting
64-bit integers back and forth. On 32-bit systems, timestamptz is
pass-by-reference. PG_RETURN_TIMESTAMPTZ() allocates new memory and
returns the address, meaning that the caller could interpret this as a
timestamp value.
The effect is using "date_trunc(..., 'infinity'::timestamptz) will
return random values (instead of the correct return value 'infinity').
Bug introduced in commit d85ce012f9.
Author: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2d320b6f-b4af-4fbc-9eec-5d0fa15d187b@eisentraut.org
Discussion: https://postgr.es/m/4bf60a84-2862-4a53-acd5-8eddf134a60e@eisentraut.org
Backpatch-through: 18
This commit makes use of the existing PqMsg_* macros in more places
and adds new PqReplMsg_* and PqBackupMsg_* macros for use in
special replication and backup messages, respectively.
Author: Dave Cramer <davecramer@gmail.com>
Co-authored-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/aIECfYfevCUpenBT@nathan
Discussion: https://postgr.es/m/CAFcNs%2Br73NOUb7%2BqKrV4HHEki02CS96Z%2Bx19WaFgE087BWwEng%40mail.gmail.com
The name "namspace" looks like a typo, but it was presumably meant
to avoid using the "namespace" C++ keyword. This commit renames
the parameter to "nameSpace" to prevent future confusion while
still avoiding the keyword.
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/aJJxpfsDfiQ1VbJ5%40nathan
This rearranges the code in src/include/common/int128.h, so that the
native and non-native implementations of each function are together
inside the function body (as they are in src/include/common/int.h),
rather than being in separate parts of the file.
This improves readability and maintainability, making it easier to
compare the native and non-native implementations, and avoiding the
need to duplicate every function comment and declaration.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWgBMc9ZwKMYqQpaQz2X6gaamYRB+RnMsUNcdMcL2Mj_w@mail.gmail.com
These were introduced (commit efdc7d7475) at the same time as we were
moving to using the standard inttypes.h format macros (commit
a0ed19e0a9). It doesn't seem useful to keep a new already-deprecated
interface like this with only a few users, so remove the new symbols
again and have the callers use PRIx64.
(Also, INT64_HEX_FORMAT was kind of a misnomer, since hex formats all
use unsigned types.)
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/0ac47b5d-e5ab-4cac-98a7-bdee0e2831e4%40eisentraut.org
This creates a new test module src/test/modules/test_int128 and moves
src/tools/testint128.c into it so that it can be built using the
normal build system, allowing the 128-bit integer arithmetic functions
in src/include/common/int128.h to be tested automatically. For now,
the tests are skipped on platforms that don't have native int128
support.
While at it, fix the test128 union in the test code: the "hl" member
of test128 was incorrectly defined to be a union instead of a struct,
which meant that the tests were only ever setting and checking half of
each 128-bit integer value.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWgBMc9ZwKMYqQpaQz2X6gaamYRB+RnMsUNcdMcL2Mj_w@mail.gmail.com
toast_save_datum() has for a very long time some code able to handle
short varlenas (values up to 126 bytes reduced to a 1-byte header),
converting such varlenas to an external on-disk TOAST pointer with the
value saved uncompressed in the secondary TOAST relation.
There was zero coverage for this code path. This commit adds a test
able to exercise it, relying on two external attributes, one with a low
toast_tuple_target, so as it is possible to trigger the threshold for
the insertion of short varlenas into the TOAST relation.
Author: Nikhil Kumar Veldanda <veldanda.nikhilkumar17@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aJAl7-NvIk0kZByz@paquier.xyz
ALTER TABLE ... SET EXPRESSION AS removes statistics for the target column,
so running ANALYZE afterward is recommended. But this was previously not
documented, even though a similar recommendation exists for
ALTER TABLE ... SET DATA TYPE, which also clears the column's statistics.
This commit updates the documentation to include the ANALYZE recommendation
for SET EXPRESSION AS.
Since v18, virtual generated columns are supported, and these columns never
have statistics. Therefore, ANALYZE is not needed after SET DATA TYPE or
SET EXPRESSION AS when used on virtual generated columns. This commit also
updates the documentation to clarify that ANALYZE is unnecessary in such cases.
Back-patch the ANALYZE recommendation for SET EXPRESSION AS to v17
where the feature was introduced, and the note about virtual generated
columns to v18 where those columns were added.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/20250804151418.0cf365bd2855d606763443fe@sraoss.co.jp
Backpatch-through: 17
Following commit e035863c9a, building with -O0 began triggering
warnings about potentially uninitialized 'workbuf' usage. While
theoretically the initialization isn't necessary since VARDATA()
doesn't access the contents of the pointed-to object, this commit
explicitly initializes the workbuf variable to suppress the warning.
Buildfarm members adder and flaviventris have shown the warning.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAD21AoCOZxfqnNgfM5yVKJZYnOq5m2Q96fBGy1fovEqQ9V4OZA@mail.gmail.com
The result of "DirectFunctionCall1(numeric_float8, d)" is already in
Datum form, but the code was incorrectly applying PG_RETURN_FLOAT8()
to it. On machines where float8 is pass-by-reference, this would
result in complete garbage, since an unpredictable pointer value
would be treated as an integer and then converted to float. It's not
entirely clear how much of a problem would ensue on 64-bit hardware,
but certainly interpreting a float8 bitpattern as uint64 and then
converting that to float isn't the intended behavior.
As luck would have it, even the complete-garbage case doesn't break
BRIN indexes, since the results are only used to make choices about
how to merge values into ranges: at worst, we'd make poor choices
resulting in an inefficient index. Doubtless that explains the lack
of field complaints. However, users with BRIN indexes that use the
numeric_minmax_multi_ops opclass may wish to reindex in hopes of
making their indexes more efficient.
Author: Peter Eisentraut <peter@eisentraut.org>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2093712.1753983215@sss.pgh.pa.us
Backpatch-through: 14
This commit introduces a new column backup_type that indicates the
type of backup being performed: either 'full' or 'incremental'.
Bump catalog version.
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Yugo Nagata <nagata@sraoss.co.jp>
Discussion: https://postgr.es/m/CAOzEurQuzbHwTj1ehk1a+eeQDidJPyrE5s6mYumkjwjZnurhkQ@mail.gmail.com
During CREATE DATABASE, if changing the locale provider, require that
a new locale is specified rather than trying to reinterpret the
template's locale using the new provider.
This only affects the behavior when the template uses the builtin
provider and CREATE DATABASE specifies the ICU provider without
specifying the locale. Previously, that may have succeeded due to
loose validation by ICU, whereas now that will cause an error. Because
it can cause an error, backport only to unreleased versions.
Discussion: https://postgr.es/m/5038b33a6dc639009f4b3d43fa6ae0c5ba9e04f7.camel@j-davis.com
Backpatch-through: 18
We've only bothered converting the external interfaces, not the
endian-dependent internal macros (which should not be used by any
callers other than the interface functions in this header, anyway).
The VARTAG_1B_E() changes are required for C++ compatibility.
Author: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/928ea48f-77c6-417b-897c-621ef16685a6@eisentraut.org
Macros like VARDATA() and VARSIZE() should be thought of as taking
values of type pointer to struct varlena or some other related struct.
The way they are implemented, you can pass anything to it and it will
cast it right. But this is in principle incorrect. To fix, add the
required DatumGetPointer() calls. Or in a couple of cases, remove
superfluous PointerGetDatum() calls.
It is planned in a subsequent patch to change macros like VARDATA()
and VARSIZE() to inline functions, which will enforce stricter typing.
This is in preparation for that.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/928ea48f-77c6-417b-897c-621ef16685a6%40eisentraut.org
Previously, specifying the publication option 'publish_generated_columns'
without an explicit value would incorrectly default to 'stored', which is
not the intended behavior.
This patch fixes the issue by raising an ERROR when no value is provided
for 'publish_generated_columns', ensuring that users must explicitly
specify a valid option.
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Backpatch-through: 18, where it was introduced
Discussion: https://postgr.es/m/CAHut+PsCUCWiEKmB10DxhoPfXbF6jw5RD9ib2LuaQeA_XraW7w@mail.gmail.com
Import usleep, which, due to an oversight in oversight in commit
48796a98d5 was used but not imported.
Correct the comparison string used in two logfile checks. Previously, it
was incorrect and thus the test could never have failed.
Also wordsmith a comment to make it clear when hot_standby_feedback is
meant to be on during the test scenarios.
Reported-by: Melanie Plageman <melanieplageman@gmail.com>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_YO2mEm%3DZWZKPjTMU%3DgW5Y83_KMi_1cr51JwavH0ctd7w%40mail.gmail.com
Backpatch-through: 16
func.sgml has grown over the years to the point where it is very
difficult to manage. This commit splits out each sect1 piece into its
own file, which is then included in the main file, so that the built
documentation should be identical to the pre-split documentation. All
these new files are placed in a new "func" subdirectory, and the
previous func.sgml is removed.
Done using scripts developed by:
Author: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxFgAh1--EMwOjMuANe=VTmjkNaZjH+AzSe04-8ZCGiESA@mail.gmail.com
When prep_buildtree is used to prepare a build tree when the source
directory already contains another build tree, then it will produce
the directory structure of the first build tree in the second one.
For example, if there is
postgresql/
postgresql/build1/
and a new build tree postgresql/build2/ is prepared, then this will
produce
postgresql/build2/build1/
because it just copies all subdirectories of the source tree. This is
not harmful, but it's pretty stupid and can be confusing, and it slows
down prep_buildtree when there are many build trees.
When prep_buildtree was first created, it was more common for the
build tree to be outside the source tree, in which case this is not a
problem. But now with the arrival of meson, it appears to be more
common (and also the way it is documented in the PostgreSQL
documentation) to have the build tree inside the source tree. (To be
clear: This change does not affect meson at all. But it would be an
issue for example if you have a meson build tree and a configure build
tree under the same source tree.)
To fix this, change the "find" command to process only those top-level
directories that we know about (namely config, contrib, doc, src). (I
alternatively looked for ways to ignore directories that looked like
build directories, but that seemed extremely complicated.) With that,
we can also remove the code that ignores directories related to
source-control management.
In passing, also remove the workaround for handling prebuilt docs,
since that has been obsolete since commit 54fac0e505.
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/8b96b07f-1f48-46e9-b26e-01b2c9e4ac8d%40eisentraut.org
This name is only used as documentation, and using this name is
consistent with its byte being a 'w'. Renaming it would also make the
use of a symbolic name based on the word "WAL" rather than the obsolete
"XLog" term more consistent, per future commits along the lines of
37c7a7eeb6, 4a68d50088, f4b54e1ed9.
Discussion: https://postgr.es/m/aIECfYfevCUpenBT@nathan
Previously, enabling sync_replication_slots while wal_level was not set
to logical could cause the server to shut down. This was because
the postmaster performed a configuration check before launching
the slot synchronization worker and raised an ERROR if the settings
were incompatible. Since ERROR is treated as FATAL in the postmaster,
this resulted in the entire server shutting down unexpectedly.
This commit changes the postmaster to log that message with a LOG-level
instead of raising an ERROR, allowing the server to continue running
even with the misconfiguration.
Back-patch to v17, where slot synchronization was introduced.
Reported-by: Hugo DUBOIS <hdubois@scaleway.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Hugo DUBOIS <hdubois@scaleway.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Discussion: https://postgr.es/m/CAH0PTU_pc3oHi__XESF9ZigCyzai1Mo3LsOdFyQA4aUDkm01RA@mail.gmail.com
Backpatch-through: 17
It's possible to use a CHECK (col IS NOT NULL) constraint to skip
scanning a table for nulls when adding a NOT NULL constraint on the same
column. However, if the CHECK constraint is dropped on the same command
that the NOT NULL is added, this fails, i.e., makes the NOT NULL addition
slow. The best we can do about it at this stage is to document this so
that users aren't taken by surprise.
(In Postgres 18 you can directly add the NOT NULL constraint as NOT
VALID instead, so there's no longer much use for the CHECK constraint,
therefore no point in building mechanism to support the case better.)
Reported-by: Andrew <psy2000usa@yahoo.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Discussion: https://postgr.es/m/175385113607.786.16774570234342968908@wrigleys.postgresql.org
This enhancement builds upon the infrastructure introduced in commit
228c370868, which enables the preservation of deleted tuples and their
origin information on the subscriber. This capability is crucial for
handling concurrent transactions replicated from remote nodes.
The update introduces support for detecting update_deleted conflicts
during the application of update operations on the subscriber. When an
update operation fails to locate the target row-typically because it has
been concurrently deleted-we perform an additional table scan. This scan
uses the SnapshotAny mechanism and we do this additional scan only when
the retain_dead_tuples option is enabled for the relevant subscription.
The goal of this scan is to locate the most recently deleted tuple-matching
the old column values from the remote update-that has not yet been removed
by VACUUM and is still visible according to our slot (i.e., its deletion
is not older than conflict-detection-slot's xmin). If such a tuple is
found, the system reports an update_deleted conflict, including the origin
and transaction details responsible for the deletion.
This provides a groundwork for more robust and accurate conflict
resolution process, preventing unexpected behavior by correctly
identifying cases where a remote update clashes with a deletion from
another origin.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
Coverity complained that the "errtrace" string is leaked if we return
early because backtrace_symbols fails. Another criticism that could
be leveled at this is that not providing any hint of what happened is
user-unfriendly. Fix that.
The odds of a leak here are small, and typically it wouldn't matter
anyway since the leak will be in ErrorContext which will soon get
reset. So I'm not feeling a need to back-patch.
If ndatums is zero, the code would allocate zero-length boundKinds
and boundDatums chunks, which would have nothing pointing to them,
leading to Valgrind complaints. Rearrange the code to avoid the
useless pallocs, and also to not bother computing byval/typlen when
they aren't used.
I'm unsure why I didn't see this in my Valgrind testing back in May.
This code hasn't changed since then, but maybe we added a regression
test that reaches this edge case. Or possibly I just failed to
notice the reports, which do say "0 bytes lost".
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
CompleteCachedPlan intentionally doesn't worry about small
leaks from PlanCacheComputeResultDesc. However, Valgrind
knows nothing of engineering tradeoffs and complains anyway.
Silence it by doing things the hard way if USE_VALGRIND.
I don't really love this patch, because it makes the handling
of plansource->resultDesc different from the handling of query
dependencies and search_path just above, which likewise are willing
to accept small leaks into the cached plan's context. However,
those cases aren't provoking Valgrind complaints. (Perhaps in a
CLOBBER_CACHE_ALWAYS build, they would?) For the moment, this
makes the src/pl/plpgsql tests leak-free according to Valgrind.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
Like the situation with function cache loading, text search
dictionary loading functions tend to leak some cruft into the
dictionary's long-lived cache context. To judge by the examples in
the core regression tests, not very many bytes are at stake.
Moreover, I don't see a way to prevent such leaks without changing the
API for TS template initialization functions: right now they do not
have to worry about making sure that their results are long-lived.
Hence, I think we should install a suppression rule rather than trying
to fix this completely. However, I did grab some low-hanging fruit:
several places were leaking the result of get_tsearch_config_filename.
This seems worth doing mostly because they are inconsistent with other
dictionaries that were freeing it already.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
PL/pgSQL and SQL-function parsing leak some stuff into the long-lived
function cache context. This isn't really a huge practical problem,
since it's not a large amount of data and the cruft will be recovered
if we have to re-parse the function. It's not clear that it's worth
working any harder than the previous patch did to eliminate these
leak complaints, so instead silence them with a suppression rule.
This suppression rule also hides the fact that CachedFunction structs
are intentionally leaked in some cases because we're unsure if any
fn_extra pointers remain. That might be nice to do something about
eventually, but it's not clear how.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
format_procedure leaks memory, so run it in a short-lived context
not the session-lifespan cache context for the PL/pgSQL function.
parse_datatype called the core parser in the function's cache context,
thus leaking potentially a lot of storage into that context. We were
also being a bit careless with the TypeName structures made in that
code path and others. Most of the time we don't need to retain the
TypeName, so make sure it is made in the short-lived temp context,
and copy it only if we do need to retain it.
These are far from the only leaks in PL/pgSQL compilation, but
they're the biggest as far as I've seen, and further improvement
looks like it'd require delicate and bug-prone surgery.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
These changes don't actually fix any leaks. They just make sure that
Valgrind will find pointers to data structures that remain allocated
at process exit, and thus not falsely complain about leaks. In
particular, we are trying to avoid situations where there is no
pointer to the beginning of an allocated block (except possibly
within the block itself, which Valgrind won't count).
* Because dynahash.c never frees hashtable storage except by deleting
the whole hashtable context, it doesn't bother to track the individual
blocks of elements allocated by element_alloc(). This results in
"possibly lost" complaints from Valgrind except when the first element
of each block is actively in use. (Otherwise it'll be on a freelist,
but very likely only reachable via "interior pointers" within element
blocks, which doesn't satisfy Valgrind.)
To fix, if we're building with USE_VALGRIND, expend an extra pointer's
worth of space in each element block so that we can chain them all
together from the HTAB header. Skip this in shared hashtables though:
Valgrind doesn't track those, and we'd need additional locking to make
it safe to manipulate a shared chain.
While here, update a comment obsoleted by 9c911ec06.
* Put the dlist_node fields of catctup and catclist structs first.
This ensures that the dlist pointers point to the starts of these
palloc blocks, and thus that Valgrind won't consider them
"possibly lost".
* The postmaster's PMChild structs and the autovac launcher's
avl_dbase structs also have the dlist_node-is-not-first problem,
but putting it first still wouldn't silence the warning because we
bulk-allocate those structs in an array, so that Valgrind sees a
single allocation. Commonly the first array element will be pointed
to only from some later element, so that the reference would be an
interior pointer even if it pointed to the array start. (This is the
same issue as for dynahash elements.) Since these are pretty simple
data structures, I don't feel too bad about faking out Valgrind by
just keeping a static pointer to the array start.
(This is all quite hacky, and it's not hard to imagine usages where
we'd need some other idea in order to have reasonable leak tracking of
structures that are only accessible via dlist_node lists. But these
changes seem to be enough to silence this class of leakage complaints
for the moment.)
* Free a couple of data structures manually near the end of an
autovacuum worker's run when USE_VALGRIND, and ensure that the final
vac_update_datfrozenxid() call is done in a non-permanent context.
This doesn't have any real effect on the process's total memory
consumption, since we're going to exit as soon as that last
transaction is done. But it does pacify Valgrind.
* Valgrind complains about the postmaster's socket-files and
lock-files lists being leaked, which we can silence by just
not nulling out the static pointers to them.
* Valgrind seems not to consider the global "environ" variable as
a valid root pointer; so when we allocate a new environment array,
it claims that data is leaked. To fix that, keep our own
statically-allocated copy of the pointer, similarly to the previous
item.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
In the current system architecture, none of these are worth obsessing
over; most are once-per-process leaks. However, Valgrind complains
about all of them, and if we get to using threads rather than
processes for backend sessions, it will become more interesting to
avoid per-session leaks.
* Fix leaks in StartupXLOG() and ShutdownWalRecovery().
* Fix leakage of pq_mq_handle in a parallel worker.
While at it, move mq_putmessage's "Assert(pq_mq_handle != NULL)"
to someplace where it's not trivially useless.
* Fix leak in logicalrep_worker_detach().
* Don't leak the startup-packet buffer in ProcessStartupPacket().
* Fix leak in evtcache.c's DecodeTextArrayToBitmapset().
If the presented array is toasted, this neglected to free the
detoasted copy, which was then leaked into EventTriggerCacheContext.
* I'm distressed by the amount of code that BuildEventTriggerCache
is willing to run while switched into a long-lived cache context.
Although the detoasted array is the only leak that Valgrind reports,
let's tighten things up while we're here. (DecodeTextArrayToBitmapset
is still run in the cache context, so doing this doesn't remove the
need for the detoast fix. But it reduces the surface area for other
leaks.)
* load_domaintype_info() intentionally leaked some intermediate cruft
into the long-lived DomainConstraintCache's memory context, reasoning
that the amount of leakage will typically not be much so it's not
worth doing a copyObject() of the final tree to avoid that. But
Valgrind knows nothing of engineering tradeoffs and complains anyway.
On the whole, the copyObject doesn't cost that much and this is surely
not a performance-critical code path, so let's do it the clean way.
* MarkGUCPrefixReserved didn't bother to clean up removed placeholder
GUCs at all, which shows up as a leak in one regression test.
It seems appropriate for it to do as much cleanup as
define_custom_variable does when replacing placeholders, so factor
that code out into a helper function. define_custom_variable's logic
was one brick shy of a load too: it forgot to free the separate
allocation for the placeholder's name.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
Arrange that only the "aligned chunk" part of the allocated space is
included in a Valgrind vchunk. This suppresses complaints about that
vchunk being possibly lost because PG is retaining only pointers to
the aligned chunk. Also make sure that trailing wasted space is
marked NOACCESS.
As a tiny performance improvement, arrange that MCXT_ALLOC_ZERO zeroes
only the returned "aligned chunk", not the wasted padding space.
In passing, fix GetLocalBufferStorage to use MemoryContextAllocAligned
instead of rolling its own implementation, which was equally broken
according to Valgrind.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
When determining whether an allocated chunk is still reachable,
Valgrind will consider only pointers within what it believes to be
allocated chunks. Normally, all of a block obtained from malloc()
would be considered "allocated" --- but it turns out that if we use
VALGRIND_MEMPOOL_ALLOC to designate sub-section(s) of a malloc'ed
block as allocated, all the rest of that malloc'ed block is ignored.
This leads to lots of false positives of course. In particular,
in any multi-malloc-block context, all but the primary block were
reported as leaked. We also had a problem with context "ident"
strings, which were reported as leaked unless there was some other
pointer to them besides the one in the context header.
To fix, we need to use VALGRIND_MEMPOOL_ALLOC to designate
a context's management structs (the context struct itself and
any per-block headers) as allocated chunks. That forces moving
the VALGRIND_CREATE_MEMPOOL/VALGRIND_DESTROY_MEMPOOL calls into
the per-context-type code, so that the pool identifier can be
made as soon as we've allocated the initial block, but otherwise
it's fairly straightforward. Note that in Valgrind's eyes there
is no distinction between these allocations and the allocations
that the mmgr modules hand out to user code. That's fine for
now, but perhaps someday we'll want to do better yet.
When reading this patch, it's helpful to start with the comments
added at the head of mcxt.c.
Author: Andres Freund <andres@anarazel.de>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
Discussion: https://postgr.es/m/20210317181531.7oggpqevzz6bka3g@alap3.anarazel.de
Previously, when running pgbench in pipeline mode with a custom script
that triggered retriable errors (e.g., serialization errors),
an assertion failure could occur:
Assertion failed: (res == ((void*)0)), function discardUntilSync, file pgbench.c, line 3515.
The root cause was that pgbench incorrectly assumed only a single
pipeline sync message would be received at the end. In reality,
multiple pipeline sync messages can be sent and must be handled properly.
This commit fixes the issue by updating pgbench to correctly process
multiple pipeline sync messages, preventing the assertion failure.
Back-patch to v15, where the bug was introduced.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Tatsuo Ishii <ishii@postgresql.org>
Discussion: https://postgr.es/m/CAHGQGwFAX56Tfx+1ppo431OSWiLLuW72HaGzZ39NkLkop6bMzQ@mail.gmail.com
Backpatch-through: 15
It was not very clear that the triggers are only allowed on plain tables
(not foreign tables). Also, rephrase the documentation for better
readability.
Follow up to commit 9e6104c66.
Reported-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Discussion: https://postgr.es/m/CAPmGK16XBs9ptNr8Lk4f-tJZogf6y-Prz%3D8yhvJbb_4dpsc3mQ%40mail.gmail.com
Backpatch-through: 13
In ReorderBufferProcessTXN(), used to send the data of a transaction to
an output plugin, INSERT ON CONFLICT changes (INTERNAL_SPEC_INSERT) are
delayed until a confirmation record arrives (INTERNAL_SPEC_CONFIRM),
updating the change being processed.
8c58624df4 has added an extra step after processing a change to update
the progress of the transaction, by calling the callback
update_progress_txn() based on the LSN stored in a change after a
threshold of CHANGES_THRESHOLD (100) is reached. This logic has missed
the fact that for an INSERT ON CONFLICT change the data is freed once
processed, hence update_progress_txn() could be called pointing to a LSN
value that's already been freed. This could result in random crashes,
depending on the workload.
Per discussion, this issue is fixed by reusing in update_progress_txn()
the LSN from the change processed found at the beginning of the loop,
meaning that for a INTERNAL_SPEC_CONFIRM change the progress is updated
using the LSN of the INTERNAL_SPEC_CONFIRM change, and not the LSN from
its INTERNAL_SPEC_INSERT change. This is actually more correct, as we
want to update the progress to point to the INTERNAL_SPEC_CONFIRM
change.
Masahiko Sawada has found a nice trick to reproduce the issue: hardcode
CHANGES_THRESHOLD at 1 and run test_decoding (test "ddl" being enough)
on an instance running valgrind. The bug has been analyzed by Ethan
Mertz, who also originally suggested the solution used in this patch.
Issue introduced by 8c58624df4, so backpatch down to v16.
Author: Ethan Mertz <ethan.mertz@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aIsQqDZ7x4LAQ6u1@paquier.xyz
Backpatch-through: 16
Currently, ALTER DATABASE/ROLE/SYSTEM RESET [ALL] with an unknown
custom GUC with a prefix reserved by MarkGUCPrefixReserved() errors
(unless a superuser runs a RESET ALL variant). This is problematic
for cases such as an extension library upgrade that removes a GUC.
To fix, simply make sure the relevant code paths explicitly allow
it. Note that we require superuser or privileges on the parameter
to reset it. This is perhaps a bit more restrictive than is
necessary, but it's not clear whether further relaxing the
requirements is safe.
Oversight in commit 88103567cb. The ALTER SYSTEM fix is dependent
on commit 2d870b4aef, which first appeared in v17. Unfortunately,
back-patching that commit would introduce ABI breakage, and while
that breakage seems unlikely to bother anyone, it doesn't seem
worth the risk. Hence, the ALTER SYSTEM part of this commit is
omitted on v15 and v16.
Reported-by: Mert Alev <mert@futo.org>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://postgr.es/m/18964-ba09dea8c98fccd6%40postgresql.org
Backpatch-through: 15
PostgreSQL always sends the BackendKeyData message at connection
startup, but there are some third party backend implementations out
there that don't support cancellation, and don't send the message
[1]. While the protocol docs left it up for interpretation if that is
valid behavior, libpq in PostgreSQL 17 and below accepted it. It does
not seem like the libpq behavior was intentional though, since it did
so by sending CancelRequest messages with all zeros to such servers
(instead of returning an error or making the cancel a no-op).
In version 18 the behavior was changed to return an error when trying
to create the cancel object with PGgetCancel() or PGcancelCreate().
This was done without any discussion, as part of supporting different
lengths of cancel packets for the new 3.2 version of the protocol.
This commit changes the behavior of PGgetCancel() / PGcancel() once
more to only return an error when the cancel object is actually used
to send a cancellation, instead of when merely creating the object.
The reason to do so is that some clients [2] create a cancel object as
part of their connection creation logic (thus having the cancel object
ready for later when they need it), so if creating the cancel object
returns an error, the whole connection attempt fails. By delaying the
error, such clients will still be able to connect to the third party
backend implementations in question, but when actually trying to
cancel a query, the user will be notified that that is not possible
for the server that they are connected to.
This commit only changes the behavior of the older PGgetCancel() /
PQcancel() functions, not the more modern PQcancelCreate() family of
functions. I.e. PQcancelCreate() returns a failed connection object
(CONNECTION_BAD) if the server didn't send a cancellation key. Unlike
the old PQgetCancel() function, we're not aware of any clients in the
field that use PQcancelCreate() during connection startup in a way
that would prevent connecting to such servers.
[1] AWS RDS Proxy is definitely one of them, and CockroachDB might be
another.
[2] psycopg2 (but not psycopg3).
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Backpatch-through: 18
Discussion: https://www.postgresql.org/message-id/20250617.101056.1437027795118961504.ishii%40postgresql.org
A deadlock can occur when the DDL command and the apply worker acquire
catalog locks in different orders while dropping replication origins.
The issue is rare in PG16 and higher branches because, in most cases, the
tablesync worker performs the origin drop in those branches, and its
locking sequence does not conflict with DDL operations.
This patch ensures consistent lock acquisition to prevent such deadlocks.
As per buildfarm.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Ajin Cherian <itsajin@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 14, where it was introduced
Discussion: https://postgr.es/m/bab95e12-6cc5-4ebb-80a8-3e41956aa297@gmail.com
Commit c407d5426b added tab completion for ALTER ROLE|USER ... RESET,
with the intent to offer only the variables actually set on the role.
But as soon as the user started typing something, it would start to
offer all possible matching variables.
Fix this the same way ALTER DATABASE ... RESET does it, i.e. by
properly considering the prefix.
A second issue causing similar symptoms (offering variables that are not
actually set for a role) was caused by a match to another pattern. The
ALTER DATABASE ... RESET was already excluded, so do the same thing for
ROLE/USER.
Report and fix by Dagfinn Ilmari Mannsåker. Backpatch to 18, same as
c407d5426b.
Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/87qzyghw2x.fsf%40wibble.ilmari.org
Discussion: https://postgr.es/m/87tt4lumqz.fsf%40wibble.ilmari.org
Backpatch-through: 18
Commit 9df8727c50 failed to schema-quality the unnest() call in the
query used to list the variables in ALTER DATABASE ... RESET. If there's
another unnest() function in the search_path, this could cause either
failures, or even security issues (when the tab-completion gets used by
privileged accounts).
Report and fix by Dagfinn Ilmari Mannsåker. Backpatch to 18, same as
9df8727c50.
Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/87qzyghw2x.fsf%40wibble.ilmari.org
Discussion: https://postgr.es/m/87tt4lumqz.fsf%40wibble.ilmari.org
Backpatch-through: 18
pg_dump sorts objects by their logical names, e.g. (nspname, relname,
tgname), before dependency-driven reordering. That removes one source
of logically-identical databases differing in their schema-only dumps.
In other words, it helps with schema diffing. The logical name sort
ignored essential sort keys for constraints, operators, PUBLICATION
... FOR TABLE, PUBLICATION ... FOR TABLES IN SCHEMA, operator classes,
and operator families. pg_dump's sort then depended on object OID,
yielding spurious schema diffs. After this change, OIDs affect dump
order only in the event of catalog corruption. While pg_dump also
wrongly ignored pg_collation.collencoding, CREATE COLLATION restrictions
have been keeping that imperceptible in practical use.
Use techniques like we use for object types already having full sort key
coverage. Where the pertinent queries weren't fetching the ignored sort
keys, this adds columns to those queries and stores those keys in memory
for the long term.
The ignorance of sort keys became more problematic when commit
172259afb5 added a schema diff test
sensitive to it. Buildfarm member hippopotamus witnessed that.
However, dump order stability isn't a new goal, and this might avoid
other dump comparison failures. Hence, back-patch to v13 (all supported
versions).
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/20250707192654.9e.nmisch@google.com
Backpatch-through: 13
This patch adds two new counters to pg_stat_statements:
- generic_plan_calls
- custom_plan_calls
These counters track how many times a prepared statement was executed
using a generic or custom plan, respectively, providing a global
equivalent at query level, for top and non-top levels, of
pg_prepared_statements whose data is restricted to a single session.
This commit builds upon e125e36002. The module is bumped to version
1.13. PGSS_FILE_HEADER is bumped as well, something that the latest
patches touching the on-disk format of the PGSS file did not actually
bother with since 2022..
Author: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Nikolay Samokhvalov <nik@postgres.ai>
Discussion: https://postgr.es/m/CAA5RZ0uFw8Y9GCFvafhC=OA8NnMqVZyzXPfv_EePOt+iv1T-qQ@mail.gmail.com
Commit 719dcf3c42 introduced a field called CachedPlanType in
PlannedStmt to allow extensions to determine whether a cached plan is
generic or custom.
After discussion, the concepts that we want to track are a bit wider
than initially anticipated, as it is closer to knowing from which
"source" or "origin" a PlannedStmt has been generated or retrieved.
Custom and generic cached plans are a subset of that.
Based on the state of HEAD, we have been able to define two more
origins:
- "standard", for the case where PlannedStmt is generated in
standard_planner(), the most common case.
- "internal", for the fake PlannedStmt generated internally by some
query patterns.
This could be tuned in the future depending on what is needed. This
looks like a good starting point, at least. The default value is called
"UNKNOWN", provided as fallback value. This value is not used in the
core code, the idea is to let extensions building their own PlannedStmts
know about this new field.
Author: Michael Paquier <michael@paquier.xyz>
Co-authored-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/aILaHupXbIGgF2wJ@paquier.xyz
The sentence in question gave readers the impression that vacuumdb
removes statistics for a period of time while analyzing, but it's
actually meant to convey that --analyze-in-stages temporarily
replaces existing statistics with ones generated with lower
statistics targets.
Reported-by: Frédéric Yhuel <frederic.yhuel@dalibo.com>
Reviewed-by: Frédéric Yhuel <frederic.yhuel@dalibo.com>
Reviewed-by: "David G. Johnston" <david.g.johnston@gmail.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/4b94ca16-7a6d-4581-b2aa-4ea79dbc082a%40dalibo.com
Backpatch-through: 18
Presently, pg_upgrade assumes that all non-default tablespaces
don't move to different directories during upgrade. Unfortunately,
this isn't true for in-place tablespaces, which move to the new
cluster's pg_tblspc directory. This commit teaches pg_upgrade to
handle in-place tablespaces by retrieving the tablespace
directories for both the old and new clusters. In turn, we can
relax the prohibition on non-default tablespaces for same-version
upgrades, i.e., if all non-default tablespaces are in-place,
pg_upgrade may proceed.
This change is primarily intended to enable additional pg_upgrade
testing with non-default tablespaces, as is done in
006_transfer_modes.pl.
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aA_uBLYMUs5D66Nb%40nathan
Recent discussions of the mechanisms used to manage global data have
raised concerns about their robustness and security. Rather than try
to deal with those concerns at a very late stage of the release cycle,
the conclusion is to revert these features and work on them for the
next release.
This reverts parts or all of the following commits:
1495eff7bd Non text modes for pg_dumpall, correspondingly change pg_restore
5db3bf7391 Clean up from commit 1495eff7bd289f74d0cb Add more TAP tests for pg_dumpall
2ef5790806 Fix a couple of error messages and tests for them
b52a4a5f28 Clean up error messages from 1495eff7bd4170298b6e Further cleanup for directory creation on pg_dump/pg_dumpall
22cb6d2895 Fix memory leak in pg_restore.c
928394b664 Improve various new-to-v18 appendStringInfo calls
39729ec01d Fix fat fingering in 22cb6d28955822bf21d5 Add missing space in pg_restore documentation.
f09088a01d Free memory properly in pg_restore.c
40b9c27014 pg_restore cleanups
4aad2cb770 Portability fix: isdigit() must be passed an unsigned char.
88e947136b Fix typos and grammar in the code
f60420cff6 doc: Alphabetize long options for pg_dump[all].
bc35adee8d doc: Put new options in consistent order on man pages
a876464abc Message style improvements
dec6643487 Improve pg_dump/pg_dumpall help synopses and terminology
0ebd242555 Run pgperltidy
Discussion: https://postgr.es/m/20250708212819.09.nmisch@google.com
Backpatch-to: 18
Reviewed-by: Noah Misch <noah@leadboat.com>
The configure checks used two incorrect functions when checking the
presence of some routines in an environment:
- __get_cpuidex() for the check of __cpuidex().
- __get_cpuid() for the check of __cpuid().
This means that Postgres has never been able to detect the presence of
these functions, impacting environments where these exist, like Windows.
Simply fixing the function name does not work. For example, using
configure with MinGW on Windows causes the checks to detect all four of
__get_cpuid(), __get_cpuid_count(), __cpuidex() and __cpuid() to be
available, causing a compilation failure as this messes up with the
MinGW headers as we would include both <intrin.h> and <cpuid.h>.
The Postgres code expects only one in { __get_cpuid() , __cpuid() } and
one in { __get_cpuid_count() , __cpuidex() } to exist. This commit
reshapes the configure checks to do exactly what meson is doing, which
has been working well for us: check one, then the other, but never allow
both to be detected in a given build.
The logic is wrong since 3dc2d62d04 and 792752af4e where these
checks have been introduced (the second case is most likely a copy-pasto
coming from the first case), with meson documenting that the configure
checks were broken. As far as I can see, they are not once applied
consistently with what the code expects, but let's see if the buildfarm
has different something to say. The comment in meson.build is adjusted
as well, to reflect the new reality.
Author: Lukas Fittl <lukas@fittl.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aIgwNYGVt5aRAqTJ@paquier.xyz
Backpatch-through: 13
If the client sent a query cancel request with backend PID 0, it
tripped an assertion. With assertions disabled, you got this in the
log instead:
LOG: invalid cancel request with PID 0
LOG: wrong key in cancel request for process 0
Query cancellations don't even require authentication, so we better
tolerate bogus requests. Fix by turning the assertion into a regular
runtime check.
Spotted while testing libpq behavior with a modified server that
didn't send BackendKeyData to the client.
Backpatch-through: 18
For many optional libraries, we extract the -L and -l switches needed
to link the library from a helper program such as llvm-config. In
some cases we put the resulting -L switches into LDFLAGS ahead of
-L switches specified via --with-libraries. That risks breaking
the user's intention for --with-libraries.
It's not such a problem if the library's -L switch points to a
directory containing only that library, but on some platforms a
library helper may "helpfully" offer a switch such as -L/usr/lib
that points to a directory holding all standard libraries. If the
user specified --with-libraries in hopes of overriding the standard
build of some library, the -L/usr/lib switch prevents that from
happening since it will come before the user-specified directory.
To fix, avoid inserting these switches directly into LDFLAGS during
configure, instead adding them to LIBDIRS or SHLIB_LINK. They will
still eventually get added to LDFLAGS, but only after the switches
coming from --with-libraries.
The same problem exists for -I switches: those coming from
--with-includes should appear before any coming from helper programs
such as llvm-config. We have not heard field complaints about this
case, but it seems certain that a user attempting to override a
standard library could have issues.
The changes for this go well beyond configure itself, however,
because many Makefiles have occasion to manipulate CPPFLAGS to
insert locally-desirable -I switches, and some of them got it wrong.
The correct ordering is any -I switches pointing at within-the-
source-tree-or-build-tree directories, then those from the tree-wide
CPPFLAGS, then those from helper programs. There were several places
that risked pulling in a system-supplied copy of libpq headers, for
example, instead of the in-tree files. (Commit cb36f8ec2 fixed one
instance of that a few months ago, but this exercise found more.)
The Meson build scripts may or may not have any comparable problems,
but I'll leave it to someone else to investigate that.
Reported-by: Charles Samborski <demurgos@demurgos.net>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/70f2155f-27ca-4534-b33d-7750e20633d7@demurgos.net
Backpatch-through: 13
When I prepared 71c0921b6 et al yesterday, I was thinking that the
logic involving explicitly freeing the node_list output was still
needed to dodge leakage bugs in libxml2. But I was misremembering:
we introduced that only because with early 2.13.x releases we could
not trust xmlParseBalancedChunkMemory's result code, so we had to
look to see if a node list was returned or not. There's no reason
to believe that xmlParseBalancedChunkMemory will fail to clean up
the node list when required, so simplify. (This essentially
completes reverting all the non-cosmetic changes in 6082b3d5d.)
Reported-by: Jim Jones <jim.jones@uni-muenster.de>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/997668.1753802857@sss.pgh.pa.us
Backpatch-through: 13
pgfdw_report_error has the same design fault as elog/ereport
do, namely that it might or might not return depending on elevel.
While those functions are too widely used to redesign, there are
only about 30 call sites for pgfdw_report_error, and it's not
exposed for extension use. So let's rethink it. Split it into
pgfdw_report_error() which hard-wires ERROR elevel and is marked
pg_noreturn, and pgfdw_report() which allows only elevels less
than ERROR. (Thanks to Álvaro Herrera for suggesting this naming.)
The motivation for doing this now is that in the wake of commit
80aa9848b, which removed a bunch of PG_TRYs from postgres_fdw,
we're seeing more thorough flow analysis there from C compilers
and Coverity. Marking pgfdw_report_error as noreturn where
appropriate should help prevent false-positive complaints.
We could alternatively have invented a macro wrapper similar
to what we use for elog/ereport, but that code is sufficiently
fragile that I didn't find it appetizing to make another copy.
Since 80aa9848b already changed pgfdw_report_error's signature,
this won't make back-patching any harder than it was already.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/420221.1753714491@sss.pgh.pa.us
In the wake of commit 80aa9848b, a few compilers think that
postgresAcquireSampleRowsFunc's "reltuples" might be used
uninitialized. The logic is visibly correct, both before
and after that change; presumably what happened here is that
the previous presence of a setjmp() in the function stopped
them from attempting any flow analysis at all. Add a dummy
initialization to silence the warning.
Reported-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAExHW5tkerCufA_F6oct5dMJ61N+yVrVgYXL7M8dD-5_zXjrDw@mail.gmail.com
Previously, if a background worker crashed and the server restarted
with restart_after_crash enabled, the worker was not restarted
as expected. This issue was fixed by commit b5d084c535,
which ensures that background workers without the never-restart flag
are correctly restarted after a crash-and-restart cycle.
To guard against regressions, this commit adds a test that verifies
a background worker successfully restarts in such a scenario.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: ChangAo Chen <cca5507@qq.com>
Discussion: https://postgr.es/m/CAHGQGwHF-PdUOgiXCH_8K5qBm8b13h0Qt=dSoFXZybXQdbf-tw@mail.gmail.com
This commit adds an extra --timeout=PG_TEST_TIMEOUT_DEFAULT to the call
of pg_isready done in is_alive(), so as it is possible to have more
leverage with the call on machines constrained on resources.
By default the timeout is 180s, and it can be changed depending on the
environment where the tests are run.
Per buildfarm member mamba, where the default timeout of 3s used by
pg_isready has proved that it may not be enough as the postmaster may
not have the time it needs to reply to a ping request.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/29b637df-f818-4b52-986a-f11ba28300e9@gmail.com
This commit documents differences in the definition of word separators for
the initcap function between libc and ICU locale providers.
Backpatch to all supported branches.
Discussion: https://postgr.es/m/804cc10ef95d4d3b298e76b181fd9437%40postgrespro.ru
Author: Oleg Tselebrovskiy <o.tselebrovskiy@postgrespro.ru>
Backpatch-through: 13
There've been a few complaints that it can be overly difficult to figure
out why the planner picked a Memoize plan. To help address that, here we
adjust the EXPLAIN output to display the following additional details:
1) The estimated number of cache entries that can be stored at once
2) The estimated number of unique lookup keys that we expect to see
3) The number of lookups we expect
4) The estimated hit ratio
Technically #4 can be calculated using #1, #2 and #3, but it's not a
particularly obvious calculation, so we opt to display it explicitly.
The original patch by Lukas Fittl only displayed the hit ratio, but
there was a fear that might lead to more questions about how that was
calculated. The idea with displaying all 4 is to be transparent which
may allow queries to be tuned more easily. For example, if #2 isn't
correct then maybe extended statistics or a manual n_distinct estimate can
be used to help fix poor plan choices.
Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CAP53Pky29GWAVVk3oBgKBDqhND0BRBN6yTPeguV_qSivFL5N_g%40mail.gmail.com
This mostly reverts commit 6082b3d5d, "Use xmlParseInNodeContext
not xmlParseBalancedChunkMemory". It turns out that
xmlParseInNodeContext will reject text chunks exceeding 10MB, while
(in most libxml2 versions) xmlParseBalancedChunkMemory will not.
The bleeding-edge libxml2 bug that we needed to work around a year
ago is presumably no longer a factor, and the argument that
xmlParseBalancedChunkMemory is semi-deprecated is not enough to
justify a functionality regression. Hence, go back to doing it
the old way.
Reported-by: Michael Paquier <michael@paquier.xyz>
Author: Michael Paquier <michael@paquier.xyz>
Co-authored-by: Erik Wienhold <ewie@ewie.name>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/aIGknLuc8b8ega2X@paquier.xyz
Backpatch-through: 13
Commit ffae5cc5a6 added this hint in 2006,
but it's now obsolete and doesn't reflect what users should really check
in this situation. We were not able to agree on a new hint, so just delete
the existing one and update the comments to mention one possibility that
is known to cause problems of this kind: something other than PostgreSQL
is modifying files in the PostgreSQL data directory.
Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Robert Haas <rhaas@postgresql.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/CAKZiRmxNbcaL76x=09Sxf7aUmrRQJBf8drzDdUHo+j9_eM+VMg@mail.gmail.com
Commit 473a575e05 purported to make this
function stash the error message in *syncrep_parse_result_p, but
it didn't actually.
As a result, an attempt to set synchronous_standby_names to any value
that does not parse resulted in a generic "parser failed." message
rather than anything more specific. This fixes that.
Discussion: http://postgr.es/m/CA+TgmoYF9wPNZ-Q_EMfib_espgHycY-eX__6Tzo2GpYpVXqCdQ@mail.gmail.com
Backpatch-through: 18
This routines includes three code paths that can fail, with the
allocated prepared statement name going out of scope.
Per report from Coverity. Oversight in commit a6eabec680, that has
played with the order of some ecpg_strdup() calls in this code path.
The callback added in fc415edf8c used to check if there is any pending
data to flush for fixed-numbered statistics, done by looping across all
the builtin and custom stats kinds with a call to have_fixed_pending_cb,
is proving to able to show in workloads that do not report any stats
(read-only, no function calls, no WAL, no IO, etc). The code used in
v17 was cheaper than that what HEAD has introduced, relying on three
boolean checks for WAL, SLRU and IO stats.
This commit switches the code to use a more efficient approach than
fc415edf8c, with a single boolean flag that can be switched to "true"
by any fixed-numbered stats kinds to force pgstat_report_stat() to go
through one round of reports. The flag is reset by pgstat_report_stat()
once a full round of reports is done. The flag being false means that
fixed-numbered stats kinds saw no activity, and that there is no pending
data to flush.
ac000fca74 took one step in improving the performance by reducing the
number of stats kinds that the backend can hold. This commit takes a
more drastic step by bringing back the code efficiency to what it was
before v18 with a cheap check at the beginning of pgstat_report_stat()
for its fast-exit path.
The callback have_static_pending_cb is removed as an effect of all that.
Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/eb224uegsga2hgq7dfq3ps5cduhpqej7ir2hjxzzozjthrekx5@dysei6buqthe
Backpatch-through: 18
If the number of sync requests is big enough, the palloc() call in
AbsorbSyncRequests() will attempt to allocate more than 1 GB of memory,
resulting in failure. This can lead to an infinite loop in the checkpointer
process, as it repeatedly fails to absorb the pending requests.
This commit introduces the following changes to cope with this problem:
1. Turn pending checkpointer requests array in shared memory into a bounded
ring buffer.
2. Limit maximum ring buffer size to 10M items.
3. Make AbsorbSyncRequests() process requests incrementally in 10K batches.
Even #2 makes the whole queue size fit the maximum palloc() size of 1 GB.
of continuous lock holding.
This commit is for master only. Simpler fix, which just limits a request
queue size to 10M, will be backpatched.
Reported-by: Ekaterina Sokolova <e.sokolova@postgrespro.ru>
Discussion: https://postgr.es/m/db4534f83a22a29ab5ee2566ad86ca92%40postgrespro.ru
Author: Maxim Orlov <orlovmg@gmail.com>
Co-authored-by: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Similar checks are done for the mandatory table AM callbacks. A portion
of the index AM callbacks are optional and can be NULL; the rest is
mandatory and is documented as such in the documentation and in amapi.h.
These checks are useful to detect quickly if all the mandatory callbacks
are defined when implementing a new index access method, as the
assertions are run when loading the AM.
Author: Japin Li <japinli@hotmail.com>
Discussion: https://postgr.es/m/ME0P300MB0445795D31CEAB92C58B41FDB651A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
Valgrind complains that the PQconninfoOption array returned by libpq
is leaked. We apparently believed that we could suppress that warning
by storing that array's address in a static variable. However, modern
C compilers are bright enough to optimize the static variable away.
We could escalate that arms race by making the variable global.
But on the whole it seems better to revise the code so that it
can free libpq's result properly. The only thing that costs
us is copying the parameter-name keywords; which seems like a
pretty negligible cost in a function that runs at most once per
process.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/2976982.1748049023@sss.pgh.pa.us
Remove a bunch of PG_TRY constructs, de-volatilize related
variables, remove some PQclear calls in error paths.
Aside from making the code simpler and shorter, this should
provide some marginal performance gains.
For ease of review, I did not re-indent code within the removed
PG_TRY constructs. That'll be done in a separate patch.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/2976982.1748049023@sss.pgh.pa.us
Commit 232d8caea fixed a case where postgres_fdw could lose track
of a PGresult object, resulting in a process-lifespan memory leak.
But I have little faith that there aren't other potential PGresult
leakages, now or in future, in the backend modules that use libpq.
Therefore, this patch proposes infrastructure that makes all
PGresults returned from libpq act as though they are palloc'd
in the CurrentMemoryContext (with the option to relocate them to
another context later). This should greatly reduce the risk of
careless leaks, and it also permits removal of a bunch of code
that attempted to prevent such leaks via PG_TRY blocks.
This patch adds infrastructure that wraps each PGresult in a
"libpqsrv_PGresult" that provides a memory context reset callback
to PQclear the PGresult. Code using this abstraction is inherently
memory-safe to the same extent as we are accustomed to in most backend
code. Furthermore, we add some macros that automatically redirect
calls of the libpq functions concerned with PGresults to use this
infrastructure, so that almost no source-code changes are needed to
wheel this infrastructure into place in all the backend code that
uses libpq.
Perhaps in future we could create similar infrastructure for
PGconn objects, but there seems less need for that.
This patch just creates the infrastructure and makes relevant code
use it, including reverting 232d8caea in favor of this mechanism.
A good deal of follow-on simplification is possible now that we don't
have to be so cautious about freeing PGresults, but I'll put that in
a separate patch.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/2976982.1748049023@sss.pgh.pa.us
This flag was effectively a no-op in EXEC_BACKEND (ie, Windows)
builds, because it was kept in the process-local HTAB struct,
and it could only ever become set in the postmaster's copy.
The simplest fix is to move it to the shared HASHHDR struct.
We could keep a copy in HTAB as well, as we do with keysize
and some other fields, but the "too much contention" argument
doesn't seem to apply here: we only examine isfixed during
element_alloc(), which had better not get hit very often for
a shared hashtable.
This oversight dates to 7c797e719 which invented the option.
But back-patching doesn't seem appropriate given the lack of
field complaints. If there is anyone running an affected
workload on Windows, they might be unhappy about the behavior
changing in a minor release.
Author: Aidar Imamov <a.imamov@postgrespro.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/4d0cb35ff01c5c74d2b9a582ecb73823@postgrespro.ru
This changes the grammar for REINDEX, CHECKPOINT, CLUSTER, ANALYZE/ANALYSE;
they still accept the same options as before, but the grammar is written
differently for convenience of future development.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/202507231538.ir7pjzoow6oe@alvherre.pgsql
Previously, if a background worker crashed (e.g., due to a SIGKILL) and
the server restarted due to restart_after_crash being enabled,
the worker was not restarted as expected. Background workers without
the never-restart flag should automatically restart in this case.
This issue was introduced in commit 28a520c0b7, which failed to reset
the rw_pid field in the RegisteredBgWorker struct for the crashed worker.
This commit fixes the problem by resetting rw_pid for all eligible
background workers during the crash-and-restart cycle.
Back-patched to v18, where the bug was introduced.
Bug fix patches were proposed by Andrey Rudometov and ChangAo Chen,
but this commit uses a different approach.
Reported-by: Andrey Rudometov <unlimitedhikari@gmail.com>
Reported-by: ChangAo Chen <cca5507@qq.com>
Author: Andrey Rudometov <unlimitedhikari@gmail.com>
Author: ChangAo Chen <cca5507@qq.com>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: ChangAo Chen <cca5507@qq.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Discussion: https://postgr.es/m/CAF6JsWiO=i24qYitWe6ns1sXqcL86rYxdyU+pNYk-WueKPSySg@mail.gmail.com
Discussion: https://postgr.es/m/tencent_E00A056B3953EE6440F0F40F80EC30427D09@qq.com
Backpatch-through: 18
LatchWaitSetPostmasterDeathPos, the latch event position for the
postmaster death event, is initialized under IsUnderPostmaster.
WaitLatch() considered it as a valid wait target in single-user mode
(!IsUnderPostmaster), which was incorrect.
One code path found to fail with an assertion failure is a database drop
in single-user mode while waiting in WaitForProcSignalBarrier() after
the drop.
Oversight in commit 84e5b2f07a.
Author: Patrick Stählin <me@packi.ch>
Co-authored-by: Ronan Dunklau <ronan.dunklau@aiven.io>
Discussion: https://postgr.es/m/18996-3a2744c8140488de@postgresql.org
Backpatch-through: 18
This commit changes stats kinds to have the following bounds, making
their handling in core cheaper by default:
- PGSTAT_KIND_CUSTOM_MIN 128 -> 24
- PGSTAT_KIND_MAX 256 -> 32
The original numbers were rather high, and showed an impact on
performance in pgstat_report_stat() for the case of simple queries with
its early-exit path if there are no pending statistics to flush. This
logic will be improved more in a follow-up commit to bring the
performance of pgstat_report_stat() on par with v17 and older versions.
Lowering the bounds is a change worth doing on its own, independently of
the other improvement.
These new numbers should be enough to leave some room for the following
years for built-in and custom stats kinds, with stable ID numbers. At
least that should be enough to start with this facility for extension
developers. It can be always increased in the tree depending on the
requirements wanted.
Per discussion with Andres Freund and Bertrand Drouvot.
Discussion: https://postgr.es/m/eb224uegsga2hgq7dfq3ps5cduhpqej7ir2hjxzzozjthrekx5@dysei6buqthe
Backpatch-through: 18
This function is declared as returning a uint8, but it returns a
bool in one code path. To fix, return (uint8) 0 instead of false
there. This should behave exactly the same as before, but it might
prevent future compiler complaints.
Oversight in commit a892234f83.
Author: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://postgr.es/m/aIHluT2isN58jqHV%40jrouhaud
Previously, the tool could replay the same transaction twice, once during
recovery, then again during replication after the subscriber was set up.
This occurred because the same recovery_target_lsn was used both to
finalize recovery and to start replication. If
recovery_target_inclusive = true, the transaction at that LSN would be
applied during recovery and then sent again by the publisher leading to
duplication.
To prevent this, we now set recovery_target_inclusive = false. This
ensures the transaction at recovery_target_lsn is not reapplied during
recovery, avoiding duplication when replication begins.
Bug #18897
Reported-by: Zane Duffield <duffieldzane@gmail.com>
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/18897-d3db67535860dddb@postgresql.org
PlannedStmt gains a new field, called CachedPlanType, able to track if a
given plan tree originates from the cache and if we are dealing with a
generic or custom cached plan.
This field can be used for monitoring or statistical purposes, in the
executor hooks, for example, based on the planned statement attached to
a QueryDesc. A patch is under discussion for pg_stat_statements to
provide an equivalent of the counters in pg_prepared_statements for
custom and generic plans, to provide a more global view of such data, as
this data is now restricted to the current session.
The concept introduced in this commit is useful on its own, and has been
extracted from a larger patch by the same author.
Author: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAA5RZ0uFw8Y9GCFvafhC=OA8NnMqVZyzXPfv_EePOt+iv1T-qQ@mail.gmail.com
Ensure the test waits for the apply worker to exit after disabling the
subscription. This is necessary to safely enable the retain_dead_tuples
option. Also added a similar wait in another part of the test to prevent
unintended apply worker activity that could lead to test failures
post-subscription disable.
Reported by Michael Paquier as per cfbot.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Discussion: https://postgr.es/m/aIGLgfRJIBwExoPj@paquier.xyz
This commit adds missing index entries for the functions pg_buffercache_numa()
and pg_buffercache_usage_counts() in the pg_buffercache documentation.
It also makes the function titles consistent by adding parentheses after
function names where they were previously missing.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/7d19af4b-7da3-4862-9f52-ff958960bd8d@oss.nttdata.com
Backpatch-through: 18
Solaris has never bothered to add "const" to the second argument
of PAM conversation procs, as all other Unixen did decades ago.
This resulted in an "incompatible pointer" compiler warning when
building --with-pam, but had no more serious effect than that,
so we never did anything about it. However, as of GCC 14 the
case is an error not warning by default.
To complicate matters, recent OpenIndiana (and maybe illumos
in general?) *does* supply the "const" by default, so we can't
just assume that platforms using our solaris template need help.
What we can do, short of building a configure-time probe,
is to make solaris.h #define _PAM_LEGACY_NONCONST, which
causes OpenIndiana's pam_appl.h to revert to the traditional
definition, and hopefully will have no effect anywhere else.
Then we can use that same symbol to control whether we include
"const" in the declaration of pam_passwd_conv_proc().
Bug: #18995
Reported-by: Andrew Watkins <awatkins1966@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18995-82058da9ab4337a7@postgresql.org
Backpatch-through: 13
lwlock.c, lwlock.h, and wait_event_names.txt each contain a list of
built-in LWLock tranches. It is easy to miss one or the other when
adding or removing tranches, and discrepancies have adverse effects
(e.g., breaking JOINs between pg_stat_activity and pg_wait_events).
This commit moves the lists of built-in tranches in lwlock.{c,h} to
lwlocklist.h and adds a cross-check to the script that generates
lwlocknames.h. If the lists do not match exactly, building will
fail.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aHpOgwuFQfcFMZ/B%40ip-10-97-1-34.eu-west-3.compute.internal
Commit 70e81861fa split out xlogrecovery.c/h and moved some enums
related to recovery targets to xlogrecovery.h. However, it seems that
the enum RecoveryTargetAction was inadvertently left out by that commit.
This commit moves it to xlogrecovery.h for consistency.
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20240904.173013.1132986940678039655.horikyota.ntt@gmail.com
Logical replication requires reliable conflict detection to maintain data
consistency across nodes. To achieve this, we must prevent premature
removal of tuples deleted by other origins and their associated commit_ts
data by VACUUM, which could otherwise lead to incorrect conflict reporting
and resolution.
This patch introduces a mechanism to retain deleted tuples on the
subscriber during the application of concurrent transactions from remote
nodes. Retaining these tuples allows us to correctly ignore concurrent
updates to the same tuple. Without this, an UPDATE might be misinterpreted
as an INSERT during resolutions due to the absence of the original tuple.
Additionally, we ensure that origin metadata is not prematurely removed by
vacuum freeze, which is essential for detecting update_origin_differs and
delete_origin_differs conflicts.
To support this, a new replication slot named pg_conflict_detection is
created and maintained by the launcher on the subscriber. Each apply
worker tracks its own non-removable transaction ID, which the launcher
aggregates to determine the appropriate xmin for the slot, thereby
retaining necessary tuples.
Conflict information retention (deleted tuples and commit_ts) can be
enabled per subscription via the retain_conflict_info option. This is
disabled by default to avoid unnecessary overhead for configurations that
do not require conflict resolution or logging.
During upgrades, if any subscription on the old cluster has
retain_conflict_info enabled, a conflict detection slot will be created to
protect relevant tuples from deletion when the new cluster starts.
This is a foundational work to correctly detect update_deleted conflict
which will be done in a follow-up patch.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
Compilers such as gcc and clang seem to perform this rewrite
automatically when the lookup string is known at compile-time to contain
a single character. The MSVC compiler does not seem apply the same
optimization, and the code being adjusted here is within an #ifdef WIN32,
so it seems worth adjusting this with the assumption that strchr() will be
slightly more performant.
There are a couple more instances in contrib/fuzzystrmatch that this
commit could also have adjusted. After some discussion, we deemed those
not important enough to bother with.
Author: Dmitry Mityugov <d.mityugov@postgrespro.ru>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: David Rowley <drowleyml@gmail.com>
Discussion: https://postgr.es/m/9c1beea6c7a5e9fb6677f26620f1f257%40postgrespro.ru
Various code paths of the ECPG code did not check for memory allocation
failures, including the specific case where ecpg_strdup() considers a
NULL value given in input as a valid behavior. strdup() returning
itself NULL on failure, there was no way to make the difference between
what could be valid and what should fail.
With the different cases in mind, ecpg_strdup() is redesigned and gains
a new optional argument, giving its callers the possibility to
differentiate allocation failures and valid cases where the caller is
giving a NULL value in input. Most of the ECPG code does not expect a
NULL value, at the exception of ECPGget_desc() (setlocale) and
ECPGconnect(), like dbname being unspecified, with repeated strdup
calls.
The code is adapted to work with this new routine. Note the case of
ecpg_auto_prepare(), where the code order is switched so as we handle
failures with ecpg_strdup() before manipulating any cached data,
avoiding inconsistencies.
This class of failure is unlikely a problem in practice, so no backpatch
is done. Random OOM failures in ECPGconnect() could cause the driver to
connect to a different server than the one wanted by the caller, because
it could fallback to default values instead of the parameters defined
depending on the combinations of allocation failures and successes.
Author: Evgeniy Gorbanev <gorbanyoves@basealt.ru>
Co-authored-by: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/a6b193c1-6994-4d9c-9059-aca4aaf41ddd@basealt.ru
Commit 112faf1378 introduced a translation marker in libpq-be-fe-helpers.h,
but this caused build failures on some platforms—such as the one reported
by buildfarm member indri—due to linker issues with dblink. This is the same
problem previously addressed in commit 213c959a29.
To fix the issue, this commit removes the translation marker from
libpq-be-fe-helpers.h, following the approach used in 213c959a29.
It also removes the associated gettext_noop() calls added in commit
112faf1378, as they are no longer needed.
While reviewing this, a gettext_noop() call was also found in
contrib/basic_archive. Since contrib modules don't support translation,
this call has been removed as well.
Per buildfarm member indri.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/0e6299d9-608a-4ffa-aeb1-40cb8a99000b@oss.nttdata.com
The definition of \dRp+ was modified in commit 7054186c4e. This patch
updates the column list and row filter examples to align with the revised
definition.
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed by: Peter Smith <smithpb2250@gmail.com>
Backpatch-through: 18, where it was introduced
Discussion: https://postgr.es/m/CANhcyEUvqkSO6b9zi_fs_BBPEge5acj4mf8QKmq2TX-7axa7EQ@mail.gmail.com
This index AM callback has been introduced in c1ec02be1d and it is
optional, currently only being used by BRIN. Optional callbacks are
documented with NULL as possible value in amapi.h and indexam.sgml, but
this callback has missed this part of the description.
Reported-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Discussion: https://postgr.es/m/CAHut+PvgYcPmPDi1YdHMJY5upnyGRpc0N8pk1xNB11xDSBwNog@mail.gmail.com
Backpatch-through: 17
Previously, NOTICE, WARNING, and similar messages received from remote
servers over replication, postgres_fdw, or dblink connections were printed
directly to stderr on the local server (e.g., the subscriber). As a result,
these messages lacked log prefixes (e.g., timestamp), making them harder
to trace and correlate with other log entries.
This commit addresses the issue by introducing a custom notice receiver
for replication, postgres_fdw, and dblink connections. These messages
are now logged via ereport(), ensuring they appear in the logs with proper
formatting and context, which improves clarity and aids in debugging.
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CALDaNm2xsHpWRtLm-VL_HJCsaE3+1Y_n-jDEAr3-suxVqc3xoQ@mail.gmail.com
ECPGconnect() caches established connections to the server, supporting
the case of a NULL connection name when a database name is not specified
by its caller.
A follow-up call to ECPGget_PGconn() to get an established connection
from the cached set with a non-NULL name could cause a NULL pointer
dereference if a NULL connection was listed in the cache and checked for
a match. At least two connections are necessary to reproduce the issue:
one with a NULL name and one with a non-NULL name.
Author: Aleksander Alekseev <aleksander@tigerdata.com>
Discussion: https://postgr.es/m/CAJ7c6TNvFTPUTZQuNAoqgzaSGz-iM4XR61D7vEj5PsQXwg2RyA@mail.gmail.com
Backpatch-through: 13
In commit b262ad440, we introduced an optimization that reduces an IS
[NOT] NULL qual on a NOT NULL column to constant true or constant
false, provided we can prove that the input expression of the NullTest
is not nullable by any outer joins or grouping sets. This deduction
happens quite late in the planner, during the distribution of quals to
rels in query_planner. However, this approach has some drawbacks: we
can't perform any further folding with the constant, and it turns out
to be prone to bugs.
Ideally, this deduction should happen during constant folding.
However, the per-relation information about which columns are defined
as NOT NULL is not available at that point. This information is
currently collected from catalogs when building RelOptInfos for base
or "other" relations.
This patch moves the collection of NOT NULL attribute information for
relations before pull_up_sublinks, storing it in a hash table keyed by
relation OID. It then uses this information to perform the NullTest
deduction for Vars during constant folding. This also makes it
possible to leverage this information to pull up NOT IN subqueries.
Note that this patch does not get rid of restriction_is_always_true
and restriction_is_always_false. Removing them would prevent us from
reducing some IS [NOT] NULL quals that we were previously able to
reduce, because (a) the self-join elimination may introduce new IS NOT
NULL quals after constant folding, and (b) if some outer joins are
converted to inner joins, previously irreducible NullTest quals may
become reducible.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAMbWs4-bFJ1At4btk5wqbezdu8PLtQ3zv-aiaY3ry9Ymm=jgFQ@mail.gmail.com
There are several pieces of catalog information that need to be
retrieved for a relation during the early stage of planning. These
include relhassubclass, which is used to clear the inh flag if the
relation has no children, as well as a column's attgenerated and
default value, which are needed to expand virtual generated columns.
More such information may be required in the future.
Currently, these pieces of catalog data are collected in multiple
places, resulting in repeated table_open/table_close calls for each
relation in the rangetable. This patch centralizes the collection of
all required early-stage catalog information into a single loop over
the rangetable, allowing each relation to be opened and closed only
once.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAMbWs4-bFJ1At4btk5wqbezdu8PLtQ3zv-aiaY3ry9Ymm=jgFQ@mail.gmail.com
Currently, we expand virtual generated columns after we have pulled up
any SubLinks within the query's quals. This ensures that the virtual
generated column references within SubLinks that should be transformed
into joins are correctly expanded. This approach works well and has
posed no issues.
In an upcoming patch, we plan to centralize the collection of catalog
information needed early in the planner. This will help avoid
repeated table_open/table_close calls for relations in the rangetable.
Since this information is required during sublink pull-up, we are
moving the expansion of virtual generated columns to occur beforehand.
To achieve this, if any EXISTS SubLinks can be pulled up, their
rangetables are processed just before pulling them up.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAMbWs4-bFJ1At4btk5wqbezdu8PLtQ3zv-aiaY3ry9Ymm=jgFQ@mail.gmail.com
Commit e5da0fe3c2 introduced catalog entries for not-null constraints
on domains; but because commit b0e96f3119 (the original work for
catalogued not-null constraints on tables) forgot to teach pg_dump to
process the comments for them, this one also forgot. Add that now.
We also need to teach repairDependencyLoop() about the new type of
constraints being possible for domains.
Backpatch-through: 17
Co-authored-by: jian he <jian.universality@gmail.com>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Reported-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxF-0bqVR=j4jonS6N2Ka6hHUpFyu3_3TWKNhOW_4yFSSg@mail.gmail.com
When pg_recvlogical receives a SIGHUP signal, it closes the current
output file and reopens a new one. This is useful since it allows us to
rotate the output file by renaming the current file and sending a SIGHUP.
This behavior was previously undocumented. This commit adds
the missing documentation.
Back-patch to all supported versions.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Shinya Kato <shinya11.kato@gmail.com>
Discussion: https://postgr.es/m/0977fc4f-1523-4ecd-8a0e-391af4976367@oss.nttdata.com
Backpatch-through: 13
The only practical effect of these changes is to avoid a useless
list_copy() operation when there is a single hashclause. That's
never going to make any noticeable performance difference, but
the code is arguably clearer this way, especially if we take the
opportunity to add some comments so that readers don't have to
reverse-engineer the usage of these local variables. Also add
some braces for better/more consistent style.
Author: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAHewXNnHBOO9NEa=NBDYOrwZL4oHu2NOcTYvqyNyWEswo8f5OQ@mail.gmail.com
This commit is only for HEAD and v18, where the test has been removed.
It also incorporates improvements below to stability and coverage of the
original test, which were already backpatched to v17.
- Add one pg_logical_emit_message() call to force the creation of a record
that spawns across two pages.
- Make the logic wait for the checkpoint completion.
Author: Alexander Korotkov <akorotkov@postgresql.org>
Co-authored-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Backpatch-through: 18
Currently, the comments in 047_checkpoint_physical_slot. It shows an
incomplete intention to wait for checkpoint completion before performing
an immediate database stop. However, an immediate node stop can occur both
before and after checkpoint completion. Both cases should work correctly.
But we would like the test to be more stable and deterministic. This is why
this commit makes this test explicitly wait for the checkpoint completion
log message.
Discussion: https://postgr.es/m/CAPpHfdurV-j_e0pb%3DUFENAy3tyzxfF%2ByHveNDNQk2gM82WBU5A%40mail.gmail.com
Discussion: https://postgr.es/m/aHXLep3OaX_vRTNQ%40paquier.xyz
Author: Alexander Korotkov <akorotkov@postgresql.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Backpatch-through: 17
If a crash occurs while writing a WAL record that spans multiple pages, the
recovery process marks the page with the XLP_FIRST_IS_OVERWRITE_CONTRECORD
flag. However, logical decoding currently attempts to read the full WAL
record based on its expected size before checking this flag, which can lead
to an infinite wait if the remaining data is never written (e.g., no activity
after crash).
This patch updates the logic first to read the page header and check for
the XLP_FIRST_IS_OVERWRITE_CONTRECORD flag before attempting to reconstruct
the full WAL record. If the flag is set, decoding correctly identifies
the record as incomplete and avoids waiting for WAL data that will never
arrive.
Discussion: https://postgr.es/m/CAAKRu_ZCOzQpEumLFgG_%2Biw3FTa%2BhJ4SRpxzaQBYxxM_ZAzWcA%40mail.gmail.com
Discussion: https://postgr.es/m/CALDaNm34m36PDHzsU_GdcNXU0gLTfFY5rzh9GSQv%3Dw6B%2BQVNRQ%40mail.gmail.com
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Backpatch-through: 13
This commit improves the recovery TAP test 027_stream_regress so as
regression diffs are printed only if both the primary and the standby
are still alive after the main regression test suite finishes, relying
on d4c9195eff to do the job.
Particularly, a crash of the primary could scribble the contents
reported with mostly useless data, as the diffs would refer to query
that failed to run, not necessarily the cause of the crash.
Suggested-by: Andres Freund <andres@anarazel.de>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ1D6KXvjSs7YGsDeadqCxNF3UUhjRAfforzzP0k-cE=bA@mail.gmail.com
This new routine acts as a wrapper of pg_isready, that can be run on a
node to check its connection status. This will be used in a recovery
test in a follow-up commit.
Suggested-by: Andres Freund <andres@anarazel.de>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ1D6KXvjSs7YGsDeadqCxNF3UUhjRAfforzzP0k-cE=bA@mail.gmail.com
Instead of laboriously computing the exact output length, use strlen
to get an upper bound cheaply. (This is still O(N) of course, but
the constant factor is a lot less.) This will typically result in
overallocating the output datum, but that's of little concern since
it's a short-lived allocation in just about all use-cases.
A simple microbenchmark showed about 40% speedup for long input
strings.
While here, make some cosmetic cleanups and add a test case that
covers the double-backslash code path in byteain and byteaout.
Author: Steven Niu <niushiji@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Stepan Neretin <slpmcf@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/ca315729-140b-426e-81a6-6cd5cfe7ecc5@gmail.com
Presently, pg_dump generates commands like
SELECT pg_catalog.lo_create('5432');
ALTER LARGE OBJECT 5432 OWNER TO alice;
GRANT SELECT ON LARGE OBJECT 5432 TO bob;
for each large object. This is particularly slow at restore time,
especially when there are tens or hundreds of millions of large
objects. From reports and personal experience, such slow restores
seem to be most painful when encountered during pg_upgrade. This
commit teaches pg_dump to instead dump pg_largeobject_metadata and
the corresponding pg_shdepend rows when in binary upgrade mode,
i.e., pg_dump now generates commands like
COPY pg_catalog.pg_largeobject_metadata (oid, lomowner, lomacl) FROM stdin;
5432 16384 {alice=rw/alice,bob=r/alice}
\.
COPY pg_catalog.pg_shdepend (dbid, classid, objid, objsubid, refclassid, refobjid, deptype) FROM stdin;
5 2613 5432 0 1260 16384 o
5 2613 5432 0 1260 16385 a
\.
Testing indicates the COPY approach can be significantly faster.
To do any better, we'd probably need to find a way to copy/link
pg_largeobject_metadata's files during pg_upgrade, which would be
limited to upgrades from >= v16 (since commit 7b378237aa changed
the storage format for aclitem, which is used for
pg_largeobject_metadata.lomacl).
Note that this change only applies to binary upgrade mode (i.e.,
dumps initiated by pg_upgrade) since it inserts rows directly into
catalogs. Also, this optimization can only be used for upgrades
from >= v12 because pg_largeobject_metadata was created WITH OIDS
in older versions, which prevents pg_dump from handling
pg_largeobject_metadata.oid properly. With some extra effort, it
might be possible to support upgrades from older versions, but the
added complexity didn't seem worth it to support versions that will
have been out-of-support for nearly 3 years by the time this change
is released.
Experienced hackers may remember that prior to v12, pg_upgrade
copied/linked pg_largeobject_metadata's files (see commit
12a53c732c). Besides the aforementioned storage format issues,
this approach failed to transfer the relevant pg_shdepend rows, and
pg_dump still had to generate an lo_create() command per large
object so that creating the dependent comments and security labels
worked. We could perhaps adopt a hybrid approach for upgrades from
v16 and newer (i.e., generate lo_create() commands for each large
object, copy/link pg_largeobject_metadata's files, and COPY the
relevant pg_shdepend rows), but further testing is needed.
Reported-by: Hannu Krosing <hannuk@google.com>
Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Hannu Krosing <hannuk@google.com>
Reviewed-by: Nitin Motiani <nitinmotiani@google.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAMT0RQSS-6qLH%2BzYsOeUbAYhop3wmQTkNmQpo5--QRDUR%2BqYmQ%40mail.gmail.com
If a MERGE inside a CTE attempts an UPDATE or DELETE on a table with
BEFORE ROW triggers, and a concurrent UPDATE or DELETE happens, the
merge code would fail (crashing in the case of an UPDATE action, and
potentially executing the wrong action for a DELETE action).
This is the same issue that 9321c79c86 attempted to fix, except now
for a MERGE inside a CTE. As noted in 9321c79c86, what needs to happen
is for the trigger code to exit early, returning the TM_Result and
TM_FailureData information to the merge code, if a concurrent
modification is detected, rather than attempting to do an EPQ
recheck. The merge code will then do its own rechecking, and rescan
the action list, potentially executing a different action in light of
the concurrent update. In particular, the trigger code must never call
ExecGetUpdateNewTuple() for MERGE, since that is bound to fail because
MERGE has its own per-action projection information.
Commit 9321c79c86 did this using estate->es_plannedstmt->commandType
in the trigger code to detect that a MERGE was being executed, which
is fine for a plain MERGE command, but does not work for a MERGE
inside a CTE. Fix by passing that information to the trigger code as
an additional parameter passed to ExecBRUpdateTriggers() and
ExecBRDeleteTriggers().
Back-patch as far as v17 only, since MERGE cannot appear inside a CTE
prior to that. Additionally, take care to preserve the trigger ABI in
v17 (though not in v18, which is still in beta).
Bug: #18986
Reported-by: Yaroslav Syrytsia <me@ys.lc>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/18986-e7a8aac3d339fa47@postgresql.org
Backpatch-through: 17
When using a prepared statement to select data from a PostgreSQL foreign
table (postgres_fdw) with the "field = ANY($1)" expression, the operation
is not pushed down when an implicit type case is applied, and a generic plan
is used. This commit resolves the issue by supporting the push-down of
ArrayCoerceExpr, which is used in this case. The support is quite
straightforward and similar to other nods, such as RelabelType.
Discussion: https://postgr.es/m/4f0cea802476d23c6e799512ffd17aff%40postgrespro.ru
Author: Alexander Pyhalov <a.pyhalov@postgrespro.ru>
Reviewed-by: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
This is the documented behavior, and it worked that way before
v10. However, addition of the connhost[] array created cases
where conn->connhost[conn->whichhost].port is NULL. The rest
of libpq is careful to substitute DEF_PGPORT[_STR] for a null
or empty port string, but we failed to do so here, leading to
possibly returning NULL. As of v18 that causes psql's \conninfo
command to segfault. Older psql versions avoid that, but it's
pretty likely that other clients have trouble with this,
so we'd better back-patch the fix.
In stable branches, just revert to our historical behavior of
returning an empty string when there was no user-given port
specification. However, it seems substantially more useful and
indeed more correct to hand back DEF_PGPORT_STR in such cases,
so let's make v18 and master do that.
Author: Daniele Varrazzo <daniele.varrazzo@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA+mi_8YTS8WPZPO0PAb2aaGLwHuQ0DEQRF0ZMnvWss4y9FwDYQ@mail.gmail.com
Backpatch-through: 13
We have an assertion to ensure that a command tag has been assigned by
the time we're done executing, but if we happen to execute a command
with no queries, the assertion would fail. Per discussion, rather than
contort things to get a tag assigned, just remove the assertion.
Oversight in 2f9661311b. That commit also retained a comment that
explained logic that had been adjacent to it but diffused into various
places, leaving none apt to keep part of the comment. Remove that part,
and rewrite what remains for extra clarity.
Bug: #18984
Backpatch-through: 13
Reported-by: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michaël Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/18984-0f4778a6599ac3ae@postgresql.org
This commit adds a note to the pg_overexplain page that describes
how to use it (LOAD, session_preload_libraries, or
shared_preload_libraries). The new text is mostly lifted from the
auto_explain page. We should probably consider centralizing this
information in the future.
While at it, add a missing "module" to the opening sentence.
Reviewed-by: "David G. Johnston" <david.g.johnston@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/aHVWKM8l8kLlZzgv%40nathan
Backpatch-through: 18
The regression tests for TOAST compression methods are split into two
independent files: one specific to LZ4 and interactions between two
different TOAST compression methods, now called compression_lz4, and a
second one for the "core" cases where only pglz is required.
This saves 300 lines in diffs coming from the alternate output of
compression.sql, required for builds where lz4 is not available. The
new test is skipped if the build does not support LZ4 compression,
relying on an \if and the values reported in pg_settings for the GUC
default_toast_compression, "lz4" being available only under USE_LZ4.
Another benefit of this split is that this facilitates the addition of
more compression methods for TOAST, which are under discussion.
Note the trick added for the tests of the GUC default_toast_compression,
where VERBOSITY = terse is used to avoid the HINT printing the lists of
values available in the GUC, which are environment-dependent. This
makes compression.sql independent of the availability of LZ4.
The code coverage of toast_compression.c is slightly improved, increased
from 89% to 91%, with one new case covered in lz4_compress_datum() for
incompressible data.
Author: Nikhil Kumar Veldanda <veldanda.nikhilkumar17@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aDlcU-ym9KfMj9sG@paquier.xyz
The terms used in wait_event_names.txt and lwlock.c were inconsistent
for MultiXactOffsetSLRU and MultiXactMemberSLRU, which could cause joins
between pg_wait_events and pg_stat_activity to fail. lwlock.c is
adjusted in this commit to what the historical name of the event has
always been, and what is documented.
Oversight in 53c2a97a92. 08b9b9e043 has fixed a similar
inconsistency some time ago.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/aHdxN0D0hKXzHFQG@ip-10-97-1-34.eu-west-3.compute.internal
Backpatch-through: 17
The paragraph for introducing INSERT and COPY discussed how a file
could be used for bulk loading with COPY, without actually showing
what the file would look like. This adds a programlisting for the
file contents.
Backpatch to all supported branches since this example has lacked
the file contents since PostgreSQL 7.2.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/158017814191.19852.15019251381150731439@wrigleys.postgresql.org
Backpatch-through: 13
Avoid dependence on setlocale().
strcoll(), etc., are not called directly; all collation-sensitive
calls should go through pg_locale.c and use the appropriate
provider. By setting LC_COLLATE to C, we avoid accidentally depending
on libc behavior when using a different provider.
No behavior change in the backend, but it's possible that some
extensions will be affected. Such extensions should be updated to use
the pg_locale_t APIs.
Discussion: https://postgr.es/m/9875f7f9-50f1-4b5d-86fc-ee8b03e8c162@eisentraut.org
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
We skip dumping constraints together with domains if they are invalid
('separate') so that they appear after data -- but their comments were
dumped together with the domain definition, which in effect leads to the
comment being dumped when the constraint does not yet exist. Delay
them in the same way.
Oversight in 7eca575d1c28; backpatch all the way back.
Author: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxF_C2pe6J_+nPr6C5jf5rQnbYP8XOKr4HM8yHZtp2aQqQ@mail.gmail.com
_bt_first need only store one ScanKeyData struct on the stack for the
purposes of building an IS NOT NULL key based on an implied NOT NULL
constraint. We don't need INDEX_MAX_KEYS-many ScanKeyData structs.
This saves us a little over 2KB in stack space. It's possible that this
has some performance benefit. It also seems simpler and more direct.
It isn't possible for more than a single index attribute to need its own
implied IS NOT NULL key: the first such attribute/IS NOT NULL key always
makes _bt_first stop adding additional boundary keys to startKeys[].
Using INDEX_MAX_KEYS-many ScanKeyData entries was (at best) misleading.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Mircea Cadariu <cadariu.mircea@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wzm=1kJMSZhhTLoM5BPbwQNWxUj-ynOEh=89ptDZAVgauw@mail.gmail.com
Previously, pg_dumpall would still dump global objects such as roles
and tablespaces even when --statistics-only or --no-schema was specified.
Since these global objects are treated as schema-level data, they should
be skipped in these cases.
This commit fixes the issue by ensuring that global objects are not
dumped when either --statistics-only or --no-schema is used.
Author: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/08129593-6f3c-4fb9-94b7-5aa2eefb99b0@oss.nttdata.com
Backpatch-through: 18
This adjusts the wording to match the changes in commits
5987553fde, a233a603ba, and pgweb commit 2d764dbc08.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/aHVo791guQR6uqwT%40nathan
Backpatch-through: 13
This code used a NO_LZ4_SUPPORT() macro to issue an error in the code
paths where LZ4 [de]compression is attempted but the build does not
support it. This commit refactors the code to use a more flexible error
message so as it can be used for other compression methods, where the
method is given in input of macro.
Extracted from a larger patch by the same author.
Author: Nikhil Kumar Veldanda <veldanda.nikhilkumar17@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/CAFAfj_HX84EK4hyRYw50AOHOcdVi-+FFwAAPo7JHx4aShCvunQ@mail.gmail.com
The pgoutput plugin initializes optional parameters like "binary" with
default values at the start of processing. However, the "origin"
parameter was previously missed and left without explicit initialization.
Although the PGOutputData struct, which holds these settings,
is zero-initialized at allocation (resulting in publish_no_origin field
for "origin" parameter being false by default), this default was not
set explicitly, unlike other parameters.
This commit adds explicit initialization of the "origin" parameter to
ensure consistency and clarity in how defaults are handled.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/d2790f10-238d-4cb5-a743-d9d2a9dd900f@oss.nttdata.com
The pgoutput plugin options are described in the logical streaming
replication protocol documentation, but their default values were
previously not mentioned. This made it less convenient for users,
for example, when specifying those options to use pg_recvlogical
with pgoutput plugin.
This commit adds the explanations of the default values for pgoutput
options to improve clarity and usability.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/d2790f10-238d-4cb5-a743-d9d2a9dd900f@oss.nttdata.com
Previously, the documentation described the streaming option as a boolean,
which is outdated since it's no longer a boolean as of protocol version 4.
This could confuse users.
This commit updates the description to remove the "boolean" reference and
clearly list the valid values for the streaming option.
Back-patch to v16, where the streaming option changed to a non-boolean.
Author: Euler Taveira <euler@eulerto.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/8d21fb98-5c25-4dee-8387-e5a62b01ea7d@app.fastmail.com
Backpatch-through: 16
The last_vacuum and vacuum_count fields in pg_stat_all_tables already
state that they do not include VACUUM FULL. However, total_vacuum_time,
which also excludes VACUUM FULL, did not mention this. This could
mislead users into thinking VACUUM FULL time is included.
To address this, this commit updates the documentation for
pg_stat_all_tables to explicitly state that total_vacuum_time does not
count VACUUM FULL.
Back-patched to v18, where total_vacuum_time was introduced.
Additionally, this commit clarifies that n_ins_since_vacuum also
excludes VACUUM FULL. Although n_ins_since_vacuum was added in v13,
we are not back-patching this change to stable branches, as it is
a documentation improvement, not a bug fix.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://postgr.es/m/2ac375d1-591b-4f1b-a2af-f24335567866@oss.nttdata.com
Backpatch-through: 18
The grammar was a little shaky and confusing here, so word-smith it
a bit. Also, adjust the comments in pg_ident.conf.sample to use the
same terminology as the SGML docs, in particular "DATABASE-USERNAME"
not "PG-USERNAME".
Back-patch appropriate subsets. I did not risk changing
pg_ident.conf.sample in released branches, but it still seems OK
to change it in v18.
Reported-by: Alexey Shishkin <alexey.shishkin@enterprisedb.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Discussion: https://postgr.es/m/175206279327.3157504.12519088928605422253@wrigleys.postgresql.org
Backpatch-through: 13
It's impossible to reach this case with either ra or rb being
WJB_DONE, because our earlier checks that the structure and
length of the inputs match should guarantee that we reach their
ends simultaneously. However, the comment completely fails to
explain this, and the Asserts don't cover it either. The comment
is pretty obscure anyway, so rewrite it, and extend the Asserts
to reject WJB_DONE.
This is only cosmetic, so no need for back-patch.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/0c623e8a204187b87b4736792398eaf1@postgrespro.ru
Because not every path through JsonbIteratorNext() sets val->type,
some compilers complain that compareJsonbContainers() is comparing
possibly-uninitialized values. The paths that don't set it return
WJB_DONE, WJB_END_ARRAY, or WJB_END_OBJECT, so it's clear by
manual inspection that the "(ra == rb)" code path is safe, and
indeed we aren't seeing warnings about that. But the (ra != rb)
case is much less obviously safe. In Assert-enabled builds it
seems that the asserts rejecting WJB_END_ARRAY and WJB_END_OBJECT
persuade gcc 15.x not to warn, which makes little sense because
it's impossible to believe that the compiler can prove of its
own accord that ra/rb aren't WJB_DONE here. (In fact they never
will be, so the code isn't wrong, but why is there no warning?)
Without Asserts, the appearance of warnings is quite unsurprising.
We discussed fixing this by converting those two Asserts into
pg_assume, but that seems not very satisfactory when it's so unclear
why the compiler is or isn't warning: the warning could easily
reappear with some other compiler version. Let's fix it in a less
magical, more future-proof way by changing JsonbIteratorNext()
so that it always does set val->type. The cost of that should be
pretty negligible, and it makes the function's API spec less squishy.
Reported-by: Erik Rijkers <er@xs4all.nl>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/988bf1bc-3f1f-99f3-bf98-222f1cd9dc5e@xs4all.nl
Discussion: https://postgr.es/m/0c623e8a204187b87b4736792398eaf1@postgrespro.ru
Backpatch-through: 13
Minor wordsmithing of the func.sgml paragraph describing
statement_timestamp() and allied functions: don't switch between
"statement" and "command" when those are being used to mean about
the same thing.
Also, add some text to protocol.sgml describing the perhaps-surprising
behavior these functions have in a multi-statement Query message.
Reported-by: P M <petermittere@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Discussion: https://postgr.es/m/175223006802.3157505.14764328206246105568@wrigleys.postgresql.org
Backpatch-through: 13
Previously, when pressing Tab after GRANT or REVOKE ... ON LARGE OBJECT
or ON FOREIGN SERVER, TO or FROM was incorrectly suggested by psql's
tab-completion. This was not appropriate, as those clauses are not valid
at that point.
This commit fixes the issue by preventing TO and FROM from being offered
immediately after those specific GRANT/REVOKE statements.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/20250408122857.b2b06dde4e6a08290af02336@sraoss.co.jp
Previously, amcheck could produce misleading error message when
a partitioned index was passed to functions like bt_index_check().
For example, bt_index_check() with a partitioned btree index produced:
ERROR: expected "btree" index as targets for verification
DETAIL: Relation ... is a btree index.
Reporting "expected btree index as targets" even when the specified
index was a btree was confusing. In this case, the function should fail
since the partitioned index specified is not valid target. This commit
improves the error reporting to better reflect this actual issue. Now,
bt_index_check() with a partitioned index, the error message is:
ERROR: expected index as targets for verification
DETAIL: This operation is not supported for partitioned indexes.
This commit also applies the following minor changes:
- Simplifies index_checkable() by using get_am_name() to retrieve
the access method name.
- Changes index_checkable() from extern to static, as it is only used
in verify_common.c.
- Updates the error code for invalid indexes to
ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE,
aligning with usage in similar modules like pgstattuple.
Author: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/8829854bbfc8635ddecd0846bb72dfda@oss.nttdata.com
This new psql variable can be used to check which service file has been
used for a connection. Like other variables, this can be set in a
PROMPT or reported by an \echo, like these commands:
\echo :SERVICEFILE
\set PROMPT1 '=(%:SERVICEFILE:)%# '
This relies on commits 092f3c63ef and fef6da9e9c to retrieve this
information from the connection's PQconninfoOption.
Author: Ryo Kanbayashi <kanbayashi.dev@gmail.com>
Discussion: https://postgr.es/m/CAKkG4_nCjx3a_F3gyXHSPWxD8Sd8URaM89wey7fG_9g7KBkOCQ@mail.gmail.com
If the system-name field of a pg_ident.conf line is a regex
containing capturing parentheses, you can write \1 in the
user-name field to represent the captured part of the system
name. But what happens if you write \1 more than once?
The only reasonable expectation IMO is that each \1 gets
replaced, but presently our code replaces only the first.
Fix that.
Also, improve the tests for this feature to exercise cases
where a non-empty string needs to be substituted for \1.
The previous testing didn't inspire much faith that it
was verifying correct operation of the substitution code.
Given the lack of field complaints about this, I don't
feel a need to back-patch.
Reported-by: David G. Johnston <david.g.johnston@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAKFQuwZu6kZ8ZPvJ3pWXig+6UX4nTVK-hdL_ZS3fSdps=RJQQQ@mail.gmail.com
This commit adds the possibility to specify a service file in a
connection string, using a new option called "servicefile". The parsing
of the service file happens so as things are done in this order of
priority:
- The servicefile connection option.
- Environment variable PGSERVICEFILE.
- Default path, depending on the HOME environment.
Note that in the last default case, we need to fill in "servicefile" for
the connection's PQconninfoOption to let clients know which service file
has been used for the connection. Some TAP tests are added, with a few
tweaks required for Windows when using URIs or connection option values,
for the location paths.
Author: Torsten Förtsch <tfoertsch123@gmail.com>
Co-authored-by: Ryo Kanbayashi <kanbayashi.dev@gmail.com>
Discussion: https://postgr.es/m/CAKkG4_nCjx3a_F3gyXHSPWxD8Sd8URaM89wey7fG_9g7KBkOCQ@mail.gmail.com
The values of the "result" variables in these functions are
always integers; using a float8 variable accomplishes nothing
except to incur useless conversions to and from float. While
that wastes a few nanoseconds, these functions aren't all that
time-critical. But it seems worth fixing to remove possible
reader confusion.
Also, in the case of date2isoyear(), "result" is a very poorly
chosen variable name because it is *not* the function's result.
Rename it to "week", and do the same in date2isoweek() for
consistency.
Since this is mostly cosmetic, there seems little need
for back-patch.
Author: Sergey Fukanchik <s.fukanchik@postgrespro.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/6323a-68726500-1-7def9d00@137821581
method_worker.c installed SignalHandlerForConfigReload, but it failed to
actually process reload requests. That hasn't yet produced any concrete
problem reports in terms of GUC changes it should have cared about in
v18, but it was inconsistent.
It did cause problems for a couple of patches in development that need
IO workers to react to ALTER SYSTEM + pg_reload_conf(). Fix extracted
from one of those patches.
Back-patch to 18.
Reported-by: Dmitry Dolgov <9erthalion6@gmail.com>
Discussion: https://postgr.es/m/sh5uqe4a4aqo5zkkpfy5fobe2rg2zzouctdjz7kou4t74c66ql%40yzpkxb7pgoxf
In an ancient ancestor of this code, the postmaster assigned IDs to IO
workers. Now it tracks them in an unordered array and doesn't know
their IDs, so it might be confusing to readers that it still referred to
their indexes as IDs.
No change in behavior, just variable name and error message cleanup.
Back-patch to 18.
Discussion: https://postgr.es/m/CA%2BhUKG%2BwbaZZ9Nwc_bTopm4f-7vDmCwLk80uKDHj9mq%2BUp0E%2Bg%40mail.gmail.com
Adopt PgAioXXX convention for pgaio module type names. Rename a
function that didn't use a pgaio_worker_ submodule prefix. Rename the
internal submit function's arguments to match the indirectly relevant
function pointer declaration and nearby examples. Rename the array of
handle IDs in PgAioSubmissionQueue to sqes, a term of art seen in the
systems it emulates, also clarifying that they're not IO handle
pointers as the old name might imply.
No change in behavior, just type, variable and function name cleanup.
Back-patch to 18.
Discussion: https://postgr.es/m/CA%2BhUKG%2BwbaZZ9Nwc_bTopm4f-7vDmCwLk80uKDHj9mq%2BUp0E%2Bg%40mail.gmail.com
getid() and putid(), which parse and deparse role names within ACL
input/output, applied isalnum() to see if a character within a role
name requires quoting. They did this even for non-ASCII characters,
which is problematic because the results would depend on encoding,
locale, and perhaps even platform. So it's possible that putid()
could elect not to quote some string that, later in some other
environment, getid() will decide is not a valid identifier, causing
dump/reload or similar failures.
To fix this in a way that won't risk interoperability problems
with unpatched versions, make getid() treat any non-ASCII as a
legitimate identifier character (hence not requiring quotes),
while making putid() treat any non-ASCII as requiring quoting.
We could remove the resulting excess quoting once we feel that
no unpatched servers remain in the wild, but that'll be years.
A lesser problem is that getid() did the wrong thing with an input
consisting of just two double quotes (""). That has to represent an
empty string, but getid() read it as a single double quote instead.
The case cannot arise in the normal course of events, since we don't
allow empty-string role names. But let's fix it while we're here.
Although we've not heard field reports of problems with non-ASCII
role names, there's clearly a hazard there, so back-patch to all
supported versions.
Reported-by: Peter Eisentraut <peter@eisentraut.org>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3792884.1751492172@sss.pgh.pa.us
Backpatch-through: 13
Commit b0635bfda split off the CPPFLAGS/LDFLAGS/LDLIBS for libcurl into
their own separate Makefile variables, but I neglected to move the
existing AC_CHECKs for Curl into a place where they would make use of
those variables. They instead tested the system libcurl, which 1) is
unhelpful if a different Curl is being used for the build and 2) will
fail the build entirely if no system libcurl exists. Correct the order
of operations here.
Reported-by: Ivan Kush <ivan.kush@tantorlabs.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Ivan Kush <ivan.kush@tantorlabs.com>
Discussion: https://postgr.es/m/8a611028-51a1-408c-b592-832e2e6e1fc9%40tantorlabs.com
Backpatch-through: 18
This option, which is disabled by default, can be used to request
the checkpoint also flush dirty buffers of unlogged relations. As
with the MODE option, the server may consolidate the options for
concurrently requested checkpoints. For example, if one session
uses (FLUSH_UNLOGGED FALSE) and another uses (FLUSH_UNLOGGED TRUE),
the server may perform one checkpoint with FLUSH_UNLOGGED enabled.
Author: Christoph Berg <myon@debian.org>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Discussion: https://postgr.es/m/aDnaKTEf-0dLiEfz%40msg.df7cb.de
This option may be set to FAST (the default) to request the
checkpoint be completed as fast as possible, or SPREAD to request
the checkpoint be spread over a longer interval (based on the
checkpoint-related configuration parameters). Note that the server
may consolidate the options for concurrently requested checkpoints.
For example, if one session requests a "fast" checkpoint and
another requests a "spread" checkpoint, the server may perform one
"fast" checkpoint.
Author: Christoph Berg <myon@debian.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Discussion: https://postgr.es/m/aDnaKTEf-0dLiEfz%40msg.df7cb.de
This commit adds the boilerplate code for supporting a list of
options in CHECKPOINT commands. No actual options are supported
yet, but follow-up commits will add support for MODE and
FLUSH_UNLOGGED. While at it, this commit refactors the code for
executing CHECKPOINT commands to its own function since it's about
to become significantly larger.
Author: Christoph Berg <myon@debian.org>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Discussion: https://postgr.es/m/aDnaKTEf-0dLiEfz%40msg.df7cb.de
The new name more accurately reflects the effects of this flag on a
requested checkpoint. Checkpoint-related log messages (i.e., those
controlled by the log_checkpoints configuration parameter) will now
say "fast" instead of "immediate", too. Likewise, references to
"immediate" checkpoints in the documentation have been updated to
say "fast". This is preparatory work for a follow-up commit that
will add a MODE option to the CHECKPOINT command.
Author: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/aDnaKTEf-0dLiEfz%40msg.df7cb.de
The new name more accurately relects the effects of this flag on a
requested checkpoint. Checkpoint-related log messages (i.e., those
controlled by the log_checkpoints configuration parameter) will now
say "flush-unlogged" instead of "flush-all", too. This is
preparatory work for a follow-up commit that will add a
FLUSH_UNLOGGED option to the CHECKPOINT command.
Author: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/aDnaKTEf-0dLiEfz%40msg.df7cb.de
We already forced LC_MESSAGES to C in order to get consistent
message output, but that isn't enough to stabilize messages
that include %f or similar formatting.
I'm a bit surprised that this hasn't come up before. Perhaps
we ought to back-patch this change, but I'll refrain for now.
Reported-by: Bernd Helmle <mailings@oopsware.de>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/6f024eaa7885eddf5e0eb4ba1d095fbc7146519b.camel@oopsware.de
Previously, the check_hook functions for max_slot_wal_keep_size and
idle_replication_slot_timeout would incorrectly raise an ERROR for values
set in postgresql.conf during upgrade, even though those values were not
actively used in the upgrade process.
To prevent logical slot invalidation during upgrade, we used to set
special values for these GUCs. Now, instead of relying on those values, we
directly prevent WAL removal and logical slot invalidation caused by
max_slot_wal_keep_size and idle_replication_slot_timeout.
Note: PostgreSQL 17 does not include the idle_replication_slot_timeout
GUC, so related changes were not backported.
BUG #18979
Reported-by: jorsol <jorsol@gmail.com>
Author: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed by: vignesh C <vignesh21@gmail.com>
Reviewed by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/219561.1751826409@sss.pgh.pa.us
Discussion: https://postgr.es/m/18979-a1b7fdbb7cd181c6@postgresql.org
In the description of StartupMessage, the protocol version was left
3.0. Instead of just updating it, this commit removes the hard coded
protocol version and shows the numbers as an example. This makes that
the part of the doc does not need to be updated when the version is
changed in the future.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Tatsuo Ishii <ishii@postgresql.org>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://postgr.es/m/20250626.155608.568829483879866256.ishii%40postgresql.org
This commit updates the documentation to clarify that "idle" in
idle_replication_slot_timeout means the replication slot is inactive,
that is, not currently used by any replication connection.
Without this clarification, "idle" could be misinterpreted to mean
that the slot is not advancing or that no data is being streamed,
even if a connection exists.
Back-patch to v18 where idle_replication_slot_timeout was added.
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Gunnar Morling <gunnar.morling@googlemail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CADGJaX_0+FTguWpNSpgVWYQP_7MhoO0D8=cp4XozSQgaZ40Odw@mail.gmail.com
Backpatch-through: 18
Previously, the idle_replication_slot_timeout parameter used minutes
as its unit, based on the assumption that values would typically exceed
one minute in production environments. However, this caused unexpected
behavior: specifying a value below 30 seconds would round down to 0,
effectively disabling the timeout. This could be surprising to users.
To allow finer-grained control and avoid such confusion, this commit changes
the unit of idle_replication_slot_timeout to seconds. Larger values can
still be specified easily using standard time suffixes, for example,
'24h' for 24 hours.
Back-patch to v18 where idle_replication_slot_timeout was added.
Reported-by: Gunnar Morling <gunnar.morling@googlemail.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CADGJaX_0+FTguWpNSpgVWYQP_7MhoO0D8=cp4XozSQgaZ40Odw@mail.gmail.com
Backpatch-through: 18
When sslkeylogfile has been set but the file fails to open in an
otherwise successful connection, the log entry added to the conn
object is never printed. Instead print the error on stderr for
increased visibility. This is a debugging tool so using stderr
for logging is appropriate. Also while there, remove the umask
call in the callback as it's not useful.
Issues noted by Peter Eisentraut in post-commit review, backpatch
down to 18 when support for sslkeylogfile was added
Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/70450bee-cfaa-48ce-8980-fc7efcfebb03@eisentraut.org
Backpatch-through: 18
Commit a45c78e328 moved large object metadata from SECTION_PRE_DATA
to SECTION_DATA but neglected to move PRIO_LARGE_OBJECT in
dbObjectTypePriorities accordingly. While this hasn't produced any
known live bugs, it causes problems for a proposed patch that
optimizes upgrades with many large objects. Fixing the priority
might also make the topological sort step marginally faster by
reducing the number of ordering violations that have to be fixed.
Reviewed-by: Nitin Motiani <nitinmotiani@google.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/aBkQLSkx1zUJ-LwJ%40nathan
Discussion: https://postgr.es/m/aG_5DBCjdDX6KAoD%40nathan
Backpatch-through: 17
During the development cycle of v18, btree_gist has been bumped once to
1.8 for the addition of translate_cmptype support functions (originally
7406ab623f, renamed in 32edf732e8). 1.9 has added sortsupport
functions (e4309f73f6).
There is no need for two version bumps in a module for a single major
release of PostgreSQL. This commit unifies both upgrades to a single
SQL script, downgrading btree_gist to 1.8.
Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/13c61807-f702-4afe-9a8d-795e2fd40923@illuminatedcomputing.com
Backpatch-through: 18
This function can be used to retrieve the information about all the
injection points attached to a cluster, providing coverage for
InjectionPointList() introduced in 7b2eb72b1b.
The original proposal turned around a system function, but that would
not be backpatchable to stable branches. It was also a bit weird to
have a system function that fails depending on if the build allows
injection points or not.
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: Rahila Syed <rahilasyed90@gmail.com>
Discussion: https://postgr.es/m/Z_xYkA21KyLEHvWR@paquier.xyz
Since b0635bfda, libpq uses dlopen() and related functions. On some
platforms these are not supplied by libc, but by a separate library
libdl, in which case we need to make sure that that dependency is
known to the linker. Meson seems to take care of that automatically,
but the Makefile didn't cater for it.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1328170.1752082586@sss.pgh.pa.us
Backpatch-through: 18
Increase the size of the "direct" histogram to 10K elements,
so that we can precisely track loop times up to 10 microseconds.
(Going further than that seems pretty uninteresting, even for
very old and slow machines.)
Relabel "Per loop time" as "Average loop time" for clarity.
Pre-zero the histogram arrays to make sure that they are loaded
into processor cache and any copy-on-write overhead has happened
before we enter the timing loop. Also use unlikely() to keep
the compiler from thinking that the clock-went-backwards case
is part of the hot loop. Neither of these hacks made a lot of
difference on my own machine, but they seem like they might help
on some platforms.
Discussion: https://postgr.es/m/be0339cc-1ae1-4892-9445-8e6d8995a44d@eisentraut.org
This commit adds a new system view that provides information about
entries in the dynamic shared memory (DSM) registry. Specifically,
it returns the name, type, and size of each entry. Note that since
we cannot discover the size of dynamic shared memory areas (DSAs)
and hash tables backed by DSAs (dshashes) without first attaching
to them, the size column is left as NULL for those.
Bumps catversion.
Author: Florents Tselai <florents.tselai@gmail.com>
Reviewed-by: Sungwoo Chang <swchangdev@gmail.com>
Discussion: https://postgr.es/m/4D445D3E-81C5-4135-95BB-D414204A0AB4%40gmail.com
Commit c273d9d8ce reworked tab-completion of COPY and \copy in psql
and added support for completing options within WITH clauses. However,
the same COPY options were suggested for both COPY TO and COPY FROM
commands, even though some options are only valid for one or the
other.
This commit separates the COPY options for COPY FROM and COPY TO
commands to provide more accurate auto-completion suggestions.
Back-patch to v14 where tab-completion for COPY and \copy options
within WITH clauses was first supported.
Author: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Reviewed-by: Yugo Nagata <nagata@sraoss.co.jp>
Discussion: https://postgr.es/m/079e7a2c801f252ae8d522b772790ed7@oss.nttdata.com
Backpatch-through: 14
This commit enhances psql's tab completion to support TO/FROM
after "GRANT/REVOKE ... ON LARGE OBJECT ...". Additionally,
since "ALTER DEFAULT PRIVILEGES" now supports large objects,
tab completion is also updated for "GRANT/REVOKE ... ON LARGE OBJECTS"
with TO/FROM.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Yugo Nagata <nagata@sraoss.co.jp>
Discussion: https://postgr.es/m/ade0ab29-777f-47f6-9d0d-1af67728a86e@oss.nttdata.com
This test corresponds to the case of a "service" defined in a service
file, that libpq is not able to support in parseServiceFile().
This has come up during the review of a patch to add more features in
this area, useful on its own. Piece extracted from a larger patch by
the same author.
Author: Ryo Kanbayashi <kanbayashi.dev@gmail.com>
Discussion: https://postgr.es/m/Zz2AE7NKKLIZTtEh@paquier.xyz
Clarified that the failover steps apply to a specific PostgreSQL subscriber
and added guidance for verifying replication slot synchronization during
planned failover. Additionally, corrected the standby query to avoid false
positives by checking invalidation_reason IS NULL instead of conflicting.
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Author: Shveta Malik <shveta.malik@gmail.com>
Backpatch-through: 17, where it was introduced
Discussion: https://www.postgresql.org/message-id/CAExHW5uiZ-fF159=jwBwPMbjZeZDtmcTbN+hd4mrURLCg2uzJg@mail.gmail.com
This routine has been introduced as a shortcut to be able to retrieve a
service name from an active connection, for psql. Per discussion, and
as it is only used by psql, let's remove it to not clutter the libpq API
more than necessary.
The logic in psql is replaced by lookups of PQconninfoOption for the
active connection, instead, updated each time the variables are synced
by psql, the prompt shortcut relying on the variable synced.
Reported-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250706161319.c1.nmisch@google.com
Backpatch-through: 18
What we want in these places is "xmlChar *volatile ptr",
not "volatile xmlChar *ptr". The former means that the
pointer variable itself needs to be treated as volatile,
while the latter says that what it points to is volatile.
Since the point here is to ensure that the pointer variables
don't go crazy after a longjmp, it's the former semantics
that we need. The misplacement of "volatile" also led
to needing to cast away volatile in some places.
Also fix a number of places where variables that are assigned to
within a PG_TRY and then used after it were not initialized or
not marked as volatile. (A few buildfarm members were issuing
"may be used uninitialized" warnings about some of these variables,
which is what drew my attention to this area.) In most cases
these variables were being set as the last step within the PG_TRY
block, which might mean that we could get away without the "volatile"
marking. But doing that seems unsafe and is definitely not per our
coding conventions.
These problems seem to have come in with 732061150, so no need
for back-patch.
xmltotext_with_options() did not consider the possibility that
pg_xml_init() could fail --- most likely due to OOM. If that
happened, the already-parsed xmlDoc structure would be leaked.
Oversight in commit 483bdb2af.
Bug: #18981
Author: Dmitry Kovalenko <d.kovalenko@postgrespro.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18981-9bc3c80f107ae925@postgresql.org
Backpatch-through: 16
Most of our platforms have better-than-microsecond timing resolution,
so the original definition of this program is getting less and less
useful. Make it report nanoseconds not microseconds. Also, add a
second output table that reports the exact observed timing durations,
up to a limit of 1024 ns; and be sure to report the largest observed
duration.
The documentation for this program included a lot of system-specific
details that now seem largely obsolete. Move all that text to the
PG wiki, where perhaps it will be easier to maintain and update.
Also, improve the TAP test so that it actually runs a short standard
run, allowing most of the code to be exercised; its coverage before
was abysmal.
Author: Hannu Krosing <hannuk@google.com>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/be0339cc-1ae1-4892-9445-8e6d8995a44d@eisentraut.org
Per buildfarm member culicidae, the query checking for stats reported by
the WAL summarizer related to WAL reads is proving to be unstable.
Instead of a one-time query, this commit replaces the logic with a
polling query checking for the WAL read stats, making the test more
reliable on machines that could be slow with the stats reports.
This test has been introduced in f4694e0f35, so backpatch down to v18.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/f35ba3db-fca7-4693-bc35-6db64488e4b1@gmail.com
Backpatch-through: 18
By default io_uring creates a shared memory mapping for each io_uring
instance, leading to a large number of memory mappings. Unfortunately a large
number of memory mappings slows things down, backend exit is particularly
affected. To address that, newer kernels (6.5) support using user-provided
memory for the memory. By putting the relevant memory into shared memory we
don't need any additional mappings.
On a system with a new enough kernel and liburing, there is no discernible
overhead when doing a pgbench -S -C anymore.
Reported-by: MARK CALLAGHAN <mdcallag@gmail.com>
Reviewed-by: "Burd, Greg" <greg@burd.me>
Reviewed-by: Jim Nasby <jnasby@upgrade.com>
Discussion: https://postgr.es/m/CAFbpF8OA44_UG+RYJcWH9WjF7E3GA6gka3gvH6nsrSnEe9H0NA@mail.gmail.com
Backpatch-through: 18
For an ordered Append or MergeAppend, we need to inject an explicit
sort into any subpath that is not already well enough ordered.
Currently, only explicit full sorts are considered; incremental sorts
are not yet taken into account.
In this patch, for subpaths of an ordered Append or MergeAppend, we
choose to use explicit incremental sort if it is enabled and there are
presorted keys.
The rationale is based on the assumption that incremental sort is
always faster than full sort when there are presorted keys, a premise
that has been applied in various parts of the code. In addition, the
current cost model tends to favor incremental sort as being cheaper
than full sort in the presence of presorted keys, making it reasonable
not to consider full sort in such cases.
No backpatch as this could result in plan changes.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4_V7a2enTR+T3pOY_YZ-FU8ZsFYym2swOz4jNMqmSgyuw@mail.gmail.com
In b0635bfda, I added an early header check to the Meson OAuth support,
which was intended to duplicate the later checks for
HAVE_SYS_[EVENT|EPOLL]_H. However, I implemented the new test via
check_header() -- which tries to compile -- rather than has_header(),
which just looks for the file's existence.
The distinction matters on OpenBSD, where <sys/event.h> can't be
compiled without including prerequisite headers, so -Dlibcurl=enabled
failed on that platform. Switch to has_header() to fix this.
Note that reviewers expressed concern about the difference between our
Autoconf feature tests (which compile headers) and our Meson feature
tests (which do not). I'm not opposed to aligning the two, but I want to
avoid making bigger changes as part of this fix.
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/flat/CAOYmi+kdR218ke2zu74oTJvzYJcqV1MN5=mGAPqZQuc79HMSVA@mail.gmail.com
Backpatch-through: 18
Commit 2633dae2e4 added some zero padding to various LSNs output
routines so that the low word is always 8 hex digits long, for easy
human consumption. This included the pg_lsn datatype, which breaks the
pg_upgrade test when it compares the pg_dump output of an older version.
Silence this problem by setting the pg_lsn columns to NULL before the
upgrade.
Discussion: https://postgr.es/m/202507071504.xm2r26u7lmzr@alvherre.pgsql
pl/pgsql's notion of an "expression" is very broad, encompassing
any SQL SELECT query that returns a single column and no more than
one row. So there are cases, for example evaluation of an aggregate
function, where the query involves significant work and it'd be useful
to run it with parallel workers. This used to be possible, but
commits 3eea7a0c9 et al unintentionally disabled it.
The simplest fix is to make exec_eval_expr() pass maxtuples = 0
rather than 2 to exec_run_select(). This avoids the new rule that
we will never use parallelism when a nonzero "count" limit is passed
to ExecutorRun(). (Note that the pre-3eea7a0c9 behavior was indeed
unsafe, so reverting that rule is not in the cards.) The reason
for passing 2 before was that exec_eval_expr() will throw an error
if it gets more than one returned row, so we figured that as soon
as we have two rows we know that will happen and we might as well
stop running the query. That choice was cost-free when it was made;
but disabling parallelism is far from cost-free, so now passing 2
amounts to optimizing a failure case at the expense of useful cases.
An expression query that can return more than one row is certainly
broken. People might now need to wait a bit longer to discover such
breakage; but hopefully few will use enormously expensive cases as
their first test of new pl/pgsql logic.
Author: Dipesh Dhameliya <dipeshdhameliya125@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CABgZEgdfbnq9t6xXJnmXbChNTcWFjeM_6nuig41tm327gYi2ig@mail.gmail.com
Backpatch-through: 13
Functions to bootstrap and zero pages in various SLRU callers were
fairly duplicative. We can slash almost two hundred lines with a couple
of simple helpers:
- SimpleLruZeroAndWritePage: Does the equivalent of SimpleLruZeroPage
followed by flushing the page to disk
- XLogSimpleInsertInt64: Does a XLogBeginInsert followed by XLogInsert
of a trivial record whose data is just an int64.
Author: Evgeny Voropaev <evgeny.voropaev@tantorlabs.com>
Reviewed by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://www.postgresql.org/message-id/flat/97820ce8-a1cd-407f-a02b-47368fadb14b%40tantorlabs.com
This commit standardizes the output format for LSNs to ensure consistent
representation across various tools and messages. Previously, LSNs were
inconsistently printed as `%X/%X` in some contexts, while others used
zero-padding. This often led to confusion when comparing.
To address this, the LSN format is now uniformly set to `%X/%08X`,
ensuring the lower 32-bit part is always zero-padded to eight
hexadecimal digits.
Author: Japin Li <japinli@hotmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/ME0P300MB0445CA53CA0E4B8C1879AF84B641A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
This refactoring is a follow-up of the work done in 5a1dfde833, that
has switched 2PC file names to use FullTransactionIds when written on
disk. This will help with the integration of a follow-up solution
related to the handling of two-phase files during recovery, to address
older defects while reading these from disk after a crash.
This change is useful in itself as it reduces the need to build the
file names from epoch numbers and TransactionIds, because we can use
directly FullTransactionIds from which the 2PC file names are guessed.
So this avoids a lot of back-and-forth between the FullTransactionIds
retrieved from the file names and how these are passed around in the
internal 2PC logic.
Note that the core of the change is the use of a FullTransactionId
instead of a TransactionId in GlobalTransactionData, that tracks 2PC
file information in shared memory. The change in TwoPhaseCallback makes
this commit unfit for stable branches.
Noah has contributed a good chunk of this patch. I have spent some time
on it as well while working on the issues with two-phase state files and
recovery.
Author: Noah Misch <noah@leadboat.com>
Co-Authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/Z5sd5O9JO7NYNK-C@paquier.xyz
Discussion: https://postgr.es/m/20250116205254.65.nmisch@google.com
libxml2 has deprecated the members of xmlBuffer, and it is recommended
to access them with dedicated routines. We have only one case in the
tree where this shows an impact: xml2/xpath.c where "content" was
getting directly accessed. The rest of the code looked fine, checking
the PostgreSQL code with libxml2 close to the top of its "2.14" branch.
xmlBufferContent() exists since year 2000 based on a check of the
upstream libxml2 tree, so let's switch to it.
Like 400928b83b, backpatch all the way down as this can have an impact
on all the branches already released once newer versions of libxml2 get
more popular.
Reported-by: Walid Ibrahim <walidib@amazon.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/aGdSdcR4QTjEHX6s@paquier.xyz
Backpatch-through: 13
When estimating the cost/size of a pre-sorted path for a given upper
relation using local stats, this function dereferences the passed-in
PgFdwPathExtraData pointer without checking that it is not NULL. But
that is not a bug as the pointer is guaranteed to be non-NULL in that
case; to avoid confusion, add an Assert to ensure that it is not NULL
before dereferencing it.
Reported-by: Ranier Vilela <ranier.vf@gmail.com>
Author: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Ranier Vilela <ranier.vf@gmail.com>
Discussion: https://postgr.es/m/CAEudQArgiALbV1akQpeZOgim7XP05n%3DbDP1%3DTcOYLA43nRX_vA%40mail.gmail.com
With tables defined like this,
CREATE TABLE ip (id int PRIMARY KEY);
CREATE TABLE ic (id int) INHERITS (ip);
ALTER TABLE ic ALTER id DROP NOT NULL;
pg_upgrade fails during the schema restore phase due to this error:
ERROR: column "id" in child table must be marked NOT NULL
This can only be fixed by marking the child column as NOT NULL before
the upgrade, which could take an arbitrary amount of time (because ic's
data must be scanned). Have pg_upgrade's check mode warn if that
condition is found, so that users know what to adjust before running the
upgrade for real.
Author: Ali Akbar <the.apaan@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/CACQjQLoMsE+1pyLe98pi0KvPG2jQQ94LWJ+PTiLgVRK4B=i_jg@mail.gmail.com
Commit d70b17636d introduced the IndexCheckableCallback typedef for
a callback function, but it was never used. This commit removes
the unused typedef to clean up dead code.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/e1ea4e14-3b21-4e01-a5f2-0686883265df@oss.nttdata.com
Attempting to use commit timestamps during bootstrapping leads to an
assertion failure, that can be reached for example with an initdb -c
that enables track_commit_timestamp. It makes little sense to register
a commit timestamp for a BootstrapTransactionId, so let's disable the
activation of the module in this case.
This problem has been independently reported once by each author of this
commit. Each author has proposed basically the same patch, relying on
IsBootstrapProcessingMode() to skip the use of commit_ts during
bootstrap. The test addition is a suggestion by me, and is applied down
to v16.
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Author: Andy Fan <zhihuifan1213@163.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/OSCPR01MB14966FF9E4C4145F37B937E52F5102@OSCPR01MB14966.jpnprd01.prod.outlook.com
Discussion: https://postgr.es/m/87plejmnpy.fsf@163.com
Backpatch-through: 13
Previously, truncating a temporary relation required scanning the entire
local buffer pool once per relation fork to invalidate buffers. This could
be slow, especially with a large local buffers, as the scan was repeated
multiple times.
A similar issue with regular tables (shared buffers) was addressed in
commit 6d05086c0a by scanning the buffer pool only once for all forks.
This commit applies the same optimization to temporary relations,
improving truncation performance.
Author: Daniil Davydov <3danissimo@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Maxim Orlov <orlovmg@gmail.com>
Discussion: https://postgr.es/m/CAJDiXggNqsJOH7C5co4jA8nDk8vw-=sokyh5s1_TENWnC6Ofcg@mail.gmail.com
If, after removal of useless null-constant arguments, a CoalesceExpr
has exactly one remaining argument, we can just take that argument as
the result, without bothering to wrap a new CoalesceExpr around it.
This isn't likely to produce any great improvement in runtime per se,
but it can lead to better plans since the planner no longer has to
treat the expression as non-strict.
However, there were a few regression test cases that intentionally
wrote COALESCE(x) as a shorthand way of creating a non-strict
subexpression. To avoid ruining the intent of those tests, write
COALESCE(x,x) instead. (If anyone ever proposes de-duplicating
COALESCE arguments, we'll need another iteration of this arms race.
But it seems pretty unlikely that such an optimization would be
worthwhile.)
Author: Maksim Milyutin <maksim.milyutin@tantorlabs.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/8e8573c3-1411-448d-877e-53258b7b2be0@tantorlabs.ru
Using the just-added infrastructure, extend btree_gin to support
cross-type operators in its other opclasses. All of the cross-type
comparison operators supported by the core btree opclasses for
these datatypes are now available for btree_gin indexes as well.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>
Discussion: https://postgr.es/m/262624.1738460652@sss.pgh.pa.us
Extend the infrastructure in btree_gin.c to permit cross-type
operators, and add the code to support them for the int2, int4,
and int8 opclasses. (To keep this patch digestible, I left
the other datatypes for a separate patch.) This improves the
usability of btree_gin indexes by allowing them to support the
same set of queries that a regular btree index does.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>
Discussion: https://postgr.es/m/262624.1738460652@sss.pgh.pa.us
Previous commits invented timestamp2timestamptz_opt_overflow,
date2timestamp_opt_overflow, and date2timestamptz_opt_overflow
functions to perform non-error-throwing conversions between
datetime types. This patch completes the set by adding
timestamp2date_opt_overflow, timestamptz2date_opt_overflow,
and timestamptz2timestamp_opt_overflow.
In addition, adjust timestamp2timestamptz_opt_overflow so that it
doesn't throw error if timestamp2tm fails, but treats that as an
overflow case. The situation probably can't arise except with an
invalid timestamp value, and I can't think of a way that that would
happen except data corruption. However, it's pretty silly to have a
function whose entire reason for existence is to not throw errors for
out-of-range inputs nonetheless throw an error for out-of-range input.
The new APIs are not used in this patch, but will be needed in
upcoming btree_gin changes.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>
Discussion: https://postgr.es/m/262624.1738460652@sss.pgh.pa.us
Commits 8319e5cb5 et al missed the fact that ATPostAlterTypeCleanup
contains three calls to ATPostAlterTypeParse, and the other two
also need protection against passing a relid that we don't yet
have lock on. Add similar logic to those code paths, and add
some test cases demonstrating the need for it.
In v18 and master, the test cases demonstrate that there's a
behavioral discrepancy between stored generated columns and virtual
generated columns: we disallow changing the expression of a stored
column if it's used in any composite-type columns, but not that of
a virtual column. Since the expression isn't actually relevant to
either sort of composite-type usage, this prohibition seems
unnecessary; but changing it is a matter for separate discussion.
For now we are just documenting the existing behavior.
Reported-by: jian he <jian.universality@gmail.com>
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: CACJufxGKJtGNRRSXfwMW9SqVOPEMdP17BJ7DsBf=tNsv9pWU9g@mail.gmail.com
Backpatch-through: 13
The command is: ALTER TABLE x ADD [CONSTRAINT y] NOT NULL z
This syntax was added in 18, but I got pushback for getting commit
dbf42b84ac in 18 (also tab-completion for new syntax) after the
feature freeze, so I'll put this in master only for now.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reported-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://postgr.es/m/d4f14c6b-086b-463c-b15f-01c7c9728eab@oss.nttdata.com
Discussion: https://postgr.es/m/202505111448.bwbfomrymq4b@alvherre.pgsql
Commit 08aa89b326 removed the COMMIT_TS_SETTS WAL record,
leaving xl_commit_ts_set and SizeOfCommitTsSet unused. However,
it missed removing these definitions. This commit cleans up
the leftover code.
Since this is a cleanup rather than a bug fix, it is applied only to
the master branch.
Author: Andy Fan <zhihuifan1213@163.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/87ecuzmkqf.fsf@163.com
The documentation for pg_replication_slots previously mentioned only
max_slot_wal_keep_size as a condition under which the wal_status column
could show unreserved or lost. However, since commit be87200,
replication slots can also be invalidated due to horizon or wal_level,
and since commit ac0e33136a, idle_replication_slot_timeout can also
trigger this state.
This commit updates the description of the wal_status column to
reflect that max_slot_wal_keep_size is not the only cause of the lost state.
Back-patched to v16, where the additional invalidation cases were introduced.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Discussion: https://postgr.es/m/78b34e84-2195-4f28-a151-5d204a382fdd@oss.nttdata.com
Backpatch-through: 16
If certain constraint characteristic clauses (NO INHERIT, NOT VALID, NOT
ENFORCED) are given to CREATE CONSTRAINT TRIGGER, the resulting error
message is
ERROR: TRIGGER constraints cannot be marked NO INHERIT
which is a bit silly, because these aren't "constraints of type
TRIGGER". Hardcode a better error message to prevent it. This is a
cosmetic fix for quite a fringe problem with no known complaints from
users, so no backpatch.
While at it, silently accept ENFORCED if given.
Author: Amul Sul <sulamul@gmail.com>
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CAAJ_b97hd-jMTS7AjgU6TDBCzDx_KyuKxG+K-DtYmOieg+giyQ@mail.gmail.com
Discussion: https://postgr.es/m/CACJufxHSp2puxP=q8ZtUGL1F+heapnzqFBZy5ZNGUjUgwjBqTQ@mail.gmail.com
AlterDomainStmt.subtype used characters for its subtypes of commands,
SET|DROP DEFAULT|NOT NULL and ADD|DROP|VALIDATE CONSTRAINT, which were
hardcoded in a couple of places of the code. The code is improved by
using an enum instead, with the same character values as the original
code.
Note that the field was documented in parsenodes.h and that it forgot to
mention 'V' (VALIDATE CONSTRAINT).
Author: Quan Zongliang <quanzongliang@yeah.net>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/41ff310b-16bd-44b9-a3ef-97e20f14b709@yeah.net
The documentation previously stated that the wal_status column is NULL
if restart_lsn is NULL in the pg_replication_slots view. This is incorrect,
and wal_status can be "lost" even when restart_lsn is NULL.
This commit removes the incorrect description.
Back-patched to all supported versions.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Discussion: https://postgr.es/m/c9d23cdc-b5dd-455a-8ee9-f1f24d701d89@oss.nttdata.com
Backpatch-through: 13
The COPY FROM command now accepts a non-negative integer for the HEADER option,
allowing multiple header lines to be skipped. This is useful when the input
contains multi-line headers that should be ignored during data import.
Author: Shinya Kato <shinya11.kato@gmail.com>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Yugo Nagata <nagata@sraoss.co.jp>
Discussion: https://postgr.es/m/CAOzEurRPxfzbxqeOPF_AGnAUOYf=Wk0we+1LQomPNUNtyZGBZw@mail.gmail.com
Currently check_recovery_target_timeline() converts any value that is
not "current", "latest", or a valid integer to 0. So, for example, the
following configuration added to postgresql.conf followed by a startup:
recovery_target_timeline = 'bogus'
recovery_target_timeline = '9999999999'
... results in the following error patterns:
FATAL: 22023: recovery target timeline 0 does not exist
FATAL: 22023: recovery target timeline 1410065407 does not exist
This is confusing, because the server does not reflect the intention of
the user, and just reports incorrect data unrelated to the GUC.
The origin of the problem is that we do not perform a range check in the
GUC value passed-in for recovery_target_timeline. This commit improves
the situation by using strtou64() and by providing stricter range
checks. Some test cases are added for the cases of an incorrect, an
upper-bound and a lower-bound timeline value, checking the sanity of the
reports based on the contents of the server logs.
Author: David Steele <david@pgmasters.net>
Discussion: https://postgr.es/m/e5d472c7-e9be-4710-8dc4-ebe721b62cea@pgbackrest.org
Currently, we do not support Memoize for SEMI and ANTI joins because
nested loop SEMI/ANTI joins do not scan the inner relation to
completion, which prevents Memoize from marking the cache entry as
complete. One might argue that we could mark the cache entry as
complete after fetching the first inner tuple, but that would not be
safe: if the first inner tuple and the current outer tuple do not
satisfy the join clauses, a second inner tuple matching the parameters
would find the cache entry already marked as complete.
However, if the inner side is provably unique, this issue doesn't
arise, since there would be no second matching tuple. That said, this
doesn't help in the case of SEMI joins, because a SEMI join with a
provably unique inner side would already have been reduced to an inner
join by reduce_unique_semijoins.
Therefore, in this patch, we check whether the inner relation is
provably unique for ANTI joins and enable the use of Memoize in such
cases.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Discussion: https://postgr.es/m/CAMbWs48FdLiMNrmJL-g6mDvoQVt0yNyJAqMkv4e2Pk-5GKCZLA@mail.gmail.com
This routine has come as a useful piece to be able to know the list of
injection points currently attached in a system. One area would be to
use it in a set-returning function, or just let out-of-core code play
with it.
This hides the internals of the shared memory array lookup holding the
information about the injection points (point name, library and function
name), allocating the result in a palloc'd List consumable by the
caller.
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Rahila Syed <rahilasyed90@gmail.com>
Discussion: https://postgr.es/m/Z_xYkA21KyLEHvWR@paquier.xyz
Discussion: https://postgr.es/m/aBG2rPwl3GE7m1-Q@paquier.xyz
PQcancelCreate failed to copy struct pg_conn_host's "type" field,
instead leaving it zero (a/k/a CHT_HOST_NAME). This seemingly
has no great ill effects if it should have been CHT_UNIX_SOCKET
instead, but if it should have been CHT_HOST_ADDRESS then a
null-pointer dereference will occur when the cancelConn is used.
Bug: #18974
Reported-by: Maxim Boguk <maxim.boguk@gmail.com>
Author: Sergei Kornilov <sk@zsrv.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18974-575f02b2168b36b3@postgresql.org
Backpatch-through: 17
In commit fe07100e82, I renamed a couple of functions in
test_dsm_registry to make it clear what they are testing. However,
the buildfarm's cross-version upgrade tests run pg_upgrade with the
test modules installed, so this caused errors like:
ERROR: could not find function "get_val_in_shmem" in file ".../test_dsm_registry.so"
To fix, revert those renames. I could probably get away with only
un-renaming the C symbols, but I figured I'd avoid introducing
function name mismatches. Also, AFAICT the buildfarm's
cross-version upgrade tests do not run the test module tests
post-upgrade, else we'll need to properly version the extension.
Per buildfarm member crake.
Discussion: https://postgr.es/m/aGVuYUNW23tStUYs%40nathan
Presently, the dynamic shared memory (DSM) registry only provides
GetNamedDSMSegment(), which allocates a fixed-size segment. To use
the DSM registry for more sophisticated things like dynamic shared
memory areas (DSAs) or a hash table backed by a DSA (dshash), users
need to create a DSM segment that stores various handles and LWLock
tranche IDs and to write fairly complicated initialization code.
Furthermore, there is likely little variation in this
initialization code between libraries.
This commit introduces functions that simplify allocating a DSA or
dshash within the DSM registry. These functions are very similar
to GetNamedDSMSegment(). Notable differences include the lack of
an initialization callback parameter and the prohibition of calling
the functions more than once for a given entry in each backend
(which should be trivially avoidable in most circumstances). While
at it, this commit bumps the maximum DSM registry entry name length
from 63 bytes to 127 bytes.
Also note that even though one could presumably detach/destroy the
DSAs and dshashes created in the registry, such use-cases are not
yet well-supported, if for no other reason than the associated DSM
registry entries cannot be removed. Adding such support is left as
a future exercise.
The test_dsm_registry test module contains tests for the new
functions and also serves as a complete usage example.
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Florents Tselai <florents.tselai@gmail.com>
Reviewed-by: Rahila Syed <rahilasyed90@gmail.com>
Discussion: https://postgr.es/m/aEC8HGy2tRQjZg_8%40nathan
Restore nbtree preprocessing comments describing how we mark nbtree row
compare members required to how they were prior to 2016 bugfix commit
a298a1e0.
Oversight in commit bd3f59fd, which made nbtree preprocessing revert to
the original 2006 rules, but neglected to revert these comments.
Backpatch-through: 18
The array-based variant of width_bucket() has always accepted NaN
inputs, treating them as equal but larger than any non-NaN,
as we do in ordinary comparisons. But up to now, the four-argument
variants threw errors for a NaN operand. This is inconsistent
and unnecessary, since we can perfectly well regard NaN as falling
after the last bucket.
We do still throw error for NaN or infinity histogram-bound inputs,
since there's no way to compute sensible bucket boundaries.
Arguably this is a bug fix, but given the lack of field complaints
I'm content to fix it in master.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/2822872.1750540911@sss.pgh.pa.us
Trying to alter a constraint so that it becomes NOT VALID results in an
error that assumes the constraint is a foreign key. This is potentially
wrong, so give a more generic error message.
While at it, give CREATE CONSTRAINT TRIGGER a better error message as
well.
Co-authored-by: jian he <jian.universality@gmail.com>
Co-authored-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Co-authored-by: Amul Sul <sulamul@gmail.com>
Discussion: https://postgr.es/m/CACJufxHSp2puxP=q8ZtUGL1F+heapnzqFBZy5ZNGUjUgwjBqTQ@mail.gmail.com
Recent nbtree bugfix commit 5f4d98d4 added a special case to the code
that sets up a page-level prefix of keys that are definitely satisfied
by every tuple on the page: whenever _bt_set_startikey reached a row
compare key, we'd refuse to apply the pstate.forcenonrequired behavior
in scans where that usually happens (scans with a higher-order array
key). That hack made the scan avoid essentially the same infinite
cycling behavior that also affected nbtree scans with redundant keys
(keys that preprocessing could not eliminate) prior to commit f09816a0.
There are now serious doubts about this row compare workaround.
Testing has shown that a scan with a row compare key and an array key
could still read the same leaf page twice (without the scan's direction
changing), which isn't supposed to be possible following the SAOP
enhancements added by Postgres 17 commit 5bf748b8. Also, we still
allowed a required row compare key to be used with forcenonrequired mode
when its header key happened to be beyond the pstate.ikey set by
_bt_set_startikey, which was complicated and brittle.
The underlying problem was that row compares had inconsistent rules
around how scans start (which keys can be used for initial positioning
purposes) and how scans end (which keys can set continuescan=false).
Quals with redundant keys that could not be eliminated by preprocessing
also had that same quality to them prior to today's bugfix f09816a0. It
now seems prudent to bring row compare keys in line with the new charter
for required keys, by making the start and end rules symmetric.
This commit fixes two points of disagreement between _bt_first and
_bt_check_rowcompare. Firstly, _bt_check_rowcompare was capable of
ending the scan at the point where it needed to compare an ISNULL-marked
row compare member that came immediately after a required row compare
member. _bt_first now has symmetric handling for NULL row compares.
Secondly, _bt_first had its own ideas about which keys were safe to use
for initial positioning purposes. It could use fewer or more keys than
_bt_check_rowcompare. _bt_first now uses the same requiredness markings
as _bt_check_rowcompare for this.
Now that _bt_first and _bt_check_rowcompare agree on how to start and
end scans, we can get rid of the forcenonrequired special case, without
any risk of infinite cycling. This approach also makes row compare keys
behave more like regular scalar keys, particularly within _bt_first.
Fixing these inconsistencies necessitates dealing with a related issue
with the way that row compares were marked required by preprocessing: we
didn't mark any lower-order row members required following 2016 bugfix
commit a298a1e0. That approach was over broad. The bug in question was
actually an oversight in how _bt_check_rowcompare dealt with tuple NULL
values that failed to satisfy a scan key marked required in the opposite
scan direction (it was a bug in 2011 commits 6980f817 and 882368e8, not
a bug in 2006 commit 3a0a16cb). Go back to marking row compare members
as required using the original 2006 rules, and fix the 2016 bug in a
more principled way: by limiting use of the "set continuescan=false with
a key required in the opposite scan direction upon encountering a NULL
tuple value" optimization to the first/most significant row member key.
While it isn't safe to use an implied IS NOT NULL qualifier to end the
scan when it comes from a required lower-order row compare member key,
it _is_ generally safe for such a required member key to end the scan --
provided the key is marked required in the _current_ scan direction.
This fixes what was arguably an oversight in either commit 5f4d98d4 or
commit 8a510275. It is a direct follow-up to today's commit f09816a0.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Discussion: https://postgr.es/m/CAH2-Wz=pcijHL_mA0_TJ5LiTB28QpQ0cGtT-ccFV=KzuunNDDQ@mail.gmail.com
Backpatch-through: 18
nbtree preprocessing's handling of redundant (and contradictory) keys
created problems for scans with = arrays. It was just about possible
for a scan with an = array key and one or more redundant keys (keys that
preprocessing could not eliminate due an incomplete opfamily and a
cross-type key) to get stuck. Testing has shown that infinite cycling
where the scan never manages to make forward progress was possible.
This could happen when the scan's arrays were reset in _bt_readpage's
forcenonrequired=true path (added by bugfix commit 5f4d98d4) when the
arrays weren't at least advanced up to the same point that they were in
at the start of the _bt_readpage call. Earlier redundant keys prevented
the finaltup call to _bt_advance_array_keys from reaching lower-order
keys that needed to be used to sufficiently advance the scan's arrays.
To fix, make preprocessing leave the scan's keys in a state that is as
close as possible to how it'll usually leave them (in the common case
where there's no redundant keys that preprocessing failed to eliminate).
Now nbtree preprocessing _reliably_ leaves behind at most one required
>/>= key per index column, and at most one required </<= key per index
column. Columns that have one or more = keys that are eligible to be
marked required (based on the traditional rules) prioritize the = keys
over redundant inequality keys; they'll _reliably_ be left with only one
of the = keys as the index column's only required key.
Keys that are not marked required (whether due to the new preprocessing
step running or for some other reason) are relocated to the end of the
so->keyData[] array as needed. That way they'll always be evaluated
after the scan's required keys, and so cannot prevent code in places
like _bt_advance_array_keys and _bt_first from reaching a required key.
Also teach _bt_first to decide which initial positioning keys to use
based on the same requiredness markings that have long been used by
_bt_checkkeys/_bt_advance_array_keys. This is a necessary condition for
reliably avoiding infinite cycling. _bt_advance_array_keys expects to
be able to reason about what'll happen in the next _bt_first call should
it start another primitive index scan, by evaluating inequality keys
that were marked required in the opposite-to-scan scan direction only.
Now everybody (_bt_first, _bt_checkkeys, and _bt_advance_array_keys)
will always agree on which exact key will be used on each index column
to start and/or end the scan (except when row compare keys are involved,
which have similar problems not addressed by this commit).
An upcoming commit will finish off the work started by this commit by
harmonizing how _bt_first, _bt_checkkeys, and _bt_advance_array_keys
apply row compare keys to start and end scans.
This fixes what was arguably an oversight in either commit 5f4d98d4 or
commit 8a510275.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Discussion: https://postgr.es/m/CAH2-Wz=ds4M+3NXMgwxYxqU8MULaLf696_v5g=9WNmWL2=Uo2A@mail.gmail.com
Backpatch-through: 18
The previous minimum was to maintain support for Python 3.5, but we
now require Python 3.6 anyway (commit 45363fca63), so that reason is
obsolete. A small raise to Meson 0.57 allows getting rid of a fair
amount of version conditionals and silences some future-deprecated
warnings.
With the version bump, the following deprecation warnings appeared and
are fixed:
WARNING: Project targets '>=0.57' but uses feature deprecated since '0.55.0': ExternalProgram.path. use ExternalProgram.full_path() instead
WARNING: Project targets '>=0.57' but uses feature deprecated since '0.56.0': meson.build_root. use meson.project_build_root() or meson.global_build_root() instead.
It turns out that meson 0.57.0 and 0.57.1 are buggy for our use, so
the minimum is actually set to 0.57.2. This is specific to this
version series; in the future we won't necessarily need to be this
precise.
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/42e13eb0-862a-441e-8d84-4f0fd5f6def0%40eisentraut.org
Commit c120550edb optimized the vacuuming of relations without
indexes (a.k.a. one-pass strategy) by directly marking dead item IDs
as LP_UNUSED. However, the periodic FSM vacuum was still checking if
dead item IDs had been marked as LP_DEAD when attempting to vacuum the
FSM every VACUUM_FSM_EVERY_PAGES blocks. This condition was never met
due to the optimization, resulting in missed FSM vacuum
opportunities.
This commit modifies the periodic FSM vacuum condition to use the
number of tuples deleted during HOT pruning. This count includes items
marked as either LP_UNUSED or LP_REDIRECT, both of which are expected
to result in new free space to report.
Back-patch to v17 where the vacuum optimization for tables with no
indexes was introduced.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAD21AoBL8m6B9GSzQfYxVaEgvD7-Kr3AJaS-hJPHC+avm-29zw@mail.gmail.com
Backpatch-through: 17
When decompressing some input data, the calculation for the initial
starting point and the initial size were incorrect, potentially leading
to failures when decompressing contents with LZ4. These initialization
points are fixed in this commit, bringing the logic closer to what
exists for gzip and zstd.
The contents of the compressed data is clear (for example backups taken
with LZ4 can still be decompressed with a "lz4" command), only the
decompression part reading the input data was impacted by this issue.
This code path impacts pg_basebackup and pg_verifybackup, which can use
the LZ4 decompression routines with an archive streamer, or any tools
that try to use the archive streamers in src/fe_utils/.
The issue is easier to reproduce with files that have a low-compression
rate, like ones filled with random data, for a size of at least 512kB,
but this could happen with anything as long as it is stored in a data
folder. Some tests are added based on this idea, with a file filled
with random bytes grabbed from the backend, written at the root of the
data folder. This is proving good enough to reproduce the original
problem.
Author: Mikhail Gribkov <youzhick@gmail.com>
Discussion: https://postgr.es/m/CAMEv5_uQS1Hg6KCaEP2JkrTBbZ-nXQhxomWrhYQvbdzR-zy-wA@mail.gmail.com
Backpatch-through: 15
This commit moves all the routines related to the bytea data type into
its own new file, called bytea.c, clearing some of the bloat in
varlena.c. This includes the routines for:
- Input, output, receive and send
- Comparison
- Casts to integer types
- bytea-specific functions
The internals of the routines moved here are unchanged, with one
exception. This comes with a twist in bytea_string_agg_transfn(), where
the call to makeStringAggState() is replaced by the internals of this
routine, still located in varlena.c. This simplifies the move to the
new file by not having to expose makeStringAggState().
Author: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CAJ7c6TMPVPJ5DL447zDz5ydctB8OmuviURtSwd=PHCRFEPDEAQ@mail.gmail.com
Prior to this patch, every FETCH call would generate a unique queryId
with a different size specified. Depending on the workloads, this could
lead to a significant bloat in pg_stat_statements, as repeatedly calling
a specific cursor would result in a new queryId each time. For example,
FETCH 1 c1; and FETCH 2 c1; would produce different queryIds.
This patch improves the situation by normalizing the fetch size, so as
semantically similar statements generate the same queryId. As a result,
statements like the below, which differ syntactically but have the same
effect, will now share a single queryId:
FETCH FROM c1
FETCH NEXT c1
FETCH 1 c1
In order to do a normalization based on the keyword used in FETCH,
FetchStmt is tweaked with a new FetchDirectionKeywords. This matters
for "howMany", which could be set to a negative value depending on the
direction, and we want to normalize the queries with enough information
about the direction keywords provided, including RELATIVE, ABSOLUTE or
all the ALL variants.
Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0tA6LbHCg2qSS+KuM850BZC_+ZgHV7Ug6BXw22TNyF+MA@mail.gmail.com
We stopped defining IOV_MAX on non-Windows systems in 75357ab94, on
the assumption that every non-Windows system defines it in <limits.h>
as required by X/Open. GNU Hurd, however, doesn't follow that
standard either. Put back the old logic to assume 16 if it's
not defined.
Author: Michael Banck <mbanck@gmx.net>
Co-authored-by: Christoph Berg <myon@debian.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/6862e8d1.050a0220.194b8d.76fa@mx.google.com
Discussion: https://postgr.es/m/6846e0c3.df0a0220.39ef9b.c60e@mx.google.com
Backpatch-through: 16
The existing code assumed that O_RDONLY is defined as 0, but this is
not required by POSIX and is not true on GNU Hurd. We can avoid
the assumption by relying on O_ACCMODE to mask the fcntl() result.
(Hopefully, all supported platforms define that.)
Author: Michael Banck <mbanck@gmx.net>
Co-authored-by: Samuel Thibault
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/6862e8d1.050a0220.194b8d.76fa@mx.google.com
Discussion: https://postgr.es/m/68480868.5d0a0220.1e214d.68a6@mx.google.com
Backpatch-through: 13
Previously, pattern matching and case mapping behavior branched based
on the provider. Refactor to use a method table, which is less
error-prone.
This is also a step toward multiple provider versions, which we may
want to support in the future.
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/2830211e1b6e6a2e26d845780b03e125281ea17b.camel%40j-davis.com
Querying the NUMA status can be quite time consuming, especially with
large shared buffers. 8cc139bec3 called numa_move_pages() once, for
all buffers, and we had to wait for the syscall to complete.
But with the chunking, introduced by 7fe2f67c7c to work around a kernel
bug, we can do CHECK_FOR_INTERRUPTS() after each chunk, allowing users
to abort the execution.
Reviewed-by: Christoph Berg <myon@debian.org>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aEtDozLmtZddARdB@msg.df7cb.de
Backpatch-through: 18
When querying NUMA status of pages in shared memory, we need to touch
the memory first to get valid results. This may trigger valgrind
reports, because some of the memory (e.g. unpinned buffers) may be
marked as noaccess.
Solved by adding a valgrind suppresion. An alternative would be to
adjust the access/noaccess status before touching the memory, but that
seems far too invasive. It would require all those places to have
detailed knowledge of what the shared memory stores.
The pg_numa_touch_mem_if_required() macro is replaced with a function.
Macros are invisible to suppressions, so it'd have to suppress reports
for the caller - e.g. pg_get_shmem_allocations_numa(). So we'd suppress
reports for the whole function, and that seems to heavy-handed. It might
easily hide other valid issues.
Reviewed-by: Christoph Berg <myon@debian.org>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aEtDozLmtZddARdB@msg.df7cb.de
Backpatch-through: 18
There's a kernel bug in do_pages_stat(), affecting systems combining
64-bit kernel and 32-bit user space. The function splits the request
into chunks of 16 pointers, but forgets the pointers are 32-bit when
advancing to the next chunk. Some of the pointers get skipped, and
memory after the array is interpreted as pointers. The result is that
the produced status of memory pages is mostly bogus.
Systems combining 64-bit and 32-bit environments like this might seem
rare, but that's not the case - all 32-bit Debian packages are built in
a 32-bit chroot on a system with a 64-bit kernel.
This is a long-standing kernel bug (since 2010), affecting pretty much
all kernels, so it'll take time until all systems get a fixed kernel.
Luckily, we can work around the issue by chunking the requests the same
way do_pages_stat() does, at least on affected systems. We don't know
what kernel a 32-bit build will run on, so all 32-bit builds use chunks
of 16 elements (the largest chunk before hitting the issue).
64-bit builds are not affected by this issue, and so could work without
the chunking. But chunking has other advantages, so we apply chunking
even for 64-bit builds, with chunks of 1024 elements.
Reported-by: Christoph Berg <myon@debian.org>
Author: Christoph Berg <myon@debian.org>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aEtDozLmtZddARdB@msg.df7cb.de
Context: https://marc.info/?l=linux-mm&m=175077821909222&w=2
Backpatch-through: 18
Switch MSVC to use the conforming preprocessor, using the
/Zc:preprocessor option.
This allows us to drop the alternative implementation of
VA_ARGS_NARGS() for the previous "traditional" preprocessor.
This also prepares the way for enabling C11 mode in the future, which
enables the conforming preprocessor by default.
This now requires Visual Studio 2019. The installation documentation
is adjusted accordingly.
Discussion: https://www.postgresql.org/message-id/flat/01a69441-af54-4822-891b-ca28e05b215a%40eisentraut.org
The contrib module xml2/ has always been fuzzy with the cleanup of the
memory allocated by the calls internal to libxml2, even if there are
APIs in place giving a lot of control over the error behavior, all
located in the backend's xml.c.
The code paths fixed in the commit address multiple defects, while
sanitizing the code:
- In xpath.c, several allocations are done by libxml2 for
xpath_workspace, whose memory cleanup could go out of sight as it relied
on a single TRY/CATCH block done in pgxml_xpath(). workspace->res is
allocated by libxml2, and may finish by not being freed at all upon a
failure outside of a TRY area. This code is refactored so as the
TRY/CATCH block of pgxml_xpath() is moved one level higher to its
callers, which are responsible for cleaning up the contents of a
workspace on failure. cleanup_workspace() now requires a volatile
workspace, forcing as a rule that a TRY/CATCH block should be used.
- Several calls, like xmlStrdup(), xmlXPathNewContext(),
xmlXPathCtxtCompile(), etc. can return NULL on failures (for most of
them allocation failures. These forgot to check for failures, or missed
that pg_xml_error_occurred() should be called, to check if an error is
already on the stack.
- Some memory allocated by libxml2 calls was freed in an incorrect way,
"resstr" in xslt_process() being one example.
The class of errors fixed here are for problems that are unlikely going
to happen in practice, so no backpatch is done. The changes have
finished by being rather invasive, so it is perhaps not a bad thing to
be conservative and to keep these changes only on HEAD anyway.
Author: Michael Paquier <michael@paquier.xyz>
Reported-by: Karavaev Alexey <maralist86@mail.ru>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18943-2f2a04ab03904598@postgresql.org
This commit fixes some defects in the backend's xml.c, found upon
inspection of the internals of libxml2:
- xmlEncodeSpecialChars() can fail on malloc(), returning NULL back to
the caller. xmltext() assumed that this could never happen. Like other
code paths, a TRY/CATCH block is added there, covering also the fact
that cstring_to_text_with_len() could fail a memory allocation, where
the backend would miss to free the buffer allocated by
xmlEncodeSpecialChars().
- Some libxml2 routines called in xmlelement() can return NULL, like
xmlAddChildList() or xmlTextWriterStartElement(). Dedicated errors are
added for them.
- xml_xmlnodetoxmltype() missed that xmlXPathCastNodeToString() can fail
on an allocation failure. In this case, the call can just be moved to
the existing TRY/CATCH block.
All these code paths would cause the server to crash. As this is
unlikely a problem in practice, no backpatch is done. Jim and I have
caught these defects, not sure who has scored the most. The contrib
module xml2/ has similar defects, which will be addressed in a separate
change.
Reported-by: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Discussion: https://postgr.es/m/aEEingzOta_S_Nu7@paquier.xyz
The current code in resolve_column_ref (dating to commits 01f7d2990
and fe24d7816) believes that not finding a RECFIELD datum is a
can't-happen case, in consequence of which I didn't spend a whole lot
of time considering what to do if it did happen. But it turns out
that it *can* happen if the would-be field name is a fully-reserved
PL/pgSQL keyword. Change the error message to describe that
situation, and add a test case demonstrating it.
This might need further refinement if anyone can find other ways to
trigger a failure here; but without an example it's not clear what
other error to throw.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://postgr.es/m/2185258.1745617445@sss.pgh.pa.us
On close inspection, there does not seem to be a strong reason
why these should be fully-reserved keywords. I guess they just
escaped consideration in previous attempts to minimize PL/pgSQL's
list of reserved words.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://postgr.es/m/2185258.1745617445@sss.pgh.pa.us
This provides a convenient way to look up a database's OID. For
example, the query
SELECT * FROM pg_shdepend
WHERE dbid = (SELECT oid FROM pg_database
WHERE datname = current_database());
can now be simplified to
SELECT * FROM pg_shdepend
WHERE dbid = current_database()::regdatabase;
Like the regrole type, regdatabase has cluster-wide scope, so we
disallow regdatabase constants from appearing in stored
expressions.
Bumps catversion.
Author: Ian Lawrence Barwick <barwick@gmail.com>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/aBpjJhyHpM2LYcG0%40nathan
If the method is called in scalar context and we didn't pass in a stderr
handle, one won't be created. However, some error paths assume that it
exists, so in this case create a dummy stderr to avoid the resulting
perl error.
Per gripe from Oleg Tselebrovskiy <o.tselebrovskiy@postgrespro.ru> and
adapted from his patch.
Discussion: https://postgr.es/m/378eac5de4b8ecb5be7bcdf2db9d2c4d@postgrespro.ru
Previously, tab completion for COPY only suggested plain tables
and partitioned tables, even though materialized views are also
valid for COPY TO (since commit 534874fac0), and foreign tables
are valid for COPY FROM.
This commit enhances tab completion for COPY to also include
materialized views and foreign tables.
Views with INSTEAD OF INSERT triggers are supported with
COPY FROM but rarely used, so plain views are intentionally
excluded from completion.
Author: jian he <jian.universality@gmail.com>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Discussion: https://postgr.es/m/CACJufxFxnSkikp+GormAGHcMTX1YH2HRXW1+3dJM9w7yY9hdsg@mail.gmail.com
When postfix operators where dropped in 1ed6b8956, the CREATE OPERATOR
docs were not updated to make the RIGHTARG argument mandatory in the
grammar.
While at it, make the RIGHTARG docs more concise. Also, the operator
docs were mentioning "infix" in the introduction, while using "binary"
everywhere else.
Author: Christoph Berg <myon@debian.org>
Discussion: https://www.postgresql.org/message-id/flat/aAtpbnQphv4LWAye@msg.df7cb.de
Commit 1546e17f9d accidentally misspelled additionally as
additionaly. Backpatch to v17 to match where the original
commit was backpatched.
Author: Daniel Gustafsson <daniel@yesql.se>
Backpatch-through: 17
This commit refactors the vacuum routines that rely on VacuumParams,
adding const markers where necessary to force a new policy in the code.
This structure should not use a pointer as it may be used across
multiple relations, and its contents should never be updated.
vacuum_rel() stands as an exception as it touches the "index_cleanup"
and "truncate" options.
VacuumParams has been introduced in 0d83138974, and 661643deda has
fixed a bug impacting VACUUM operating on multiple relations. The
changes done in tableam.h break ABI compatibility, so this commit can
only happen on HEAD.
Author: Shihao Zhong <zhong950419@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CAGRkXqTo+aK=GTy5pSc-9cy8H2F2TJvcrZ-zXEiNJj93np1UUw@mail.gmail.com
log_line_prefix is changed to include "%b", the backend type in the TAP
test configuration. %v and %x are removed from the CI configuration,
with the format around %b changed.
The lack of backend type in postgresql.conf set by Cluster.pm for the
TAP test configuration was something that has been bugging me, beginning
the discussion that has led to this change. The change in the CI has
come up during the discussion, to become consistent with pg_regress.c,
%v and %x not being that useful to have.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/aC0VaIWAXLgXcHVP@paquier.xyz