Commit graph

63991 commits

Author SHA1 Message Date
Jacob Champion
b977bd308a oauth: Allow validators to register custom HBA options
OAuth validators can already use custom GUCs to configure behavior
globally. But we currently provide no ability to adjust settings for
individual HBA entries, because the original design focused on a world
where a provider covered a "single audience" of users for one database
cluster. This assumption does not apply to multitenant use cases, where
a single validator may be controlling access for wildly different user
groups.

To improve this use case, add two new API calls for use by validator
callbacks: RegisterOAuthHBAOptions() and GetOAuthHBAOption().
Registering options "foo" and "bar" allows a user to set "validator.foo"
and "validator.bar" in an oauth HBA entry. These options are stringly
typed (syntax validation is solely the responsibility of the defining
module), and names are restricted to a subset of ASCII to avoid tying
our hands with future HBA syntax improvements.

Unfortunately, we can't check the custom option names during a reload of
the configuration, like we do with standard HBA options, without
requiring all validators to be loaded via shared_preload_libraries.
(I consider this to be a nonstarter: most validators should probably use
session_preload_libraries at most, since requiring a full restart just
to update authentication behavior will be unacceptable to many users.)
Instead, the new validator.* options are checked against the registered
list at connection time.

Multiple alternatives were proposed and/or prototyped, including
extending the GUC system to allow per-HBA overrides, joining forces with
recent refactoring work on the reloptions subsystem, and giving the
ability to customize HBA options to all PostgreSQL extensions. I
personally believe per-HBA GUC overrides are the best option, because
several existing GUCs like authentication_timeout and pre_auth_delay
would fit there usefully. But the recent addition of SNI per-host
settings in 4f433025f indicates that a more general solution is needed,
and I expect that to take multiple releases' worth of discussion.

This compromise patch, then, is intentionally designed to be an
architectural dead end: simple to describe, cheap to maintain, and
providing just enough functionality to let validators move forward for
PG19. The hope is that it will be replaced in the future by a solution
that can handle per-host, per-HBA, and other per-context configuration
with the same functionality that GUCs provide today. In the meantime,
the bulk of the code in this patch consists of strict guardrails on the
simple API, to try to ensure that we don't have any reason to regret its
existence during its unknown lifespan.

I owe particular thanks here to Zsolt Parragi, who prototyped several
approaches that guided the final design.

Suggested-by: Zsolt Parragi <zsolt.parragi@percona.com>
Suggested-by: VASUKI M <vasukianand0119@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CAN4CZFM3b8u5uNNNsY6XCya257u%2BDofms3su9f11iMCxvCacag%40mail.gmail.com
2026-04-07 08:15:19 -07:00
Jacob Champion
6d00fb9048 libpq: Split PGOAUTHDEBUG=UNSAFE into multiple options
PGOAUTHDEBUG is a blunt instrument: you get all the debugging features,
or none of them. The most annoying consequence during manual use is the
Curl debug trace, which tends to obscure the device flow prompt
entirely. The promotion of PGOAUTHCAFILE into its own feature in
993368113 improved the situation somewhat, but there's still the
discomfort of knowing you have to opt into many dangerous behaviors just
to get the single debug feature you wanted.

Explode the PGOAUTHDEBUG syntax into a comma-separated list. The old
"UNSAFE" value enables everything, like before. Any individual unsafe
features still require the envvar to begin with an "UNSAFE:" prefix, to
try to interrupt the flow of someone who is about to do something they
should not.

So now, rather than

    PGOAUTHDEBUG=UNSAFE        # enable all the unsafe things

a developer can say

    PGOAUTHDEBUG=call-count    # only show me the call count. safe!
    PGOAUTHDEBUG=UNSAFE:trace  # print secrets, but don't allow HTTP

To avoid adding more build system scaffolding to libpq-oauth, implement
this entirely in a small private header. This unfortunately can't be
standalone, so it needs a headerscheck exception.

Author: Zsolt Parragi <zsolt.parragi@percona.com>
Co-authored-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CAOYmi%2B%3DfbZNJSkHVci%3DGpR8XPYObK%3DH%2B2ERRha0LDTS%2BifsWnw%40mail.gmail.com
Discussion: https://postgr.es/m/CAN4CZFMmDZMH56O9vb_g7vHqAk8ryWFxBMV19C39PFghENg8kA%40mail.gmail.com
2026-04-07 08:15:14 -07:00
Álvaro Herrera
e76d8c749c
Reserve replication slots specifically for REPACK
Add a new GUC max_repack_replication_slots, which lets the user reserve
some additional replication slots for concurrent repack (and only
concurrent repack).  With this, the user doesn't have to worry about
changing the max_replication_slots in order to cater for use of
concurrent repack.

(We still use the same pool of bgworkers though, but that's less
commonly a problem than slots.)

Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Discussion: https://postgr.es/m/202604012148.nnnmyxxrr6nh@alvherre.pgsql
2026-04-07 16:55:29 +02:00
Heikki Linnakangas
979387f188 Fix harmless leftover in _hash_kill_items()
Checking for 'havePin' is sufficient here. An earlier version of the
patch didn't have the 'havePin' variable and used
'so->hashso_bucket_buf == so->currPos.buf' as the condition when both
locking and unlocking the page. The havePin variable was added later
during development, but the unlocking condition wasn't fully
updated. Tidy it up.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/b9de8d05-3b02-4a27-9b0b-03972fa4bfd3@iki.fi
2026-04-07 17:38:11 +03:00
Andrew Dunstan
55890a9194 Add errdetail() with PID and UID about source of termination signal.
When a backend is terminated via pg_terminate_backend() or an external
SIGTERM, the error message now includes the sender's PID and UID as
errdetail, making it easier to identify the source of unexpected
terminations in multi-user environments.

On platforms that support SA_SIGINFO (Linux, FreeBSD, and most modern
Unix systems), the signal handler captures si_pid and si_uid from the
siginfo_t structure.  On platforms without SA_SIGINFO, the detail is
simply omitted.

Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Chao Li <1356863904@qq.com>
Discussion: https://postgr.es/m/CAKZiRmyrOWovZSdixpLd3PGMQXuQL_zw2Ght5XhHCkQ1uDsxjw@mail.gmail.com
2026-04-07 10:22:33 -04:00
Robert Haas
c10edb102a pg_stash_advice: Allow stashed advice to be persisted to disk.
If pg_stash_advice.persist = true, stashed advice will be written to
pg_stash_advice.tsv in the data directory, periodically and at
shutdown. On restart, stash modifications are locked out until this
file has been reloaded, but queries will not be, so there may be a
short window after startup during which previously-stashed advice is
not automatically applied.

Author: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Lukas Fittl <lukas@fittl.com>
Discussion: https://postgr.es/m/CA+Tgmob87qsWa-VugofU6epuV0H5XjWZGMbQas4Q-ADKmvSyBg@mail.gmail.com
2026-04-07 10:11:36 -04:00
Andres Freund
29e7dbf5e4 Minimal fix for WAIT FOR ... MODE 'standby_flush'
The investigation into the negative test performance impact of 7e8aeb9e48
lead to discovering that there are a few issues with WAIT FOR.

This commit is just a minimal fix to prevent hangs in standby_flush mode, due
to WAIT FOR ... 'standby_flush' seeing a 0 LSN if a newly started walreceiver
does not receive any writes, because the stanby is already caught up.

There are several other issues and this is isn't necessarily the best fix. But
this way we get the hangs out of the way.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/zqbppucpmkeqecfy4s5kscnru4tbk6khp3ozqz6ad2zijz354k@w4bdf4z3wqoz
2026-04-07 09:48:09 -04:00
Álvaro Herrera
8fb95a8ab6
doc: Add an example of REPACK (CONCURRENTLY)
Suggested-by: vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/CALDaNm3tiKhtegx5Cawi34UjbHmNGEDNAtScGM1RgWRtV-5_0Q@mail.gmail.com
2026-04-07 15:33:55 +02:00
Heikki Linnakangas
9480c585df Tidy up #ifdef USE_INJECTION_POINTS guards
Remove unnecessary #ifdef guard around the function prototypes; they
are already inside a larger #ifdef block. Move #include "subsystems.h"
inside the USE_INJECTION_POINTS guard; it's needed for
InjectionPointShmemCallbacks, which is a also inside the guard.

Reported-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/87y0iz2c1v.fsf@wibble.ilmari.org
2026-04-07 16:18:31 +03:00
Álvaro Herrera
be142fa008
Fix tests under wal_level=minimal
Buildfarm members which have specifically configured to use
wal_level=minimal fail the repack regression tests, which require
wal_level=replica.  Add a temp config file to fix that.
2026-04-07 15:14:32 +02:00
Heikki Linnakangas
257c8231bf Modernize and optimize pg_buffercache_pages()
Refactor pg_buffercache_pages() to use SFRM_Materialize mode and
construct a tuplestore directly. That's simpler and more efficient
than collecting all the data to a custom array first.

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Author: Palak Chaturvedi <chaturvedipalak1911@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAExHW5sMsaz1j+hrdhyo-DJp7JCgJx87=q2iJfOc_9mwYWyvmw@mail.gmail.com
2026-04-07 16:04:48 +03:00
Heikki Linnakangas
9f3755ea07 Optimize sorting and deduplicating trigrams
Use templated qsort() so that the comparison function can be
inlined. To speed up qunique(), use a specialized comparison function
that only checks for equality.

Author: David Geier <geidav.pg@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/2a76b5ef-4b12-4023-93a1-eed6e64968f3@gmail.com
2026-04-07 14:11:25 +03:00
Tomas Vondra
884f9b3c76 Use add_size/mul_size for index instrumentation size calculations
Use overflow-safe size arithmetic in the Index[Only]Scan and parallel
instrumentation functions, consistent with other executor nodes (Hash,
Sort, Agg, Memoize). This was an oversight in dd78e69cfc.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me
2026-04-07 12:47:28 +02:00
Tomas Vondra
9c18b47e61 Fix BitmapHeapScan non-parallel-aware EXPLAIN ANALYZE
Allocates shared bitmap table scan instrumentation for all parallel
scans. Previously, the instrumentation was only allocated for
parallel-aware scans, other bitmap heap scans in the parallel query had
no shared instrumentation and EXPLAIN didn't report exact/lossy pages.
This affected cases like scans on the outside of a parallel join or
queries run with debug_parallel_query=regress.

Fixed by allocating a separate DSM chunk for shared instrumentation and
doing so regardless of parallel-awareness. The instrumentation is
allocated in its own DSM chunk, separate from ParallelBitmapHeapState.

Report an initial patch by me. The approach with a separate DSM was
proposed and implemented by Melanie.

Not backpatched. The issue affects Postgres 18 (since 5a1e6df3b8), but
having multiple DSM chunks is possible only since dd78e69cfc. If we
decide to fix this in backbranches too, it will need to be done in a
less invasive way.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me
2026-04-07 12:47:13 +02:00
Álvaro Herrera
0d3dba38c7
Allow logical replication snapshots to be database-specific
By default, the logical decoding assumes access to shared catalogs, so
the snapshot builder needs to consider cluster-wide XIDs during startup.
That in turn means that, if any transaction is already running (and has
XID assigned), the snapshot builder needs to wait for its completion, as
it does not know if that transaction performed catalog changes earlier.

A possible problem with this concept is that if REPACK (CONCURRENTLY) is
running in some database, backends running the same command in other
databases get stuck until the first one has committed. Thus only a
single backend in the cluster can run REPACK (CONCURRENTLY) at any time.
Likewise, REPACK (CONCURRENTLY) can block walsenders starting on behalf
of subscriptions throughout the cluster.

This patch adds a new option to logical replication output plugin, to
declare that it does not use shared catalogs (i.e. catalogs that can be
changed by transactions running in other databases in the cluster). In
that case, no snapshot the backend will use during the decoding needs to
contain information about transactions running in other databases. Thus
the snapshot builder only needs to wait for completion of transactions
in the current database.

Currently we only use this option in the REPACK background worker. It
could possibly be used in the plugin for logical replication too,
however that would need thorough analysis of that plugin.

Bump WAL version number, due to a new field in xl_running_xacts.

Author: Antonin Houska <ah@cybertec.at>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/90475.1775218118@localhost
2026-04-07 12:31:18 +02:00
Álvaro Herrera
a3b069ef90
Avoid different-size pointer-to-integer cast
Buildfarm member mamba is unhappy that I wrote "(Datum) NULL" in commit
28d534e2ae:
  https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2026-04-07%2005%3A08%3A08
Use "(Datum) 0" which is what we do everywhere else.

Discussion: https://postgr.es/m/CANWCAZaOs_+WPH13ow33Q==+FwBwVZkqzm4vND=WEB4_NBmv1Q@mail.gmail.com
2026-04-07 12:28:05 +02:00
Heikki Linnakangas
6f5ad00ab7 Optimize sort and deduplication in ginExtractEntries()
Remove NULLs from the array first, and use qsort to deduplicate only
the non-NULL items. This simplifies the comparison function. Also
replace qsort_arg() with a templated version so that the comparison
function can be inlined. These changes make ginExtractEntries() a
little faster especially for simple datatypes like integers.

Author: David Geier <geidav.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/6d16b6bd-a1ff-4469-aefb-a1c8274e561a@iki.fi
2026-04-07 13:26:39 +03:00
Peter Eisentraut
b6ccd30d8f Add isolation tests for UPDATE/DELETE FOR PORTION OF
Add documentation about concurrency issues related to UPDATE/DELETE
FOR PORTION OF as well as supporting isolation tests.

Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/ec498c3d-5f2b-48ec-b989-5561c8aa2024%40illuminatedcomputing.com
2026-04-07 11:22:11 +02:00
Álvaro Herrera
5bcc3fbd19
Fix valgrind failure
Buildfarm member skink reports that the new REPACK code is trying to
write uninitialized bytes to disk, which correspond to padding space in
the SerializedSnapshotData struct.  Silence that by initializing the
memory in SerializeSnapshot() to all zeroes.

Co-authored-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/1976915.1775537087@sss.pgh.pa.us
2026-04-07 11:13:50 +02:00
John Naylor
8c3e22a8f8 Use .h for the file containing the page checksum code fragment
Commit 5e13b0f24 used a .c file for a file containing a code fragment,
to avoid adding an exception to headerscheck. That turned out to be
too clever, since it meant installation didn't happen by the usual
mechanism. Make it look like a normal header and add the requisite
exception.

Bug: #19450
Reported-by: RekGRpth <rekgrpth@gmail.com>
Discussion: https://postgr.es/m/19450-bb0612c50c6786e5@postgresql.org
2026-04-07 15:52:55 +07:00
John Naylor
30229be755 Simplify SortSupport for the macaddr data type
As of commit 6aebedc38 Datums are 64-bit values. Since MAC addresses
have only 6 bytes, the abbreviated key always contains the entire
MAC address and is thus authoritative (for practical purposes -- the
tuple sort machinery has no way of knowing that). Abbreviating this
datatype is cheap, and aborting abbreviation prevents optimizations
like radix sort, so remove cardinality estimation.

Author: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Suggested-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CAJ7c6TMk10rF_LiMz6j9rRy1rqk-5s+wBPuBefLix4cY+-4s1w@mail.gmail.com
2026-04-07 13:29:27 +07:00
Michael Paquier
49cc0d4148 Mark JumbleState as a const in the post_parse_analyze hook
This commit changes the post_parse_analyze_hook_type() hook to take a
const JumbleState, to tell external modules that they are not allowed to
touch the JumbleState that has been compiled by the core code.  This
fixes a pretty old problem with pg_stat_statements, that had always the
idea of modifying the lengths of the constants stored in the
JumbleState.  The previous state could confuse extensions that need to
look at a JumbleState depending on the loading order, if
pg_stat_statements is part of the stack loaded.

Another piece included in this commit is the move of the routine
fill_in_constant_lengths() to queryjumblefuncs.c, to give an option to
extensions to compile the lengths of the constants, if necessary.  I was
surprised by the number of external code that carries a copy of this
routine (see the thread for details).  Previously, this routine modified
JumbleState.  It now copies the set of LocationLens from JumbleState,
and fills the constant lengths for separate use.

pg_stat_statements is updated to use the new ComputeConstantLengths().
JumbleState is now marked with a const in the module, where relevant.

Author: Sami Imseih <samimseih@gmail.com>
Co-authored-by: Lukas Fittl <lukas@fittl.com>
Discussion: https://postgr.es/m/CAA5RZ0tZp5qU0ikZEEqJnxvdSNGh1DWv80sb-k4QAUmiMoOp_Q@mail.gmail.com
2026-04-07 15:22:49 +09:00
John Naylor
51098839cf Split CREATE STATISTICS error reasons out into errdetails
Some errmsgs in statscmds.c were phrased as "...cannot be used
because...". Put the reasons into errdetails. While at it, switch
from passive voice to "cannot create..." for the errmsg.

Author: Yugo Nagata <nagata@sraoss.co.jp>
Suggested-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CANWCAZaZeX0omWNh_ZbD_JVujzYQdRUW8UZOQ4dWh9Sg7OcAow@mail.gmail.com
2026-04-07 11:37:48 +07:00
Michael Paquier
3284e3f63c Fix injection point detach timing problem in TAP test for lock stats
injection_points_detach() could fail because of a concurrent cleanup
triggered by injection_points_set_local() when a session finishes.
This problem could be reproduced by adding a hardcoded sleep in
InjectionPointDetach(), and has been detected by the CI.

As the test is designed so as the injection point is detached before
being awaken, there is no need for it to be local, similarly to test
010_index_concurrently_upsert.  This commit removes
injection_points_set_local(), replacing it with a confirmation that the
point has been attached in the session expected to block on a lock.
With this removal, the detach cannot happen concurrently anymore, only
before when the point is woken up.

Issue introduced by 557a9f1e3e, where the test has been added.

Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/rp6wz4lnz5qn4zlh7uxtavzfrmqvycy2g42z4zasfss2gxi54f@zzcsjdvdflwp
2026-04-07 13:17:13 +09:00
Michael Paquier
17132f55c5 Fix shmem allocation of fixed-sized custom stats kind
StatsShmemSize(), that computes the shmem size needed for pgstats,
includes the amount of shared memory wanted by all the custom stats
kinds registered.  However, the shared memory allocation was done by
ShmemAlloc() in StatsShmemInit(), meaning that the space reserved was
not used, wasting some memory.

These extra allocations would show up under "<anonymous>" in
pg_shmem_allocations, as the allocations done by ShmemAlloc() are not
tracked by ShmemIndexEnt.

Issue introduced by 7949d95945.

Author: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/04b04387-92f5-476c-90b0-4064e71c5f37@iki.fi
Backpatch-through: 18
2026-04-07 11:59:49 +09:00
Amit Langote
5c54c3ed1b Fix deferred FK check batching introduced by commit b7b27eb41a
That commit introduced AfterTriggerIsActive() to detect whether
we are inside the after-trigger firing machinery, so that RI trigger
functions can take the batched fast path.  It was implemented using
query_depth >= 0, which correctly identified immediate trigger firing
but missed the deferred case where query_depth is -1 at COMMIT via
AfterTriggerFireDeferred().  This caused deferred FK checks to fall
back to the per-row fast path instead of the batched path.

The correct check is whether we are inside an after-trigger firing
loop specifically.  Introduce afterTriggerFiringDepth, a counter
incremented around the trigger-firing loops in AfterTriggerEndQuery,
AfterTriggerFireDeferred, and AfterTriggerSetState, and decremented
after FireAfterTriggerBatchCallbacks() returns.  AfterTriggerIsActive()
now returns afterTriggerFiringDepth > 0.

Reported-by: Chao Li <li.evan.chao@gmail.com>
Author: Chao Li <li.evan.chao@gmail.com>
Co-authored-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/C2133B47-79CD-40FF-B088-02D20D654806@gmail.com
2026-04-07 10:45:59 +09:00
Michael Paquier
9897957805 Fix shared memory size of template code for custom fixed-sized pgstats
On HEAD, the template code for custom fixed-sized pgstats is in the test
module test_custom_stats.  On REL_18_STABLE, this code lives in the test
module injection_points.

Both cases were underestimating the size of the shared memory area
required for the storage of the stats data, using a single entry rather
than the whole area.  This underestimation meant that there was no
memory allocated for the LWLock required for the stats, and even more.
This problem would be also misleading for extension developers looking
at this code.

This issue has been noticed while digging into a different bug reported
by Heikki Linnakangas, showing that the underestimation was causing
failures in the TAP tests of the test modules for 32-bit builds.  The
other issue reported, related to the memory allocation of custom
fixed-sized pgstats, will be fixed in a follow-up commit.

Discussion: https://postgr.es/m/adMk_lWbnz3HDOA8@paquier.xyz
Backpatch-through: 18
2026-04-07 08:24:32 +09:00
Melanie Plageman
dd78e69cfc Allocate separate DSM chunk for parallel Index[Only]Scan instrumentation
Previously, parallel index and index-only scans packed the parallel scan
descriptor and shared instrumentation (for EXPLAIN ANALYZE) into a
single DSM allocation. Since scans may be instrumented without being
parallel-aware, and vice versa, using separate DSM chunks -- each with
its own TOC key -- is cleaner. A future commit will extend this pattern
to other scan node types.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me
2026-04-06 19:10:19 -04:00
Melanie Plageman
43222b8e53 Assert no duplicate keys in shm_toc_insert()
shm_toc_insert() silently accepts duplicate keys. Since shm_toc_lookup()
returns the first matching entry, any later entry with the same key
would be unreachable. Add an assertion to catch this.

Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me
2026-04-06 18:41:47 -04:00
Nathan Bossart
87f61f0c82 Add pg_stat_autovacuum_scores system view.
This view contains one row for each table in the current database,
showing the current autovacuum scores for that specific table.  It
also shows whether autovacuum would vacuum or analyze the table.

Bumps catversion.

Author: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Discussion: https://postgr.es/m/CAA5RZ0s4xjMrB-VAnLccC7kY8d0-4806-Lsac-czJsdA1LXtAw%40mail.gmail.com
2026-04-06 16:56:33 -05:00
Daniel Gustafsson
b3a37ffbc5 Use PG_DATA_CHECKSUM_OFF instead of hardcoded value
For a long time, the online checksums patchset kept the "off" state as
literal zero without a label to be consistent with the previous coding
which only had a label for the "on" state.  Later, when an "off" label
was made not all uses in the code got the memo.  Fix by setting these
to PG_DATA_CHECKSUM_OFF.

While there, fix a duplicate word in a comment introduced by the same
commit.

Author: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAJ7c6TPRTnQFXXX1CRcYoTLXw2swtDH==uSz1MYoMKdLrKZHjA@mail.gmail.com
2026-04-06 22:11:53 +02:00
Álvaro Herrera
28d534e2ae
Add CONCURRENTLY option to REPACK
When this flag is specified, REPACK no longer acquires access-exclusive
lock while the new copy of the table is being created; instead, it
creates the initial copy under share-update-exclusive lock only (same as
vacuum, etc), and it follows an MVCC snapshot; it sets up a replication
slot starting at that snapshot, and uses a concurrent background worker
to do logical decoding starting at the snapshot to populate a stash of
concurrent data changes.  Those changes can then be re-applied to the
new copy of the table just before swapping the relfilenodes.
Applications can continue to access the original copy of the table
normally until just before the swap, which is the only point at which
the access-exclusive lock is needed.

There are some loose ends in this commit:
1. concurrent repack needs its own replication slot in order to apply
   logical decoding, which are a scarce resource and easy to run out of.
2. due to the way the historic snapshot is initially set up, only one
   REPACK process can be running at any one time on the whole system.
3. there's a danger of deadlocking (and thus abort) due to the lock
   upgrade required at the final phase.

These issues will be addressed in upcoming commits.

The design and most of the code are by Antonin Houska, heavily based on
his own pg_squeeze third-party implementation.

Author: Antonin Houska <ah@cybertec.at>
Co-authored-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Noriyoshi Shinoda <noriyoshi.shinoda@hpe.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/5186.1706694913@antos
Discussion: https://postgr.es/m/202507262156.sb455angijk6@alvherre.pgsql
2026-04-06 21:55:08 +02:00
Alexander Korotkov
10484c2cc7 Document that WAIT FOR may be interrupted by recovery conflicts
Add a note to the WAIT FOR documentation explaining that sessions
using this command on a standby server may be interrupted by recovery
conflicts.  Some conflicts are unavoidable - for example, replaying
a tablespace drop terminates all backends unconditionally.

Discussion: https://postgr.es/m/CAPpHfds7oSCbZqob7ytT_Lso8fv-NW8LnedUTE4Krde%2B3rkJeA%40mail.gmail.com
Author: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
2026-04-06 22:47:26 +03:00
Alexander Korotkov
7e8aeb9e48 Use WAIT FOR LSN in PostgreSQL::Test::Cluster::wait_for_catchup()
When the standby is passed as a PostgreSQL::Test::Cluster instance,
use the WAIT FOR LSN command on the standby server to implement
wait_for_catchup() for replay, write, and flush modes.  This is more
efficient than polling pg_stat_replication on the upstream, as the
WAIT FOR LSN command uses a latch-based wakeup mechanism.

The optimization applies when:
- The standby is passed as a Cluster object (not just a name string)
- The mode is 'replay', 'write', or 'flush' (not 'sent')

Rather than pre-checking pg_is_in_recovery() on the standby (which
would add an extra round-trip on every call), we issue WAIT FOR LSN
directly and handle the 'not in recovery' result as a signal to fall
back to polling.

For 'sent' mode, when the standby is passed as a string (e.g., a
subscription name for logical replication), when the standby has been
promoted, or when WAIT FOR LSN is interrupted by a recovery conflict,
the function falls back to the original polling-based approach using
pg_stat_replication on the upstream.  The recovery conflict fallback
is necessary because some conflicts are unavoidable - for example,
ResolveRecoveryConflictWithTablespace() kills all backends
unconditionally, regardless of what they are doing.

The recovery conflict detection matches the English error message
"conflict with recovery", which is reliable because the test suite
runs with LC_MESSAGES=C.

Discussion: https://postgr.es/m/CABPTF7UiArgW-sXj9CNwRzUhYOQrevLzkYcgBydmX5oDes1sjg%40mail.gmail.com
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@kurilemu.de>
2026-04-06 22:47:26 +03:00
Alexander Korotkov
834038c1f8 Avoid syscache lookup while building a WAIT FOR tuple descriptor
Use TupleDescInitBuiltinEntry instead of TupleDescInitEntry when building
the result tuple descriptor for the WAIT FOR command. This avoids a syscache
access that could re-establish a catalog snapshot after we've explicitly
released all snapshots before the wait.

Discussion: https://postgr.es/m/CABPTF7U%2BSUnJX_woQYGe%3D%3DR9Oz%2B-V6X0VO2stBLPGfJmH_LEhw%40mail.gmail.com
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2026-04-06 22:47:26 +03:00
Nathan Bossart
775fe51daa Remove recheck_relation_needs_vacanalyze().
This function is a thin wrapper around relation_needs_vacanalyze()
that handles fetching and freeing the pgstat entry for the table.
Since all callers of relation_needs_vacanalyze() do that anyway, we
can teach that function to fetch/free the pgstat entry and use it
instead.

Suggested-by: Álvaro Herrera <alvherre@kurilemu.de>
Author: Sami Imseih <samimseih@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0s4xjMrB-VAnLccC7kY8d0-4806-Lsac-czJsdA1LXtAw%40mail.gmail.com
2026-04-06 14:30:52 -05:00
Robert Haas
e972dff6c3 auto_explain: Add new GUC, auto_explain.log_extension_options.
The associated value should look like something that could be
part of an EXPLAIN options list, but restricted to EXPLAIN options
added by extensions.

For example, if pg_overexplain is loaded, you could set
auto_explain.log_extension_options = 'DEBUG, RANGE_TABLE'.
You can also specify arguments to these options in the same manner
as normal e.g. 'DEBUG 1, RANGE_TABLE false'.

Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Discussion: http://postgr.es/m/CA+Tgmob-0W8306mvrJX5Urtqt1AAasu8pi4yLrZ1XfwZU-Uj1w@mail.gmail.com
2026-04-06 15:19:42 -04:00
Tom Lane
d516974840 Support more object types within CREATE SCHEMA.
Having rejected the principle that we should know how to re-order
the sub-commands of CREATE SCHEMA, there is not really anything
except a little coding to stop us from supporting more object types.
This patch adds support for creating functions (including procedures
and aggregates), operators, types (including domains), collations,
and text search objects.

SQL:2021 specifies that we should allow functions, procedures,
types, domains, and collations, so this moves us a great deal
closer to full SQL compatibility of CREATE SCHEMA.  What remains
missing from their list are casts, transforms, roles, and some
object types we don't support yet (e.g. CREATE CHARACTER SET).
Supporting casts or transforms would be problematic because
they don't have names at all, let alone schema-qualified names,
so it'd be quite a stretch to say that they belong to a schema.
Roles likewise are not schema-qualified, plus they are global
to a cluster, making it even less reasonable to consider them
as belonging to a schema.  So I don't see us trying to complete
the list.

User-defined aggregates and operators are outside the spec's ken,
as are text search objects, so adding them does not do anything for
spec compatibility.  But they go along with these other object types,
plus it takes no additional code to support them since they are
represented as DefineStmts like some variants of CREATE TYPE.
It would indeed take some effort to reject them.

Author: Kirill Reshke <reshkekirill@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CALdSSPh4jUSDsWu3K58hjO60wnTRR0DuO4CKRcwa8EVuOSfXxg@mail.gmail.com
2026-04-06 15:16:25 -04:00
Tom Lane
404db8f9ed Execute foreign key constraints in CREATE SCHEMA at the end.
The previous patch simplified CREATE SCHEMA's behavior to "execute all
subcommands in the order they are written".  However, that's a bit too
simple, as the spec clearly requires forward references in foreign key
constraint clauses to work, see feature F311-01.  (Most other SQL
implementations seem to read more into the spec than that, but it's
not clear that there's justification for more in the text, and this is
the only case that doesn't introduce unresolvable issues.)  We never
implemented that before, but let's do so now.

To fix it, transform FOREIGN KEY clauses into ALTER TABLE ... ADD
FOREIGN KEY commands and append them to the end of the CREATE SCHEMA's
subcommand list.  This works because the foreign key constraints are
independent and don't affect any other DDL that might be in CREATE
SCHEMA.  For simplicity, we do this for all FOREIGN KEY clauses even
if they would have worked where they were.

Author: Jian He <jian.universality@gmail.com>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1075425.1732993688@sss.pgh.pa.us
2026-04-06 15:16:25 -04:00
Tom Lane
a9c350d9ee Don't try to re-order the subcommands of CREATE SCHEMA.
transformCreateSchemaStmtElements has always believed that it is
supposed to re-order the subcommands of CREATE SCHEMA into a safe
execution order.  However, it is nowhere near being capable of doing
that correctly.  Nor is there reason to think that it ever will be,
or that that is a well-defined requirement.  (The SQL standard does
say that it should be possible to do foreign-key forward references
within CREATE SCHEMA, but it's not clear that the text requires
anything more than that.)  Moreover, the problem will get worse as
we add more subcommand types.  Let's just drop the whole idea and
execute the commands in the order given, which seems like a much
less astonishment-prone definition anyway.  The foreign-key issue
will be handled in a follow-up patch.

This will result in a release-note-worthy incompatibility,
which is that forward references like
	CREATE SCHEMA myschema
	    CREATE VIEW myview AS SELECT * FROM mytable
	    CREATE TABLE mytable (...);
used to work and no longer will.  Considering how many closely
related variants never worked, this isn't much of a loss.

Along the way, pass down a ParseState so that we can provide an
error cursor for "wrong schema name" and related errors, and fix
transformCreateSchemaStmtElements so that it doesn't scribble
on the parsetree passed to it.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/1075425.1732993688@sss.pgh.pa.us
2026-04-06 15:16:25 -04:00
Masahiko Sawada
1ff3180ca0 Allow autovacuum to use parallel vacuum workers.
Previously, autovacuum always disabled parallel vacuum regardless of
the table's index count or configuration. This commit enables
autovacuum workers to use parallel index vacuuming and index cleanup,
using the same parallel vacuum infrastructure as manual VACUUM.

Two new configuration options control the feature. The GUC
autovacuum_max_parallel_workers sets the maximum number of parallel
workers a single autovacuum worker may launch; it defaults to 0,
preserving existing behavior unless explicitly enabled. The per-table
storage parameter autovacuum_parallel_workers provides per-table
limits. A value of 0 disables parallel vacuum for the table, a
positive value caps the worker count (still bounded by the GUC), and
-1 (the default) defers to the GUC.

To handle cases where autovacuum workers receive a SIGHUP and update
their cost-based vacuum delay parameters mid-operation, a new
propagation mechanism is added to vacuumparallel.c. The leader stores
its effective cost parameters in a DSM segment. Parallel vacuum
workers poll for changes in vacuum_delay_point(); if an update is
detected, they apply the new values locally via VacuumUpdateCosts().

A new test module, src/test/modules/test_autovacuum, is added to
verify that parallel autovacuum workers are correctly launched and
that cost-parameter updates are propagated as expected.

The patch was originally proposed by Maxim Orlov, but the
implementation has undergone significant architectural changes
since then during the review process.

Author: Daniil Davydov <3danissimo@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: zengman <zengman@halodbtech.com>
Discussion: https://postgr.es/m/CACG=ezZOrNsuLoETLD1gAswZMuH2nGGq7Ogcc0QOE5hhWaw=cw@mail.gmail.com
2026-04-06 11:48:29 -07:00
Álvaro Herrera
c0b53ec063
Rename cluster.c to repack.c (and corresponding .h)
CLUSTER is no longer the favored way to invoke this functionality, and
the code is about to shift its focus to the REPACK more ambitiously.
Rename the file to avoid leaving an unnecessary historical artifact
around.

Author: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/202603271635.owyhm7btgoic@alvherre.pgsql
2026-04-06 20:11:01 +02:00
Tom Lane
21c69dc73f Disallow system columns in COPY FROM WHERE conditions.
These columns haven't been computed yet when the filtering happens
(since we've not written the candidate tuple into the table); so
any check on them is wrong or useless.  Worse, since aa606b931 such a
reference results in an access off the end of a TupleDesc, potentially
causing a phony "generated columns are not supported in COPY FROM
WHERE conditions" error; and since c98ad086a it throws an Assert
instead.

Actually we could allow tableoid, which has been set to the OID of the
table named as the COPY target.  However, plausible uses for tests of
tableoid would involve a partitioned target table, and the user would
wish it to read as the OID of the destination partition.  There has
been some discussion of changing things to make it work like that,
but pending that happening we should just disallow tableoid along
with other system columns.

It seems best though to install this prohibition only in HEAD.
In the back branches we'll just guard the unsafe TupleDesc access,
and people will keep getting whatever semantics they got before.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/6f435023-8ab6-47c2-ba07-035d0c4212f9@gmail.com
2026-04-06 14:05:01 -04:00
Tom Lane
f7da81f68b Add missing .gitignore files.
contrib/pg_stash_advice and src/test/modules/test_shmem
missed these, leading to complaints from git after an
in-tree check-world run.

Use our standard boilerplate list of ignorable subdirectories,
although the two modules presently create different subsets
of that.
2026-04-06 13:25:29 -04:00
Tom Lane
6582010c80 Fix null-bitmap combining in array_agg_array_combine().
This code missed the need to update the combined state's
nullbitmap if state1 already had a bitmap but state2 didn't.
We need to extend the existing bitmap with 1's but didn't.
This could result in wrong output from a parallelized
array_agg(anyarray) calculation, if the input has a mix of
null and non-null elements.  The errors depended on timing
of the parallel workers, and therefore would vary from one
run to another.

Also install guards against integer overflow when calculating
the combined object's sizes, and make some trivial cosmetic
improvements.

Author: Dmytro Astapov <dastapov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAFQUnFj2pQ1HbGp69+w2fKqARSfGhAi9UOb+JjyExp7kx3gsqA@mail.gmail.com
Backpatch-through: 16
2026-04-06 13:14:53 -04:00
Robert Haas
0442f1c9ef Add a guc_check_handler to the EXPLAIN extension mechanism.
It would be useful to be able to tell auto_explain to set a custom
EXPLAIN option, but it would be bad if it tried to do so and the
option name or value wasn't valid, because then every query would fail
with a complaint about the EXPLAIN option. So add a guc_check_handler
that auto_explain will be able to use to only try to set option
name/value/type combinations that have been determined to be legal,
and to emit useful messages about ones that aren't.

Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Discussion: http://postgr.es/m/CA+Tgmob-0W8306mvrJX5Urtqt1AAasu8pi4yLrZ1XfwZU-Uj1w@mail.gmail.com
2026-04-06 12:31:47 -04:00
Nathan Bossart
e3481edfd1 Remove autoanalyze corner case.
The restructuring in commit 53b8ca6881 revealed an interesting
corner case: if a table needs vacuuming for wraparound prevention
and autovacuum is disabled for it, we might still choose to analyze
it.  Research seems to indicate this was an accidental addition by
commit 48188e1621, and further discussion indicates there is
consensus that it is unnecessary and can be removed.

Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Shinya Kato <shinya11.kato@gmail.com>
Discussion: https://postgr.es/m/adB9nSsm_S0D9708%40nathan
2026-04-06 11:28:46 -05:00
Robert Haas
e0e819cc08 Expose helper functions scan_quoted_identifier and scan_identifier.
Previously, this logic was embedded within SplitIdentifierString,
SplitDirectoriesString, and SplitGUCList. Factoring it out saves
a bit of duplicated code, and also makes it available to extensions
that might want to do similar things without necessarily wanting to
do exactly the same thing.

Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Discussion: http://postgr.es/m/CA+Tgmob-0W8306mvrJX5Urtqt1AAasu8pi4yLrZ1XfwZU-Uj1w@mail.gmail.com
2026-04-06 11:13:25 -04:00
Fujii Masao
ca2b5443e2 Add TAP tests for log_lock_waits
This commit updates 011_lock_stats.pl to verify log_lock_waits behavior.

The tests check that messages are emitted both when a wait occurs and
when the lock is acquired, and that the "still waiting for" message is logged
exactly once per wait, even if the backend wakes up during the wait.

The latter covers the behavior introduced by commit fd6ecbfa75.

Author: Hüseyin Demir <huseyin.d3r@gmail.com>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAB5wL7YB1my9W5k5i=SY+=sTjeozyJ0YkvGXrVfeDNzuRkoTPg@mail.gmail.com
2026-04-06 23:49:40 +09:00
Fujii Masao
93dc1ace20 Release postmaster working memory context in slotsync worker
Child processes do not need the postmaster's working memory context and
normally release it at the start of their main entry point. However,
the slotsync worker forgot to do so.

This commit makes the slotsync worker release the postmaster's working
memory context at startup, preventing unintended use.

Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tiancheng Ge <getiancheng_2012@163.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwHO05JaUpgKF8FBDmPdBUJsK22axRRcgmAUc2Jyi8OK8g@mail.gmail.com
2026-04-06 23:04:18 +09:00