Commit graph

47775 commits

Author SHA1 Message Date
David Rowley
a63bbc811d Use stack-allocated StringInfoDatas, where possible
6d0eba662 already did most of the changes, but some new ones snuck in
just prior to that commit, so these got missed.

Having these short-lived StringInfoDatas on the stack rather than having
them get palloc'd by makeStringInfo() is simply for performance as it
saves doing a 2nd palloc.

Since this code is new to v19, it makes sense to improve it now rather
than wait until we branch as having v19 and v20 differ here just makes it
harder to backpatch fixes in this area.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/adt4wpj4FZwR+S7I@ip-10-97-1-34.eu-west-3.compute.internal
2026-04-13 10:43:19 +12:00
Michael Paquier
80156cee06 Honor passed-in database OIDs in pgstat_database.c
Three routines in pgstat_database.c incorrectly ignore the database OID
provided by their caller, using MyDatabaseId instead:
- pgstat_report_connect()
- pgstat_report_disconnect()
- pgstat_reset_database_timestamp()

The first two functions, for connection and disconnection, each have a
single caller that already passes MyDatabaseId.  This was harmless,
still incorrect.

The timestamp reset function also has a single caller, but in this case
the issue has a real impact: it fails to reset the timestamp for the
shared-database entry (datid=0) when operating on shared objects.  This
situation can occur, for example, when resetting counters for shared
relations via pg_stat_reset_single_table_counters().

There is currently one test in the tree that checks the reset of a
shared relation, for pg_shdescription, we rely on it to check what is
stored in pg_stat_database.  As stats_reset may be NULL, two resets are
done to provide a baseline for comparison.

Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Dapeng Wang <wangdp20191008@gmail.com>
Discussion: https://postgr.es/m/ABBD5026-506F-4006-A569-28F72C188693@gmail.com
Backpatch-through: 15
2026-04-11 17:02:52 +09:00
Richard Guo
77d0e82e58 Fix estimate_array_length error with set-operation array coercions
When a nested set operation's output type doesn't match the parent's
expected type, recurse_set_operations builds a projection target list
using generate_setop_tlist with varno 0.  If the required type
coercion involves an ArrayCoerceExpr, estimate_array_length could be
called on such a Var, and would pass it to examine_variable, which
errors in find_base_rel because varno 0 has no valid relation entry.

Fix by skipping the statistics lookup for Vars with varno 0.

Bug introduced by commit 9391f7152.  Back-patch to v17, where
estimate_array_length was taught to use statistics.

Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Author: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/adjW8rfPDkplC7lF@pryzbyj2023
Backpatch-through: 17
2026-04-11 16:38:47 +09:00
Thomas Munro
b2a17ba7a5 read_stream: Remove obsolete comment.
This comment was describing the v17 implementation (or io_method=sync).

Backpatch-through: 18
2026-04-11 11:25:25 +12:00
Masahiko Sawada
c22d115f1d Fix unstable log verification in test_autovacuum.
The test in test_autovacuum was unstable because it called
log_contains() immediately after verifying autovacuum_count in
pg_stat_user_tables. This created a race condition where the
statistics could be updated before the autovacuum logs were fully
flushed to disk.

This commit replaces log_contains() with wait_for_log() to ensure the
test waits for the parallel vacuum messages to appear. Additionally,
remove the checks of the autovacuum count. Verifying the log messages
is sufficient to confirm parallel autovacuum behavior, as logging is
only enabled for the specific table under test.

Per report from buildfarm member flaviventris.

Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/525d0f48-93f7-493f-a988-f39b460a79bc@gmail.com
2026-04-10 16:01:42 -07:00
Fujii Masao
de74d1e9a5 Adjust log level of logical decoding messages by context
Commit 21b018e7ea lowered some logical decoding messages from LOG to DEBUG1.
However, per discussion on pgsql-hackers, messages from background activity
(e.g., walsender or slotsync worker) should remain at LOG, as they are less
frequent and more likely to indicate issues that DBAs should notice.

For foreground SQL functions (e.g., pg_logical_slot_peek_binary_changes()),
keep these messages at DEBUG1 to avoid excessive log noise. They can still be
enabled by lowering client_min_messages or log_min_messages for the session.

This commit updates logical decoding to log these messages at LOG for
background activity and at DEBUG1 for foreground execution.

Suggested-by: Robert Haas <robertmhaas@gmail.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CA+TgmoYsu2+YAo9eLGkDp5VP-pfQ-jOoX382vS4THKHeRTNgew@mail.gmail.com
2026-04-10 22:59:34 +09:00
Andrew Dunstan
eec8e234bd Revert "Add built-in fuzzing harnesses for security testing."
This reverts commit 4a18907b41.

inadvertenly pushed, mea culpa
2026-04-10 09:53:58 -04:00
Andrew Dunstan
3f8913f683 Use size_t instead of Size in pg_waldump
In commit b15c151398 I missed the memo about not using Size in new
code.

Per complaint from Thomas Munro

Discussion: https://postgr.es/m/CA+hUKGJkeTVuq5u5WKJm6xkwmW577UuQ7fA=PyBCSR3h9g2GtQ@mail.gmail.com
2026-04-10 09:29:00 -04:00
Andrew Dunstan
4a18907b41 Add built-in fuzzing harnesses for security testing.
Add 12 libFuzzer-compatible fuzzing harnesses behind a new -Dfuzzing=true
meson option.  Each harness implements LLVMFuzzerTestOneInput() and can
also be built in standalone mode (reading from files) when no fuzzer
engine is detected.

Frontend targets (no backend dependencies):
  fuzz_json            - non-incremental JSON parser (pg_parse_json)
  fuzz_json_incremental - incremental/chunked JSON parser
  fuzz_conninfo        - libpq connection string parser (PQconninfoParse)
  fuzz_pglz            - PGLZ decompressor (pglz_decompress)
  fuzz_unescapebytea   - libpq bytea unescape (PQunescapeBytea)
  fuzz_b64decode       - base64 decoder (pg_b64_decode)
  fuzz_saslprep        - SASLprep normalization (pg_saslprep)
  fuzz_parsepgarray    - array literal parser (parsePGArray)
  fuzz_pgbench_expr    - pgbench expression parser (via Bison/Flex)

Backend targets (link against postgres_lib):
  fuzz_rawparser       - SQL raw parser (raw_parser)
  fuzz_regex           - regex engine (pg_regcomp/pg_regexec)
  fuzz_typeinput       - type input functions (numeric/date/timestamp/interval)
2026-04-10 07:13:08 -04:00
Andrew Dunstan
2b5ba2a0a1 Fix heap-buffer-overflow in pglz_decompress() on corrupt input.
When decoding a match tag, pglz_decompress() reads 2 bytes (or 3
for extended-length matches) from the source buffer before checking
whether enough data remains.  The existing bounds check (sp > srcend)
occurs after the reads, so truncated compressed data that ends
mid-tag causes a read past the allocated buffer.

Fix by validating that sufficient source bytes are available before
reading each part of the match tag.  The post-read sp > srcend
check is no longer needed and is removed.

Found by fuzz testing with libFuzzer and AddressSanitizer.
2026-04-10 07:13:08 -04:00
Andrew Dunstan
2478bd5db0 Fix incremental JSON parser numeric token reassembly across chunks.
When the incremental JSON parser splits a numeric token across chunk
boundaries, it accumulates continuation characters into the partial
token buffer.  The accumulator's switch statement unconditionally
accepted '+', '-', '.', 'e', and 'E' as valid numeric continuations
regardless of position, which violated JSON number grammar
(-? int [frac] [exp]).  For example, input "4-" fed in single-byte
chunks would accumulate the '-' into the numeric token, producing an
invalid token that later triggered an assertion failure during
re-lexing.

Fix by tracking parser state (seen_dot, seen_exp, prev character)
across the existing partial token and incoming bytes, so that each
character class is accepted only in its grammatically valid position.
2026-04-10 07:13:08 -04:00
Amit Langote
009ea1b08d Add test case for same-type reordered FK columns
The test added in 980c1a85d8 covered reordered FK columns with
different types, which triggered an "operator not a member of opfamily"
error in the fast-path prior to that commit.  Add a test for the
same-type case, which is also fixed by that commit but where the wrong
scan key ordering instead produced a spurious FK violation without any
internal error.

Reported-by: Fredrik Widlert <fredrik.widlert@digpro.se>
Discussion: https://postgr.es/m/CADfhSr8hYc-4Cz7vfXH_oV-Jq81pyK9W4phLrOGspovsg2W7Kw@mail.gmail.com
2026-04-10 17:44:06 +09:00
Amit Langote
d6e96bacd3 Move afterTriggerFiringDepth into AfterTriggersData
The static variable afterTriggerFiringDepth introduced by commit
5c54c3ed1b is logically part of the after-trigger state and in
hindsight should have been a field in AfterTriggersData alongside
query_depth and the other per-transaction after-trigger state.
Move it there as firing_depth.  Also update its comment to
accurately reflect its sole remaining purpose: signaling to
AfterTriggerIsActive() that after-trigger firing is active.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFt4NGTNk7BinOsHHM48E9zGAa852vCfGoSe1bbL=JNFQ@mail.gmail.com
2026-04-10 16:17:58 +09:00
Richard Guo
f6936bf9da Fix var_is_nonnullable() to account for varreturningtype
var_is_nonnullable() failed to consider varreturningtype, which meant
it could incorrectly claim a Var is non-nullable based on a column's
NOT NULL constraint even when the Var refers to a non-existent row.
Specifically, OLD.col is NULL for INSERT (no old row exists) and
NEW.col is NULL for DELETE (no new row exists), regardless of any NOT
NULL constraint on the column.

This caused the planner's constant folding in eval_const_expressions
to incorrectly simplify IS NULL / IS NOT NULL tests on such Vars.  For
example, "old.a IS NULL" in an INSERT's RETURNING clause would be
folded to false when column "a" has a NOT NULL constraint, even though
the correct result is true.

Fix by returning false from var_is_nonnullable() when varreturningtype
is not VAR_RETURNING_DEFAULT, since such Vars can be NULL regardless
of table constraints.

Author: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDfaAipL6YzOq2H=gAhKBbcUTYmfbAv+W1zueOfRKH43FQ@mail.gmail.com
2026-04-10 15:51:00 +09:00
Amit Langote
155c03ee9d Assert index_attnos[0] == 1 in ri_FastPathFlushArray()
ri_FastPathFlushArray() handles single-column FKs only, so
index_attnos[0] is always 1.  Add an Assert to make this invariant
explicit, as a followup to 980c1a85d8.

Suggested-by: Junwang Zhao <zhjwpku@gmail.com> (offlist)
Discussion: https://www.postgresql.org/message-id/CADfhSr-pCkbDxmiOVYSAGE5QGjsQ48KKH_W424SPk%2BpwzKZFaQ%40mail.gmail.com
2026-04-10 15:24:38 +09:00
Amit Langote
980c1a85d8 Fix FK fast-path scan key ordering for mismatched column order
The fast-path foreign key check introduced in 2da86c1ef9 assumed that
constraint key positions directly correspond to index column positions.
This is not always true as a FK constraint can reference PK columns in a
different order than they appear in the PK's unique index.

For example, if the PK is (a, b, c) and the FK references them as
(a, c, b), the constraint stores keys in the FK-specified order, but
the index has columns in PK order. The buggy code used the constraint
key index to access rd_opfamily[i], which retrieved the wrong operator
family when columns were reordered, causing "operator X is not a member
of opfamily Y" errors.

After fixing the opfamily lookup, a second issue started to happen:
btree index scans require scan keys to be ordered by attribute number.
The code was placing scan keys at array position i with attribute number
idx_attno, producing out-of-order keys when columns were swapped. This
caused "btree index keys must be ordered by attribute" errors.

The fix adds an index_attnos array to FastPathMeta that maps each
constraint key position to its corresponding index column position.
In ri_populate_fastpath_metadata(), we search indkey to find the actual
index column for each pk_attnums[i] and use that position for the
opfamily lookup. In build_index_scankeys(), we place each scan key at
the array position corresponding to its index column
(skeys[idx_attno-1]) rather than at the constraint key position,
ensuring scan keys are properly ordered by attribute number as btree
requires.

Reported-by: Fredrik Widlert <fredrik.widlert@digpro.se>
Author: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://www.postgresql.org/message-id/CADfhSr-pCkbDxmiOVYSAGE5QGjsQ48KKH_W424SPk%2BpwzKZFaQ%40mail.gmail.com
2026-04-10 13:33:55 +09:00
Amit Langote
03029409b4 Fix typo left by 34a3078629
Reported-by: jie wang <jugierwang@gmail.com>
Reported-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAJnZyeDyaS=X-eYN=9rDYqK=6ma1gMLa0qDgfNbZKK0e0+q99Q@mail.gmail.com
2026-04-10 13:32:38 +09:00
Amit Langote
34a3078629 Fix RI fast-path crash under nested C-level SPI
When a C-language function uses SPI_connect/SPI_execute/SPI_finish to
INSERT into a table with FK constraints, the FK AFTER triggers fire
and schedule ri_FastPathEndBatch via
RegisterAfterTriggerBatchCallback(), opening PK relations under
CurrentResourceOwner at the time of the SPI call.  The query_depth > 0
guard in FireAfterTriggerBatchCallbacks suppresses the callback at
that nesting level, deferring teardown to the outer query's
AfterTriggerEndQuery. By then the resource owner active during the SPI
call may have been released, decrementing the cached relations'
refcounts to zero. ri_FastPathTeardown, running under the outer
query's resource owner, then crashes in assert builds when it attempts
to close relations whose refcounts are already zero:

  TRAP: failed Assert("rel->rd_refcnt > 0")

Fix by storing batch callbacks at the level where they should fire:
in AfterTriggersQueryData.batch_callbacks for immediate constraints
(fired by AfterTriggerEndQuery) and in AfterTriggersData.batch_callbacks
for deferred constraints (fired by AfterTriggerFireDeferred and
AfterTriggerSetState).  RegisterAfterTriggerBatchCallback() routes the
callback to the current query-level list when query_depth >= 0, and to
the top-level list otherwise.  FireAfterTriggerBatchCallbacks() takes a
list parameter and simply iterates and invokes it; memory cleanup is
handled by the caller.  This replaces the query_depth > 0 guard with
list-level scoping.  Note that deferred constraints are unaffected by
this bug: their callbacks fire at commit via AfterTriggerFireDeferred,
under the outer transaction's resource owner, which remains valid
throughout.

Also add firing_batch_callbacks to AfterTriggersData to enforce that
callbacks do not register new callbacks during
FireAfterTriggerBatchCallbacks(), which would be unsafe as it could
modify the list being iterated.  An Assert in
RegisterAfterTriggerBatchCallback() enforces this discipline for
future callers.  The flag is reset at transaction and subtransaction
boundaries to handle cases where an error thrown by a callback is
caught and the subtransaction is rolled back.

While at it, ensure callbacks are properly accounted for at all
transaction boundaries, as cleanup of b7b27eb41a: discard any
remaining top-level callbacks on both commit and abort in
AfterTriggerEndXact(), and clean up query-level callbacks in
AfterTriggerFreeQuery().

Note that ri_PerformCheck() calls SPI with fire_triggers=false, which
skips AfterTriggerBeginQuery/EndQuery for that SPI command.  Any
triggers queued during that SPI command are not fired immediately but
deferred to the outer query level.  Since the fast-path check for
those triggers runs under the outer query's resource owner rather than
a nested SPI resource owner, and ri_PerformCheck() does not create
a dedicated child resource owner, the bug described above does not
apply.

Reported-by: Evan Montgomery-Recht <montge@mianetworks.net>
Reported-by: Sandro Santilli <strk@kbt.io>
Analyzed-by: Evan Montgomery-Recht <montge@mianetworks.net>
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAEg7pwcKf01FmDqFAf-Hzu_pYnMYScY_Otid-pe9uw3BJ6gq9g@mail.gmail.com
2026-04-10 12:41:34 +09:00
Michael Paquier
5b5bf51e43 Zero-fill private_data when attaching an injection point
InjectionPointAttach() did not initialize the private_data buffer of the
shared memory entry before (perhaps partially) overwriting it.  When the
private data is set to NULL by the caler, the buffer was left
uninitialized.  If set, it could have stale contents.

The buffer is initialized to zero, so as the contents recorded when a
point is attached are deterministic.

Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0tsGHu2h6YLnVu4HiK05q+gTE_9WVUAqihW2LSscAYS-g@mail.gmail.com
Backpatch-through: 17
2026-04-10 11:17:09 +09:00
Nathan Bossart
71ff232a5b Fix double-free in pg_stat_autovacuum_scores.
Presently, relation_needs_vacanalyze() unconditionally frees the
pgstat entry returned by pgstat_fetch_stat_tabentry_ext().  This
behavior was first added by commit 02502c1bca to avoid memory
leakage in autovacuum.  While this is fine for autovacuum since it
forces stats_fetch_consistency to "none", it is not okay for other
callers that use "cache" or "snapshot".  This manifests as a
double-free when pg_stat_autovacuum_scores is called multiple times
in the same transaction.

To fix, add a "bool *may_free" parameter to
pgstat_fetch_stat_tabentry_ext() that returns whether it is safe
for the caller to explicitly pfree() the result.  If a caller would
rather leave it to the memory context machinery to free the result,
it can pass NULL as the "may_free" argument (or just ignore its
value).

Oversight in commit 87f61f0c82.

Reported-by: Tender Wang <tndrwang@gmail.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAHewXNkJKdwb3D5OnksrdOqzqUnXUEMpDam1TPW0vfUkW%3D7jUw%40mail.gmail.com
Discussion: https://postgr.es/m/5684f479-858e-4c5d-b8f5-bcf05de1f909%40gmail.com
2026-04-09 13:07:06 -05:00
Masahiko Sawada
8030b839d3 Remove an unstable wait from parallel autovacuum regression test.
The test 001_parallel_autovacuum.pl verified that vacuum delay
parameters are propagated to parallel vacuum workers by using
injection points. It previously waited for autovacuum to complete on
the test_autovac table. However, since injection points are
cluster-wide, an autovacuum worker could be triggered on tables in
other databases (e.g., template1) and get stuck at the same injection
point. This could lead to a timeout when the test waits for the
expected table's autovacuum to finish.

This commit removes the wait for autovacuum completion from this
specific test case. Since the primary goal is to verify the
propagation of parameter updates, which is already confirmed via log
messages, waiting for the entire vacuum process to finish is
unnecessary and prone to instability in concurrent test environments.

Author:	Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0s+kZZRMSF4HW7tZ9W2jS1o4B+Fg8dr5a-T6mANX+mdQA@mail.gmail.com
2026-04-09 09:13:32 -07:00
Andres Freund
7fc36c5db5 instrumentation: Avoid CPUID 0x15/0x16 for Hypervisor TSC frequency
This restricts the retrieval of the TSC frequency whilst under a Hypervisor to
either Hypervisor-specific CPUID registers (0x40000010), or TSC
calibration. We previously allowed retrieving from the traditional CPUID
registers for TSC frequency (0x15/0x16) like on bare metal, but it turns out
that they are not trustworthy when virtualized and can report wildly incorrect
frequencies, like 7 kHz when the actual calibrated frequencty is 2.5 GHz.

Per report from buildfarm member drongo.

Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/jr4hk2sxhqcfpb67ftz5g4vw33nm67cgf7go3wwmqsafu5aclq%405m67ukuhyszz
2026-04-09 11:50:46 -04:00
Nathan Bossart
60165db6e1 Add LOG_NEVER error level code.
This logging level means not to emit the log, which is useful for
functions like relation_needs_vacanalyze().  This function accepts
a log level argument but not all callers want it to emit logs.

Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3101163.1775676098%40sss.pgh.pa.us
2026-04-09 10:18:15 -05:00
Richard Guo
8b6c89e377 Fix integer overflow in nodeWindowAgg.c
In nodeWindowAgg.c, the calculations for frame start and end positions
in ROWS and GROUPS modes were performed using simple integer addition.
If a user-supplied offset was sufficiently large (close to INT64_MAX),
adding it to the current row or group index could cause a signed
integer overflow, wrapping the result to a negative number.

This led to incorrect behavior where frame boundaries that should have
extended indefinitely (or beyond the partition end) were treated as
falling at the first row, or where valid rows were incorrectly marked
as out-of-frame.  Depending on the specific query and data, these
overflows can result in incorrect query results, execution errors, or
assertion failures.

To fix, use overflow-aware integer addition (ie, pg_add_s64_overflow)
to check for overflows during these additions.  If an overflow is
detected, the boundary is now clamped to INT64_MAX.  This ensures the
logic correctly treats the boundary as extending to the end of the
partition.

Bug: #19405
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/19405-1ecf025dda171555@postgresql.org
Backpatch-through: 14
2026-04-09 19:28:33 +09:00
Richard Guo
c1408956e3 Strip PlaceHolderVars from partition pruning operands
When pulling up a subquery, its targetlist items may be wrapped in
PlaceHolderVars to enforce separate identity or as a result of outer
joins.  This causes any upper-level WHERE clauses referencing these
outputs to contain PlaceHolderVars, which prevents partprune.c from
recognizing that they match partition key columns, defeating partition
pruning.

To fix, strip PlaceHolderVars from operands before comparing them to
partition keys.  A PlaceHolderVar with empty phnullingrels appearing
in a relation-scan-level expression is effectively a no-op, so
stripping it is safe.  This parallels the existing treatment in
indxpath.c for index matching.

In passing, rename strip_phvs_in_index_operand() to strip_noop_phvs()
and move it from indxpath.c to placeholder.c, since it is now a
general-purpose utility used by both index matching and partition
pruning code.

Back-patch to v18.  Although this issue exists before that, changes in
that version made it common enough to notice.  Given the lack of field
reports for older versions, I am not back-patching further.  In the
v18 back-patch, strip_phvs_in_index_operand() is retained as a thin
wrapper around the new strip_noop_phvs() to avoid breaking third-party
extensions that may reference it.

Reported-by: Cándido Antonio Martínez Descalzo <candido@ninehq.com>
Diagnosed-by: David Rowley <dgrowleyml@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAH5YaUwVUWETTyVECTnhs7C=CVwi+uMSQH=cOkwAUqMdvXdwWA@mail.gmail.com
Backpatch-through: 18
2026-04-09 16:41:31 +09:00
Amit Langote
e1cc57fabd Add nkeys parameter to recheck_matched_pk_tuple()
The function looped over ii_NumIndexKeyAttrs elements of the skeys
array, but one caller (ri_FastPathFlushArray) passes a one-element
array since it only handles single-column FKs.  The function
signature did not communicate this constraint, which static analysis
flags as a potential out-of-bounds read.

Add an nkeys parameter and assert that it matches
ii_NumIndexKeyAttrs, then use it in the loop.  The call sites
already know the key count.

Reported-by: Evan Montgomery-Recht <montge@mianetworks.net>
Discussion: https://postgr.es/m/CAEg7pwcKf01FmDqFAf-Hzu_pYnMYScY_Otid-pe9uw3BJ6gq9g@mail.gmail.com
2026-04-09 14:45:31 +09:00
Michael Paquier
e0fa5bd146 Reduce presence of syscache.h in src/include/
ee642cccc4 has added syscache.h in inval.h and objectaddress.h,
enlarging by a lot the footprint of this header, particularly via
objectaddress.h.  A change in syscache.h would cause a lot more files to
be recompiled.

This commit reduces the presence of syscache.h by switching to a direct
use of syscache_ids.h in inval.h and objectaddress.h, where the enum
SysCacheIdentifier is defined.  genbki.pl gains an #ifndef block for
this header, so as its inclusion is more controlled.

Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/vlcexdcimsmvu3aplt2yxpfndkgtuvjsrms2fdl46rbw3k2kug@drspkoxlaije
2026-04-09 08:49:36 +09:00
Álvaro Herrera
2cff363715
Simplify declaration of memcpy target
The existing one is understandable failing on (some?) 32-bit platforms.

Reported-by: Tomas Vondra <tomas@vondra.me>
Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1c197f2d-49a2-4830-8dde-55867218b62d@vondra.me
2026-04-08 22:58:56 +02:00
Peter Eisentraut
f8eec1ced6 Add missing PGDLLIMPORT markings 2026-04-08 15:49:33 +02:00
Thomas Munro
a1643d40b3 Remove RADIUS support.
Our RADIUS implementation supported only the deprecated RADIUS/UDP
variant, without the recommended Message-Authenticator attribute to
mitigate against the Blast-RADIUS vulnerability.  By now, popular RADIUS
servers are expected to generate loud warnings or reject our
authentication attempts outright.

Since there have been no user reports about this, it seems unlikely that
there are users.

Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Michael Banck <mbanck@gmx.net>
Discussion: https://postgr.es/m/CA%2BhUKG%2BSH309V8KECU5%3DxuLP9Dks0v9f9UVS2W74fPAE5O21dg%40mail.gmail.com
2026-04-08 22:38:43 +12:00
Etsuro Fujita
28972b6fc3 Add support for importing statistics from remote servers.
Add a new FDW callback routine that allows importing remote statistics
for a foreign table directly to the local server, instead of collecting
statistics locally.  The new callback routine is called at the beginning
of the ANALYZE operation on the table, and if the FDW failed to import
the statistics, the existing callback routine is called on the table to
collect statistics locally.

Also implement this for postgres_fdw.  It is enabled by "restore_stats"
option both at the server and table level.  Currently, it is the user's
responsibility to ensure remote statistics to import are up-to-date, so
the default is false.

Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Discussion: https://postgr.es/m/CADkLM%3DchrYAx%3DX2KUcDRST4RLaRLivYDohZrkW4LLBa0iBhb5w%40mail.gmail.com
2026-04-08 19:15:00 +09:00
Thomas Munro
d1c01b79d4 aio: Adjust I/O worker pool automatically.
The size of the I/O worker pool used to implement io_method=worker was
previously controlled by the io_workers setting, defaulting to 3.  It
was hard to know how to tune it effectively.  That is replaced with:

  io_min_workers=2
  io_max_workers=8 (up to 32)
  io_worker_idle_timeout=60s
  io_worker_launch_interval=100ms

The pool is automatically sized within the configured range according to
recent variation in demand.  It grows when existing workers detect that
latency might be introduced by queuing, and shrinks when the
highest-numbered worker is idle for too long.  Work was already
concentrated into low-numbered workers in anticipation of this logic.

The logic for waking extra workers now also tries to measure and reduce
the number of spurious wakeups, though they are not entirely eliminated.

Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2Bm4xV0LMoH2c%3DoRAdEXuCnh%2BtGBTWa7uFeFMGgTLAw%2BQ%40mail.gmail.com
2026-04-08 19:08:32 +12:00
John Naylor
948ef7cdc4 Exit early from pg_comp_crc32c_pmull for small inputs
The vectorized path in commit fbc57f2bc had a side effect of putting
more branches in the path taken for small inputs. To reduce risk
of regressions, only proceed with the vectorized path if we can
guarantee that the remaining input after the alignment preamble is
greater than 64 bytes. That also allows removing length checks in
the alignment preamble.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/CANWCAZZ48GuLYhJCcTy8TXysjrMVJL6n1n7NP94=iG+t80YKPw@mail.gmail.com
2026-04-08 13:52:14 +07:00
Thomas Munro
ce11e63f81 pg_upgrade: Check for unsupported encodings.
Since we have dropped MULE_INTERNAL, add a check that all encodings used
in the source cluster are still supported according to
PG_ENCODING_BE_VALID().  This is done generically, in case we decide to
drop another encoding some day.

Suggested-by: Jeff Davis <pgsql@j-davis.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKGKXDXh-FdU0orjfv%2BF08f%3DD91BhV3Ra-4zL-q%2BJmGYqTA%40mail.gmail.com
2026-04-08 17:45:09 +12:00
Thomas Munro
77645d44e3 Remove MULE_INTERNAL encoding.
This was useful before widespread Unicode adoption, and was based on the
internal encoding Emacs used to mix multiple sub-encodings.  Emacs
itself has stopped using it, and our implementation hadn't been updated
with modern underlying standards.  It is thought to be very unlikely
that anyone is still using it in the field.  Since such a complex
encoding comes with costs and risks, we agreed to drop support.

Any existing database using this encoding would need to be dumped and
restored with a new encoding to upgrade to PostgreSQL 19, most likely
UTF8, since pg_upgrade would fail.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tatsuo Ishii <ishii@postgresql.org>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/CA%2BhUKGKXDXh-FdU0orjfv%2BF08f%3DD91BhV3Ra-4zL-q%2BJmGYqTA%40mail.gmail.com
2026-04-08 17:40:06 +12:00
Andres Freund
2c16deee2f instrumentation: Allocate query level instrumentation in ExecutorStart
Until now extensions that wanted to measure overall query execution could
create QueryDesc->totaltime, which the core executor would then start and
stop.  That's a bit odd and composes badly, e.g. extensions always had to use
INSTRUMENT_ALL, because otherwise another extension might not get what they
need.

Instead this introduces a new field, QueryDesc->query_instr_options, that
extensions can use to indicate whether they need query level instrumentation
populated, and with which instrumentation options. Extensions should take care
to only add options they need, instead of replacing the options of others.

The prior name of the field, totaltime, sounded like it would only measure
time, but these days the instrumentation infrastructure can track more
resources.  The secondary benefit is that this will make it obvious to
extensions that they may not create the Instrumentation struct themselves
anymore (often extensions build only against a postgres build without
assertions).

Adjust pg_stat_statements and auto_explain to match, and lower the
requested instrumentation level for auto_explain to INSTRUMENT_TIMER,
since the summary instrumentation it needs is only runtime.

The reason to push this now, rather in the PG 20 cycle, is that 5a79e78501
already required extensions using query level instrumentations to adjust their
code, and it seemed undesirable to require them to do so again for 20.

Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAP53Pkyqsht+exJQYRsjhSWYKu+vFGHhPub7m6PmFD6Or0=p1g@mail.gmail.com
2026-04-08 00:06:45 -04:00
Fujii Masao
db93032a7c Fix slotsync worker blocking promotion when stuck in wait
Previously, on standby promotion, the startup process sent SIGUSR1 to
the slotsync worker (or a backend performing slot synchronization) and
waited for it to exit. This worked in most cases, but if the process was
blocked waiting for a response from the primary (e.g., due to a network
failure), SIGUSR1 would not interrupt the wait. As a result, the process
could remain stuck, causing the startup process to wait for a long time
and delaying promotion.

This commit fixes the issue by introducing a new procsignal reason,
PROCSIG_SLOTSYNC_MESSAGE. On promotion, the startup process
sends this signal, and the handler sets interrupt flags so the process
exits (or errors out) promptly at CHECK_FOR_INTERRUPTS(), allowing
promotion to complete without delay.

Backpatch to v17, where slotsync was introduced.

Author: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwFzNYroAxSoyJhqTU-pH=t4Ej6RyvhVmBZ91Exj_TPMMQ@mail.gmail.com
Backpatch-through: 17
2026-04-08 11:22:21 +09:00
Andres Freund
544000288e instrumentation: Move ExecProcNodeInstr to allow inlining
This moves the implementation of ExecProcNodeInstr, the ExecProcNode variant
that gets used when instrumentation is on, to be defined in instrument.c
instead of execProcNode.c, and marks functions it uses as inline.

This allows compilers to generate an optimized implementation, and shows a 4
to 12% reduction in instrumentation overhead for queries that move lots of
rows.

Author: Lukas Fittl <lukas@fittl.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAP53PkzdBK8VJ1fS4AZ481LgMN8f9mJiC39ZRHqkFUSYq6KWmg@mail.gmail.com
2026-04-07 21:36:49 -04:00
Tomas Vondra
e157fe6f76 Add EXPLAIN (IO) instrumentation for TidRangeScan
Adds support for EXPLAIN (IO) instrumentation for TidRange scans. This
requires adding shared instrumentation for parallel scans, using the
separate DSM approach introduced by dd78e69cfc.

Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me
2026-04-07 23:25:05 +02:00
Andres Freund
16fca48254 pg_test_timing: Also test RDTSC[P] timing, report time source, TSC frequency
This adds support to pg_test_timing for the different timing sources added by
294520c444.

Author: Lukas Fittl <lukas@fittl.com>
Author: David Geier <geidav.pg@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: David Geier <geidav.pg@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> (in an earlier version)
Discussion: https://www.postgresql.org/message-id/flat/20200612232810.f46nbqkdhbutzqdg%40alap3.anarazel.de
2026-04-07 17:12:08 -04:00
Tomas Vondra
3b1117d6e2 Add EXPLAIN (IO) instrumentation for SeqScan
Adds support for EXPLAIN (IO) instrumentation for sequential scans. This
requires adding shared instrumentation, using the separate DSM approach
introduced by dd78e69cfc.

Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me
2026-04-07 23:07:03 +02:00
Tom Lane
b268928f93 Suppress unused-variable warning.
x86 machines lacking HAVE__CPUIDEX saw a complaint about
"unused variable 'reg'", per buildfarm as well as local
experience.  Oversight in bcb2cf41f.
2026-04-07 17:03:20 -04:00
Tomas Vondra
681daed931 Add EXPLAIN (IO) infrastructure with BitmapHeapScan support
Allows collecting details about AIO / prefetch for scan nodes backed by
a ReadStream. This may be enabled by a new "IO" option in EXPLAIN, and
it shows information about the prefetch distance and I/O requests.

As of this commit this applies only to BitmapHeapScan, because that's
the only scan node using a ReadStream and collecting instrumentation
from workers in a parallel query. Support for SeqScan and TidRangeScan,
the other scan nodes using ReadStream, will be added in subsequent
commits.

The stats are collected only when required by EXPLAIN ANALYZE, with the
IO option (disabled by default). The amount of collected statistics is
very limited, but we don't want to clutter EXPLAIN with too much data.

The IOStats struct is stored in the scan descriptor as a field, next to
other fields used by table AMs. A pointer to the field is passed to the
ReadStream, and updated directly.

It's the responsibility of the table AM to allocate the struct (e.g. in
ambeginscan) whenever the flag SO_SCAN_INSTRUMENT flag is passed to the
scan, so that the executor and ReadStream has access to it.

The collected stats are designed for ReadStream, but are meant to be
reasonably generic in case a TAM manages I/Os in different ways.

Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me
2026-04-07 22:33:34 +02:00
Tomas Vondra
10d5a12a93 Switch EXPLAIN to unaligned output for json/xml/yaml
Use unaligned output for multiple EXPLAIN queries using non-text format
in regression tests. With aligned output adding/removing explain fields
can be very disruptive, as it often modifies the whole block because of
padding. Unaligned output does not have this issue.

Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me
2026-04-07 22:12:27 +02:00
Tom Lane
4edd6036d6 Fix WITHOUT OVERLAPS' interaction with domains.
UNIQUE/PRIMARY KEY ... WITHOUT OVERLAPS requires the no-overlap
column to be a range or multirange, but it should allow a domain
over such a type too.  This requires minor adjustments in both
the parser and executor.

In passing, fix a nearby break-instead-of-continue thinko in
transformIndexConstraint.  This had the effect of disabling
parse-time validation of the no-overlap column's type in the context
of ALTER TABLE ADD CONSTRAINT, if it follows a dropped column.
We'd still complain appropriately at runtime though.

Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxGoAmN_0iJ=hjTG0vGpOSOyy-vYyfE+-q0AWxrq2_p5XQ@mail.gmail.com
Backpatch-through: 18
2026-04-07 14:45:37 -04:00
Andres Freund
294520c444 instrumentation: Use Time-Stamp Counter on x86-64 to lower overhead
This allows the direct use of the Time-Stamp Counter (TSC) value retrieved
from the CPU using RDTSC/RDTSCP instructions, instead of APIs like
clock_gettime() on POSIX systems.

This reduces the overhead of EXPLAIN with ANALYZE and TIMING ON. Tests showed
that the overhead on top of actual runtime when instrumenting queries moving
lots of rows through the plan can be reduced from 2x as slow to 1.2x as slow
compared to the actual runtime. More complex workloads such as TPCH queries
have also shown ~20% gains when instrumented compared to before.

To control use of the TSC, the new "timing_clock_source" GUC is introduced,
whose default ("auto") automatically uses the TSC when reliable, for example
when running on modern Intel CPUs, or when running on Linux and the system
clocksource is reported as "tsc". The use of the operating system clock source
can be enforced by setting "system", or on x86-64 architectures the use of TSC
can be enforced by explicitly setting "tsc".

In order to use the TSC the frequency is first determined by use of CPUID, and
if not available, by running a short calibration loop at program start,
falling back to the system clock source if TSC values are not stable.

Note, that we split TSC usage into the RDTSC CPU instruction which does not
wait for out-of-order execution (faster, less precise) and the RDTSCP
instruction, which waits for outstanding instructions to retire. RDTSCP is
deemed to have little benefit in the typical InstrStartNode() /
InstrStopNode() use case of EXPLAIN, and can be up to twice as slow. To
separate these use cases, the new macro INSTR_TIME_SET_CURRENT_FAST() is
introduced, which uses RDTSC.

The original macro INSTR_TIME_SET_CURRENT() uses RDTSCP and is supposed to be
used when precision is more important than performance. When the system timing
clock source is used both of these macros instead utilize the system
APIs (clock_gettime / QueryPerformanceCounter) like before.

Additional users of interval timing, such as track_io_timing and
track_wal_io_timing could also benefit from being converted to use
INSTR_TIME_SET_CURRENT_FAST() but are left for future changes.

Author: Lukas Fittl <lukas@fittl.com>
Author: Andres Freund <andres@anarazel.de>
Author: David Geier <geidav.pg@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: David Geier <geidav.pg@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com> (in an earlier version)
Reviewed-by: Maciek Sakrejda <m.sakrejda@gmail.com> (in an earlier version)
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (in an earlier version)
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> (in an earlier version)
Discussion: https://postgr.es/m/20200612232810.f46nbqkdhbutzqdg@alap3.anarazel.de
2026-04-07 13:00:24 -04:00
Andres Freund
bcb2cf41f9 Allow retrieving x86 TSC frequency/flags from CPUID
This adds additional x86 specific CPUID checks for flags needed for
determining whether the Time-Stamp Counter (TSC) is usable on a given system,
as well as a helper function to retrieve the TSC frequency from CPUID.

This is intended for a future patch that will utilize the TSC to lower the
overhead of timing instrumentation.

In passing, always make pg_cpuid_subleaf reset the variables used for its
result, to avoid accidentally using stale results if __get_cpuid_count errors
out and the caller doesn't check for it.

Author: Lukas Fittl <lukas@fittl.com>
Author: David Geier <geidav.pg@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: David Geier <geidav.pg@gmail.com>
Reviewed-by: John Naylor <john.naylor@postgresql.org>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> (in an earlier version)
Discussion: https://www.postgresql.org/message-id/flat/20200612232810.f46nbqkdhbutzqdg%40alap3.anarazel.de
2026-04-07 13:00:24 -04:00
Andres Freund
0022622c93 instrumentation: Standardize ticks to nanosecond conversion method
The timing infrastructure (INSTR_* macros) measures time elapsed using
clock_gettime() on POSIX systems, which returns the time as nanoseconds,
and QueryPerformanceCounter() on Windows, which is a specialized timing
clock source that returns a tick counter that needs to be converted to
nanoseconds using the result of QueryPerformanceFrequency().

This conversion currently happens ad-hoc on Windows, e.g. when calling
INSTR_TIME_GET_NANOSEC, which calls QueryPerformanceFrequency() on every
invocation, despite the frequency being stable after program start,
incurring unnecessary overhead. It also causes a fractured implementation
where macros are defined differently between platforms.

To ease code readability, and prepare for a future change that intends
to use a ticks-to-nanosecond conversion on x86-64 for TSC use, introduce
new pg_ticks_to_ns() / pg_ns_to_ticks() functions that get called from
INSTR_* macros on all platforms.

These functions rely on a separately initialized ticks_per_ns_scaled
value, that represents the conversion ratio. This value is initialized
from QueryPerformanceFrequency() on Windows, and set to zero on x86-64
POSIX systems, which results in the ticks being treated as nanoseconds.
Other architectures always directly return the original ticks.

To support this, pg_initialize_timing() is introduced, and is now
mandatory for both the backend and any frontend programs to call before
utilizing INSTR_* macros.

In passing, fix variable names in comment documenting INSTR_TIME_ADD_NANOSEC().

Author: Lukas Fittl <lukas@fittl.com>
Author: David Geier <geidav.pg@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: David Geier <geidav.pg@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://www.postgresql.org/message-id/flat/20200612232810.f46nbqkdhbutzqdg%40alap3.anarazel.de
2026-04-07 13:00:24 -04:00
Jacob Champion
b977bd308a oauth: Allow validators to register custom HBA options
OAuth validators can already use custom GUCs to configure behavior
globally. But we currently provide no ability to adjust settings for
individual HBA entries, because the original design focused on a world
where a provider covered a "single audience" of users for one database
cluster. This assumption does not apply to multitenant use cases, where
a single validator may be controlling access for wildly different user
groups.

To improve this use case, add two new API calls for use by validator
callbacks: RegisterOAuthHBAOptions() and GetOAuthHBAOption().
Registering options "foo" and "bar" allows a user to set "validator.foo"
and "validator.bar" in an oauth HBA entry. These options are stringly
typed (syntax validation is solely the responsibility of the defining
module), and names are restricted to a subset of ASCII to avoid tying
our hands with future HBA syntax improvements.

Unfortunately, we can't check the custom option names during a reload of
the configuration, like we do with standard HBA options, without
requiring all validators to be loaded via shared_preload_libraries.
(I consider this to be a nonstarter: most validators should probably use
session_preload_libraries at most, since requiring a full restart just
to update authentication behavior will be unacceptable to many users.)
Instead, the new validator.* options are checked against the registered
list at connection time.

Multiple alternatives were proposed and/or prototyped, including
extending the GUC system to allow per-HBA overrides, joining forces with
recent refactoring work on the reloptions subsystem, and giving the
ability to customize HBA options to all PostgreSQL extensions. I
personally believe per-HBA GUC overrides are the best option, because
several existing GUCs like authentication_timeout and pre_auth_delay
would fit there usefully. But the recent addition of SNI per-host
settings in 4f433025f indicates that a more general solution is needed,
and I expect that to take multiple releases' worth of discussion.

This compromise patch, then, is intentionally designed to be an
architectural dead end: simple to describe, cheap to maintain, and
providing just enough functionality to let validators move forward for
PG19. The hope is that it will be replaced in the future by a solution
that can handle per-host, per-HBA, and other per-context configuration
with the same functionality that GUCs provide today. In the meantime,
the bulk of the code in this patch consists of strict guardrails on the
simple API, to try to ensure that we don't have any reason to regret its
existence during its unknown lifespan.

I owe particular thanks here to Zsolt Parragi, who prototyped several
approaches that guided the final design.

Suggested-by: Zsolt Parragi <zsolt.parragi@percona.com>
Suggested-by: VASUKI M <vasukianand0119@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CAN4CZFM3b8u5uNNNsY6XCya257u%2BDofms3su9f11iMCxvCacag%40mail.gmail.com
2026-04-07 08:15:19 -07:00
Jacob Champion
6d00fb9048 libpq: Split PGOAUTHDEBUG=UNSAFE into multiple options
PGOAUTHDEBUG is a blunt instrument: you get all the debugging features,
or none of them. The most annoying consequence during manual use is the
Curl debug trace, which tends to obscure the device flow prompt
entirely. The promotion of PGOAUTHCAFILE into its own feature in
993368113 improved the situation somewhat, but there's still the
discomfort of knowing you have to opt into many dangerous behaviors just
to get the single debug feature you wanted.

Explode the PGOAUTHDEBUG syntax into a comma-separated list. The old
"UNSAFE" value enables everything, like before. Any individual unsafe
features still require the envvar to begin with an "UNSAFE:" prefix, to
try to interrupt the flow of someone who is about to do something they
should not.

So now, rather than

    PGOAUTHDEBUG=UNSAFE        # enable all the unsafe things

a developer can say

    PGOAUTHDEBUG=call-count    # only show me the call count. safe!
    PGOAUTHDEBUG=UNSAFE:trace  # print secrets, but don't allow HTTP

To avoid adding more build system scaffolding to libpq-oauth, implement
this entirely in a small private header. This unfortunately can't be
standalone, so it needs a headerscheck exception.

Author: Zsolt Parragi <zsolt.parragi@percona.com>
Co-authored-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CAOYmi%2B%3DfbZNJSkHVci%3DGpR8XPYObK%3DH%2B2ERRha0LDTS%2BifsWnw%40mail.gmail.com
Discussion: https://postgr.es/m/CAN4CZFMmDZMH56O9vb_g7vHqAk8ryWFxBMV19C39PFghENg8kA%40mail.gmail.com
2026-04-07 08:15:14 -07:00