postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-07-15 20:52:53 -04:00

Author	SHA1	Message	Date
Amit Langote	03029409b4	Fix typo left by `34a3078629` Reported-by: jie wang <jugierwang@gmail.com> Reported-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAJnZyeDyaS=X-eYN=9rDYqK=6ma1gMLa0qDgfNbZKK0e0+q99Q@mail.gmail.com	2026-04-10 13:32:38 +09:00
Amit Langote	34a3078629	Fix RI fast-path crash under nested C-level SPI When a C-language function uses SPI_connect/SPI_execute/SPI_finish to INSERT into a table with FK constraints, the FK AFTER triggers fire and schedule ri_FastPathEndBatch via RegisterAfterTriggerBatchCallback(), opening PK relations under CurrentResourceOwner at the time of the SPI call. The query_depth > 0 guard in FireAfterTriggerBatchCallbacks suppresses the callback at that nesting level, deferring teardown to the outer query's AfterTriggerEndQuery. By then the resource owner active during the SPI call may have been released, decrementing the cached relations' refcounts to zero. ri_FastPathTeardown, running under the outer query's resource owner, then crashes in assert builds when it attempts to close relations whose refcounts are already zero: TRAP: failed Assert("rel->rd_refcnt > 0") Fix by storing batch callbacks at the level where they should fire: in AfterTriggersQueryData.batch_callbacks for immediate constraints (fired by AfterTriggerEndQuery) and in AfterTriggersData.batch_callbacks for deferred constraints (fired by AfterTriggerFireDeferred and AfterTriggerSetState). RegisterAfterTriggerBatchCallback() routes the callback to the current query-level list when query_depth >= 0, and to the top-level list otherwise. FireAfterTriggerBatchCallbacks() takes a list parameter and simply iterates and invokes it; memory cleanup is handled by the caller. This replaces the query_depth > 0 guard with list-level scoping. Note that deferred constraints are unaffected by this bug: their callbacks fire at commit via AfterTriggerFireDeferred, under the outer transaction's resource owner, which remains valid throughout. Also add firing_batch_callbacks to AfterTriggersData to enforce that callbacks do not register new callbacks during FireAfterTriggerBatchCallbacks(), which would be unsafe as it could modify the list being iterated. 
An Assert in RegisterAfterTriggerBatchCallback() enforces this discipline for future callers. The flag is reset at transaction and subtransaction boundaries to handle cases where an error thrown by a callback is caught and the subtransaction is rolled back. While at it, ensure callbacks are properly accounted for at all transaction boundaries, as cleanup of `b7b27eb41a`: discard any remaining top-level callbacks on both commit and abort in AfterTriggerEndXact(), and clean up query-level callbacks in AfterTriggerFreeQuery(). Note that ri_PerformCheck() calls SPI with fire_triggers=false, which skips AfterTriggerBeginQuery/EndQuery for that SPI command. Any triggers queued during that SPI command are not fired immediately but deferred to the outer query level. Since the fast-path check for those triggers runs under the outer query's resource owner rather than a nested SPI resource owner, and ri_PerformCheck() does not create a dedicated child resource owner, the bug described above does not apply. Reported-by: Evan Montgomery-Recht <montge@mianetworks.net> Reported-by: Sandro Santilli <strk@kbt.io> Analyzed-by: Evan Montgomery-Recht <montge@mianetworks.net> Author: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAEg7pwcKf01FmDqFAf-Hzu_pYnMYScY_Otid-pe9uw3BJ6gq9g@mail.gmail.com	2026-04-10 12:41:34 +09:00
Jeff Davis	90630ec429	Document new catalog columns, missed in commit `8185bb5347`. Reported-by: "Shinoda, Noriyoshi (PSD Japan FSI)" <noriyoshi.shinoda@hpe.com> Co-authored-by: "Shinoda, Noriyoshi (PSD Japan FSI)" <noriyoshi.shinoda@hpe.com> Discussion: https://postgr.es/m/LV8PR84MB3787135EBDBF7747A05731F3EE592@LV8PR84MB3787.NAMPRD84.PROD.OUTLOOK.COM	2026-04-09 20:29:42 -07:00
Michael Paquier	5b5bf51e43	Zero-fill private_data when attaching an injection point InjectionPointAttach() did not initialize the private_data buffer of the shared memory entry before (perhaps partially) overwriting it. When the private data is set to NULL by the caller, the buffer was left uninitialized. If set, it could have stale contents. The buffer is now initialized to zero, so that the contents recorded when a point is attached are deterministic. Author: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/CAA5RZ0tsGHu2h6YLnVu4HiK05q+gTE_9WVUAqihW2LSscAYS-g@mail.gmail.com Backpatch-through: 17	2026-04-10 11:17:09 +09:00
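The zero-fill described in commit 5b5bf51e43 can be illustrated with a minimal sketch. The struct, the buffer size, and the function name below are hypothetical stand-ins, not the actual injection point code; the sketch only shows the pattern of clearing a fixed-size buffer before an optional, possibly partial, copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer size; the real entry layout is not shown in the log. */
#define SKETCH_PRIVATE_DATA_SIZE 16

/* Hypothetical stand-in for a shared memory entry. */
typedef struct SketchEntry
{
    char private_data[SKETCH_PRIVATE_DATA_SIZE];
} SketchEntry;

/*
 * Attach private data to an entry. The buffer is always zeroed first, so
 * bytes past "size" (or the whole buffer when private_data is NULL) never
 * carry stale contents from a previous use of the entry.
 */
static void
sketch_attach(SketchEntry *entry, const void *private_data, size_t size)
{
    assert(size <= SKETCH_PRIVATE_DATA_SIZE);

    memset(entry->private_data, 0, sizeof(entry->private_data));
    if (private_data != NULL)
        memcpy(entry->private_data, private_data, size);
}
```

Without the memset, an entry reused after a longer payload would keep the tail of the old payload behind the new one.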
Nathan Bossart	71ff232a5b	Fix double-free in pg_stat_autovacuum_scores. Presently, relation_needs_vacanalyze() unconditionally frees the pgstat entry returned by pgstat_fetch_stat_tabentry_ext(). This behavior was first added by commit `02502c1bca` to avoid memory leakage in autovacuum. While this is fine for autovacuum since it forces stats_fetch_consistency to "none", it is not okay for other callers that use "cache" or "snapshot". This manifests as a double-free when pg_stat_autovacuum_scores is called multiple times in the same transaction. To fix, add a "bool *may_free" parameter to pgstat_fetch_stat_tabentry_ext() that returns whether it is safe for the caller to explicitly pfree() the result. If a caller would rather leave it to the memory context machinery to free the result, it can pass NULL as the "may_free" argument (or just ignore its value). Oversight in commit `87f61f0c82`. Reported-by: Tender Wang <tndrwang@gmail.com> Reported-by: Alexander Lakhin <exclusion@gmail.com> Suggested-by: Andres Freund <andres@anarazel.de> Suggested-by: Tom Lane <tgl@sss.pgh.pa.us> Author: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/CAHewXNkJKdwb3D5OnksrdOqzqUnXUEMpDam1TPW0vfUkW%3D7jUw%40mail.gmail.com Discussion: https://postgr.es/m/5684f479-858e-4c5d-b8f5-bcf05de1f909%40gmail.com	2026-04-09 13:07:06 -05:00
Masahiko Sawada	8030b839d3	Remove an unstable wait from parallel autovacuum regression test. The test 001_parallel_autovacuum.pl verified that vacuum delay parameters are propagated to parallel vacuum workers by using injection points. It previously waited for autovacuum to complete on the test_autovac table. However, since injection points are cluster-wide, an autovacuum worker could be triggered on tables in other databases (e.g., template1) and get stuck at the same injection point. This could lead to a timeout when the test waits for the expected table's autovacuum to finish. This commit removes the wait for autovacuum completion from this specific test case. Since the primary goal is to verify the propagation of parameter updates, which is already confirmed via log messages, waiting for the entire vacuum process to finish is unnecessary and prone to instability in concurrent test environments. Author: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/CAA5RZ0s+kZZRMSF4HW7tZ9W2jS1o4B+Fg8dr5a-T6mANX+mdQA@mail.gmail.com	2026-04-09 09:13:32 -07:00
Andres Freund	7fc36c5db5	instrumentation: Avoid CPUID 0x15/0x16 for Hypervisor TSC frequency This restricts the retrieval of the TSC frequency whilst under a Hypervisor to either Hypervisor-specific CPUID registers (0x40000010), or TSC calibration. We previously allowed retrieving from the traditional CPUID registers for TSC frequency (0x15/0x16) like on bare metal, but it turns out that they are not trustworthy when virtualized and can report wildly incorrect frequencies, like 7 kHz when the actual calibrated frequency is 2.5 GHz. Per report from buildfarm member drongo. Author: Lukas Fittl <lukas@fittl.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/jr4hk2sxhqcfpb67ftz5g4vw33nm67cgf7go3wwmqsafu5aclq%405m67ukuhyszz	2026-04-09 11:50:46 -04:00
Nathan Bossart	60165db6e1	Add LOG_NEVER error level code. This logging level means not to emit the log, which is useful for functions like relation_needs_vacanalyze(). This function accepts a log level argument but not all callers want it to emit logs. Suggested-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/3101163.1775676098%40sss.pgh.pa.us	2026-04-09 10:18:15 -05:00
Richard Guo	8b6c89e377	Fix integer overflow in nodeWindowAgg.c In nodeWindowAgg.c, the calculations for frame start and end positions in ROWS and GROUPS modes were performed using simple integer addition. If a user-supplied offset was sufficiently large (close to INT64_MAX), adding it to the current row or group index could cause a signed integer overflow, wrapping the result to a negative number. This led to incorrect behavior where frame boundaries that should have extended indefinitely (or beyond the partition end) were treated as falling at the first row, or where valid rows were incorrectly marked as out-of-frame. Depending on the specific query and data, these overflows can result in incorrect query results, execution errors, or assertion failures. To fix, use overflow-aware integer addition (ie, pg_add_s64_overflow) to check for overflows during these additions. If an overflow is detected, the boundary is now clamped to INT64_MAX. This ensures the logic correctly treats the boundary as extending to the end of the partition. Bug: #19405 Reported-by: Alexander Lakhin <exclusion@gmail.com> Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Discussion: https://postgr.es/m/19405-1ecf025dda171555@postgresql.org Backpatch-through: 14	2026-04-09 19:28:33 +09:00
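The clamping behavior described in commit 8b6c89e377 can be sketched as follows. This is an illustration under stated assumptions, not the code from nodeWindowAgg.c: the helper name is hypothetical, it covers only the case where a non-negative offset is added to a non-negative position, and it uses the GCC/Clang builtin `__builtin_add_overflow` in place of `pg_add_s64_overflow`, whose definition is not shown in the log.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: add a frame offset to the current row or group
 * index. On signed overflow the result is clamped to INT64_MAX, so the
 * boundary is treated as extending to the end of the partition instead
 * of wrapping to a negative position.
 */
static int64_t
sketch_clamped_frame_pos(int64_t current_pos, int64_t offset)
{
    int64_t result;

    /* Assumption of this sketch: both inputs are non-negative. */
    assert(current_pos >= 0 && offset >= 0);

    if (__builtin_add_overflow(current_pos, offset, &result))
        return INT64_MAX;
    return result;
}
```

With plain addition, a position of 1 plus an offset of INT64_MAX would overflow; the clamped version returns INT64_MAX.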
Peter Eisentraut	11d6042337	Update config.guess and config.sub	2026-04-09 11:26:14 +02:00
Richard Guo	c1408956e3	Strip PlaceHolderVars from partition pruning operands When pulling up a subquery, its targetlist items may be wrapped in PlaceHolderVars to enforce separate identity or as a result of outer joins. This causes any upper-level WHERE clauses referencing these outputs to contain PlaceHolderVars, which prevents partprune.c from recognizing that they match partition key columns, defeating partition pruning. To fix, strip PlaceHolderVars from operands before comparing them to partition keys. A PlaceHolderVar with empty phnullingrels appearing in a relation-scan-level expression is effectively a no-op, so stripping it is safe. This parallels the existing treatment in indxpath.c for index matching. In passing, rename strip_phvs_in_index_operand() to strip_noop_phvs() and move it from indxpath.c to placeholder.c, since it is now a general-purpose utility used by both index matching and partition pruning code. Back-patch to v18. Although this issue exists before that, changes in that version made it common enough to notice. Given the lack of field reports for older versions, I am not back-patching further. In the v18 back-patch, strip_phvs_in_index_operand() is retained as a thin wrapper around the new strip_noop_phvs() to avoid breaking third-party extensions that may reference it. Reported-by: Cándido Antonio Martínez Descalzo <candido@ninehq.com> Diagnosed-by: David Rowley <dgrowleyml@gmail.com> Author: Richard Guo <guofenglinux@gmail.com> Discussion: https://postgr.es/m/CAH5YaUwVUWETTyVECTnhs7C=CVwi+uMSQH=cOkwAUqMdvXdwWA@mail.gmail.com Backpatch-through: 18	2026-04-09 16:41:31 +09:00
Amit Langote	e1cc57fabd	Add nkeys parameter to recheck_matched_pk_tuple() The function looped over ii_NumIndexKeyAttrs elements of the skeys array, but one caller (ri_FastPathFlushArray) passes a one-element array since it only handles single-column FKs. The function signature did not communicate this constraint, which static analysis flags as a potential out-of-bounds read. Add an nkeys parameter and assert that it matches ii_NumIndexKeyAttrs, then use it in the loop. The call sites already know the key count. Reported-by: Evan Montgomery-Recht <montge@mianetworks.net> Discussion: https://postgr.es/m/CAEg7pwcKf01FmDqFAf-Hzu_pYnMYScY_Otid-pe9uw3BJ6gq9g@mail.gmail.com	2026-04-09 14:45:31 +09:00
Michael Paquier	e0fa5bd146	Reduce presence of syscache.h in src/include/ `ee642cccc4` has added syscache.h in inval.h and objectaddress.h, enlarging by a lot the footprint of this header, particularly via objectaddress.h. A change in syscache.h would cause a lot more files to be recompiled. This commit reduces the presence of syscache.h by switching to a direct use of syscache_ids.h in inval.h and objectaddress.h, where the enum SysCacheIdentifier is defined. genbki.pl gains an #ifndef block for this header, so that its inclusion is more controlled. Reported-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/vlcexdcimsmvu3aplt2yxpfndkgtuvjsrms2fdl46rbw3k2kug@drspkoxlaije	2026-04-09 08:49:36 +09:00
Álvaro Herrera	2cff363715	Simplify declaration of memcpy target The existing one is understandably failing on (some?) 32-bit platforms. Reported-by: Tomas Vondra <tomas@vondra.me> Suggested-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/1c197f2d-49a2-4830-8dde-55867218b62d@vondra.me	2026-04-08 22:58:56 +02:00
Daniel Gustafsson	b364828f82	doc: Fix data_checksums data type Commit `f19c0eccae` changed the data_checksums GUC datatype from a boolean to an enum. This updates the documentation to accurately reflect its new type and document the new possible states: 'on', 'off', 'inprogress-on', and 'inprogress-off'. Also update the xref for more information to point to the section on data checksums rather than the initdb checksum option. Author: Lakshmi N <lakshmin.jhs@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CA+3i_M-AtTnqTB2KLBTpu-c-jvnTuy7bGxyxs80rgiQLxWrRUQ@mail.gmail.com	2026-04-08 22:53:43 +03:00
Nathan Bossart	e0851bded6	Add a couple of commits to .git-blame-ignore-revs.	2026-04-08 13:41:22 -05:00
Peter Eisentraut	f8eec1ced6	Add missing PGDLLIMPORT markings	2026-04-08 15:49:33 +02:00
Thomas Munro	a1643d40b3	Remove RADIUS support. Our RADIUS implementation supported only the deprecated RADIUS/UDP variant, without the recommended Message-Authenticator attribute to mitigate against the Blast-RADIUS vulnerability. By now, popular RADIUS servers are expected to generate loud warnings or reject our authentication attempts outright. Since there have been no user reports about this, it seems unlikely that there are users. Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Aleksander Alekseev <aleksander@tigerdata.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Michael Banck <mbanck@gmx.net> Discussion: https://postgr.es/m/CA%2BhUKG%2BSH309V8KECU5%3DxuLP9Dks0v9f9UVS2W74fPAE5O21dg%40mail.gmail.com	2026-04-08 22:38:43 +12:00
Etsuro Fujita	28972b6fc3	Add support for importing statistics from remote servers. Add a new FDW callback routine that allows importing remote statistics for a foreign table directly to the local server, instead of collecting statistics locally. The new callback routine is called at the beginning of the ANALYZE operation on the table, and if the FDW failed to import the statistics, the existing callback routine is called on the table to collect statistics locally. Also implement this for postgres_fdw. It is enabled by "restore_stats" option both at the server and table level. Currently, it is the user's responsibility to ensure remote statistics to import are up-to-date, so the default is false. Author: Corey Huinker <corey.huinker@gmail.com> Co-authored-by: Etsuro Fujita <etsuro.fujita@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Etsuro Fujita <etsuro.fujita@gmail.com> Discussion: https://postgr.es/m/CADkLM%3DchrYAx%3DX2KUcDRST4RLaRLivYDohZrkW4LLBa0iBhb5w%40mail.gmail.com	2026-04-08 19:15:00 +09:00
Thomas Munro	d1c01b79d4	aio: Adjust I/O worker pool automatically. The size of the I/O worker pool used to implement io_method=worker was previously controlled by the io_workers setting, defaulting to 3. It was hard to know how to tune it effectively. That is replaced with: io_min_workers=2 io_max_workers=8 (up to 32) io_worker_idle_timeout=60s io_worker_launch_interval=100ms The pool is automatically sized within the configured range according to recent variation in demand. It grows when existing workers detect that latency might be introduced by queuing, and shrinks when the highest-numbered worker is idle for too long. Work was already concentrated into low-numbered workers in anticipation of this logic. The logic for waking extra workers now also tries to measure and reduce the number of spurious wakeups, though they are not entirely eliminated. Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKG%2Bm4xV0LMoH2c%3DoRAdEXuCnh%2BtGBTWa7uFeFMGgTLAw%2BQ%40mail.gmail.com	2026-04-08 19:08:32 +12:00
John Naylor	948ef7cdc4	Exit early from pg_comp_crc32c_pmull for small inputs The vectorized path in commit `fbc57f2bc` had a side effect of putting more branches in the path taken for small inputs. To reduce risk of regressions, only proceed with the vectorized path if we can guarantee that the remaining input after the alignment preamble is greater than 64 bytes. That also allows removing length checks in the alignment preamble. Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Discussion: https://postgr.es/m/CANWCAZZ48GuLYhJCcTy8TXysjrMVJL6n1n7NP94=iG+t80YKPw@mail.gmail.com	2026-04-08 13:52:14 +07:00
Thomas Munro	ce11e63f81	pg_upgrade: Check for unsupported encodings. Since we have dropped MULE_INTERNAL, add a check that all encodings used in the source cluster are still supported according to PG_ENCODING_BE_VALID(). This is done generically, in case we decide to drop another encoding some day. Suggested-by: Jeff Davis <pgsql@j-davis.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKGKXDXh-FdU0orjfv%2BF08f%3DD91BhV3Ra-4zL-q%2BJmGYqTA%40mail.gmail.com	2026-04-08 17:45:09 +12:00
Thomas Munro	77645d44e3	Remove MULE_INTERNAL encoding. This was useful before widespread Unicode adoption, and was based on the internal encoding Emacs used to mix multiple sub-encodings. Emacs itself has stopped using it, and our implementation hadn't been updated with modern underlying standards. It is thought to be very unlikely that anyone is still using it in the field. Since such a complex encoding comes with costs and risks, we agreed to drop support. Any existing database using this encoding would need to be dumped and restored with a new encoding to upgrade to PostgreSQL 19, most likely UTF8, since pg_upgrade would fail. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Tatsuo Ishii <ishii@postgresql.org> Reviewed-by: Jeff Davis <pgsql@j-davis.com> Discussion: https://postgr.es/m/CA%2BhUKGKXDXh-FdU0orjfv%2BF08f%3DD91BhV3Ra-4zL-q%2BJmGYqTA%40mail.gmail.com	2026-04-08 17:40:06 +12:00
Andres Freund	2c16deee2f	instrumentation: Allocate query level instrumentation in ExecutorStart Until now extensions that wanted to measure overall query execution could create QueryDesc->totaltime, which the core executor would then start and stop. That's a bit odd and composes badly, e.g. extensions always had to use INSTRUMENT_ALL, because otherwise another extension might not get what they need. Instead this introduces a new field, QueryDesc->query_instr_options, that extensions can use to indicate whether they need query level instrumentation populated, and with which instrumentation options. Extensions should take care to only add options they need, instead of replacing the options of others. The prior name of the field, totaltime, sounded like it would only measure time, but these days the instrumentation infrastructure can track more resources. The secondary benefit is that this will make it obvious to extensions that they may not create the Instrumentation struct themselves anymore (often extensions build only against a postgres build without assertions). Adjust pg_stat_statements and auto_explain to match, and lower the requested instrumentation level for auto_explain to INSTRUMENT_TIMER, since the summary instrumentation it needs is only runtime. The reason to push this now, rather than in the PG 20 cycle, is that `5a79e78501` already required extensions using query level instrumentations to adjust their code, and it seemed undesirable to require them to do so again for 20. Author: Lukas Fittl <lukas@fittl.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAP53Pkyqsht+exJQYRsjhSWYKu+vFGHhPub7m6PmFD6Or0=p1g@mail.gmail.com	2026-04-08 00:06:45 -04:00
Fujii Masao	db93032a7c	Fix slotsync worker blocking promotion when stuck in wait Previously, on standby promotion, the startup process sent SIGUSR1 to the slotsync worker (or a backend performing slot synchronization) and waited for it to exit. This worked in most cases, but if the process was blocked waiting for a response from the primary (e.g., due to a network failure), SIGUSR1 would not interrupt the wait. As a result, the process could remain stuck, causing the startup process to wait for a long time and delaying promotion. This commit fixes the issue by introducing a new procsignal reason, PROCSIG_SLOTSYNC_MESSAGE. On promotion, the startup process sends this signal, and the handler sets interrupt flags so the process exits (or errors out) promptly at CHECK_FOR_INTERRUPTS(), allowing promotion to complete without delay. Backpatch to v17, where slotsync was introduced. Author: Nisha Moond <nisha.moond412@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Discussion: https://postgr.es/m/CAHGQGwFzNYroAxSoyJhqTU-pH=t4Ej6RyvhVmBZ91Exj_TPMMQ@mail.gmail.com Backpatch-through: 17	2026-04-08 11:22:21 +09:00
Andres Freund	544000288e	instrumentation: Move ExecProcNodeInstr to allow inlining This moves the implementation of ExecProcNodeInstr, the ExecProcNode variant that gets used when instrumentation is on, to be defined in instrument.c instead of execProcNode.c, and marks functions it uses as inline. This allows compilers to generate an optimized implementation, and shows a 4 to 12% reduction in instrumentation overhead for queries that move lots of rows. Author: Lukas Fittl <lukas@fittl.com> Suggested-by: Andres Freund <andres@anarazel.de> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAP53PkzdBK8VJ1fS4AZ481LgMN8f9mJiC39ZRHqkFUSYq6KWmg@mail.gmail.com	2026-04-07 21:36:49 -04:00
Tomas Vondra	e157fe6f76	Add EXPLAIN (IO) instrumentation for TidRangeScan Adds support for EXPLAIN (IO) instrumentation for TidRange scans. This requires adding shared instrumentation for parallel scans, using the separate DSM approach introduced by `dd78e69cfc`. Author: Tomas Vondra <tomas@vondra.me> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me	2026-04-07 23:25:05 +02:00
Andres Freund	16fca48254	pg_test_timing: Also test RDTSC[P] timing, report time source, TSC frequency This adds support to pg_test_timing for the different timing sources added by `294520c444`. Author: Lukas Fittl <lukas@fittl.com> Author: David Geier <geidav.pg@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: David Geier <geidav.pg@gmail.com> Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> (in an earlier version) Discussion: https://www.postgresql.org/message-id/flat/20200612232810.f46nbqkdhbutzqdg%40alap3.anarazel.de	2026-04-07 17:12:08 -04:00
Tomas Vondra	3b1117d6e2	Add EXPLAIN (IO) instrumentation for SeqScan Adds support for EXPLAIN (IO) instrumentation for sequential scans. This requires adding shared instrumentation, using the separate DSM approach introduced by `dd78e69cfc`. Author: Tomas Vondra <tomas@vondra.me> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me	2026-04-07 23:07:03 +02:00
Tom Lane	b268928f93	Suppress unused-variable warning. x86 machines lacking HAVE__CPUIDEX saw a complaint about "unused variable 'reg'", per buildfarm as well as local experience. Oversight in `bcb2cf41f`.	2026-04-07 17:03:20 -04:00
Tomas Vondra	61c36a34a4	auto_explain: Add new GUC auto_explain.log_io Allows enabling the new EXPLAIN "IO" option for auto_explain. Author: Tomas Vondra <tomas@vondra.me> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Lukas Fittl <lukas@fittl.com> Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me	2026-04-07 22:49:44 +02:00
Tomas Vondra	681daed931	Add EXPLAIN (IO) infrastructure with BitmapHeapScan support Allows collecting details about AIO / prefetch for scan nodes backed by a ReadStream. This may be enabled by a new "IO" option in EXPLAIN, and it shows information about the prefetch distance and I/O requests. As of this commit this applies only to BitmapHeapScan, because that's the only scan node using a ReadStream and collecting instrumentation from workers in a parallel query. Support for SeqScan and TidRangeScan, the other scan nodes using ReadStream, will be added in subsequent commits. The stats are collected only when required by EXPLAIN ANALYZE, with the IO option (disabled by default). The amount of collected statistics is very limited, but we don't want to clutter EXPLAIN with too much data. The IOStats struct is stored in the scan descriptor as a field, next to other fields used by table AMs. A pointer to the field is passed to the ReadStream, and updated directly. It's the responsibility of the table AM to allocate the struct (e.g. in ambeginscan) whenever the SO_SCAN_INSTRUMENT flag is passed to the scan, so that the executor and ReadStream have access to it. The collected stats are designed for ReadStream, but are meant to be reasonably generic in case a TAM manages I/Os in different ways. Author: Tomas Vondra <tomas@vondra.me> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me	2026-04-07 22:33:34 +02:00
Tomas Vondra	10d5a12a93	Switch EXPLAIN to unaligned output for json/xml/yaml Use unaligned output for multiple EXPLAIN queries using non-text format in regression tests. With aligned output adding/removing explain fields can be very disruptive, as it often modifies the whole block because of padding. Unaligned output does not have this issue. Author: Tomas Vondra <tomas@vondra.me> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me	2026-04-07 22:12:27 +02:00
Tom Lane	4edd6036d6	Fix WITHOUT OVERLAPS' interaction with domains. UNIQUE/PRIMARY KEY ... WITHOUT OVERLAPS requires the no-overlap column to be a range or multirange, but it should allow a domain over such a type too. This requires minor adjustments in both the parser and executor. In passing, fix a nearby break-instead-of-continue thinko in transformIndexConstraint. This had the effect of disabling parse-time validation of the no-overlap column's type in the context of ALTER TABLE ADD CONSTRAINT, if it follows a dropped column. We'd still complain appropriately at runtime though. Author: Jian He <jian.universality@gmail.com> Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CACJufxGoAmN_0iJ=hjTG0vGpOSOyy-vYyfE+-q0AWxrq2_p5XQ@mail.gmail.com Backpatch-through: 18	2026-04-07 14:45:37 -04:00
Andres Freund	294520c444	instrumentation: Use Time-Stamp Counter on x86-64 to lower overhead This allows the direct use of the Time-Stamp Counter (TSC) value retrieved from the CPU using RDTSC/RDTSCP instructions, instead of APIs like clock_gettime() on POSIX systems. This reduces the overhead of EXPLAIN with ANALYZE and TIMING ON. Tests showed that the overhead on top of actual runtime when instrumenting queries moving lots of rows through the plan can be reduced from 2x as slow to 1.2x as slow compared to the actual runtime. More complex workloads such as TPCH queries have also shown ~20% gains when instrumented compared to before. To control use of the TSC, the new "timing_clock_source" GUC is introduced, whose default ("auto") automatically uses the TSC when reliable, for example when running on modern Intel CPUs, or when running on Linux and the system clocksource is reported as "tsc". The use of the operating system clock source can be enforced by setting "system", or on x86-64 architectures the use of TSC can be enforced by explicitly setting "tsc". In order to use the TSC the frequency is first determined by use of CPUID, and if not available, by running a short calibration loop at program start, falling back to the system clock source if TSC values are not stable. Note, that we split TSC usage into the RDTSC CPU instruction which does not wait for out-of-order execution (faster, less precise) and the RDTSCP instruction, which waits for outstanding instructions to retire. RDTSCP is deemed to have little benefit in the typical InstrStartNode() / InstrStopNode() use case of EXPLAIN, and can be up to twice as slow. To separate these use cases, the new macro INSTR_TIME_SET_CURRENT_FAST() is introduced, which uses RDTSC. The original macro INSTR_TIME_SET_CURRENT() uses RDTSCP and is supposed to be used when precision is more important than performance. 
When the system timing clock source is used both of these macros instead utilize the system APIs (clock_gettime / QueryPerformanceCounter) like before. Additional users of interval timing, such as track_io_timing and track_wal_io_timing could also benefit from being converted to use INSTR_TIME_SET_CURRENT_FAST() but are left for future changes. Author: Lukas Fittl <lukas@fittl.com> Author: Andres Freund <andres@anarazel.de> Author: David Geier <geidav.pg@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: David Geier <geidav.pg@gmail.com> Reviewed-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> (in an earlier version) Reviewed-by: Maciek Sakrejda <m.sakrejda@gmail.com> (in an earlier version) Reviewed-by: Robert Haas <robertmhaas@gmail.com> (in an earlier version) Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> (in an earlier version) Discussion: https://postgr.es/m/20200612232810.f46nbqkdhbutzqdg@alap3.anarazel.de	2026-04-07 13:00:24 -04:00
Andres Freund	bcb2cf41f9	Allow retrieving x86 TSC frequency/flags from CPUID This adds additional x86 specific CPUID checks for flags needed for determining whether the Time-Stamp Counter (TSC) is usable on a given system, as well as a helper function to retrieve the TSC frequency from CPUID. This is intended for a future patch that will utilize the TSC to lower the overhead of timing instrumentation. In passing, always make pg_cpuid_subleaf reset the variables used for its result, to avoid accidentally using stale results if __get_cpuid_count errors out and the caller doesn't check for it. Author: Lukas Fittl <lukas@fittl.com> Author: David Geier <geidav.pg@gmail.com> Author: Andres Freund <andres@anarazel.de> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: David Geier <geidav.pg@gmail.com> Reviewed-by: John Naylor <john.naylor@postgresql.org> Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> (in an earlier version) Discussion: https://www.postgresql.org/message-id/flat/20200612232810.f46nbqkdhbutzqdg%40alap3.anarazel.de	2026-04-07 13:00:24 -04:00
Andres Freund	0022622c93	instrumentation: Standardize ticks to nanosecond conversion method The timing infrastructure (INSTR_* macros) measures time elapsed using clock_gettime() on POSIX systems, which returns the time as nanoseconds, and QueryPerformanceCounter() on Windows, which is a specialized timing clock source that returns a tick counter that needs to be converted to nanoseconds using the result of QueryPerformanceFrequency(). This conversion currently happens ad-hoc on Windows, e.g. when calling INSTR_TIME_GET_NANOSEC, which calls QueryPerformanceFrequency() on every invocation, despite the frequency being stable after program start, incurring unnecessary overhead. It also causes a fractured implementation where macros are defined differently between platforms. To ease code readability, and prepare for a future change that intends to use a ticks-to-nanosecond conversion on x86-64 for TSC use, introduce new pg_ticks_to_ns() / pg_ns_to_ticks() functions that get called from INSTR_* macros on all platforms. These functions rely on a separately initialized ticks_per_ns_scaled value, that represents the conversion ratio. This value is initialized from QueryPerformanceFrequency() on Windows, and set to zero on x86-64 POSIX systems, which results in the ticks being treated as nanoseconds. Other architectures always directly return the original ticks. To support this, pg_initialize_timing() is introduced, and is now mandatory for both the backend and any frontend programs to call before utilizing INSTR_* macros. In passing, fix variable names in comment documenting INSTR_TIME_ADD_NANOSEC(). 
Author: Lukas Fittl <lukas@fittl.com> Author: David Geier <geidav.pg@gmail.com> Author: Andres Freund <andres@anarazel.de> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: David Geier <geidav.pg@gmail.com> Reviewed-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com> Discussion: https://www.postgresql.org/message-id/flat/20200612232810.f46nbqkdhbutzqdg%40alap3.anarazel.de	2026-04-07 13:00:24 -04:00
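The ticks-to-nanoseconds conversion described in commit 0022622c93 can be illustrated with a fixed-point ratio. This is only a minimal sketch under stated assumptions: every name prefixed with `sketch_`, the shift of 20 bits, and the choice to store nanoseconds per tick are hypothetical and are not taken from the PostgreSQL sources. The one behavior borrowed from the commit message is that a ratio of zero means ticks are already nanoseconds.

```c
#include <stdint.h>

/* Hypothetical fixed-point precision; the real constant is not stated in the commit message. */
#define SKETCH_SCALE_SHIFT 20
#define SKETCH_NS_PER_S UINT64_C(1000000000)

/* Scaled nanoseconds per tick. Zero means "ticks are already nanoseconds". */
static uint64_t sketch_ns_per_tick_scaled = 0;

/*
 * Compute the ratio once at program start, e.g. from the value that
 * QueryPerformanceFrequency() reports on Windows. Passing 0 selects the
 * pass-through mode.
 */
static void
sketch_initialize_timing(uint64_t ticks_per_second)
{
	if (ticks_per_second == 0)
		sketch_ns_per_tick_scaled = 0;
	else
		sketch_ns_per_tick_scaled =
			(SKETCH_NS_PER_S << SKETCH_SCALE_SHIFT) / ticks_per_second;
}

/*
 * Convert ticks to nanoseconds. The tick count is split into high and low
 * parts so that the multiplication does not overflow 64 bits for the
 * frequencies a timer would plausibly report.
 */
static uint64_t
sketch_ticks_to_ns(uint64_t ticks)
{
	uint64_t	hi;
	uint64_t	lo;

	if (sketch_ns_per_tick_scaled == 0)
		return ticks;

	hi = ticks >> SKETCH_SCALE_SHIFT;
	lo = ticks & ((UINT64_C(1) << SKETCH_SCALE_SHIFT) - 1);

	return hi * sketch_ns_per_tick_scaled +
		((lo * sketch_ns_per_tick_scaled) >> SKETCH_SCALE_SHIFT);
}
```

With a 10 MHz counter, `sketch_initialize_timing(10000000)` makes 10 ticks convert to 1000 ns. Computing the ratio once at startup is the point of the commit: the frequency lookup no longer happens on every conversion.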
Jacob Champion	b977bd308a	oauth: Allow validators to register custom HBA options OAuth validators can already use custom GUCs to configure behavior globally. But we currently provide no ability to adjust settings for individual HBA entries, because the original design focused on a world where a provider covered a "single audience" of users for one database cluster. This assumption does not apply to multitenant use cases, where a single validator may be controlling access for wildly different user groups. To improve this use case, add two new API calls for use by validator callbacks: RegisterOAuthHBAOptions() and GetOAuthHBAOption(). Registering options "foo" and "bar" allows a user to set "validator.foo" and "validator.bar" in an oauth HBA entry. These options are stringly typed (syntax validation is solely the responsibility of the defining module), and names are restricted to a subset of ASCII to avoid tying our hands with future HBA syntax improvements. Unfortunately, we can't check the custom option names during a reload of the configuration, like we do with standard HBA options, without requiring all validators to be loaded via shared_preload_libraries. (I consider this to be a nonstarter: most validators should probably use session_preload_libraries at most, since requiring a full restart just to update authentication behavior will be unacceptable to many users.) Instead, the new validator.* options are checked against the registered list at connection time. Multiple alternatives were proposed and/or prototyped, including extending the GUC system to allow per-HBA overrides, joining forces with recent refactoring work on the reloptions subsystem, and giving the ability to customize HBA options to all PostgreSQL extensions. I personally believe per-HBA GUC overrides are the best option, because several existing GUCs like authentication_timeout and pre_auth_delay would fit there usefully. 
But the recent addition of SNI per-host settings in `4f433025f` indicates that a more general solution is needed, and I expect that to take multiple releases' worth of discussion. This compromise patch, then, is intentionally designed to be an architectural dead end: simple to describe, cheap to maintain, and providing just enough functionality to let validators move forward for PG19. The hope is that it will be replaced in the future by a solution that can handle per-host, per-HBA, and other per-context configuration with the same functionality that GUCs provide today. In the meantime, the bulk of the code in this patch consists of strict guardrails on the simple API, to try to ensure that we don't have any reason to regret its existence during its unknown lifespan. I owe particular thanks here to Zsolt Parragi, who prototyped several approaches that guided the final design. Suggested-by: Zsolt Parragi <zsolt.parragi@percona.com> Suggested-by: VASUKI M <vasukianand0119@gmail.com> Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com> Discussion: https://postgr.es/m/CAN4CZFM3b8u5uNNNsY6XCya257u%2BDofms3su9f11iMCxvCacag%40mail.gmail.com	2026-04-07 08:15:19 -07:00
Jacob Champion	6d00fb9048	libpq: Split PGOAUTHDEBUG=UNSAFE into multiple options PGOAUTHDEBUG is a blunt instrument: you get all the debugging features, or none of them. The most annoying consequence during manual use is the Curl debug trace, which tends to obscure the device flow prompt entirely. The promotion of PGOAUTHCAFILE into its own feature in `993368113` improved the situation somewhat, but there's still the discomfort of knowing you have to opt into many dangerous behaviors just to get the single debug feature you wanted. Explode the PGOAUTHDEBUG syntax into a comma-separated list. The old "UNSAFE" value enables everything, like before. Any individual unsafe features still require the envvar to begin with an "UNSAFE:" prefix, to try to interrupt the flow of someone who is about to do something they should not. So now, rather than
    PGOAUTHDEBUG=UNSAFE # enable all the unsafe things
a developer can say
    PGOAUTHDEBUG=call-count # only show me the call count. safe!
    PGOAUTHDEBUG=UNSAFE:trace # print secrets, but don't allow HTTP
To avoid adding more build system scaffolding to libpq-oauth, implement this entirely in a small private header. This unfortunately can't be standalone, so it needs a headerscheck exception. Author: Zsolt Parragi <zsolt.parragi@percona.com> Co-authored-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com> Discussion: https://postgr.es/m/CAOYmi%2B%3DfbZNJSkHVci%3DGpR8XPYObK%3DH%2B2ERRha0LDTS%2BifsWnw%40mail.gmail.com Discussion: https://postgr.es/m/CAN4CZFMmDZMH56O9vb_g7vHqAk8ryWFxBMV19C39PFghENg8kA%40mail.gmail.com	2026-04-07 08:15:14 -07:00
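The PGOAUTHDEBUG syntax described in commit 6d00fb9048 (a comma-separated list, with unsafe features gated behind an "UNSAFE:" prefix) could be parsed roughly as follows. This is a sketch, not libpq's private header: the function and flag names are hypothetical, only the two option names that appear in the commit message (`call-count` and `trace`) are recognized, and rejecting the whole value on an unknown or ungated option is an assumption about error handling.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits; only options named in the commit message are modeled. */
#define SKETCH_DBG_CALL_COUNT	(1u << 0)	/* safe */
#define SKETCH_DBG_TRACE		(1u << 1)	/* unsafe: prints secrets */
#define SKETCH_DBG_ALL			(SKETCH_DBG_CALL_COUNT | SKETCH_DBG_TRACE)

/*
 * Parse a PGOAUTHDEBUG-style value into *flags. Returns false, leaving
 * *flags at zero, if an option is unknown or if an unsafe option appears
 * without the "UNSAFE:" prefix (an assumed policy).
 */
static bool
sketch_parse_oauth_debug(const char *value, unsigned *flags)
{
	bool		unsafe_allowed = false;
	const char *p = value;

	*flags = 0;
	if (value == NULL || *value == '\0')
		return true;

	/* The legacy spelling enables everything. */
	if (strcmp(value, "UNSAFE") == 0)
	{
		*flags = SKETCH_DBG_ALL;
		return true;
	}

	if (strncmp(value, "UNSAFE:", 7) == 0)
	{
		unsafe_allowed = true;
		p += 7;
	}

	while (*p != '\0')
	{
		size_t		len = strcspn(p, ",");

		if (len == 10 && strncmp(p, "call-count", 10) == 0)
			*flags |= SKETCH_DBG_CALL_COUNT;
		else if (len == 5 && strncmp(p, "trace", 5) == 0 && unsafe_allowed)
			*flags |= SKETCH_DBG_TRACE;
		else
		{
			*flags = 0;
			return false;
		}

		p += len;
		if (*p == ',')
			p++;
	}
	return true;
}
```

In this sketch `PGOAUTHDEBUG=trace` is refused while `PGOAUTHDEBUG=UNSAFE:trace` is accepted, which mirrors the stated goal of making a developer pause before enabling a dangerous feature.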
Álvaro Herrera	e76d8c749c	Reserve replication slots specifically for REPACK Add a new GUC max_repack_replication_slots, which lets the user reserve some additional replication slots for concurrent repack (and only concurrent repack). With this, the user doesn't have to worry about changing the max_replication_slots in order to cater for use of concurrent repack. (We still use the same pool of bgworkers though, but that's less commonly a problem than slots.) Author: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com> Discussion: https://postgr.es/m/202604012148.nnnmyxxrr6nh@alvherre.pgsql	2026-04-07 16:55:29 +02:00
Heikki Linnakangas	979387f188	Fix harmless leftover in _hash_kill_items() Checking for 'havePin' is sufficient here. An earlier version of the patch didn't have the 'havePin' variable and used 'so->hashso_bucket_buf == so->currPos.buf' as the condition when both locking and unlocking the page. The havePin variable was added later during development, but the unlocking condition wasn't fully updated. Tidy it up. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/b9de8d05-3b02-4a27-9b0b-03972fa4bfd3@iki.fi	2026-04-07 17:38:11 +03:00
Andrew Dunstan	55890a9194	Add errdetail() with PID and UID about source of termination signal. When a backend is terminated via pg_terminate_backend() or an external SIGTERM, the error message now includes the sender's PID and UID as errdetail, making it easier to identify the source of unexpected terminations in multi-user environments. On platforms that support SA_SIGINFO (Linux, FreeBSD, and most modern Unix systems), the signal handler captures si_pid and si_uid from the siginfo_t structure. On platforms without SA_SIGINFO, the detail is simply omitted. Author: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Reviewed-by: Chao Li <1356863904@qq.com> Discussion: https://postgr.es/m/CAKZiRmyrOWovZSdixpLd3PGMQXuQL_zw2Ght5XhHCkQ1uDsxjw@mail.gmail.com	2026-04-07 10:22:33 -04:00
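Commit 55890a9194 relies on SA_SIGINFO to learn who sent a termination signal. The POSIX mechanism can be sketched as below. The `sketch_` names are hypothetical and this is not the backend's actual signal handling code; it only shows how `si_pid` and `si_uid` become available to a handler installed with `sigaction()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sender details recorded by the handler. Strictly portable code would limit
 * itself to sig_atomic_t objects; plain pid_t/uid_t is a simplification here.
 */
static volatile pid_t sketch_sender_pid = 0;
static volatile uid_t sketch_sender_uid = 0;
static volatile sig_atomic_t sketch_got_sigterm = 0;

/* Three-argument handler form, selected by SA_SIGINFO. */
static void
sketch_sigterm_handler(int signo, siginfo_t *info, void *context)
{
	(void) signo;
	(void) context;

	sketch_sender_pid = info->si_pid;
	sketch_sender_uid = info->si_uid;
	sketch_got_sigterm = 1;		/* set last, after the details are stored */
}

/* Install the handler for SIGTERM; returns 0 on success, -1 on failure. */
static int
sketch_install_sigterm_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sketch_sigterm_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);

	return sigaction(SIGTERM, &sa, NULL);
}
```

Once the flag is observed outside the handler, the recorded PID and UID can be attached to the error report. On platforms without SA_SIGINFO no such information exists, which is why the commit omits the detail there.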
Robert Haas	c10edb102a	pg_stash_advice: Allow stashed advice to be persisted to disk. If pg_stash_advice.persist = true, stashed advice will be written to pg_stash_advice.tsv in the data directory, periodically and at shutdown. On restart, stash modifications are locked out until this file has been reloaded, but queries will not be, so there may be a short window after startup during which previously-stashed advice is not automatically applied. Author: Robert Haas <rhaas@postgresql.org> Co-authored-by: Lukas Fittl <lukas@fittl.com> Discussion: https://postgr.es/m/CA+Tgmob87qsWa-VugofU6epuV0H5XjWZGMbQas4Q-ADKmvSyBg@mail.gmail.com	2026-04-07 10:11:36 -04:00
Andres Freund	29e7dbf5e4	Minimal fix for WAIT FOR ... MODE 'standby_flush' The investigation into the negative test performance impact of `7e8aeb9e48` led to discovering that there are a few issues with WAIT FOR. This commit is just a minimal fix to prevent hangs in standby_flush mode, due to WAIT FOR ... 'standby_flush' seeing a 0 LSN if a newly started walreceiver does not receive any writes, because the standby is already caught up. There are several other issues and this isn't necessarily the best fix. But this way we get the hangs out of the way. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/zqbppucpmkeqecfy4s5kscnru4tbk6khp3ozqz6ad2zijz354k@w4bdf4z3wqoz	2026-04-07 09:48:09 -04:00
Álvaro Herrera	8fb95a8ab6	doc: Add an example of REPACK (CONCURRENTLY) Suggested-by: vignesh C <vignesh21@gmail.com> Discussion: https://postgr.es/m/CALDaNm3tiKhtegx5Cawi34UjbHmNGEDNAtScGM1RgWRtV-5_0Q@mail.gmail.com	2026-04-07 15:33:55 +02:00
Heikki Linnakangas	9480c585df	Tidy up #ifdef USE_INJECTION_POINTS guards Remove unnecessary #ifdef guard around the function prototypes; they are already inside a larger #ifdef block. Move #include "subsystems.h" inside the USE_INJECTION_POINTS guard; it's needed for InjectionPointShmemCallbacks, which is also inside the guard. Reported-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://www.postgresql.org/message-id/87y0iz2c1v.fsf@wibble.ilmari.org	2026-04-07 16:18:31 +03:00
Álvaro Herrera	be142fa008	Fix tests under wal_level=minimal Buildfarm members which have been specifically configured to use wal_level=minimal fail the repack regression tests, which require wal_level=replica. Add a temp config file to fix that.	2026-04-07 15:14:32 +02:00
Heikki Linnakangas	257c8231bf	Modernize and optimize pg_buffercache_pages() Refactor pg_buffercache_pages() to use SFRM_Materialize mode and construct a tuplestore directly. That's simpler and more efficient than collecting all the data to a custom array first. Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Author: Palak Chaturvedi <chaturvedipalak1911@gmail.com> Discussion: https://www.postgresql.org/message-id/CAExHW5sMsaz1j+hrdhyo-DJp7JCgJx87=q2iJfOc_9mwYWyvmw@mail.gmail.com	2026-04-07 16:04:48 +03:00
Heikki Linnakangas	9f3755ea07	Optimize sorting and deduplicating trigrams Use templated qsort() so that the comparison function can be inlined. To speed up qunique(), use a specialized comparison function that only checks for equality. Author: David Geier <geidav.pg@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://www.postgresql.org/message-id/2a76b5ef-4b12-4023-93a1-eed6e64968f3@gmail.com	2026-04-07 14:11:25 +03:00
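The sort-then-deduplicate pattern from commit 9f3755ea07 can be sketched as follows. Note the substitution: this example uses the standard library `qsort()`, which cannot inline its comparator, rather than PostgreSQL's templated sort, and the `sketch_` type and function names are hypothetical. What it does show is the second half of the idea: once the array is sorted, the deduplication pass only needs an equality test, not a three-way ordering.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical trigram representation: three raw bytes. */
typedef struct
{
	unsigned char c[3];
} sketch_trgm;

/* Three-way comparison, needed only for sorting. */
static int
sketch_trgm_cmp(const void *a, const void *b)
{
	return memcmp(((const sketch_trgm *) a)->c,
				  ((const sketch_trgm *) b)->c, 3);
}

/* Cheaper equality-only test, sufficient for deduplicating sorted input. */
static int
sketch_trgm_equal(const sketch_trgm *a, const sketch_trgm *b)
{
	return a->c[0] == b->c[0] && a->c[1] == b->c[1] && a->c[2] == b->c[2];
}

/* Sort arr in place, drop adjacent duplicates, and return the new length. */
static size_t
sketch_sort_unique(sketch_trgm *arr, size_t n)
{
	size_t		i;
	size_t		j = 0;

	if (n <= 1)
		return n;

	qsort(arr, n, sizeof(sketch_trgm), sketch_trgm_cmp);

	for (i = 1; i < n; i++)
	{
		if (!sketch_trgm_equal(&arr[j], &arr[i]))
			arr[++j] = arr[i];
	}
	return j + 1;
}
```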
Tomas Vondra	884f9b3c76	Use add_size/mul_size for index instrumentation size calculations Use overflow-safe size arithmetic in the Index[Only]Scan and parallel instrumentation functions, consistent with other executor nodes (Hash, Sort, Agg, Memoize). This was an oversight in `dd78e69cfc`. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: Lukas Fittl <lukas@fittl.com> Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me	2026-04-07 12:47:28 +02:00
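The overflow-safe size arithmetic that commit 884f9b3c76 applies can be sketched as below. PostgreSQL's own add_size() and mul_size() report an error on overflow; the hypothetical `sketch_` helpers here return false instead so that the example stays self-contained, and the header-plus-per-worker layout is an illustrative assumption rather than the executor's actual structure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Add two sizes; returns false instead of wrapping around on overflow. */
static bool
sketch_add_size(size_t a, size_t b, size_t *result)
{
	if (a > SIZE_MAX - b)
		return false;
	*result = a + b;
	return true;
}

/* Multiply two sizes; returns false instead of wrapping around on overflow. */
static bool
sketch_mul_size(size_t a, size_t b, size_t *result)
{
	if (a != 0 && b > SIZE_MAX / a)
		return false;
	*result = a * b;
	return true;
}

/*
 * Hypothetical layout: a fixed header followed by one instrumentation slot
 * per worker. Every intermediate step is checked.
 */
static bool
sketch_instrumentation_size(size_t header, size_t per_worker,
							size_t nworkers, size_t *total)
{
	size_t		slots;

	if (!sketch_mul_size(per_worker, nworkers, &slots))
		return false;
	return sketch_add_size(header, slots, total);
}
```

The unchecked expression `header + per_worker * nworkers` would silently wrap for large inputs and yield an undersized allocation, which is the failure mode the checked helpers rule out.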

64028 commits