postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-04-15 22:10:45 -04:00

Author	SHA1	Message	Date
Melanie Plageman	dd78e69cfc	Allocate separate DSM chunk for parallel Index[Only]Scan instrumentation Previously, parallel index and index-only scans packed the parallel scan descriptor and shared instrumentation (for EXPLAIN ANALYZE) into a single DSM allocation. Since scans may be instrumented without being parallel-aware, and vice versa, using separate DSM chunks -- each with its own TOC key -- is cleaner. A future commit will extend this pattern to other scan node types. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/flat/a177a6dd-240b-455a-8f25-aca0b1c08c6e%40vondra.me	2026-04-06 19:10:19 -04:00
Daniel Gustafsson	b3a37ffbc5	Use PG_DATA_CHECKSUM_OFF instead of hardcoded value For a long time, the online checksums patchset kept the "off" state as literal zero without a label to be consistent with the previous coding which only had a label for the "on" state. Later, when an "off" label was made, not all uses in the code got the memo. Fix by setting these to PG_DATA_CHECKSUM_OFF. While there, fix a duplicate word in a comment introduced by the same commit. Author: Aleksander Alekseev <aleksander@tigerdata.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CAJ7c6TPRTnQFXXX1CRcYoTLXw2swtDH==uSz1MYoMKdLrKZHjA@mail.gmail.com	2026-04-06 22:11:53 +02:00
Álvaro Herrera	28d534e2ae	Add CONCURRENTLY option to REPACK When this flag is specified, REPACK no longer acquires access-exclusive lock while the new copy of the table is being created; instead, it creates the initial copy under share-update-exclusive lock only (same as vacuum, etc), and it follows an MVCC snapshot; it sets up a replication slot starting at that snapshot, and uses a concurrent background worker to do logical decoding starting at the snapshot to populate a stash of concurrent data changes. Those changes can then be re-applied to the new copy of the table just before swapping the relfilenodes. Applications can continue to access the original copy of the table normally until just before the swap, which is the only point at which the access-exclusive lock is needed. There are some loose ends in this commit: 1. concurrent repack needs its own replication slot in order to apply logical decoding, and replication slots are a scarce resource and easy to run out of. 2. due to the way the historic snapshot is initially set up, only one REPACK process can be running at any one time on the whole system. 3. there's a danger of deadlocking (and thus abort) due to the lock upgrade required at the final phase. These issues will be addressed in upcoming commits. The design and most of the code are by Antonin Houska, heavily based on his own pg_squeeze third-party implementation. 
Author: Antonin Houska <ah@cybertec.at> Co-authored-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com> Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Jim Jones <jim.jones@uni-muenster.de> Reviewed-by: Robert Treat <rob@xzilla.net> Reviewed-by: Noriyoshi Shinoda <noriyoshi.shinoda@hpe.com> Reviewed-by: vignesh C <vignesh21@gmail.com> Discussion: https://postgr.es/m/5186.1706694913@antos Discussion: https://postgr.es/m/202507262156.sb455angijk6@alvherre.pgsql	2026-04-06 21:55:08 +02:00
Masahiko Sawada	1ff3180ca0	Allow autovacuum to use parallel vacuum workers. Previously, autovacuum always disabled parallel vacuum regardless of the table's index count or configuration. This commit enables autovacuum workers to use parallel index vacuuming and index cleanup, using the same parallel vacuum infrastructure as manual VACUUM. Two new configuration options control the feature. The GUC autovacuum_max_parallel_workers sets the maximum number of parallel workers a single autovacuum worker may launch; it defaults to 0, preserving existing behavior unless explicitly enabled. The per-table storage parameter autovacuum_parallel_workers provides per-table limits. A value of 0 disables parallel vacuum for the table, a positive value caps the worker count (still bounded by the GUC), and -1 (the default) defers to the GUC. To handle cases where autovacuum workers receive a SIGHUP and update their cost-based vacuum delay parameters mid-operation, a new propagation mechanism is added to vacuumparallel.c. The leader stores its effective cost parameters in a DSM segment. Parallel vacuum workers poll for changes in vacuum_delay_point(); if an update is detected, they apply the new values locally via VacuumUpdateCosts(). A new test module, src/test/modules/test_autovacuum, is added to verify that parallel autovacuum workers are correctly launched and that cost-parameter updates are propagated as expected. The patch was originally proposed by Maxim Orlov, but the implementation has undergone significant architectural changes since then during the review process. 
Author: Daniil Davydov <3danissimo@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: zengman <zengman@halodbtech.com> Discussion: https://postgr.es/m/CACG=ezZOrNsuLoETLD1gAswZMuH2nGGq7Ogcc0QOE5hhWaw=cw@mail.gmail.com	2026-04-06 11:48:29 -07:00
Heikki Linnakangas	9b5acad3f4	Convert all remaining subsystems to use the new shmem allocation API This removes all remaining uses of ShmemInitStruct() and ShmemInitHash() from built-in code. Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com	2026-04-06 02:13:10 +03:00
Heikki Linnakangas	2e0943a859	Convert SLRUs to use the new shmem allocation functions I replaced the old SimpleLruInit() function without a backwards compatibility wrapper, because few extensions define their own SLRUs. Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com	2026-04-06 02:13:02 +03:00
Heikki Linnakangas	c6d55714ba	Use the new shmem allocation functions in a few core subsystems These subsystems have some complicating properties, making them slightly harder to convert than most: - The initialization callbacks of some of these subsystems have dependencies, i.e. they need to be initialized in the right order. - The ProcGlobal pointer still needs to be inherited by the BackendParameters mechanism on EXEC_BACKEND builds, because ProcGlobal is required by InitProcess() to get a PGPROC entry, and the PGPROC entry is required to use LWLocks, and usually attaching to shared memory areas requires the use of LWLocks. - Similarly, ProcSignal pointer still needs to be handled by BackendParameters, because query cancellation connections access it without calling InitProcess. Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com	2026-04-06 02:12:59 +03:00
Etsuro Fujita	de28140ded	postgres_fdw: Inherit the local transaction's access/deferrable modes. READ ONLY transactions should prevent modifications to foreign data as well as local data, but postgres_fdw transactions declared as READ ONLY that reference foreign tables mapped to a remote view executing volatile functions would modify data on remote servers, as it would open remote transactions in READ WRITE mode. Similarly, DEFERRABLE transactions should not abort due to a serialization failure even when accessing foreign data, but postgres_fdw transactions declared as DEFERRABLE would abort due to that failure in a remote server, as it would open remote transactions in NOT DEFERRABLE mode. To fix, modify postgres_fdw to open remote transactions in the same access/deferrable modes as the local transaction. This commit also modifies it to open remote subtransactions in the same access mode as the local subtransaction. This commit changes the behavior of READ ONLY/DEFERRABLE transactions using postgres_fdw; in particular, it doesn't allow the READ ONLY transactions to modify data on remote servers anymore, so such transactions should be redeclared as READ WRITE or rewritten using other tools like dblink. The release notes should note this as an incompatibility. These issues exist since the introduction of postgres_fdw, but to avoid the incompatibility in the back branches, fix them in master only. Author: Etsuro Fujita <etsuro.fujita@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Discussion: https://postgr.es/m/CAPmGK16n_hcUUWuOdmeUS%2Bw4Q6dZvTEDHb%3DOP%3D5JBzo-M3QmpQ%40mail.gmail.com Discussion: https://postgr.es/m/E1uLe9X-000zsY-2g%40gemulon.postgresql.org	2026-04-05 18:55:00 +09:00
Peter Geoghegan	2d3490dd99	heapam: Keep buffer pins across index scan resets. Avoid dropping the heap page pin (xs_cbuf) and visibility map pin (xs_vmbuffer) within heapam_index_fetch_reset. Retaining these pins saves cycles during certain nested loop joins and merge joins that frequently restore a saved mark: cases where the next tuple fetched after a reset often falls on the same heap page will now avoid the cost of repeated pinning and unpinning. Avoiding dropping the scan's heap page buffer pin is preparation for an upcoming patch that will add I/O prefetching to index scans. Testing of that patch (which makes heapam tend to pin more buffers concurrently than was typical before now) shows that the aforementioned cases get a small but clearly measurable benefit from this optimization. Upcoming work to add a slot-based table AM interface for index scans (which is further preparation for prefetching) will move VM checks for index-only scans out of the executor and into heapam. That will expand the role of xs_vmbuffer to include VM lookups for index-only scans (the field won't just be used for setting pages all-visible during on-access pruning via the enhancement recently introduced by commit `b46e1e54`). Avoiding dropping the xs_vmbuffer pin will preserve the historical behavior of nodeIndexonlyscan.c, which always kept this pin on a rescan; that aspect of this commit isn't really new. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAH2-Wz=g=JTSyDB4UtB5su2ZcvsS7VbP+ZMvvaG6ABoCb+s8Lw@mail.gmail.com	2026-04-04 13:49:37 -04:00
Peter Geoghegan	c7d09595e4	heapam: Track heap block in IndexFetchHeapData. Add an explicit BlockNumber field (xs_blk) to IndexFetchHeapData that tracks which heap block is currently pinned in xs_cbuf. heapam_index_fetch_tuple now uses xs_blk to determine when buffer switching is needed, replacing the previous approach that compared buffer identities via ReleaseAndReadBuffer on every non-HOT-chain call. This is preparatory work for an upcoming commit that will add index prefetching using a read stream. Delegating the release of a currently pinned buffer to ReleaseAndReadBuffer won't work anymore -- at least not when the next buffer that the scan needs to pin is one returned by read_stream_next_buffer (not a buffer returned by ReadBuffer). Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAH2-Wz=g=JTSyDB4UtB5su2ZcvsS7VbP+ZMvvaG6ABoCb+s8Lw@mail.gmail.com	2026-04-04 11:45:33 -04:00
Peter Geoghegan	a29fdd6c8d	Move heapam_handler.c index scan code to new file. Move the heapam index fetch callbacks (index_fetch_begin, index_fetch_reset, index_fetch_end, and index_fetch_tuple) into a new dedicated file. Also move heap_hot_search_buffer over. This is a purely mechanical move with no functional impact. Upcoming work to add a slot-based table AM interface for index scans will substantially expand this code. Keeping it in heapam_handler.c would clutter a file whose primary role is to wire up the TableAmRoutine callbacks. Bitmap heap scans and sequential scans would benefit from similar separation in the future. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/bmbrkiyjxoal6o5xadzv5bveoynrt3x37wqch7w3jnwumkq2yo@b4zmtnrfs4mh	2026-04-04 11:30:41 -04:00
Peter Geoghegan	1adff1a0c5	Rename heapam_index_fetch_tuple argument for clarity. Rename heapam_index_fetch_tuple's call_again argument to heap_continue, for consistency with the pointed-to variable name (IndexScanDescData's xs_heap_continue field). Preparation for an upcoming commit that will move index scan related heapam functions into their own file. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/bmbrkiyjxoal6o5xadzv5bveoynrt3x37wqch7w3jnwumkq2yo@b4zmtnrfs4mh	2026-04-04 11:30:05 -04:00
Daniel Gustafsson	f19c0eccae	Online enabling and disabling of data checksums This allows data checksums to be enabled, or disabled, in a running cluster without restricting access to the cluster during processing. Previously, data checksums could only be enabled during initdb or, when the cluster is offline, using the pg_checksums app. This commit introduces functionality to enable, or disable, data checksums while the cluster is running regardless of how it was initialized. A background worker launcher process is responsible for launching a dynamic per-database background worker which will mark all buffers dirty for all relations with storage in order for them to have data checksums calculated on write. Once all relations in all databases have been processed, the data_checksums state will be set to on and the cluster will at that point be identical to one which had data checksums enabled during initialization or via offline processing. When data checksums are being enabled, concurrent I/O operations from backends other than the data checksums worker will write the checksums but not verify them on reading. Only when all backends have absorbed the procsignalbarrier for setting data_checksums to on will they also start verifying checksums on reading. The same process is repeated during disabling; all backends write checksums but do not verify them until the barrier for setting the state to off has been absorbed by all. This in-progress state is used to ensure there are no false negatives (or positives) due to reading a checksum which is not in sync with the page. A new test module, test_checksums, is introduced with an extensive set of tests covering both online and offline data checksum mode changes. The tests which run concurrent pgbench during online processing are gated behind the PG_TEST_EXTRA flag due to being very expensive to run. Two levels of PG_TEST_EXTRA flags exist to turn on a subset of the expensive tests, or the full suite of multiple runs. 
This work is based on an earlier version of this patch which was reviewed by, among others, Heikki Linnakangas, Robert Haas, Andres Freund, Tomas Vondra, Michael Banck and Andrey Borodin. During the work on this new version, Tomas Vondra has given invaluable assistance with not only coding and reviewing but also very in-depth testing. Author: Daniel Gustafsson <daniel@yesql.se> Author: Magnus Hagander <magnus@hagander.net> Co-authored-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CABUevExz9hUUOLnJVr2kpw9Cx=o4MCr1SVKwbupzuxP7ckNutA@mail.gmail.com Discussion: https://postgr.es/m/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de Discussion: https://postgr.es/m/CABUevEwE3urLtwxxqdgd5O2oQz9J717ZzMbh+ziCSa5YLLU_BA@mail.gmail.com	2026-04-03 22:58:51 +02:00
Fujii Masao	5770679918	Remove redundant SetLatch() calls in interrupt handling functions Interrupt handling functions (e.g., HandleCatchupInterrupt(), HandleParallelApplyMessageInterrupt()) are called only by procsignal_sigusr1_handler(), which already calls SetLatch() for the current process at the end of its processing. Therefore, these interrupt handling functions do not need to call SetLatch() themselves. However, previously, some of these functions redundantly called SetLatch(). This commit removes those unnecessary calls. While duplicate SetLatch() calls are redundant, they are harmless, so this change is not backpatched. Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Discussion: https://postgr.es/m/CALj2ACWd5apddj6Cd885WwJ6LquYu_G81C4GoR4xSoDV1x-FEA@mail.gmail.com	2026-04-02 23:55:30 +09:00
David Rowley	331d829e62	Fix nocachegetattr() so it again supports deforming cstrings `c456e3911` added various optimizations to the tuple deformation routines. One optimization assumed that heap tuples would never contain cstrings. That optimization also made its way into nocachegetattr(), which isn't correct as ROW() types get formed into HeapTuples by ExecEvalRow() and those can contain cstring Datums. nocachegetattr() gets used to extract Datums from those tuples. Here we remove the pg_assume(), which was there to instruct the compiler to omit the attlen == -2 related code in att_addlength_pointer(). Author: David Rowley <dgrowleyml@gmail.com> Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/80aeac57-8f50-4732-a5b4-c2373c3f8149@gmail.com	2026-04-02 14:11:17 +13:00
Álvaro Herrera	db89a47115	Give an 'options' parameter to tuple_delete/_update The tuple_insert() method already has an equivalent argument, so this makes sense just on consistency grounds, for future growth. table_delete() can immediately use it to carry the 'changingPart' boolean; for table_update we don't have any options at present. Author: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> (older version) Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Antonin Houska <ah@cybertec.at> Discussion: https://postgr.es/m/202603171606.kf6pmhscqbqz@alvherre.pgsql	2026-04-01 20:26:57 +02:00
Álvaro Herrera	ec2f81766a	Fix vicinity of tuple_insert to use uint32, not int, for options Oversight in commit `1bd6f22f43`: I was way too optimistic about the compiler letting me know what variables needed to be updated, and missed a few of them. Clean it up. Author: Álvaro Herrera <alvherre@kurilemu.de> Reported-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/40E570EE-5A60-49D8-B8F7-2F8F2B7C8DFA@gmail.com	2026-04-01 18:14:51 +02:00
Nathan Bossart	771fe0948c	Avoid including vacuum.h in tableam.h and heapam.h. Commit `2252fcd427` modified some function prototypes in tableam.h and heapam.h to take a VacuumParams argument instead of a pointer, which required including vacuum.h in those headers. vacuum.h has a reasonably large dependency tree, and headers like tableam.h are widely included, so this is not ideal. To fix, change the functions in question to accept a "const VacuumParams *" argument instead. That allows us to use a forward declaration for VacuumParams and avoid including vacuum.h. Since vacuum_rel() needs to scribble on the params argument, we still pass it by value to that function so that the original struct is not modified. Reported-by: Andres Freund <andres@anarazel.de> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/rzxpxod4c4la62yvutyrvgoyilrl2fx55djaf2suidy7np5m6c%403l2ln476eadh	2026-03-31 12:43:52 -05:00
Tom Lane	fb7a9050d5	Doc: improve explanation of GiST compress/decompress methods. The docs previously didn't explain that leaf and non-leaf keys could be treated differently, even though many of our opclasses do exactly that. It also wasn't explained how that relates to the STORAGE option, particularly since only one storage type can be specified for both leaf and non-leaf keys. While here, reorganize the text slightly, rather than sticking additional detail into what's supposed to be a brief summary paragraph. Author: Paul A Jungwirth <pj@illuminatedcomputing.com> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA+renyWs5Np+FLSYfL+eu20S4U671A3fQGb-+7e22HLrD1NbYw@mail.gmail.com	2026-03-31 11:23:26 -04:00
Daniel Gustafsson	097ab69d17	Formalize WAL record for XLOG_CHECKPOINT_REDO XLOG_CHECKPOINT_REDO only contains the wal_level copied straight in without an encapsulating record structure. While it works, it makes future uses of XLOG_CHECKPOINT_REDO hard as there is nowhere to put new data items. This fix was inspired by the online checksums patch which adds data to this record, but this change has value on its own. Author: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/c92b5d8b-bc03-47bc-b209-2e4a719eee32@iki.fi	2026-03-31 09:38:01 +02:00
Nathan Bossart	bab2f27eaa	Remove bits* typedefs. In addition to removing the bits8, bits16, and bits32 typedefs, this commit replaces all uses with uint8, uint16, or uint32. bits* provided little benefit beyond establishing the intent of the variable, and they were inconsistently used for that purpose. Third-party code should instead use the corresponding uint* typedef. Suggested-by: Andres Freund <andres@anarazel.de> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://postgr.es/m/absbX33E4eaA0Ity%40nathan	2026-03-30 16:12:08 -05:00
Melanie Plageman	378a216187	Set pd_prune_xid on insert Now that on-access pruning can update the visibility map (VM) during read-only queries, set the page’s pd_prune_xid hint during INSERT and on the new page during UPDATE. This allows heap_page_prune_and_freeze() to set the VM the first time a page is read after being filled with tuples. This may avoid I/O amplification by setting the page all-visible when it is still in shared buffers and allowing later vacuums to skip scanning the page. It also enables index-only scans of newly inserted data much sooner. As a side benefit, this addresses a long-standing note in heap_insert() and heap_multi_insert(): aborted inserts can now be pruned on-access rather than lingering until the next VACUUM. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com	2026-03-30 16:07:11 -04:00
Melanie Plageman	b46e1e54d0	Allow on-access pruning to set pages all-visible Many queries do not modify the underlying relation. For such queries, if on-access pruning occurs during the scan, we can check whether the page has become all-visible and update the visibility map accordingly. Previously, only vacuum and COPY FREEZE marked pages as all-visible or all-frozen. This commit implements on-access VM setting for sequential scans, tid range scans, sample scans, bitmap heap scans, and the underlying heap relation in index scans. Setting the visibility map on-access can avoid write amplification caused by vacuum later needing to set the page all-visible, which could trigger a write and potentially an FPI. It also allows more frequent index-only scans, since they require pages to be marked all-visible in the VM. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com	2026-03-30 15:47:07 -04:00
Álvaro Herrera	349bd88202	Don't use bits32 in table AM interface Seems there's near-universal dislike for the bitsXX typedefs. Revert that part of commit `1bd6f22f43` in favor of using plain uint32.	2026-03-30 19:06:33 +02:00
Melanie Plageman	dcd8cc1c85	Thread flags through begin-scan APIs Add an AM user-settable flags parameter to several of the table scan functions, one table AM callback, and index_beginscan(). This allows users to pass additional context to be used when building the scan descriptors. For index scans, a new flags field is added to IndexFetchTableData, and the heap AM saves the caller-provided flags there. This introduces an extension point for follow-up work to pass per-scan information (such as whether the relation is read-only for the current query) from the executor to the AM layer. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/2be31f17-5405-4de9-8d73-90ebc322f7d8%40vondra.me	2026-03-30 12:27:24 -04:00
Álvaro Herrera	1bd6f22f43	Have table_insert and siblings use an unsigned type for options Using signed types can lead to bugs, such as the one fixed by commit `2a2e1b470b`. Discussion: https://postgr.es/m/44e6ze3kuunhky63wmfjxrmn72pds2whwf5ok6hpz7c4my7k2h@l65zhpcuasnf	2026-03-30 13:58:16 +02:00
Andres Freund	8df3c48e46	Use UnlockReleaseBuffer() in more places An upcoming commit will make UnlockReleaseBuffer() considerably faster and more scalable than doing LockBuffer(BUFFER_LOCK_UNLOCK); ReleaseBuffer();. But it's a small performance benefit even as-is. Most of the callsites changed in this patch are not performance sensitive; however, some, like the nbtree ones, are in critical paths. This patch changes all the easily convertible places over to UnlockReleaseBuffer() mainly because I needed to check all of them anyway, and reducing cases where the operations are done separately makes the checking easier. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/5ubipyssiju5twkb7zgqwdr7q2vhpkpmuelxfpanetlk6ofnop@hvxb4g2amb2d	2026-03-27 15:56:29 -04:00
Andres Freund	41d3d64e87	bufmgr: Don't copy pages while writing out After the series of preceding commits introducing and using BufferBeginSetHintBits()/BufferSetHintBits16(), hint bits are not set anymore while IO is going on. Therefore we do not need to copy pages while they are being written out anymore. For the same reason XLogSaveBufferForHint() now does not need to operate on a copy of the page anymore, but can instead use the normal XLogRegisterBuffer() mechanism. For that the assertions and comments to XLogRegisterBuffer() had to be updated to allow share-exclusive locked buffers to be registered. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/5ubipyssiju5twkb7zgqwdr7q2vhpkpmuelxfpanetlk6ofnop@hvxb4g2amb2d	2026-03-27 15:56:29 -04:00
Fujii Masao	400a790a48	Avoid sending duplicate WAL locations in standby status replies Previously, when the startup process applied WAL and requested walreceiver to send an apply notification to the primary, walreceiver sent a status reply unconditionally, even if the WAL locations had not advanced since the previous update. As a result, the standby could send two consecutive status reply messages with identical WAL locations even though wal_receiver_status_interval had not yet elapsed. This could unexpectedly reset the reported replication lag, making it difficult for users to monitor lag. The second message was also unnecessary because it reported no progress. This commit updates walreceiver to send a reply only when the apply location has advanced since the last status update, even when the startup process requests a notification. Author: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Shinya Kato <shinya11.kato@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAOzEurTzcUrEzrH97DD7+Yz=HGPU81kzWQonKZvqBwYhx2G9_A@mail.gmail.com	2026-03-26 20:54:32 +09:00
Michael Paquier	4287c50fc2	Improve timeout handling of pg_promote() Previously, pg_promote() looped a fixed number of times, calculated from the specified timeout, and waited 100ms on a latch, once per iteration, for the promotion of a standby to complete. However, unrelated signals to the backend could set the latch and wake up the backend early, resulting in a faster consumption of the loops and an execution time of the function that does not match the timeout given as input. This could be confusing for the function caller, especially if some backend-side timeout is aggressive, because the function would return much earlier than expected and report that the promote request has not completed within the time requested. This commit refines the logic to track the time actually elapsed, by looping until the requested duration has truly passed. The code calculates the end time we expect, then uses it when looping. Author: Robert Pang <robertpang@google.com> Reviewed-by: Tiancheng Ge <getiancheng_2012@163.com> Discussion: https://postgr.es/m/CAJhEC07OK8J7tLUbyiccnuOXRE7UKxBNqD2-pLfeFXa=tBoWtw@mail.gmail.com	2026-03-26 10:39:40 +09:00
Melanie Plageman	a881cc9c7e	Remove XLOG_HEAP2_VISIBLE entirely There are no remaining users that emit XLOG_HEAP2_VISIBLE records, so it can be removed. This includes deleting the xl_heap_visible struct and all functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE records. Bumps XLOG_PAGE_MAGIC because we removed a WAL record type. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com	2026-03-24 17:58:12 -04:00
Melanie Plageman	a759ced2f1	WAL log VM setting for empty pages in XLOG_HEAP2_PRUNE_VACUUM_SCAN As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now marks empty pages all-visible and all-frozen in a XLOG_HEAP2_PRUNE_VACUUM_SCAN record. This has no real independent benefit, but empty pages were the last user of XLOG_HEAP2_VISIBLE, so by making this change we can next remove all of the XLOG_HEAP2_VISIBLE code. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Earlier version Reviewed-by: Robert Haas <robertmhaas@gmail.com>	2026-03-24 17:30:54 -04:00
Melanie Plageman	1252a4ee28	WAL log VM setting during vacuum phase I in XLOG_HEAP2_PRUNE_VACUUM_SCAN Vacuum no longer emits a separate WAL record for each page set all-visible or all-frozen during phase I. Instead, visibility map updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that is already emitted for pruning and freezing. Previously, heap_page_prune_and_freeze() determined whether a page was all-visible, but the corresponding VM bits were only set later in lazy_scan_prune(). Now the VM is updated immediately in heap_page_prune_and_freeze(), at the same time as the heap modifications. This reduces WAL volume produced by vacuum. For now, vacuum is still the only user of heap_page_prune_and_freeze() allowed to set the VM. On-access pruning is not yet able to set the VM. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Earlier version Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com	2026-03-24 16:49:46 -04:00
Melanie Plageman	9ba3ec076a	Keep newest live XID up-to-date even if page not all-visible During pruning, we keep track of the newest xmin of live tuples on the page visible to all running and future transactions so that we can use it later as the snapshot conflict horizon when setting the VM if the page turns out to be all-visible. Previously, we stopped updating this value once we determined the page was not all-visible. However, maintaining it even when the page is not all-visible is inexpensive and makes the snapshot conflict horizon calculation clearer. This guarantees it won't contain a stale value. Since we'll keep it up to date all the time now anyway, there's no reason not to maintain set_all_visible for on-access pruning. This will allow us to set the VM on-access in the future. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk	2026-03-24 15:37:18 -04:00
Melanie Plageman	dd5716f3c7	Use GlobalVisState in vacuum to determine page level visibility During vacuum's first and third phases, we examine tuples' visibility to determine if we can set the page all-visible in the visibility map. Previously, this check compared tuple xmins against a single XID chosen at the start of vacuum (OldestXmin). We now use GlobalVisState, which enables future work to set the VM during on-access pruning, since ordinary queries have access to GlobalVisState but not OldestXmin. This also benefits vacuum: in some cases, GlobalVisState may advance during a vacuum, allowing more pages to become considered all-visible. And, in the future, we could easily add a heuristic to update GlobalVisState more frequently during vacuums of large tables. OldestXmin is still used for freezing and as a backstop to ensure we don't freeze a dead tuple that wasn't yet prunable according to GlobalVisState in the rare occurrences where GlobalVisState moves backwards. Because comparing a transaction ID against GlobalVisState is more expensive than comparing against a single XID, we defer this check until after scanning all tuples on the page. Therefore, we perform the GlobalVisState check only once per page. This is safe because visibility_cutoff_xid records the newest live xmin on the page; if it is globally visible, then the entire page is all-visible. Using GlobalVisState means on-access pruning can also maintain visibility_cutoff_xid, which is required to set the visibility map on-access in the future. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/flat/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk#c755ef151507aba58471ffaca607e493	2026-03-24 14:50:59 -04:00
Álvaro Herrera	2102ebb195	Don't include storage/lock.h in so many headers Since storage/locktags.h was added by commit `322bab7974`, many headers can be made leaner by depending on that instead of on storage/lock.h, which has many other dependencies. (In fact, some of these changes were possible even before that.) Author: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/abvrRZo52Yx9ZzWQ@ip-10-97-1-34.eu-west-3.compute.internal	2026-03-24 17:11:12 +01:00
Fujii Masao	1c162c965a	Report detailed errors from XLogFindNextRecord() failures. Previously, XLogFindNextRecord() did not return detailed error information when it failed to find a valid WAL record. As a result, callers such as the WAL summarizer, pg_waldump, and pg_walinspect could only report generic errors (e.g., "could not find a valid record after ..."), making troubleshooting difficult. This commit fixes the issue by extending XLogFindNextRecord() to return detailed error information on failure, and updating its callers to include those details in their error messages. For example, when pg_waldump is run on a WAL file with an invalid magic number, it now reports not only the generic error but also the specific cause (e.g., "invalid magic number"). Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com> Reviewed-by: Japin Li <japinli@hotmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Discussion: https://postgr.es/m/CAO6_XqoxJXddcT4wkd9Xd+cD6Sz-fyspRGuV4Bq-wbXG4pVNzA@mail.gmail.com	2026-03-24 22:33:09 +09:00
Robert Haas	c98ad086ad	Bounds-check access to TupleDescAttr with an Assert. The second argument to TupleDescAttr should always be at least zero and less than natts; otherwise, we index outside of the attribute array. Assert that this is the case. Various violations, or possible violations, of this rule that are currently in the tree are actually harmless, because while we do call TupleDescAttr() before verifying that the argument is within range, we don't actually dereference it unless the argument was within range all along. Nonetheless, the Assert means we should be more careful, so tidy up accordingly. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: http://postgr.es/m/CA+TgmoacixUZVvi00hOjk_d9B4iYKswWP1gNqQ8Vfray-AcOCA@mail.gmail.com	2026-03-24 08:58:50 -04:00
Peter Geoghegan	e5836f7b7d	Add fake LSN support to hash index AM. Use fake LSNs in all hash AM critical sections that write a WAL record. This gives us a reliable way (a way that works during scans of both logged and unlogged relations) to detect when an index page was concurrently modified during the window between when the page is initially read (by _hash_readpage) and when the page has any known-dead items LP_DEAD-marked (by _hash_kill_items). Preparation for an upcoming patch that makes the hash index AM use the amgetbatch interface, enabling I/O prefetching during hash index scans. The amgetbatch design imposes certain rules on index AMs with respect to how they hold on to index page buffer pins (at least in the case of pins held as an interlock against unsafe concurrent TID recycling by VACUUM). These rules have consequences for routines that set LP_DEAD bits on index tuples from an amgetbatch index AM: such routines have an inherent need to reason about concurrent TID recycling by VACUUM, but can no longer rely on their amgettuple routine holding on to a buffer pin (during the aforementioned window) as an interlock against such recycling. Instead, they have to follow a new, standardized approach. The new approach taken by amgetbatch index AMs when setting LP_DEAD bits is heavily based on the current nbtree dropPin design, which was added by commit `2ed5b87f`. It also works by checking if the page's LSN advanced during the window where unsafe concurrent TID recycling might have taken place. This commit is similar to commit `8a879119`, which taught nbtree to use fake LSNs to improve its dropPin behavior. However, unlike that commit, this is not an independently useful enhancement, since hash doesn't implement anything like nbtree's dropPin behavior (not yet). 
Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAH2-WzkehuhxyuA8quc7rRN3EtNXpiKsjPfO8mhb+0Dr2K0Dtg@mail.gmail.com	2026-03-22 17:31:43 -04:00
Melanie Plageman	01b7e4a46d	Add pruning fast path for all-visible and all-frozen pages Because of the SKIP_PAGES_THRESHOLD optimization or a stale prune XID, heap_page_prune_and_freeze() can be invoked for pages with no pruning or freezing work to do. To avoid this, if a page is already all-frozen or it is all-visible and no freezing will be attempted, exit early. We can't exit early if vacuum passed DISABLE_PAGE_SKIPPING, though. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk	2026-03-22 15:46:50 -04:00
Melanie Plageman	4f7ecca84d	Detect and fix visibility map corruption in more cases Move VM corruption detection and repair into heap page pruning. This allows VM repair during on-access pruning, not only during vacuum. Also, expand corruption detection to cover pages marked all-visible that contain dead tuples and tuples inserted or deleted by in-progress transactions, rather than only all-visible pages with LP_DEAD items. Pinning the correct VM page before on-access pruning is cheap when compared to the cost of actually pruning. The vmbuffer is saved in the scan descriptor, so a query should only need to pin each VM page once, and a single VM page covers a large number of heap pages. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk	2026-03-22 11:52:40 -04:00
Heikki Linnakangas	516310ed4d	Don't reset 'latest_page_number' when replaying multixid truncation 'latest_page_number' is set to the correct value, according to nextOffset, early at system startup. Contrary to the comment, it hence should be set up correctly by the time we get to WAL replay. This was committed to back-branches earlier already (commit `817f74600d`), to fix a bug in a backwards-compatibility codepath. We don't have that bug on 'master', but the change nevertheless makes sense on 'master' too. Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://www.postgresql.org/message-id/20260214090150.GC2297@p46.dedyn.io;lightning.p46.dedyn.io Discussion: https://www.postgresql.org/message-id/e1787b17-dc93-4621-a5a1-c713d1ac6a1b@iki.fi	2026-03-22 14:23:54 +02:00
Nathan Bossart	48f11bfa06	Bump transaction/multixact ID warning limits to 100M. These warning limits were last changed to 40M by commit `cd5e82256d`. For the benefit of workloads that rapidly consume transactions or multixacts, this commit bumps the limits to 100M. This will hopefully give users enough time to react. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Shinya Kato <shinya11.kato@gmail.com> Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com> Discussion: https://postgr.es/m/aRdhSSFb9zZH_0zc%40nathan	2026-03-20 14:15:33 -05:00
Nathan Bossart	e646450e60	Add percentage of available IDs to wraparound warnings. This commit adds DETAIL messages to the existing wraparound WARNINGs that include the percentage of transaction/multixact IDs that remain available for use. The hope is that this more clearly expresses the urgency of the situation. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Shinya Kato <shinya11.kato@gmail.com> Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com> Discussion: https://postgr.es/m/aRdhSSFb9zZH_0zc%40nathan	2026-03-20 14:15:33 -05:00
Masahiko Sawada	adcdbe9386	Add parallel vacuum worker usage to VACUUM (VERBOSE) and autovacuum logs. This commit adds both the number of parallel workers planned and the number of parallel workers actually launched to the output of VACUUM (VERBOSE) and autovacuum logs. Previously, this information was only reported as an INFO message during VACUUM (VERBOSE), which meant it was not included in autovacuum logs in practice. Although autovacuum does not yet support parallel vacuum, a subsequent patch will enable it and utilize these logs in its regression tests. This change also improves observability by making it easier to verify if parallel vacuum is utilizing the expected number of workers. Author: Daniil Davydov <3danissimo@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/CACG=ezZOrNsuLoETLD1gAswZMuH2nGGq7Ogcc0QOE5hhWaw=cw@mail.gmail.com	2026-03-19 15:01:47 -07:00
Andres Freund	f5eb854ab6	Fix use of wrong variable in _hash_kill_items() In `82467f627b` I somehow ended up using 'so->currPos.buf' instead of the 'buf' variable, which is incorrect when the buffer is not already pinned. At the very least this can lead to assertion failures. Unfortunately this shows that this code path was not covered. Expand src/test/modules/index/specs/killtuples.spec to test it. Until now the 'result' step always reported either 0 or 1 buffer accesses, but when exercising hash overflows, more buffers are accessed. To avoid depending on the precise number of accesses, change the result step to return whether there were any heap accesses. That makes the change a lot more verbose, but still seems worth it. Reported-by: Alexander Kuzmenkov <akuzmenkov@tigerdata.com> Reported-by: Alexander Lakhin <exclusion@gmail.com> Reported-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/vjtmvwvbxt7w5uyacxpzibpj65ewcb7uqaqbhd4arvnjbp5jqz%405ksdh6fsyqve Discussion: https://postgr.es/m/b9de8d05-3b02-4a27-9b0b-03972fa4bfd3@iki.fi	2026-03-17 14:54:41 -04:00
David Rowley	d8a859d22b	Reduce size of CompactAttribute struct to 8 bytes Previously, this was 16 bytes. With the use of some bitflags and by reducing the attcacheoff field size to a 16-bit type, we can halve the size of the struct. It's unlikely that caching the offsets for offsets larger than what will fit in a 16-bit int will help much as the tuple is very likely to have some non-fixed-width types anyway, the offsets of which we cannot cache. Shrinking this down to 8 bytes helps by accessing fewer cachelines when performing tuple deformation. The fields used there are all fully fledged fields, which don't require any bitmasking to extract the value of. It also helps to more efficiently calculate the address of a compact_attrs[] element in TupleDesc as the x86 LEA instruction can work with 8 byte offsets, which allows the element address to be calculated from the TupleDesc's address in a single instruction using LEA's concurrent shift and add. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/CAApHDvodSVBj3ypOYbYUCJX%2BNWL%3DVZs63RNBQ_FxB_F%2B6QXF-A%40mail.gmail.com	2026-03-17 15:06:31 +13:00
David Rowley	7a2ab122a1	Fix thinko in nocachegetattr() and nocache_index_getattr() This code was recently adjusted by `c456e3911`, but that commit didn't get the logic correct when finding the attnum to start walking the tuple in. If there is a NULL, we need to start walking the tuple before it. Author: David Rowley <dgrowleyml@gmail.com> Reported-by: Tender Wang <tndrwang@gmail.com> Discussion: https://postgr.es/m/CAHewXNnb-s_=VdVUZ9h7dPA0u3hxV8x2aU3obZytnqQZ_MiROA@mail.gmail.com	2026-03-17 09:00:39 +13:00
Álvaro Herrera	fba4233c83	Reduce header inclusions via execnodes.h Remove a bunch of #include lines from execnodes.h. Most of these require suitable typedefs to be added, so that it still compiles standalone. In one case, the fix is to move a struct definition to the one .c file where it is needed. Also some light clean up in plannodes.h and genam.h, though not as extensive as in execnodes.h. Author: Álvaro Herrera <alvherre@kurilemu.de> Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/202603131240.ihwqdxnj7w2o@alvherre.pgsql	2026-03-16 14:34:57 +01:00
Michael Paquier	bfa3c4f106	Optimize hash index bulk-deletion with streaming read This commit refactors hashbulkdelete() to use streaming reads, improving the efficiency of the operation by prefetching upcoming buckets while processing a current bucket. There are some specific changes required to make sure that the cleanup work happens in accordance with the data pushed to the stream read callback. When the cached metadata page is refreshed to be able to process the next set of buckets, the stream is reset and the data fed to the stream read callback has to be updated. The reset needs to happen in two code paths, when _hash_getcachedmetap() is called. The author has seen better performance numbers than myself on this one (with tweaks similar to `6c228755ad`). The numbers are good enough for both of us that this change is worth doing, in terms of IO and runtime. Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/CABPTF7VrqfbcDXqGrdLQ2xaQ=K0RzExNuw6U_GGqzSJu32wfdQ@mail.gmail.com	2026-03-16 09:22:09 +09:00
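
Commit dd5716f3c7 above defers the GlobalVisState comparison until all tuples on a page have been scanned, so the expensive check runs once per page against the newest live xmin. The following is only a rough sketch of that pattern, not the PostgreSQL code: every name here (`SketchXid`, `sketch_page_all_visible`, the callback type, `SketchHorizon`) is hypothetical, the horizon test is reduced to a single modulo-2^32 comparison, and the other conditions a page must meet (such as having no dead tuples) are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t SketchXid;		/* hypothetical stand-in for a transaction ID */

/* Modulo-2^32 comparison: true if a is logically newer than b. */
static bool
sketch_xid_follows(SketchXid a, SketchXid b)
{
	return (int32_t) (a - b) > 0;
}

/* Hypothetical callback standing in for the expensive horizon test. */
typedef bool (*SketchVisibleToAll) (SketchXid xid, void *state);

/*
 * Track the newest xmin among the live tuples, then ask the expensive
 * question exactly once. If the newest live xmin is visible to everyone,
 * every older live xmin on the page is too.
 */
static bool
sketch_page_all_visible(const SketchXid *live_xmins, size_t ntuples,
						SketchVisibleToAll visible_to_all, void *state)
{
	SketchXid	newest;

	assert(visible_to_all != NULL);

	if (ntuples == 0)
		return true;			/* nothing on the page to be invisible */

	newest = live_xmins[0];
	for (size_t i = 1; i < ntuples; i++)
	{
		if (sketch_xid_follows(live_xmins[i], newest))
			newest = live_xmins[i];
	}

	return visible_to_all(newest, state);
}

/* Toy horizon: an xid is visible to all if it precedes oldest_running. */
typedef struct SketchHorizon
{
	SketchXid	oldest_running;
	int			calls;			/* how often the expensive check ran */
} SketchHorizon;

static bool
sketch_horizon_check(SketchXid xid, void *state)
{
	SketchHorizon *h = (SketchHorizon *) state;

	h->calls++;
	return sketch_xid_follows(h->oldest_running, xid);
}
```

Passing `sketch_horizon_check` with a `SketchHorizon` as `state` makes it easy to confirm that the callback runs once per page regardless of how many tuples the page holds.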
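
Commits 48f11bfa06 and e646450e60 above raise the wraparound warning limit to 100M and add the percentage of IDs still available to the warning. The arithmetic might look roughly like the sketch below. This is an illustration under stated assumptions, not the formula the commit uses: it assumes about 2^31 IDs are usable before wraparound, and the function names and the `warn_margin` parameter are invented here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: roughly 2^31 IDs are usable before wraparound. */
#define SKETCH_USABLE_IDS 2147483648.0

/*
 * Percentage of IDs left between next_id and wrap_limit. Unsigned
 * subtraction is modulo 2^32, so the result stays correct when the
 * counter has wrapped past zero but the limit has not been reached.
 */
static double
sketch_pct_ids_remaining(uint32_t next_id, uint32_t wrap_limit)
{
	uint32_t	remaining = wrap_limit - next_id;

	/* The caller is assumed not to have passed the limit already. */
	assert(remaining <= UINT32_C(2147483648));

	return 100.0 * (double) remaining / SKETCH_USABLE_IDS;
}

/* Warn once fewer than warn_margin IDs remain (e.g. 100000000). */
static bool
sketch_should_warn(uint32_t next_id, uint32_t wrap_limit, uint32_t warn_margin)
{
	return (uint32_t) (wrap_limit - next_id) <= warn_margin;
}
```

Under these assumptions, a 100M margin corresponds to a little under 5% of the usable ID space, which is the kind of figure the new DETAIL message is meant to make visible.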
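
Commit d8a859d22b above halves CompactAttribute to 8 bytes by narrowing the cached offset to a 16-bit type and packing boolean properties into bitflags, while the fields read during tuple deformation remain plain fields. One way such a layout could look is sketched below. The struct, its field names, and the flag macros are hypothetical and are not the actual CompactAttribute definition.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical 8-byte per-attribute record. Hot fields stay as plain
 * members so reading them needs no masking; rarely consulted booleans
 * share one flags byte.
 */
typedef struct SketchCompactAttr
{
	int16_t		cacheoff;		/* cached offset, or -1 if not cached */
	int16_t		len;			/* fixed length, negative if variable */
	uint8_t		alignby;		/* alignment requirement in bytes */
	uint8_t		flags;			/* SKETCH_ATTR_* bits */
	uint16_t	spare;			/* keeps the struct at exactly 8 bytes */
} SketchCompactAttr;

#define SKETCH_ATTR_DROPPED		0x01	/* hypothetical flag */
#define SKETCH_ATTR_GENERATED	0x02	/* hypothetical flag */

/* With 8-byte elements, &array[i] is base + i * 8, a scale x86 LEA supports. */
static_assert(sizeof(SketchCompactAttr) == 8, "sketch struct must be 8 bytes");

/* Offsets that do not fit in 16 bits are simply not cached. */
static int16_t
sketch_cacheable_offset(int32_t off)
{
	return (off >= 0 && off <= INT16_MAX) ? (int16_t) off : (int16_t) -1;
}

/* Flag lookups need a mask, which is acceptable off the hot path. */
static int
sketch_attr_is_dropped(const SketchCompactAttr *att)
{
	return (att->flags & SKETCH_ATTR_DROPPED) != 0;
}
```

The trade-off shown here is the one the commit message describes: giving up cached offsets beyond the 16-bit range in exchange for touching fewer cache lines per tuple descriptor.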


5579 commits