postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-04-23 15:19:08 -04:00

Author	SHA1	Message	Date
Tom Lane	3628af4210	Add a macro for the declared typlen of type timetz. pg_type.typlen says 12 for the size of timetz, but sizeof(TimeTzADT) will be 16 on most platforms due to alignment padding. Using the sizeof number is no problem for usages such as palloc'ing a result datum, but in usages such as datumCopy we really ought to match what pg_type says. Add a macro TIMETZ_TYPLEN so that we have a symbolic way to write that rather than hard-coding "12". I cannot find any place where we've needed this so far, but an upcoming patch requires it. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/2329959.1765047648@sss.pgh.pa.us	2025-12-07 11:33:35 -05:00
Tom Lane	6498287696	Handle constant inputs to corr() and related aggregates more precisely. The SQL standard says that corr() and friends should return NULL in the mathematically-undefined case where all the inputs in one of the columns have the same value. We were checking that by seeing if the sums Sxx and Syy were zero, but that approach is very vulnerable to roundoff error: if a sum is close to zero but not exactly that, we'd come out with a pretty silly non-NULL result. Instead, directly track whether the inputs are all equal by remembering the common value in each column. Once we detect that a new input is different from before, represent that by storing NaN for the common value. (An objection to this scheme is that if the inputs are all NaN, we will consider that they were not all equal. But under IEEE float arithmetic rules, one NaN is never equal to another, so this behavior is arguably correct. Moreover it matches what we did before in such cases.) Then, leave the sums at their exact value of zero for as long as we haven't detected different input values. This solution requires the aggregate transition state to contain 8 float values not 6, which is not problematic, and it seems to add less than 1% to the aggregates' runtime, which seems acceptable. While we're here, improve corr()'s final function to cope with overflow/underflow in the final calculation, and to clamp its result to [-1, 1] in case of roundoff error. Although this is arguably a bug fix, it requires a catversion bump due to the change in aggregates' initial states, so it can't be back-patched. Patch written by me, but many of the ideas are due to Dean Rasheed, who also did a deal of testing. Bug: #19340 Reported-by: Oleg Ivanov <o15611@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Co-authored-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/19340-6fb9f6637f562092@postgresql.org	2025-12-06 18:31:26 -05:00
Amit Kapila	5db6a344ab	Rename column slotsync_skip_at to slotsync_last_skip. Commit `76b78721ca` introduced two new columns in pg_stat_replication_slots to improve monitoring of slot synchronization. One of these columns was named slotsync_skip_at, which is inconsistent with the naming convention used for similar columns in other system views. Columns that store timestamps of the most recent event typically use the 'last_' in the column name (e.g., last_autovacuum, checksum_last_failure). Renaming slotsync_skip_at to slotsync_last_skip aligns with this pattern, making the purpose of the column clearer and improving overall consistency across the views. Author: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: Michael Banck <mbanck@gmx.net> Discussion: https://postgr.es/m/20251128091552.GB13635@p46.dedyn.io;lightning.p46.dedyn.io Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com	2025-12-05 04:12:55 +00:00
Peter Eisentraut	c6be3daa05	Remove no longer needed casts to Pointer These casts used to be required when Pointer was char , but now it's void (commit `1b2bb5077e`), so they are not needed anymore. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org	2025-12-04 19:40:08 +01:00
Andres Freund	6c5c393b74	Rename BUFFERPIN wait event class to BUFFER In an upcoming patch more wait events will be added to the wait event class (for buffer locking), making the current name too specific. Alternatively we could introduce a dedicated wait event class for those, but it seems somewhat confusing to have a BUFFERPIN and a BUFFER wait event class. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2025-12-03 18:38:20 -05:00
Andres Freund	7902a47c20	Add pg_atomic_unlocked_write_u64 The 64bit equivalent of pg_atomic_unlocked_write_u32(), to be used in an upcoming patch converting BufferDesc.state into a 64bit atomic. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2025-12-03 18:38:20 -05:00
Andres Freund	156680055d	bufmgr: Turn BUFFER_LOCK_* into an enum It seems cleaner to use an enum to tie the different values together. It also helps to have a more descriptive type in the argument to various functions. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2025-12-03 18:38:20 -05:00
Heikki Linnakangas	789d65364c	Set next multixid's offset when creating a new multixid With this commit, the next multixid's offset will always be set on the offsets page, by the time that a backend might try to read it, so we no longer need the waiting mechanism with the condition variable. In other words, this eliminates "corner case 2" mentioned in the comments. The waiting mechanism was broken in a few scenarios: - When nextMulti was advanced without WAL-logging the next multixid. For example, if a later multixid was already assigned and WAL-logged before the previous one was WAL-logged, and then the server crashed. In that case the next offset would never be set in the offsets SLRU, and a query trying to read it would get stuck waiting for it. Same thing could happen if pg_resetwal was used to forcibly advance nextMulti. - In hot standby mode, a deadlock could happen where one backend waits for the next multixid assignment record, but WAL replay is not advancing because of a recovery conflict with the waiting backend. The old TAP test used carefully placed injection points to exercise the old waiting code, but now that the waiting code is gone, much of the old test is no longer relevant. Rewrite the test to reproduce the IPC/MultixactCreation hang after crash recovery instead, and to verify that previously recorded multixids stay readable. Backpatch to all supported versions. In back-branches, we still need to be able to read WAL that was generated before this fix, so in the back-branches this includes a hack to initialize the next offsets page when replaying XLOG_MULTIXACT_CREATE_ID for the last multixid on a page. On 'master', bump XLOG_PAGE_MAGIC instead to indicate that the WAL is not compatible. Author: Andrey Borodin <amborodin@acm.org> Reviewed-by: Dmitry Yurichev <dsy.075@yandex.ru> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Ivan Bykov <i.bykov@modernsys.ru> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/172e5723-d65f-4eec-b512-14beacb326ce@yandex.ru Backpatch-through: 14	2025-12-03 19:15:08 +02:00
Peter Eisentraut	9790affcce	Fix stray references to SubscriptRef This type never existed. SubscriptingRef was meant instead. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/2eaa45e3-efc5-4d75-b082-f8159f51445f%40eisentraut.org	2025-12-03 14:44:14 +01:00
Peter Eisentraut	1b2bb5077e	Change Pointer to void * The comment for the Pointer type said 'XXX Pointer arithmetic is done with this, so it can't be void * under "true" ANSI compilers.'. This has been fixed in the previous commit `756a436893`. This now changes the definition of the type from char * to void *, as envisaged by that comment. Extension code that relies on using Pointer for pointer arithmetic will need to make changes similar to commit `756a436893`, but those changes would be backward compatible. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org	2025-12-03 10:22:17 +01:00
Nathan Bossart	f894acb24a	Show size of DSAs and dshashes in pg_dsm_registry_allocations. Presently, this view reports NULL for the size of DSAs and dshash tables because 1) the current backend might not be attached to them and 2) the registry doesn't save the pointers to the dsa_area or dshash_table in local memory. Also, the view doesn't show partially-initialized entries to avoid ambiguity, since those entries would report a NULL size as well. This commit introduces a function that looks up the size of a DSA given its handle (transiently attaching to the control segment if needed) and teaches pg_dsm_registry_allocations to use it to show the size of successfully-initialized DSA and dshash entries. Furthermore, the view now reports partially-initialized entries with a NULL size. Reviewed-by: Rahila Syed <rahilasyed90@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/aSeEDeznAsHR1_YF%40nathan	2025-12-02 10:29:45 -06:00
Michael Paquier	713d9a847e	Update some timestamp[tz] functions to use soft-error reporting This commit updates two functions that convert "timestamptz" to "timestamp", and vice-versa, to use the soft error reporting rather than a their own logic to do the same. These are now named as follows: - timestamp2timestamptz_safe() - timestamptz2timestamp_safe() These functions were suffixed with "_opt_overflow", previously. This shaves some code, as it is possible to detect how a timestamp[tz] overflowed based on the returned value rather than a custom state. It is optionally possible for the callers of these functions to rely on the error generated internally by these functions, depending on the error context. Similar work has been done in `d03668ea05` and `4246a977ba`. Reviewed-by: Amul Sul <sulamul@gmail.com> Discussion: https://postgr.es/m/aS09YF2GmVXjAxbJ@paquier.xyz	2025-12-02 09:30:23 +09:00
Jeff Davis	19b966243c	Make regex "max_chr" depend on encoding, not provider. The regex mechanism scans through the first "max_chr" character values to cache character property ranges (isalpha, etc.). For single-byte encodings, there's no sense in scanning beyond UCHAR_MAX; but for UTF-8 it makes sense to cache higher code point values (though not all of them; only up to MAX_SIMPLE_CHR). Prior to `5a38104b36`, the logic about how many character values to scan was based on the pg_regex_strategy, which was dependent on the provider. Commit `5a38104b36` preserved that logic exactly, allowing different providers to define the "max_chr". Now, change it to depend only on the encoding and whether ctype_is_c. For this specific calculation, distinguishing between providers creates more complexity than it's worth. Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com Reviewed-by: Chao Li <li.evan.chao@gmail.com>	2025-12-01 11:06:17 -08:00
Michael Paquier	a87987cafc	Move WAL sequence code into its own file This split exists for most of the other RMGRs, and makes cleaner the separation between the WAL code, the redo code and the record description code (already in its own file) when it comes to the sequence RMGR. The redo and masking routines are moved to a new file, sequence_xlog.c. All the RMGR routines are now located in a new header, sequence_xlog.h. This separation is useful for a different patch related to sequences that I have been working on, where it makes a refactoring of sequence.c easier if its RMGR routines and its core routines are split. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/aSfTxIWjiXkTKh1E@paquier.xyz	2025-12-01 16:21:41 +09:00
Michael Paquier	d03668ea05	Switch some date/timestamp functions to use the soft error reporting This commit changes some functions related to the data types date and timestamp to use the soft error reporting rather than a custom boolean flag called "overflow", used to let the callers of these functions know if an overflow happens. This results in the removal of some boilerplate code, as it is possible to rely on an error context rather than a custom state, with the possibility to use the error generated inside the functions updated here, if necessary. These functions were suffixed with "_opt_overflow". They are now renamed to use "_safe" as suffix. This work is similar to `4246a977ba`. Author: Amul Sul <sulamul@gmail.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAAJ_b95HEmFyzHZfsdPquSHeswcopk8MCG1Q_vn4tVkZ+xxofw@mail.gmail.com	2025-12-01 15:22:20 +09:00
Peter Eisentraut	8b3e2c622a	Fix pg_isblank() There was a pg_isblank() function that claimed to be a replacement for the standard isblank() function, which was thought to be "not very portable yet". We can now assume that it's portable (it's in C99). But pg_isblank() actually diverged from the standard isblank() by also accepting '\r', while the standard one only accepts space and tab. This was added to support parsing pg_hba.conf under Windows. But the hba parsing code now works completely differently and already handles line endings before we get to pg_isblank(). The other user of pg_isblank() is for ident protocol message parsing, which also handles '\r' separately. So this behavior is now obsolete and confusing. To improve clarity, I separated those concerns. The ident parsing now gets its own function that hardcodes the whitespace characters mentioned by the relevant RFC. pg_isblank() is now static in hba.c and is a wrapper around the standard isblank(), with some extra logic to ensure robust treatment of non-ASCII characters. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/170308e6-a7a3-4484-87b2-f960bb564afa%40eisentraut.org	2025-11-28 08:33:07 +01:00
Amit Kapila	e68b6adad9	Add slotsync_skip_reason column to pg_replication_slots view. Introduce a new column, slotsync_skip_reason, in the pg_replication_slots view. This column records the reason why the last slot synchronization was skipped. It is primarily relevant for logical replication slots on standby servers where the 'synced' field is true. The value is NULL when synchronization succeeds. Author: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com> Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com	2025-11-28 05:21:35 +00:00
Michael Paquier	9660906dbd	Add routines for marking buffers dirty efficiently This commit introduces new internal bufmgr routines for marking shared buffers as dirty: * MarkDirtyUnpinnedBuffer() * MarkDirtyRelUnpinnedBuffers() * MarkDirtyAllUnpinnedBuffers() These functions provide an efficient mechanism to respectively mark one buffer, all the buffers of a relation, or the entire shared buffer pool as dirty, something that can be useful to force patterns for the checkpointer. MarkDirtyUnpinnedBufferInternal(), an extra routine, is used by these three, to mark as dirty an unpinned buffer. They are intended as developer tools to manipulate buffer dirtiness in bulk, and will be used in a follow-up commit. Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Aidar Imamov <a.imamov@postgrespro.ru> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Joseph Koshakow <koshy44@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Yuhang Qiu <iamqyh@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Discussion: https://postgr.es/m/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw@mail.gmail.com	2025-11-28 07:39:33 +09:00
Peter Eisentraut	e7075a3405	Use C11 alignas in pg_atomic_uint64 definitions They were already using pg_attribute_aligned. This replaces that with alignas and moves that into the required syntactic position. This ends up making these three atomics implementations appear a bit more consistent, but shouldn't change anything otherwise. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/46f05236-d4d4-4b4e-84d4-faa500f14691%40eisentraut.org	2025-11-27 07:53:34 +01:00
David Rowley	0ca3b16973	Add parallelism support for TID Range Scans In v14, `bb437f995` added support for scanning for ranges of TIDs using a dedicated executor node for the purpose. Here, we allow these scans to be parallelized. The range of blocks to scan is divvied up similarly to how a Parallel Seq Scans does that, where 'chunks' of blocks are allocated to each worker and the size of those chunks is slowly reduced down to 1 block per worker by the time we're nearing the end of the scan. Doing that means workers finish at roughly the same time. Allowing TID Range Scans to be parallelized removes the dilemma from the planner as to whether a Parallel Seq Scan will cost less than a non-parallel TID Range Scan due to the CPU concurrency of the Seq Scan (disk costs are not divided by the number of workers). It was possible the planner could choose the Parallel Seq Scan which would result in reading additional blocks during execution than the TID Scan would have. Allowing Parallel TID Range Scans removes the trade-off the planner makes when choosing between reduced CPU costs due to parallelism vs additional I/O from the Parallel Seq Scan due to it scanning blocks from outside of the required TID range. There is also, of course, the traditional parallelism performance benefits to be gained as well, which likely doesn't need to be explained here. Author: Cary Huang <cary.huang@highgo.ca> Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com> Reviewed-by: Steven Niu <niushiji@gmail.com> Discussion: https://postgr.es/m/18f2c002a24.11bc2ab825151706.3749144144619388582@highgo.ca	2025-11-27 14:05:04 +13:00
David Rowley	42473b3b31	Have the planner replace COUNT(ANY) with COUNT(), when possible This adds SupportRequestSimplifyAggref to allow pg_proc.prosupport functions to receive an Aggref and allow them to determine if there is a way that the Aggref call can be optimized. Also added is a support function to allow transformation of COUNT(ANY) into COUNT(). This is possible to do when the given "ANY" cannot be NULL and also that there are no ORDER BY / DISTINCT clauses within the Aggref. This is a useful transformation to do as it is common that people write COUNT(1), which until now has added unneeded overhead. When counting a NOT NULL column. The overheads can be worse as that might mean deforming more of the tuple, which for large fact tables may be many columns in. It may be possible to add prosupport functions for other aggregates. We could consider if ORDER BY could be dropped for some calls, e.g. the ORDER BY is quite useless in MAX(c ORDER BY c). There is a little bit of passing fallout from adjusting expr_is_nonnullable() to handle Const which results in a plan change in the aggregates.out regression test. Previously, nothing was able to determine that "One-Time Filter: (100 IS NOT NULL)" was always true, therefore useless to include in the plan. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Discussion: https://postgr.es/m/CAApHDvqGcPTagXpKfH=CrmHBqALpziThJEDs_MrPqjKVeDF9wA@mail.gmail.com	2025-11-27 10:43:28 +13:00
Jeff Davis	8d299052fe	Add #define for UNICODE_CASEMAP_BUFSZ. Useful for mapping a single codepoint at a time into a statically-allocated buffer. Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com Reviewed-by: Chao Li <li.evan.chao@gmail.com>	2025-11-26 10:05:11 -08:00
Jeff Davis	ec4997a9d7	Inline pg_ascii_tolower() and pg_ascii_toupper(). Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com Reviewed-by: Chao Li <li.evan.chao@gmail.com>	2025-11-26 10:04:32 -08:00
Peter Eisentraut	8fe4aef829	Replace internal C function pg_hypot() by standard hypot() The code comment said, "It is expected that this routine will eventually be replaced with the C99 hypot() function.", so let's do that now. This function is tested via the geometry regression test, so if it is faulty on any platform, it will show up there. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/170308e6-a7a3-4484-87b2-f960bb564afa%40eisentraut.org	2025-11-26 07:48:29 +01:00
Amit Kapila	76b78721ca	Add slotsync skip statistics. This patch adds two new columns to the pg_stat_replication_slots view: slotsync_skip_count - the total number of times a slotsync operation was skipped. slotsync_skip_at - the timestamp of the most recent skip. These additions provide better visibility into replication slot synchronization behavior. A future patch will introduce the slotsync_skip_reason column in pg_replication_slots to capture the reason for skip. Author: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com	2025-11-25 07:06:02 +00:00
Michael Paquier	ed823da128	Rename routines for write/read of pgstats file This commit renames write_chunk and read_chunk to respectively pgstat_write_chunk() and pgstat_read_chunk(), along with the *_s convenience macros. These are made available for plug-ins, so as any code that decides to write and/or read stats data can rely on a single code path for this work. Extracted from a larger patch by the same author. Author: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/CAA5RZ0s9SDOu+Z6veoJCHWk+kDeTktAtC-KY9fQ9Z6BJdDUirQ@mail.gmail.com	2025-11-25 10:55:40 +09:00
Tom Lane	698fa924b1	Improve detection of implicitly-temporary views. We've long had a practice of making views temporary by default if they reference any temporary tables. However the implementation was pretty incomplete, in that it only searched for RangeTblEntry references to temp relations. Uses of temporary types, regclass constants, etc were not detected even though the dependency mechanism considers them grounds for dropping the view. Thus a view not believed to be temp could silently go away at session exit anyhow. To improve matters, replace the ad-hoc isQueryUsingTempRelation() logic with use of the dependency-based infrastructure introduced by commit `572c40ba9`. This is complete by definition, and it's less code overall. While we're at it, we can also extend the warning NOTICE (or ERROR in the case of a materialized view) to mention one of the temp objects motivating the classification of the view as temp, as was done for functions in `572c40ba9`. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Jim Jones <jim.jones@uni-muenster.de> Discussion: https://postgr.es/m/19cf6ae1-04cd-422c-a760-d7e75fe6cba9@uni-muenster.de	2025-11-24 17:00:16 -05:00
Jacob Champion	0664aa4ff8	Reorganize pqcomm.h a bit Group the PG_PROTOCOL() codes, add a comment to AuthRequest now that the AUTH_REQ codes live in a different header, and make some small adjustments to spacing and comment style for the sake of scannability. Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CAOYmi%2B%3D6zg4oXXOQtifrVao_YKiujTDa3u6bxnU08r0FsSig4g%40mail.gmail.com	2025-11-24 10:01:30 -08:00
Jacob Champion	8934f2136c	Add pg_add_size_overflow() and friends Commit `600086f47` added (several bespoke copies of) size_t addition with overflow checks to libpq. Move this to common/int.h, along with its subtraction and multiplication counterparts. pg_neg_size_overflow() is intentionally omitted; I'm not sure we should add SSIZE_MAX to win32_port.h for the sake of a function with no callers. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAOYmi%2B%3D%2BpqUd2MUitvgW1pAJuXgG_TKCVc3_Ek7pe8z9nkf%2BAg%40mail.gmail.com	2025-11-24 09:59:38 -08:00
Peter Eisentraut	d4c0f91f7d	C11 alignas instead of unions -- extended alignments This replaces some uses of pg_attribute_aligned() with the standard alignas() for cases where extended alignment (larger than max_align_t) is required. This patch stipulates that all supported compilers must support alignments up to PG_IO_ALIGN_SIZE, but that seems pretty likely. We can then also desupport the case where direct I/O is disabled because pg_attribute_aligned is not supported. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/46f05236-d4d4-4b4e-84d4-faa500f14691%40eisentraut.org	2025-11-24 07:39:37 +01:00
David Rowley	07d1dc3aeb	Fix incorrect IndexOptInfo header comment The comment incorrectly indicated that indexcollations[] stored collations for both key columns and INCLUDE columns, but in reality it only has elements for the key columns. canreturn[] didn't get a mention, so add that while we're here. Author: Junwang Zhao <zhjwpku@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAEG8a3LwbZgMKOQ9CmZarX5DEipKivdHp5PZMOO-riL0w%3DL%3D4A%40mail.gmail.com Backpatch-through: 14	2025-11-24 17:00:01 +13:00
Tom Lane	572c40ba94	Issue a NOTICE if a created function depends on any temp objects. We don't have an official concept of temporary functions. (You can make one explicitly in pg_temp, but then you have to explicitly schema-qualify it on every call.) However, until now we were quite laissez-faire about whether a non-temporary function could depend on a temporary object, such as a temp table or view. If one does, it will silently go away at end of session, due to the automatic DROP ... CASCADE on the session's temporary objects. People have complained that that's surprising; however, we can't really forbid it because other people (including our own regression tests) rely on being able to do it. Let's compromise by emitting a NOTICE at CREATE FUNCTION time. This is somewhat comparable to our ancient practice of emitting a NOTICE when forcing a view to become temp because it depends on temp tables. Along the way, refactor recordDependencyOnExpr() so that the dependencies of an expression can be combined with other dependencies, instead of being emitted separately and perhaps duplicatively. We should probably make the implementation of temp-by-default views use the same infrastructure used here, but that's for another patch. It's unclear whether there are any other object classes that deserve similar treatment. Author: Jim Jones <jim.jones@uni-muenster.de> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/19cf6ae1-04cd-422c-a760-d7e75fe6cba9@uni-muenster.de	2025-11-23 15:02:55 -05:00
Tom Lane	b140c8d7a3	Add SupportRequestInlineInFrom planner support request. This request allows a support function to replace a function call appearing in FROM (typically a set-returning function) with an equivalent SELECT subquery. The subquery will then be subject to the planner's usual optimizations, potentially allowing a much better plan to be generated. While the planner has long done this automatically for simple SQL-language functions, it's now possible for extensions to do it for functions outside that group. Notably, this could be useful for functions that are presently implemented in PL/pgSQL and work by generating and then EXECUTE'ing a SQL query. Author: Paul A Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/09de6afa-c33d-4d94-a5cb-afc6cea0d2bb@illuminatedcomputing.com	2025-11-22 19:33:34 -05:00
Peter Eisentraut	5eed8ce50c	Add range_minus_multi and multirange_minus_multi functions The existing range_minus function raises an exception when the range is "split", because then the result can't be represented by a single range. For example '[0,10)'::int4range - '[4,5)' would be '[0,4)' and '[5,10)'. This commit adds new set-returning functions so that callers can get results even in the case of splits. There is no risk of an exception for multiranges, but a set-returning function lets us handle them the same way we handle ranges. Both functions return zero results if the subtraction would give an empty range/multirange. The main use-case for these functions is to implement UPDATE/DELETE FOR PORTION OF, which must compute the application-time of "temporal leftovers": the part of history in an updated/deleted row that was not changed. To preserve the untouched history, we will implicitly insert one record for each result returned by range/multirange_minus_multi. Using a set-returning function will also let us support user-defined types for application-time update/delete in the future. Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/ec498c3d-5f2b-48ec-b989-5561c8aa2024%40illuminatedcomputing.com	2025-11-22 09:42:03 +01:00
Peter Eisentraut	97e04c74be	C11 alignas instead of unions This changes a few union members that only existed to ensure alignments and replaces them with the C11 alignas specifier. This change only uses fundamental alignments (meaning approximately alignments of basic types), which all C11 compilers must support. There are opportunities for similar changes using extended alignments, for example in PGIOAlignedBlock, but these are not necessarily supported by all compilers, so they are kept as a separate change. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/46f05236-d4d4-4b4e-84d4-faa500f14691%40eisentraut.org	2025-11-21 10:08:24 +01:00
Melanie Plageman	1937ed7062	Refactor heap_page_prune_and_freeze() parameters into a struct heap_page_prune_and_freeze() had accumulated an unwieldy number of input parameters and upcoming work to handle VM updates in this function will add even more. Introduce a new PruneFreezeParams struct to group the function’s input parameters, improving readability and maintainability. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/yn4zp35kkdsjx6wf47zcfmxgexxt4h2og47pvnw2x5ifyrs3qc%407uw6jyyxuyf7	2025-11-20 10:32:14 -05:00
Peter Eisentraut	300c8f5324	Add <stdalign.h> to c.h This allows using the C11 constructs alignas and alignof (not done in this patch). Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/46f05236-d4d4-4b4e-84d4-faa500f14691%40eisentraut.org	2025-11-19 08:18:25 +01:00
Alexander Korotkov	75e82b2f5a	Optimize shared memory usage for WaitLSNProcInfo We need separate pairing heaps for different WaitLSNType's, because there might be waiters for different LSN's at the same time. However, one process can wait only for one type of LSN at a time. So, no need for inHeap and heapNode fields to be arrays. Discussion: https://postgr.es/m/CAPpHfdsBR-7sDtXFJ1qpJtKiohfGoj%3DvqzKVjWxtWsWidx7G_A%40mail.gmail.com Author: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>	2025-11-18 09:50:12 +02:00
Amit Kapila	3edaf29fa5	Rename two columns in pg_stat_subscription_stats. This patch renames the sync_error_count column to sync_table_error_count in the pg_stat_subscription_stats view. The new name makes the purpose explicit now that a separate column exists to track sequence synchronization errors. Additionally, the column seq_sync_error_count is renamed to sync_seq_error_count to maintain a consistent naming pattern, making it easier for users to group, and query synchronization related counters. Author: Vignesh C <vignesh21@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CALDaNm3WwJmz=-4ybTkhniB-Nf3qmFG9Zx1uKjyLLoPF5NYYXA@mail.gmail.com	2025-11-18 03:58:55 +00:00
Michael Paquier	e76defbcf0	Rework output format of pg_dependencies The existing format of pg_dependencies uses a single-object JSON structure, with each key value embedding all the knowledge about the set attributes tracked, like: {"1 => 5": 1.000000, "5 => 1": 0.423130} While this is a very compact format, it is confusing to read and it is difficult to manipulate the values within the object, particularly when tracking multiple attributes. The new output format introduced in this commit is a JSON array of objects, with: - A key named "degree", with a float value. - A key named "attributes", with an array of attribute numbers. - A key named "dependency", with an attribute number. The values use the same underlying type as previously when printed, with a new output format that shows now as follows: [{"degree": 1.000000, "attributes": [1], "dependency": 5}, {"degree": 0.423130, "attributes": [5], "dependency": 1}] This new format will become handy for a follow-up set of changes, so as it becomes possible to inject extended statistics rather than require an ANALYZE, like in a dump/restore sequence or after pg_upgrade on a new cluster. This format has been suggested by Tomas Vondra. The key names are defined in the header introduced by `1f927cce44`, to ease the integration of frontend-specific changes that are still under discussion. (Again a personal note: if anybody comes up with better name for the keys, of course feel free.) The bulk of the changes come from the regression tests, where jsonb_pretty() is now used to make the outputs generated easier to parse. Author: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com	2025-11-17 10:44:26 +09:00
Michael Paquier	1f927cce44	Rework output format of pg_ndistinct The existing format of pg_ndistinct uses a single-object JSON structure where each key is itself a comma-separated list of attnums, like: {"3, 4": 11, "3, 6": 11, "4, 6": 11, "3, 4, 6": 11} While this is a very compact format, it is confusing to read and it is difficult to manipulate the values within the object. The new output format introduced in this commit is an array of objects, with: - A key named "attributes", that contains an array of attribute numbers. - A key named "ndistinct", represented as an integer. The values use the same underlying type as previously when printed, with a new output format that shows now as follows: [{"ndistinct": 11, "attributes": [3,4]}, {"ndistinct": 11, "attributes": [3,6]}, {"ndistinct": 11, "attributes": [4,6]}, {"ndistinct": 11, "attributes": [3,4,6]}] This new format will become handy for a follow-up set of changes, so as it becomes possible to inject extended statistics rather than require an ANALYZE, like in a dump/restore sequence or after pg_upgrade on a new cluster. This format has been suggested by Tomas Vondra. The key names are defined in a new header, to ease with the integration of frontend-specific changes that are still under discussion. (Personal note: I am not specifically wedded to these key names, but if there are better name suggestions for this release, feel free.) The bulk of the changes come from the regression tests, where jsonb_pretty() is now used to make the outputs generated easier to parse. Author: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com	2025-11-17 09:52:20 +09:00
David Rowley	586d63214e	Adjust MemSet macro to use size_t rather than long Likewise for MemSetAligned. "long" wasn't the most suitable type for these macros as with MSVC in 64-bit builds, sizeof(long) == 4, which is narrower than the processor's word size, therefore these macros had to perform twice as many loops as they otherwise might. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/CAApHDvoGFjSA3aNyVQ3ivbyc4ST=CC5L-_VjEUQ92HbE2Cxovg@mail.gmail.com	2025-11-17 12:27:00 +13:00
David Rowley	9c047da51f	Get rid of long datatype in CATCACHE_STATS enabled builds "long" is 32 bits on Windows 64-bit. Switch to a datatype that's 64-bit on all platforms. While we're there, use an unsigned type as these fields count things that have occurred, of which it's not possible to have negative numbers of. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/CAApHDvoGFjSA3aNyVQ3ivbyc4ST=CC5L-_VjEUQ92HbE2Cxovg@mail.gmail.com	2025-11-17 12:26:41 +13:00
Alexander Korotkov	23792d7381	Fix incorrect function name in comments Update comments to reference WaitForLSN() instead of the outdated WaitForLSNReplay() function name. Discussion: https://postgr.es/m/CABPTF7UieOYbOgH3EnQCasaqcT1T4N6V2wammwrWCohQTnD_Lw%40mail.gmail.com Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>	2025-11-15 12:27:42 +02:00
Michael Paquier	84fb27511d	Replace off_t by pgoff_t in I/O routines PostgreSQL's Windows port has never been able to handle files larger than 2GB due to the use of off_t for file offsets, only 32-bit on Windows. This causes signed integer overflow at exactly 2^31 bytes when trying to handle files larger than 2GB, for the routines touched by this commit. Note that large files are forbidden by ./configure (`3c6248a828`) and meson (recent change, see `79cd66f28c`). This restriction also exists in v16 and older versions for the now-dead MSVC scripts. The code base already defines pgoff_t as __int64 (64-bit) on Windows for this purpose, and some function declarations in headers use it, but many internals still rely on off_t. This commit switches more routines to use pgoff_t, offering more portability, for areas mainly related to file extensions and storage. These are not critical for WAL segments yet, which have currently a maximum size allowed of 1GB (well, this opens the door at allowing a larger size for them). This matters more for segment files if we want to lift the large file restriction in ./configure and meson in the future, which would make sense to remove once/if all traces of off_t are gone from the tree. This can additionally matter for out-of-core code that may want files larger than 2GB in places where off_t is four bytes in size. Note that off_t is still used in other parts of the tree like buffile.c, WAL sender/receiver, base backup, pg_combinebackup, etc. These other code paths can be addressed separately, and their update will be required if we want to remove the large file restriction in the future. This commit is a good first cut in itself towards more portability, hopefully. On Unix-like systems, pgoff_t is defined as off_t, so this change only affects Windows behavior. Author: Bryan Green <dbryan.green@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/0f238ff4-c442-42f5-adb8-01b762c94ca1@gmail.com	2025-11-13 12:41:40 +09:00
Heikki Linnakangas	8eeb4a0f7c	Fix bug where we truncated CLOG that was still needed by LISTEN/NOTIFY The async notification queue contains the XID of the sender, and when processing notifications we call TransactionIdDidCommit() on the XID. But we had no safeguards to prevent the CLOG segments containing those XIDs from being truncated away. As a result, if a backend didn't for some reason process its notifications for a long time, or when a new backend issued LISTEN, you could get an error like: test=# listen c21; ERROR: 58P01: could not access status of transaction 14279685 DETAIL: Could not open file "pg_xact/000D": No such file or directory. LOCATION: SlruReportIOError, slru.c:1087 To fix, make VACUUM "freeze" the XIDs in the async notification queue before truncating the CLOG. Old XIDs are replaced with FrozenTransactionId or InvalidTransactionId. Note: This commit is not a full fix. A race condition remains, where a backend is executing asyncQueueReadAllNotifications() and has just made a local copy of an async SLRU page which contains old XIDs, while vacuum concurrently truncates the CLOG covering those XIDs. When the backend then calls TransactionIdDidCommit() on those XIDs from the local copy, you still get the error. The next commit will fix that remaining race condition. This was first reported by Sergey Zhuravlev in 2021, with many other people hitting the same issue later. Thanks to: - Alexandra Wang, Daniil Davydov, Andrei Varashen and Jacques Combrink for investigating and providing reproducable test cases, - Matheus Alcantara and Arseniy Mukhin for review and earlier proposed patches to fix this, - Álvaro Herrera and Masahiko Sawada for reviews, - Yura Sokolov aka funny-falcon for the idea of marking transactions as committed in the notification queue, and - Joel Jacobson for the final patch version. I hope I didn't forget anyone. Backpatch to all supported versions. I believe the bug goes back all the way to commit `d1e027221d`, which introduced the SLRU-based async notification queue. Discussion: https://www.postgresql.org/message-id/16961-25f29f95b3604a8a@postgresql.org Discussion: https://www.postgresql.org/message-id/18804-bccbbde5e77a68c2@postgresql.org Discussion: https://www.postgresql.org/message-id/CAK98qZ3wZLE-RZJN_Y%2BTFjiTRPPFPBwNBpBi5K5CU8hUHkzDpw@mail.gmail.com Backpatch-through: 14	2025-11-12 20:59:36 +02:00
Álvaro Herrera	877a024902	Split out innards of pg_tablespace_location() This creates a src/backend/catalog/pg_tablespace.c supporting file containing a new function get_tablespace_location(), which lets the code underlying pg_tablespace_location() be reused for other purposes. Author: Manni Wood <manni.wood@enterprisedb.com> Author: Nishant Sharma <nishant.sharma@enterprisedb.com> Reviewed-by: Vaibhav Dalvi <vaibhav.dalvi@enterprisedb.com> Reviewed-by: Ian Lawrence Barwick <barwick@gmail.com> Reviewed-by: Jim Jones <jim.jones@uni-muenster.de> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/CAKWEB6rmnmGKUA87Zmq-s=b3Scsnj02C0kObQjnbL2ajfPWGEw@mail.gmail.com	2025-11-12 16:39:55 +01:00
Peter Eisentraut	d2f24df19b	Clean up qsort comparison function for GUC entries guc_var_compare() is invoked from qsort() on an array of struct config_generic, but the function accesses these directly as strings (char *). This relies on the name being the first field, so this works. But we can write this more clearly by using the struct and then accessing the field through the struct. Before the reorganization of the GUC structs (commit `a13833c35f`), the old code was probably more convenient, but now we can write this more clearly and correctly. After this change, it is no longer required that the name is the first field in struct config_generic, so remove that comment. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/2c961fa1-14f6-44a2-985c-e30b95654e8d%40eisentraut.org	2025-11-11 07:55:10 +01:00
Heikki Linnakangas	e510378358	Bump PG_CONTROL_VERSION for commit `3e0ae46d90` Commit `3e0ae46d90` added a field to ControlFileData and bumped CATALOG_VERSION_NO, but CATALOG_VERSION_NO is not the right version number for ControlFileData changes. Bumping either one will force an initdb, but PG_CONTROL_VERSION is more accurate. Bump PG_CONTROL_VERSION now. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/1874404.1762787779@sss.pgh.pa.us	2025-11-10 19:12:43 +02:00
Nathan Bossart	5e4fcbe531	Check for CREATE privilege on the schema in CREATE STATISTICS. This omission allowed table owners to create statistics in any schema, potentially leading to unexpected naming conflicts. For ALTER TABLE commands that require re-creating statistics objects, skip this check in case the user has since lost CREATE on the schema. The addition of a second parameter to CreateStatistics() breaks ABI compatibility, but we are unaware of any impacted third-party code. Reported-by: Jelte Fennema-Nio <postgres@jeltef.nl> Author: Jelte Fennema-Nio <postgres@jeltef.nl> Co-authored-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Security: CVE-2025-12817 Backpatch-through: 13	2025-11-10 09:00:00 -06:00
Heikki Linnakangas	3e0ae46d90	Move SLRU_PAGES_PER_SEGMENT to pg_config_manual.h It seems plausible that someone might want to experiment with different values. The pressing reason though is that I'm reviewing a patch that requires pg_upgrade to manipulate SLRU files. That patch needs to access SLRU_PAGES_PER_SEGMENT from pg_upgrade code, and slru.h, where SLRU_PAGES_PER_SEGMENT is currently defined, cannot be included from frontend code. Moving it to pg_config_manual.h makes it accessible. Now that it's a little more likely that someone might change SLRU_PAGES_PER_SEGMENT, add a cluster compatibility check for it. Bump catalog version because of the new field in the control file. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://www.postgresql.org/message-id/c7a4ea90-9f7b-4953-81be-b3fcb47db057@iki.fi	2025-11-10 16:11:41 +02:00
Thomas Munro	c5d34f4a55	Fix generic read and write barriers for Clang. generic-gcc.h maps our read and write barriers to C11 acquire and release fences using compiler builtins, for platforms where we don't have our own hand-rolled assembler. This is apparently enough for GCC, but the C11 memory model is only defined in terms of atomic accesses, and our barriers for non-atomic, non-volatile accesses were not always respected under Clang's stricter interpretation of the standard. This explains the occasional breakage observed on new RISC-V + Clang animal greenfly in lock-free PgAioHandle manipulation code containing a repeating pattern of loads and read barriers. The problem can also be observed in code generated for MIPS and LoongAarch, though we aren't currently testing those with Clang, and on x86, though we use our own assembler there. The scariest aspect is that we use the generic version on very common ARM systems, but it doesn't seem to reorder the relevant code there (or we'd have debugged this long ago). Fix by inserting an explicit compiler barrier. It expands to an empty assembler block declared to have memory side-effects, so registers are flushed and reordering is prevented. In those respects this is like the architecture-specific assembler versions, but the compiler is still in charge of generating the appropriate fence instruction. Done for write barriers on principle, though concrete problems have only been observed with read barriers. Reported-by: Alexander Lakhin <exclusion@gmail.com> Tested-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/d79691be-22bd-457d-9d90-18033b78c40a%40gmail.com Backpatch-through: 13	2025-11-08 12:26:43 +13:00
Amit Kapila	f6a4c498dc	Add seq_sync_error_count to subscription statistics. This commit adds a new column, seq_sync_error_count, to the pg_stat_subscription_stats view. This counter tracks the number of errors encountered by the sequence synchronization worker during operation. Since a single worker handles the synchronization of all sequences, this value may reflect errors from multiple sequences. This addition improves observability of sequence synchronization behavior and helps monitor potential issues during replication. Author: Vignesh C <vignesh21@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com	2025-11-07 08:05:08 +00:00
Andres Freund	c75ebc657f	bufmgr: Allow some buffer state modifications while holding header lock Until now BufferDesc.state was not allowed to be modified while the buffer header spinlock was held. This meant that operations like unpinning buffers needed to use a CAS loop, waiting for the buffer header spinlock to be released before updating. The benefit of that restriction is that it allowed us to unlock the buffer header spinlock with just a write barrier and an unlocked write (instead of a full atomic operation). That was important to avoid regressions in `48354581a4`. However, since then the hottest buffer header spinlock uses have been replaced with atomic operations (in particular, the most common use of PinBuffer_Locked(), in GetVictimBuffer() (formerly in BufferAlloc()), has been removed in `5e89985928`). This change will allow, in a subsequent commit, to release buffer pins with a single atomic-sub operation. This previously was not possible while such operations were not allowed while the buffer header spinlock was held, as an atomic-sub would not have allowed a race-free check for the buffer header lock being held. Using atomic-sub to unpin buffers is a nice scalability win, however it is not the primary motivation for this change (although it would be sufficient). The primary motivation is that we would like to merge the buffer content lock into BufferDesc.state, which will result in more frequent changes of the state variable, which in some situations can cause a performance regression, due to an increased CAS failure rate when unpinning buffers. The regression entirely vanishes when using atomic-sub. Naively implementing this would require putting CAS loops in every place modifying the buffer state while holding the buffer header lock. To avoid that, introduce UnlockBufHdrExt(), which can set/add flags as well as the refcount, together with releasing the lock. Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2025-11-06 16:42:10 -05:00
Álvaro Herrera	06edbed478	Introduce XLogRecPtrIsValid() XLogRecPtrIsInvalid() is inconsistent with the affirmative form of macros used for other datatypes, and leads to awkward double negatives in a few places. This commit introduces XLogRecPtrIsValid(), which allows code to be written more naturally. This patch only adds the new macro. XLogRecPtrIsInvalid() is left in place, and all existing callers remain untouched. This means all supported branches can accept hypothetical bug fixes that use the new macro, and at the same time any code that compiled with the original formulation will continue to silently compile just fine. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Backpatch-through: 13 Discussion: https://postgr.es/m/aQB7EvGqrbZXrMlg@ip-10-97-1-34.eu-west-3.compute.internal	2025-11-06 19:08:29 +01:00
Heikki Linnakangas	aa9c5fd3e3	Refactor shared memory allocation for semaphores Before commit `e25626677f`, spinlocks were implemented using semaphores on some platforms (--disable-spinlocks). That made it necessary to initialize semaphores early, before any spinlocks could be used. Now that we don't support --disable-spinlocks anymore, we can allocate the shared memory needed for semaphores the same way as other shared memory structures. Since the semaphores are used only in the PGPROC array, move the semaphore shmem size estimation and initialization calls to ProcGlobalShmemSize() and InitProcGlobal(). Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/CAExHW5seSZpPx-znjidVZNzdagGHOk06F+Ds88MpPUbxd1kTaA@mail.gmail.com	2025-11-06 14:45:00 +02:00
Peter Eisentraut	01a985c3c4	Re-run autoheader Some of the changes in pg_config.h.in from commit `3853a6956c` didn't match the order that a fresh run would produce.	2025-11-06 07:37:22 +01:00
Alexander Korotkov	447aae13b0	Implement WAIT FOR command WAIT FOR is to be used on standby and specifies waiting for the specific WAL location to be replayed. This option is useful when the user makes some data changes on primary and needs a guarantee to see these changes are on standby. WAIT FOR needs to wait without any snapshot held. Otherwise, the snapshot could prevent the replay of WAL records, implying a kind of self-deadlock. This is why separate utility command seems appears to be the most robust way to implement this functionality. It's not possible to implement this as a function. Previous experience shows that stored procedures also have limitation in this aspect. Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO+BBjcirozJ6nYbOW8Q@mail.gmail.com Discussion: https://www.postgresql.org/message-id/flat/CABPTF7UNft368x-RgOXkfj475OwEbp%2BVVO-wEXz7StgjD_%3D6sw%40mail.gmail.com Author: Kartyshov Ivan <i.kartyshov@postgrespro.ru> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Euler Taveira <euler@eulerto.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: jian he <jian.universality@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>	2025-11-05 11:44:13 +02:00
Alexander Korotkov	3b4e53a075	Add infrastructure for efficient LSN waiting Implement a new facility that allows processes to wait for WAL to reach specific LSNs, both on primary (waiting for flush) and standby (waiting for replay) servers. The implementation uses shared memory with per-backend information organized into pairing heaps, allowing O(1) access to the minimum waited LSN. This enables fast-path checks: after replaying or flushing WAL, the startup process or WAL writer can quickly determine if any waiters need to be awakened. Key components: - New xlogwait.c/h module with WaitForLSNReplay() and WaitForLSNFlush() - Separate pairing heaps for replay and flush waiters - WaitLSN lightweight lock for coordinating shared state - Wait events WAIT_FOR_WAL_REPLAY and WAIT_FOR_WAL_FLUSH for monitoring This infrastructure can be used by features that need to wait for WAL operations to complete. Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO+BBjcirozJ6nYbOW8Q@mail.gmail.com Discussion: https://www.postgresql.org/message-id/flat/CABPTF7UNft368x-RgOXkfj475OwEbp%2BVVO-wEXz7StgjD_%3D6sw%40mail.gmail.com Author: Kartyshov Ivan <i.kartyshov@postgrespro.ru> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Euler Taveira <euler@eulerto.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>	2025-11-05 11:44:13 +02:00
Alexander Korotkov	8af3ae0d4b	Add pairingheap_initialize() for shared memory usage The existing pairingheap_allocate() uses palloc(), which allocates from process-local memory. For shared memory use cases, the pairingheap structure must be allocated via ShmemAlloc() or embedded in a shared memory struct. Add pairingheap_initialize() to initialize an already- allocated pairingheap structure in-place, enabling shared memory usage. Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO+BBjcirozJ6nYbOW8Q@mail.gmail.com Discussion: https://www.postgresql.org/message-id/flat/CABPTF7UNft368x-RgOXkfj475OwEbp%2BVVO-wEXz7StgjD_%3D6sw%40mail.gmail.com Author: Kartyshov Ivan <i.kartyshov@postgrespro.ru> Author: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Euler Taveira <euler@eulerto.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>	2025-11-05 11:44:13 +02:00
Amit Kapila	5509055d69	Add sequence synchronization for logical replication. This patch introduces sequence synchronization. Sequences that are synced will have 2 states: - INIT (needs [re]synchronizing) - READY (is already synchronized) A new sequencesync worker is launched as needed to synchronize sequences. A single sequencesync worker is responsible for synchronizing all sequences. It begins by retrieving the list of sequences that are flagged for synchronization, i.e., those in the INIT state. These sequences are then processed in batches, allowing multiple entries to be synchronized within a single transaction. The worker fetches the current sequence values and page LSNs from the remote publisher, updates the corresponding sequences on the local subscriber, and finally marks each sequence as READY upon successful synchronization. Sequence synchronization occurs in 3 places: 1) CREATE SUBSCRIPTION - The command syntax remains unchanged. - The subscriber retrieves sequences associated with publications. - Published sequences are added to pg_subscription_rel with INIT state. - Initiate the sequencesync worker to synchronize all sequences. 2) ALTER SUBSCRIPTION ... REFRESH PUBLICATION - The command syntax remains unchanged. - Dropped published sequences are removed from pg_subscription_rel. - Newly published sequences are added to pg_subscription_rel with INIT state. - Initiate the sequencesync worker to synchronize only newly added sequences. 3) ALTER SUBSCRIPTION ... REFRESH SEQUENCES - A new command introduced for PG19 by `f0b3573c3a`. - All sequences in pg_subscription_rel are reset to INIT state. - Initiate the sequencesync worker to synchronize all sequences. - Unlike "ALTER SUBSCRIPTION ... REFRESH PUBLICATION" command, addition and removal of missing sequences will not be done in this case. Author: Vignesh C <vignesh21@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Nisha Moond <nisha.moond412@gmail.com> Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com	2025-11-05 05:59:58 +00:00
Michael Paquier	e0ca61e7c4	Add WalRcvGetState() to retrieve the state of a WAL receiver This has come up as useful as an alternative of WalRcvStreaming(), to be able to do sanity checks based on the state of a WAL receiver. This will be used in a follow-up commit. Author: Xuneng Zhou <xunengzhou@gmail.com> Discussion: https://postgr.es/m/19093-c4fff49a608f82a0@postgresql.org	2025-11-04 12:57:36 +09:00
Michael Paquier	17b2d5ec75	Fix unconditional WAL receiver shutdown during stream-archive transition Commit `b4f584f9d2` (affecting v15~, later backpatched down to 13 as of `3635a0a35a`) introduced an unconditional WAL receiver shutdown when switching from streaming to archive WAL sources. This causes problems during a timeline switch, when a WAL receiver enters WALRCV_WAITING state but remains alive, waiting for instructions. The unconditional shutdown can break some monitoring scenarios as the WAL receiver gets repeatedly terminated and re-spawned, causing pg_stat_wal_receiver.status to show a "streaming" instead of "waiting" status, masking the fact that the WAL receiver is waiting for a new TLI and a new LSN to be able to continue streaming. This commit changes the WAL receiver behavior so as the shutdown becomes conditional, with InstallXLogFileSegmentActive being always reset to prevent the regression fixed by `b4f584f9d2`: only terminate the WAL receiver when it is actively streaming (WALRCV_STREAMING, WALRCV_STARTING, or WALRCV_RESTARTING). When in WALRCV_WAITING state, just reset InstallXLogFileSegmentActive flag to allow archive restoration without killing the process. WALRCV_STOPPED and WALRCV_STOPPING are not reachable states in this code path. For the latter, the startup process is the one in charge of setting WALRCV_STOPPING via ShutdownWalRcv(), waiting for the WAL receiver to reach a WALRCV_STOPPED state after switching walRcvState, so WaitForWALToBecomeAvailable() cannot be reached while a WAL receiver is in a WALRCV_STOPPING state. A regression test is added to check that a WAL receiver is not stopped on timeline jump, that fails when the fix of this commit is reverted. Reported-by: Ryan Bird <ryanzxg@gmail.com> Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/19093-c4fff49a608f82a0@postgresql.org Backpatch-through: 13	2025-11-04 10:47:38 +09:00
Álvaro Herrera	645cb44c54	Add \pset options for boolean value display New \pset variables display_true and display_false allow the user to change how true and false values are displayed. Author: David G. Johnston <David.G.Johnston@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/CAKFQuwYts3vnfQ5AoKhEaKMTNMfJ443MW2kFswKwzn7fiofkrw@mail.gmail.com Discussion: https://postgr.es/m/56308F56.8060908@joh.to	2025-11-03 17:40:39 +01:00
Tom Lane	8f29467c57	Change "long" numGroups fields to be Cardinality (i.e., double). We've been nibbling away at removing uses of "long" for a long time, since its width is platform-dependent. Here's one more: change the remaining "long" fields in Plan nodes to Cardinality, since the three surviving examples all represent group-count estimates. The upstream planner code was converted to Cardinality some time ago; for example the corresponding fields in Path nodes are type Cardinality, as are the arguments of the make_foo_path functions. Downstream in the executor, it turns out that these all feed to the table-size argument of BuildTupleHashTable. Change that to "double" as well, and fix it so that it safely clamps out-of-range values to the uint32 limit of simplehash.h, as was not being done before. Essentially, this is removing all the artificial datatype-dependent limitations on these values from upstream processing, and applying just one clamp at the moment where we're forced to do so by the datatype choices of simplehash.h. Also, remove BuildTupleHashTable's misguided attempt to enforce work_mem/hash_mem_limit. It doesn't have enough information (particularly not the expected tuple width) to do that accurately, and it has no real business second-guessing the caller's choice. For all these plan types, it's really the planner's responsibility to not choose a hashed implementation if the hashtable is expected to exceed hash_mem_limit. The previous patch improved the accuracy of those estimates, and even if BuildTupleHashTable had more information it should arrive at the same conclusions. Reported-by: Jeff Janes <jeff.janes@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com	2025-11-02 16:57:43 -05:00
Tom Lane	1ea5bdb00b	Improve planner's estimates of tuple hash table sizes. For several types of plan nodes that use TupleHashTables, the planner estimated the expected size of the table as basically numEntries * (MAXALIGN(dataWidth) + MAXALIGN(SizeofHeapTupleHeader)). This is pretty far off, especially for small data widths, because it doesn't account for the overhead of the simplehash.h hash table nor for any per-tuple "additional space" the plan node may request. Jeff Janes noted a case where the estimate was off by about a factor of three, even though the obvious hazards such as inaccurate estimates of numEntries or dataWidth didn't apply. To improve matters, create functions provided by the relevant executor modules that can estimate the required sizes with reasonable accuracy. (We're still not accounting for effects like allocator padding, but this at least gets the first-order effects correct.) I added functions that can estimate the tuple table sizes for nodeSetOp and nodeSubplan; these rely on an estimator for TupleHashTables in general, and that in turn relies on one for simplehash.h hash tables. That feels like kind of a lot of mechanism, but if we take any short-cuts we're violating modularity boundaries. The other places that use TupleHashTables are nodeAgg, which took pains to get its numbers right already, and nodeRecursiveunion. I did not try to improve the situation for nodeRecursiveunion because there's nothing to improve: we are not making an estimate of the hash table size, and it wouldn't help us to do so because we have no non-hashed alternative implementation. On top of that, our estimate of the number of entries to be hashed in that module is so suspect that we'd likely often choose the wrong implementation if we did have two ways to do it. Reported-by: Jeff Janes <jeff.janes@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com	2025-11-02 16:57:26 -05:00
Tom Lane	645c1e2752	Avoid mixing void and integer in a conditional expression. The C standard says that the second and third arguments of a conditional operator shall be both void type or both not-void type. The Windows version of INTERRUPTS_PENDING_CONDITION() got this wrong. It's pretty harmless because the result of the operator is ignored anyway, but apparently recent versions of MSVC have started issuing a warning about it. Silence the warning by casting the dummy zero to void. Reported-by: Christian Ullrich <chris@chrullrich.net> Author: Bryan Green <dbryan.green@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/cc4ef8db-f8dc-4347-8a22-e7ebf44c0308@chrullrich.net Backpatch-through: 13	2025-11-02 12:30:44 -05:00
Peter Eisentraut	8a27d418f8	Mark function arguments of type "Datum " as "const Datum " where possible Several functions in the codebase accept "Datum " parameters but do not modify the pointed-to data. These have been updated to take "const Datum " instead, improving type safety and making the interfaces clearer about their intent. This change helps the compiler catch accidental modifications and better documents immutability of arguments. Most of "Datum " parameters have a pairing "bool isnull" parameter, they are constified as well. No functional behavior is changed by this patch. Author: Chao Li <lic@highgo.com> Discussion: https://www.postgresql.org/message-id/flat/CAEoWx2msfT0knvzUa72ZBwu9LR_RLY4on85w2a9YpE-o2By5HQ@mail.gmail.com	2025-10-31 10:47:25 +01:00
Tom Lane	c106ef0807	Use BumpContext contexts in TupleHashTables, and do some code cleanup. For all extant uses of TupleHashTables, execGrouping.c itself does nothing with the "tablecxt" except to allocate new hash entries in it, and the callers do nothing with it except to reset the whole context. So this is an ideal use-case for a BumpContext, and the hash tables are frequently big enough for the savings to be significant. (Commit `cc721c459` already taught nodeAgg.c this idea, but neglected the other callers of BuildTupleHashTable.) While at it, let's clean up some ill-advised leftovers from rebasing TupleHashTables on simplehash.h: * Many comments and variable names were based on the idea that the tablecxt holds the whole TupleHashTable, whereas now it only holds the hashed tuples (plus any caller-defined "additional storage"). Rename to names like tuplescxt and tuplesContext, and adjust the comments. Also adjust the memory context names to be like "<Foo> hashed tuples". * Make ResetTupleHashTable() reset the tuplescxt rather than relying on the caller to do so; that was fairly bizarre and seems like a recipe for leaks. This is less efficient in the case where nodeAgg.c uses the same tuplescxt for several different hashtables, but only microscopically so because mcxt.c will short-circuit the extra resets via its isReset flag. I judge the extra safety and intellectual cleanliness well worth those few cycles. * Remove the long-obsolete "allow_jit" check added by ac88807f9; instead, just Assert that metacxt and tuplescxt are different. We need that anyway for this definition of ResetTupleHashTable() to be safe. There is a side issue of the extent to which this change invalidates the planner's estimates of hashtable memory consumption. However, those estimates are already pretty bad, so improving them seems like it can be a separate project. This change is useful to do first to establish consistent executor behavior that the planner can expect. A loose end not addressed here is that the "entrysize" calculation in BuildTupleHashTable seems wrong: "sizeof(TupleHashEntryData) + additionalsize" corresponds neither to the size of the simplehash entries nor to the total space needed per tuple. It's questionable why BuildTupleHashTable is second-guessing its caller's nbuckets choice at all, since the original source of the number should have had more information. But that all seems wrapped up with the planner's estimation logic, so let's leave it for the planned followup patch. Reported-by: Jeff Janes <jeff.janes@gmail.com> Reported-by: David Rowley <dgrowleyml@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com Discussion: https://postgr.es/m/2268409.1761512111@sss.pgh.pa.us	2025-10-30 11:21:22 -04:00
Peter Eisentraut	e1ac846f3d	Mark ItemPointer arguments as const throughout This is a follow up `991295f`. I searched over src/ and made all ItemPointer arguments as const as much as possible. Note: We cut out from the original patch the pieces that would have created incompatibilities in the index or table AM APIs. Those could be considered separately. Author: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/CAEoWx2nBaypg16Z5ciHuKw66pk850RFWw9ACS2DqqJ_AkKeRsw%40mail.gmail.com	2025-10-30 14:12:06 +01:00
Peter Eisentraut	8ce795fcb7	Fix some confusing uses of const There are a few places where we have typedef struct FooData { ... } FooData; typedef FooData Foo; and then function declarations with bar(const Foo x) which isn't incorrect but probably meant bar(const FooData x) meaning that the thing x points to is immutable, not x itself. This patch makes those changes where appropriate. In one case (execGrouping.c), the thing being pointed to was not immutable, so in that case remove the const altogether, to avoid further confusion. Co-authored-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/CAEoWx2m2E0xE8Kvbkv31ULh_E%2B5zph-WA_bEdv3UR9CLhw%2B3vg%40mail.gmail.com Discussion: https://www.postgresql.org/message-id/CAEoWx2kTDz%3Db6T2xHX78vy_B_osDeCC5dcTCi9eG0vXHp5QpdQ%40mail.gmail.com	2025-10-30 11:20:04 +01:00
Peter Eisentraut	3479a0f823	const-qualify ItemPointer comparison functions Add const qualifiers to ItemPointerEquals() and ItemPointerCompare(). This will allow further changes up the stack. It also complements commit `aeb767ca0b`, as we now have all of itemptr.h appropriately const-qualified. Author: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAEoWx2nBaypg16Z5ciHuKw66pk850RFWw9ACS2DqqJ_AkKeRsw@mail.gmail.com	2025-10-30 10:13:47 +01:00
Jeff Davis	3853a6956c	Use C11 char16_t and char32_t for Unicode code points. Reviewed-by: Tatsuo Ishii <ishii@postgresql.org> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/bedcc93d06203dfd89815b10f815ca2de8626e85.camel%40j-davis.com	2025-10-29 14:17:13 -07:00
Peter Eisentraut	a13833c35f	Reorganize GUC structs Instead of having five separate GUC structs, one for each type, with the generic part contained in each of them, flip it around and have one common struct, with the type-specific part has a subfield. The very original GUC design had type-specific structs and type-specific lists, and the membership in one of the lists defined the type. But now the structs themselves know the type (from the .vartype field), and they are all loaded into a common hash table at run time, and so this original separation no longer makes sense. It creates a bunch of inconsistencies in the code about whether the type-specific or the generic struct is the primary struct, and a lot of casting in between, which makes certain assumptions about the struct layouts. After the change, all these casts are gone and all the data is accessed via normal field references. Also, various code is simplified because only one kind of struct needs to be processed. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://www.postgresql.org/message-id/flat/8fdfb91e-60fb-44fa-8df6-f5dea47353c9@eisentraut.org	2025-10-29 09:52:29 +01:00
Peter Eisentraut	f0f2c0c1ae	Replace pg_restrict by standard restrict MSVC in C11 mode supports the standard restrict qualifier, so we don't need the workaround naming pg_restrict anymore. Even though restrict is in C99 and should be supported by all supported compilers, we keep the configure test and the hardcoded redirection to __restrict, because that will also work in C++ in all supported compilers. (restrict is not part of the C++ standard.) For backward compatibility for extensions, we keep a #define of pg_restrict around, but our own code doesn't use it anymore. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/0e3d8644-c01d-4374-86ea-9f0a987981f0%40eisentraut.org	2025-10-29 07:52:58 +01:00
Peter Eisentraut	c094be259b	Remove obsolete comment The comment "type prefixes (const, signed, volatile, inline) are handled in pg_config.h." has been mostly not true for a long time.	2025-10-29 07:32:21 +01:00
Michael Paquier	d3111cb753	Fix correctness issue with computation of FPI size in WAL stats XLogRecordAssemble() may be called multiple times before inserting a record in XLogInsertRecord(), and the amount of FPIs generated inside a record whose insertion is attempted multiple times may vary. The logic added in `f9a09aa295` touched directly pgWalUsage in XLogRecordAssemble(), meaning that it could be possible for pgWalUsage to be incremented multiple times for a single record. This commit changes the code to use the same logic as the number of FPIs added to a record, where XLogRecordAssemble() returns this information and feeds it to XLogInsertRecord(), updating pgWalUsage only when a record is inserted. Reported-by: Shinya Kato <shinya11.kato@gmail.com> Discussion: https://postgr.es/m/CAOzEurSiSr+rusd0GzVy8Bt30QwLTK=ugVMnF6=5WhsSrukvvw@mail.gmail.com	2025-10-29 09:13:31 +09:00
Michael Paquier	f9a09aa295	Add wal_fpi_bytes to pg_stat_wal and pg_stat_get_backend_wal() This new counter, called "wal_fpi_bytes", tracks the total amount in bytes of full page images (FPIs) generated in WAL. This data becomes available globally via pg_stat_wal, and for backend statistics via pg_stat_get_backend_wal(). Previously, this information could only be retrieved with pg_waldump or pg_walinspect, which may not be available depending on the environment, and are expensive to execute. It offers hints about how much FPIs impact the WAL generated, which could be a large percentage for some workloads, as well as the effects of wal_compression or page holes. Bump catalog version. Bump PGSTAT_FILE_FORMAT_ID, due to the addition of wal_fpi_bytes in PgStat_WalCounters. Author: Shinya Kato <shinya11.kato@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAOzEurQtZEAfg6P0kU3Wa-f9BWQOi0RzJEMPN56wNTOmJLmfaQ@mail.gmail.com	2025-10-28 16:21:51 +09:00
Amit Kapila	3e8e05596a	Add worker type argument to logical replication worker functions. Extend logicalrep_worker_stop, logicalrep_worker_wakeup, and logicalrep_worker_find to accept a worker type argument. This change enables differentiation between logical replication worker types, such as apply workers and table sync workers. While preserving existing behavior, it lays the groundwork for upcoming patch to add sequence synchronization workers. Author: Vignesh C <vignesh21@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com	2025-10-28 05:47:50 +00:00
Peter Eisentraut	10b5bb3bff	Add some const qualifications Add some const qualifications afforded by the previous change that added a const qualification to PageAddItemExtended(). Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Peter Geoghegan <pg@bowt.ie> Discussion: https://www.postgresql.org/message-id/flat/c75cccf5-5709-407b-a36a-2ae6570be766@eisentraut.org	2025-10-27 09:55:59 +01:00
Peter Eisentraut	76acf4b722	Remove Item type This type is just char * underneath, it provides no real value, no type safety, and just makes the code one level more mysterious. It is more idiomatic to refer to blobs of memory by a combination of void * and size_t, so change it to that. Also, since this type hides the pointerness, we can't apply qualifiers to what is pointed to, which requires some unconstify nonsense. This change allows fixing that. Extension code that uses the Item type can change its code to use void * to be backward compatible. Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Peter Geoghegan <pg@bowt.ie> Discussion: https://www.postgresql.org/message-id/flat/c75cccf5-5709-407b-a36a-2ae6570be766@eisentraut.org	2025-10-27 09:55:59 +01:00
Peter Eisentraut	64d2b0968e	Remove meaninglist restrict qualifiers The use of the restrict qualifier in casts is meaningless, so remove them. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/0e3d8644-c01d-4374-86ea-9f0a987981f0%40eisentraut.org	2025-10-27 08:53:09 +01:00
Jeff Davis	371a302eec	Comment typo fixes: pg_wchar_t should be pg_wchar. Reported-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CA+hUKGJ5Xh0KxLYXDZuPvw1_fHX=yuzb4xxtam1Cr6TPZZ1o+w@mail.gmail.com	2025-10-26 12:31:50 -07:00
Amit Kapila	f0b3573c3a	Introduce "REFRESH SEQUENCES" for subscriptions. This patch adds support for a new SQL command: ALTER SUBSCRIPTION ... REFRESH SEQUENCES This command updates the sequence entries present in the pg_subscription_rel catalog table with the INIT state to trigger resynchronization. In addition to the new command, the following subscription commands have been enhanced to automatically refresh sequence mappings: ALTER SUBSCRIPTION ... REFRESH PUBLICATION ALTER SUBSCRIPTION ... ADD PUBLICATION ALTER SUBSCRIPTION ... DROP PUBLICATION ALTER SUBSCRIPTION ... SET PUBLICATION These commands will perform the following actions: Add newly published sequences that are not yet part of the subscription. Remove sequences that are no longer included in the publication. This ensures that sequence replication remains aligned with the current state of the publication on the publisher side. Note that the actual synchronization of sequence data/values will be handled in a subsequent patch that introduces a dedicated sequence sync worker. Author: Vignesh C <vignesh21@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Nisha Moond <nisha.moond412@gmail.com> Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com> Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com	2025-10-23 08:30:27 +00:00
Nathan Bossart	d10866f1fd	Fix type of infomask parameter in htup_details.h functions. Oversight in commit `34694ec888`. Since there aren't any known live bugs related to this, no back-patch. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/aPk4u955ZPPZ_nYw%40nathan	2025-10-22 16:47:38 -05:00
Fujii Masao	f33e60a53a	Make invalid primary_slot_name follow standard GUC error reporting. Previously, if primary_slot_name was set to an invalid slot name and the configuration file was reloaded, both the postmaster and all other backend processes reported a WARNING. With many processes running, this could produce a flood of duplicate messages. The problem was that the GUC check hook for primary_slot_name reported errors at WARNING level via ereport(). This commit changes the check hook to use GUC_check_errdetail() and GUC_check_errhint() for error reporting. As with other GUC parameters, this causes non-postmaster processes to log the message at DEBUG3, so by default, only the postmaster's message appears in the log file. Backpatch to all supported versions. Author: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Chao Li <lic@highgo.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Discussion: https://postgr.es/m/CAHGQGwFud-cvthCTfusBfKHBS6Jj6kdAPTdLWKvP2qjUX6L_wA@mail.gmail.com Backpatch-through: 13	2025-10-22 20:09:43 +09:00
Michael Paquier	2519fa8362	Bump catalog version for new function error_on_null() Oversight in `2b75c38b70`. No comments. Discussion: https://postgr.es/m/aPgu7kwiT4iGo6Ya@paquier.xyz	2025-10-22 10:08:47 +09:00
Michael Paquier	2b75c38b70	Add error_on_null(), checking if the input is the null value This polymorphic function produces an error if the input value is detected as being the null value; otherwise it returns the input value unchanged. This function can for example become handy in SQL function bodies, to enforce that exactly one row was returned. Author: Joel Jacobson <joel@compiler.org> Reviewed-by: Vik Fearing <vik@postgresfriends.org> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/ece8c6d1-2ab1-45d5-ba12-8dec96fc8886@app.fastmail.com Discussion: https://postgr.es/m/de94808d-ed58-4536-9e28-e79b09a534c7@app.fastmail.com	2025-10-22 09:55:17 +09:00
Jeff Davis	ff53907c35	Make char2wchar() static. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/0151ad01239e2cc7b3139644358cf8f7b9622ff7.camel@j-davis.com	2025-10-21 09:32:12 -07:00
Jeff Davis	844385d12e	Remove obsolete global database_ctype_is_c. Now that tsearch uses the database default locale, there's no need to track the database CTYPE separately. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/0151ad01239e2cc7b3139644358cf8f7b9622ff7.camel@j-davis.com	2025-10-21 09:32:04 -07:00
Álvaro Herrera	b7cc6474e9	Make smgr access for a BufferManagerRelation safer in relcache inval Currently there's no bug, because we have no code path where we invalidate relcache entries where it'd cause a problem. But it's more robust to do it this way in case we introduce such a path later, as some Postgres forks reportedly already have. Author: Daniil Davydov <3danissimo@gmail.com> Reviewed-by: Stepan Neretin <slpmcf@gmail.com> Discussion: https://postgr.es/m/CAJDiXgj3FNzAhV+jjPqxMs3jz=OgPohsoXFj_fh-L+nS+13CKQ@mail.gmail.com	2025-10-21 10:51:55 +03:00
Jeff Davis	e533524b23	Add pg_database_locale() to retrieve database default locale. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/0151ad01239e2cc7b3139644358cf8f7b9622ff7.camel@j-davis.com	2025-10-18 16:25:23 -07:00
Jeff Davis	67a8b49e96	Add pg_iswxdigit(), useful for tsearch. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/0151ad01239e2cc7b3139644358cf8f7b9622ff7.camel@j-davis.com	2025-10-18 16:25:11 -07:00
David Rowley	5c0a20003b	Fix reset of incorrect hash iterator in GROUPING SETS queries This fixes an unlikely issue when fetching GROUPING SET results from their internally stored hash tables. It was possible in rare cases that the hash iterator would be set up incorrectly which could result in a crash. This was introduced in `4d143509c`, so backpatch to v18. Many thanks to Yuri Zamyatin for reporting and helping to debug this issue. Bug: #19078 Reported-by: Yuri Zamyatin <yuri@yrz.am> Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Jeff Davis <pgsql@j-davis.com> Discussion: https://postgr.es/m/19078-dfd62f840a2c0766@postgresql.org Backpatch-through: 18	2025-10-18 16:07:04 +13:00
David Rowley	86d118f9a6	Englishify comment wording Switch to using the English word here rather than using a verbified function name. The full word still fits within a single comment line, so it's probably better just to use that instead of trying to shorten it, which might cause confusion. Author: Rafia Sabih <rafia.pghackers@gmail.com> Discussion: https://postgr.es/m/CA+FpmFe7LnRF2NA_QfARjkSWme4mNt+Udwbh2Yb=zZm35Ji31w@mail.gmail.com	2025-10-18 12:50:14 +13:00
Masahiko Sawada	fd53065013	Remove unused data_bufsz from DecodedBkpBlock struct. Author: Mikhail Gribkov <youzhick@gmail.com> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/CAMEv5_sxuaiAfSy1ZyN%3D7UGbHg3C10cwHhEk8nXEjiCsBVs4vQ%40mail.gmail.com	2025-10-17 11:28:54 -07:00
Peter Eisentraut	e1a912c86d	Change config_generic.vartype to be initialized at compile time Previously, this was initialized at run time so that it did not have to be maintained by hand in guc_tables.c. But since that table is now generated anyway, we might as well generate this bit as well. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/8fdfb91e-60fb-44fa-8df6-f5dea47353c9@eisentraut.org	2025-10-17 10:33:54 +02:00
Nathan Bossart	812221b204	Remove partColsUpdated. This information appears to have been unused since commit `c5b7ba4e67`. We could not find any references in third-party code, either. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/aO_CyFRpbVMtgJWM%40nathan	2025-10-16 11:31:38 -05:00
Amit Kapila	41c674d2e3	Refactor logical worker synchronization code into a separate file. To support the upcoming addition of a sequence synchronization worker, this patch extracts common synchronization logic shared by table sync workers and the new sequence sync worker into a dedicated file. This modularization improves code reuse, maintainability, and clarity in the logical workers framework. Author: vignesh C <vignesh21@gmail.com> Author: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com	2025-10-16 05:10:50 +00:00
Jeff Davis	af164f31b9	Add pg_iswalpha() and related functions. Per-character pg_locale_t APIs. Useful for tsearch parsing and potentially other places. Significant overlap with the regc_wc_isalpha() and related functions in regc_pg_locale.c, but this change leaves those intact for now. Discussion: https://postgr.es/m/0151ad01239e2cc7b3139644358cf8f7b9622ff7.camel@j-davis.com	2025-10-15 12:54:01 -07:00
Nathan Bossart	688dc6299a	Fix lookups in pg_{clear,restore}_{attribute,relation}_stats(). Presently, these functions look up the relation's OID, lock it, and then check privileges. Not only does this approach provide no guarantee that the locked relation matches the arguments of the lookup, but it also allows users to briefly lock relations for which they do not have privileges, which might enable denial-of-service attacks. This commit adjusts these functions to use RangeVarGetRelidExtended(), which is purpose-built to avoid both of these issues. The new RangeVarGetRelidCallback function is somewhat complicated because it must handle both tables and indexes, and for indexes, we must check privileges on the parent table and lock it first. Also, it needs to handle a couple of extremely unlikely race conditions involving concurrent OID reuse. A downside of this change is that the coding doesn't allow for locking indexes in AccessShare mode anymore; everything is locked in ShareUpdateExclusive mode. Per discussion, the original choice of lock levels was intended for a now defunct implementation that used in-place updates, so we believe this change is okay. Reviewed-by: Jeff Davis <pgsql@j-davis.com> Discussion: https://postgr.es/m/Z8zwVmGzXyDdkAXj%40nathan Backpatch-through: 18	2025-10-15 12:47:33 -05:00
Peter Eisentraut	5f4c3b33a9	Change reset_extra into a config_generic common field This is not specific to the GUC parameter type, so it can be part of the generic struct rather than the type-specific struct (like the related "extra" field). This allows for some code simplifications. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/8fdfb91e-60fb-44fa-8df6-f5dea47353c9@eisentraut.org	2025-10-15 15:20:28 +02:00
Peter Eisentraut	dd3ae37830	Add log_autoanalyze_min_duration The log output functionality of log_autovacuum_min_duration applies to both VACUUM and ANALYZE, so it is not possible to separate the VACUUM and ANALYZE log output thresholds. Logs are likely to be output only for VACUUM and not for ANALYZE. Therefore, we decided to separate the threshold for log output of VACUUM by autovacuum (log_autovacuum_min_duration) and the threshold for log output of ANALYZE by autovacuum (log_autoanalyze_min_duration). Author: Shinya Kato <shinya11.kato@gmail.com> Reviewed-by: Kasahara Tatsuhito <kasaharatt@oss.nttdata.com> Discussion: https://www.postgresql.org/message-id/flat/CAOzEurQtfV4MxJiWT-XDnimEeZAY+rgzVSLe8YsyEKhZcajzSA@mail.gmail.com	2025-10-15 14:31:12 +02:00
Etsuro Fujita	12609fbacb	Fix EvalPlanQual handling of foreign/custom joins in ExecScanFetch. If inside an EPQ recheck, ExecScanFetch would run the recheck method function for foreign/custom joins even if they aren't descendant nodes in the EPQ recheck plan tree, which is problematic at least in the foreign-join case, because such a foreign join isn't guaranteed to have an alternative local-join plan required for running the recheck method function; in the postgres_fdw case this could lead to a segmentation fault or an assert failure in an assert-enabled build when running the recheck method function. Even if inside an EPQ recheck, any scan nodes that aren't descendant ones in the EPQ recheck plan tree should be normally processed by using the access method function; fix by modifying ExecScanFetch so that if inside an EPQ recheck, it runs the recheck method function for foreign/custom joins that are descendant nodes in the EPQ recheck plan tree as before and runs the access method function for foreign/custom joins that aren't. This fix also adds to postgres_fdw an isolation test for an EPQ recheck that caused issues stated above. Oversight in commit `385f337c9`. Reported-by: Kristian Lejao <kristianlejao@gmail.com> Author: Masahiko Sawada <sawada.mshk@gmail.com> Co-authored-by: Etsuro Fujita <etsuro.fujita@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Etsuro Fujita <etsuro.fujita@gmail.com> Discussion: https://postgr.es/m/CAD21AoBpo6Gx55FBOW+9s5X=nUw3Xpq64v35fpDEKsTERnc4TQ@mail.gmail.com Backpatch-through: 13	2025-10-15 17:15:00 +09:00
Peter Eisentraut	29dc7a6687	Add some const qualifiers in guc-related source files, in anticipation of some further restructuring. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/8fdfb91e-60fb-44fa-8df6-f5dea47353c9@eisentraut.org	2025-10-15 10:05:53 +02:00
Amit Kapila	2436b8c047	Standardize use of REFRESH PUBLICATION in code and messages. This patch replaces ALTER SUBSCRIPTION REFRESH with ALTER SUBSCRIPTION REFRESH PUBLICATION in comments and error messages to improve clarity and support future extensibility. The change aligns with upcoming addition REFRESH SEQUENCES for sequence synchronization. Author: vignesh C <vignesh21@gmail.com> Author: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com	2025-10-15 03:42:27 +00:00
Melanie Plageman	43b05b38ea	Inline TransactionIdFollows/Precedes[OrEquals]() These functions appeared prominently in a profile of a patch that sets the visibility map on-access. Inline them to remove call overhead and make them cheaper to use in hot paths. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/2wk7jo4m4qwh5sn33pfgerdjfujebbccsmmlownybddbh6nawl%40mdyyqpqzxjek	2025-10-14 17:03:48 -04:00
Jeff Davis	8efe982fe2	pg_regc_locale.c: rename some static functions. Use the more specific prefix "regc_" rather than the generic prefix "pg_". A subsequent commit will create generic versions of some of these functions that can be called from other modules. Discussion: https://postgr.es/m/0151ad01239e2cc7b3139644358cf8f7b9622ff7.camel@j-davis.com	2025-10-14 11:04:04 -07:00
Melanie Plageman	4a8fb58671	Bump XLOG_PAGE_MAGIC after xl_heap_prune change `add323da40` altered xl_heap_prune, changing the WAL format, but neglected to bump XLOG_PAGE_MAGIC. Do so now. Author: Melanie Plageman <melanieplageman@gmail.com> Reported-by: Kirill Reshke <reshkekirill@gmail.com> Reported-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/aO3Gw6hCAZFUd5ab%40paquier.xyz	2025-10-14 10:13:10 -04:00
Richard Guo	1206df04c2	Rename apply_at to apply_agg_at for clarity The field name "apply_at" in RelAggInfo was a bit ambiguous. Rename it to "apply_agg_at" to improve clarity and make its purpose clearer. Per complaint from David Rowley, Robert Haas. Suggested-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA+TgmoZ0KR2_XCWHy17=HHcQ3p2Mamc9c6Dnnhf1J6wPYFD9ng@mail.gmail.com	2025-10-14 16:35:22 +09:00
Michael Paquier	cd0be131ba	Introduce frontend API able to retrieve the contents of PG_VERSION get_pg_version() is able to return a version number, that can be used for comparisons based on PG_VERSION_NUM. A macro is added to convert the result to a major version number, to work with PG_MAJORVERSION_NUM. It is possible to pass to the routine an optional argument, where the contents retrieved from PG_VERSION are saved. This requirement matters for some of the frontend code (one example: pg_upgrade wants that for tablespace paths with a version number strictly older than v10). This will be used by a set of follow-up patches, to be consumed in various frontend tools that duplicate a logic similar to do what this new routine does, like: - pg_resetwal - pg_combinebackup - pg_createsubscriber - pg_upgrade This routine supports both the post-v10 version number and the older flavor (aka 9.6), as required at least by pg_upgrade. Author: Michael Paquier <michael@paquier.xyz> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/aOiirvWJzwdVCXph@paquier.xyz	2025-10-14 16:20:42 +09:00
Melanie Plageman	add323da40	Eliminate XLOG_HEAP2_VISIBLE from vacuum phase III Instead of emitting a separate XLOG_HEAP2_VISIBLE WAL record for each page that becomes all-visible in vacuum's third phase, specify the VM changes in the already emitted XLOG_HEAP2_PRUNE_VACUUM_CLEANUP record. Visibility checks are now performed before marking dead items unused. This is safe because the heap page is held under exclusive lock for the entire operation. This reduces the number of WAL records generated by VACUUM phase III by up to 50%. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com	2025-10-13 18:01:06 -04:00
Tom Lane	03bf7a12c5	Fix incorrect message-printing in win32security.c. log_error() would probably fail completely if used, and would certainly print garbage for anything that needed to be interpolated into the message, because it was failing to use the correct printing subroutine for a va_list argument. This bug likely went undetected because the error cases this code is used for are rarely exercised - they only occur when Windows security API calls fail catastrophically (out of memory, security subsystem corruption, etc). The FRONTEND variant can be fixed just by calling vfprintf() instead of fprintf(). However, there was no va_list variant of write_stderr(), so create one by refactoring that function. Following the usual naming convention for such things, call it vwrite_stderr(). Author: Bryan Green <dbryan.green@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAF+pBj8goe4fRmZ0V3Cs6eyWzYLvK+HvFLYEYWG=TzaM+tWPnw@mail.gmail.com Backpatch-through: 13	2025-10-13 17:56:45 -04:00
Michael Paquier	3a36543d7d	Fix two typos in xlogstats.h and xlogstats.c Issue found while browsing this area of the code, introduced and copy-pasted around by `2258e76f90`. Backpatch-through: 15	2025-10-10 11:51:45 +09:00
Melanie Plageman	d96f87332b	Eliminate COPY FREEZE use of XLOG_HEAP2_VISIBLE Instead of emitting a separate WAL XLOG_HEAP2_VISIBLE record for setting bits in the VM, specify the VM block changes in the XLOG_HEAP2_MULTI_INSERT record. This halves the number of WAL records emitted by COPY FREEZE. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com	2025-10-09 16:29:01 -04:00
Amit Kapila	96b3784973	Add "ALL SEQUENCES" support to publications. This patch adds support for the ALL SEQUENCES clause in publications, enabling synchronization/replication of all sequences that is useful for upgrades. Publications can now include all sequences via FOR ALL SEQUENCES. psql enhancements: \d shows publications for a given sequence. \dRp indicates if a publication includes all sequences. ALL SEQUENCES can be combined with ALL TABLES, but not with other options like TABLE or TABLES IN SCHEMA. We can extend support for more granular clauses in future. The view pg_publication_sequences provides information about the mapping between publications and sequences. This patch enables publishing of sequences; subscriber-side support will be added in upcoming patches. Author: vignesh C <vignesh21@gmail.com> Author: Tomas Vondra <tomas@vondra.me> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Nisha Moond <nisha.moond412@gmail.com> Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com	2025-10-09 03:48:54 +00:00
Andres Freund	5e89985928	bufmgr: Don't lock buffer header in StrategyGetBuffer() Previously StrategyGetBuffer() acquired the buffer header spinlock for every buffer, whether it was reusable or not. If reusable, it'd be returned, with the lock held, to GetVictimBuffer(), which then would pin the buffer with PinBuffer_Locked(). That's somewhat violating the spirit of the guidelines for holding spinlocks (i.e. that they are only held for a few lines of consecutive code) and necessitates using PinBuffer_Locked(), which scales worse than PinBuffer() due to holding the spinlock. This alone makes it worth changing the code. However, the main reason to change this is that a future commit will make PinBuffer_Locked() slower (due to making UnlockBufHdr() slower), to gain scalability for the much more common case of pinning a pre-existing buffer. By pinning the buffer with a single atomic operation, iff the buffer is reusable, we avoid any potential regression for miss-heavy workloads. There strictly are fewer atomic operations for each potential buffer after this change. The price for this improvement is that freelist.c needs two CAS loops and needs to be able to set up the resource accounting for pinned buffers. The latter is achieved by exposing a new function for that purpose from bufmgr.c, that seems better than exposing the entire private refcount infrastructure. The improvement seems worth the complexity. Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2025-10-08 17:04:07 -04:00
Andres Freund	3baae90013	bufmgr: fewer calls to BufferDescriptorGetContentLock We're planning to merge buffer content locks into BufferDesc.state. To reduce the size of that patch, centralize calls to BufferDescriptorGetContentLock(). The biggest part of the change is in assertions, by introducing BufferIsLockedByMe[InMode]() (and removing BufferIsExclusiveLocked()). This seems like an improvement even without aforementioned plans. Additionally replace some direct calls to LWLockAcquire() with calls to LockBuffer(). Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2025-10-08 16:06:19 -04:00
Masahiko Sawada	d3b6183dd9	Add mem_exceeded_count column to pg_stat_replication_slots. This commit introduces a new column mem_exceeded_count to the pg_stat_replication_slots view. This counter tracks how often the memory used by logical decoding exceeds the logical_decoding_work_mem limit. The new statistic helps users determine whether exceeding the logical_decoding_work_mem limit is a rare occurrences or a frequent issue, information that wasn't available through existing statistics. Bumps catversion. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/978D21E8-9D3B-40EA-A4B1-F87BABE7868C@yesql.se	2025-10-08 10:05:04 -07:00
Tom Lane	14ad0d7bf2	Cleanup NAN code in float.h, too. In the same spirit as `3bf905692`, assume that all compilers we still support provide the NAN macro, and get rid of workarounds for that. The C standard allows implementations to omit NAN if the underlying float arithmetic lacks quiet (non-signaling) NaNs. However, we've required that feature for years: the workarounds only supported lack of the macro, not lack of the functionality. I put in a compile-time #error if there's no macro, just for clarity. Also fix up the copies of these functions in ecpglib, and leave a breadcrumb for the next hacker who touches them. History of the hacks being removed here can be found in commits `1bc2d544b`, `4d17a2146`, `cec8394b5`. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/1952095.1759764279@sss.pgh.pa.us	2025-10-08 12:19:53 -04:00
Robert Haas	4685977cc5	Add extension_state member to PlannedStmt. Extensions can stash data computed at plan time into this list using planner_shutdown_hook (or perhaps other mechanisms) and then access it from any code that has access to the PlannedStmt (such as explain hooks), allowing for extensible debugging and instrumentation of plans. Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com	2025-10-08 09:07:49 -04:00
Robert Haas	94f3ad3961	Add planner_setup_hook and planner_shutdown_hook. These hooks allow plugins to get control at the earliest point at which the PlannerGlobal object is fully initialized, and then just before it gets destroyed. This is useful in combination with the extendable plan state facilities (see extendplan.h) and perhaps for other purposes as well. Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com	2025-10-08 09:05:38 -04:00
Robert Haas	c83ac02ec7	Add ExplainState argument to pg_plan_query() and planner(). This allows extensions to have access to any data they've stored in the ExplainState during planning. Unfortunately, it won't help with EXPLAIN EXECUTE is used, but since that case is less common, this still seems like an improvement. Since planner() has quite a few arguments now, also add some documentation of those arguments and the return value. Author: Robert Haas <rhaas@postgresql.org> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com	2025-10-08 08:33:29 -04:00
Richard Guo	8e11859102	Implement Eager Aggregation Eager aggregation is a query optimization technique that partially pushes aggregation past a join, and finalizes it once all the relations are joined. Eager aggregation may reduce the number of input rows to the join and thus could result in a better overall plan. In the current planner architecture, the separation between the scan/join planning phase and the post-scan/join phase means that aggregation steps are not visible when constructing the join tree, limiting the planner's ability to exploit aggregation-aware optimizations. To implement eager aggregation, we collect information about aggregate functions in the targetlist and HAVING clause, along with grouping expressions from the GROUP BY clause, and store it in the PlannerInfo node. During the scan/join planning phase, this information is used to evaluate each base or join relation to determine whether eager aggregation can be applied. If applicable, we create a separate RelOptInfo, referred to as a grouped relation, to represent the partially-aggregated version of the relation and generate grouped paths for it. Grouped relation paths can be generated in two ways. The first method involves adding sorted and hashed partial aggregation paths on top of the non-grouped paths. To limit planning time, we only consider the cheapest or suitably-sorted non-grouped paths in this step. Alternatively, grouped paths can be generated by joining a grouped relation with a non-grouped relation. Joining two grouped relations is currently not supported. To further limit planning time, we currently adopt a strategy where partial aggregation is pushed only to the lowest feasible level in the join tree where it provides a significant reduction in row count. This strategy also helps ensure that all grouped paths for the same grouped relation produce the same set of rows, which is important to support a fundamental assumption of the planner. For the partial aggregation that is pushed down to a non-aggregated relation, we need to consider all expressions from this relation that are involved in upper join clauses and include them in the grouping keys, using compatible operators. This is essential to ensure that an aggregated row from the partial aggregation matches the other side of the join if and only if each row in the partial group does. This ensures that all rows within the same partial group share the same "destiny", which is crucial for maintaining correctness. One restriction is that we cannot push partial aggregation down to a relation that is in the nullable side of an outer join, because the NULL-extended rows produced by the outer join would not be available when we perform the partial aggregation, while with a non-eager-aggregation plan these rows are available for the top-level aggregation. Pushing partial aggregation in this case may result in the rows being grouped differently than expected, or produce incorrect values from the aggregate functions. If we have generated a grouped relation for the topmost join relation, we finalize its paths at the end. The final paths will compete in the usual way with paths built from regular planning. The patch was originally proposed by Antonin Houska in 2017. This commit reworks various important aspects and rewrites most of the current code. However, the original patch and reviews were very useful. Author: Richard Guo <guofenglinux@gmail.com> Author: Antonin Houska <ah@cybertec.at> (in an older version) Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> (in an older version) Reviewed-by: Andy Fan <zhihuifan1213@163.com> (in an older version) Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> (in an older version) Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com	2025-10-08 17:04:23 +09:00
Richard Guo	185e304263	Allow negative aggtransspace to indicate unbounded state size This patch reuses the existing aggtransspace in pg_aggregate to signal that an aggregate's transition state can grow unboundedly. If aggtransspace is set to a negative value, it now indicates that the transition state may consume unpredictable or large amounts of memory, such as in aggregates like array_agg or string_agg that accumulate input rows. This information can be used by the planner to avoid applying memory-sensitive optimizations (e.g., eager aggregation) when there is a risk of excessive memory usage during partial aggregation. Bump catalog version. Per idea from Robert Haas, though applied differently than originally suggested. Discussion: https://postgr.es/m/CA+TgmoYbkvYwLa+1vOP7RDY7kO2=A7rppoPusoRXe44VDOGBPg@mail.gmail.com	2025-10-08 17:01:48 +09:00
Michael Paquier	b71bae41a0	Add stats_reset to pg_stat_user_functions It is possible to call pg_stat_reset_single_function_counters() for a single function, but the reset time was missing the system view showing its statistics. Like all the fields of pg_stat_user_functions, the GUC track_functions needs to be enabled to show the statistics about function executions. Bump catalog version. Bump PGSTAT_FILE_FORMAT_ID, as a result of the new field added to PgStat_StatFuncEntry. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/aONjnsaJSx-nEdfU@paquier.xyz	2025-10-08 12:43:40 +09:00
David Rowley	3bf905692c	Cleanup INFINITY code in float.h The INFINITY macro is always defined per C99 standard, so this should mean we can now get rid of the workaround code for when that macro isn't defined. Also, delete the (now unneeded) #pragma code which was disabling a compiler warning in MSVC. There was a comment explaining why the #pragma was placed outside the function body to work around a MSVC compiler bug, but the link explaining that was dead, as reported by jian he. Author: David Rowley <dgrowleyml@gmail.com> Reported-by: jian he <jian.universality@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CACJufxGARYETnNwtCK7QC0zE_7gq-tfN0mME=gT5rTNtC=VSHQ@mail.gmail.com	2025-10-08 12:07:17 +13:00
Robert Haas	64095d1574	Remove PlannerInfo's join_search_private method. Instead, use the new mechanism that allows planner extensions to store private state inside a PlannerInfo, treating GEQO as an in-core planner extension. This is a useful test of the new facility, and also buys back a few bytes of storage. To make this work, we must remove innerrel_is_unique_ext's hack of testing whether join_search_private is set as a proxy for whether the join search might be retried. Add a flag that extensions can use to explicitly signal their intentions instead. Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com	2025-10-07 12:43:45 -04:00
Robert Haas	0132dddab3	Allow private state in certain planner data structures. Extension that make extensive use of planner hooks may want to coordinate their efforts, for example to avoid duplicate computation, but that's currently difficult because there's no really good way to pass data between different hooks. To make that easier, allow for storage of extension-managed private state in PlannerGlobal, PlannerInfo, and RelOptInfo, along very similar lines to what we have permitted for ExplainState since commit `c65bc2e1d1`. Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com	2025-10-07 12:09:30 -04:00
Robert Haas	8c49a484e8	Assign each subquery a unique name prior to planning it. Previously, subqueries were given names only after they were planned, which makes it difficult to use information from a previous execution of the query to guide future planning. If, for example, you knew something about how you want "InitPlan 2" to be planned, you won't know whether the subquery you're currently planning will end up being "InitPlan 2" until after you've finished planning it, by which point it's too late to use the information that you had. To fix this, assign each subplan a unique name before we begin planning it. To improve consistency, use textual names for all subplans, rather than, as we did previously, a mix of numbers (such as "InitPlan 1") and names (such as "CTE foo"), and make sure that the same name is never assigned more than once. We adopt the somewhat arbitrary convention of using the type of sublink to set the plan name; for example, a query that previously had two expression sublinks shown as InitPlan 2 and InitPlan 1 will now end up named expr_1 and expr_2. Because names are assigned before rather than after planning, some of the regression test outputs show the numerical part of the name switching positions: what was previously SubPlan 2 was actually the first one encountered, but we finished planning it later. We assign names even to subqueries that aren't shown as such within the EXPLAIN output. These include subqueries that are a FROM clause item or a branch of a set operation, rather than something that will be turned into an InitPlan or SubPlan. The purpose of this is to make sure that, below the topmost query level, there's always a name for each subquery that is stable from one planning cycle to the next (assuming no changes to the query or the database schema). Author: Robert Haas <rhaas@postgresql.org> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: http://postgr.es/m/3641043.1758751399@sss.pgh.pa.us	2025-10-07 09:18:54 -04:00
Nathan Bossart	ec8719ccbf	Optimize hex_encode() and hex_decode() using SIMD. The hex_encode() and hex_decode() functions serve as the workhorses for hexadecimal data for bytea's text format conversion functions, and some workloads are sensitive to their performance. This commit adds new implementations that use routines from port/simd.h, which testing indicates are much faster for larger inputs. For small or invalid inputs, we fall back on the existing scalar versions. Since we are using port/simd.h, these optimizations apply to both x86-64 and AArch64. Author: Nathan Bossart <nathandbossart@gmail.com> Co-authored-by: Chiranmoy Bhattacharya <chiranmoy.bhattacharya@fujitsu.com> Co-authored-by: Susmitha Devanga <devanga.susmitha@fujitsu.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: https://postgr.es/m/aLhVWTRy0QPbW2tl%40nathan	2025-10-06 12:28:50 -05:00
Amit Kapila	b93172ca59	Expose sequence page LSN via pg_get_sequence_data. This patch enhances the pg_get_sequence_data function to include the page-level LSN (Log Sequence Number) of the sequence. This additional metadata will be used by upcoming patches to support synchronization of sequences during logical replication. By exposing the LSN, we enable more accurate tracking of sequence changes, which is essential for maintaining consistency across replicated nodes. Author: vignesh C <vignesh21@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://www.postgresql.org/message-id/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com	2025-10-06 08:30:16 +00:00
Michael Paquier	42c6b74d89	Add comment in ginxlog.h about block used with ginxlogInsertListPage All the other structures describe the list of blocks used, and in the case of a GIN_INSERT_LISTPAGE record block 0 refers to a list page with the items added to it. Author: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru> Discussion: https://postgr.es/m/CALdSSPgk=9WRoXhZy5fdk+T1hiau7qbL_vn94w_L1N=gtEdbsg@mail.gmail.com	2025-10-06 16:23:51 +09:00
Michael Paquier	a5b543258a	Add stats_reset to pg_stat_all_{tables,indexes} and related views It is possible to call pg_stat_reset_single_table_counters() on a relation (index or table) but the reset time was missing from the system views showing their statistics. This commit adds the reset time as an attribute of pg_stat_all_tables, pg_stat_all_indexes, and other relations related to them. Bump catalog version. Bump PGSTAT_FILE_FORMAT_ID, as a result of the new field added to PgStat_StatTabEntry. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/aN8l182jKxEq1h9f@paquier.xyz	2025-10-06 15:31:21 +09:00
Álvaro Herrera	1a8b5b11e4	Don't include access/htup_details.h in executor/tuptable.h This is not at all needed; I suspect it was a simple mistake in commit `5408e233f0`. It causes htup_details.h to bleed into a huge number of places via execnodes.h. Remove it and fix fallout. Discussion: https://postgr.es/m/202510021240.ptc2zl5cvwen@alvherre.pgsql	2025-10-05 18:00:38 +02:00
Álvaro Herrera	1b6f61bd89	Don't include execnodes.h in brin.h or gin.h These headers don't need execnodes.h for anything. I think they never have. Discussion: https://postgr.es/m/202510021240.ptc2zl5cvwen@alvherre.pgsql	2025-10-05 17:35:25 +02:00
Nathan Bossart	f8f4afe751	Optimize vector8_has_le() on AArch64. Presently, the SIMD implementation of this function uses unsigned saturating subtraction to find bytes less than or equal to the given value, which is a workaround for the lack of unsigned comparison instructions on some architectures. However, Neon offers vminvq_u8(), which returns the minimum (unsigned) value in the vector. This commit adds a Neon-specific implementation that uses vminvq_u8() to optimize vector8_has_le() on AArch64. In passing, adjust the SSE2 implementation to use vector8_min() and vector8_eq() to find values less than or equal to the given value. This was the only use of vector8_ssub(), so it has been removed. Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: https://postgr.es/m/aNHDNDSHleq0ogC_%40nathan	2025-10-03 14:02:47 -05:00
Tatsuo Ishii	25a30bbd42	Add IGNORE NULLS/RESPECT NULLS option to Window functions. Add IGNORE NULLS/RESPECT NULLS option (null treatment clause) to lead, lag, first_value, last_value and nth_value window functions. If unspecified, the default is RESPECT NULLS which includes NULL values in any result calculation. IGNORE NULLS ignores NULL values. Built-in window functions are modified to call new API WinCheckAndInitializeNullTreatment() to indicate whether they accept IGNORE NULLS/RESPECT NULLS option or not (the API can be called by user defined window functions as well). If WinGetFuncArgInPartition's allowNullTreatment argument is true and IGNORE NULLS option is given, WinGetFuncArgInPartition() or WinGetFuncArgInFrame() will return evaluated function's argument expression on specified non NULL row (if it exists) in the partition or the frame. When IGNORE NULLS option is given, window functions need to visit and evaluate same rows over and over again to look for non null rows. To mitigate the issue, 2-bit not null information array is created while executing window functions to remember whether the row has been already evaluated to NULL or NOT NULL. If already evaluated, we could skip the evaluation work, thus we could get better performance. Author: Oliver Ford <ojford@gmail.com> Co-authored-by: Tatsuo Ishii <ishii@postgresql.org> Reviewed-by: Krasiyan Andreev <krasiyan@gmail.com> Reviewed-by: Andrew Gierth <andrew@tao11.riddles.org.uk> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Fetter <david@fetter.org> Reviewed-by: Vik Fearing <vik@postgresfriends.org> Reviewed-by: "David G. Johnston" <david.g.johnston@gmail.com> Reviewed-by: Chao Li <lic@highgo.com> Discussion: https://postgr.es/m/flat/CAGMVOdsbtRwE_4+v8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A@mail.gmail.com	2025-10-03 09:47:36 +09:00
Peter Eisentraut	8e2acda2b0	Rename pg_builtin_integer_constant_p to pg_integer_constant_p Since it's not builtin. Also fix a related typo. Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAApHDvom02B_XNVSkvxznVUyZbjGAR%2B5myA89ZcbEd3%3DPA9UcA%40mail.gmail.com	2025-09-30 21:15:46 +02:00
Peter Eisentraut	57d46dff9b	Make some use of anonymous unions [reorderbuffer xact_time] Make some use of anonymous unions, which are allowed as of C11, as examples and encouragement for future code, and to test compilers. This commit changes the ReorderBufferTXN struct. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/f00a9968-388e-4f8c-b5ef-5102e962d997%40eisentraut.org	2025-09-30 12:35:50 +02:00
Peter Eisentraut	4b7e6c73b0	Make some use of anonymous unions [pg_locale_t] Make some use of anonymous unions, which are allowed as of C11, as examples and encouragement for future code, and to test compilers. This commit changes the pg_locale_t type. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/f00a9968-388e-4f8c-b5ef-5102e962d997%40eisentraut.org	2025-09-30 12:35:50 +02:00
Álvaro Herrera	3bf31dd243	Do a tiny bit of header file maintenance Stop including utils/relcache.h in access/genam.h, and stop including htup_details.h in nodes/tidbitmap.h. Both these files (genam.h and tidbitmap.h) are widely used in other header files, so it's in our best interest that they remain as lean as reasonable. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/202509291356.o5t6ny2hoa3q@alvherre.pgsql	2025-09-30 12:28:29 +02:00
Tom Lane	ef38a4d975	Add GROUP BY ALL. GROUP BY ALL is a form of GROUP BY that adds any TargetExpr that does not contain an aggregate or window function into the groupClause of the query, making it exactly equivalent to specifying those same expressions in an explicit GROUP BY list. This feature is useful for certain kinds of data exploration. It's already present in some other DBMSes, and the SQL committee recently accepted it into the standard, so we can be reasonably confident in the syntax being stable. We do have to invent part of the semantics, as the standard doesn't allow for expressions in GROUP BY, so they haven't specified what to do with window functions. We assume that those should be treated like aggregates, i.e., left out of the constructed GROUP BY list. In passing, wordsmith some existing documentation about GROUP BY, and update some neglected synopsis entries in select_into.sgml. Author: David Christensen <david@pgguru.net> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAHM0NXjz0kDwtzoe-fnHAqPB1qA8_VJN0XAmCgUZ+iPnvP5LbA@mail.gmail.com	2025-09-29 16:55:17 -04:00
Michael Paquier	7bd2975fa9	Add support for tracking of entry count in pgstats Stats kinds can set a new option called "track_entry_count" (disabled by default, available for variable-numbered stats) that will make pgstats track the number of entries that exist in its shared hashtable. As there is only one code path where a new entry is added, and one code path where entries are freed, the count tracking is straight-forward in its implementation. Reads of these counters are optimistic, and may change across two calls. The counter is incremented when an entry is created (not when reused), and is decremented when an entry is freed from the hashtable (marked for drop with its refcount reaching 0), which is something that pgstats decides internally. A first use case of this facility would be pg_stat_statements, where we need to be able to cap the number of entries that would be stored in the shared hashtable, based on its "max" GUC. The module currently relies on hash_get_num_entries(), which offers a cheap way to count how many entries are in its hash table, but we cannot do that in pgstats for variable-sized stats kinds as a single hashtable is used for all the stats kinds. Independently of PGSS, this is useful for other custom stats kinds that want to cap, control, or track the number of entries they have, without depending on a potentially expensive sequential scan to know the number of entries while holding an extra exclusive lock. Author: Michael Paquier <michael@paquier.xyz> Reviewed-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: Keisuke Kuroda <keisuke.kuroda.3862@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/aMPKWR81KT5UXvEr@paquier.xyz	2025-09-29 08:57:57 +09:00
David Rowley	59c2f03d1e	Teach MSVC that elog/ereport ERROR doesn't return It had always been intended that this already works correctly as pg_unreachable() uses __assume(0) on MSVC, and that directs the compiler in a way so it knows that a given function won't return. However, with ereport_domain(), it didn't work... It's now understood that the failure to determine that elog(ERROR) does not return comes from the inability of the MSVC compiler to detect the "const int elevel_" is the same as the "elevel" macro parameter. MSVC seems to be unable to make the "if (elevel_ >= ERROR) branch constantly-true when the macro is used with any elevel >= ERROR, therefore the pg_unreachable() is seen to only be present in a conditional branch rather than present unconditionally. While there seems to be no way to force the compiler into knowing that elevel_ is equal to elevel within the ereport_domain() macro, there is a way in C11 to determine if the elevel parameter is a compile-time constant or not. This is done via some hackery using the _Generic() intrinsic function, which gives us functionality similar to GCC's __builtin_constant_p(), albeit only for integers. Here we define pg_builtin_integer_constant_p() for this purpose. Callers can check for availability via HAVE_PG_BUILTIN_INTEGER_CONSTANT_P. ereport_domain() has been adjusted to use pg_builtin_integer_constant_p() instead of __builtin_constant_p(). It's not quite clear at this stage if this now allows us to forego doing the likes of "return NULL; /* keep compiler quiet /" as there may be other compilers in use that have similar struggles. It's just a matter of time before someone commits a function that does not "return" a value after an elog(ERROR). Let's make time and lack of complaints about said commit be the judge of if we need to continue the "/ keep compiler quiet */" palaver. Author: David Rowley <drowleyml@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/CAApHDvom02B_XNVSkvxznVUyZbjGAR+5myA89ZcbEd3=PA9UcA@mail.gmail.com	2025-09-27 22:41:04 +12:00
Masahiko Sawada	66cdef4425	Remove unused for_all_tables field from AlterPublicationStmt. No backpatch as AlterPublicationStmt struct is exposed. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/CAD21AoC6B6AuxWOST-TkxUbDgp8FwX=BLEJZmKLG_VrH-hfxpA@mail.gmail.com	2025-09-26 09:23:00 -07:00
Álvaro Herrera	dbf8cfb4f0	Create a separate file listing backend types Use our established coding pattern to reduce maintenance pain when adding other per-process-type characteristics. Like PG_KEYWORD, PG_CMDTAG, PG_RMGR. To keep the strings translatable, the relevant makefile now also scans src/include for this specific file. I didn't want to have it scan all .h files, as then gettext would have to scan all header files. I didn't find any way to affect the meson behavior in this respect though. Author: Álvaro Herrera <alvherre@kurilemu.de> Co-authored-by: Jonathan Gonzalez V. <jonathan.abdiel@gmail.com> Discussion: https://postgr.es/m/202507151830.dwgz5nmmqtdy@alvherre.pgsql	2025-09-26 15:21:49 +02:00
Álvaro Herrera	7e638d7f50	Don't include execnodes.h in replication/conflict.h ... which silently propagates a lot of headers into many places via pgstat.h, as evidenced by the variety of headers that this patch needs to add to seemingly random places. Add a minimum of typedefs to conflict.h to be able to remove execnodes.h, and fix the fallout. Backpatch to 18, where conflict.h first appeared. Discussion: https://postgr.es/m/202509191927.uj2ijwmho7nv@alvherre.pgsql	2025-09-25 14:52:41 +02:00
Álvaro Herrera	81fc3e28e3	Update some more forward declarations to use typedef As commit `d4d1fc527b`. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/202509191025.22agk3fvpilc@alvherre.pgsql	2025-09-25 14:33:19 +02:00
Fujii Masao	7fcb32ad02	Fix incorrect and inconsistent comments in tableam.h and heapam.c. This commit corrects several issues in function comments: * The parameter "rel" was incorrectly referred to as "relation" in the comments for table_tuple_delete(), table_tuple_update(), and table_tuple_lock(). * In table_tuple_delete(), "changingPart" was listed as an output parameter in the comments but is actually input. * In table_tuple_update(), "slot" was listed as an input parameter in the comments but is actually output. * The comment for "update_indexes" in table_tuple_update() was mis-indented. * The comments for heap_lock_tuple() incorrectly referenced a non-existent "tid" parameter. Author: Chao Li <lic@highgo.com> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Discussion: https://postgr.es/m/CAEoWx2nB6Ay8g=KEn7L3qbYX_4+sLk9XOMkV0XZqHR4cTY8ZvQ@mail.gmail.com	2025-09-25 00:51:59 +09:00
Peter Eisentraut	a5b35fcedb	Remove PointerIsValid() This doesn't provide any value over the standard style of checking the pointer directly or comparing against NULL. Also remove related: - AllocPointerIsValid() [unused] - IndexScanIsValid() [had one user] - HeapScanIsValid() [unused] - InvalidRelation [unused] Leaving HeapTupleIsValid(), ItemIdIsValid(), PortalIsValid(), RelationIsValid for now, to reduce code churn. Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/ad50ab6b-6f74-4603-b099-1cd6382fb13d%40eisentraut.org Discussion: https://www.postgresql.org/message-id/CA+hUKG+NFKnr=K4oybwDvT35dW=VAjAAfiuLxp+5JeZSOV3nBg@mail.gmail.com Discussion: https://www.postgresql.org/message-id/bccf2803-5252-47c2-9ff0-340502d5bd1c@iki.fi	2025-09-24 15:17:20 +02:00
Robert Haas	f2bae51dfd	Keep track of what RTIs a Result node is scanning. Result nodes now include an RTI set, which is only non-NULL when they have no subplan, and is taken from the relid set of the RelOptInfo that the Result is generating. ExplainPreScanNode now takes notice of these RTIs, which means that a few things get schema-qualified in the regression tests that previously did not. This makes the output more consistent between cases where some part of the plan tree is replaced by a Result node and those where this does not happen. Likewise, pg_overexplain's EXPLAIN (RANGE_TABLE) now displays the RTIs stored in a Result node just as it already does for other RTI-bearing node types. Result nodes also now include a result_reason, which tells us something about why the Result node was inserted. Using that information, EXPLAIN now emits, where relevant, a "Replaces" line describing the origin of a Result node. The purpose of these changes is to allow code that inspects a Plan tree to understand the origin of Result nodes that appear therein. Discussion: http://postgr.es/m/CA+TgmoYeUZePZWLsSO+1FAN7UPePT_RMEZBKkqYBJVCF1s60=w@mail.gmail.com Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>	2025-09-23 09:07:55 -04:00
David Rowley	9fc7f6ab72	Fix various incorrect filename references Author: Chao Li <li.evan.chao@gmail.com> Author: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAEoWx2=hOBCPm-Z=F15twr_23XjHeoXSbifP5GdEdtWona97wQ@mail.gmail.com	2025-09-22 13:33:17 +12:00
Richard Guo	e3a0304eba	Fix misleading comment in RangeTblEntry The comment describing join_using_alias incorrectly referred to the alias field as being defined "below", when it actually appears earlier in the RangeTblEntry struct. This patch fixes that. Author: Steve Lau <stevelauc@outlook.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/TYWPR01MB10612B020C33FD08F729415CEB613A@TYWPR01MB10612.jpnprd01.prod.outlook.com	2025-09-22 10:04:39 +09:00
Tom Lane	261f89a976	Track the maximum possible frequency of non-MCE array elements. The lossy-counting algorithm that ANALYZE uses to identify most-common array elements has a notion of cutoff frequency: elements with frequency greater than that are guaranteed to be collected, elements with smaller frequencies are not. In cases where we find fewer MCEs than the stats target would permit us to store, the cutoff frequency provides valuable additional information, to wit that there are no non-MCEs with frequency greater than that. What the selectivity estimation functions actually use the "minfreq" entry for is as a ceiling on the possible frequency of non-MCEs, so using the cutoff rather than the lowest stored MCE frequency provides a tighter bound and more accurate estimates. Therefore, instead of redundantly storing the minimum observed MCE frequency, store the cutoff frequency when there are fewer tracked values than we want. (When there are more, then of course we cannot assert that no non-stored elements are above the cutoff frequency, since we're throwing away some that are; so we still use the minimum stored frequency in that case.) Notably, this works even when none of the values are common enough to be called MCEs. In such cases we previously stored nothing in the STATISTIC_KIND_MCELEM pg_statistic slot, which resulted in the selectivity functions falling back to default estimates. So in that case we want to construct a STATISTIC_KIND_MCELEM entry that contains no "values" but does have "numbers", to wit the three extra numbers that the MCELEM entry type defines. A small obstacle is that update_attstats() has traditionally stored a null, not an empty array, when passed zero "values" for a slot. That gives rise to an MCELEM entry that get_attstatsslot() will spit up on. The least risky solution seems to be to adjust update_attstats() so that it will emit a non-null (but possibly empty) array when the passed stavalues array pointer isn't NULL, rather than conditioning that on numvalues > 0. In other existing cases I don't believe that that changes anything. For consistency, handle the stanumbers array the same way. In passing, improve the comments in routines that use STATISTIC_KIND_MCELEM data. Particularly, explain why we use minfreq / 2 not minfreq as the estimate for non-MCE values. Thanks to Matt Long for the suggestion that we could apply this idea even when there are more than zero MCEs. Reported-by: Mark Frost <FROSTMAR@uk.ibm.com> Reported-by: Matt Long <matt@mattlong.org> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/PH3PPF1C905D6E6F24A5C1A1A1D8345B593E16FA@PH3PPF1C905D6E6.namprd15.prod.outlook.com	2025-09-20 14:48:16 -04:00
Nathan Bossart	18cdf5932a	Fix obsolete references to postgres.h in comments. Oversights in commits `d08741eab5` and `d952373a98`. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/aMxbfSJ2wLWd32x-%40nathan	2025-09-19 09:19:03 -05:00
Amit Kapila	5b148706c5	Add optional pid parameter to pg_replication_origin_session_setup(). Commit `216a784829` introduced parallel apply workers, allowing multiple processes to share a replication origin. To support this, replorigin_session_setup() was extended to accept a pid argument identifying the process using the origin. This commit exposes that capability through the SQL interface function pg_replication_origin_session_setup() by adding an optional pid parameter. This enables multiple processes to coordinate replication using the same origin when using SQL-level replication functions. This change allows the non-builtin logical replication solutions to implement parallel apply for large transactions. Additionally, an existing internal error was made user-facing, as it can now be triggered via the exposed SQL API. Author: Doruk Yilmaz <doruk@mixrank.com> Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Euler Taveira <euler@eulerto.com> Discussion: https://postgr.es/m/CAMPB6wfe4zLjJL8jiZV5kjjpwBM2=rTRme0UCL7Ra4L8MTVdOg@mail.gmail.com Discussion: https://postgr.es/m/CAE2gYzyTSNvHY1+iWUwykaLETSuAZsCWyryokjP6rG46ZvRgQA@mail.gmail.com	2025-09-19 05:38:40 +00:00
Michael Paquier	3cd3a039da	Document and check that PgStat_HashKey has no padding This change is a tighter rework of `7d85d87f4d`, which tried to improve the code so as it would work should PgStat_HashKey gain new fields that create padding bytes. However, the previous change is proving to not be enough as some code paths of pgstats do not pass PgStat_HashKey by reference (valgrind would warn when padding is added to the structure, through a new field). Per discussion, let's document and check that PgStat_HashKey has no padding rather than try to complicate the code of pgstats so as it is able to work around that. This removes a couple of memset(0) calls that should not be required. While on it, this commit adds a static assertion checking that no padding is introduced in the structure, by checking that the size of PgStat_HashKey matches with the sum of the size of all its fields. The object ID part of the hash key is already 8 bytes, which should be plenty enough already. A comment is added to discourage the addition of new fields. Author: Michael Paquier <michael@paquier.xyz> Reviewed-by: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/CAA5RZ0t9omat+HVSakJXwTMWvhpYFcAZb41RPWKwrKFUgmAFBQ@mail.gmail.com	2025-09-19 09:54:05 +09:00
Thomas Munro	0951942bba	jit: Fix type used for Datum values in LLVM IR. Commit `2a600a93` made Datum 8 bytes wide everywhere. It was no longer appropriate to use TypeSizeT on 32 bit systems, and JIT compilation would fail with various type check errors. Introduce a separate LLVMTypeRef with the name TypeDatum. TypeSizeT is still used in some places for actual size_t values. Reported-by: Dmitry Mityugov <d.mityugov@postgrespro.ru> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Tested-by: Dmitry Mityugov <d.mityugov@postgrespro.ru> Discussion: https://postgr.es/m/0a9f0be59171c2e8f1b3bc10f4fcf267%40postgrespro.ru	2025-09-17 13:38:35 +12:00
Tom Lane	83a5641945	Provide more-specific error details/hints for function lookup failures. Up to now we've contented ourselves with a one-size-fits-all error hint when we fail to find any match to a function or procedure call. That was mostly okay in the beginning, but it was never great, and since the introduction of named arguments it's really not adequate. We at least ought to distinguish "function name doesn't exist" from "function name exists, but not with those argument names". And the rules for named-argument matching are arcane enough that some more detail seems warranted if we match the argument names but the call still doesn't work. This patch creates a framework for dealing with these problems: FuncnameGetCandidates and related code will now pass back a bitmask of flags showing how far the match succeeded. This allows a considerable amount of granularity in the reports. The set-bits-in-a-bitmask approach means that when there are multiple candidate functions, the report will reflect the match(es) that got the furthest, which seems correct. Also, we can avoid mentioning "maybe add casts" unless failure to match argument types is actually the issue. Extend the same return-a-bitmask approach to OpernameGetCandidates. The issues around argument names don't apply to operator syntax, but it still seems worth distinguishing between "there is no operator of that name" and "we couldn't match the argument types". While at it, adjust these messages and related ones to more strictly separate "detail" from "hint", following our message style guidelines' distinction between those. Reported-by: Dominique Devienne <ddevienne@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/1756041.1754616558@sss.pgh.pa.us	2025-09-16 12:17:02 -04:00
Peter Eisentraut	e56a601e06	Move pg_int64 back to postgres_ext.h Fix for commit `3c86223c99`. That commit moved the typedef of pg_int64 from postgres_ext.h to libpq-fe.h, because the only remaining place where it might be used is libpq users, and since the type is obsolete, the intent was to limit its scope. The problem is that if someone builds an extension against an older (pre-PG18) server version and a new (PG18) libpq, they might get two typedefs, depending on include file order. This is not allowed under C99, so they might get warnings or errors, depending on the compiler and options. The underlying types might also be different (e.g., long int vs. long long int), which would also lead to errors. This scenario is plausible when using the standard Debian packaging, which provides only the newest libpq but per-major-version server packages. The fix is to undo that part of commit `3c86223c99`. That way, the typedef is in the same header file across versions. At least, this is the safest fix doable before PostgreSQL 18 releases. Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://www.postgresql.org/message-id/25144219-5142-4589-89f8-4e76948b32db%40eisentraut.org	2025-09-16 10:48:56 +02:00
Peter Eisentraut	bce18ef3c6	Fix incorrect const qualifier Commit `7202d72787` added in passing some const qualifiers, but the one on the postmaster_child_launch() startup_data argument was incorrect, because the function itself modifies the pointed-to data. This is hidden from the compiler because of casts. The qualifiers on the functions called by postmaster_child_launch() are still correct.	2025-09-16 07:27:32 +02:00
Peter Eisentraut	4bd9191298	Change fmgr.h typedefs to use original names fmgr.h defined some types such as fmNodePtr which is just Node *, but it made its own types to avoid having to include various header files. With C11, we can now instead typedef the original names without fear of conflicts. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/10d32190-f31b-40a5-b177-11db55597355@eisentraut.org	2025-09-15 11:04:10 +02:00
Peter Eisentraut	dc41d7415f	Remove hbaPort type This was just a workaround to avoid including the header file that defines the Port type. With C11, we can now just re-define the Port type without the possibility of a conflict. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/10d32190-f31b-40a5-b177-11db55597355@eisentraut.org	2025-09-15 11:04:10 +02:00
Peter Eisentraut	d4d1fc527b	Update various forward declarations to use typedef There are a number of forward declarations that use struct but not the customary typedef, because that could have led to repeat typedefs, which was not allowed. This is now allowed in C11, so we can update these to provide the typedefs as well, so that the later uses of the types look more consistent. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/10d32190-f31b-40a5-b177-11db55597355@eisentraut.org	2025-09-15 11:04:10 +02:00
Peter Eisentraut	70407d39b7	Improve ExplainState type handling in header files Now that we can have repeat typedefs with C11, we don't need to use "struct ExplainState" anymore but can instead make a typedef where necessary. This doesn't change anything but makes it look nicer. (There are more opportunities for similar changes, but this is broken out because there was a separate discussion about it, and it's somewhat bulky on its own.) Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/f36c0a45-98cd-40b2-a7cc-f2bf02b12890%40eisentraut.org#a12fb1a2c1089d6d03010f6268871b00 Discussion: https://www.postgresql.org/message-id/flat/10d32190-f31b-40a5-b177-11db55597355@eisentraut.org	2025-09-15 11:04:10 +02:00
Peter Eisentraut	1e3b5edb8e	Remove workarounds against repeat typedefs This is allowed in C11, so we don't need the workarounds anymore. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/10d32190-f31b-40a5-b177-11db55597355@eisentraut.org	2025-09-15 11:04:10 +02:00
Peter Eisentraut	ae0e1be9f2	Allow redeclaration of typedef yyscan_t This is allowed in C11, so we don't need the workaround guards against it anymore. This effectively reverts commit `382092a0cd` that put these guards in place. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/10d32190-f31b-40a5-b177-11db55597355@eisentraut.org	2025-09-12 08:16:00 +02:00
Peter Eisentraut	25f36066dd	Remove traces of support for Sun Studio compiler Per discussion, this compiler suite is no longer maintained, and it has not been able to compile PostgreSQL since at least PostgreSQL 17. This removes all the remaining support code for this compiler. Note that the Solaris operating system continues to be supported, but using GCC as the compiler. Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/a0f817ee-fb86-483a-8a14-b6f7f5991b6e%40eisentraut.org	2025-09-12 07:39:05 +02:00
Richard Guo	2d756ebbe8	Fix misuse of Relids for storing attribute numbers The typedef Relids (Bitmapset ) is intended to represent set of relation identifiers, but was incorrectly used in several places to store sets of attribute numbers. This is my oversight in `e2debb643`. Fix that by replacing such usages with Bitmapset to reflect the correct semantics. Author: Junwang Zhao <zhjwpku@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Discussion: https://postgr.es/m/CAEG8a3LJhp_xriXf39iCz0TsK+M-2biuhDhpLC6Baxw8+ZYT3A@mail.gmail.com	2025-09-12 11:12:19 +09:00
Nathan Bossart	ed1aad15e0	Move named LWLock tranche requests to shared memory. In EXEC_BACKEND builds, GetNamedLWLockTranche() can segfault when called outside of the postmaster process, as it might access NamedLWLockTrancheRequestArray, which won't be initialized. Given the lack of reports, this is apparently unusual, presumably because it is usually called from a shmem_startup_hook like this: mystruct = ShmemInitStruct(..., &found); if (!found) { mystruct->locks = GetNamedLWLockTranche(...); ... } This genre of shmem_startup_hook evades the aforementioned segfaults because the struct is initialized in the postmaster, so all other callers skip the !found path. We considered modifying the documentation or requiring GetNamedLWLockTranche() to be called from the postmaster, but ultimately we decided to simply move the request array to shared memory (and add it to the BackendParameters struct), thereby allowing calls outside postmaster on all platforms. Since the main shared memory segment is initialized after accepting LWLock tranche requests, postmaster builds the request array in local memory first and then copies it to shared memory later. Given the lack of reports, back-patching seems unnecessary. Reported-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/CAA5RZ0v1_15QPg5Sqd2Qz5rh_qcsyCeHHmRDY89xVHcy2yt5BQ%40mail.gmail.com	2025-09-11 16:13:55 -05:00
Álvaro Herrera	1d5800019f	Improve comment about snapshot macros The comment mistakenly had "the others" for "the other", but this commit also reorders the comment so it matches the macros below. Now we describe the levels in increasing strictness. In addition, it seems easier to follow if we introduce one level at a time, rather than describing two, followed by "the other" (and then jumping back to one of the first two). Finally, reword the sentence about the purpose of the macros, which was slightly off-point. Author: Paul Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Rustam ALLAKOV <rustamallakov@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CA+renyUp=xja80rBaB6NpY3RRdi750y046x28bo_xg29zKY72Q@mail.gmail.com	2025-09-11 19:49:57 +02:00
Peter Eisentraut	4fbe015145	Remove checks for no longer supported GCC versions Since commit `f5e0186f86` (Raise C requirement to C11), we effectively require at least GCC version 4.7, so checks for older versions can be removed. Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/a0f817ee-fb86-483a-8a14-b6f7f5991b6e%40eisentraut.org	2025-09-11 12:05:59 +02:00
Michael Paquier	26eadf4d2b	Fix description of WAL record blocks in hash_xlog.h hash_xlog.h included descriptions for the blocks used in WAL records that were was not completely consistent with how the records are generated, with one block missing for SQUEEZE_PAGE, and inconsistent descriptions used for block 0 in VACUUM_ONE_PAGE and MOVE_PAGE_CONTENTS. This information was incorrect since `c11453ce0a`, cross-checking the logic for the record generation. Author: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru> Discussion: https://postgr.es/m/CALdSSPj1j=a1d1hVA3oabRFz0hSU3KKrYtZPijw4UPUM7LY9zw@mail.gmail.com Backpatch-through: 13	2025-09-11 17:17:04 +09:00
Michael Paquier	c88ce73eda	Fix incorrect file reference in guc.h GucSource_Names was documented as being in guc.c, but since `0a20ff54f5` it is located in guc_tables.c. The reference to the location of GucSource_Names is important, as GucSource needs to be kept in sync with GucSource_Names. Author: David G. Johnston <david.g.johnston@gmail.com> Discussion: https://postgr.es/m/CAKFQuwYPgAHWPYjPzK7iXzhSZ6MKR8w20_Nz7ZXpOvx=kZbs7A@mail.gmail.com Backpatch-through: 16	2025-09-11 10:15:33 +09:00
Tom Lane	bdc6cfcd12	Eliminate duplicative hashtempcxt in nodeSubplan.c. Instead of building a separate memory context that's used just for running hash functions, make the hash functions run in the per-tuple context of the node's innerecontext. This saves a little space at runtime, and it avoids needing to reset two contexts instead of one inside buildSubPlanHash's main loop. This largely reverts commit `133924e13`. That's safe to do now because `bf6c614a2` decoupled the evaluation context used by TupleHashTableMatch from that used for hash function evaluation, so that there's no longer a risk of resetting the innerecontext too soon. Per discussion of bug #19040, although this is not directly a fix for that. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Haiyang Li <mohen.lhy@alibaba-inc.com> Reviewed-by: Fei Changhong <feichanghong@qq.com> Discussion: https://postgr.es/m/19040-c9b6073ef814f48c@postgresql.org	2025-09-10 16:15:08 -04:00
Michael Paquier	e6da68a6e1	Remove dynahash.h All the callers of my_log2() are now limited inside dynahash.c, so let's remove this header. The same capability is provided by pg_bitutils.h already. Discussion: https://postgr.es/m/CAEZATCUJPQD_7sC-wErak2CQGNa6bj2hY-mr8wsBki=kX7f2_A@mail.gmail.com	2025-09-10 14:11:50 +09:00
Dean Rasheed	faf071b553	Add date and timestamp variants of random(min, max). This adds 3 new variants of the random() function: random(min date, max date) returns date random(min timestamp, max timestamp) returns timestamp random(min timestamptz, max timestamptz) returns timestamptz Each returns a random value x in the range min <= x <= max. Author: Damien Clochard <damien@dalibo.info> Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Vik Fearing <vik@postgresfriends.org> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/f524d8cab5914613d9e624d9ce177d3d@dalibo.info	2025-09-09 10:39:30 +01:00
Melanie Plageman	4b5f206de2	Remove unused xl_heap_prune member, reason `f83d709760` refactored xl_heap_prune and added an unused member, reason. While PruneReason is used when constructing this WAL record to set the WAL record definition, it doesn't need to be stored in a separate field in the record. Remove it. We won't backport this, since modifying an exposed struct definition to remove an unused field would do more harm than good. Author: Melanie Plageman <melanieplageman@gmail.com> Reported-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/tvvtfoxz5ykpsctxjbzxg3nldnzfc7geplrt2z2s54pmgto27y%40hbijsndifu45	2025-09-08 14:25:10 -04:00
Amit Kapila	1f7e9ba3ac	Post-commit review fixes for `228c370868`. This commit fixes three issues: 1) When a disabled subscription is created with retain_dead_tuples set to true, the launcher is not woken up immediately, which may lead to delays in creating the conflict detection slot. Creating the conflict detection slot is essential even when the subscription is not enabled. This ensures that dead tuples are retained, which is necessary for accurately identifying the type of conflict during replication. 2) Conflict-related data was unnecessarily retained when the subscription does not have a table. 3) Conflict-relevant data could be prematurely removed before applying prepared transactions on the publisher that are in the commit critical section. This issue occurred because the backend executing COMMIT PREPARED was not accounted for during the computation of oldestXid in the commit phase on the publisher. As a result, the subscriber could advance the conflict slot's xmin without waiting for such COMMIT PREPARED transactions to complete. We fixed this issue by identifying prepared transactions that are in the commit critical section during computation of oldestXid in commit phase. Author: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Nisha Moond <nisha.moond412@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/OS9PR01MB16913DACB64E5721872AA5C02943BA@OS9PR01MB16913.jpnprd01.prod.outlook.com Discussion: https://postgr.es/m/OS9PR01MB16913F67856B0DA2A909788129400A@OS9PR01MB16913.jpnprd01.prod.outlook.com	2025-09-08 06:10:15 +00:00
Tatsuo Ishii	06473f5a34	Allow to log raw parse tree. This commit allows to log the raw parse tree in the same way we currently log the parse tree, rewritten tree, and plan tree. To avoid unnecessary log noise for users not interested in this detail, a new GUC option, "debug_print_raw_parse", has been added. When starting the PostgreSQL process with "-d N", and N is 3 or higher, debug_print_raw_parse is enabled automatically, alongside debug_print_parse. Author: Chao Li <lic@highgo.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Tatsuo Ishii <ishii@postgresql.org> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: https://postgr.es/m/CAEoWx2mcO0Gpo4vd8kPMAFWeJLSp0MeUUnaLdE1x0tSVd-VzUw%40mail.gmail.com	2025-09-06 07:49:51 +09:00
Andres Freund	2c78940527	bufmgr: Remove freelist, always use clock-sweep This set of changes removes the list of available buffers and instead simply uses the clock-sweep algorithm to find and return an available buffer. This also removes the have_free_buffer() function and simply caps the pg_autoprewarm process to at most NBuffers. While on the surface this appears to be removing an optimization it is in fact eliminating code that induces overhead in the form of synchronization that is problematic for multi-core systems. The main reason for removing the freelist, however, is not the moderate improvement in scalability, but that having the freelist would require dedicated complexity in several upcoming patches. As we have not been able to find a case benefiting from the freelist... Author: Greg Burd <greg@burd.me> Reviewed-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/70C6A5B5-2A20-4D0B-BC73-EB09DD62D61C@getmailspring.com	2025-09-05 12:25:59 -04:00
Andres Freund	50e4c6ace5	bufmgr: Use consistent naming of the clock-sweep algorithm Minor edits to comments only. Author: Greg Burd <greg@burd.me> Reviewed-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/70C6A5B5-2A20-4D0B-BC73-EB09DD62D61C@getmailspring.com	2025-09-05 12:25:59 -04:00
Dean Rasheed	6ede13d1b5	Fix concurrent update issue with MERGE. When executing a MERGE UPDATE action, if there is more than one concurrent update of the target row, the lock-and-retry code would sometimes incorrectly identify the latest version of the target tuple, leading to incorrect results. This was caused by using the ctid field from the TM_FailureData returned by table_tuple_lock() in a case where the result was TM_Ok, which is unsafe because the TM_FailureData struct is not guaranteed to be fully populated in that case. Instead, it should use the tupleid passed to (and updated by) table_tuple_lock(). To reduce the chances of similar errors in the future, improve the commentary for table_tuple_lock() and TM_FailureData to make it clearer that table_tuple_lock() updates the tid passed to it, and most fields of TM_FailureData should not be relied on in non-failure cases. An exception to this is the "traversed" field, which is set in both success and failure cases. Reported-by: Dmitry <dsy.075@yandex.ru> Author: Yugo Nagata <nagata@sraoss.co.jp> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/1570d30e-2b95-4239-b9c3-f7bf2f2f8556@yandex.ru Backpatch-through: 15	2025-09-05 08:18:18 +01:00
Michael Paquier	4246a977ba	Switch some numeric-related functions to use soft error reporting This commit changes some functions related to the data type numeric to use the soft error reporting rather than a custom boolean flag (called "have_error") that callers of these functions could rely on to bypass the generation of ERROR reports, letting the callers do their own error handling (timestamp, jsonpath and numeric_to_char() require them). This results in the removal of some boilerplate code that was required to handle both the ereport() and the "have_error" code paths bypassing ereport(), unifying everything under the soft error reporting facility. While on it, some duplicated error messages are removed. The function upgraded in this commit were suffixed with "_opt_error" in their names. They are renamed to "_safe" instead. This change relies on `d9f7f5d32f`, that has introduced the soft error reporting infrastructure. Author: Amul Sul <sulamul@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/CAAJ_b96No5h5tRuR+KhcC44YcYUCw8WAHuLoqqyyop8_k3+JDQ@mail.gmail.com	2025-09-05 13:53:47 +09:00
Michael Paquier	ae45312008	Change pg_lsn_in_internal() to use soft error reporting pg_lsn includes pg_lsn_in_internal() for the purpose of parsing a LSN position for the GUC recovery_target_lsn (`21f428ebde`). It relies on a boolean called "have_error" that would be set when the LSN parsing fails, then let its callers handle any errors. `d9f7f5d32f` has added support for soft error reporting. This commit removes some boilerplate code and switches the routine to use soft error reporting directly, giving to the callers of pg_lsn_in_internal() the possibility to be fed the error message generated on failure. pg_lsn_in_internal() routine is renamed to pg_lsn_in_safe(), for consistency with other similar routines that are given an escontext. Author: Amul Sul <sulamul@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/CAAJ_b96No5h5tRuR+KhcC44YcYUCw8WAHuLoqqyyop8_k3+JDQ@mail.gmail.com	2025-09-05 12:59:29 +09:00
Peter Eisentraut	f0478149c3	Clean up newly added guc_tables.inc.c There was a missing makefile rule to clean up the guc_tables.inc.c symlink in src/include/. Oversight in commit `6359989654`. Author: Nathan Bossart <nathandbossart@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/dae6fe89-1e0c-4c3f-8d92-19d23374fb10%40eisentraut.org	2025-09-04 17:25:43 +02:00
Dean Rasheed	5386bfb9c1	Fix replica identity check for INSERT ON CONFLICT DO UPDATE. If an INSERT has an ON CONFLICT DO UPDATE clause, the executor must check that the target relation supports UPDATE as well as INSERT. In particular, it must check that the target relation has a REPLICA IDENTITY if it publishes updates. Formerly, it was not doing this check, which could lead to silently breaking replication. Fix by adding such a check to CheckValidResultRel(), which requires adding a new onConflictAction argument. In back-branches, preserve ABI compatibility by introducing a wrapper function with the original signature. Author: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Tested-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/OS3PR01MB57180C87E43A679A730482DF94B62@OS3PR01MB5718.jpnprd01.prod.outlook.com Backpatch-through: 13	2025-09-04 11:27:53 +01:00
Nathan Bossart	38b602b028	Move dynamically-allocated LWLock tranche names to shared memory. There are two ways for shared libraries to allocate their own LWLock tranches. One way is to call RequestNamedLWLockTranche() in a shmem_request_hook, which requires the library to be loaded via shared_preload_libraries. The other way is to call LWLockNewTrancheId(), which is not subject to the same restrictions. However, LWLockNewTrancheId() does require each backend to store the tranche's name in backend-local memory via LWLockRegisterTranche(). This API is a little cumbersome and leads to things like unhelpful pg_stat_activity.wait_event values in backends that haven't loaded the library. This commit moves these LWLock tranche names to shared memory, thus eliminating the need for each backend to call LWLockRegisterTranche(). Instead, the tranche name must be provided to LWLockNewTrancheId(), which immediately makes the name available to all backends. Since the tranche name array is append-only, lookups can ordinarily avoid locking as long as their local copy of the LWLock counter is greater than the requested tranche ID. One downside of this approach is that we now have a hard limit on both the length of tranche names (NAMEDATALEN-1 bytes) and the number of dynamically-allocated tranches (256). Besides a limit of NAMEDATALEN-1 bytes for tranche names registered via RequestNamedLWLockTranche(), no such limits previously existed. We could avoid these new limits by using dynamic shared memory, but the complexity involved didn't seem worth it. We briefly considered making the tranche limit user-configurable but ultimately decided against that, too. Since there is still a lot of time left in the v19 development cycle, it's possible we will revisit this choice. Author: Sami Imseih <samimseih@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Rahila Syed <rahilasyed90@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAA5RZ0vvED3naph8My8Szv6DL4AxOVK3eTPS0qXsaKi%3DbVdW2A%40mail.gmail.com	2025-09-03 13:57:48 -05:00
Peter Eisentraut	6359989654	Generate GUC tables from .dat file Store the information in guc_tables.c in a .dat file similar to the catalog data in src/include/catalog/, and generate a part of guc_tables.c from that. The goal is to make it easier to edit that information, and to be able to make changes to the downstream data structures more easily. (Essentially, those are the same reasons as for the original adoption of the .dat format.) Reviewed-by: John Naylor <johncnaylorls@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: David E. Wheeler <david@justatheory.com> Discussion: https://www.postgresql.org/message-id/flat/dae6fe89-1e0c-4c3f-8d92-19d23374fb10%40eisentraut.org	2025-09-03 09:45:17 +02:00
Michael Paquier	c6ea528b47	Update outdated references to the SLRU ControlLock SLRU bank locks are referred as "bank locks" or "SLRU bank locks" in the code comments. The comments updated in this commit use the latter term. Oversight in `53c2a97a92`, that has replaced the single ControlLock by the bank control locks. Author: Julien Rouhaud <julien.rouhaud@free.fr> Discussion: https://postgr.es/m/aLUT2UO8RjJOzZNq@jrouhaud Backpatch-through: 17	2025-09-03 10:20:28 +09:00
Nathan Bossart	510777a2d5	Change ReplicationSlotPersistentData's "synced" member to a bool. Note that this doesn't require bumping SLOT_VERSION because we require sizeof(bool) == 1, thanks to commit `97525bc5c8`. Overight in commit `ddd5f4f54a`. Discussion: Ranier Vilela <ranier.vf@gmail.com>	2025-09-02 16:53:54 -05:00
Michael Paquier	eccba079c2	Generate pgstat_count_slru*() functions for slru using macros This change replaces seven functions definitions by macros, reducing a bit some repetitive patterns in the code. An interesting side effect is that this removes an inconsistency in the naming of SLRU increment functions with the field names. This change is similar to `850f4b4c8c`, `8018ffbf58` or `83a1a1b566`. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/aLHA//gr4dTpDHHC@ip-10-97-1-34.eu-west-3.compute.internal	2025-09-02 16:22:03 +09:00
Amit Kapila	a850be2fe6	Add max_retention_duration option to subscriptions. This commit introduces a new subscription parameter, max_retention_duration, aimed at mitigating excessive accumulation of dead tuples when retain_dead_tuples is enabled and the apply worker lags behind the publisher. When the time spent advancing a non-removable transaction ID exceeds the max_retention_duration threshold, the apply worker will stop retaining conflict detection information. In such cases, the conflict slot's xmin will be set to InvalidTransactionId, provided that all apply workers associated with the subscription (with retain_dead_tuples enabled) confirm the retention duration has been exceeded. To ensure retention status persists across server restarts, a new column subretentionactive has been added to the pg_subscription catalog. This prevents unnecessary reactivation of retention logic after a restart. The conflict detection slot will not be automatically re-initialized unless a new subscription is created with retain_dead_tuples = true, or the user manually re-enables retain_dead_tuples. A future patch will introduce support for automatic slot re-initialization once at least one apply worker confirms that the retention duration is within the configured max_retention_duration. Author: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Nisha Moond <nisha.moond412@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com	2025-09-02 03:20:18 +00:00
Nathan Bossart	67fcf48c3b	Make LWLockCounter a global variable. Using the LWLockCounter requires first calculating its address in shared memory like this: LWLockCounter = (int ) ((char ) MainLWLockArray - sizeof(int)); Commit `82e861fbe1` started this trend in order to fix EXEC_BACKEND builds, but it could also be fixed by adding it to the BackendParameters struct. The current approach is somewhat difficult to follow, so this commit switches to the latter. While at it, swap around the code in LWLockShmemSize() to match the order of assignments in CreateLWLocks() for added readability. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/aLDLnan9gNCS9fHx%40nathan	2025-08-29 12:13:37 -05:00
Peter Eisentraut	991295f387	Mark ItemPointer arguments as const in tuple/table lock functions The functions LockTuple, ConditionalLockTuple, UnlockTuple, and XactLockTableWait take an ItemPointer argument that they do not modify, so the argument can be const-qualified to better convey intent and allow the compiler to enforce immutability. Author: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAEoWx2m9e4rECHBwpRE4%2BGCH%2BpbYZXLh2f4rB1Du5hDfKug%2BOg%40mail.gmail.com	2025-08-29 07:39:58 +02:00
Álvaro Herrera	325fc0ab14	Avoid including commands/dbcommands.h in so many places This has been done historically because of get_database_name (which since commit `cb98e6fb8f` belongs in lsyscache.c/h, so let's move it there) and get_database_oid (which is in the right place, but whose declaration should appear in pg_database.h rather than dbcommands.h). Clean this up. Also, xlogreader.h and stringinfo.h are no longer needed by dbcommands.h since commit `f1fd515b39`, so remove them. Author: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/202508191031.5ipojyuaswzt@alvherre.pgsql	2025-08-28 12:39:04 +02:00
Andres Freund	5865150b6d	aio: Stop using enum bitfields due to bad code generation During an investigation into rather odd aio related errors on macos, observed by Alexander and Konstantin, we started to wonder if bitfield access is related to the error. At the moment it looks like it is related, we cannot reproduce the failures when replacing the bitfields. In addition, the problem can only be reproduced with some compiler [versions] and not everyone has been able to reproduce the issue. The observed problem is that, very rarely, PgAioHandle->{state,target} are in an inconsistent state, after having been checked to be in a valid state not long before, triggering an assertion failure. Unfortunately, this could be caused by wrong compiler code generation or somehow of missing memory barriers - we don't really know. In theory there should not be any concurrent write access to the handle in the state the bug is triggered, as the handle was idle and is just being initialized. Separately from the bug, we observed that at least gcc and clang generate rather terrible code for the bitfield access. Even if it's not clear if the observed assertion failure is actually caused by the bitfield somehow, the bad code generation alone is sufficient reason to stop using bitfields. Therefore, replace the enum bitfields with uint8s and instead cast in each switch statement. Reported-by: Alexander Lakhin <exclusion@gmail.com> Reported-by: Konstantin Knizhnik <knizhnik@garret.ru> Discussion: https://postgr.es/m/1500090.1745443021@sss.pgh.pa.us Backpatch-through: 18	2025-08-27 19:12:11 -04:00
Peter Eisentraut	99234e9ddc	Message wording improvements Use "row" instead of "tuple" for user-facing information for logical replication conflicts.	2025-08-25 23:15:24 +02:00
Nathan Bossart	3ef2b863a3	Use PqMsg_* macros in fe-protocol3.c. Oversight in commit `f4b54e1ed9`. Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> Discussion: https://postgr.es/m/aKx5vEbbP03JNgtp%40nathan	2025-08-25 11:08:26 -05:00

... 2 3 4 5 6 ...

12688 commits