postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-04-23 07:07:22 -04:00

Author	SHA1	Message	Date
Amit Kapila	308622edf1	Avoid including utils/timestamp.h in conflict.h. conflict.h currently includes utils/timestamp.h despite only requiring basic timestamp type definitions. This creates unnecessary overhead. Replace the include with datatype/timestamp.h to provide the necessary types. This change requires explicitly including utils/timestamp.h in test_custom_fixed_stats.c, which previously relied on the indirect inclusion. Extracted from the larger patch by Andres Freund. Discussion: https://postgr.es/m/aY-UE-4t7FiYgH3t@alap3.anarazel.de	2026-02-23 10:19:05 +05:30
Heikki Linnakangas	412f78c66e	Align PGPROC to cache line boundary On common architectures, the PGPROC struct happened to be a multiple of 64 bytes on PG 18, but it's changed on 'master' since. There was worry that changing the alignment might hurt performance, due to false cacheline sharing across elements in the proc array. However, there was no explicit alignment, so any alignment to cache lines was accidental. Add explicit alignment to remove worry about false sharing. Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/3dd6f70c-b94d-4428-8e75-74a7136396be@iki.fi	2026-02-22 13:13:43 +02:00
Heikki Linnakangas	2e0853176f	Rearrange fields in PGPROC, for clarity The ordering was pretty random, making it hard to get an overview of what's in it. Group related fields together, and add comments to act as separators between the groups. Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/3dd6f70c-b94d-4428-8e75-74a7136396be@iki.fi	2026-02-22 12:45:13 +02:00
Álvaro Herrera	0eeffd31bf	Avoid name collision with NOT NULL constraints If a CREATE TABLE statement defined a constraint whose name is identical to the name generated for a NOT NULL constraint, we'd throw an (unnecessary) unique key violation error on pg_constraint_conrelid_contypid_conname_index: this can easily be avoided by choosing a different name for the NOT NULL constraint. Fix by passing the constraint names already created by AddRelationNewConstraints() to AddRelationNotNullConstraints(), so that the latter can avoid name collisions with them. Bug: #19393 Author: Laurenz Albe <laurenz.albe@cybertec.at> Reported-by: Hüseyin Demir <huseyin.d3r@gmail.com> Backpatch-through: 18 Discussion: https://postgr.es/m/19393-6a82427485a744cf@postgresql.org	2026-02-21 12:22:08 +01:00
Heikki Linnakangas	36bbcd5be3	Split PGPROC 'links' field into two, for clarity The field was mainly used for the position in a LOCK's wait queue, but also as the position in a the freelist when the PGPROC entry was not in use. The reuse saves some memory at the expense of readability, which seems like a bad tradeoff. If we wanted to make the struct smaller there's other things we could do, but we're actually just discussing adding padding to the struct for performance reasons. Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/3dd6f70c-b94d-4428-8e75-74a7136396be@iki.fi	2026-02-20 22:34:42 +02:00
Amit Kapila	9842e8aca0	Avoid including worker_internal.h in pgstat.h. pgstat.h is a widely included header. Including worker_internal.h there is unnecessary and creates tight coupling. By refactoring pgstat_report_subscription_error() to fetch the required LogicalRepWorkerType internally rather than receiving it as an argument, we can eliminate the need for the internal header. Reported-by: Andres Freund <andres@anarazel.de> Author: Nisha Moond <nisha.moond412@gmail.com> Reviewed-by: vignesh C <vignesh21@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/aY-UE-4t7FiYgH3t@alap3.anarazel.de	2026-02-20 09:26:33 +05:30
Nathan Bossart	ba401828c1	Remove SpinLockFree() and S_LOCK_FREE(). S_LOCK_FREE() is used by the test program in s_lock.c, but nobody has voiced concerns about losing some coverage there. SpinLockFree() appears to have been unused since it was introduced by commit `499abb0c0f`. There was agreement to remove these in 2020, but it never happened. Since we still have agreement for removal in 2026, let's do that now. Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/aZX2oUcKf7IzHnnK%40nathan Discussion: https://postgr.es/m/20200608225338.m5zho424w6lpwb2d%40alap3.anarazel.de	2026-02-19 16:19:41 -06:00
Nathan Bossart	aa71a35a40	Assume "inline" keyword is available. This has been a keyword since C99, and we now require C11, so we no longer need to use __inline__ or to check for it at configure time. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/aZdGbDaV4_yKCMc-%40nathan	2026-02-19 14:37:29 -06:00
Fujii Masao	fb80f388f4	Add per-subscription wal_receiver_timeout setting. This commit allows setting wal_receiver_timeout per subscription using the CREATE SUBSCRIPTION and ALTER SUBSCRIPTION commands. The value is stored in the subwalrcvtimeout column of the pg_subscription catalog. When set, this value overrides the global wal_receiver_timeout for the subscription's apply worker. The default is -1, which means the global setting (from the server configuration, command line, role, or database) remains in effect. This feature is useful for configuring different timeout values for each subscription, especially when connecting to multiple publisher servers, to improve failure detection. Bump catalog version. Author: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Japin Li <japinli@hotmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/a1414b64-bf58-43a6-8494-9704975a41e9@oss.nttdata.com	2026-02-20 01:00:09 +09:00
Peter Eisentraut	8354b9d6b6	Use fallthrough attribute instead of comment Instead of using comments to mark fallthrough switch cases, use the fallthrough attribute. This will (in the future, not here) allow supporting other compilers besides gcc. The commenting convention is only supported by gcc, the attribute is supported by clang, and in the fullness of time the C23 standard attribute would allow supporting other compilers as well. Right now, we package the attribute into a macro called pg_fallthrough. This commit defines that macro and replaces the existing comments with that macro invocation. We also raise the level of the gcc -Wimplicit-fallthrough= option from 3 to 5 to enforce the use of the attribute. Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl> Discussion: https://www.postgresql.org/message-id/flat/76a8efcd-925a-4eaf-bdd1-d972cd1a32ff%40eisentraut.org	2026-02-19 08:51:12 +01:00
Tom Lane	759b03b24c	Simplify creation of built-in functions with default arguments. Up to now, to create such a function, one had to make a pg_proc.dat entry and then overwrite it with a CREATE OR REPLACE command in system_functions.sql. That's error-prone (cf. bug #19409) and results in leaving dead rows in the initial contents of pg_proc. Manual maintenance of pg_node_tree strings seems entirely impractical, and parsing expressions during bootstrap would be extremely difficult as well. But Andres Freund observed that all the current use-cases are simple constants, and building a Const node is well within the capabilities of bootstrap mode. So this patch invents a special case: if bootstrap mode is asked to ingest a non-null value for pg_proc.proargdefaults (which would otherwise fail in pg_node_tree_in), it parses the value as an array literal and then feeds the element strings to the input functions for the corresponding parameter types. Then we can build a suitable pg_node_tree string with just a few more lines of code. This allows removing all the system_functions.sql entries that are just there to set up default arguments, replacing them with proargdefaults fields in pg_proc.dat entries. The old technique remains available in case someone needs a non-constant default. The initial contents of pg_proc are demonstrably the same after this patch, except that (1) json_strip_nulls and jsonb_strip_nulls now have the correct provolatile setting, as per bug #19409; (2) pg_terminate_backend, make_interval, and drandom_normal now have defaults that don't include a type coercion, which is how they should have been all along. In passing, remove some unused entries from bootstrap.c's TypInfo[] array. I had to add some new ones because we'll now need an entry for each default-possessing system function parameter, but we shouldn't carry more than we need there; it's just a maintenance gotcha. Bug: #19409 Reported-by: Lucio Chiessi <lucio.chiessi@trustly.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Author: Andrew Dunstan <andrew@dunslane.net> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/183292bb-4891-4c96-a3ca-e78b5e0e1358@dunslane.net Discussion: https://postgr.es/m/19409-e16cd2605e59a4af@postgresql.org	2026-02-18 14:14:44 -05:00
Álvaro Herrera	3894f08abe	Update obsolete comment table_tuple_update's update_indexes argument hasn't been a boolean since commit `19d8e2308b`. Backpatch-through: 16	2026-02-18 18:09:54 +01:00
Michael Paquier	ee642cccc4	Switch SysCacheIdentifier to a typedef enum The main purpose of this change is to allow an ABI checker to understand when the list of SysCacheIdentifier changes, by switching all the routine declarations that relied on a signed integer for a syscache ID to this new type. This is going to be useful in the long-term for versions newer than v19 so as we will be able to check when the list of values in SysCacheIdentifier is updated in a non-ABI compliant fashion. Most of the changes of this commit are due to the new definition of SyscacheCallbackFunction, where a SysCacheIdentifier is now required for the syscache ID. It is a mechanical change, still slightly invasive. There are more areas in the tree that could be improved with an ABI checker in mind; this takes care of only one area. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Author: Andreas Karlsson <andreas@proxel.se> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/289125.1770913057@sss.pgh.pa.us	2026-02-18 09:58:38 +09:00
Álvaro Herrera	b7271aa1d7	Use a bitmask for ExecInsertIndexTuples options ... instead of passing a bunch of separate booleans. Also, rearrange the argument list in a hopefully more sensible order. Discussion: https://postgr.es/m/202602111846.xpvuccb3inbx@alvherre.pgsql Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> (older version)	2026-02-17 17:59:45 +01:00
Tom Lane	6be5b76d66	Ensure that all three build methods install the same set of files. syscache_info.h was installed into $installdir/include/server/catalog if you use a non-VPATH autoconf build, but not if you use a VPATH build or meson. That happened because the makefiles blindly install src/include/catalog/*.h, and in a non-VPATH build the generated header files would be swept up in that. While it's hard to conjure a reason to need syscache_info.h outside of backend build, it's also hard to get the makefiles to skip syscache_info.h, so let's go the other way and install it in the other two cases too. Another problem, new in v19, was that meson builds install a copy of src/include/catalog/README, while autoconf builds do not. The issue here is that that file is new and wasn't added to meson.build's exclusion list. While it's clearly a bug if different build methods don't install the same set of files, I doubt anyone would thank us for changing the behavior in released branches. Hence, fix in master only. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/946828.1771185367@sss.pgh.pa.us	2026-02-16 15:20:15 -05:00
Peter Eisentraut	d50c86e743	Change remaining StaticAssertStmt() to StaticAssertDecl() This completes the work started by commit `75f49221c2`. In basebackup.c, changing the StaticAssertStmt to StaticAssertDecl results in having the same StaticAssertDecl() in 2 functions. So, it makes more sense to move it to file scope instead. Also, as it depends on some computations based on 2 tar blocks, define TAR_NUM_TERMINATION_BLOCKS. In deadlock.c, change the StaticAssertStmt to StaticAssertDecl and keep it in the function scope. Add new braces to avoid warning from -Wdeclaration-after-statement. In aset.c, change the StaticAssertStmt to StaticAssertDecl and move it to file scope. Finally, update the comments in c.h a bit. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Co-authored-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://www.postgresql.org/message-id/aYH6ii46AvGVCB84%40ip-10-97-1-34.eu-west-3.compute.internal	2026-02-16 09:22:43 +01:00
John Naylor	ef3c3cf6d0	Perform radix sort on SortTuples with pass-by-value Datums Radix sort can be much faster than quicksort, but for our purposes it is limited to sequences of unsigned bytes. To make tuples with other types amenable to this technique, several features of tuple comparison must be accounted for, i.e. the sort key must be "normalized": 1. Signedness -- It's possible to modify a signed integer such that it can be compared as unsigned. For example, a signed char has range -128 to 127. If we cast that to unsigned char and add 128, the range of values becomes 0 to 255 while preserving order. 2. Direction -- SQL allows specification of ASC or DESC. The descending case is easily handled by taking the complement of the unsigned representation. 3. NULL values -- NULLS FIRST and NULLS LAST must work correctly. This commmit only handles the case where datum1 is pass-by-value Datum (possibly abbreviated) that compares like an ordinary integer. (Abbreviations of values of type "numeric" are a convenient counterexample.) First, tuples are partitioned by nullness in the correct NULL ordering. Then the NOT NULL tuples are sorted with radix sort on datum1. For tiebreaks on subsequent sortkeys (including the first sort key if abbreviated), we divert to the usual qsort. ORDER BY queries on pre-warmed buffers are up to 2x faster on high cardinality inputs with radix sort than the sort specializations added by commit `697492434`, so get rid of them. It's sufficient to fall back to qsort_tuple() for small arrays. Moderately low cardinality inputs show more modest improvents. Our qsort is strongly optimized for very low cardinality inputs, but radix sort is usually equal or very close in those cases. The changes to the regression tests are caused by under-specified sort orders, e.g. "SELECT a, b from mytable order by a;". For unstable sorts, such as our qsort and this in-place radix sort, there is no guarantee of the order of "b" within each group of "a". The implementation is taken from ska_byte_sort() (Boost licensed), which is similar to American flag sort (an in-place radix sort) with modifications to make it better suited for modern pipelined CPUs. The technique of normalization described above can also be extended to the case of multiple keys. That is left for future work (Thanks to Peter Geoghegan for the suggestion to look into this area). Reviewed-by: Chengpeng Yan <chengpeng_yan@outlook.com> Reviewed-by: zengman <zengman@halodbtech.com> Reviewed-by: ChangAo Chen <cca5507@qq.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Chao Li <li.evan.chao@gmail.com> (earlier version) Discussion: https://postgr.es/m/CANWCAZYzx7a7E9AY16Jt_U3+GVKDADfgApZ-42SYNiig8dTnFA@mail.gmail.com	2026-02-14 13:50:06 +07:00
Heikki Linnakangas	d7a4291bb7	Fix comment neglected in commit `ddc3250208` I renamed the field in commit `ddc3250208`, but missed this one reference.	2026-02-12 19:41:02 +02:00
Nathan Bossart	a468898883	Remove specialized word-length popcount implementations. The uses of these functions do not justify the level of micro-optimization we've done and may even hurt performance in some cases (e.g., due to using function pointers). This commit removes all architecture-specific implementations of pg_popcount{32,64} and converts the portable ones to inlined functions in pg_bitutils.h. These inlined versions should produce the same code as before (but inlined), so in theory this is a net gain for many machines. A follow-up commit will replace the remaining loops over these word-length popcount functions with calls to pg_popcount(), further reducing the need for architecture-specific implementations. Suggested-by: John Naylor <johncnaylorls@gmail.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Reviewed-by: Greg Burd <greg@burd.me> Discussion: https://postgr.es/m/CANWCAZY7R%2Biy%2Br9YM_sySNydHzNqUirx1xk0tB3ej5HO62GdgQ%40mail.gmail.com	2026-02-12 11:32:49 -06:00
Nathan Bossart	cb7b2e5e8e	Remove some unnecessary optimizations in popcount code. Over the past few releases, we've added a huge amount of complexity to our popcount implementations. Commits `fbe327e5b4`, `79e232ca01`, `8c6653516c`, and `25dc485074` did some preliminary refactoring, but many opportunities remain. In particular, if we disclaim interest in micro-optimizing this code for 32-bit builds and in unnecessary alignment checks on x86-64, we can remove a decent chunk of code. I cannot find public discussion or benchmarks for the code this commit removes, but it seems unlikely that this change will noticeably impact performance on affected systems. Suggested-by: John Naylor <johncnaylorls@gmail.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: https://postgr.es/m/CANWCAZY7R%2Biy%2Br9YM_sySNydHzNqUirx1xk0tB3ej5HO62GdgQ%40mail.gmail.com	2026-02-12 11:32:49 -06:00
Dean Rasheed	88327092ff	Add support for INSERT ... ON CONFLICT DO SELECT. This adds a new ON CONFLICT action DO SELECT [FOR UPDATE/SHARE], which returns the pre-existing rows when conflicts are detected. The INSERT statement must have a RETURNING clause, when DO SELECT is specified. The optional FOR UPDATE/SHARE clause allows the rows to be locked before they are are returned. As with a DO UPDATE conflict action, an optional WHERE clause may be used to prevent rows from being selected for return (but as with a DO UPDATE action, rows filtered out by the WHERE clause are still locked). Bumps catversion as stored rules change. Author: Andreas Karlsson <andreas@proxel.se> Author: Marko Tiikkaja <marko@joh.to> Author: Viktor Holmberg <v@viktorh.net> Reviewed-by: Joel Jacobson <joel@compiler.org> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/d631b406-13b7-433e-8c0b-c6040c4b4663@Spark Discussion: https://postgr.es/m/5fca222d-62ae-4a2f-9fcb-0eca56277094@Spark Discussion: https://postgr.es/m/2b5db2e6-8ece-44d0-9890-f256fdca9f7e@proxel.se Discussion: https://postgr.es/m/CAL9smLCdV-v3KgOJX3mU19FYK82N7yzqJj2HAwWX70E=P98kgQ@mail.gmail.com	2026-02-12 09:57:04 +00:00
Dean Rasheed	706cadde32	Remove p_is_insert from struct ParseState. The only place that used p_is_insert was transformAssignedExpr(), which used it to distinguish INSERT from UPDATE when handling indirection on assignment target columns -- see commit `c1ca3a19df`. However, this information is already available to transformAssignedExpr() via its exprKind parameter, which is always either EXPR_KIND_INSERT_TARGET or EXPR_KIND_UPDATE_TARGET. As noted in the commit message for `c1ca3a19df`, this use of p_is_insert isn't particularly pretty, so have transformAssignedExpr() use the exprKind parameter instead. This then allows p_is_insert to be removed entirely, which simplifies state management in a few other places across the parser. Author: Viktor Holmberg <v@viktorh.net> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/badc3b4c-da73-4000-b8d3-638a6f53a769@Spark	2026-02-12 09:01:42 +00:00
Nathan Bossart	1d92e0c2cc	Add password expiration warnings. This commit adds a new parameter called password_expiration_warning_threshold that controls when the server begins emitting imminent-password-expiration warnings upon successful password authentication. By default, this parameter is set to 7 days, but this functionality can be disabled by setting it to 0. This patch also introduces a new "connection warning" infrastructure that can be reused elsewhere. For example, we may want to warn about the use of MD5 passwords for a couple of releases before removing MD5 password support. Author: Gilles Darold <gilles@darold.net> Co-authored-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Japin Li <japinli@hotmail.com> Reviewed-by: songjinzhou <tsinghualucky912@foxmail.com> Reviewed-by: liu xiaohui <liuxh.zj.cn@gmail.com> Reviewed-by: Yuefei Shi <shiyuefei1004@gmail.com> Reviewed-by: Steven Niu <niushiji@gmail.com> Reviewed-by: Soumya S Murali <soumyamurali.work@gmail.com> Reviewed-by: Euler Taveira <euler@eulerto.com> Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/129bcfbf-47a6-e58a-190a-62fc21a17d03%40migops.com	2026-02-11 10:36:15 -06:00
Álvaro Herrera	1efdd7cc63	Cleanup for log_min_messages changes in `38e0190ced` * Remove an unused variable * Use "default log level" consistently (instead of "generic") * Keep the process types in alphabetical order (missed one place in the SGML docs) * Since log_min_messages type was changed from enum to string, it is a good idea to add single quotes when printing it out. Otherwise it fails if the user copies and pastes from the SHOW output to SET, except in the simplest case. Using single quotes reduces confusion. * Use lowercase string for the burned-in default value, to keep the same output as previous versions. Author: Euler Taveira <euler@eulerto.com> Author: Man Zeng <zengman@halodbtech.com> Author: Noriyoshi Shinoda <noriyoshi.shinoda@hpe.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/202602091250.genyflm2d5dw@alvherre.pgsql	2026-02-11 16:38:18 +01:00
Heikki Linnakangas	7984ce7a1d	Move ProcStructLock to the ProcGlobal struct It protects the freeProcs and some other fields in ProcGlobal, so let's move it there. It's good for cache locality to have it next to the thing it protects, and just makes more sense anyway. I believe it was allocated as a separate shared memory area just for historical reasons. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/b78719db-0c54-409f-b185-b0d59261143f@iki.fi	2026-02-11 16:48:45 +02:00
Robert Haas	7358abcc60	Store information about Append node consolidation in the final plan. An extension (or core code) might want to reconstruct the planner's decisions about whether and where to perform partitionwise joins from the final plan. To do so, it must be possible to find all of the RTIs of partitioned tables appearing in the plan. But when an AppendPath or MergeAppendPath pulls up child paths from a subordinate AppendPath or MergeAppendPath, the RTIs of the subordinate path do not appear in the final plan, making this kind of reconstruction impossible. To avoid this, propagate the RTI sets that would have been present in the 'apprelids' field of the subordinate Append or MergeAppend nodes that would have been created into the surviving Append or MergeAppend node, using a new 'child_append_relid_sets' field for that purpose. The value of this field is a list of Bitmapsets, because each relation whose append-list was pulled up had its own set of RTIs: just one, if it was a partitionwise scan, or more than one, if it was a partitionwise join. Since our goal is to see where partitionwise joins were done, it is essential to avoid losing the information about how the RTIs were grouped in the pulled-up relations. This commit also updates pg_overexplain so that EXPLAIN (RANGE_TABLE) will display the saved RTI sets. Co-authored-by: Robert Haas <rhaas@postgresql.org> Co-authored-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Greg Burd <greg@burd.me> Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Haibo Yan <tristan.yim@gmail.com> Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com	2026-02-10 17:55:59 -05:00
Michael Paquier	9181c870ba	Improve type handling of varlena structures This commit changes the definition of varlena to a typedef, so as it becomes possible to remove "struct" markers from various declarations in the code base. Historically, "struct" markers are not the project style for variable declarations, so this update simplifies the code and makes it more consistent across the board. This change has an impact on the following structures, simplifying declarations using them: - varlena - varatt_indirect - varatt_external This cleanup has come up in a different path set that played with TOAST and varatt.h, independently worth doing on its own. Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Andreas Karlsson <andreas@proxel.se> Reviewed-by: Shinya Kato <shinya11.kato@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/aW8xvVbovdhyI4yo@paquier.xyz	2026-02-11 07:33:24 +09:00
Robert Haas	0d4391b265	Store information about elided nodes in the final plan. An extension (or core code) might want to reconstruct the planner's choice of join order from the final plan. To do so, it must be possible to find all of the RTIs that were part of the join problem in that plan. Commit `adbad833f3`, together with the earlier work in `8c49a484e8`, is enough to let us match up RTIs we see in the final plan with RTIs that we see during the planning cycle, but we still have a problem if the planner decides to drop some RTIs out of the final plan altogether. To fix that, when setrefs.c removes a SubqueryScan, single-child Append, or single-child MergeAppend from the final Plan tree, record the type of the removed node and the RTIs that the removed node would have scanned in the final plan tree. It would be natural to record this information on the child of the removed plan node, but that would require adding an additional pointer field to type Plan, which seems undesirable. So, instead, store the information in a separate list that the executor need never consult, and use the plan_node_id to identify the plan node with which the removed node is logically associated. Also, update pg_overexplain to display these details. Reviewed-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Greg Burd <greg@burd.me> Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Haibo Yan <tristan.yim@gmail.com> Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com	2026-02-10 16:46:05 -05:00
Robert Haas	adbad833f3	Store information about range-table flattening in the final plan. Suppose that we're currently planning a query and, when that same query was previously planned and executed, we learned something about how a certain table within that query should be planned. We want to take note when that same table is being planned during the current planning cycle, but this is difficult to do, because the RTI of the table from the previous plan won't necessarily be equal to the RTI that we see during the current planning cycle. This is because each subquery has a separate range table during planning, but these are flattened into one range table when constructing the final plan, changing RTIs. Commit `8c49a484e8` allows us to match up subqueries seen in the previous planning cycles with the subqueries currently being planned just by comparing textual names, but that's not quite enough to let us deduce anything about individual tables, because we don't know where each subquery's range table appears in the final, flattened range table. To fix that, store a list of SubPlanRTInfo objects in the final planned statement, each including the name of the subplan, the offset at which it begins in the flattened range table, and whether or not it was a dummy subplan -- if it was, some RTIs may have been dropped from the final range table, but also there's no need to control how a dummy subquery gets planned. The toplevel subquery has no name and always begins at rtoffset 0, so we make no entry for it. This commit teaches pg_overexplain's RANGE_TABLE option to make use of this new data to display the subquery name for each range table entry. Reviewed-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Greg Burd <greg@burd.me> Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Haibo Yan <tristan.yim@gmail.com> Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com	2026-02-10 15:33:39 -05:00
Robert Haas	0f4c8d33d4	Pass cursorOptions to planner_setup_hook. Commit `94f3ad3961` failed to do this because I couldn't think of a use for the information, but this has proven to be short-sighted. Best to fix it before this code is officially released. Now, the only argument to standard_planenr that isn't passed to planner_setup_hook is boundParams, but that is accessible via glob->boundParams, and so doesn't need to be passed separately. Discussion: https://www.postgresql.org/message-id/CA+TgmoYS4ZCVAF2jTce=bMP0Oq_db_srocR4cZyO0OBp9oUoGg@mail.gmail.com	2026-02-10 11:50:28 -05:00
Heikki Linnakangas	be5257725d	Refactor ProcessRecoveryConflictInterrupt for readability Two changes here: 1. Introduce a separate RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK flag to indicate a suspected deadlock that involves a buffer pin. Previously the startup process used the same flag for a deadlock involving just regular locks, and to check for deadlocks involving the buffer pin. The cases are handled separately in the startup process, but the receiving backend had to deduce which one it was based on HoldingBufferPinThatDelaysRecovery(). With a separate flag, the receiver doesn't need to guess. 2. Rewrite the ProcessRecoveryConflictInterrupt() function to not rely on fallthrough through the switch-statement. That was difficult to read. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/4cc13ba1-4248-4884-b6ba-4805349e7f39@iki.fi	2026-02-10 16:23:10 +02:00
Heikki Linnakangas	17f51ea818	Separate RecoveryConflictReasons from procsignals Share the same PROCSIG_RECOVERY_CONFLICT flag for all recovery conflict reasons. To distinguish, have a bitmask in PGPROC to indicate the reason(s). Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/4cc13ba1-4248-4884-b6ba-4805349e7f39@iki.fi	2026-02-10 16:23:08 +02:00
Heikki Linnakangas	ddc3250208	Use ProcNumber rather than pid in ReplicationSlot This helps the next commit. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/4cc13ba1-4248-4884-b6ba-4805349e7f39@iki.fi	2026-02-10 16:23:05 +02:00
Michael Paquier	307447e6db	Add information about range type stats to pg_stats_ext_exprs This commit adds three attributes to the system view pg_stats_ext_exprs, whose data can exist when involving a range type in an expression: range_length_histogram range_empty_frac range_bounds_histogram These statistics fields exist since `918eee0c49`, and have become viewable in pg_stats later in `bc3c8db8ae`. This puts the definition of pg_stats_ext_exprs on par with pg_stats. This issue has showed up during the discussion about the restore of extended statistics for expressions, so as it becomes possible to query the stats data to restore from the catalogs. Having access to this data is useful on its own, without the restore part. Some documentation and some tests are added, written by me. Corey has authored the part in system_views.sql. Bump catalog version. Author: Corey Huinker <corey.huinker@gmail.com> Co-authored-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/aYmCUx9VvrKiZQLL@paquier.xyz	2026-02-10 12:36:57 +09:00
Álvaro Herrera	cbef472558	Remove HeapTupleheaderSetXminCommitted/Invalid functions They are not and never have been used by any known code -- apparently we just cargo-culted them in commit `37484ad2aa` (or their ancestor macros anyway, which begat these functions in commit `34694ec888`). Allegedly they're also potentially dangerous; users are better off going through HeapTupleSetHintBits instead. Author: Andy Fan <zhihuifan1213@163.com> Discussion: https://postgr.es/m/87sejogt4g.fsf@163.com	2026-02-09 19:15:20 +01:00
Tom Lane	8ebdf41c26	Harden _int_matchsel() against being attached to the wrong operator. While the preceding commit prevented such attachments from occurring in future, this one aims to prevent further abuse of any already- created operator that exposes _int_matchsel to the wrong data types. (No other contrib module has a vulnerable selectivity estimator.) We need only check that the Const we've found in the query is indeed of the type we expect (query_int), but there's a difficulty: as an extension type, query_int doesn't have a fixed OID that we could hard-code into the estimator. Therefore, the bulk of this patch consists of infrastructure to let an extension function securely look up the OID of a datatype belonging to the same extension. (Extension authors have requested such functionality before, so we anticipate that this code will have additional non-security uses, and may soon be extended to allow looking up other kinds of SQL objects.) This is done by first finding the extension that owns the calling function (there can be only one), and then thumbing through the objects owned by that extension to find a type that has the desired name. This is relatively expensive, especially for large extensions, so a simple cache is put in front of these lookups. Reported-by: Daniel Firer as part of zeroday.cloud Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Noah Misch <noah@leadboat.com> Security: CVE-2026-2004 Backpatch-through: 14	2026-02-09 10:14:22 -05:00
Tom Lane	60e7ae41a6	Guard against unexpected dimensions of oidvector/int2vector. These data types are represented like full-fledged arrays, but functions that deal specifically with these types assume that the array is 1-dimensional and contains no nulls. However, there are cast pathways that allow general oid[] or int2[] arrays to be cast to these types, allowing these expectations to be violated. This can be exploited to cause server memory disclosure or SIGSEGV. Fix by installing explicit checks in functions that accept these types. Reported-by: Altan Birler <altan.birler@tum.de> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Noah Misch <noah@leadboat.com> Security: CVE-2026-2003 Backpatch-through: 14	2026-02-09 09:57:43 -05:00
Álvaro Herrera	38e0190ced	Allow log_min_messages to be set per process type Change log_min_messages from being a single element to a comma-separated list of type:level elements, with 'type' representing a process type, and 'level' being a log level to use for that type of process. The list must also have a freestanding level specification which is used for process types not listed, which convenientely makes the whole thing backwards-compatible. Some choices made here could be contested; for instance, we use the process type `backend` to affect regular backends as well as dead-end backends and the standalone backend, and `autovacuum` means both the launcher and the workers. I think it's largely sensible though, and it can easily be tweaked if desired. Author: Euler Taveira <euler@eulerto.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Japin Li <japinli@hotmail.com> Reviewed-by: Tan Yang <332696245@qq.com> Discussion: https://postgr.es/m/e85c6671-1600-4112-8887-f97a8a5d07b2@app.fastmail.com	2026-02-09 13:23:10 +01:00
Thomas Munro	1e7fe06c10	Replace pg_mblen() with bounds-checked versions. A corrupted string could cause code that iterates with pg_mblen() to overrun its buffer. Fix, by converting all callers to one of the following: 1. Callers with a null-terminated string now use pg_mblen_cstr(), which raises an "illegal byte sequence" error if it finds a terminator in the middle of the sequence. 2. Callers with a length or end pointer now use either pg_mblen_with_len() or pg_mblen_range(), for the same effect, depending on which of the two seems more convenient at each site. 3. A small number of cases pre-validate a string, and can use pg_mblen_unbounded(). The traditional pg_mblen() function and COPYCHAR macro still exist for backward compatibility, but are no longer used by core code and are hereby deprecated. The same applies to the t_isXXX() functions. Security: CVE-2026-2006 Backpatch-through: 14 Co-authored-by: Thomas Munro <thomas.munro@gmail.com> Co-authored-by: Noah Misch <noah@leadboat.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reported-by: Paul Gerste (as part of zeroday.cloud) Reported-by: Moritz Sanft (as part of zeroday.cloud)	2026-02-09 12:44:04 +13:00
John Naylor	7467041cde	Future-proof sort template against undefined behavior Commit `176dffdf7` added a NULL array pointer check before performing a qsort in order to prevent undefined behavior when passing NULL pointer and zero length. To head off future degenerate cases, check that there are at least two elements to sort before proceeding with insertion sort. This has the added advantage of allowing us to remove four equivalent checks that guarded against recursion/iteration. There might be a tiny performance penalty from unproductive recursions, but we can buy that back by increasing the insertion sort threshold. That is left for future work. Discussion: https://postgr.es/m/CANWCAZZWvds_35nXc4vXD-eBQa_=mxVtqZf-PM_ps=SD7ghhJg@mail.gmail.com	2026-02-07 17:02:35 +07:00
Peter Eisentraut	0af05b5dbb	Revert "Change copyObject() to use typeof_unqual" This reverts commit `4cfce4e62c`. This implementation fails to compile on newer MSVC that support __typeof_unqual__. (Older versions did not support it and compiled fine.) Revert for now and research further. Reported-by: Bryan Green <dbryan.green@gmail.com> Discussion: https://www.postgresql.org/message-id/b03ddcd4-2a16-49ee-b105-e7f609f3c514%40gmail.com	2026-02-07 10:08:38 +01:00
Nathan Bossart	ba1e14134a	Adjust style of some debugging macros. This commit adjusts a few debugging macros to match the style of those in pg_config_manual.h. Like commits `123661427b` and `b4cbc106a6`, these were discovered while reviewing Aleksander Alekseev's proposed changes to pgindent. Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/aP-H6kSsGOxaB21k%40nathan	2026-02-06 16:24:21 -06:00
Jacob Champion	d8d7c5dc8f	libpq: Prepare for protocol grease during 19beta The main reason that libpq doesn't request protocol version 3.2 by default is because other proxy/server implementations don't implement the negotiation. This is a bit of a chicken-and-egg problem: We don't bump the default version that libpq requests, but other implementations may not be incentivized to implement version negotiation if their users never run into issues. One established practice to combat this is to flip Postel's Law on its head, by sending parameters that the server cannot possibly support. If the server fails the handshake instead of correctly negotiating, then the problem is surfaced naturally. If the server instead claims to support the bogus parameters, then we fail the connection to make the lie obvious. This is called "grease" (or "greasing"), after the GREASE mechanism in TLS that popularized the concept: https://www.rfc-editor.org/rfc/rfc8701.html This patch reserves 3.9999 as an explicitly unsupported protocol version number and `_pq_.test_protocol_negotiation` as an explicitly unsupported protocol extension. A later commit will send these by default in order to stress-test the ecosystem during the beta period; that commit will then be reverted before 19 RC1, so that we can decide what to do with whatever data has been gathered. The _pq_.test_protocol_negotiation change here is intentionally docs- only: after its implementation is reverted, the parameter should remain reserved. Extracted/adapted from a patch by Jelte Fennema-Nio. Author: Jelte Fennema-Nio <postgres@jeltef.nl> Co-authored-by: Jacob Champion <jacob.champion@enterprisedb.com> Discussion: https://postgr.es/m/DDPR5BPWH1RJ.1LWAK6QAURVAY%40jeltef.nl	2026-02-06 10:31:45 -08:00
Thomas Munro	f94e9141a0	Add file_extend_method=posix_fallocate,write_zeros. Provide a way to disable the use of posix_fallocate() for relation files. It was introduced by commit `4d330a61bb`. The new setting file_extend_method=write_zeros can be used as a workaround for problems reported from the field: * BTRFS compression is disabled by the use of posix_fallocate() * XFS could produce spurious ENOSPC errors in some Linux kernel versions, though that problem is reported to have been fixed The default is file_extend_method=posix_fallocate if available, as before. The write_zeros option is similar to PostgreSQL < 16, except that now it's multi-block. Backpatch-through: 16 Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Reported-by: Dimitrios Apostolou <jimis@gmx.net> Discussion: https://postgr.es/m/b1843124-fd22-e279-a31f-252dffb6fbf2%40gmx.net	2026-02-06 17:38:49 +13:00
Masahiko Sawada	7a1f0f8747	pg_upgrade: Optimize logical replication slot caught-up check. Commit `29d0a77fa6` improved pg_upgrade to allow migrating logical slots provided that all logical slots have caught up (i.e., they have no pending decodable WAL records). Previously, this verification was done by checking each slot individually, which could be time-consuming if there were many logical slots to migrate. This commit optimizes the check to avoid reading the same WAL stream multiple times. It performs the check only for the slot with the minimum confirmed_flush_lsn and applies the result to all other slots in the same database. This limits the check to at most one logical slot per database. During the check, we identify the last decodable WAL record's LSN to report any slots with unconsumed records, consistent with the existing error reporting behavior. Additionally, the maximum confirmed_flush_lsn among all logical slots on the database is used as an early scan cutoff; finding a decodable WAL record beyond this point implies that no slot has caught up. Performance testing demonstrated that the execution time remains stable regardless of the number of slots in the database. Note that we do not distinguish slots based on their output plugins. A hypothetical plugin might use a replication origin filter that filters out changes from a specific origin. In such cases, we might get a false positive (erroneously considering a slot caught up). However, this is safe from a data integrity standpoint, such scenarios are rare, and the impact of a false positive is minimal. This optimization is applied only when the old cluster is version 19 or later. Bump catalog version. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAD21AoBZ0LAcw1OHGEKdW7S5TRJaURdhEk3CLAW69_siqfqyAg@mail.gmail.com	2026-02-04 17:11:27 -08:00
Heikki Linnakangas	084e42bc71	Add backendType to PGPROC, replacing isRegularBackend We can immediately make use of it in pg_signal_backend(), which previously fetched the process type from the backend status array with pgstat_get_backend_type_by_proc_number(). That was correct but felt a little questionable to me: backend status should be for observability purposes only, not for permission checks. Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/b77e4962-a64a-43db-81a1-580444b3e8f5@iki.fi	2026-02-04 13:06:04 +02:00
Peter Eisentraut	4cfce4e62c	Change copyObject() to use typeof_unqual Currently, when the argument of copyObject() is const-qualified, the return type is also, because the use of typeof carries over all the qualifiers. This is incorrect, since the point of copyObject() is to make a copy to mutate. But apparently no code ran into it. The new implementation uses typeof_unqual, which drops the qualifiers, making this work correctly. typeof_unqual is standardized in C23, but all recent versions of all the usual compilers support it even in non-C23 mode, at least as __typeof_unqual__. We add a configure/meson test for typeof_unqual and __typeof_unqual__ and use it if it's available, else we use the existing fallback of just returning void *. Reviewed-by: David Geier <geidav.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/92f9750f-c7f6-42d8-9a4a-85a3cbe808f3%40eisentraut.org	2026-02-04 09:22:41 +01:00
Heikki Linnakangas	cd375d5b6d	Remove useless errdetail_abort() I don't understand how to reach errdetail_abort() with MyProc->recoveryConflictPending set. If a recovery conflict signal is received, ProcessRecoveryConflictInterrupt() raises an ERROR or FATAL error to cancel the query or connection, and abort processing clears the flag. The error message from ProcessRecoveryConflictInterrupt() is very clear that the query or connection was terminated because of recovery conflict. The only way to reach it AFAICS is with a race condition, if the startup process sends a recovery conflict signal when the transaction has just entered aborted state for some other reason. And in that case the detail would be misleading, as the transaction was already aborted for some other reason, not because of the recovery conflict. errdetail_abort() was the only user of the recoveryConflictPending flag in PGPROC, so we can remove that and all the related code too. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/4cc13ba1-4248-4884-b6ba-4805349e7f39@iki.fi	2026-02-03 15:08:13 +02:00
Álvaro Herrera	96e2af6050	Reject ADD CONSTRAINT NOT NULL if name mismatches existing constraint When using ALTER TABLE ... ADD CONSTRAINT to add a not-null constraint with an explicit name, we have to ensure that if the column is already marked NOT NULL, the provided name matches the existing constraint name. Failing to do so could lead to confusion regarding which constraint object actually enforces the rule. This patch adds a check to throw an error if the user tries to add a named not-null constraint to a column that already has one with a different name. Reported-by: yanliang lei <msdnchina@163.com> Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de> Co-authored-bu: Srinath Reddy Sadipiralla <srinath2133@gmail.com> Backpatch-through: 18 Discussion: https://postgr.es/m/19351-8f1c523ead498545%40postgresql.org	2026-02-03 12:33:29 +01:00
Peter Eisentraut	955e507668	Change StaticAssertVariableIsOfType to be a declaration This allows moving the uses to more natural and useful positions. Also, a declaration is the more native use of static assertions in C. Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/2273bc2a-045d-4a75-8584-7cd9396e5534%40eisentraut.org	2026-02-03 08:46:02 +01:00
Peter Eisentraut	137d05df2f	Rename AssertVariableIsOfType to StaticAssertVariableIsOfType This keeps run-time assertions and static assertions clearly separate. Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/2273bc2a-045d-4a75-8584-7cd9396e5534%40eisentraut.org	2026-02-03 08:45:24 +01:00
Tom Lane	da7a1dc0d6	Refactor att_align_nominal() to improve performance. Separate att_align_nominal() into two macros, similarly to what was already done with att_align_datum() and att_align_pointer(). The inner macro att_nominal_alignby() is really just TYPEALIGN(), while att_align_nominal() retains its previous API by mapping TYPALIGN_xxx values to numbers of bytes to align to and then calling att_nominal_alignby(). In support of this, split out tupdesc.c's logic to do that mapping into a publicly visible function typalign_to_alignby(). Having done that, we can replace performance-critical uses of att_align_nominal() with att_nominal_alignby(), where the typalign_to_alignby() mapping is done just once outside the loop. In most places I settled for doing typalign_to_alignby() once per function. We could in many places pass the alignby value in from the caller if we wanted to change function APIs for this purpose; but I'm a bit loath to do that, especially for exported APIs that extensions might call. Replacing a char typalign argument by a uint8 typalignby argument would be an API change that compilers would fail to warn about, thus silently breaking code in hard-to-debug ways. I did revise the APIs of array_iter_setup and array_iter_next, moving the element type attribute arguments to the former; if any external code uses those, the argument-count change will cause visible compile failures. Performance testing shows that ExecEvalScalarArrayOp is sped up by about 10% by this change, when using a simple per-element function such as int8eq. I did not check any of the other loops optimized here, but it's reasonable to expect similar gains. Although the motivation for creating this patch was to avoid a performance loss if we add some more typalign values, it evidently is worth doing whether that patch lands or not. Discussion: https://postgr.es/m/1127261.1769649624@sss.pgh.pa.us	2026-02-02 14:39:50 -05:00
Tom Lane	0c9f46c428	In s_lock.h, use regular labels with %= instead of local labels. Up to now we've used GNU-style local labels for branch targets in s_lock.h's assembly blocks. But there's an alternative style, which I for one didn't know about till recently: use regular assembler labels, and insert a per-asm-block number in them using %= to ensure they are distinct across multiple TAS calls within one source file. gcc has had %= since gcc 2.0, and I've verified that clang knows it too. While the immediate motivation for changing this is that AIX's assembler doesn't do local labels, it seems to me that this is a superior solution anyway. There is nothing mnemonic about "1:", while a regular label can convey something useful, and at least to me it feels less error-prone. Therefore let's standardize on this approach, also converting the one other usage in s_lock.h. Discussion: https://postgr.es/m/399291.1769998688@sss.pgh.pa.us	2026-02-02 11:13:38 -05:00
Michael Paquier	d46aa32ea5	Fix build inconsistency due to the generation of wait-event code The build generates four files based on the wait event contents stored in wait_event_names.txt: - wait_event_types.h - pgstat_wait_event.c - wait_event_funcs_data.c - wait_event_types.sgml The SGML file is generated as part of a documentation build, with its data stored in doc/src/sgml/ for meson and configure. The three others are handled differently for meson and configure: - In configure, all the files are created in src/backend/utils/activity/. A link to wait_event_types.h is created in src/include/utils/. - In meson, all the files are created in src/include/utils/. The two C files, pgstat_wait_event.c and wait_event_funcs_data.c, are then included in respectively wait_event.c and wait_event_funcs.c, without the "utils/" path. For configure, this does not present a problem. For meson, this has to be combined with a trick in src/backend/utils/activity/meson.build, where include_directories needs to point to include/utils/ to make the inclusion of the C files work properly, causing builds to pull in PostgreSQL headers rather than system headers in some build paths, as src/include/utils/ would take priority. In order to fix this issue, this commit reworks the way the C/H files are generated, becoming consistent with guc_tables.inc.c: - For meson, basically nothing changes. The files are still generated in src/include/utils/. The trick with include_directories is removed. - For configure, the files are now generated in src/backend/utils/, with links in src/include/utils/ pointing to the ones in src/backend/. This requires extra rules in src/backend/utils/activity/Makefile so as a make command in this sub-directory is able to work. - The three files now fall under header-stamp, which is actually simpler as guc_tables.inc.c does the same. - wait_event_funcs_data.c and pgstat_wait_event.c are now included with "utils/" in their path. This problem has not been an issue in the buildfarm; it has been noted with AIX and a conflict with float.h. This issue could, however, create conflicts in the buildfarm depending on the environment with unexpected headers pulled in, so this fix is backpatched down to where the generation of the wait-event files has been introduced. While on it, this commit simplifies wait_event_names.txt regarding the paths of the files generated, to mention just the names of the files generated. The paths where the files are generated became incorrect. The path of the SGML path was wrong. This change has been tested in the CI, down to v17. Locally, I have run tests with configure (with and without VPATH), as well as meson, on the three branches. Combo oversight in `fa88928470` and `1e68e43d3f`. Reported-by: Aditya Kamath <aditya.kamath1@ibm.com> Discussion: https://postgr.es/m/LV8PR15MB64888765A43D229EA5D1CFE6D691A@LV8PR15MB6488.namprd15.prod.outlook.com Backpatch-through: 17	2026-02-02 08:02:39 +09:00
Heikki Linnakangas	e2362eb2bd	Move shmem allocator's fields from PGShmemHeader to its own struct For readability. It was a slight modularity violation to have fields in PGShmemHeader that were only used by the allocator code in shmem.c. And it was inconsistent that ShmemLock was nevertheless not stored there. Moving all the allocator-related fields to a separate struct makes it more consistent and modular, and removes the need to allocate and pass ShmemLock separately via BackendParameters. Merge InitShmemAccess() and InitShmemAllocation() into a single function that initializes the struct when called from postmaster, and when called from backends in EXEC_BACKEND mode, re-establishes the global variables. That's similar to all the *ShmemInit() functions that we have. Co-authored-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/CAExHW5uNRB9oT4pdo54qAo025MXFX4MfYrD9K15OCqe-ExnNvg@mail.gmail.com	2026-01-30 18:22:56 +02:00
Andres Freund	87f7b824f2	tableam: Perform CheckXidAlive check once per scan Previously, the CheckXidAlive check was performed within the table_scannext functions. This caused the check to be executed for every fetched tuple, an unnecessary overhead. To fix, move the check to table_beginscan* so it is performed once per scan rather than once per row. Note: table_tuple_fetch_row_version() does not use a scan descriptor; therefore, the CheckXidAlive check is retained in that function. The overhead is unlikely to be relevant for the existing callers. Reported-by: Andres Freund <andres@anarazel.de> Author: Dilip Kumar <dilipbalaut@gmail.com> Suggested-by: Andres Freund <andres@anarazel.de> Suggested-by: Amit Kapila <akapila@postgresql.org> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/tlpltqm5jjwj7mp66dtebwwhppe4ri36vdypux2zoczrc2i3mp%40dhv4v4nikyfg	2026-01-29 17:52:07 -05:00
Tom Lane	bd9dfac8b1	Further fix extended alignment for older g++. Commit `6ceef9408` was still one brick shy of a load, because it caused any usage at all of PGIOAlignedBlock or PGAlignedXLogBlock to fail under older g++. Notably, this broke "headerscheck --cplusplus". We can permit references to these structs as abstract structs though; only actual declaration of such a variable needs to be forbidden. Discussion: https://www.postgresql.org/message-id/3119480.1769189606@sss.pgh.pa.us	2026-01-29 16:16:36 -05:00
Michael Paquier	efbebb4e85	Add support for "mcv" in pg_restore_extended_stats() This commit adds support for the restore of extended statistics of the kind "mcv", aka most-common values. This format is different from n_distinct and dependencies stat types in that it is the combination of three of the four different arrays from the pg_stats_ext view which in turn require three different input parameters on pg_restore_extended_statistics(). These are translated into three input arguments for the function: - "most_common_vals", acting as a leader of the others. It is a 2-dimension array, that includes the common values. - "most_common_freqs", 1-dimension array of float8[], with a number of elements that has to match with "most_common_vals". - "most_common_base_freqs", 1-dimension array of float8[], with a number of elements that has to match with "most_common_vals". All three arrays are required to achieve the restore of this type of extended statistics (if "most_common_vals" happens to be NULL in the catalogs, the rest is NULL by design). Note that "most_common_val_nulls" is not required in input, its data is rebuilt from the decomposition of the "most_common_vals" array based on its text[] representation. The initial versions of the patch provided this option in input, but we do not require it and it simplifies a lot the result. Support in pg_dump is added down to v13 which is where the support for this type of extended statistics has been added, when --statistics is used. This means that upgrade and dumps can restore extended statistics data transparently, like "dependencies", "ndistinct", attribute and relation statistics. For MCV, the values are directly queried from the relevant catalogs. Author: Corey Huinker <corey.huinker@gmail.com> Co-authored-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com	2026-01-29 12:14:08 +09:00
Masahiko Sawada	8f1e2dfe03	Consolidate replication origin session globals into a single struct. This commit moves the separate global variables for replication origin state into a single ReplOriginXactState struct. This groups logically related variables, which improves code readability and simplifies state management (e.g., resetting the state) by handling them as a unit. Author: Chao Li <lic@highgo.com> Suggested-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://postgr.es/m/CAEoWx2=pYvfRthXHTzSrOsf5_FfyY4zJyK4zV2v4W=yjUij1cA@mail.gmail.com	2026-01-28 12:26:22 -08:00
Masahiko Sawada	227eb4eea2	Refactor replication origin state reset helpers. Factor out common logic for clearing replorigin_session_* variables into a dedicated helper function, replorigin_xact_clear(). This removes duplicated assignments of these variables across multiple call sites, and makes the intended scope of each reset explicit. Author: Chao Li <lic@highgo.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/CAEoWx2=pYvfRthXHTzSrOsf5_FfyY4zJyK4zV2v4W=yjUij1cA@mail.gmail.com	2026-01-28 11:45:26 -08:00
Masahiko Sawada	1fdbca159e	Standardize replication origin naming to use "ReplOrigin". The replication origin code was using inconsistent naming conventions. Functions were typically prefixed with 'replorigin', while typedefs and constants used "RepOrigin". This commit unifies the naming convention by renaming RepOriginId to ReplOriginId. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAD21AoBDgm3hDqUZ+nqu=ViHmkCnJBuJyaxG_yvv27BAi2zBmQ@mail.gmail.com	2026-01-28 11:03:29 -08:00
Robert Haas	4020b370f2	Allow for plugin control over path generation strategies. Each RelOptInfo now has a pgs_mask member which is a mask of acceptable strategies. For most rels, this is populated from PlannerGlobal's default_pgs_mask, which is computed from the values of the enable_* GUCs at the start of planning. For baserels, get_relation_info_hook can be used to adjust pgs_mask for each new RelOptInfo, at least for rels of type RTE_RELATION. Adjusting pgs_mask is less useful for other types of rels, but if it proves to be necessary, we can revisit the way this hook works or add a new one. For joinrels, two new hooks are added. joinrel_setup_hook is called each time a joinrel is created, and one thing that can be done from that hook is to manipulate pgs_mask for the new joinrel. join_path_setup_hook is called each time we're about to add paths to a joinrel by considering some particular combination of an outer rel, an inner rel, and a join type. It can modify the pgs_mask propagated into JoinPathExtraData to restrict strategy choice for that particular combination of rels. To make joinrel_setup_hook work as intended, the existing calls to build_joinrel_partition_info are moved later in the calling functions; this is because that function checks whether the rel's pgs_mask includes PGS_CONSIDER_PARTITIONWISE, so we want it to only be called after plugins have had a chance to alter pgs_mask. Upper rels currently inherit pgs_mask from the input relation. It's unclear that this is the most useful behavior, but at the moment there are no hooks to allow the mask to be set in any other way. Reviewed-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Greg Burd <greg@burd.me> Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Haibo Yan <tristan.yim@gmail.com> Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com	2026-01-28 11:28:34 -05:00
Amit Kapila	851f6649cc	Prevent invalidation of newly synced replication slots. A race condition could cause a newly synced replication slot to become invalidated between its initial sync and the checkpoint. When syncing a replication slot to a standby, the slot's initial restart_lsn is taken from the publisher's remote_restart_lsn. Because slot sync happens asynchronously, this value can lag behind the standby's current redo pointer. Without any interlocking between WAL reservation and checkpoints, a checkpoint may remove WAL required by the newly synced slot, causing the slot to be invalidated. To fix this, we acquire ReplicationSlotAllocationLock before reserving WAL for a newly synced slot, similar to commit `006dd4b2e5`. This ensures that if WAL reservation happens first, the checkpoint process must wait for slotsync to update the slot's restart_lsn before it computes the minimum required LSN. However, unlike in ReplicationSlotReserveWal(), this lock alone cannot protect a newly synced slot if a checkpoint has already run CheckPointReplicationSlots() before slotsync updates the slot. In such cases, the remote restart_lsn may be stale and earlier than the current redo pointer. To prevent relying on an outdated LSN, we use the oldest WAL location available if it is greater than the remote restart_lsn. This ensures that newly synced slots always start with a safe, non-stale restart_lsn and are not invalidated by concurrent checkpoints. Author: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Vitaly Davydov <v.davydov@postgrespro.ru> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Backpatch-through: 17 Discussion: https://postgr.es/m/TY4PR01MB16907E744589B1AB2EE89A31F94D7A%40TY4PR01MB16907.jpnprd01.prod.outlook.com	2026-01-27 05:06:29 +00:00
Melanie Plageman	648a7e28d7	Eliminate use of cached VM value in lazy_scan_prune() lazy_scan_prune() takes a parameter from lazy_scan_heap() indicating whether the page was marked all-visible in the VM at the time it was last checked in find_next_unskippable_block(). This behavior is historical, dating back to commit `608195a3a3`, when we did not pin the VM page until deciding we must read it. Now that the VM page is already pinned, there is no meaningful benefit to relying on a cached VM status. Removing this cached value simplifies the logic in both lazy_scan_heap() and lazy_scan_prune(). It also clarifies future work that will set the visibility map on-access: such paths will not have a cached value available, which would make the logic harder to reason about. And eliminating it enables us to detect and repair VM corruption on-access. Along with removing the cached value and unconditionally checking the visibility status of the heap page, this commit also moves the VM corruption handling to occur first. This reordering should have no performance impact, since the checks are inexpensive and performed only once per page. It does, however, make the control flow easier to understand. The new restructuring also makes it possible to set the VM after fixing corruption (if pruning found the page all-visible). Now that no callers of visibilitymap_set() use its return value, change its (and visibilitymap_set_vmbits()) return type to void. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com	2026-01-26 17:00:13 -05:00
Peter Eisentraut	5ca5f12c2c	Fix accidentally cast away qualifiers This fixes cases where a qualifier (const, in all cases here) was dropped by a cast, but the cast was otherwise necessary or desirable, so the straightforward fix is to add the qualifier into the cast. Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/b04f4d3a-5e70-4e73-9ef2-87f777ca4aac%40eisentraut.org	2026-01-26 16:02:31 +01:00
Peter Eisentraut	6ceef9408c	Disable extended alignment uses on older g++ Fix for commit `a9bdb63bba`. The previous plan of redefining alignas didn't work, because it interfered with other C++ header files (e.g., LLVM). So now the new workaround is to just disable the affected typedefs under the affected compilers. These are not typically used in extensions anyway. Discussion: https://www.postgresql.org/message-id/3119480.1769189606%40sss.pgh.pa.us	2026-01-26 10:23:14 +01:00
Michael Paquier	0e80f3f88d	Add pg_restore_extended_stats() This function closely mirror its relation and attribute counterparts, but for extended statistics (i.e. CREATE STATISTICS) objects, being able to restore extended statistics for an extended stats object. Like the other functions, the goal of this feature is to ease the dump or upgrade of clusters so as ANALYZE would not be required anymore after these operations, stats being directly loaded into the target cluster without any post-dump/upgrade computation. The caller of this function needs the following arguments for the extended stats to restore: - The name of the relation. - The schema name of the relation. - The name of the extended stats object. - The schema name of the extended stats object. - If the stats are inherited or not. - One or more extended stats kind with its data. This commit adds only support for the restore of the extended statistics kind "n_distinct", building the basic infrastructure for the restore of more extended statistics kinds in follow-up commits, including MVC and dependencies. The support for "n_distinct" is eased in this commit thanks to the previous work done particularly in commits `1f927cce44` and `44eba8f06e`, that have added the input function for the type pg_ndistinct, used as data type in input of this new restore function. Bump catalog version. Author: Corey Huinker <corey.huinker@gmail.com> Co-authored-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com	2026-01-26 15:08:15 +09:00
Michael Paquier	c100340729	Remove PG_MMAP_FLAGS from mem.h Based on name of the macro, it was implied that it could be used for all mmap() calls on portability grounds. However, its use is limited to sysv_shmem.c, for CreateAnonymousSegment(). This commit removes the declaration, reducing the confusion around it as a portability tweak, being limited to SysV-style shared memory. This macro has been introduced in `b0fc0df936` for sysv_shmem.c, originally. It has been moved to mem.h in `0ac5e5a7e1` a bit later. Suggested by: Peter Eisentraut <peter@eisentraut.org> Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://postgr.es/m/CAExHW5vTWABxuM5fbQcFkGuTLwaxuZDEE2vtx2WuMUWk6JnF4g@mail.gmail.com Discussion: https://postgr.es/m/12add41a-7625-4639-a394-a5563e349322@eisentraut.org	2026-01-26 10:52:02 +09:00
Peter Eisentraut	a9bdb63bba	Work around buggy alignas in older g++ Older g++ (<9.3) mishandle the alignas specifier (raise warnings that the alignment is too large), but the more or less equivalent attribute works. So as a workaround, #define alignas to that attribute for those versions. see <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89357> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/3119480.1769189606%40sss.pgh.pa.us	2026-01-25 11:32:47 +01:00
Dean Rasheed	b4307ae2e5	Fix trigger transition table capture for MERGE in CTE queries. When executing a data-modifying CTE query containing MERGE and some other DML operation on a table with statement-level AFTER triggers, the transition tables passed to the triggers would fail to include the rows affected by the MERGE. The reason is that, when initializing a ModifyTable node for MERGE, MakeTransitionCaptureState() would create a TransitionCaptureState structure with a single "tcs_private" field pointing to an AfterTriggersTableData structure with cmdType == CMD_MERGE. Tuples captured there would then not be included in the sets of tuples captured when executing INSERT/UPDATE/DELETE ModifyTable nodes in the same query. Since there are no MERGE triggers, we should only create AfterTriggersTableData structures for INSERT/UPDATE/DELETE. Individual MERGE actions should then use those, thereby sharing the same capture tuplestores as any other DML commands executed in the same query. This requires changing the TransitionCaptureState structure, replacing "tcs_private" with 3 separate pointers to AfterTriggersTableData structures, one for each of INSERT, UPDATE, and DELETE. Nominally, this is an ABI break to a public structure in commands/trigger.h. However, since this is a private field pointing to an opaque data structure, the only way to create a valid TransitionCaptureState is by calling MakeTransitionCaptureState(), and no extensions appear to be doing that anyway, so it seems safe for back-patching. Backpatch to v15, where MERGE was introduced. Bug: #19380 Reported-by: Daniel Woelfel <dwwoelfel@gmail.com> Author: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/19380-4e293be2b4007248%40postgresql.org Backpatch-through: 15	2026-01-24 11:30:48 +00:00
Jacob Champion	f7521bf721	pqcomm.h: Explicitly reserve protocol v3.1 Document this unused version alongside the other special protocol numbers. Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAOYmi%2BkKyw%3Dh-5NKqqpc7HC5M30_QmzFx3kgq2AdipyNj47nUw%40mail.gmail.com	2026-01-23 12:57:15 -08:00
Michael Paquier	a36164e746	Add WALRCV_CONNECTING state to the WAL receiver Previously, a WAL receiver freshly started would set its state to WALRCV_STREAMING immediately at startup, before actually establishing a replication connection. This commit introduces a new state called WALRCV_CONNECTING, which is the state used when the WAL receiver freshly starts, or when a restart is requested, with a switch to WALRCV_STREAMING once the connection to the upstream server has been established with COPY_BOTH, meaning that the WAL receiver is ready to stream changes. This change is useful for monitoring purposes, especially in environments with a high latency where a connection could take some time to be established, giving some room between the [re]start phase and the streaming activity. From the point of view of the startup process, that flips the shared memory state of the WAL receiver when it needs to be stopped, the existing WALRCV_STREAMING and the new WALRCV_CONNECTING states have the same semantics: the WAL receiver has started and it can be stopped. Based on an initial suggestion from Noah Misch, with some input from me about the design. Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Rahila Syed <rahilasyed90@gmail.com> Discussion: https://postgr.es/m/CABPTF7VQ5tGOSG5TS-Cg+Fb8gLCGFzxJ_eX4qg+WZ3ZPt=FtwQ@mail.gmail.com	2026-01-23 14:17:28 +09:00
Álvaro Herrera	69f98fce5b	Make some use of anonymous unions [reloptions] In the spirit of commit `4b7e6c73b0` and following, which see for more details; it appears to have been quite an uncontroversial C11 feature to use and it makes the code nicer to read. This commit changes the relopt_value struct. Author: Peter Eisentraut <peter@eisentraut.org> Author: Álvaro Herrera <alvherre@kurilemu.de> Note: Yes, this was written twice independently. Discussion: https://postgr.es/m/202601192106.zcdi3yu2gzti@alvherre.pgsql	2026-01-22 17:04:59 +01:00
Peter Eisentraut	c257ba8397	Record range constructor functions in pg_range When a range type is created, several construction functions are also created, two for the range type and three for the multirange type. These have an internal dependency, so they "belong" to the range type. But there was no way to identify those functions when given a range type. An upcoming patch needs access to the two- or possibly the three-argument range constructor function for a given range type. The only way to do that would be with fragile workarounds like matching names and argument types. The correct way to do that kind of thing is to record to the links in the system catalogs. This is what this patch does, it records the OIDs of these five constructor functions in the pg_range catalog. (Currently, there is no code that makes use of this.) Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://www.postgresql.org/message-id/7d63ddfa-c735-4dfe-8c7a-4f1e2a621058%40eisentraut.org	2026-01-22 15:56:29 +01:00
Nathan Bossart	25dc485074	Refactor some SIMD and popcount macros. This commit does the following: * Removes TRY_POPCNT_X86_64. We now assume that the required CPUID intrinsics are available when HAVE_X86_64_POPCNTQ is defined, as we have done since v16 for meson builds when USE_SSE42_CRC32C_WITH_RUNTIME_CHECK is defined and since v17 when USE_AVX512_POPCNT_WITH_RUNTIME_CHECK is defined. * Moves the MSVC check for HAVE_X86_64_POPCNTQ to configure-time. This way, we set it for all relevant platforms in one place. * Moves the #defines for USE_SSE2 and USE_NEON to c.h so that they can be used elsewhere without including simd.h. Consequently, we can remove the POPCNT_AARCH64 macro. * Moves the #includes for pg_bitutils.h to below the system headers in pg_popcount_{aarch64,x86}.c, since we no longer depend on macros from pg_bitutils.h to decide which system headers to use. Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: https://postgr.es/m/aWf_InS1VrbeXAfP%40nathan	2026-01-21 14:21:00 -06:00
Nathan Bossart	8c6653516c	Rename "fast" and "slow" popcount functions. Since we now have several implementations of the popcount functions, let's give them more descriptive names. This commit replaces "slow" with "portable" and "fast" with "sse42". While the POPCNT instruction is technically not part of SSE4.2, this naming scheme is close enough in practice and is arguably easier to understand than using "popcnt" instead. Reviewed-by: John Naylor <johncnaylorls@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/aWf_InS1VrbeXAfP%40nathan	2026-01-21 14:21:00 -06:00
Nathan Bossart	79e232ca01	Move x86-64-specific popcount code to pg_popcount_x86.c. This moves the remaining x86-64-specific popcount implementations in pg_bitutils.c to pg_popcount_x86.c. Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: https://postgr.es/m/aWf_InS1VrbeXAfP%40nathan	2026-01-21 14:21:00 -06:00
Tom Lane	4576208454	Force standard_conforming_strings to always be ON. Continuing to support this backwards-compatibility feature has nontrivial costs; in particular it is potentially a security hazard if an application somehow gets confused about which setting the server is using. We changed the default to ON fifteen years ago, which seems like enough time for applications to have adapted. Let's remove support for the legacy string syntax. We should not remove the GUC altogether, since client-side code will still test it, pg_dump scripts will attempt to set it to ON, etc. Instead, just prevent it from being set to OFF. There is precedent for this approach (see commit `de66987ad`). This patch does remove the related GUC escape_string_warning, however. That setting does nothing when standard_conforming_strings is on, so it's now useless. We could leave it in place as a do-nothing setting to avoid breaking clients that still set it, if there are any. But it seems likely that any such client is also trying to turn off standard_conforming_strings, so it'll need work anyway. The client-side changes in this patch are pretty minimal, because even though we are dropping the server's support, most of our clients still need to be able to talk to older server versions. We could remove dead client code only once we disclaim compatibility with pre-v19 servers, which is surely years away. One change of note is that pg_dump/pg_dumpall now set standard_conforming_strings = on in their source session, rather than accepting the source server's default. This ensures that literals in view definitions and such will be printed in a way that's acceptable to v19+. In particular, pg_upgrade will work transparently even if the source installation has standard_conforming_strings = off. (However, pg_restore will behave the same as before if given an archive file containing standard_conforming_strings = off. Such an archive will not be safely restorable into v19+, but we shouldn't break the ability to extract valid data from it for use with an older server.) Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/3279216.1767072538@sss.pgh.pa.us	2026-01-21 15:08:38 -05:00
Álvaro Herrera	4d6a66f675	Allow Boolean reloptions to have ternary values From the user's point of view these are just Boolean values; from the implementation side we can now distinguish an option that hasn't been set. Reimplement the vacuum_truncate reloption using this type. This could also be used for reloptions vacuum_index_cleanup and buffering, but those additionally need a per-option "alias" for the state where the variable is unset (currently the value "auto"). Author: Nikolay Shaplov <dhyan@nataraj.su> Reviewed-by: Timur Magomedov <t.magomedov@postgrespro.ru> Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Discussion: https://postgr.es/m/3474141.usfYGdeWWP@thinkpad-pgpro	2026-01-21 20:06:01 +01:00
Tom Lane	cec5fe0d1e	Remove useless flag PVC_INCLUDE_CONVERTROWTYPES. This was introduced in the SJE patch (`fc069a3a6`), but it doesn't do anything because pull_var_clause() never tests it. Apparently it snuck in from somebody's private fork. Remove it again, but only in HEAD -- seems best to let it be in v18. Author: Alexander Pyhalov <a.pyhalov@postgrespro.ru> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/70008c19d22e3dd1565ca57f8436c0ba@postgrespro.ru	2026-01-21 13:26:19 -05:00
Peter Eisentraut	b4555cb070	Fix for C++ compatibility After commit `476b35d4e3`, some buildfarm members are complaining about not recognizing _Noreturn when building the new C++ module test_cplusplusext. This is not a C++ feature, but it was gated like #if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L #define pg_noreturn _Noreturn But apparently that was not sufficient. Some platforms define __STDC_VERSION__ even in C++ mode. (In this particular case, it was g++ on Solaris, but apparently this is also done by some other platforms, and it is allowed by the C++ standard.) To fix, add a ... && !defined(__cplusplus) Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/CAGECzQR21OnnKiZO_1rLWO0-16kg1JBxnVq-wymYW0-_1cUNtg@mail.gmail.com	2026-01-21 08:54:35 +01:00
John Naylor	7892e25924	Update some comments for fasthash - Add advice about hashing multiple inputs with the incremental API - Generalize statements that were specific to C strings to include all variable length inputs, where applicable. - Update comments about the standalone functions and make it easy to find them. Reported-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: zengman <zengman@halodbtech.com> Discussion: https://postgr.es/m/CANWCAZZgKnf8dNOd_w03n88NqOfmMnMv2=D8_Oy6ADGyiMq+cg@mail.gmail.com Discussion: https://postgr.es/m/CANWCAZa-2mEUY27xBw2TpsybpvVu3Ez4ABrHCBqZpAs_UDTj2Q@mail.gmail.com	2026-01-21 14:11:40 +07:00
Amit Kapila	48efefa6ca	Improve errdetail for logical replication conflict messages. This change enhances the clarity and usefulness of error detail messages generated during logical replication conflicts. The following improvements have been made: 1. Eliminate redundant output: Avoid printing duplicate remote row and replica identity values for the multiple_unique_conflicts conflict type. 2. Improve message structure: Append tuple values directly to the main error message, separated by a colon (:), for better readability. 3. Simplify local row terminology: Remove the word 'existing' when referring to the local row, as this is already implied by context. 4. General code refinements: Apply miscellaneous code cleanups to improve how conflict detail messages are constructed and formatted. Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Shveta Malik <shveta.malik@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com> Discussion: https://postgr.es/m/CAHut+Psgkwy5-yGRJC15izecySGGysrbCszv_z93ess8XtCDOQ@mail.gmail.com	2026-01-21 04:58:03 +00:00
Michael Paquier	7ebb64c557	Add routine to free MCVList This addition is in the same spirit as `32e27bd320` for MVNDistinct and MVDependencies, except that we were missing a free routine for the third type of extended statistics, MCVList. I was not sure if we needed an equivalent for MCVList, but after more review of the main patch set for the import of extended statistics, it has become clear that we do. This is introduced as its own change as this routine can be useful on its own. This one is a piece that has not been written by Corey Huinker, I have just noticed it by myself on the way. Author: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com	2026-01-20 13:13:47 +09:00
Michael Paquier	d756fa1019	Add pg_clear_extended_stats() This function is able to clear the data associated to an extended statistics object, making things so as the object looks as newly-created. The caller of this function needs the following arguments for the extended stats to clear: - The name of the relation. - The schema name of the relation. - The name of the extended stats object. - The schema name of the extended stats object. - If the stats are inherited or not. The first two parameters are especially important to ensure a consistent lookup and ACL checks for the relation on which is based the extended stats object that will be cleared, relying first on a RangeVar lookup where permissions are checked without locking a relation, critical to prevent denial-of-service attacks when using this kind of function (see also `688dc6299a` for a similar concern). The third to fifth arguments give a way to target the extended stats records to clear. This has been extracted from a larger patch by the same author, for a piece which is again useful on its own. I have rewritten large portions of it. The tests have been extended while discussing this piece, resulting on what this commit includes. The intention behind this feature is to add support for the import of extended statistics across dumps and upgrades, this change building one piece that we will be able to rely on for the rest of the changes. Bump catalog version. Author: Corey Huinker <corey.huinker@gmail.com> Co-authored-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com	2026-01-16 08:13:30 +09:00
Andres Freund	d40fd85187	lwlock: Remove support for disowned lwlwocks This reverts commit `f8d7f29b3e`, plus parts of subsequent commits fixing a typo in a parameter name. Support for disowned lwlocks was added for the benefit of AIO, to be able to have content locks "owned" by the AIO subsystem. But as of commit `fcb9c977aa`, content locks do not use lwlocks anymore. It does not seem particularly likely that we need this facility outside of the AIO use-case, therefore remove the now unused functions. I did choose to keep the comment added in the aforementioned commit about lock->owner intentionally being left pointing to the last owner. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/cj5mcjdpucvw4a54hehslr3ctukavrbnxltvuzzhqnimvpju5e@cy3g3mnsefwz	2026-01-15 14:57:45 -05:00
Andres Freund	55fbfb738b	lwlock: Remove ForEachLWLockHeldByMe As of commit `fcb9c977aa`, ForEachLWLockHeldByMe(), introduced in `f4ece891fc`, is not used anymore, as content locks are now implemented in bufmgr.c. It doesn't seem that likely that a new user of the functionality will appear all that soon, making removal of the function seem like the most sensible path. It can easily be added back if necessary. Discussion: https://postgr.es/m/lneuyxqxamqoayd2ntau3lqjblzdckw6tjgeu4574ezwh4tzlg%40noioxkquezdw	2026-01-15 14:57:45 -05:00
Andres Freund	fcb9c977aa	bufmgr: Implement buffer content locks independently of lwlocks Until now buffer content locks were implemented using lwlocks. That has the obvious advantage of not needing a separate efficient implementation of locks. However, the time for a dedicated buffer content lock implementation has come: 1) Hint bits are currently set while holding only a share lock. This leads to having to copy pages while they are being written out if checksums are enabled, which is not cheap. We would like to add AIO writes, however once many buffers can be written out at the same time, it gets a lot more expensive to copy them, particularly because that copy needs to reside in shared buffers (for worker mode to have access to the buffer). In addition, modifying buffers while they are being written out can cause issues with unbuffered/direct-IO, as some filesystems (like btrfs) do not like that, due to filesystem internal checksums getting corrupted. The solution to this is to require a new share-exclusive lock-level to set hint bits and to write out buffers, making those operations mutually exclusive. We could introduce such a lock-level into the generic lwlock implementation, however it does not look like there would be other users, and it does add some overhead into important code paths. 2) For AIO writes we need to be able to race-freely check whether a buffer is undergoing IO and whether an exclusive lock on the page can be acquired. That is rather hard to do efficiently when the buffer state and the lock state are separate atomic variables. This is a major hindrance to allowing writes to be done asynchronously. 3) Buffer locks are by far the most frequently taken locks. Optimizing them specifically for their use case is worth the effort. E.g. by merging content locks into buffer locks we will be able to release a buffer lock and pin in one atomic operation. 4) There are more complicated optimizations, like long-lived "super pinned & locked" pages, that cannot realistically be implemented with the generic lwlock implementation. Therefore implement content locks inside bufmgr.c. The lockstate is stored as part of BufferDesc.state. The implementation of buffer content locks is fairly similar to lwlocks, with a few important differences: 1) An additional lock-level share-exclusive has been added. This lock-level conflicts with exclusive locks and itself, but not share locks. 2) Error recovery for content locks is implemented as part of the already existing private-refcount tracking mechanism in combination with resowners, instead of a bespoke mechanism as the case for lwlocks. This means we do not need to add dedicated error-recovery code paths to release all content locks (like done with LWLockReleaseAll() for lwlocks). 3) The lock state is embedded in BufferDesc.state instead of having its own struct. 4) The wakeup logic is a tad more complicated due to needing to support the additional lock-level This commit unfortunately introduces some code that is very similar to the code in lwlock.c, however the code is not equivalent enough to easily merge it. The future wins that this commit makes possible seem worth the cost. As of this commit nothing uses the new share-exclusive lock mode. It will be used in a future commit. It seemed too complicated to introduce the lock-level in a separate commit. It's worth calling out one wart in this commit: Despite content locks not being lwlocks anymore, they continue to use PGPROC->lw* - that seemed better than duplicating the relevant infrastructure. Another thing worth pointing out is that, after this change, content locks are not reported as LWLock wait events anymore, but as new wait events in the "Buffer" wait event class (see also `6c5c393b74`). The old BufferContent lwlock tranche has been removed. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Heikki Linnakangas <heikki.linnakangas@iki.fi> Reviewed-by: Greg Burd <greg@burd.me> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2026-01-15 14:26:53 -05:00
Andres Freund	dac328c8a6	bufmgr: Change BufferDesc.state to be a 64-bit atomic This is motivated by wanting to merge buffer content locks into BufferDesc.state in a future commit, rather than having a separate lwlock (see commit `c75ebc657f` for more details). As this change is rather mechanical, it seems to make sense to split it out into a separate commit, for easier review. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2026-01-15 14:20:41 -05:00
Tom Lane	282b1cde9d	Optimize LISTEN/NOTIFY via shared channel map and direct advancement. This patch reworks LISTEN/NOTIFY to avoid waking backends that have no need to process the notification messages we just sent. The primary change is to create a shared hash table that tracks which processes are listening to which channels (where a "channel" is defined by a database OID and channel name). This allows a notifying process to accurately determine which listeners are interested, replacing the previous weak approximation that listeners in other databases couldn't be interested. Secondly, if a listener is known not to be interested and is currently stopped at the old queue head, we avoid waking it at all and just directly advance its queue pointer past the notifications we inserted. These changes permit very significant improvements (integer multiples) in NOTIFY throughput, as well as a noticeable reduction in latency, when there are many listeners but only a few are interested in any specific message. There is no improvement for the simplest case where every listener reads every message, but any loss seems below the noise level. Author: Joel Jacobson <joel@compiler.org> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/6899c044-4a82-49be-8117-e6f669765f7e@app.fastmail.com	2026-01-15 14:12:15 -05:00
Álvaro Herrera	35e3fae738	Remove #include <math.h> where not needed Liujinyang reported the one in binaryheap.c, I then found and analyzed the rest. For future patches, we require git archaelogical analysis before we accept patches of this nature. Co-authored-by: liujinyang <21043272@qq.com> Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/tencent_6B302BFCAF6F010E00AB5C2C0ECB7AA3F205@qq.com	2026-01-15 19:09:47 +01:00
Heikki Linnakangas	d9c3c94365	Wake up autovacuum launcher from postmaster when a worker exits When an autovacuum worker exits, the launcher needs to be notified with SIGUSR2, so that it can rebalance and possibly launch a new worker. The launcher must be notified only after the worker has finished ProcKill(), so that the worker slot is available for a new worker. Before this commit, the autovacuum worker was responsible for that, which required a slightly complicated dance to pass the launcher's PID from FreeWorkerInfo() to ProcKill() in a global variable. Simplify that by moving the responsibility of the signaling to the postmaster. The postmaster was already doing it when it failed to fork a worker process, so it seems logical to make it responsible for notifying the launcher on worker exit too. That's also how the notification on background worker exit is done. Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: li carol <carol.li2025@outlook.com> Discussion: https://www.postgresql.org/message-id/a5e27d25-c7e7-45d5-9bac-a17c8f462def@iki.fi	2026-01-15 18:02:25 +02:00
Heikki Linnakangas	c4b71e6f60	Remove some unnecessary code from multixact truncation With 64-bit multixact offsets, PerformMembersTruncation() doesn't need the starting offset anymore. The 'oldestOffset' value that TruncateMultiXact() calculates is no longer used for anything. Remove it, and the code to calculate it. 'oldestOffset' was included in the WAL record as 'startTruncMemb', which sounds nice if you e.g. look at the WAL with pg_waldump, but it was also confusing because we didn't actually use the value for determining what to truncate. Replaying the WAL would remove all segments older than 'endTruncMemb', regardless of 'startTruncMemb'. The 'startTruncOff' stored in the WAL record was similarly unnecessary even before 64-bit multixid offsets, it was stored just for the sake of symmetry with 'startTruncMemb'. Remove both from the WAL record, and rename the remaining 'endTruncOff' to 'oldestMulti' and 'endTruncMemb' to 'oldestOffset', for consistency with the variable names used for them in other places. Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru> Discussion: https://www.postgresql.org/message-id/000301b2-5b81-4938-bdac-90f6eb660843@iki.fi	2026-01-15 13:34:50 +02:00
Michael Paquier	32e27bd320	Introduce routines to validate and free MVNDistinct and MVDependencies These routines are useful to perform some basic validation checks on each object structure, working currently on attribute numbers for non-expression and expression attnums. These checks could be extended in the future. Note that this code is not used yet in the tree, and that these functions will become handy for an upcoming patch for the import of extended statistics data. However, they are worth their own independent change as they are actually useful by themselves, with at least the extension code argument in mind (or perhaps I am just feeling more pedantic today). Extracted from a larger patch by the same author, with many adjustments and fixes by me. Author: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com	2026-01-15 09:36:05 +09:00
Peter Eisentraut	fa16e7fd84	Revert "Replace pg_restrict by standard restrict" This reverts commit `f0f2c0c1ae`. The original problem that led to the use of pg_restrict was that MSVC couldn't handle plain restrict, and defining it to something else would conflict with its __declspec(restrict) that is used in system header files. In C11 mode, this is no longer a problem, as MSVC handles plain restrict. This led to the commit to replace pg_restrict with restrict. But this did not take C++ into account. Standard C++ does not have restrict, so we defined it as something else (for example, MSVC supports __restrict). But this then again conflicts with __declspec(restrict) in system header files. So we have to revert this attempt. The comments are updated to clarify that the reason for this is now C++ only. Reported-by: Jelte Fennema-Nio <postgres@jeltef.nl> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/CAGECzQRoD7chJP1-dneSrhxUJv%2BBRcigoGOO4UwGzaShLot2Yw%40mail.gmail.com	2026-01-14 15:12:25 +01:00
Andres Freund	ff219c1987	bufmgr: Make definitions related to buffer descriptor easier to modify This is in preparation to widening the buffer state to 64 bits, which in turn is preparation for implementing content locks in bufmgr. This commit aims to make the subsequent commits a bit easier to review, by separating out reformatting etc from the actual changes. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/4csodkvvfbfloxxjlkgsnl2lgfv2mtzdl7phqzd4jxjadxm4o5@usw7feyb5bzf	2026-01-13 19:38:29 -05:00
Michael Paquier	e217dc7484	Fix query jumbling with GROUP BY clauses RangeTblEntry.groupexprs was marked with the node attribute query_jumble_ignore, causing a list of GROUP BY expressions to be ignored during the query jumbling. For example, these two queries could be grouped together within the same query ID: SELECT count() FROM t GROUP BY a; SELECT count() FROM t GROUP BY b; However, as such queries use different GROUP BY clauses, they should be split across multiple entries. This fixes an oversight in `247dea89f7`, that has introduced an RTE for GROUP BY clauses. Query IDs are documented as being stable across minor releases, but as this is a regression new to v18 and that we are still early in its support cycle, a backpatch is exceptionally done as this has broken a behavior that exists since query jumbling is supported in core, since its introduction in pg_stat_statements. The tests of pg_stat_statements are expanded to cover this area, with patterns involving GROUP BY and GROUPING clauses. Author: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/CACJufxEy2W+tCqC7XuJ94r3ivWsM=onKJp94kRFx3hoARjBeFQ@mail.gmail.com Backpatch-through: 18	2026-01-14 08:44:12 +09:00
Andres Freund	0b96e734c5	heapam: Add batch mode mvcc check and use it in page mode There are two reasons for doing so: 1) It is generally faster to perform checks in a batched fashion and making sequential scans faster is nice. 2) We would like to stop setting hint bits while pages are being written out. The necessary locking becomes visible for page mode scans, if done for every tuple. With batching, the overhead can be amortized to only happen once per page. There are substantial further optimization opportunities along these lines: - Right now HeapTupleSatisfiesMVCCBatch() simply uses the single-tuple HeapTupleSatisfiesMVCC(), relying on the compiler to inline it. We could instead write an explicitly optimized version that avoids repeated xid tests. - Introduce batched version of the serializability test - Introduce batched version of HeapTupleSatisfiesVacuum Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/6rgb2nvhyvnszz4ul3wfzlf5rheb2kkwrglthnna7qhe24onwr@vw27225tkyar	2026-01-12 13:22:04 -05:00
Álvaro Herrera	225d1df1d2	Stop including {brin,gin}_tuple.h in tuplesort.h Doing this meant that those two headers, which are supposed to be internal to their corresponding index AMs, were being included pretty much universally, because tuplesort.h is included by execnodes.h which is very widely used. Stop that, and fix fallout. We also change indexing.h to no longer include execnodes.h (tuptable.h is sufficient), and relscan.h to no longer include buf.h (pointless since `c2fe139c20`). Author: Mario González <gonzalemario@gmail.com> Discussion: https://postgr.es/m/CAFsReFUcBFup=Ohv_xd7SNQ=e73TXi8YNEkTsFEE2BW7jS1noQ@mail.gmail.com	2026-01-12 18:09:49 +01:00
Álvaro Herrera	2defd00062	Move instrumentation-related structs to instrument_node.h Some structs and enums related to parallel query instrumentation had organically grown scattered across various files, and were causing header pollution especially through execnodes.h. Create a single file where they can live together. This only moves the structs to the new file; cleaning up the pollution by removing no-longer-necessary cross-header inclusion will be done in future commits. Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de> Co-authored-by: Mario González <gonzalemario@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/202510051642.wwmn4mj77wch@alvherre.pgsql Discussion: https://postgr.es/m/CAFsReFUr4KrQ60z+ck9cRM4WuUw1TCghN7EFwvV0KvuncTRc2w@mail.gmail.com	2026-01-12 16:59:28 +01:00
Andres Freund	e5a5e0a907	instrumentation: Keep time fields as instrtime, convert in callers Previously the instrumentation logic always converted to seconds, only for many of the callers to do unnecessary division to get to milliseconds. As an upcoming refactoring will split the Instrumentation struct, utilize instrtime always to keep things simpler. It's also a bit faster to not have to first convert to a double in functions like InstrEndLoop(), InstrAggNode(). Author: Lukas Fittl <lukas@fittl.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAP53PkzZ3UotnRrrnXWAv=F4avRq9MQ8zU+bxoN9tpovEu6fGQ@mail.gmail.com	2026-01-09 13:38:00 -05:00
Heikki Linnakangas	bba81f9d3d	Inline ginCompareAttEntries for speed It is called in tight loops during GIN index build. Author: David Geier <geidav.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/5d366878-2007-4d31-861e-19294b7a583b@gmail.com	2026-01-09 20:31:43 +02:00
Peter Eisentraut	69d76fb2ab	Decouple C++ support in Meson's PGXS from LLVM enablement This is important for Postgres extensions that are written in C++, such as pg_duckdb, which uses PGXS as the build system currently. In the autotools build, C++ is not coupled to LLVM. If the autotools build is configured without --with-llvm, the C++ compiler and the various flags get persisted into the Makefile.global. Author: Tristan Partin <tristan@partin.io> Author: Jelte Fennema-Nio <postgres@jeltef.nl> Discussion: https://www.postgresql.org/message-id/flat/D98JHQF7H2A8.VSE3I4CJBTAB%40partin.io	2026-01-09 10:25:02 +01:00
Tom Lane	b352d3d80b	Mark GiST inet_ops as opcdefault, and deal with ensuing fallout. This patch completes the transition to making inet_ops be default for inet/cidr columns, rather than btree_gist's opclasses. Once we do that, though, pg_upgrade has a big problem. A dump from an older version will see btree_gist's opclasses as being default, so it will not mention the opclass explicitly in CREATE INDEX commands, which would cause the restore to create the indexes using inet_ops. Since that's not compatible with what's actually in the files, havoc would ensue. This isn't readily fixable, because the CREATE INDEX command strings are built by the older server's pg_get_indexdef() function; pg_dump hasn't nearly enough knowledge to modify those strings successfully. Even if we cared to put in the work to make that happen in pg_dump, it would be counterproductive because the end goal here is to get people off of these opclasses. Allowing such indexes to persist through pg_upgrade wouldn't advance that goal. Therefore, this patch just adds code to pg_upgrade to detect indexes that would be problematic and refuse to upgrade. There's another issue too: even without any indexes to worry about, pg_dump in binary-upgrade mode will reproduce the "CREATE OPERATOR CLASS ... DEFAULT" commands for btree_gist's opclasses, and those will fail because now we have a built-in opclass that provides a conflicting default. We could ask users to drop the btree_gist extension altogether before upgrading, but that would carry very severe penalties. It would affect perfectly-valid indexes for other data types, and it would drop operators that might be relied on in views or other database objects. Instead, put a hack in DefineOpClass to ignore the DEFAULT clauses for these opclasses when in binary-upgrade mode. This will result in installing a version of btree_gist that isn't quite the version it claims to be, but that can be fixed by issuing ALTER EXTENSION UPDATE afterwards. Since we don't apply that hack when not in binary-upgrade mode, it is now impossible to install any version of btree_gist less than 1.9 via CREATE EXTENSION. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/2483812.1754072263@sss.pgh.pa.us	2026-01-08 14:03:56 -05:00
Heikki Linnakangas	ad853bb877	Fix misc typos, mostly in comments The only user-visible change is the fix in the "malformed pg_dependencies" error detail. That one is new in commit `e1405aa5e3`, so no backpatching required.	2026-01-08 18:10:08 +02:00
Peter Eisentraut	5e7abdac99	strnlen() is now required Remove all configure checks and workarounds for strnlen() missing. It is required by POSIX 2008. Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/98ce805c-6103-421b-adc3-fcf8f3dddbe3%40eisentraut.org	2026-01-08 08:51:20 +01:00
Nathan Bossart	a516b3f00d	MSVC: Support building for AArch64. This commit does the following to get tests passing for MSVC/AArch64: * Implements spin_delay() with an ISB instruction (like we do for gcc/clang on AArch64). * Sets USE_ARMV8_CRC32C unconditionally. Vendor-supported versions of Windows for AArch64 require at least ARMv8.1, which is where CRC extension support became mandatory. * Implements S_UNLOCK() with _InterlockedExchange(). The existing implementation for MSVC uses _ReadWriteBarrier() (a compiler barrier), which is insufficient for this purpose on non-TSO architectures. There are likely other changes required to take full advantage of the hardware (e.g., atomics/arch-arm.h, simd.h, pg_popcount_aarch64.c), but those can be dealt with later. Author: Niyas Sait <niyas.sait@linaro.org> Co-authored-by: Greg Burd <greg@burd.me> Co-authored-by: Dave Cramer <davecramer@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Tested-by: Andrew Dunstan <andrew@dunslane.net> Discussion: https://postgr.es/m/A6152C7C-F5E3-4958-8F8E-7692D259FF2F%40greg.burd.me Discussion: https://postgr.es/m/CAFPTBD-74%2BAEuN9n7caJ0YUnW5A0r-KBX8rYoEJWqFPgLKpzdg%40mail.gmail.com	2026-01-07 13:42:57 -06:00
Michael Paquier	b139bd3b6e	Add data type oid8, 64-bit unsigned identifier This new identifier type provides support for 64-bit unsigned values, to be used in catalogs, like OIDs. An advantage of a new data type is that it becomes easier to grep for it in the code when assigning this type to a catalog attribute, linking it to dedicated APIs and internal structures. The following operators are added in this commit, with dedicated tests: - Casts with integer types and OID. - btree and hash operators - min/max functions. - C type with related macros and defines, named around "Oid8". This has been mentioned as useful on its own on the thread to add support for 64-bit TOAST values, so as it becomes possible to attach this data type to the TOAST code and catalog definitions. However, as this concept can apply to many more areas, it is implemented as its own independent change. This is based on a discussion with Andres Freund and Tom Lane. Bump catalog version. Author: Michael Paquier <michael@paquier.xyz> Reviewed-by: Greg Burd <greg@burd.me> Reviewed-by: Nikhil Kumar Veldanda <veldanda.nikhilkumar17@gmail.com> Discussion: https://postgr.es/m/1891064.1754681536@sss.pgh.pa.us	2026-01-07 11:37:00 +09:00
Jeff Davis	af2d4ca191	Clean up ICU includes. Remove ICU includes from pg_locale.h, and instead include them in the few C files that need ICU. Clean up a few other includes in passing. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/48911db71d953edec66df0d2ce303563d631fbe0.camel@j-davis.com	2026-01-06 17:19:51 -08:00
Jeff Davis	c4ff35f104	ICU: use UTF8-optimized case conversion API Initializes a UCaseMap object once for use across calls, and uses UTF8-optimized APIs. Author: Andreas Karlsson <andreas@proxel.se> Reviewed-by: zengman <zengman@halodbtech.com> Discussion: https://postgr.es/m/5a010b27-8ed9-4739-86fe-1562b07ba564@proxel.se	2026-01-06 14:09:07 -08:00
Michael Paquier	f1e251be80	Allow bgworkers to be terminated for database-related commands Background workers gain a new flag, called BGWORKER_INTERRUPTIBLE, that offers the possibility to terminate the workers when these are connected to a database that is involved in one of the following commands: ALTER DATABASE RENAME TO ALTER DATABASE SET TABLESPACE CREATE DATABASE DROP DATABASE This is useful to give background workers the same behavior as backends and autovacuum workers, which are stopped when these commands are executed. The default behavior, that exists since 9.3, is still to never terminate bgworkers connected to the database involved in any of these commands. The new flag has to be set to terminate the workers. A couple of tests are added to worker_spi to track the commands that impact the termination of the workers. There is a test case for a non-interruptible worker, additionally, that relies on an injection point to make the wait time in CountOtherDBBackends() reduced from 5s to 0.3s for faster test runs. The tests rely on the contents of the server logs to check if a worker has been started or terminated: - LOG generated by worker_spi_main() at startup, once connection to database is done. - FATAL in bgworker_die() when terminated. A couple of tests run in the CI have showed that this method is stable enough. The safe_psql() calls that scan pg_stat_activity could be replaced with some poll_query_until() for more stability, if the current method proves to be an issue in the buildfarm. Author: Aya Iwata <iwata.aya@fujitsu.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Ryo Matsumura <matsumura.ryo@fujitsu.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Discussion: https://postgr.es/m/OS7PR01MB11964335F36BE41021B62EAE8EAE4A@OS7PR01MB11964.jpnprd01.prod.outlook.com	2026-01-06 14:24:29 +09:00
David Rowley	c5af141cd4	Clarify where various catcache.h dlist_nodes are used Also remove a comment which mentions we don't currently divide the per-cache lists into hash buckets. Since `473182c95`, we do. Author: ChangAo Chen <cca5507@qq.com> Discussion: https://postgr.es/m/tencent_7732789707C8768EA13785A7B5EA29103208@qq.com	2026-01-06 14:39:36 +13:00
Tom Lane	7dc95cc3b9	Update to latest Snowball sources. It's been almost a year since we last did this, and upstream has been busy. They've added stemmers for Polish and Esperanto, and also deprecated their old Dutch stemmer in favor of the Kraaij-Pohlmann algorithm. (The "dutch" stemmer is now the latter, and "dutch_porter" is the old algorithm.) Upstream also decided to rename their internal header "header.h" to something less generic: "snowball_runtime.h". Seems like a good thing, but it complicates this patch a bit because we were relying on interposing our own version of "header.h" to control system header inclusion order. (We're partially failing at that now, because now the generated stemmer files include <stddef.h> before snowball_runtime.h. I think that'll be okay, but if the buildfarm complains then we'll have to do more-extensive editing of the generated files.) I realized that we weren't documenting the available stemmers in any user-visible place, except indirectly through sample \dFd output. That's incomplete because we only provide built-in dictionaries for the recommended stemmers for each language, not alternative stemmers such as dutch_porter. So I added a list to the documentation. I did not do anything with the stopword lists. If those are still available from snowballstem.org, they are mighty well hidden. Discussion: https://postgr.es/m/1185975.1767569534@sss.pgh.pa.us	2026-01-05 15:22:37 -05:00
Alexander Korotkov	7a39f43d88	Extend xlogwait infrastructure with write and flush wait types Add support for waiting on WAL write and flush LSNs in addition to the existing replay LSN wait type. This provides the foundation for extending the WAIT FOR command with MODE parameter. Key changes are following. - Add WAIT_LSN_TYPE_STANDBY_WRITE and WAIT_LSN_TYPE_STANDBY_FLUSH to WaitLSNType. - Add GetCurrentLSNForWaitType() to retrieve the current LSN for each wait type. - Add new wait events WAIT_EVENT_WAIT_FOR_WAL_WRITE and WAIT_EVENT_WAIT_FOR_WAL_FLUSH for pg_stat_activity visibility. - Update WaitForLSN() to use GetCurrentLSNForWaitType() internally. Discussion: https://postgr.es/m/CABPTF7UiArgW-sXj9CNwRzUhYOQrevLzkYcgBydmX5oDes1sjg%40mail.gmail.com Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@kurilemu.de>	2026-01-05 19:56:19 +02:00
Michael Paquier	7d854bdc5b	Remove unneeded defines from pg_config.h.in This commit removes HAVE_ATOMIC_H and HAVE_MBARRIER_H from pg_config.h.in, cleanup that could have been done in `25f36066dd`. Author: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/b2c0d0b7-3944-487d-a03d-d155851958ff@gmail.com	2026-01-05 09:27:19 +09:00
Michael Paquier	b8cfcb9e00	Fix typos and inconsistencies in code and comments This change is a cocktail of harmonization of function argument names, grammar typos, renames for better consistency and unused code (see ltree). All of these have been spotted by the author. Author: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/b2c0d0b7-3944-487d-a03d-d155851958ff@gmail.com	2026-01-05 09:19:15 +09:00
Tom Lane	ba75f71752	Include error location in errors from ComputeIndexAttrs(). Make use of IndexElem's new location field to localize these errors better. Author: jian he <jian.universality@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CACJufxH3OgXF1hrzGAaWyNtye2jHEmk9JbtrtGv-KJK6tsGo5w@mail.gmail.com	2026-01-04 14:16:20 -05:00
Tom Lane	62299bbd90	Add parse location to IndexElem. This patch mostly just fills in the field, although a few error reports in resolve_unique_index_expr() are adjusted to use it. The next commit will add more uses. catversion bump out of an abundance of caution: I'm not sure IndexElem can appear in stored rules, but I'm not sure it can't either. Author: Álvaro Herrera <alvherre@kurilemu.de> Co-authored-by: jian he <jian.universality@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CACJufxH3OgXF1hrzGAaWyNtye2jHEmk9JbtrtGv-KJK6tsGo5w@mail.gmail.com Discussion: https://postgr.es/m/202512121327.f2zimsr6guso@alvherre.pgsql	2026-01-04 14:16:20 -05:00
Peter Eisentraut	0eadf1767a	Remove bogus const qualifier on PageGetItem() argument The function ends up casting away the const qualifier, so it was a lie. No callers appear to rely on the const qualifier on the argument, so the simplest solution is to just remove it. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/beusplf77varvhip6ryuhd2fchsx26qmmhduqz432bnglq634b%402dx4k6yxj4cm	2026-01-04 16:00:15 +01:00
Bruce Momjian	451c43974f	Update copyright for 2026 Backpatch-through: 14	2026-01-01 13:24:10 -05:00
Andrew Dunstan	f3c9e341cd	Add paths of extensions to pg_available_extensions Add a new "location" column to the pg_available_extensions and pg_available_extension_versions views, exposing the directory where the extension is located. The default system location is shown as '$system', the same value that can be used to configure the extension_control_path GUC. User-defined locations are only visible for super users, otherwise '<insufficient privilege>' is returned as a column value, the same behaviour that we already use in pg_stat_activity. I failed to resist the temptation to do a little extra editorializing of the TAP test script. Catalog version bumped. Author: Matheus Alcantara <mths.dev@pm.me> Reviewed-By: Chao Li <li.evan.chao@gmail.com> Reviewed-By: Rohit Prasad <rohit.prasad@arm.com> Reviewed-By: Michael Banck <mbanck@gmx.net> Reviewed-By: Manni Wood <manni.wood@enterprisedb.com> Reviewed-By: Euler Taveira <euler@eulerto.com> Reviewed-By: Quan Zongliang <quanzongliang@yeah.net>	2026-01-01 12:13:59 -05:00
Tom Lane	bc6374cd76	Change IndexAmRoutines to be statically-allocated structs. Up to now, index amhandlers were expected to produce a new, palloc'd struct on each call. That requires palloc/pfree overhead, and creates a risk of memory leaks if the caller fails to pfree, and the time taken to fill such a large structure isn't nil. Moreover, we were storing these things in the relcache, eating several hundred bytes for each cached index. There is not anything in these structs that needs to vary at runtime, so let's change the definition so that an amhandler can return a pointer to a "static const" struct of which there's only one copy per index AM. Mark all the core code's IndexAmRoutine pointers const so that we catch anyplace that might still try to change or pfree one. (This is similar to the way we were already handling TableAmRoutine structs. This commit does fix one comment that was infelicitously copied-and-pasted into tableamapi.c.) This commit needs to be called out in the v19 release notes as an API change for extension index AMs. An un-updated AM will still work (as of now, anyway) but it risks memory leaks and will be slower than necessary. Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAEoWx2=vApYk2LRu8R0DdahsPNEhWUxGBZ=rbZo1EXE=uA+opQ@mail.gmail.com	2025-12-30 18:26:23 -05:00
Thomas Munro	1a28b4b455	jit: Drop redundant LLVM configure probes. We currently require LLVM 14, so these probes for LLVM 9 functions always succeeded. Even when the features aren't enabled in an LLVM build, dummy functions are defined (a problem for a later commit). The whole PGAC_CHECK_LLVM_FUNCTIONS macro and Meson equivalent are removed, because we switched to testing LLVM_VERSION_MAJOR at compile time in subsequent work and these were the last holdouts. That suits the nature of LLVM API evolution better, and also allows for strictly mechanical pruning in future commits like `820b5af7` and `972c2cd2`. They advanced the minimum LLVM version but failed to spot these. Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKGJgB6gvrdDohgwLfCwzVQm%3DVMtb9m0vzQn%3DCwWn-kwG9w%40mail.gmail.com	2025-12-30 20:24:42 +13:00
Michael Paquier	97b101776c	Add pg_get_multixact_stats() This new function exposes at SQL level some information related to multixacts, not available until now. This data is useful for monitoring purposes, especially for workloads that make a heavy use of multixacts: - num_mxids, number of MultiXact IDs in use. - num_members, number of member entries in use. - members_size, bytes used by num_members in pg_multixact/members/. - oldest_multixact: oldest MultiXact still needed. This patch has been originally proposed when MultiXactOffset was still 32 bits, to monitor wraparound. This part is not relevant anymore since `bd8d9c9bdf` that has widen MultiXactOffset to 64 bits. The monitoring of disk space usage for the members is still relevant. Some tests are added to check this function, in the shape of one isolation test with concurrent transactions that take a ROW SHARE lock, and some SQL tests for pg_read_all_stats. Some documentation is added to explain some patterns that can come from the information provided by the function. Bump catalog version. Author: Naga Appani <nagnrik@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com> Discussion: https://postgr.es/m/CA+QeY+AAsYK6WvBW4qYzHz4bahHycDAY_q5ECmHkEV_eB9ckzg@mail.gmail.com	2025-12-30 15:38:50 +09:00
Michael Paquier	0e3ad4b96a	Add MultiXactOffsetStorageSize() to multixact_internal.h This function calculates in bytes the storage taken between two multixact offsets. This will be used in an upcoming patch, introduced separately here as this piece can be useful on its own. Author: Naga Appani <nagnrik@gmail.com> Co-authored-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/aUyTvZMq2CLgNEB4@paquier.xyz	2025-12-30 14:13:40 +09:00
Michael Paquier	9cf746a453	Change GetMultiXactInfo() to return the next multixact offset This routine returned a number of members as a MultiXactOffset, calculated based on the difference between the next-to-be-assigned offset and the oldest offset. However, this number is not actually an offset but a number. This type confusion comes from the original implementation of MultiXactMemberFreezeThreshold(), in `53bb309d2d`. The number of members is now defined as a uint64, large enough for MultiXactOffset. This change will be used in a follow-up patch. Reviewed-by: Naga Appani <nagnrik@gmail.com> Discussion: https://postgr.es/m/aUyTvZMq2CLgNEB4@paquier.xyz	2025-12-30 14:03:49 +09:00
Richard Guo	ad66f705fa	Strip PlaceHolderVars from index operands When pulling up a subquery, we may need to wrap its targetlist items in PlaceHolderVars to enforce separate identity or as a result of outer joins. However, this causes any upper-level WHERE clauses referencing these outputs to contain PlaceHolderVars, which prevents indxpath.c from recognizing that they could be matched to index columns or index expressions, potentially affecting the planner's ability to use indexes. To fix, explicitly strip PlaceHolderVars from index operands. A PlaceHolderVar appearing in a relation-scan-level expression is effectively a no-op. Nevertheless, to play it safe, we strip only PlaceHolderVars that are not marked nullable. The stripping is performed recursively to handle cases where PlaceHolderVars are nested or interleaved with other node types. To minimize performance impact, we first use a lightweight walker to check for the presence of strippable PlaceHolderVars. The expensive mutator is invoked only if a candidate is found, avoiding unnecessary memory allocation and tree copying in the common case where no PlaceHolderVars are present. Back-patch to v18. Although this issue exists before that, changes in this version made it common enough to notice. Given the lack of field reports for older versions, I am not back-patching further. Reported-by: Haowu Ge <gehaowu@bitmoe.com> Author: Richard Guo <guofenglinux@gmail.com> Discussion: https://postgr.es/m/62af586c-c270-44f3-9c5e-02c81d537e3d.gehaowu@bitmoe.com Backpatch-through: 18	2025-12-29 11:38:49 +09:00
Peter Eisentraut	b7057e4346	Change some Datum to void * for opaque pass-through pointer Here, Datum was used to pass around an opaque pointer between a group of functions. But one might as well use void * for that; the use of Datum doesn't achieve anything here and is just distracting. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/1c5d23cb-288b-4154-b1cd-191fe2301707%40eisentraut.org	2025-12-28 14:34:12 +01:00
Michael Paquier	9adf32da6b	Split some long Makefile lists This change makes more readable code diffs when adding new items or removing old items, while ensuring that lines do not get excessively long. Some SUBDIRS, PROGRAMS and REGRESS lists are split. Note that there are a few more REGRESS lists that could be split, particularly in contrib/. Author: Jelte Fennema-Nio <postgres@jeltef.nl> Co-Authored-By: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Japin Li <japinli@hotmail.com> Reviewed-by: Man Zeng <zengman@halodbtech.com> Discussion: https://postgr.es/m/DF6HDGB559U5.3MPRFCWPONEAE@jeltef.nl	2025-12-28 09:17:42 +09:00
Peter Eisentraut	b63443718a	Remove MsgType type Presumably, the C type MsgType was meant to hold the protocol message type in the pre-version-3 era, but this was never fully developed even then, and the name is pretty confusing nowadays. It has only one vestigial use for cancel requests that we can get rid of. Since a cancel request is indicated by a special protocol version number, we can use the ProtocolVersion type, which MsgType was based on. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/505e76cb-0ca2-4e22-ba0f-772b5dc3f230%40eisentraut.org	2025-12-27 23:46:28 +01:00
Michael Paquier	213a1b8952	Move attribute statistics functions to stat_utils.c Many of the operations done for attribute stats in attribute_stats.c share the same logic as extended stats, as done by a patch under discussion to add support for extended stats import and export. All the pieces necessary for extended statistics are moved to stats_utils.c, which is the file where common facilities are shared for stats files. The following renames are done: * get_attr_stat_type() -> statatt_get_type() * init_empty_stats_tuple() -> statatt_init_empty_tuple() * set_stats_slot() -> statatt_set_slot() * get_elem_stat_type() -> statatt_get_elem_type() While on it, this commit adds more documentation for all these functions, describing more their internals and the dependencies that have been implied for attribute statistics. The same concepts apply to extended statistics, at some degree. Author: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Yu Wang <wangyu_runtime@163.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com	2025-12-25 15:13:39 +09:00
Richard Guo	325808cac9	Fix planner error with SRFs and grouping sets If there are any SRFs in a PathTarget, we must separate it into SRF-computing and SRF-free targets. This is because the executor can only handle SRFs that appear at the top level of the targetlist of a ProjectSet plan node. If we find a subexpression that matches an expression already computed in the previous plan level, we should treat it like a Var and should not split it again. setrefs.c will later replace the expression with a Var referencing the subplan output. However, when processing the grouping target for grouping sets, the planner can fail to recognize that an expression is already computed in the scan/join phase. The root cause is a mismatch in the nullingrels bits. Expressions in the grouping target carry the grouping nulling bit in their nullingrels to indicate that they can be nulled by the grouping step. However, the corresponding expressions in the scan/join target do not have these bits. As a result, the exact match check in list_member() fails, leading the planner to incorrectly believe that the expression needs to be re-evaluated from its arguments, which are often not available in the subplan. This can lead to planner errors such as "variable not found in subplan target list". To fix, ignore the grouping nulling bit when checking whether an expression from the grouping target is available in the pre-grouping input target. This aligns with the matching logic in setrefs.c. Backpatch to v18, where this issue was introduced. Bug: #19353 Reported-by: Marian MULLER REBEYROL <marian.muller@serli.com> Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Discussion: https://postgr.es/m/19353-aaa179bba986a19b@postgresql.org Backpatch-through: 18	2025-12-25 12:12:52 +09:00
Masahiko Sawada	67c20979ce	Toggle logical decoding dynamically based on logical slot presence. Previously logical decoding required wal_level to be set to 'logical' at server start. This meant that users had to incur the overhead of logical-level WAL logging even when no logical replication slots were in use. This commit adds functionality to automatically control logical decoding availability based on logical replication slot presence. The newly introduced module logicalctl.c allows logical decoding to be dynamically activated when needed when wal_level is set to 'replica'. When the first logical replication slot is created, the system automatically increases the effective WAL level to maintain logical-level WAL records. Conversely, after the last logical slot is dropped or invalidated, it decreases back to 'replica' WAL level. While activation occurs synchronously right after creating the first logical slot, deactivation happens asynchronously through the checkpointer process. This design avoids a race condition at the end of recovery; a concurrent deactivation could happen while the startup process enables logical decoding at the end of recovery, but WAL writes are still not permitted until recovery fully completes. The checkpointer will handle it after recovery is done. Asynchronous deactivation also avoids excessive toggling of the logical decoding status in workloads that repeatedly create and drop a single logical slot. On the other hand, this lazy approach can delay changes to effective_wal_level and the disabling logical decoding, especially when the checkpointer is busy with other tasks. We chose this lazy approach in all deactivation paths to keep the implementation simple, even though laziness is strictly required only for end-of-recovery cases. Future work might address this limitation either by using a dedicated worker instead of the checkpointer, or by implementing synchronous waiting during slot drops if workloads are significantly affected by the lazy deactivation of logical decoding. The effective WAL level, determined internally by XLogLogicalInfo, is allowed to change within a transaction until an XID is assigned. Once an XID is assigned, the value becomes fixed for the remainder of the transaction. This behavior ensures that the logging mode remains consistent within a writing transaction, similar to the behavior of GUC parameters. A new read-only GUC parameter effective_wal_level is introduced to monitor the actual WAL level in effect. This parameter reflects the current operational WAL level, which may differ from the configured wal_level setting. Bump PG_CONTROL_VERSION as it adds a new field to CheckPoint struct. Reviewed-by: Shveta Malik <shveta.malik@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://postgr.es/m/CAD21AoCVLeLYq09pQPaWs+Jwdni5FuJ8v2jgq-u9_uFbcp6UbA@mail.gmail.com	2025-12-23 10:13:16 -08:00
Michael Paquier	e5f3839af6	Switch buffile.c/h to use pgoff_t instead of off_t off_t was previously used for offsets, which is 4 bytes on Windows, hence limiting the backend code to a hard limit for files longer than 2GB. This leads to some simplification in these files, removing some casts based on long, also 4 bytes on Windows. This commit removes one comment introduced in `db3c4c3a2d`, not relevant anymore as pgoff_t is a safe 8-byte alternative on Windows. This change is surprisingly not invasive, as the callers of BufFileTell(), BufFileSeek() and BufFileTruncateFileSet() (worker.c, tuplestore.c, etc.) track offsets in local structures that just to switch from off_t to pgoff_t for the most part. The file is still relying on a maximum file size of MAX_PHYSICAL_FILESIZE (1GB). This change allows the code to make this maximum potentially larger in the future, or larger on a per-demand basis. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/aUStrqoOCDRFAq1M@paquier.xyz	2025-12-23 07:41:34 +09:00
Heikki Linnakangas	47a9f61fca	Use proper type for RestoreTransactionSnapshot's PGPROC arg Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/08cbaeb5-aaaf-47b6-9ed8-4f7455b0bc4b@iki.fi	2025-12-19 13:40:02 +02:00
Michael Paquier	167cb26718	Fix const correctness in pgstat data serialization callbacks `4ba012a8ed` defined the "header" (pointer to the stats data) of from_serialized_data() as a const, even though it is fine (and expected!) for the callback to modify the shared memory entry when loading the stats at startup. While on it, this commit updates the callback to_serialized_data() in the test module test_custom_stats to make the data extracted from the "header" parameter a const since it should never be modified: the stats are written to disk and no modifications are expected in the shared memory entry. This clarifies the API contract of these new callbacks. Reported-By: Peter Eisentraut <peter@eisentraut.org> Author: Michael Paquier <michael@paquier.xyz> Co-authored-by: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/d87a93b0-19c7-4db6-b9c0-d6827e7b2da1@eisentraut.org	2025-12-18 07:33:40 +09:00
Michael Paquier	f4e797171e	Change pgstat_report_vacuum() to use Relation This change makes pgstat_report_vacuum() more consistent with pgstat_report_analyze(), that also uses a Relation. This enforces a policy that callers of this routine should open and lock the relation whose statistics are updated before calling this routine. We will unlikely have a lot of callers of this routine in the tree, but it seems like a good idea to imply this requirement in the long run. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Suggested-by: Andres Freund <andres@anarazel.de> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/aUEA6UZZkDCQFgSA@ip-10-97-1-34.eu-west-3.compute.internal	2025-12-17 11:26:17 +09:00
Jeff Davis	0a90df58cf	Avoid global LC_CTYPE dependency in pg_locale_icu.c. ICU still depends on libc for compatibility with certain historical behavior for single-byte encodings. Make the dependency explicit by holding a locale_t object when required. We should consider a better solution in the future, such as decoding the text to UTF-32 and using u_tolower(). That would be a behavior change and require additional infrastructure though; so for now, just avoid the global LC_CTYPE dependency. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com	2025-12-16 15:32:57 -08:00
Jeff Davis	87b2968df0	downcase_identifier(): use method table from locale provider. Previously, libc's tolower() was always used for lowercasing identifiers, regardless of the database locale (though only characters beyond 127 in single-byte encodings were affected). Refactor to allow each provider to supply its own implementation of identifier downcasing. For historical compatibility, when using a single-byte encoding, ICU still relies on tolower(). One minor behavior change is that, before the database default locale is initialized, it uses ASCII semantics to downcase the identifiers. Previously, it would use the postmaster's LC_CTYPE setting from the environment. While that could have some effect during GUC processing, for example, it would have been fragile to rely on the environment setting anyway. (Also, it only matters when the encoding is single-byte.) Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com	2025-12-16 15:32:41 -08:00
Jeff Davis	24bf379cb1	Clarify a #define introduced in `8d299052fe`. The value is the same, but use the right symbol for clarity.	2025-12-16 12:48:53 -08:00
Nathan Bossart	48d4a1423d	Allow passing a pointer to GetNamedDSMSegment()'s init callback. This commit adds a new "void *arg" parameter to GetNamedDSMSegment() that is passed to the initialization callback function. This is useful for reusing an initialization callback function for multiple DSM segments. Author: Zsolt Parragi <zsolt.parragi@percona.com> Reviewed-by: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/CAN4CZFMjh8TrT9ZhWgjVTzBDkYZi2a84BnZ8bM%2BfLPuq7Cirzg%40mail.gmail.com	2025-12-15 14:27:16 -06:00
Noah Misch	64bf53dd61	Revisit cosmetics of "For inplace update, send nontransactional invalidations." This removes a never-used CacheInvalidateHeapTupleInplace() parameter. It adds README content about inplace update visibility in logical decoding. It rewrites other comments. Back-patch to v18, where commit `243e9b40f1` first appeared. Since this removes a CacheInvalidateHeapTupleInplace() parameter, expect a v18 ".abi-compliance-history" edit to follow. PGXN contains no calls to that function. Reported-by: Paul A Jungwirth <pj@illuminatedcomputing.com> Reported-by: Ilyasov Ian <ianilyasov@outlook.com> Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Surya Poondla <s_poondla@apple.com> Discussion: https://postgr.es/m/CA+renyU+LGLvCqS0=fHit-N1J-2=2_mPK97AQxvcfKm+F-DxJA@mail.gmail.com Backpatch-through: 18	2025-12-15 12:19:49 -08:00
Jeff Davis	95a19fefdc	Remove incorrect declarations in pg_wchar.h. Oversight in commit `9acae56ce0`. Discussion: https://postgr.es/m/541F240E-94AD-4D65-9794-7D6C316BC3FF@gmail.com	2025-12-15 10:38:55 -08:00
Jeff Davis	54c41a6deb	Remove unused single-byte char_is_cased() API. https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com	2025-12-15 10:24:57 -08:00
Peter Eisentraut	17f446784d	Refactor static_assert() support. HAVE__STATIC_ASSERT was really a test for GCC statement expressions, as needed for StaticAssertExpr() now that _Static_assert could be assumed to be available through our C11 requirement. This artificially prevented Visual Studio from being able to use static_assert() in other contexts. Instead, make a new test for HAVE_STATEMENT_EXPRESSIONS, and use that to control only whether StaticAssertExpr() uses fallback code, not the other variants. This improves the quality of failure messages in the (much more common) other variants under Visual Studio. Also get rid of the two separate implementations for C++, since the C implementation is also also valid as C++11. While it is a stretch to apply HAVE_STATEMENT_EXPRESSIONS tested with $CC to a C++ compiler, the previous C++ coding assumed that the C++ compiler had them unconditionally, so it isn't a new stretch. In practice, the C and C++ compilers are very likely to agree, and if a combination is ever reported that falsifies this assumption we can always reconsider that. Author: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGKvr0x_oGmQTUkx%3DODgSksT2EtgCA6LmGx_jQFG%3DsDUpg%40mail.gmail.com	2025-12-15 11:54:23 +01:00
Michael Paquier	4ba012a8ed	Allow cumulative statistics to read/write auxiliary data from/to disk Cumulative stats kinds gain the capability to write additional per-entry data when flushing the stats at shutdown, and read this data when loading back the stats at startup. This can be fit for example in the case of variable-length data (like normalized query strings), so as it becomes possible to link the shared memory stats entries to data that is stored in a different area, like a DSA segment. Three new optional callbacks are added to PgStat_KindInfo, available to variable-numbered stats kinds: * to_serialized_data: writes auxiliary data for an entry. * from_serialized_data: reads auxiliary data for an entry. * finish: performs actions after read/write/discard operations. This is invoked after processing all the entries of a kind, allowing extensions to close file handles and clean up resources. Stats kinds have the option to store this data in the existing pgstats file, but can as well store it in one or more additional files whose names can be built upon the entry keys. The new serialized callbacks are called once an entry key is read or written from the main stats file. A file descriptor to the main pgstats file is available in the arguments of the callbacks. Author: Sami Imseih <samimseih@gmail.com> Co-authored-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAA5RZ0s9SDOu+Z6veoJCHWk+kDeTktAtC-KY9fQ9Z6BJdDUirQ@mail.gmail.com	2025-12-15 09:40:56 +09:00
Tom Lane	58dad7f349	Update typedefs.list to match what the buildfarm currently reports. The current list from the buildfarm includes quite a few typedef names that it used to miss. The reason is a bit obscure, but it seems likely to have something to do with our recent increased use of palloc_object and palloc_array. In any case, this makes the relevant struct declarations be much more nicely formatted, so I'll take it. Install the current list and re-run pgindent to update affected code. Syncing with the current list also removes some obsolete typedef names and fixes some alphabetization errors. Discussion: https://postgr.es/m/1681301.1765742268@sss.pgh.pa.us	2025-12-14 17:03:53 -05:00
Tom Lane	66b2282b0c	Make "pgoff_t" be a typedef not a #define. There doesn't seem to be any great reason why this has been a macro rather than a typedef. But doing it like that means our buildfarm typedef tooling doesn't capture the name as a typedef. That would result in pgindent glitches, except that we've seemingly kept it in typedefs.list manually. That's obviously error-prone, so let's convert it to a typedef now. Discussion: https://postgr.es/m/1681301.1765742268@sss.pgh.pa.us	2025-12-14 16:53:34 -05:00
Alexander Korotkov	b27e48213f	Refactor WaitLSNType enum to use a macro for type count Change WAIT_LSN_TYPE_COUNT from an enum sentinel to a macro definition, in a similar way to IOObject, IOContext, and BackendType enums. Remove explicit enum value assignments well. Author: Xuneng Zhou <xunengzhou@gmail.com>	2025-12-14 17:18:32 +02:00
Alexander Korotkov	4b3d173629	Implement ALTER TABLE ... SPLIT PARTITION ... command This new DDL command splits a single partition into several partitions. Just like the ALTER TABLE ... MERGE PARTITIONS ... command, new partitions are created using the createPartitionTable() function with the parent partition as the template. This commit comprises a quite naive implementation which works in a single process and holds the ACCESS EXCLUSIVE LOCK on the parent table during all the operations, including the tuple routing. This is why the new DDL command can't be recommended for large, partitioned tables under high load. However, this implementation comes in handy in certain cases, even as it is. Also, it could serve as a foundation for future implementations with less locking and possibly parallelism. Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru Author: Dmitry Koval <d.koval@postgrespro.ru> Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com> Co-authored-by: Tender Wang <tndrwang@gmail.com> Co-authored-by: Richard Guo <guofenglinux@gmail.com> Co-authored-by: Dagfinn Ilmari Mannsaker <ilmari@ilmari.org> Co-authored-by: Fujii Masao <masao.fujii@gmail.com> Co-authored-by: Jian He <jian.universality@gmail.com> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Robert Haas <rhaas@postgresql.org> Reviewed-by: Stephane Tachoires <stephane.tachoires@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Daniel Gustafsson <dgustafsson@postgresql.org> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Noah Misch <noah@leadboat.com>	2025-12-14 13:29:38 +02:00
Alexander Korotkov	f2e4cc4279	Implement ALTER TABLE ... MERGE PARTITIONS ... command This new DDL command merges several partitions into a single partition of the target table. The target partition is created using the new createPartitionTable() function with the parent partition as the template. This commit comprises a quite naive implementation which works in a single process and holds the ACCESS EXCLUSIVE LOCK on the parent table during all the operations, including the tuple routing. This is why this new DDL command can't be recommended for large partitioned tables under a high load. However, this implementation comes in handy in certain cases, even as it is. Also, it could serve as a foundation for future implementations with less locking and possibly parallelism. Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru Author: Dmitry Koval <d.koval@postgrespro.ru> Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com> Co-authored-by: Tender Wang <tndrwang@gmail.com> Co-authored-by: Richard Guo <guofenglinux@gmail.com> Co-authored-by: Dagfinn Ilmari Mannsaker <ilmari@ilmari.org> Co-authored-by: Fujii Masao <masao.fujii@gmail.com> Co-authored-by: Jian He <jian.universality@gmail.com> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Robert Haas <rhaas@postgresql.org> Reviewed-by: Stephane Tachoires <stephane.tachoires@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Daniel Gustafsson <dgustafsson@postgresql.org> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Noah Misch <noah@leadboat.com>	2025-12-14 13:29:17 +02:00
Peter Eisentraut	315342ffed	Use correct preprocessor conditional in relptr.h When relptr.h was added (commit `fbc1c12a94`), there was no check for HAVE_TYPEOF, so it used HAVE__BUILTIN_TYPES_COMPATIBLE_P, which already existed (commit `ea473fb2de`) and which was thought to cover approximately the same compilers. But the guarded code can also work without HAVE__BUILTIN_TYPES_COMPATIBLE_P, and we now have a check for HAVE_TYPEOF (commit `4cb824699e`), so let's fix this up to use the correct logic. Co-authored-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://www.postgresql.org/message-id/CA%2BhUKGL7trhWiJ4qxpksBztMMTWDyPnP1QN%2BLq341V7QL775DA%40mail.gmail.com	2025-12-13 19:56:09 +01:00
Peter Eisentraut	493eb0da31	Replace most StaticAssertStmt() with StaticAssertDecl() Similar to commit `75f49221c2`, it is preferable to use StaticAssertDecl() instead of StaticAssertStmt() when possible. Discussion: https://www.postgresql.org/message-id/flat/CA%2BhUKGKvr0x_oGmQTUkx%3DODgSksT2EtgCA6LmGx_jQFG%3DsDUpg%40mail.gmail.com	2025-12-12 10:06:40 +01:00
Nathan Bossart	b4cbc106a6	Fix some comments. Like commit `123661427b`, these were discovered while reviewing Aleksander Alekseev's proposed changes to pgindent.	2025-12-11 15:13:04 -06:00
Peter Eisentraut	795e94c70c	Make <assert.h> consistently available in frontend and backend Previously, c.h made <assert.h> only available in frontends (#ifdef FRONTEND), which was probably reasonable, because the only thing it would give you is assert(), which you generally shouldn't use in the backend. But with C11, <assert.h> also makes available static_assert(), which would be useful everywhere. So this patch moves <assert.h> to the commonly available header files in c.h and fixes a small complication in regcustom.h that resulted from that. Co-authored-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA%2BhUKGKvr0x_oGmQTUkx%3DODgSksT2EtgCA6LmGx_jQFG%3DsDUpg%40mail.gmail.com	2025-12-11 09:56:57 +01:00
Tom Lane	0909380e4c	Allow PG_PRINTF_ATTRIBUTE to be different in C and C++ code. Although clang claims to be compatible with gcc's printf format archetypes, this appears to be a falsehood: it likes __syslog__ (which gcc does not, on most platforms) and doesn't accept gnu_printf. This means that if you try to use gcc with clang++ or clang with g++, you get compiler warnings when compiling printf-like calls in our C++ code. This has been true for quite awhile, but it's gotten more annoying with the recent appearance of several buildfarm members that are configured like this. To fix, run separate probes for the format archetype to use with the C and C++ compilers, and conditionally define PG_PRINTF_ATTRIBUTE depending on __cplusplus. (We could alternatively insist that you not mix-and-match C and C++ compilers; but if the case works otherwise, this is a poor reason to insist on that.) No back-patch for now, but we may want to do that if this patch survives buildfarm testing. Discussion: https://postgr.es/m/986485.1764825548@sss.pgh.pa.us	2025-12-10 17:09:10 -05:00
Jeff Davis	630706ced0	Add pg_iswcased(). True if character has multiple case forms. Will be a useful multibyte-aware replacement for char_is_cased(). Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com	2025-12-10 11:56:11 -08:00
Jeff Davis	1e493158d3	Remove char_tolower() API. It's only useful for an ILIKE optimization for the libc provider using a single-byte encoding and a non-C locale, but it creates significant internal complexity. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com	2025-12-10 11:55:59 -08:00
Thomas Munro	c507ba55f5	Fix O_CLOEXEC flag handling in Windows port. PostgreSQL's src/port/open.c has always set bInheritHandle = TRUE when opening files on Windows, making all file descriptors inheritable by child processes. This meant the O_CLOEXEC flag, added to many call sites by commit `1da569ca1f` (v16), was silently ignored. The original commit included a comment suggesting that our open() replacement doesn't create inheritable handles, but it was a mis- understanding of the code path. In practice, the code was creating inheritable handles in all cases. This hasn't caused widespread problems because most child processes (archive_command, COPY PROGRAM, etc.) operate on file paths passed as arguments rather than inherited file descriptors. Even if a child wanted to use an inherited handle, it would need to learn the numeric handle value, which isn't passed through our IPC mechanisms. Nonetheless, the current behavior is wrong. It violates documented O_CLOEXEC semantics, contradicts our own code comments, and makes PostgreSQL behave differently on Windows than on Unix. It also creates potential issues with future code or security auditing tools. To fix, define O_CLOEXEC to _O_NOINHERIT in master, previously used by O_DSYNC. We use different values in the back branches to preserve existing values. In pgwin32_open_handle() we set bInheritHandle according to whether O_CLOEXEC is specified, for the same atomic semantics as POSIX in multi-threaded programs that create processes. Backpatch-through: 16 Author: Bryan Green <dbryan.green@gmail.com> Co-authored-by: Thomas Munro <thomas.munro@gmail.com> (minor adjustments) Discussion: https://postgr.es/m/e2b16375-7430-4053-bda3-5d2194ff1880%40gmail.com	2025-12-10 09:01:35 +13:00
Nathan Bossart	750816971b	Add ParallelSlotSetIdle(). This commit refactors the code for marking a ParallelSlot as idle to a new static inline function. This can be used to mark a slot that was obtained via ParallelSlotGetIdle() but that we don't intend to actually use for a query as idle again. This is preparatory work for a follow-up commit that will add a --dry-run option to vacuumdb. Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com Discussion: https://postgr.es/m/CADkLM%3DckHkX7Of5SrK7g0LokPUwJ%3Dkk8JU1GXGF5pZ1eBVr0%3DQ%40mail.gmail.com	2025-12-09 13:34:22 -06:00
Masahiko Sawada	ab40db3852	Add started_by column to pg_stat_progress_analyze view. The new column, started_by, indicates the initiator of the analyze ('manual' or 'autovacuum'), helping users and monitoring tools to better understand ANALYZE behavior. Bump catalog version. Author: Shinya Kato <shinya11.kato@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: Yu Wang <wangyu_runtime@163.com> Discussion: https://postgr.es/m/CAA5RZ0suoicwxFeK_eDkUrzF7s0BVTaE7M%2BehCpYcCk5wiECpw%40mail.gmail.com	2025-12-09 11:23:45 -08:00
Masahiko Sawada	0d78952061	Add mode and started_by columns to pg_stat_progress_vacuum view. The new columns, mode and started_by, indicate the vacuum mode ('normal', 'aggressive', or 'failsafe') and the initiator of the vacuum ('manual', 'autovacuum', or 'autovacuum_wraparound'), respectively. This allows users and monitoring tools to better understand VACUUM behavior. Bump catalog version. Author: Shinya Kato <shinya11.kato@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Robert Treat <rob@xzilla.net> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Yu Wang <wangyu_runtime@163.com> Discussion: https://postgr.es/m/CAOzEurQcOY-OBL_ouEVfEaFqe_md3vB5pXjR_m6L71Dcp1JKCQ@mail.gmail.com	2025-12-09 10:51:14 -08:00
Tom Lane	f8715ec866	Support "j" length modifier in snprintf.c. POSIX has for a long time defined the "j" length modifier for printf conversions as meaning the size of intmax_t or uintmax_t. We got away without supporting that so far, because we were not using intmax_t anywhere. However, commit `e6be84356` re-introduced upstream's use of intmax_t and PRIdMAX into zic.c. It emerges that on some platforms (at least FreeBSD and macOS), <inttypes.h> defines PRIdMAX as "jd", so that snprintf.c falls over if that is used. (We hadn't noticed yet because it would only be apparent if bad data is fed to zic, resulting in an error report, and even then the only visible symptom is a missing line number in the error message.) We could revert that decision from our copy of zic.c, but on the whole it seems better to update snprintf.c to support this standard modifier. There might well be extensions, now or in future, that expect it to work. I did this in the lazy man's way of translating "j" to either "l" or "ll" depending on a compile-time sizeof() check, just as was done long ago to support "z" for size_t. One could imagine promoting intmax_t to have full support in snprintf.c, for example converting fmtint()'s value argument and internal arithmetic to use [u]intmax_t not [unsigned] long long. But that'd be more work and I'm hesitant to do it anyway: if there are any platforms out there where intmax_t is actually wider than "long long", this would doubtless result in a noticeable speed penalty to snprintf(). Let's not go there until we have positive evidence that there's a reason to, and some way to measure what size of penalty we're taking. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/3210703.1765236740@sss.pgh.pa.us	2025-12-09 11:43:25 -05:00
Heikki Linnakangas	bd8d9c9bdf	Widen MultiXactOffset to 64 bits This eliminates MultiXactOffset wraparound and the 2^32 limit on the total number of multixid members. Multixids are still limited to 2^31, but this is a nice improvement because 'members' can grow much faster than the number of multixids. On such systems, you can now run longer before hitting hard limits or triggering anti-wraparound vacuums. Not having to deal with MultiXactOffset wraparound also simplifies the code and removes some gnarly corner cases. We no longer need to perform emergency anti-wraparound freezing because of running out of 'members' space, so the offset stop limit is gone. But you might still not want 'members' to consume huge amounts of disk space. For that reason, I kept the logic for lowering vacuum's multixid freezing cutoff if a large amount of 'members' space is used. The thresholds for that are roughly the same as the "safe" and "danger" thresholds used before, 2 billion transactions and 4 billion transactions. This keeps the behavior for the freeze cutoff roughly the same as before. It might make sense to make this smarter or configurable, now that the threshold is only needed to manage disk usage, but that's left for the future. Add code to pg_upgrade to convert multitransactions from the old to the new format, rewriting the pg_multixact SLRU files. Because pg_upgrade now rewrites the files, we can get rid of some hacks we had put in place to deal with old bugs and upgraded clusters. Bump catalog version for the pg_multixact/offsets format change. Author: Maxim Orlov <orlovmg@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com> Discussion: https://www.postgresql.org/message-id/CACG%3DezaWg7_nt-8ey4aKv2w9LcuLthHknwCawmBgEeTnJrJTcw@mail.gmail.com	2025-12-09 13:53:03 +02:00
Heikki Linnakangas	bb3b1c4f64	Move pg_multixact SLRU page format definitions to a separate header This makes them accessible from pg_upgrade, needed by the next commit. I'm doing this mechanical move as a separate commit to make the next commit's changes to these definitions more obvious. Author: Maxim Orlov <orlovmg@gmail.com> Discussion: https://www.postgresql.org/message-id/CACG%3DezbZo_3_fnx%3DS5BfepwRftzrpJ%2B7WET4EkTU6wnjDTsnjg@mail.gmail.com	2025-12-09 13:45:01 +02:00
Michael Paquier	0c3c5c3b06	Use palloc_object() and palloc_array() in more areas of the tree The idea is to encourage more the use of these new routines across the tree, as these offer stronger type safety guarantees than palloc(). The following paths are included in this batch, treating all the areas proposed by the author for the most trivial changes, except src/backend (by far the largest batch): src/bin/ src/common/ src/fe_utils/ src/include/ src/pl/ src/test/ src/tutorial/ Similar work has been done in `31d3847a37`. The code compiles the same before and after this commit, with the following exceptions due to changes in line numbers because some of the new allocation formulas are shorter: blkreftable.c pgfnames.c pl_exec.c Author: David Geier <geidav.pg@gmail.com> Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com	2025-12-09 14:53:17 +09:00
Andres Freund	aa749bde32	Improve documentation for pg_atomic_unlocked_write_u32() After my recent commit `7902a47c20`, Nathan noticed that pg_atomic_unlocked_write_u64() was not accurately described by the comments for the 32bit version. Turns out the 32bit version has suffered from copy-and-paste-itis since its introduction. Fix. Reported-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Discussion: https://postgr.es/m/aTGt7q4Jvn97uGAx@nathan	2025-12-08 23:11:19 -05:00
Peter Geoghegan	65d6acbc56	Relocate _bt_readpage and related functions. Quite a bit of code within nbtutils.c is only called by _bt_readpage. Move _bt_readpage and all of the nbtutils.c functions it depends on into a new .c file, nbtreadpage.c. Also reorder some of the functions within the new file for clarity. This commit has no functional impact. It is strictly mechanical. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Victor Yegorov <vyegorov@gmail.com> Discussion: https://postgr.es/m/CAH2-WzmwMwcwKFgaf+mYPwiz3iL4AqpXnwtW_O0vqpWPXRom9Q@mail.gmail.com	2025-12-08 13:15:00 -05:00
Tom Lane	0986e95161	Revise APIs for pushJsonbValue() and associated routines. Instead of passing "JsonbParseState **" to pushJsonbValue(), pass a pointer to a JsonbInState, which will contain the parseState stack pointer as well as other useful fields. Also, instead of returning a JsonbValue pointer that is often meaningless/ignored, return the top-level JsonbValue pointer in the "result" field of the JsonbInState. This involves a lot of (mostly mechanical) edits, but I think the results are notationally cleaner and easier to understand. Certainly the business with sometimes capturing the result of pushJsonbValue() and sometimes not was bug-prone and incapable of mechanical verification. In the new arrangement, JsonbInState.result remains null until we've completed a valid sequence of pushes, so that an incorrect sequence will result in a null-pointer dereference, not mistaken use of a partial result. However, this isn't simply an exercise in prettier notation. The real reason for doing it is to provide a mechanism whereby pushJsonbValue() can be told to construct the JsonbValue tree in a context that is not CurrentMemoryContext. That happens when a non-null "outcontext" is specified in the JsonbInState. No callers exercise that option in this patch, but the next patch in the series will make use of it. I tried to improve the comments in this area too. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: jian he <jian.universality@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/1060917.1753202222@sss.pgh.pa.us	2025-12-07 11:51:33 -05:00
Tom Lane	3628af4210	Add a macro for the declared typlen of type timetz. pg_type.typlen says 12 for the size of timetz, but sizeof(TimeTzADT) will be 16 on most platforms due to alignment padding. Using the sizeof number is no problem for usages such as palloc'ing a result datum, but in usages such as datumCopy we really ought to match what pg_type says. Add a macro TIMETZ_TYPLEN so that we have a symbolic way to write that rather than hard-coding "12". I cannot find any place where we've needed this so far, but an upcoming patch requires it. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/2329959.1765047648@sss.pgh.pa.us	2025-12-07 11:33:35 -05:00
Tom Lane	6498287696	Handle constant inputs to corr() and related aggregates more precisely. The SQL standard says that corr() and friends should return NULL in the mathematically-undefined case where all the inputs in one of the columns have the same value. We were checking that by seeing if the sums Sxx and Syy were zero, but that approach is very vulnerable to roundoff error: if a sum is close to zero but not exactly that, we'd come out with a pretty silly non-NULL result. Instead, directly track whether the inputs are all equal by remembering the common value in each column. Once we detect that a new input is different from before, represent that by storing NaN for the common value. (An objection to this scheme is that if the inputs are all NaN, we will consider that they were not all equal. But under IEEE float arithmetic rules, one NaN is never equal to another, so this behavior is arguably correct. Moreover it matches what we did before in such cases.) Then, leave the sums at their exact value of zero for as long as we haven't detected different input values. This solution requires the aggregate transition state to contain 8 float values not 6, which is not problematic, and it seems to add less than 1% to the aggregates' runtime, which seems acceptable. While we're here, improve corr()'s final function to cope with overflow/underflow in the final calculation, and to clamp its result to [-1, 1] in case of roundoff error. Although this is arguably a bug fix, it requires a catversion bump due to the change in aggregates' initial states, so it can't be back-patched. Patch written by me, but many of the ideas are due to Dean Rasheed, who also did a deal of testing. Bug: #19340 Reported-by: Oleg Ivanov <o15611@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Co-authored-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/19340-6fb9f6637f562092@postgresql.org	2025-12-06 18:31:26 -05:00
Amit Kapila	5db6a344ab	Rename column slotsync_skip_at to slotsync_last_skip. Commit `76b78721ca` introduced two new columns in pg_stat_replication_slots to improve monitoring of slot synchronization. One of these columns was named slotsync_skip_at, which is inconsistent with the naming convention used for similar columns in other system views. Columns that store timestamps of the most recent event typically use the 'last_' in the column name (e.g., last_autovacuum, checksum_last_failure). Renaming slotsync_skip_at to slotsync_last_skip aligns with this pattern, making the purpose of the column clearer and improving overall consistency across the views. Author: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: Michael Banck <mbanck@gmx.net> Discussion: https://postgr.es/m/20251128091552.GB13635@p46.dedyn.io;lightning.p46.dedyn.io Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com	2025-12-05 04:12:55 +00:00
Peter Eisentraut	c6be3daa05	Remove no longer needed casts to Pointer These casts used to be required when Pointer was char , but now it's void (commit `1b2bb5077e`), so they are not needed anymore. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org	2025-12-04 19:40:08 +01:00
Andres Freund	6c5c393b74	Rename BUFFERPIN wait event class to BUFFER In an upcoming patch more wait events will be added to the wait event class (for buffer locking), making the current name too specific. Alternatively we could introduce a dedicated wait event class for those, but it seems somewhat confusing to have a BUFFERPIN and a BUFFER wait event class. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2025-12-03 18:38:20 -05:00
Andres Freund	7902a47c20	Add pg_atomic_unlocked_write_u64 The 64bit equivalent of pg_atomic_unlocked_write_u32(), to be used in an upcoming patch converting BufferDesc.state into a 64bit atomic. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2025-12-03 18:38:20 -05:00
Andres Freund	156680055d	bufmgr: Turn BUFFER_LOCK_* into an enum It seems cleaner to use an enum to tie the different values together. It also helps to have a more descriptive type in the argument to various functions. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2025-12-03 18:38:20 -05:00
Heikki Linnakangas	789d65364c	Set next multixid's offset when creating a new multixid With this commit, the next multixid's offset will always be set on the offsets page, by the time that a backend might try to read it, so we no longer need the waiting mechanism with the condition variable. In other words, this eliminates "corner case 2" mentioned in the comments. The waiting mechanism was broken in a few scenarios: - When nextMulti was advanced without WAL-logging the next multixid. For example, if a later multixid was already assigned and WAL-logged before the previous one was WAL-logged, and then the server crashed. In that case the next offset would never be set in the offsets SLRU, and a query trying to read it would get stuck waiting for it. Same thing could happen if pg_resetwal was used to forcibly advance nextMulti. - In hot standby mode, a deadlock could happen where one backend waits for the next multixid assignment record, but WAL replay is not advancing because of a recovery conflict with the waiting backend. The old TAP test used carefully placed injection points to exercise the old waiting code, but now that the waiting code is gone, much of the old test is no longer relevant. Rewrite the test to reproduce the IPC/MultixactCreation hang after crash recovery instead, and to verify that previously recorded multixids stay readable. Backpatch to all supported versions. In back-branches, we still need to be able to read WAL that was generated before this fix, so in the back-branches this includes a hack to initialize the next offsets page when replaying XLOG_MULTIXACT_CREATE_ID for the last multixid on a page. On 'master', bump XLOG_PAGE_MAGIC instead to indicate that the WAL is not compatible. Author: Andrey Borodin <amborodin@acm.org> Reviewed-by: Dmitry Yurichev <dsy.075@yandex.ru> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Ivan Bykov <i.bykov@modernsys.ru> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/172e5723-d65f-4eec-b512-14beacb326ce@yandex.ru Backpatch-through: 14	2025-12-03 19:15:08 +02:00
Peter Eisentraut	9790affcce	Fix stray references to SubscriptRef This type never existed. SubscriptingRef was meant instead. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/2eaa45e3-efc5-4d75-b082-f8159f51445f%40eisentraut.org	2025-12-03 14:44:14 +01:00
Peter Eisentraut	1b2bb5077e	Change Pointer to void * The comment for the Pointer type said 'XXX Pointer arithmetic is done with this, so it can't be void * under "true" ANSI compilers.'. This has been fixed in the previous commit `756a436893`. This now changes the definition of the type from char * to void *, as envisaged by that comment. Extension code that relies on using Pointer for pointer arithmetic will need to make changes similar to commit `756a436893`, but those changes would be backward compatible. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org	2025-12-03 10:22:17 +01:00
Nathan Bossart	f894acb24a	Show size of DSAs and dshashes in pg_dsm_registry_allocations. Presently, this view reports NULL for the size of DSAs and dshash tables because 1) the current backend might not be attached to them and 2) the registry doesn't save the pointers to the dsa_area or dshash_table in local memory. Also, the view doesn't show partially-initialized entries to avoid ambiguity, since those entries would report a NULL size as well. This commit introduces a function that looks up the size of a DSA given its handle (transiently attaching to the control segment if needed) and teaches pg_dsm_registry_allocations to use it to show the size of successfully-initialized DSA and dshash entries. Furthermore, the view now reports partially-initialized entries with a NULL size. Reviewed-by: Rahila Syed <rahilasyed90@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/aSeEDeznAsHR1_YF%40nathan	2025-12-02 10:29:45 -06:00
Michael Paquier	713d9a847e	Update some timestamp[tz] functions to use soft-error reporting This commit updates two functions that convert "timestamptz" to "timestamp", and vice-versa, to use the soft error reporting rather than a their own logic to do the same. These are now named as follows: - timestamp2timestamptz_safe() - timestamptz2timestamp_safe() These functions were suffixed with "_opt_overflow", previously. This shaves some code, as it is possible to detect how a timestamp[tz] overflowed based on the returned value rather than a custom state. It is optionally possible for the callers of these functions to rely on the error generated internally by these functions, depending on the error context. Similar work has been done in `d03668ea05` and `4246a977ba`. Reviewed-by: Amul Sul <sulamul@gmail.com> Discussion: https://postgr.es/m/aS09YF2GmVXjAxbJ@paquier.xyz	2025-12-02 09:30:23 +09:00
Jeff Davis	19b966243c	Make regex "max_chr" depend on encoding, not provider. The regex mechanism scans through the first "max_chr" character values to cache character property ranges (isalpha, etc.). For single-byte encodings, there's no sense in scanning beyond UCHAR_MAX; but for UTF-8 it makes sense to cache higher code point values (though not all of them; only up to MAX_SIMPLE_CHR). Prior to `5a38104b36`, the logic about how many character values to scan was based on the pg_regex_strategy, which was dependent on the provider. Commit `5a38104b36` preserved that logic exactly, allowing different providers to define the "max_chr". Now, change it to depend only on the encoding and whether ctype_is_c. For this specific calculation, distinguishing between providers creates more complexity than it's worth. Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com Reviewed-by: Chao Li <li.evan.chao@gmail.com>	2025-12-01 11:06:17 -08:00
Michael Paquier	a87987cafc	Move WAL sequence code into its own file This split exists for most of the other RMGRs, and makes cleaner the separation between the WAL code, the redo code and the record description code (already in its own file) when it comes to the sequence RMGR. The redo and masking routines are moved to a new file, sequence_xlog.c. All the RMGR routines are now located in a new header, sequence_xlog.h. This separation is useful for a different patch related to sequences that I have been working on, where it makes a refactoring of sequence.c easier if its RMGR routines and its core routines are split. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/aSfTxIWjiXkTKh1E@paquier.xyz	2025-12-01 16:21:41 +09:00
Michael Paquier	d03668ea05	Switch some date/timestamp functions to use the soft error reporting This commit changes some functions related to the data types date and timestamp to use the soft error reporting rather than a custom boolean flag called "overflow", used to let the callers of these functions know if an overflow happens. This results in the removal of some boilerplate code, as it is possible to rely on an error context rather than a custom state, with the possibility to use the error generated inside the functions updated here, if necessary. These functions were suffixed with "_opt_overflow". They are now renamed to use "_safe" as suffix. This work is similar to `4246a977ba`. Author: Amul Sul <sulamul@gmail.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAAJ_b95HEmFyzHZfsdPquSHeswcopk8MCG1Q_vn4tVkZ+xxofw@mail.gmail.com	2025-12-01 15:22:20 +09:00
Peter Eisentraut	8b3e2c622a	Fix pg_isblank() There was a pg_isblank() function that claimed to be a replacement for the standard isblank() function, which was thought to be "not very portable yet". We can now assume that it's portable (it's in C99). But pg_isblank() actually diverged from the standard isblank() by also accepting '\r', while the standard one only accepts space and tab. This was added to support parsing pg_hba.conf under Windows. But the hba parsing code now works completely differently and already handles line endings before we get to pg_isblank(). The other user of pg_isblank() is for ident protocol message parsing, which also handles '\r' separately. So this behavior is now obsolete and confusing. To improve clarity, I separated those concerns. The ident parsing now gets its own function that hardcodes the whitespace characters mentioned by the relevant RFC. pg_isblank() is now static in hba.c and is a wrapper around the standard isblank(), with some extra logic to ensure robust treatment of non-ASCII characters. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/170308e6-a7a3-4484-87b2-f960bb564afa%40eisentraut.org	2025-11-28 08:33:07 +01:00
Amit Kapila	e68b6adad9	Add slotsync_skip_reason column to pg_replication_slots view. Introduce a new column, slotsync_skip_reason, in the pg_replication_slots view. This column records the reason why the last slot synchronization was skipped. It is primarily relevant for logical replication slots on standby servers where the 'synced' field is true. The value is NULL when synchronization succeeds. Author: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com> Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com	2025-11-28 05:21:35 +00:00
Michael Paquier	9660906dbd	Add routines for marking buffers dirty efficiently This commit introduces new internal bufmgr routines for marking shared buffers as dirty: * MarkDirtyUnpinnedBuffer() * MarkDirtyRelUnpinnedBuffers() * MarkDirtyAllUnpinnedBuffers() These functions provide an efficient mechanism to respectively mark one buffer, all the buffers of a relation, or the entire shared buffer pool as dirty, something that can be useful to force patterns for the checkpointer. MarkDirtyUnpinnedBufferInternal(), an extra routine, is used by these three, to mark as dirty an unpinned buffer. They are intended as developer tools to manipulate buffer dirtiness in bulk, and will be used in a follow-up commit. Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Aidar Imamov <a.imamov@postgrespro.ru> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Joseph Koshakow <koshy44@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Yuhang Qiu <iamqyh@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Discussion: https://postgr.es/m/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw@mail.gmail.com	2025-11-28 07:39:33 +09:00
Peter Eisentraut	e7075a3405	Use C11 alignas in pg_atomic_uint64 definitions They were already using pg_attribute_aligned. This replaces that with alignas and moves that into the required syntactic position. This ends up making these three atomics implementations appear a bit more consistent, but shouldn't change anything otherwise. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/46f05236-d4d4-4b4e-84d4-faa500f14691%40eisentraut.org	2025-11-27 07:53:34 +01:00
David Rowley	0ca3b16973	Add parallelism support for TID Range Scans In v14, `bb437f995` added support for scanning for ranges of TIDs using a dedicated executor node for the purpose. Here, we allow these scans to be parallelized. The range of blocks to scan is divvied up similarly to how a Parallel Seq Scans does that, where 'chunks' of blocks are allocated to each worker and the size of those chunks is slowly reduced down to 1 block per worker by the time we're nearing the end of the scan. Doing that means workers finish at roughly the same time. Allowing TID Range Scans to be parallelized removes the dilemma from the planner as to whether a Parallel Seq Scan will cost less than a non-parallel TID Range Scan due to the CPU concurrency of the Seq Scan (disk costs are not divided by the number of workers). It was possible the planner could choose the Parallel Seq Scan which would result in reading additional blocks during execution than the TID Scan would have. Allowing Parallel TID Range Scans removes the trade-off the planner makes when choosing between reduced CPU costs due to parallelism vs additional I/O from the Parallel Seq Scan due to it scanning blocks from outside of the required TID range. There is also, of course, the traditional parallelism performance benefits to be gained as well, which likely doesn't need to be explained here. Author: Cary Huang <cary.huang@highgo.ca> Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com> Reviewed-by: Steven Niu <niushiji@gmail.com> Discussion: https://postgr.es/m/18f2c002a24.11bc2ab825151706.3749144144619388582@highgo.ca	2025-11-27 14:05:04 +13:00
David Rowley	42473b3b31	Have the planner replace COUNT(ANY) with COUNT(), when possible This adds SupportRequestSimplifyAggref to allow pg_proc.prosupport functions to receive an Aggref and allow them to determine if there is a way that the Aggref call can be optimized. Also added is a support function to allow transformation of COUNT(ANY) into COUNT(). This is possible to do when the given "ANY" cannot be NULL and also that there are no ORDER BY / DISTINCT clauses within the Aggref. This is a useful transformation to do as it is common that people write COUNT(1), which until now has added unneeded overhead. When counting a NOT NULL column. The overheads can be worse as that might mean deforming more of the tuple, which for large fact tables may be many columns in. It may be possible to add prosupport functions for other aggregates. We could consider if ORDER BY could be dropped for some calls, e.g. the ORDER BY is quite useless in MAX(c ORDER BY c). There is a little bit of passing fallout from adjusting expr_is_nonnullable() to handle Const which results in a plan change in the aggregates.out regression test. Previously, nothing was able to determine that "One-Time Filter: (100 IS NOT NULL)" was always true, therefore useless to include in the plan. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Discussion: https://postgr.es/m/CAApHDvqGcPTagXpKfH=CrmHBqALpziThJEDs_MrPqjKVeDF9wA@mail.gmail.com	2025-11-27 10:43:28 +13:00
Jeff Davis	8d299052fe	Add #define for UNICODE_CASEMAP_BUFSZ. Useful for mapping a single codepoint at a time into a statically-allocated buffer. Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com Reviewed-by: Chao Li <li.evan.chao@gmail.com>	2025-11-26 10:05:11 -08:00
Jeff Davis	ec4997a9d7	Inline pg_ascii_tolower() and pg_ascii_toupper(). Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com Reviewed-by: Chao Li <li.evan.chao@gmail.com>	2025-11-26 10:04:32 -08:00
Peter Eisentraut	8fe4aef829	Replace internal C function pg_hypot() by standard hypot() The code comment said, "It is expected that this routine will eventually be replaced with the C99 hypot() function.", so let's do that now. This function is tested via the geometry regression test, so if it is faulty on any platform, it will show up there. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/170308e6-a7a3-4484-87b2-f960bb564afa%40eisentraut.org	2025-11-26 07:48:29 +01:00
Amit Kapila	76b78721ca	Add slotsync skip statistics. This patch adds two new columns to the pg_stat_replication_slots view: slotsync_skip_count - the total number of times a slotsync operation was skipped. slotsync_skip_at - the timestamp of the most recent skip. These additions provide better visibility into replication slot synchronization behavior. A future patch will introduce the slotsync_skip_reason column in pg_replication_slots to capture the reason for skip. Author: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com	2025-11-25 07:06:02 +00:00
Michael Paquier	ed823da128	Rename routines for write/read of pgstats file This commit renames write_chunk and read_chunk to respectively pgstat_write_chunk() and pgstat_read_chunk(), along with the *_s convenience macros. These are made available for plug-ins, so as any code that decides to write and/or read stats data can rely on a single code path for this work. Extracted from a larger patch by the same author. Author: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/CAA5RZ0s9SDOu+Z6veoJCHWk+kDeTktAtC-KY9fQ9Z6BJdDUirQ@mail.gmail.com	2025-11-25 10:55:40 +09:00
Tom Lane	698fa924b1	Improve detection of implicitly-temporary views. We've long had a practice of making views temporary by default if they reference any temporary tables. However the implementation was pretty incomplete, in that it only searched for RangeTblEntry references to temp relations. Uses of temporary types, regclass constants, etc were not detected even though the dependency mechanism considers them grounds for dropping the view. Thus a view not believed to be temp could silently go away at session exit anyhow. To improve matters, replace the ad-hoc isQueryUsingTempRelation() logic with use of the dependency-based infrastructure introduced by commit `572c40ba9`. This is complete by definition, and it's less code overall. While we're at it, we can also extend the warning NOTICE (or ERROR in the case of a materialized view) to mention one of the temp objects motivating the classification of the view as temp, as was done for functions in `572c40ba9`. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Jim Jones <jim.jones@uni-muenster.de> Discussion: https://postgr.es/m/19cf6ae1-04cd-422c-a760-d7e75fe6cba9@uni-muenster.de	2025-11-24 17:00:16 -05:00
Jacob Champion	0664aa4ff8	Reorganize pqcomm.h a bit Group the PG_PROTOCOL() codes, add a comment to AuthRequest now that the AUTH_REQ codes live in a different header, and make some small adjustments to spacing and comment style for the sake of scannability. Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CAOYmi%2B%3D6zg4oXXOQtifrVao_YKiujTDa3u6bxnU08r0FsSig4g%40mail.gmail.com	2025-11-24 10:01:30 -08:00
Jacob Champion	8934f2136c	Add pg_add_size_overflow() and friends Commit `600086f47` added (several bespoke copies of) size_t addition with overflow checks to libpq. Move this to common/int.h, along with its subtraction and multiplication counterparts. pg_neg_size_overflow() is intentionally omitted; I'm not sure we should add SSIZE_MAX to win32_port.h for the sake of a function with no callers. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAOYmi%2B%3D%2BpqUd2MUitvgW1pAJuXgG_TKCVc3_Ek7pe8z9nkf%2BAg%40mail.gmail.com	2025-11-24 09:59:38 -08:00
Peter Eisentraut	d4c0f91f7d	C11 alignas instead of unions -- extended alignments This replaces some uses of pg_attribute_aligned() with the standard alignas() for cases where extended alignment (larger than max_align_t) is required. This patch stipulates that all supported compilers must support alignments up to PG_IO_ALIGN_SIZE, but that seems pretty likely. We can then also desupport the case where direct I/O is disabled because pg_attribute_aligned is not supported. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/46f05236-d4d4-4b4e-84d4-faa500f14691%40eisentraut.org	2025-11-24 07:39:37 +01:00
David Rowley	07d1dc3aeb	Fix incorrect IndexOptInfo header comment The comment incorrectly indicated that indexcollations[] stored collations for both key columns and INCLUDE columns, but in reality it only has elements for the key columns. canreturn[] didn't get a mention, so add that while we're here. Author: Junwang Zhao <zhjwpku@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAEG8a3LwbZgMKOQ9CmZarX5DEipKivdHp5PZMOO-riL0w%3DL%3D4A%40mail.gmail.com Backpatch-through: 14	2025-11-24 17:00:01 +13:00
Tom Lane	572c40ba94	Issue a NOTICE if a created function depends on any temp objects. We don't have an official concept of temporary functions. (You can make one explicitly in pg_temp, but then you have to explicitly schema-qualify it on every call.) However, until now we were quite laissez-faire about whether a non-temporary function could depend on a temporary object, such as a temp table or view. If one does, it will silently go away at end of session, due to the automatic DROP ... CASCADE on the session's temporary objects. People have complained that that's surprising; however, we can't really forbid it because other people (including our own regression tests) rely on being able to do it. Let's compromise by emitting a NOTICE at CREATE FUNCTION time. This is somewhat comparable to our ancient practice of emitting a NOTICE when forcing a view to become temp because it depends on temp tables. Along the way, refactor recordDependencyOnExpr() so that the dependencies of an expression can be combined with other dependencies, instead of being emitted separately and perhaps duplicatively. We should probably make the implementation of temp-by-default views use the same infrastructure used here, but that's for another patch. It's unclear whether there are any other object classes that deserve similar treatment. Author: Jim Jones <jim.jones@uni-muenster.de> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/19cf6ae1-04cd-422c-a760-d7e75fe6cba9@uni-muenster.de	2025-11-23 15:02:55 -05:00
Tom Lane	b140c8d7a3	Add SupportRequestInlineInFrom planner support request. This request allows a support function to replace a function call appearing in FROM (typically a set-returning function) with an equivalent SELECT subquery. The subquery will then be subject to the planner's usual optimizations, potentially allowing a much better plan to be generated. While the planner has long done this automatically for simple SQL-language functions, it's now possible for extensions to do it for functions outside that group. Notably, this could be useful for functions that are presently implemented in PL/pgSQL and work by generating and then EXECUTE'ing a SQL query. Author: Paul A Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/09de6afa-c33d-4d94-a5cb-afc6cea0d2bb@illuminatedcomputing.com	2025-11-22 19:33:34 -05:00
Peter Eisentraut	5eed8ce50c	Add range_minus_multi and multirange_minus_multi functions The existing range_minus function raises an exception when the range is "split", because then the result can't be represented by a single range. For example '[0,10)'::int4range - '[4,5)' would be '[0,4)' and '[5,10)'. This commit adds new set-returning functions so that callers can get results even in the case of splits. There is no risk of an exception for multiranges, but a set-returning function lets us handle them the same way we handle ranges. Both functions return zero results if the subtraction would give an empty range/multirange. The main use-case for these functions is to implement UPDATE/DELETE FOR PORTION OF, which must compute the application-time of "temporal leftovers": the part of history in an updated/deleted row that was not changed. To preserve the untouched history, we will implicitly insert one record for each result returned by range/multirange_minus_multi. Using a set-returning function will also let us support user-defined types for application-time update/delete in the future. Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/ec498c3d-5f2b-48ec-b989-5561c8aa2024%40illuminatedcomputing.com	2025-11-22 09:42:03 +01:00
Peter Eisentraut	97e04c74be	C11 alignas instead of unions This changes a few union members that only existed to ensure alignments and replaces them with the C11 alignas specifier. This change only uses fundamental alignments (meaning approximately alignments of basic types), which all C11 compilers must support. There are opportunities for similar changes using extended alignments, for example in PGIOAlignedBlock, but these are not necessarily supported by all compilers, so they are kept as a separate change. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/46f05236-d4d4-4b4e-84d4-faa500f14691%40eisentraut.org	2025-11-21 10:08:24 +01:00
Melanie Plageman	1937ed7062	Refactor heap_page_prune_and_freeze() parameters into a struct heap_page_prune_and_freeze() had accumulated an unwieldy number of input parameters and upcoming work to handle VM updates in this function will add even more. Introduce a new PruneFreezeParams struct to group the function’s input parameters, improving readability and maintainability. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/yn4zp35kkdsjx6wf47zcfmxgexxt4h2og47pvnw2x5ifyrs3qc%407uw6jyyxuyf7	2025-11-20 10:32:14 -05:00
Peter Eisentraut	300c8f5324	Add <stdalign.h> to c.h This allows using the C11 constructs alignas and alignof (not done in this patch). Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/46f05236-d4d4-4b4e-84d4-faa500f14691%40eisentraut.org	2025-11-19 08:18:25 +01:00
Alexander Korotkov	75e82b2f5a	Optimize shared memory usage for WaitLSNProcInfo We need separate pairing heaps for different WaitLSNType's, because there might be waiters for different LSN's at the same time. However, one process can wait only for one type of LSN at a time. So, no need for inHeap and heapNode fields to be arrays. Discussion: https://postgr.es/m/CAPpHfdsBR-7sDtXFJ1qpJtKiohfGoj%3DvqzKVjWxtWsWidx7G_A%40mail.gmail.com Author: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>	2025-11-18 09:50:12 +02:00
Amit Kapila	3edaf29fa5	Rename two columns in pg_stat_subscription_stats. This patch renames the sync_error_count column to sync_table_error_count in the pg_stat_subscription_stats view. The new name makes the purpose explicit now that a separate column exists to track sequence synchronization errors. Additionally, the column seq_sync_error_count is renamed to sync_seq_error_count to maintain a consistent naming pattern, making it easier for users to group, and query synchronization related counters. Author: Vignesh C <vignesh21@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CALDaNm3WwJmz=-4ybTkhniB-Nf3qmFG9Zx1uKjyLLoPF5NYYXA@mail.gmail.com	2025-11-18 03:58:55 +00:00
Michael Paquier	e76defbcf0	Rework output format of pg_dependencies The existing format of pg_dependencies uses a single-object JSON structure, with each key value embedding all the knowledge about the set attributes tracked, like: {"1 => 5": 1.000000, "5 => 1": 0.423130} While this is a very compact format, it is confusing to read and it is difficult to manipulate the values within the object, particularly when tracking multiple attributes. The new output format introduced in this commit is a JSON array of objects, with: - A key named "degree", with a float value. - A key named "attributes", with an array of attribute numbers. - A key named "dependency", with an attribute number. The values use the same underlying type as previously when printed, with a new output format that shows now as follows: [{"degree": 1.000000, "attributes": [1], "dependency": 5}, {"degree": 0.423130, "attributes": [5], "dependency": 1}] This new format will become handy for a follow-up set of changes, so as it becomes possible to inject extended statistics rather than require an ANALYZE, like in a dump/restore sequence or after pg_upgrade on a new cluster. This format has been suggested by Tomas Vondra. The key names are defined in the header introduced by `1f927cce44`, to ease the integration of frontend-specific changes that are still under discussion. (Again a personal note: if anybody comes up with better name for the keys, of course feel free.) The bulk of the changes come from the regression tests, where jsonb_pretty() is now used to make the outputs generated easier to parse. Author: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com	2025-11-17 10:44:26 +09:00
Michael Paquier	1f927cce44	Rework output format of pg_ndistinct The existing format of pg_ndistinct uses a single-object JSON structure where each key is itself a comma-separated list of attnums, like: {"3, 4": 11, "3, 6": 11, "4, 6": 11, "3, 4, 6": 11} While this is a very compact format, it is confusing to read and it is difficult to manipulate the values within the object. The new output format introduced in this commit is an array of objects, with: - A key named "attributes", that contains an array of attribute numbers. - A key named "ndistinct", represented as an integer. The values use the same underlying type as previously when printed, with a new output format that shows now as follows: [{"ndistinct": 11, "attributes": [3,4]}, {"ndistinct": 11, "attributes": [3,6]}, {"ndistinct": 11, "attributes": [4,6]}, {"ndistinct": 11, "attributes": [3,4,6]}] This new format will become handy for a follow-up set of changes, so as it becomes possible to inject extended statistics rather than require an ANALYZE, like in a dump/restore sequence or after pg_upgrade on a new cluster. This format has been suggested by Tomas Vondra. The key names are defined in a new header, to ease with the integration of frontend-specific changes that are still under discussion. (Personal note: I am not specifically wedded to these key names, but if there are better name suggestions for this release, feel free.) The bulk of the changes come from the regression tests, where jsonb_pretty() is now used to make the outputs generated easier to parse. Author: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com	2025-11-17 09:52:20 +09:00
David Rowley	586d63214e	Adjust MemSet macro to use size_t rather than long Likewise for MemSetAligned. "long" wasn't the most suitable type for these macros as with MSVC in 64-bit builds, sizeof(long) == 4, which is narrower than the processor's word size, therefore these macros had to perform twice as many loops as they otherwise might. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/CAApHDvoGFjSA3aNyVQ3ivbyc4ST=CC5L-_VjEUQ92HbE2Cxovg@mail.gmail.com	2025-11-17 12:27:00 +13:00
David Rowley	9c047da51f	Get rid of long datatype in CATCACHE_STATS enabled builds "long" is 32 bits on Windows 64-bit. Switch to a datatype that's 64-bit on all platforms. While we're there, use an unsigned type as these fields count things that have occurred, of which it's not possible to have negative numbers of. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/CAApHDvoGFjSA3aNyVQ3ivbyc4ST=CC5L-_VjEUQ92HbE2Cxovg@mail.gmail.com	2025-11-17 12:26:41 +13:00
Alexander Korotkov	23792d7381	Fix incorrect function name in comments Update comments to reference WaitForLSN() instead of the outdated WaitForLSNReplay() function name. Discussion: https://postgr.es/m/CABPTF7UieOYbOgH3EnQCasaqcT1T4N6V2wammwrWCohQTnD_Lw%40mail.gmail.com Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>	2025-11-15 12:27:42 +02:00
Michael Paquier	84fb27511d	Replace off_t by pgoff_t in I/O routines PostgreSQL's Windows port has never been able to handle files larger than 2GB due to the use of off_t for file offsets, only 32-bit on Windows. This causes signed integer overflow at exactly 2^31 bytes when trying to handle files larger than 2GB, for the routines touched by this commit. Note that large files are forbidden by ./configure (`3c6248a828`) and meson (recent change, see `79cd66f28c`). This restriction also exists in v16 and older versions for the now-dead MSVC scripts. The code base already defines pgoff_t as __int64 (64-bit) on Windows for this purpose, and some function declarations in headers use it, but many internals still rely on off_t. This commit switches more routines to use pgoff_t, offering more portability, for areas mainly related to file extensions and storage. These are not critical for WAL segments yet, which have currently a maximum size allowed of 1GB (well, this opens the door at allowing a larger size for them). This matters more for segment files if we want to lift the large file restriction in ./configure and meson in the future, which would make sense to remove once/if all traces of off_t are gone from the tree. This can additionally matter for out-of-core code that may want files larger than 2GB in places where off_t is four bytes in size. Note that off_t is still used in other parts of the tree like buffile.c, WAL sender/receiver, base backup, pg_combinebackup, etc. These other code paths can be addressed separately, and their update will be required if we want to remove the large file restriction in the future. This commit is a good first cut in itself towards more portability, hopefully. On Unix-like systems, pgoff_t is defined as off_t, so this change only affects Windows behavior. Author: Bryan Green <dbryan.green@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/0f238ff4-c442-42f5-adb8-01b762c94ca1@gmail.com	2025-11-13 12:41:40 +09:00
Heikki Linnakangas	8eeb4a0f7c	Fix bug where we truncated CLOG that was still needed by LISTEN/NOTIFY The async notification queue contains the XID of the sender, and when processing notifications we call TransactionIdDidCommit() on the XID. But we had no safeguards to prevent the CLOG segments containing those XIDs from being truncated away. As a result, if a backend didn't for some reason process its notifications for a long time, or when a new backend issued LISTEN, you could get an error like: test=# listen c21; ERROR: 58P01: could not access status of transaction 14279685 DETAIL: Could not open file "pg_xact/000D": No such file or directory. LOCATION: SlruReportIOError, slru.c:1087 To fix, make VACUUM "freeze" the XIDs in the async notification queue before truncating the CLOG. Old XIDs are replaced with FrozenTransactionId or InvalidTransactionId. Note: This commit is not a full fix. A race condition remains, where a backend is executing asyncQueueReadAllNotifications() and has just made a local copy of an async SLRU page which contains old XIDs, while vacuum concurrently truncates the CLOG covering those XIDs. When the backend then calls TransactionIdDidCommit() on those XIDs from the local copy, you still get the error. The next commit will fix that remaining race condition. This was first reported by Sergey Zhuravlev in 2021, with many other people hitting the same issue later. Thanks to: - Alexandra Wang, Daniil Davydov, Andrei Varashen and Jacques Combrink for investigating and providing reproducable test cases, - Matheus Alcantara and Arseniy Mukhin for review and earlier proposed patches to fix this, - Álvaro Herrera and Masahiko Sawada for reviews, - Yura Sokolov aka funny-falcon for the idea of marking transactions as committed in the notification queue, and - Joel Jacobson for the final patch version. I hope I didn't forget anyone. Backpatch to all supported versions. I believe the bug goes back all the way to commit `d1e027221d`, which introduced the SLRU-based async notification queue. Discussion: https://www.postgresql.org/message-id/16961-25f29f95b3604a8a@postgresql.org Discussion: https://www.postgresql.org/message-id/18804-bccbbde5e77a68c2@postgresql.org Discussion: https://www.postgresql.org/message-id/CAK98qZ3wZLE-RZJN_Y%2BTFjiTRPPFPBwNBpBi5K5CU8hUHkzDpw@mail.gmail.com Backpatch-through: 14	2025-11-12 20:59:36 +02:00
Álvaro Herrera	877a024902	Split out innards of pg_tablespace_location() This creates a src/backend/catalog/pg_tablespace.c supporting file containing a new function get_tablespace_location(), which lets the code underlying pg_tablespace_location() be reused for other purposes. Author: Manni Wood <manni.wood@enterprisedb.com> Author: Nishant Sharma <nishant.sharma@enterprisedb.com> Reviewed-by: Vaibhav Dalvi <vaibhav.dalvi@enterprisedb.com> Reviewed-by: Ian Lawrence Barwick <barwick@gmail.com> Reviewed-by: Jim Jones <jim.jones@uni-muenster.de> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/CAKWEB6rmnmGKUA87Zmq-s=b3Scsnj02C0kObQjnbL2ajfPWGEw@mail.gmail.com	2025-11-12 16:39:55 +01:00
Peter Eisentraut	d2f24df19b	Clean up qsort comparison function for GUC entries guc_var_compare() is invoked from qsort() on an array of struct config_generic, but the function accesses these directly as strings (char *). This relies on the name being the first field, so this works. But we can write this more clearly by using the struct and then accessing the field through the struct. Before the reorganization of the GUC structs (commit `a13833c35f`), the old code was probably more convenient, but now we can write this more clearly and correctly. After this change, it is no longer required that the name is the first field in struct config_generic, so remove that comment. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/2c961fa1-14f6-44a2-985c-e30b95654e8d%40eisentraut.org	2025-11-11 07:55:10 +01:00
Heikki Linnakangas	e510378358	Bump PG_CONTROL_VERSION for commit `3e0ae46d90` Commit `3e0ae46d90` added a field to ControlFileData and bumped CATALOG_VERSION_NO, but CATALOG_VERSION_NO is not the right version number for ControlFileData changes. Bumping either one will force an initdb, but PG_CONTROL_VERSION is more accurate. Bump PG_CONTROL_VERSION now. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/1874404.1762787779@sss.pgh.pa.us	2025-11-10 19:12:43 +02:00
Nathan Bossart	5e4fcbe531	Check for CREATE privilege on the schema in CREATE STATISTICS. This omission allowed table owners to create statistics in any schema, potentially leading to unexpected naming conflicts. For ALTER TABLE commands that require re-creating statistics objects, skip this check in case the user has since lost CREATE on the schema. The addition of a second parameter to CreateStatistics() breaks ABI compatibility, but we are unaware of any impacted third-party code. Reported-by: Jelte Fennema-Nio <postgres@jeltef.nl> Author: Jelte Fennema-Nio <postgres@jeltef.nl> Co-authored-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Security: CVE-2025-12817 Backpatch-through: 13	2025-11-10 09:00:00 -06:00
Heikki Linnakangas	3e0ae46d90	Move SLRU_PAGES_PER_SEGMENT to pg_config_manual.h It seems plausible that someone might want to experiment with different values. The pressing reason though is that I'm reviewing a patch that requires pg_upgrade to manipulate SLRU files. That patch needs to access SLRU_PAGES_PER_SEGMENT from pg_upgrade code, and slru.h, where SLRU_PAGES_PER_SEGMENT is currently defined, cannot be included from frontend code. Moving it to pg_config_manual.h makes it accessible. Now that it's a little more likely that someone might change SLRU_PAGES_PER_SEGMENT, add a cluster compatibility check for it. Bump catalog version because of the new field in the control file. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://www.postgresql.org/message-id/c7a4ea90-9f7b-4953-81be-b3fcb47db057@iki.fi	2025-11-10 16:11:41 +02:00
Thomas Munro	c5d34f4a55	Fix generic read and write barriers for Clang. generic-gcc.h maps our read and write barriers to C11 acquire and release fences using compiler builtins, for platforms where we don't have our own hand-rolled assembler. This is apparently enough for GCC, but the C11 memory model is only defined in terms of atomic accesses, and our barriers for non-atomic, non-volatile accesses were not always respected under Clang's stricter interpretation of the standard. This explains the occasional breakage observed on new RISC-V + Clang animal greenfly in lock-free PgAioHandle manipulation code containing a repeating pattern of loads and read barriers. The problem can also be observed in code generated for MIPS and LoongAarch, though we aren't currently testing those with Clang, and on x86, though we use our own assembler there. The scariest aspect is that we use the generic version on very common ARM systems, but it doesn't seem to reorder the relevant code there (or we'd have debugged this long ago). Fix by inserting an explicit compiler barrier. It expands to an empty assembler block declared to have memory side-effects, so registers are flushed and reordering is prevented. In those respects this is like the architecture-specific assembler versions, but the compiler is still in charge of generating the appropriate fence instruction. Done for write barriers on principle, though concrete problems have only been observed with read barriers. Reported-by: Alexander Lakhin <exclusion@gmail.com> Tested-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/d79691be-22bd-457d-9d90-18033b78c40a%40gmail.com Backpatch-through: 13	2025-11-08 12:26:43 +13:00
Amit Kapila	f6a4c498dc	Add seq_sync_error_count to subscription statistics. This commit adds a new column, seq_sync_error_count, to the pg_stat_subscription_stats view. This counter tracks the number of errors encountered by the sequence synchronization worker during operation. Since a single worker handles the synchronization of all sequences, this value may reflect errors from multiple sequences. This addition improves observability of sequence synchronization behavior and helps monitor potential issues during replication. Author: Vignesh C <vignesh21@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com	2025-11-07 08:05:08 +00:00
Andres Freund	c75ebc657f	bufmgr: Allow some buffer state modifications while holding header lock Until now BufferDesc.state was not allowed to be modified while the buffer header spinlock was held. This meant that operations like unpinning buffers needed to use a CAS loop, waiting for the buffer header spinlock to be released before updating. The benefit of that restriction is that it allowed us to unlock the buffer header spinlock with just a write barrier and an unlocked write (instead of a full atomic operation). That was important to avoid regressions in `48354581a4`. However, since then the hottest buffer header spinlock uses have been replaced with atomic operations (in particular, the most common use of PinBuffer_Locked(), in GetVictimBuffer() (formerly in BufferAlloc()), has been removed in `5e89985928`). This change will allow, in a subsequent commit, to release buffer pins with a single atomic-sub operation. This previously was not possible while such operations were not allowed while the buffer header spinlock was held, as an atomic-sub would not have allowed a race-free check for the buffer header lock being held. Using atomic-sub to unpin buffers is a nice scalability win, however it is not the primary motivation for this change (although it would be sufficient). The primary motivation is that we would like to merge the buffer content lock into BufferDesc.state, which will result in more frequent changes of the state variable, which in some situations can cause a performance regression, due to an increased CAS failure rate when unpinning buffers. The regression entirely vanishes when using atomic-sub. Naively implementing this would require putting CAS loops in every place modifying the buffer state while holding the buffer header lock. To avoid that, introduce UnlockBufHdrExt(), which can set/add flags as well as the refcount, together with releasing the lock. Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2025-11-06 16:42:10 -05:00
Álvaro Herrera	06edbed478	Introduce XLogRecPtrIsValid() XLogRecPtrIsInvalid() is inconsistent with the affirmative form of macros used for other datatypes, and leads to awkward double negatives in a few places. This commit introduces XLogRecPtrIsValid(), which allows code to be written more naturally. This patch only adds the new macro. XLogRecPtrIsInvalid() is left in place, and all existing callers remain untouched. This means all supported branches can accept hypothetical bug fixes that use the new macro, and at the same time any code that compiled with the original formulation will continue to silently compile just fine. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Backpatch-through: 13 Discussion: https://postgr.es/m/aQB7EvGqrbZXrMlg@ip-10-97-1-34.eu-west-3.compute.internal	2025-11-06 19:08:29 +01:00
Heikki Linnakangas	aa9c5fd3e3	Refactor shared memory allocation for semaphores Before commit `e25626677f`, spinlocks were implemented using semaphores on some platforms (--disable-spinlocks). That made it necessary to initialize semaphores early, before any spinlocks could be used. Now that we don't support --disable-spinlocks anymore, we can allocate the shared memory needed for semaphores the same way as other shared memory structures. Since the semaphores are used only in the PGPROC array, move the semaphore shmem size estimation and initialization calls to ProcGlobalShmemSize() and InitProcGlobal(). Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/CAExHW5seSZpPx-znjidVZNzdagGHOk06F+Ds88MpPUbxd1kTaA@mail.gmail.com	2025-11-06 14:45:00 +02:00
Peter Eisentraut	01a985c3c4	Re-run autoheader Some of the changes in pg_config.h.in from commit `3853a6956c` didn't match the order that a fresh run would produce.	2025-11-06 07:37:22 +01:00
Alexander Korotkov	447aae13b0	Implement WAIT FOR command WAIT FOR is to be used on standby and specifies waiting for the specific WAL location to be replayed. This option is useful when the user makes some data changes on primary and needs a guarantee to see these changes are on standby. WAIT FOR needs to wait without any snapshot held. Otherwise, the snapshot could prevent the replay of WAL records, implying a kind of self-deadlock. This is why separate utility command seems appears to be the most robust way to implement this functionality. It's not possible to implement this as a function. Previous experience shows that stored procedures also have limitation in this aspect. Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO+BBjcirozJ6nYbOW8Q@mail.gmail.com Discussion: https://www.postgresql.org/message-id/flat/CABPTF7UNft368x-RgOXkfj475OwEbp%2BVVO-wEXz7StgjD_%3D6sw%40mail.gmail.com Author: Kartyshov Ivan <i.kartyshov@postgrespro.ru> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Euler Taveira <euler@eulerto.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: jian he <jian.universality@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>	2025-11-05 11:44:13 +02:00
Alexander Korotkov	3b4e53a075	Add infrastructure for efficient LSN waiting Implement a new facility that allows processes to wait for WAL to reach specific LSNs, both on primary (waiting for flush) and standby (waiting for replay) servers. The implementation uses shared memory with per-backend information organized into pairing heaps, allowing O(1) access to the minimum waited LSN. This enables fast-path checks: after replaying or flushing WAL, the startup process or WAL writer can quickly determine if any waiters need to be awakened. Key components: - New xlogwait.c/h module with WaitForLSNReplay() and WaitForLSNFlush() - Separate pairing heaps for replay and flush waiters - WaitLSN lightweight lock for coordinating shared state - Wait events WAIT_FOR_WAL_REPLAY and WAIT_FOR_WAL_FLUSH for monitoring This infrastructure can be used by features that need to wait for WAL operations to complete. Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO+BBjcirozJ6nYbOW8Q@mail.gmail.com Discussion: https://www.postgresql.org/message-id/flat/CABPTF7UNft368x-RgOXkfj475OwEbp%2BVVO-wEXz7StgjD_%3D6sw%40mail.gmail.com Author: Kartyshov Ivan <i.kartyshov@postgrespro.ru> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Euler Taveira <euler@eulerto.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>	2025-11-05 11:44:13 +02:00
Alexander Korotkov	8af3ae0d4b	Add pairingheap_initialize() for shared memory usage The existing pairingheap_allocate() uses palloc(), which allocates from process-local memory. For shared memory use cases, the pairingheap structure must be allocated via ShmemAlloc() or embedded in a shared memory struct. Add pairingheap_initialize() to initialize an already- allocated pairingheap structure in-place, enabling shared memory usage. Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO+BBjcirozJ6nYbOW8Q@mail.gmail.com Discussion: https://www.postgresql.org/message-id/flat/CABPTF7UNft368x-RgOXkfj475OwEbp%2BVVO-wEXz7StgjD_%3D6sw%40mail.gmail.com Author: Kartyshov Ivan <i.kartyshov@postgrespro.ru> Author: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Euler Taveira <euler@eulerto.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>	2025-11-05 11:44:13 +02:00
Amit Kapila	5509055d69	Add sequence synchronization for logical replication. This patch introduces sequence synchronization. Sequences that are synced will have 2 states: - INIT (needs [re]synchronizing) - READY (is already synchronized) A new sequencesync worker is launched as needed to synchronize sequences. A single sequencesync worker is responsible for synchronizing all sequences. It begins by retrieving the list of sequences that are flagged for synchronization, i.e., those in the INIT state. These sequences are then processed in batches, allowing multiple entries to be synchronized within a single transaction. The worker fetches the current sequence values and page LSNs from the remote publisher, updates the corresponding sequences on the local subscriber, and finally marks each sequence as READY upon successful synchronization. Sequence synchronization occurs in 3 places: 1) CREATE SUBSCRIPTION - The command syntax remains unchanged. - The subscriber retrieves sequences associated with publications. - Published sequences are added to pg_subscription_rel with INIT state. - Initiate the sequencesync worker to synchronize all sequences. 2) ALTER SUBSCRIPTION ... REFRESH PUBLICATION - The command syntax remains unchanged. - Dropped published sequences are removed from pg_subscription_rel. - Newly published sequences are added to pg_subscription_rel with INIT state. - Initiate the sequencesync worker to synchronize only newly added sequences. 3) ALTER SUBSCRIPTION ... REFRESH SEQUENCES - A new command introduced for PG19 by `f0b3573c3a`. - All sequences in pg_subscription_rel are reset to INIT state. - Initiate the sequencesync worker to synchronize all sequences. - Unlike "ALTER SUBSCRIPTION ... REFRESH PUBLICATION" command, addition and removal of missing sequences will not be done in this case. Author: Vignesh C <vignesh21@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Nisha Moond <nisha.moond412@gmail.com> Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com	2025-11-05 05:59:58 +00:00
Michael Paquier	e0ca61e7c4	Add WalRcvGetState() to retrieve the state of a WAL receiver This has come up as useful as an alternative of WalRcvStreaming(), to be able to do sanity checks based on the state of a WAL receiver. This will be used in a follow-up commit. Author: Xuneng Zhou <xunengzhou@gmail.com> Discussion: https://postgr.es/m/19093-c4fff49a608f82a0@postgresql.org	2025-11-04 12:57:36 +09:00
Michael Paquier	17b2d5ec75	Fix unconditional WAL receiver shutdown during stream-archive transition Commit `b4f584f9d2` (affecting v15~, later backpatched down to 13 as of `3635a0a35a`) introduced an unconditional WAL receiver shutdown when switching from streaming to archive WAL sources. This causes problems during a timeline switch, when a WAL receiver enters WALRCV_WAITING state but remains alive, waiting for instructions. The unconditional shutdown can break some monitoring scenarios as the WAL receiver gets repeatedly terminated and re-spawned, causing pg_stat_wal_receiver.status to show a "streaming" instead of "waiting" status, masking the fact that the WAL receiver is waiting for a new TLI and a new LSN to be able to continue streaming. This commit changes the WAL receiver behavior so as the shutdown becomes conditional, with InstallXLogFileSegmentActive being always reset to prevent the regression fixed by `b4f584f9d2`: only terminate the WAL receiver when it is actively streaming (WALRCV_STREAMING, WALRCV_STARTING, or WALRCV_RESTARTING). When in WALRCV_WAITING state, just reset InstallXLogFileSegmentActive flag to allow archive restoration without killing the process. WALRCV_STOPPED and WALRCV_STOPPING are not reachable states in this code path. For the latter, the startup process is the one in charge of setting WALRCV_STOPPING via ShutdownWalRcv(), waiting for the WAL receiver to reach a WALRCV_STOPPED state after switching walRcvState, so WaitForWALToBecomeAvailable() cannot be reached while a WAL receiver is in a WALRCV_STOPPING state. A regression test is added to check that a WAL receiver is not stopped on timeline jump, that fails when the fix of this commit is reverted. Reported-by: Ryan Bird <ryanzxg@gmail.com> Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/19093-c4fff49a608f82a0@postgresql.org Backpatch-through: 13	2025-11-04 10:47:38 +09:00
Álvaro Herrera	645cb44c54	Add \pset options for boolean value display New \pset variables display_true and display_false allow the user to change how true and false values are displayed. Author: David G. Johnston <David.G.Johnston@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/CAKFQuwYts3vnfQ5AoKhEaKMTNMfJ443MW2kFswKwzn7fiofkrw@mail.gmail.com Discussion: https://postgr.es/m/56308F56.8060908@joh.to	2025-11-03 17:40:39 +01:00
Tom Lane	8f29467c57	Change "long" numGroups fields to be Cardinality (i.e., double). We've been nibbling away at removing uses of "long" for a long time, since its width is platform-dependent. Here's one more: change the remaining "long" fields in Plan nodes to Cardinality, since the three surviving examples all represent group-count estimates. The upstream planner code was converted to Cardinality some time ago; for example the corresponding fields in Path nodes are type Cardinality, as are the arguments of the make_foo_path functions. Downstream in the executor, it turns out that these all feed to the table-size argument of BuildTupleHashTable. Change that to "double" as well, and fix it so that it safely clamps out-of-range values to the uint32 limit of simplehash.h, as was not being done before. Essentially, this is removing all the artificial datatype-dependent limitations on these values from upstream processing, and applying just one clamp at the moment where we're forced to do so by the datatype choices of simplehash.h. Also, remove BuildTupleHashTable's misguided attempt to enforce work_mem/hash_mem_limit. It doesn't have enough information (particularly not the expected tuple width) to do that accurately, and it has no real business second-guessing the caller's choice. For all these plan types, it's really the planner's responsibility to not choose a hashed implementation if the hashtable is expected to exceed hash_mem_limit. The previous patch improved the accuracy of those estimates, and even if BuildTupleHashTable had more information it should arrive at the same conclusions. Reported-by: Jeff Janes <jeff.janes@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com	2025-11-02 16:57:43 -05:00
Tom Lane	1ea5bdb00b	Improve planner's estimates of tuple hash table sizes. For several types of plan nodes that use TupleHashTables, the planner estimated the expected size of the table as basically numEntries * (MAXALIGN(dataWidth) + MAXALIGN(SizeofHeapTupleHeader)). This is pretty far off, especially for small data widths, because it doesn't account for the overhead of the simplehash.h hash table nor for any per-tuple "additional space" the plan node may request. Jeff Janes noted a case where the estimate was off by about a factor of three, even though the obvious hazards such as inaccurate estimates of numEntries or dataWidth didn't apply. To improve matters, create functions provided by the relevant executor modules that can estimate the required sizes with reasonable accuracy. (We're still not accounting for effects like allocator padding, but this at least gets the first-order effects correct.) I added functions that can estimate the tuple table sizes for nodeSetOp and nodeSubplan; these rely on an estimator for TupleHashTables in general, and that in turn relies on one for simplehash.h hash tables. That feels like kind of a lot of mechanism, but if we take any short-cuts we're violating modularity boundaries. The other places that use TupleHashTables are nodeAgg, which took pains to get its numbers right already, and nodeRecursiveunion. I did not try to improve the situation for nodeRecursiveunion because there's nothing to improve: we are not making an estimate of the hash table size, and it wouldn't help us to do so because we have no non-hashed alternative implementation. On top of that, our estimate of the number of entries to be hashed in that module is so suspect that we'd likely often choose the wrong implementation if we did have two ways to do it. Reported-by: Jeff Janes <jeff.janes@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com	2025-11-02 16:57:26 -05:00
Tom Lane	645c1e2752	Avoid mixing void and integer in a conditional expression. The C standard says that the second and third arguments of a conditional operator shall be both void type or both not-void type. The Windows version of INTERRUPTS_PENDING_CONDITION() got this wrong. It's pretty harmless because the result of the operator is ignored anyway, but apparently recent versions of MSVC have started issuing a warning about it. Silence the warning by casting the dummy zero to void. Reported-by: Christian Ullrich <chris@chrullrich.net> Author: Bryan Green <dbryan.green@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/cc4ef8db-f8dc-4347-8a22-e7ebf44c0308@chrullrich.net Backpatch-through: 13	2025-11-02 12:30:44 -05:00
Peter Eisentraut	8a27d418f8	Mark function arguments of type "Datum " as "const Datum " where possible Several functions in the codebase accept "Datum " parameters but do not modify the pointed-to data. These have been updated to take "const Datum " instead, improving type safety and making the interfaces clearer about their intent. This change helps the compiler catch accidental modifications and better documents immutability of arguments. Most of "Datum " parameters have a pairing "bool isnull" parameter, they are constified as well. No functional behavior is changed by this patch. Author: Chao Li <lic@highgo.com> Discussion: https://www.postgresql.org/message-id/flat/CAEoWx2msfT0knvzUa72ZBwu9LR_RLY4on85w2a9YpE-o2By5HQ@mail.gmail.com	2025-10-31 10:47:25 +01:00
Tom Lane	c106ef0807	Use BumpContext contexts in TupleHashTables, and do some code cleanup. For all extant uses of TupleHashTables, execGrouping.c itself does nothing with the "tablecxt" except to allocate new hash entries in it, and the callers do nothing with it except to reset the whole context. So this is an ideal use-case for a BumpContext, and the hash tables are frequently big enough for the savings to be significant. (Commit `cc721c459` already taught nodeAgg.c this idea, but neglected the other callers of BuildTupleHashTable.) While at it, let's clean up some ill-advised leftovers from rebasing TupleHashTables on simplehash.h: * Many comments and variable names were based on the idea that the tablecxt holds the whole TupleHashTable, whereas now it only holds the hashed tuples (plus any caller-defined "additional storage"). Rename to names like tuplescxt and tuplesContext, and adjust the comments. Also adjust the memory context names to be like "<Foo> hashed tuples". * Make ResetTupleHashTable() reset the tuplescxt rather than relying on the caller to do so; that was fairly bizarre and seems like a recipe for leaks. This is less efficient in the case where nodeAgg.c uses the same tuplescxt for several different hashtables, but only microscopically so because mcxt.c will short-circuit the extra resets via its isReset flag. I judge the extra safety and intellectual cleanliness well worth those few cycles. * Remove the long-obsolete "allow_jit" check added by ac88807f9; instead, just Assert that metacxt and tuplescxt are different. We need that anyway for this definition of ResetTupleHashTable() to be safe. There is a side issue of the extent to which this change invalidates the planner's estimates of hashtable memory consumption. However, those estimates are already pretty bad, so improving them seems like it can be a separate project. This change is useful to do first to establish consistent executor behavior that the planner can expect. A loose end not addressed here is that the "entrysize" calculation in BuildTupleHashTable seems wrong: "sizeof(TupleHashEntryData) + additionalsize" corresponds neither to the size of the simplehash entries nor to the total space needed per tuple. It's questionable why BuildTupleHashTable is second-guessing its caller's nbuckets choice at all, since the original source of the number should have had more information. But that all seems wrapped up with the planner's estimation logic, so let's leave it for the planned followup patch. Reported-by: Jeff Janes <jeff.janes@gmail.com> Reported-by: David Rowley <dgrowleyml@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com Discussion: https://postgr.es/m/2268409.1761512111@sss.pgh.pa.us	2025-10-30 11:21:22 -04:00
Peter Eisentraut	e1ac846f3d	Mark ItemPointer arguments as const throughout This is a follow up `991295f`. I searched over src/ and made all ItemPointer arguments as const as much as possible. Note: We cut out from the original patch the pieces that would have created incompatibilities in the index or table AM APIs. Those could be considered separately. Author: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/CAEoWx2nBaypg16Z5ciHuKw66pk850RFWw9ACS2DqqJ_AkKeRsw%40mail.gmail.com	2025-10-30 14:12:06 +01:00
Peter Eisentraut	8ce795fcb7	Fix some confusing uses of const There are a few places where we have typedef struct FooData { ... } FooData; typedef FooData Foo; and then function declarations with bar(const Foo x) which isn't incorrect but probably meant bar(const FooData x) meaning that the thing x points to is immutable, not x itself. This patch makes those changes where appropriate. In one case (execGrouping.c), the thing being pointed to was not immutable, so in that case remove the const altogether, to avoid further confusion. Co-authored-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/CAEoWx2m2E0xE8Kvbkv31ULh_E%2B5zph-WA_bEdv3UR9CLhw%2B3vg%40mail.gmail.com Discussion: https://www.postgresql.org/message-id/CAEoWx2kTDz%3Db6T2xHX78vy_B_osDeCC5dcTCi9eG0vXHp5QpdQ%40mail.gmail.com	2025-10-30 11:20:04 +01:00
Peter Eisentraut	3479a0f823	const-qualify ItemPointer comparison functions Add const qualifiers to ItemPointerEquals() and ItemPointerCompare(). This will allow further changes up the stack. It also complements commit `aeb767ca0b`, as we now have all of itemptr.h appropriately const-qualified. Author: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAEoWx2nBaypg16Z5ciHuKw66pk850RFWw9ACS2DqqJ_AkKeRsw@mail.gmail.com	2025-10-30 10:13:47 +01:00
Jeff Davis	3853a6956c	Use C11 char16_t and char32_t for Unicode code points. Reviewed-by: Tatsuo Ishii <ishii@postgresql.org> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/bedcc93d06203dfd89815b10f815ca2de8626e85.camel%40j-davis.com	2025-10-29 14:17:13 -07:00
Peter Eisentraut	a13833c35f	Reorganize GUC structs Instead of having five separate GUC structs, one for each type, with the generic part contained in each of them, flip it around and have one common struct, with the type-specific part has a subfield. The very original GUC design had type-specific structs and type-specific lists, and the membership in one of the lists defined the type. But now the structs themselves know the type (from the .vartype field), and they are all loaded into a common hash table at run time, and so this original separation no longer makes sense. It creates a bunch of inconsistencies in the code about whether the type-specific or the generic struct is the primary struct, and a lot of casting in between, which makes certain assumptions about the struct layouts. After the change, all these casts are gone and all the data is accessed via normal field references. Also, various code is simplified because only one kind of struct needs to be processed. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://www.postgresql.org/message-id/flat/8fdfb91e-60fb-44fa-8df6-f5dea47353c9@eisentraut.org	2025-10-29 09:52:29 +01:00
Peter Eisentraut	f0f2c0c1ae	Replace pg_restrict by standard restrict MSVC in C11 mode supports the standard restrict qualifier, so we don't need the workaround naming pg_restrict anymore. Even though restrict is in C99 and should be supported by all supported compilers, we keep the configure test and the hardcoded redirection to __restrict, because that will also work in C++ in all supported compilers. (restrict is not part of the C++ standard.) For backward compatibility for extensions, we keep a #define of pg_restrict around, but our own code doesn't use it anymore. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/0e3d8644-c01d-4374-86ea-9f0a987981f0%40eisentraut.org	2025-10-29 07:52:58 +01:00
Peter Eisentraut	c094be259b	Remove obsolete comment The comment "type prefixes (const, signed, volatile, inline) are handled in pg_config.h." has been mostly not true for a long time.	2025-10-29 07:32:21 +01:00
Michael Paquier	d3111cb753	Fix correctness issue with computation of FPI size in WAL stats XLogRecordAssemble() may be called multiple times before inserting a record in XLogInsertRecord(), and the amount of FPIs generated inside a record whose insertion is attempted multiple times may vary. The logic added in `f9a09aa295` touched directly pgWalUsage in XLogRecordAssemble(), meaning that it could be possible for pgWalUsage to be incremented multiple times for a single record. This commit changes the code to use the same logic as the number of FPIs added to a record, where XLogRecordAssemble() returns this information and feeds it to XLogInsertRecord(), updating pgWalUsage only when a record is inserted. Reported-by: Shinya Kato <shinya11.kato@gmail.com> Discussion: https://postgr.es/m/CAOzEurSiSr+rusd0GzVy8Bt30QwLTK=ugVMnF6=5WhsSrukvvw@mail.gmail.com	2025-10-29 09:13:31 +09:00
Michael Paquier	f9a09aa295	Add wal_fpi_bytes to pg_stat_wal and pg_stat_get_backend_wal() This new counter, called "wal_fpi_bytes", tracks the total amount in bytes of full page images (FPIs) generated in WAL. This data becomes available globally via pg_stat_wal, and for backend statistics via pg_stat_get_backend_wal(). Previously, this information could only be retrieved with pg_waldump or pg_walinspect, which may not be available depending on the environment, and are expensive to execute. It offers hints about how much FPIs impact the WAL generated, which could be a large percentage for some workloads, as well as the effects of wal_compression or page holes. Bump catalog version. Bump PGSTAT_FILE_FORMAT_ID, due to the addition of wal_fpi_bytes in PgStat_WalCounters. Author: Shinya Kato <shinya11.kato@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAOzEurQtZEAfg6P0kU3Wa-f9BWQOi0RzJEMPN56wNTOmJLmfaQ@mail.gmail.com	2025-10-28 16:21:51 +09:00
Amit Kapila	3e8e05596a	Add worker type argument to logical replication worker functions. Extend logicalrep_worker_stop, logicalrep_worker_wakeup, and logicalrep_worker_find to accept a worker type argument. This change enables differentiation between logical replication worker types, such as apply workers and table sync workers. While preserving existing behavior, it lays the groundwork for upcoming patch to add sequence synchronization workers. Author: Vignesh C <vignesh21@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com	2025-10-28 05:47:50 +00:00
Peter Eisentraut	10b5bb3bff	Add some const qualifications Add some const qualifications afforded by the previous change that added a const qualification to PageAddItemExtended(). Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Peter Geoghegan <pg@bowt.ie> Discussion: https://www.postgresql.org/message-id/flat/c75cccf5-5709-407b-a36a-2ae6570be766@eisentraut.org	2025-10-27 09:55:59 +01:00
Peter Eisentraut	76acf4b722	Remove Item type This type is just char * underneath, it provides no real value, no type safety, and just makes the code one level more mysterious. It is more idiomatic to refer to blobs of memory by a combination of void * and size_t, so change it to that. Also, since this type hides the pointerness, we can't apply qualifiers to what is pointed to, which requires some unconstify nonsense. This change allows fixing that. Extension code that uses the Item type can change its code to use void * to be backward compatible. Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Peter Geoghegan <pg@bowt.ie> Discussion: https://www.postgresql.org/message-id/flat/c75cccf5-5709-407b-a36a-2ae6570be766@eisentraut.org	2025-10-27 09:55:59 +01:00

... 3 4 5 6 7 ...

12907 commits