postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-04-26 00:31:07 -04:00

Author	SHA1	Message	Date
Tom Lane	a3179ab692	Recalculate where-needed data accurately after a join removal. Up to now, remove_rel_from_query() has done a pretty shoddy job of updating our where-needed bitmaps (per-Var attr_needed and per-PlaceHolderVar ph_needed relid sets). It removed direct mentions of the to-be-removed baserel and outer join, which is the minimum amount of effort needed to keep the data structures self-consistent. But it didn't account for the fact that the removed join ON clause probably mentioned Vars of other relations, and those Vars might now not be needed as high up in the join tree as before. It's easy to show cases where this results in failing to remove a lower outer join that could also have been removed. To fix, recalculate the where-needed bitmaps from scratch after each successful join removal. This sounds expensive, but it seems to add only negligible planner runtime. (We cheat a little bit by preserving "relation 0" entries in the bitmaps, allowing us to skip re-scanning the targetlist and HAVING qual.) The submitted test case drew attention because we had successfully optimized away the lower join prior to v16. I suspect that that's somewhat accidental and there are related cases that were never optimized before and now can be. I've not tried to come up with one, though. Perhaps we should back-patch this into v16 and v17 to repair the performance regression. However, since it took a year for anyone to notice the problem, it can't be affecting too many people. Let's let the patch bake awhile in HEAD, and see if we get more complaints. Per bug #18627 from Mikaël Gourlaouen. No back-patch for now. Discussion: https://postgr.es/m/18627-44f950eb6a8416c2@postgresql.org	2024-09-27 16:04:04 -04:00
Robert Haas	8dfd312902	pg_verifybackup: Verify tar-format backups. This also works for compressed tar-format backups. However, -n must be used, because we use pg_waldump to verify WAL, and it doesn't yet know how to verify WAL that is stored inside of a tarfile. Amul Sul, reviewed by Sravan Kumar and by me, and revised by me.	2024-09-27 08:40:24 -04:00
Michael Paquier	f762d99c87	Fix catalog data of new LO privilege functions This commit improves the catalog data in pg_proc for the three functions for has_largeobject_privilege(), introduced in `4eada203a5`: - Fix their descriptions (typos and consistency). - Reallocate OIDs to be within the 8000-9999 range as required by `a6417078c4`. Bump catalog version. Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/ZvUYR0V0dzWaLnsV@paquier.xyz	2024-09-27 07:26:29 +09:00
Alexander Korotkov	e658038772	Update oid for pg_wal_replay_wait() procedure Use an oid from 8000-9999 range, as required by `98eab30b93`. Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZvUY6bfTwB0GsyzP%40paquier.xyz	2024-09-26 11:49:41 +03:00
Noah Misch	aac2c9b4fd	For inplace update durability, make heap_update() callers wait. The previous commit fixed some ways of losing an inplace update. It remained possible to lose one when a backend working toward a heap_update() copied a tuple into memory just before inplace update of that tuple. In catalogs eligible for inplace update, use LOCKTAG_TUPLE to govern admission to the steps of copying an old tuple, modifying it, and issuing heap_update(). This includes MERGE commands. To avoid changing most of the pg_class DDL, don't require LOCKTAG_TUPLE when holding a relation lock sufficient to exclude inplace updaters. Back-patch to v12 (all supported versions). In v13 and v12, "UPDATE pg_class" or "UPDATE pg_database" can still lose an inplace update. The v14+ UPDATE fix needs commit `86dc90056d`, and it wasn't worth reimplementing that fix without such infrastructure. Reviewed by Nitin Motiani and (in earlier versions) Heikki Linnakangas. Discussion: https://postgr.es/m/20231027214946.79.nmisch@google.com	2024-09-24 15:25:18 -07:00
Noah Misch	a07e03fd8f	Fix data loss at inplace update after heap_update(). As previously-added tests demonstrated, heap_inplace_update() could instead update an unrelated tuple of the same catalog. It could lose the update. Losing relhasindex=t was a source of index corruption. Inplace-updating commands like VACUUM will now wait for heap_update() commands like GRANT TABLE and GRANT DATABASE. That isn't ideal, but a long-running GRANT already hurts VACUUM progress more just by keeping an XID running. The VACUUM will behave like a DELETE or UPDATE waiting for the uncommitted change. For implementation details, start at the systable_inplace_update_begin() header comment and README.tuplock. Back-patch to v12 (all supported versions). In back branches, retain a deprecated heap_inplace_update(), for extensions. Reported by Smolkin Grigory. Reviewed by Nitin Motiani, (in earlier versions) Heikki Linnakangas, and (in earlier versions) Alexander Lakhin. Discussion: https://postgr.es/m/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com	2024-09-24 15:25:18 -07:00
Jeff Davis	ac30021356	Allow length=-1 for NUL-terminated input to pg_strncoll(), etc. Like ICU, allow a length of -1 to be specified for NUL-terminated arguments to pg_strncoll(), pg_strnxfrm(), and pg_strnxfrm_prefix(). Simplifies the code and comments. Discussion: https://postgr.es/m/2d758e07dff26bcc7cbe2aec57431329bfe3679a.camel@j-davis.com	2024-09-24 15:15:18 -07:00
Jeff Davis	ceeaaed87a	Tighten up make_libc_collator() and make_icu_collator(). Ensure that error paths within these functions do not leak a collator, and return the result rather than using an out parameter. (Error paths in the caller may still result in a leaked collator, which will be addressed separately.) In make_libc_collator(), if the first newlocale() succeeds and the second one fails, close the first locale_t object. The function make_icu_collator() doesn't have any external callers, so change it to be static. Discussion: https://postgr.es/m/54d20e812bd6c3e44c10eddcd757ec494ebf1803.camel@j-davis.com	2024-09-24 12:01:45 -07:00
Nathan Bossart	6aa44060a3	Remove pg_authid's TOAST table. pg_authid's only varlena column is rolpassword, which unfortunately cannot be de-TOASTed during authentication because we haven't selected a database yet and cannot read pg_class. By removing pg_authid's TOAST table, attempts to set password hashes that require out-of-line storage will fail with a "row is too big" error instead. We may want to provide a more user-friendly error in the future, but for now let's just remove the useless TOAST table. Bumps catversion. Reported-by: Alexander Lakhin Reviewed-by: Tom Lane, Michael Paquier Discussion: https://postgr.es/m/89e8649c-eb74-db25-7945-6d6b23992394%40gmail.com	2024-09-21 15:17:46 -05:00
Tomas Vondra	c4d5cb71d2	Increase the number of fast-path lock slots Replace the fixed-size array of fast-path locks with arrays, sized on startup based on max_locks_per_transaction. This allows using fast-path locking for workloads that need more locks. The fast-path locking introduced in 9.2 allowed each backend to acquire a small number (16) of weak relation locks cheaply. If a backend needs to hold more locks, it has to insert them into the shared lock table. This is considerably more expensive, and may be subject to contention (especially on many-core systems). The limit of 16 fast-path locks was always rather low, because we have to lock all relations - not just tables, but also indexes, views, etc. For planning we need to lock all relations that might be used in the plan, not just those that actually get used in the final plan. So even with rather simple queries and schemas, we often need significantly more than 16 locks. As partitioning gets used more widely, and the number of partitions increases, this limit is trivial to hit. Complex queries may easily use hundreds or even thousands of locks. For workloads doing a lot of I/O this is not noticeable, but for workloads accessing only data in RAM, the access to the shared lock table may be a serious issue. This commit removes the hard-coded limit of the number of fast-path locks. Instead, the size of the fast-path arrays is calculated at startup, and can be set much higher than the original 16-lock limit. The overall fast-path locking protocol remains unchanged. The variable-sized fast-path arrays can no longer be part of PGPROC, but are allocated as a separate chunk of shared memory and then references from the PGPROC entries. The fast-path slots are organized as a 16-way set associative cache. You can imagine it as a hash table of 16-slot "groups". Each relation is mapped to exactly one group using hash(relid), and the group is then processed using linear search, just like the original fast-path cache. With only 16 entries this is cheap, with good locality. Treating this as a simple hash table with open addressing would not be efficient, especially once the hash table gets almost full. The usual remedy is to grow the table, but we can't do that here easily. The access would also be more random, with worse locality. The fast-path arrays are sized using the max_locks_per_transaction GUC. We try to have enough capacity for the number of locks specified in the GUC, using the traditional 2^n formula, with an upper limit of 1024 lock groups (i.e. 16k locks). The default value of max_locks_per_transaction is 64, which means those instances will have 64 fast-path slots. The main purpose of the max_locks_per_transaction GUC is to size the shared lock table. It is often set to the "average" number of locks needed by backends, with some backends using significantly more locks. This should not be a major issue, however. Some backens may have to insert locks into the shared lock table, but there can't be too many of them, limiting the contention. The only solution is to increase the GUC, even if the shared lock table already has sufficient capacity. That is not free, especially in terms of memory usage (the shared lock table entries are fairly large). It should only happen on machines with plenty of memory, though. In the future we may consider a separate GUC for the number of fast-path slots, but let's try without one first. Reviewed-by: Robert Haas, Jakub Wartak Discussion: https://postgr.es/m/510b887e-c0ce-4a0c-a17a-2c6abb8d9a5c@enterprisedb.com	2024-09-21 20:09:35 +02:00
Tom Lane	54562c9cfa	Improve Asserts checking relation matching in parallel scans. table_beginscan_parallel and index_beginscan_parallel contain Asserts checking that the relation a worker will use in a parallel scan is the same one the leader intended. However, they were checking for relation OID match, which was not strong enough to detect the mismatch problem fixed in `126ec0bc7`. What would be strong enough is to compare relfilenodes instead. Arguably, that's a saner definition anyway, since a scan surely operates on a physical relation not a logical one. Hence, store and compare RelFileLocators not relation OIDs. Also ensure that index_beginscan_parallel checks the index identity not just the table identity. Discussion: https://postgr.es/m/2127254.1726789524@sss.pgh.pa.us	2024-09-20 16:37:55 -04:00
Alexander Korotkov	014f9f34d2	Move pg_wal_replay_wait() to xlogfuncs.c This commit moves pg_wal_replay_wait() procedure to be a neighbor of WAL-related functions in xlogfuncs.c. The implementation of LSN waiting continues to reside in the same place. By proposal from Michael Paquier. Reported-by: Peter Eisentraut Discussion: https://postgr.es/m/18c0fa64-0475-415e-a1bd-665d922c5201%40eisentraut.org	2024-09-19 14:26:11 +03:00
Nathan Bossart	b52c4fc3c0	Add TOAST table to pg_index. This change allows pg_index rows to use out-of-line storage for the "indexprs" and "indpred" columns, which enables use-cases such as very large index expressions. This system catalog was previously not given a TOAST table due to a fear of circularity issues (see commit `96cdeae07f`). Testing has not revealed any such problems, and it seems unlikely that the entries for system indexes could ever need out-of-line storage. In any case, it is still early in the v18 development cycle, so committing this now will hopefully increase the chances of finding any unexpected problems prior to release. Bumps catversion. Reported-by: Jonathan Katz Reviewed-by: Tom Lane Discussion: https://postgr.es/m/b611015f-b423-458c-aa2d-be0e655cc1b4%40postgresql.org	2024-09-18 14:42:57 -05:00
Michael Paquier	b14e9ce7d5	Extend PgStat_HashKey.objid from 4 to 8 bytes This opens the possibility to define keys for more types of statistics kinds in PgStat_HashKey, the first case being 8-byte query IDs for statistics like pg_stat_statements. This increases the size of PgStat_HashKey from 12 to 16 bytes, while PgStatShared_HashEntry, entry stored in the dshash for pgstats, keeps the same size due to alignment. xl_xact_stats_item, that tracks the stats items to drop in commit WAL records, is increased from 12 to 16 bytes. Note that individual chunks in commit WAL records should be multiples of sizeof(int), hence 8-byte object IDs are stored as two uint32, based on a suggestion from Heikki Linnakangas. While on it, the field of PgStat_HashKey is renamed from "objoid" to "objid", as for some stats kinds this field does not refer to OIDs but just IDs, like for replication slot stats. This commit bumps the following format variables: - PGSTAT_FILE_FORMAT_ID, as PgStat_HashKey is written to the stats file for non-serialized stats kinds in the dshash table. - XLOG_PAGE_MAGIC for the changes in xl_xact_stats_item. - Catalog version, for the SQL function pg_stat_have_stats(). Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/ZsvTS9EW79Up8I62@paquier.xyz	2024-09-18 12:44:15 +09:00
Thomas Munro	70d38e3d8a	Allow ReadStream to be consumed as raw block numbers. Commits `041b9680` and `6377e12a` changed the interface of scan_analyze_next_block() to take a ReadStream instead of a BlockNumber and a BufferAccessStrategy, and to return a value to indicate when the stream has run out of blocks. This caused integration problems for at least one known extension that uses specially encoded BlockNumber values that map to different underlying storage, because acquire_sample_rows() sets up the stream so that read_stream_next_buffer() reads blocks from the main fork of the relation's SMgrRelation. Provide read_stream_next_block(), as a way for such an extension to access the stream of raw BlockNumbers directly and forward them to its own ReadBuffer() calls after decoding, as it could in earlier releases. The new function returns the BlockNumber and BufferAccessStrategy that were previously passed directly to scan_analyze_next_block(). Alternatively, an extension could wrap the stream of BlockNumbers in another ReadStream with a callback that performs any decoding required to arrive at real storage manager BlockNumber values, so that it could benefit from the I/O combining and concurrency provided by read_stream.c. Another class of table access method that does nothing in scan_analyze_next_block() because it is not block-oriented could use this function to control the number of block sampling loops. It could match the previous behavior with "return read_stream_next_block(stream, &bas) != InvalidBlockNumber". Ongoing work is expected to provide better ANALYZE support for table access methods that don't behave like heapam with respect to storage blocks, but that will be for future releases. Back-patch to 17. Reported-by: Mats Kindahl <mats@timescale.com> Reviewed-by: Mats Kindahl <mats@timescale.com> Discussion: https://postgr.es/m/CA%2B14425%2BCcm07ocG97Fp%2BFrD9xUXqmBKFvecp0p%2BgV2YYR258Q%40mail.gmail.com	2024-09-18 11:34:28 +12:00
Peter Eisentraut	89f908a6d0	Add temporal FOREIGN KEY contraints Add PERIOD clause to foreign key constraint definitions. This is supported for range and multirange types. Temporal foreign keys check for range containment instead of equality. This feature matches the behavior of the SQL standard temporal foreign keys, but it works on PostgreSQL's native ranges instead of SQL's "periods", which don't exist in PostgreSQL (yet). Reference actions ON {UPDATE,DELETE} {CASCADE,SET NULL,SET DEFAULT} are not supported yet. (previously committed as `34768ee361`, reverted by 8aee330af55; this is essentially unchanged from those) Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-09-17 11:29:30 +02:00
Peter Eisentraut	fc0438b4e8	Add temporal PRIMARY KEY and UNIQUE constraints Add WITHOUT OVERLAPS clause to PRIMARY KEY and UNIQUE constraints. These are backed by GiST indexes instead of B-tree indexes, since they are essentially exclusion constraints with = for the scalar parts of the key and && for the temporal part. (previously committed as `46a0cd4cef`, reverted by 46a0cd4cefb; the new part is this:) Because 'empty' && 'empty' is false, the temporal PK/UQ constraint allowed duplicates, which is confusing to users and breaks internal expectations. For instance, when GROUP BY checks functional dependencies on the PK, it allows selecting other columns from the table, but in the presence of duplicate keys you could get the value from any of their rows. So we need to forbid empties. This all means that at the moment we can only support ranges and multiranges for temporal PK/UQs, unlike the original patch (above). Documentation and tests for this are added. But this could conceivably be extended by introducing some more general support for the notion of "empty" for other types. Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-09-17 11:29:30 +02:00
Peter Eisentraut	7406ab623f	Add stratnum GiST support function This is support function 12 for the GiST AM and translates "well-known" RT*StrategyNumber values into whatever strategy number is used by the opclass (since no particular numbers are actually required). We will use this to support temporal PRIMARY KEY/UNIQUE/FOREIGN KEY/FOR PORTION OF functionality. This commit adds two implementations, one for internal GiST opclasses (just an identity function) and another for btree_gist opclasses. It updates btree_gist from 1.7 to 1.8, adding the support function for all its opclasses. (previously committed as `6db4598fcb`, reverted by 8aee330af55; this is essentially unchanged from those) Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-09-17 11:29:29 +02:00
Jeff Davis	b0c30612c5	Simplify checks for deterministic collations. Remove redundant checks for locale->collate_is_c now that we always have a valid pg_locale_t. Also, remove pg_locale_deterministic() wrapper, which is no longer useful after commit `e9931bfb75`. Just check the field directly, consistent with other fields in pg_locale_t. Author: Andreas Karlsson Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se	2024-09-12 13:35:56 -07:00
Fujii Masao	4eada203a5	Add has_largeobject_privilege function. This function checks whether a user has specific privileges on a large object, identified by OID. The user can be provided by name, OID, or default to the current user. If the specified large object doesn't exist, the function returns NULL. It raises an error for a non-existent user name. This behavior is basically consistent with other privilege inquiry functions like has_table_privilege. Bump catalog version. Author: Yugo Nagata Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/20240702163444.ab586f6075e502eb84f11b1a@sranhm.sraoss.co.jp	2024-09-12 21:51:26 +09:00
Fujii Masao	412229d197	Deduplicate code in LargeObjectExists and myLargeObjectExists. myLargeObjectExists() and LargeObjectExists() had nearly identical code, except for handling snapshots. This commit renames myLargeObjectExists() to LargeObjectExistsWithSnapshot() and refactors LargeObjectExists() to call it internally, reducing duplication. Author: Yugo Nagata Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/20240702163444.ab586f6075e502eb84f11b1a@sranhm.sraoss.co.jp	2024-09-12 21:45:42 +09:00
Peter Eisentraut	23d0b48468	Remove hardcoded hash opclass function signature exceptions hashvalidate(), which validates the signatures of support functions for the hash AM, contained several hardcoded exceptions. For example, hash/date_ops support function 1 was hashint4(), which would ordinarily fail validation because the function argument is int4, not date. But this works internally because int4 and date are of the same size. There are several more exceptions like this that happen to work and were allowed historically but would now fail the function signature validation. This patch removes those exceptions by providing new support functions that have the proper declared signatures. They internally share most of the code with the "wrong" functions they replace, so the behavior is still the same. With the exceptions gone, hashvalidate() is now simplified and relies fully on check_amproc_signature(). hashvarlena() and hashvarlenaextended() are kept in pg_proc.dat because some extensions currently use them to build hash functions for their own types, and we need to keep exposing these functions as "LANGUAGE internal" functions for that to continue to work. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/29c3b746-69e7-482a-b37c-dbbf7e5b009b@eisentraut.org	2024-09-12 12:57:43 +02:00
Michael Paquier	00c76cf21c	Move logic related to WAL replay of Heap/Heap2 into its own file This brings more clarity to heapam.c, by cleanly separating all the logic related to WAL replay and the rest of Heap and Heap2, similarly to other RMGRs like hash, btree, etc. The header reorganization is also nice in heapam.c, cutting half of the headers required. Author: Li Yong Reviewed-by: Sutou Kouhei, Michael Paquier Discussion: https://postgr.es/m/EFE55E65-D7BD-4C6A-B630-91F43FD0771B@ebay.com	2024-09-12 13:32:05 +09:00
David Rowley	9fba1ed294	Adjust tuplestore stats API `1eff8279d` added an API to tuplestore.c to allow callers to obtain storage telemetry data. That API wasn't quite good enough for callers that perform tuplestore_clear() as the telemetry functions only accounted for the current state of the tuplestore, not the maximums before tuplestore_clear() was called. There's a pending patch that would like to add tuplestore telemetry output to EXPLAIN ANALYZE for WindowAgg. That node type uses tuplestore_clear() before moving to the next window partition and we want to show the maximum space used, not the space used for the final partition. Reviewed-by: Tatsuo Ishii, Ashutosh Bapat Discussion: https://postgres/m/CAApHDvoY8cibGcicLV0fNh=9JVx9PANcWvhkdjBnDCc9Quqytg@mail.gmail.com	2024-09-12 16:02:01 +12:00
Peter Eisentraut	0785d1b8b2	common/jsonapi: support libpq as a client Based on a patch by Michael Paquier. For libpq, use PQExpBuffer instead of StringInfo. This requires us to track allocation failures so that we can return JSON_OUT_OF_MEMORY as needed rather than exit()ing. Author: Jacob Champion <jacob.champion@enterprisedb.com> Co-authored-by: Michael Paquier <michael@paquier.xyz> Co-authored-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://www.postgresql.org/message-id/flat/d1b467a78e0e36ed85a09adf979d04cf124a9d4b.camel@vmware.com	2024-09-11 09:01:07 +02:00
Peter Eisentraut	56fead44dc	Add amgettreeheight index AM API routine The only current implementation is for btree where it calls _bt_getrootheight(). Other index types can now also use this to pass information to their amcostestimate routine. Previously, btree was hardcoded and other index types could not hook into the optimizer at this point. Author: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2024-09-10 10:03:23 +02:00
Richard Guo	f5050f795a	Mark expressions nullable by grouping sets When generating window_pathkeys, distinct_pathkeys, or sort_pathkeys, we failed to realize that the grouping/ordering expressions might be nullable by grouping sets. As a result, we may incorrectly deem that the PathKeys are redundant by EquivalenceClass processing and thus remove them from the pathkeys list. That would lead to wrong results in some cases. To fix this issue, we mark the grouping expressions nullable by grouping sets if that is the case. If the grouping expression is a Var or PlaceHolderVar or constructed from those, we can just add the RT index of the RTE_GROUP RTE to the existing nullingrels field(s); otherwise we have to add a PlaceHolderVar to carry on the nullingrel bit. However, we have to manually remove this nullingrel bit from expressions in various cases where these expressions are logically below the grouping step, such as when we generate groupClause pathkeys for grouping sets, or when we generate PathTarget for initial input to grouping nodes. Furthermore, in set_upper_references, the targetlist and quals of an Agg node should have nullingrels that include the effects of the grouping step, ie they will have nullingrels equal to the input Vars/PHVs' nullingrels plus the nullingrel bit that references the grouping RTE. In order to perform exact nullingrels matches, we also need to manually remove this nullingrel bit. Bump catversion because this changes the querytree produced by the parser. Thanks to Tom Lane for the idea to invent a new kind of RTE. Per reports from Geoff Winkless, Tobias Wendorff, Richard Guo from various threads. Author: Richard Guo Reviewed-by: Ashutosh Bapat, Sutou Kouhei Discussion: https://postgr.es/m/CAMbWs4_dp7e7oTwaiZeBX8+P1rXw4ThkZxh1QG81rhu9Z47VsQ@mail.gmail.com	2024-09-10 12:36:48 +09:00
Richard Guo	247dea89f7	Introduce an RTE for the grouping step If there are subqueries in the grouping expressions, each of these subqueries in the targetlist and HAVING clause is expanded into distinct SubPlan nodes. As a result, only one of these SubPlan nodes would be converted to reference to the grouping key column output by the Agg node; others would have to get evaluated afresh. This is not efficient, and with grouping sets this can cause wrong results issues in cases where they should go to NULL because they are from the wrong grouping set. Furthermore, during re-evaluation, these SubPlan nodes might use nulled column values from grouping sets, which is not correct. This issue is not limited to subqueries. For other types of expressions that are part of grouping items, if they are transformed into another form during preprocessing, they may fail to match lower target items. This can also lead to wrong results with grouping sets. To fix this issue, we introduce a new kind of RTE representing the output of the grouping step, with columns that are the Vars or expressions being grouped on. In the parser, we replace the grouping expressions in the targetlist and HAVING clause with Vars referencing this new RTE, so that the output of the parser directly expresses the semantic requirement that the grouping expressions be gotten from the grouping output rather than computed some other way. In the planner, we first preprocess all the columns of this new RTE and then replace any Vars in the targetlist and HAVING clause that reference this new RTE with the underlying grouping expressions, so that we will have only one instance of a SubPlan node for each subquery contained in the grouping expressions. Bump catversion because this changes the querytree produced by the parser. Thanks to Tom Lane for the idea to invent a new kind of RTE. Per reports from Geoff Winkless, Tobias Wendorff, Richard Guo from various threads. Author: Richard Guo Reviewed-by: Ashutosh Bapat, Sutou Kouhei Discussion: https://postgr.es/m/CAMbWs4_dp7e7oTwaiZeBX8+P1rXw4ThkZxh1QG81rhu9Z47VsQ@mail.gmail.com	2024-09-10 12:35:34 +09:00
Robert Haas	cdb6b0fdb0	Add PQfullProtocolVersion() to surface the precise protocol version. The existing function PQprotocolVersion() does not include the minor version of the protocol. In preparation for pending work that will bump that number for the first time, add a new function to provide it to clients that may care, using the (major * 10000 + minor) convention already used by PQserverVersion(). Jacob Champion based on earlier work by Jelte Fennema-Nio Discussion: http://postgr.es/m/CAOYmi+mM8+6Swt1k7XsLcichJv8xdhPnuNv7-02zJWsezuDL+g@mail.gmail.com	2024-09-09 11:54:55 -04:00
Michael Paquier	fc415edf8c	Add callbacks to control flush of fixed-numbered stats This commit adds two callbacks in pgstats to have a better control of the flush timing of pgstat_report_stat(), whose operation depends on the three PGSTAT_*_INTERVAL variables: - have_fixed_pending_cb(), to check if a stats kind has any pending data waiting for a flush. This is used as a fast path if there are no pending statistics to flush, and this check is done for fixed-numbered statistics only if there are no variable-numbered statistics to flush. A flush will need to happen if at least one callback reports any pending data. - flush_fixed_cb(), to do the actual flush. These callbacks are currently used by the SLRU, WAL and IO statistics, generalizing the concept for all stats kinds (builtin and custom). The SLRU and IO stats relied each on one global variable to determine whether a flush should happen; these are now local to pgstat_slru.c and pgstat_io.c, cleaning up a bit how the pending flush states are tracked in pgstat.c. pgstat_flush_io() and pgstat_flush_wal() are still required, but we do not need to check their return result anymore. Reviewed-by: Bertrand Drouvot, Kyotaro Horiguchi Discussion: https://postgr.es/m/ZtaVO0N-aTwiAk3w@paquier.xyz	2024-09-09 11:12:29 +09:00
Jeff Davis	51edc4ca54	Remove lc_ctype_is_c(). Instead always fetch the locale and look at the ctype_is_c field. hba.c relies on regexes working for the C locale without needing catalog access, which worked before due to a special case for C_COLLATION_OID in lc_ctype_is_c(). Move the special case to pg_set_regex_collation() now that lc_ctype_is_c() is gone. Author: Andreas Karlsson Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se	2024-09-06 13:23:21 -07:00
Amit Langote	3422f5f93f	Update comment about ExprState.escontext The updated comment provides more helpful guidance by mentioning that escontext should be set when soft error handling is needed. Reported-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com Backpatch-through: 17	2024-09-06 10:13:53 +09:00
Michael Paquier	1b373aed20	Add callback for backend initialization in pgstats pgstat_initialize() is currently used by the WAL stats as a code path to take some custom actions when a backend starts. A callback is added to generalize the concept so as all stats kinds can do the same, for builtin and custom kinds, if set. Reviewed-by: Bertrand Drouvot, Kyotaro Horiguchi Discussion: https://postgr.es/m/ZtZr1K4PLdeWclXY@paquier.xyz	2024-09-05 16:05:21 +09:00
David Rowley	908a968612	Optimize WindowAgg's use of tuplestores When WindowAgg finished one partition of a PARTITION BY, it previously would call tuplestore_end() to purge all the stored tuples before again calling tuplestore_begin_heap() and carefully setting up all of the tuplestore read pointers exactly as required for the given frameOptions. Since the frameOptions don't change between partitions, this part does not make much sense. For queries that had very few rows per partition, the overhead of this was very large. It seems much better to create the tuplestore and the read pointers once and simply call tuplestore_clear() at the end of each partition. tuplestore_clear() moves all of the read pointers back to the start position and deletes all the previously stored tuples. A simple test query with 1 million partitions and 1 tuple per partition has been shown to run around 40% faster than without this change. The additional effort seems to have mostly been spent in malloc/free. Making this work required adding a new bool field to WindowAggState which had the unfortunate effect of being the 9th bool field in a group resulting in the struct being enlarged. Here we shuffle the fields around a little so that the two bool fields for runcondition relating stuff fit into existing padding. Also, move the "runcondition" field to be near those. This frees up enough space with the other bool fields so that the newly added one fits into the padding bytes. This was done to address a very small but apparent performance regression with queries containing a large number of rows per partition. Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Tatsuo Ishii <ishii@postgresql.org> Discussion: https://postgr.es/m/CAHoyFK9n-QCXKTUWT_xxtXninSMEv%2BgbJN66-y6prM3f4WkEHw%40mail.gmail.com	2024-09-05 16:18:30 +12:00
Jeff Davis	06421b0843	Remove lc_collate_is_c(). Instead just look up the collation and check collate_is_c field. Author: Andreas Karlsson Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se	2024-09-04 14:35:25 -07:00
Michael Paquier	b4db64270e	Apply more quoting to GUC names in messages This is a continuation of `17974ec259`. More quotes are applied to GUC names in error messages and hints, taking care of what seems to be all the remaining holes currently in the tree for the GUCs. Author: Peter Smith Discussion: https://postgr.es/m/CAHut+Pv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w@mail.gmail.com	2024-09-04 13:50:44 +09:00
Amit Kapila	6c2b5edecc	Collect statistics about conflicts in logical replication. This commit adds columns in view pg_stat_subscription_stats to show the number of times a particular conflict type has occurred during the application of logical replication changes. The following columns are added: confl_insert_exists: Number of times a row insertion violated a NOT DEFERRABLE unique constraint. confl_update_origin_differs: Number of times an update was performed on a row that was previously modified by another origin. confl_update_exists: Number of times that the updated value of a row violates a NOT DEFERRABLE unique constraint. confl_update_missing: Number of times that the tuple to be updated is missing. confl_delete_origin_differs: Number of times a delete was performed on a row that was previously modified by another origin. confl_delete_missing: Number of times that the tuple to be deleted is missing. The update_origin_differs and delete_origin_differs conflicts can be detected only when track_commit_timestamp is enabled. Author: Hou Zhijie Reviewed-by: Shveta Malik, Peter Smith, Anit Kapila Discussion: https://postgr.es/m/OS0PR01MB57160A07BD575773045FC214948F2@OS0PR01MB5716.jpnprd01.prod.outlook.com	2024-09-04 08:55:21 +05:30
Noah Misch	c582b75851	Add block_range_read_stream_cb(), to deduplicate code. This replaces two functions for iterating over all blocks in a range. A pending patch will use this instead of adding a third. Nazir Bilal Yavuz Discussion: https://postgr.es/m/20240820184742.f2.nmisch@google.com	2024-09-03 10:46:20 -07:00
Peter Eisentraut	2b5f57977f	Add const qualifiers to XLogRegister*() functions Add const qualifiers to XLogRegisterData() and XLogRegisterBufData(). Several unconstify() calls can be removed. Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Discussion: https://www.postgresql.org/message-id/dd889784-9ce7-436a-b4f1-52e4a5e577bd@eisentraut.org	2024-09-03 08:06:03 +02:00
Michael Paquier	4236825197	Fix typos and grammar in code comments and docs Author: Alexander Lakhin Discussion: https://postgr.es/m/f7e514cf-2446-21f1-a5d2-8c089a6e2168@gmail.com	2024-09-03 14:49:04 +09:00
Michael Paquier	c7cd2d6ed0	Define PG_TBLSPC_DIR for path pg_tblspc/ in data folder Similarly to `2065ddf5e3`, this introduces a define for "pg_tblspc". This makes the style more consistent with the existing PG_STAT_TMP_DIR, for example. There is a difference with the other cases with the introduction of PG_TBLSPC_DIR_SLASH, required in two places for recovery and backups. Author: Bertrand Drouvot Reviewed-by: Ashutosh Bapat, Álvaro Herrera, Yugo Nagata, Michael Paquier Discussion: https://postgr.es/m/ZryVvjqS9SnV1GPP@ip-10-97-1-34.eu-west-3.compute.internal	2024-09-03 09:11:54 +09:00
Daniel Gustafsson	a70e01d430	Remove support for OpenSSL older than 1.1.0 OpenSSL 1.0.2 has been EOL from the upstream OpenSSL project for some time, and is no longer the default OpenSSL version with any vendor which package PostgreSQL. By retiring support for OpenSSL 1.0.2 we can remove a lot of no longer required complexity for managing state within libcrypto which is now handled by OpenSSL. Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/ZG3JNursG69dz1lr@paquier.xyz Discussion: https://postgr.es/m/CA+hUKGKh7QrYzu=8yWEUJvXtMVm_CNWH1L_TLWCbZMwbi1XP2Q@mail.gmail.com	2024-09-02 13:51:48 +02:00
Peter Eisentraut	4d5111b3f1	More use of getpwuid_r() directly Remove src/port/user.c, call getpwuid_r() directly. This reduces some complexity and allows better control of the error behavior. For example, the old code would in some circumstances silently truncate the result string, or produce error message strings that the caller wouldn't use. src/port/user.c used to be called src/port/thread.c and contained various portability complications to support thread-safety. These are all obsolete, and all but the user-lookup functions have already been removed. This patch completes this by also removing the user-lookup functions. Also convert src/backend/libpq/auth.c to use getpwuid_r() for thread-safety. Originally, I tried to be overly correct by using sysconf(_SC_GETPW_R_SIZE_MAX) to get the buffer size for getpwuid_r(), but that doesn't work on FreeBSD. All the OS where I could find the source code internally use 1024 as the suggested buffer size, so I just ended up hardcoding that. The previous code used BUFSIZ, which is an unrelated constant from stdio.h, so its use seemed inappropriate. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://www.postgresql.org/message-id/flat/5f293da9-ceb4-4937-8e52-82c25db8e4d3%40eisentraut.org	2024-09-02 09:04:30 +02:00
Michael Paquier	c39afc38cf	Define PG_LOGICAL_DIR for path pg_logical/ in data folder This is similar to `2065ddf5e3`, but this time for pg_logical/ itself and its contents, like the paths for snapshots, mappings or origin checkpoints. Author: Bertrand Drouvot Reviewed-by: Ashutosh Bapat, Yugo Nagata, Michael Paquier Discussion: https://postgr.es/m/ZryVvjqS9SnV1GPP@ip-10-97-1-34.eu-west-3.compute.internal	2024-08-30 15:25:12 +09:00
Michael Paquier	2065ddf5e3	Define PG_REPLSLOT_DIR for path pg_replslot/ in data folder This commit replaces most of the hardcoded values of "pg_replslot" by a new PG_REPLSLOT_DIR #define. This makes the style more consistent with the existing PG_STAT_TMP_DIR, for example. More places will follow a similar change. Author: Bertrand Drouvot Reviewed-by: Ashutosh Bapat, Yugo Nagata, Michael Paquier Discussion: https://postgr.es/m/ZryVvjqS9SnV1GPP@ip-10-97-1-34.eu-west-3.compute.internal	2024-08-30 10:42:21 +09:00
Michael Paquier	a83a944e9f	Rename pg_sequence_read_tuple() to pg_get_sequence_data() This commit removes log_cnt from the tuple returned by the SQL function. This field is an internal counter that tracks when a WAL record should be generated for a sequence, and it is reset each time the sequence is restored or recovered. It is not necessary to rebuild the sequence DDL commands for pg_dump and pg_upgrade where this function is used. The field can still be queried with a scan of the "table" created under-the-hood for a sequence. Issue noticed while hacking on a feature that can rely on this new function rather than pg_sequence_last_value(), aimed at making sequence computation more easily pluggable. Bump catalog version. Reviewed-by: Nathan Bossart Discussion: https://postgr.es/m/Zsvka3r-y2ZoXAdH@paquier.xyz	2024-08-30 08:49:24 +09:00
Heikki Linnakangas	478846e768	Rename some shared memory initialization routines To make them follow the usual naming convention where FoobarShmemSize() calculates the amount of shared memory needed by Foobar subsystem, and FoobarShmemInit() performs the initialization. I didn't rename CreateLWLocks() and InitShmmeIndex(), because they are a little special. They need to be called before any of the other ShmemInit() functions, because they set up the shared memory bookkeeping itself. I also didn't rename InitProcGlobal(), because unlike other Shmeminit functions, it's not called by individual backends. Reviewed-by: Andreas Karlsson Discussion: https://www.postgresql.org/message-id/c09694ff-2453-47e5-b26c-32a16cd75ce6@iki.fi	2024-08-29 09:46:21 +03:00
Heikki Linnakangas	fbce7dfc77	Refactor lock manager initialization to make it a bit less special Split the shared and local initialization to separate functions, and follow the common naming conventions. With this, we no longer create the LockMethodLocalHash hash table in the postmaster process, which was always pointless. Reviewed-by: Andreas Karlsson Discussion: https://www.postgresql.org/message-id/c09694ff-2453-47e5-b26c-32a16cd75ce6@iki.fi	2024-08-29 09:46:06 +03:00
Amit Kapila	640178c92e	Rename the conflict types for the origin differ cases. The conflict types 'update_differ' and 'delete_differ' indicate that a row to be modified was previously altered by another origin. Rename those to 'update_origin_differs' and 'delete_origin_differs' to clarify their meaning. Author: Hou Zhijie Reviewed-by: Shveta Malik, Peter Smith Discussion: https://postgr.es/m/CAA4eK1+HEKwG_UYt4Zvwh5o_HoCKCjEGesRjJX38xAH3OxuuYA@mail.gmail.com	2024-08-29 09:12:12 +05:30
Peter Eisentraut	6654bb9204	Add prefetching support on macOS macOS doesn't have posix_fadvise(), but fcntl() with the F_RDADVISE command does the same thing. Some related documentation has been generalized to not mention posix_advise() specifically anymore. Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/0827edec-1317-4917-a186-035eb1e3241d%40eisentraut.org	2024-08-28 07:28:27 +02:00
Alexander Korotkov	3890d90c15	Revert support for ALTER TABLE ... MERGE/SPLIT PARTITION(S) commands This commit reverts `1adf16b8fb`, `87c21bb941`, and subsequent fixes and improvements including `df64c81ca9`, `c99ef1811a`, `9dfcac8e15`, `885742b9f8`, `842c9b2705`, `fcf80c5d5f`, `96c7381c4c`, `f4fc7cb54b`, `60ae37a8bc`, `259c96fa8f`, `449cdcd486`, `3ca43dbbb6`, `2a679ae94e`, `3a82c689fd`, `fbd4321fd5`, `d53a4286d7`, `c086896625`, `4e5d6c4091`, `04158e7fa3`. The reason for reverting is security issues related to repeatable name lookups (CVE-2014-0062). Even though `04158e7fa3` solved part of the problem, there are still remaining issues, which aren't feasible to even carefully analyze before the RC deadline. Reported-by: Noah Misch, Robert Haas Discussion: https://postgr.es/m/20240808171351.a9.nmisch%40google.com Backpatch-through: 17	2024-08-24 18:48:48 +03:00
Peter Eisentraut	a2bbc58f74	thread-safety: gmtime_r(), localtime_r() Use gmtime_r() and localtime_r() instead of gmtime() and localtime(), for thread-safety. There are a few affected calls in libpq and ecpg's libpgtypes, which are probably effectively bugs, because those libraries already claim to be thread-safe. There is one affected call in the backend. Most of the backend otherwise uses the custom functions pg_gmtime() and pg_localtime(), which are implemented differently. While we're here, change the call in the backend to gmtime() instead of localtime(), since for that use time zone behavior is irrelevant, and this side-steps any questions about when time zones are initialized by localtime_r() vs localtime(). Portability: gmtime_r() and localtime_r() are in POSIX but are not available on Windows. Windows has functions gmtime_s() and localtime_s() that can fulfill the same purpose, so we add some small wrappers around them. (Note that these _s() functions are also different from the _s() functions in the bounds-checking extension of C11. We are not using those here.) On MinGW, you can get the POSIX-style *_r() functions by defining _POSIX_C_SOURCE appropriately before including <time.h>. This leads to a conflict at least in plpython because apparently _POSIX_C_SOURCE gets defined in some header there, and then our replacement definitions conflict with the system definitions. To avoid that sort of thing, we now always define _POSIX_C_SOURCE on MinGW and use the POSIX-style functions here. Reviewed-by: Stepan Neretin <sncfmgg@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/eba1dc75-298e-4c46-8869-48ba8aad7d70@eisentraut.org	2024-08-23 07:43:04 +02:00
Alexander Korotkov	04158e7fa3	Avoid repeated table name lookups in createPartitionTable() Currently, createPartitionTable() opens newly created table using its name. This approach is prone to privilege escalation attack, because we might end up opening another table than we just created. This commit address the issue above by opening newly created table by its OID. It appears to be tricky to get a relation OID out of ProcessUtility(). We have to extend TableLikeClause with new newRelationOid field, which is filled within ProcessUtility() to be further accessed by caller. Security: CVE-2014-0062 Reported-by: Noah Misch Discussion: https://postgr.es/m/20240808171351.a9.nmisch%40google.com Reviewed-by: Pavel Borisov, Dmitry Koval	2024-08-22 09:50:48 +03:00
Michael Paquier	490f869d92	Create syscache entries for pg_extension Two syscache identifiers are added for extension names and OIDs. Shared libraries of extensions might want to invalidate or update their own caches whenever a CREATE, ALTER or DROP EXTENSION command is run for their extension (in any backend). Right now this is non-trivial to do correctly and efficiently, but, if an extension catalog is part of a syscache, this could simply be done by registering an callback using CacheRegisterSyscacheCallback for the relevant syscache. Another case where this is useful is a loaded library where some of its code paths rely on some objects of the extension to exist; it can be simpler and more efficient to do an existence check directly on the extension through the syscache. Author: Jelte Fennema-Nio Reviewed-by: Alexander Korotkov, Pavel Stehule Discussion: https://postgr.es/m/CAGECzQTWm9sex719Hptbq4j56hBGUti7J9OWjeMobQ1ccRok9w@mail.gmail.com	2024-08-22 10:48:25 +09:00
Robert Haas	c01743aa48	Show number of disabled nodes in EXPLAIN ANALYZE output. Now that disable_cost is not included in the cost estimate, there's no visible sign in EXPLAIN output of which plan nodes are disabled. Fix that by propagating the number of disabled nodes from Path to Plan, and then showing it in the EXPLAIN output. There is some question about whether this is a desirable change. While I personally believe that it is, it seems best to make it a separate commit, in case we decide to back out just this part, or rework it. Reviewed by Andres Freund, Heikki Linnakangas, and David Rowley. Discussion: http://postgr.es/m/CA+TgmoZ_+MS+o6NeGK2xyBv-xM+w1AfFVuHE4f_aq6ekHv7YSQ@mail.gmail.com	2024-08-21 10:14:35 -04:00
Robert Haas	e222534679	Treat number of disabled nodes in a path as a separate cost metric. Previously, when a path type was disabled by e.g. enable_seqscan=false, we either avoided generating that path type in the first place, or more commonly, we added a large constant, called disable_cost, to the estimated startup cost of that path. This latter approach can distort planning. For instance, an extremely expensive non-disabled path could seem to be worse than a disabled path, especially if the full cost of that path node need not be paid (e.g. due to a Limit). Or, as in the regression test whose expected output changes with this commit, the addition of disable_cost can make two paths that would normally be distinguishible in cost seem to have fuzzily the same cost. To fix that, we now count the number of disabled path nodes and consider that a high-order component of both the startup cost and the total cost. Hence, the path list is now sorted by disabled_nodes and then by total_cost, instead of just by the latter, and likewise for the partial path list. It is important that this number is a count and not simply a Boolean; else, as soon as we're unable to respect disabled path types in all portions of the path, we stop trying to avoid them where we can. Because the path list is now sorted by the number of disabled nodes, the join prechecks must compute the count of disabled nodes during the initial cost phase instead of postponing it to final cost time. Counts of disabled nodes do not cross subquery levels; at present, there is no reason for them to do so, since the we do not postpone path selection across subquery boundaries (see make_subplan). Reviewed by Andres Freund, Heikki Linnakangas, and David Rowley. Discussion: http://postgr.es/m/CA+TgmoZ_+MS+o6NeGK2xyBv-xM+w1AfFVuHE4f_aq6ekHv7YSQ@mail.gmail.com	2024-08-21 10:12:30 -04:00
Amit Kapila	3f28b2fcac	Don't advance origin during apply failure. We advance origin progress during abort on successful streaming and application of ROLLBACK in parallel streaming mode. But the origin shouldn't be advanced during an error or unsuccessful apply due to shutdown. Otherwise, it will result in a transaction loss as such a transaction won't be sent again by the server. Reported-by: Hou Zhijie Author: Hayato Kuroda and Shveta Malik Reviewed-by: Amit Kapila Backpatch-through: 16 Discussion: https://postgr.es/m/TYAPR01MB5692FAC23BE40C69DA8ED4AFF5B92@TYAPR01MB5692.jpnprd01.prod.outlook.com	2024-08-21 09:22:32 +05:30
Michael Paquier	15c1abd977	Remove _PG_fini() `ab02d702ef` has removed from the backend the code able to support the unloading of modules, because this has never worked. This removes the last references to _PG_fini(), that could be used as a callback for modules to manipulate the stack when unloading a library. The test module ldap_password_func had the idea to declare it, doing nothing. The function declaration in fmgr.h is gone. It was left around in 2022 to avoid breaking extension code, but at this stage there are also benefits in letting extension developers know that keeping the unloading code is pointless and this move leads to less maintenance. Reviewed-by: Tom Lane, Heikki Linnakangas Discussion: https://postgr.es/m/ZsQfi0AUJoMF6NSd@paquier.xyz	2024-08-21 07:24:03 +09:00
Amit Kapila	9758174e2e	Log the conflicts while applying changes in logical replication. This patch provides the additional logging information in the following conflict scenarios while applying changes: insert_exists: Inserting a row that violates a NOT DEFERRABLE unique constraint. update_differ: Updating a row that was previously modified by another origin. update_exists: The updated row value violates a NOT DEFERRABLE unique constraint. update_missing: The tuple to be updated is missing. delete_differ: Deleting a row that was previously modified by another origin. delete_missing: The tuple to be deleted is missing. For insert_exists and update_exists conflicts, the log can include the origin and commit timestamp details of the conflicting key with track_commit_timestamp enabled. update_differ and delete_differ conflicts can only be detected when track_commit_timestamp is enabled on the subscriber. We do not offer additional logging for exclusion constraint violations because these constraints can specify rules that are more complex than simple equality checks. Resolving such conflicts won't be straightforward. This area can be further enhanced if required. Author: Hou Zhijie Reviewed-by: Shveta Malik, Amit Kapila, Nisha Moond, Hayato Kuroda, Dilip Kumar Discussion: https://postgr.es/m/OS0PR01MB5716352552DFADB8E9AD1D8994C92@OS0PR01MB5716.jpnprd01.prod.outlook.com	2024-08-20 08:35:11 +05:30
David Rowley	adf97c1562	Speed up Hash Join by making ExprStates support hashing Here we add ExprState support for obtaining a 32-bit hash value from a list of expressions. This allows both faster hashing and also JIT compilation of these expressions. This is especially useful when hash joins have multiple join keys as the previous code called ExecEvalExpr on each hash join key individually and that was inefficient as tuple deformation would have only taken into account one key at a time, which could lead to walking the tuple once for each join key. With the new code, we'll determine the maximum attribute required and deform the tuple to that point only once. Some performance tests done with this change have shown up to a 20% performance increase of a query containing a Hash Join without JIT compilation and up to a 26% performance increase when JIT is enabled and optimization and inlining were performed by the JIT compiler. The performance increase with 1 join column was less with a 14% increase with and without JIT. This test was done using a fairly small hash table and a large number of hash probes. The increase will likely be less with large tables, especially ones larger than L3 cache as memory pressure is more likely to be the limiting factor there. This commit only addresses Hash Joins, but lays expression evaluation and JIT compilation infrastructure for other hashing needs such as Hash Aggregate. Author: David Rowley Reviewed-by: Alexey Dvoichenkov <alexey@hyperplane.net> Reviewed-by: Tels <nospam-pg-abuse@bloodgate.com> Discussion: https://postgr.es/m/CAApHDvoexAxgQFNQD_GRkr2O_eJUD1-wUGm%3Dm0L%2BGc%3DT%3DkEa4g%40mail.gmail.com	2024-08-20 13:38:22 +12:00
Nathan Bossart	9e9a2b7031	Remove dependence on -fwrapv semantics in a few places. This commit attempts to update a few places, such as the money, numeric, and timestamp types, to no longer rely on signed integer wrapping for correctness. This is intended to move us closer towards removing -fwrapv, which may enable some compiler optimizations. However, there is presently no plan to actually remove that compiler option in the near future. Besides using some of the existing overflow-aware routines in int.h, this commit introduces and makes use of some new ones. Specifically, it adds functions that accept a signed integer and return its absolute value as an unsigned integer with the same width (e.g., pg_abs_s64()). It also adds functions that accept an unsigned integer, store the result of negating that integer in a signed integer with the same width, and return whether the negation overflowed (e.g., pg_neg_u64_overflow()). Finally, this commit adds a couple of tests for timestamps near POSTGRES_EPOCH_JDATE. Author: Joseph Koshakow Reviewed-by: Tom Lane, Heikki Linnakangas, Jian He Discussion: https://postgr.es/m/CAAvxfHdBPOyEGS7s%2Bxf4iaW0-cgiq25jpYdWBqQqvLtLe_t6tw%40mail.gmail.com	2024-08-15 15:47:31 -05:00
David Rowley	80ffcb8427	Improve ALTER PUBLICATION validation and error messages Attempting to add a system column for a table to an existing publication would result in the not very intuitive error message of: ERROR: negative bitmapset member not allowed Here we improve that to have it display the same error message as a user would see if they tried adding a system column for a table when adding it to the publication in the first place. Doing this requires making the function which validates the list of columns an extern function. The signature of the static function wasn't an ideal external API as it made the code more complex than it needed to be. Here we adjust the function to have it populate a Bitmapset of attribute numbers. Doing it this way allows code simplification. There was no particular bug here other than the weird error message, so no backpatch. Bug: #18558 Reported-by: Alexander Lakhin <exclusion@gmail.com> Author: Peter Smith, David Rowley Discussion: https://postgr.es/m/18558-411bc81b03592125@postgresql.org	2024-08-15 13:10:25 +12:00
Peter Eisentraut	5304fec4d8	Apply PGDLLIMPORT markings to some GUC variables According to the commit message in `8ec569479`, we must have all variables in header files marked with PGDLLIMPORT. In commit `d3cc5ffe81` some variables were moved from launch_backend.c file to several header files. This adds PGDLLIMPORT to moved variables. Author: Sofia Kopikova <s.kopikova@postgrespro.ru> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/e0b17014-5319-4dd6-91cd-93d9c8fc9539%40postgrespro.ru	2024-08-14 11:36:12 +02:00
Peter Eisentraut	c8e2d422fd	Remove TRACE_SORT macro The TRACE_SORT macro guarded the availability of the trace_sort GUC setting. But it has been enabled by default ever since it was introduced in PostgreSQL 8.1, and there have been no reports that someone wanted to disable it. So just remove the macro to simplify things. (For the avoidance of doubt: The trace_sort GUC is still there. This only removes the rarely-used macro guarding it.) Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://www.postgresql.org/message-id/flat/be5f7162-7c1d-44e3-9a78-74dcaa6529f2%40eisentraut.org	2024-08-14 08:07:52 +02:00
Masahiko Sawada	c584781bcc	Use pgBufferUsage for buffer usage tracking in analyze. Previously, (auto)analyze used global variables VacuumPageHit, VacuumPageMiss, and VacuumPageDirty to track buffer usage. However, pgBufferUsage provides a more generic way to track buffer usage with support functions. This change replaces those global variables with pgBufferUsage in analyze. Since analyze was the sole user of those variables, it removes their declarations. Vacuum previously used those variables but replaced them with pgBufferUsage as part of a bug fix, commit `5cd72cc0c`. Additionally, it adjusts the buffer usage message in both vacuum and analyze for better consistency. Author: Anthonin Bonnefoy Reviewed-by: Masahiko Sawada, Michael Paquier Discussion: https://postgr.es/m/CAO6_Xqr__kTTCLkftqS0qSCm-J7_xbRG3Ge2rWhucxQJMJhcRA%40mail.gmail.com	2024-08-13 18:49:45 -07:00
Thomas Munro	14c648ff00	All POSIX systems have langinfo.h and CODESET. We don't need configure probes for HAVE_LANGINFO_H (it is implied by !WIN32), and we don't need to consider systems that have it but don't define CODESET (that was for OpenBSD in commit `81cca218`, but it has now had it for 19 years). Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CA%2BhUKGJqVe0%2BPv9dvC9dSums_PXxGo9SWcxYAMBguWJUGbWz-A%40mail.gmail.com	2024-08-13 22:13:52 +12:00
Peter Geoghegan	1343ae954c	Give nbtree move right function internal linkage. Declare _bt_moveright() static. This is a minor modularity win; the routine was already private to nbtsearch.c for all practical purposes. Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAEze2WgWVzCNEXQB_op5MMZMDgJ3fg3AhVm6bq2iZPpJNXGhWw@mail.gmail.com	2024-08-12 14:36:55 -04:00
Nathan Bossart	760162fedb	Add user-callable CRC functions. We've had code for CRC-32 and CRC-32C for some time (for WAL records, etc.), but there was no way for users to call it, despite apparent popular demand. The new crc32() and crc32c() functions accept bytea input and return bigint (to avoid returning negative values). Bumps catversion. Author: Aleksander Alekseev Reviewed-by: Peter Eisentraut, Tom Lane Discussion: https://postgr.es/m/CAJ7c6TNMTGnqnG%3DyXXUQh9E88JDckmR45H2Q%2B%3DucaCLMOW1QQw%40mail.gmail.com	2024-08-12 10:35:06 -05:00
David Rowley	313df8f5ad	Fix outdated comments A few fields in ResultRelInfo are now also used for MERGE. Update the comments to mention that. Reported-by: jian he <jian.universality@gmail.com> Discussion: https://postgr.es/m/CACJufxH8-NvFhLcSZZTTW+1M9AfS4+SOTKmyPG7ZhzNvN=+NkA@mail.gmail.com:wq	2024-08-12 23:41:13 +12:00
David Rowley	ffabb56c94	Fix a series of typos and outdated references Author: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/c1d63754-cb85-2d8a-8409-bde2c4d2d04b@gmail.com	2024-08-12 23:27:09 +12:00
David Rowley	f0d1127595	Remove "parent" column from pg_backend_memory_contexts `32d3ed816` added the "path" column to pg_backend_memory_contexts to allow a stable method of obtaining the parent MemoryContext of a given row in the view. Using the "path" column is now the preferred method of obtaining the parent row. Previously, any queries which were self-joining to this view using the "name" and "parent" columns could get incorrect results due to the fact that names are not unique. Here we aim to explicitly break such queries so that they can be corrected and use the "path" column instead. It is possible that there are more innocent users of the parent column that just need an indication of the parent and having to write out a self-joining CTE may be an unnecessary hassle for those cases. Let's remove the column for now and see if anyone comes back with any complaints. This does seem like a good time to attempt to get rid of the column as we still have around 1 year to revert this if someone comes back with a valid complaint. Plus this view is new to v14 and is quite niche, so perhaps not many people will be affected. Author: Melih Mutlu <m.melihmutlu@gmail.com> Discussion: https://postgr.es/m/CAGPVpCT7NOe4fZXRL8XaoxHpSXYTu6GTpULT_3E-HT9hzjoFRA@mail.gmail.com	2024-08-12 15:42:16 +12:00
Tom Lane	364de74cff	Allow adjusting session_authorization and role in parallel workers. The code intends to allow GUCs to be set within parallel workers via function SET clauses, but not otherwise. However, doing so fails for "session_authorization" and "role", because the assign hooks for those attempt to set the subsidiary "is_superuser" GUC, and that call falls foul of the "not otherwise" prohibition. We can't switch to using GUC_ACTION_SAVE for this, so instead add a new GUC variable flag GUC_ALLOW_IN_PARALLEL to mark is_superuser as being safe to set anyway. (This is okay because is_superuser has context PGC_INTERNAL and thus only hard-wired calls can change it. We'd need more thought before applying the flag to other GUCs; but maybe there are other use-cases.) This isn't the prettiest fix perhaps, but other alternatives we thought of would be much more invasive. While here, correct a thinko in commit `059de3ca4`: when rejecting a GUC setting within a parallel worker, we should return 0 not -1 if the ereport doesn't longjmp. (This seems to have no consequences right now because no caller cares, but it's inconsistent.) Improve the comments to try to forestall future confusion of the same kind. Despite the lack of field complaints, this seems worth back-patching. Thanks to Nathan Bossart for the idea to invent a new flag, and for review. Discussion: https://postgr.es/m/2833457.1723229039@sss.pgh.pa.us	2024-08-10 15:51:30 -04:00
Heikki Linnakangas	28a520c0b7	Refactor code to handle death of a backend or bgworker in postmaster Currently, when a child process exits, the postmaster first scans through BackgroundWorkerList, to see if it the child process was a background worker. If not found, then it scans through BackendList to see if it was a regular backend. That leads to some duplication between the bgworker and regular backend cleanup code, as both have an entry in the BackendList that needs to be cleaned up in the same way. Refactor that so that we scan just the BackendList to find the child process, and if it was a background worker, do the additional bgworker-specific cleanup in addition to the normal Backend cleanup. Change HandleChildCrash so that it doesn't try to handle the cleanup of the process that already exited, only the signaling of all the other processes. When called for any of the aux processes, the caller had already cleared the *PID global variable, so the code in HandleChildCrash() to do that was unused. On Windows, if a child process exits with ERROR_WAIT_NO_CHILDREN, it's now logged with that exit code, instead of 0. Also, if a bgworker exits with ERROR_WAIT_NO_CHILDREN, it's now treated as crashed and is restarted. Previously it was treated as a normal exit. If a child process is not found in the BackendList, the log message now calls it "untracked child process" rather than "server process". Arguably that should be a PANIC, because we do track all the child processes in the list, so failing to find a child process is highly unexpected. But if we want to change that, let's discuss and do that as a separate commit. Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://www.postgresql.org/message-id/835232c0-a5f7-4f20-b95b-5b56ba57d741@iki.fi	2024-08-10 00:04:43 +03:00
Heikki Linnakangas	b43100fa71	Make BackgroundWorkerList doubly-linked This allows ForgetBackgroundWorker() and ReportBackgroundWorkerExit() to take a RegisteredBgWorker pointer as argument, rather than a list iterator. That feels a little more natural. But more importantly, this paves the way for more refactoring in the next commit. Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://www.postgresql.org/message-id/835232c0-a5f7-4f20-b95b-5b56ba57d741@iki.fi	2024-08-09 22:44:20 +03:00
Peter Eisentraut	7da1bdc2c2	Remove obsolete RECHECK keyword completely This used to be part of CREATE OPERATOR CLASS and ALTER OPERATOR FAMILY, but it has done nothing (except issue a NOTICE) since PostgreSQL 8.4. Commit `30e7c175b8` removed support for dumping from pre-9.2 servers, so this no longer serves any need. This now removes it completely, and you'd get a normal parse error if you used it. Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Discussion: https://www.postgresql.org/message-id/flat/113ef2d2-3657-4353-be97-f28fceddbca1%40eisentraut.org	2024-08-09 07:18:51 +02:00
Robert Haas	22b4a1b561	Improve file header comments for astramer code. Make it clear that "astreamer" stands for "archive streamer". Generalize comments that still believe this code can only be used by pg_basebackup. Add some comments explaining the asymmetry between the gzip, lz4, and zstd astreamers, in the hopes of making life easier for anyone who hacks on this code in the future. Robert Haas, reviewed by Amul Sul. Discussion: http://postgr.es/m/CAAJ_b97O2kkKVTWxt8MxDN1o-cDfbgokqtiN2yqFf48=gXpcxQ@mail.gmail.com	2024-08-07 08:49:41 -04:00
Alexander Korotkov	d0f020037e	Introduce hash_search_with_hash_value() function This new function iterates hash entries with given hash values. This function is designed to avoid full sequential hash search in the syscache invalidation callbacks. Discussion: https://postgr.es/m/5812a6e5-68ae-4d84-9d85-b443176966a1%40sigaev.ru Author: Teodor Sigaev Reviewed-by: Aleksander Alekseev, Tom Lane, Michael Paquier, Roman Zharkov Reviewed-by: Andrei Lepikhov	2024-08-07 07:06:17 +03:00
Heikki Linnakangas	d5f139cb68	Constify fields and parameters in spell.c I started by marking VoidString as const, and fixing the fallout by marking more fields and function arguments as const. It proliferated quite a lot, but all within spell.c and spell.h. A more narrow patch to get rid of the static VoidString buffer would be to replace it with '#define VoidString ""', as C99 allows assigning "" to a non-const pointer, even though you're not allowed to modify it. But it seems like good hygiene to mark all these as const. In the structs, the pointers can point to the constant VoidString, or a buffer allocated with palloc(), or with compact_palloc(), so you should not modify them. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/54c29fb0-edf2-48ea-9814-44e918bbd6e8@iki.fi	2024-08-06 23:04:51 +03:00
Heikki Linnakangas	85829c973c	Make nullSemAction const, add 'const' decorators to related functions To make it more clear that these should never be modified. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/54c29fb0-edf2-48ea-9814-44e918bbd6e8@iki.fi	2024-08-06 23:04:22 +03:00
Heikki Linnakangas	1e35951e71	Turn a few 'validnsps' static variables into locals There was no need for these to be static buffers, local variables work just as well. I think they were marked as 'static' to imply that they are read-only, but 'const' is more appropriate for that, so change them to const. To make it possible to mark the variables as 'const', also add 'const' decorations to the transformRelOptions() signature. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/54c29fb0-edf2-48ea-9814-44e918bbd6e8@iki.fi	2024-08-06 23:03:43 +03:00
Heikki Linnakangas	a54d4ed183	Fix datatypes in comments in instr_time.h The INSTR_TIME_GET_NANOSEC(t) and INSTR_TIME_GET_MICROSEC(t) macros return a signed int64. Discussion: https://www.postgresql.org/message-id/ZrHkv3MAQfwNSmTG@ip-10-97-1-34.eu-west-3.compute.internal	2024-08-06 22:15:55 +03:00
Heikki Linnakangas	39a138fbef	Revert "Fix comments in instr_time.h and remove an unneeded cast to int64" This reverts commit `3dcb09de7b`. Tom Lane pointed out that it broke the abstraction provided by the macros. The callers should not need to know what the internal type is. This commit is an exact revert, the next commit will fix the comments on the macros that incorrectly claim that they return uint64. Discussion: https://www.postgresql.org/message-id/ZrHkv3MAQfwNSmTG@ip-10-97-1-34.eu-west-3.compute.internal	2024-08-06 22:15:46 +03:00
Heikki Linnakangas	3dcb09de7b	Fix comments in instr_time.h and remove an unneeded cast to int64 `03023a2664` represented time as an int64 on all platforms but forgot to update the comment related to INSTR_TIME_GET_MICROSEC() and provided an incorrect comment for INSTR_TIME_GET_NANOSEC(). In passing remove an unneeded cast to int64. Author: Bertrand Drouvot Discussion: https://www.postgresql.org/message-id/ZrHkv3MAQfwNSmTG@ip-10-97-1-34.eu-west-3.compute.internal	2024-08-06 14:28:02 +03:00
Robert Haas	f80b09bac8	Move astreamer (except astreamer_inject) to fe_utils. This allows the code to be used by other frontend applications. Amul Sul, reviewed by Sravan Kumar, Andres Freund (whose input I specifically solicited regarding the meson.build changes), and me. Discussion: http://postgr.es/m/CAAJ_b94StvLWrc_p4q-f7n3OPfr6GhL8_XuAg2aAaYZp1tF-nw@mail.gmail.com	2024-08-05 11:41:57 -04:00
Masahiko Sawada	66e94448ab	Restrict accesses to non-system views and foreign tables during pg_dump. When pg_dump retrieves the list of database objects and performs the data dump, there was possibility that objects are replaced with others of the same name, such as views, and access them. This vulnerability could result in code execution with superuser privileges during the pg_dump process. This issue can arise when dumping data of sequences, foreign tables (only 13 or later), or tables registered with a WHERE clause in the extension configuration table. To address this, pg_dump now utilizes the newly introduced restrict_nonsystem_relation_kind GUC parameter to restrict the accesses to non-system views and foreign tables during the dump process. This new GUC parameter is added to back branches too, but these changes do not require cluster recreation. Back-patch to all supported branches. Reviewed-by: Noah Misch Security: CVE-2024-7348 Backpatch-through: 12	2024-08-05 06:05:33 -07:00
Amit Kapila	b5df24e520	Fix typo in bufpage.h. Author: Senglee Choi Reviewed-by: Tender Wang Discussion: https://postgr.es/m/CACUsy79U0=S5zWEf6D57F=vB7rOEa86xFY6oovDZ58jRcROCxQ@mail.gmail.com	2024-08-05 14:38:00 +05:30
Michael Paquier	2eff9e678d	Add helper routines to retrieve data for custom fixed-numbered pgstats This is useful for extensions to get snapshot and shmem data for custom cumulative statistics when these have a fixed number of objects, so as these do not need to know about the snapshot internals, aka pgStatLocal. An upcoming commit introducing an example template for custom cumulative stats with fixed-numbered objects will make use of these. I have noticed that this is useful for extension developers while hacking my own example, actually. Author: Michael Paquier Reviewed-by: Dmitry Dolgov, Bertrand Drouvot Discussion: https://postgr.es/m/Zmqm9j5EO0I4W8dx@paquier.xyz	2024-08-05 11:43:33 +09:00
Michael Paquier	7949d95945	Introduce pluggable APIs for Cumulative Statistics This commit adds support in the backend for $subject, allowing out-of-core extensions to plug their own custom kinds of cumulative statistics. This feature has come up a few times into the lists, and the first, original, suggestion came from Andres Freund, about pg_stat_statements to use the cumulative statistics APIs in shared memory rather than its own less efficient internals. The advantage of this implementation is that this can be extended to any kind of statistics. The stats kinds are divided into two parts: - The in-core "builtin" stats kinds, with designated initializers, able to use IDs up to 128. - The "custom" stats kinds, able to use a range of IDs from 128 to 256 (128 slots available as of this patch), with information saved in TopMemoryContext. This can be made larger, if necessary. There are two types of cumulative statistics in the backend: - For fixed-numbered objects (like WAL, archiver, etc.). These are attached to the snapshot and pgstats shmem control structures for efficiency, and built-in stats kinds still do that to avoid any redirection penalty. The data of custom kinds is stored in a first array in snapshot structure and a second array in the shmem control structure, both indexed by their ID, acting as an equivalent of the builtin stats. - For variable-numbered objects (like tables, functions, etc.). These are stored in a dshash using the stats kind ID in the hash lookup key. Internally, the handling of the builtin stats is unchanged, and both fixed and variabled-numbered objects are supported. Structure definitions for builtin stats kinds are renamed to reflect better the differences with custom kinds. Like custom RMGRs, custom cumulative statistics can only be loaded with shared_preload_libraries at startup, and must allocate a unique ID shared across all the PostgreSQL extension ecosystem with the following wiki page to avoid conflicts: https://wiki.postgresql.org/wiki/CustomCumulativeStats This makes the detection of the stats kinds and their handling when reading and writing stats much easier than, say, allocating IDs for stats kinds from a shared memory counter, that may change the ID used by a stats kind across restarts. When under development, extensions can use PGSTAT_KIND_EXPERIMENTAL. Two examples that can be used as templates for fixed-numbered and variable-numbered stats kinds will be added in some follow-up commits, with tests to provide coverage. Some documentation is added to explain how to use this plugin facility. Author: Michael Paquier Reviewed-by: Dmitry Dolgov, Bertrand Drouvot Discussion: https://postgr.es/m/Zmqm9j5EO0I4W8dx@paquier.xyz	2024-08-04 19:41:24 +09:00
Alexander Korotkov	3c5db1d6b0	Implement pg_wal_replay_wait() stored procedure pg_wal_replay_wait() is to be used on standby and specifies waiting for the specific WAL location to be replayed. This option is useful when the user makes some data changes on primary and needs a guarantee to see these changes are on standby. The queue of waiters is stored in the shared memory as an LSN-ordered pairing heap, where the waiter with the nearest LSN stays on the top. During the replay of WAL, waiters whose LSNs have already been replayed are deleted from the shared memory pairing heap and woken up by setting their latches. pg_wal_replay_wait() needs to wait without any snapshot held. Otherwise, the snapshot could prevent the replay of WAL records, implying a kind of self-deadlock. This is why it is only possible to implement pg_wal_replay_wait() as a procedure working without an active snapshot, not a function. Catversion is bumped. Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru Author: Kartyshov Ivan, Alexander Korotkov Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira Reviewed-by: Heikki Linnakangas, Kyotaro Horiguchi	2024-08-02 21:16:56 +03:00
Heikki Linnakangas	ef4c35b416	Fix outdated comment; all running bgworkers are in BackendList Before commit `8a02b3d732`, only bgworkers that connected to a database had an entry in the Backendlist. Commit `8a02b3d732` changed that, but forgot to update this comment. Discussion: https://www.postgresql.org/message-id/835232c0-a5f7-4f20-b95b-5b56ba57d741@iki.fi	2024-08-01 23:23:47 +03:00
Michael Paquier	3188a4582a	Switch PgStat_Kind from an enum to a uint32 type A follow-up patch is planned to make cumulative statistics pluggable, and using a type is useful in the internal routines used by pgstats as PgStat_Kind may have a value that was not originally in the enum removed here, once made pluggable. While on it, this commit switches pgstat_is_kind_valid() to use PgStat_Kind rather than an int, to be more consistent with its existing callers. Some loops based on the stats kind IDs are switched to use PgStat_Kind rather than int, for consistency with the new time. Author: Michael Paquier Reviewed-by: Dmitry Dolgov, Bertrand Drouvot Discussion: https://postgr.es/m/Zmqm9j5EO0I4W8dx@paquier.xyz	2024-08-02 04:49:34 +09:00
Michael Paquier	b860848232	Add redo LSN to pgstats files This is used in the startup process to check that the pgstats file we are reading includes the redo LSN referring to the shutdown checkpoint where it has been written. The redo LSN in the pgstats file needs to match with what the control file has. This is intended to be used for an upcoming change that will extend the write of the stats file to happen during checkpoints, rather than only shutdown sequences. Bump PGSTAT_FILE_FORMAT_ID. Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/Zp8o6_cl0KSgsnvS@paquier.xyz	2024-08-02 01:57:28 +09:00
Etsuro Fujita	e66b32e43b	Update comment in portal.h. We store tuples into the portal's tuple store for a PORTAL_ONE_MOD_WITH query as well. Back-patch to all supported branches. Reviewed by Andy Fan. Discussion: https://postgr.es/m/CAPmGK14HVYBZYZtHabjeCd-e31VT%3Dwx6rQNq8QfehywLcpZ2Hw%40mail.gmail.com	2024-08-01 17:45:00 +09:00
Peter Eisentraut	a292c98d62	Convert node test compile-time settings into run-time parameters This converts COPY_PARSE_PLAN_TREES WRITE_READ_PARSE_PLAN_TREES RAW_EXPRESSION_COVERAGE_TEST into run-time parameters debug_copy_parse_plan_trees debug_write_read_parse_plan_trees debug_raw_expression_coverage_test They can be activated for tests using PG_TEST_INITDB_EXTRA_OPTS. The compile-time symbols are kept for build farm compatibility, but they now just determine the default value of the run-time settings. Furthermore, support for these settings is not compiled in at all unless assertions are enabled, or the new symbol DEBUG_NODE_TESTS_ENABLED is defined at compile time, or any of the legacy compile-time setting symbols are defined. So there is no run-time overhead in production builds. (This is similar to the handling of DISCARD_CACHES_ENABLED.) Discussion: https://www.postgresql.org/message-id/flat/30747bd8-f51e-4e0c-a310-a6e2c37ec8aa%40eisentraut.org	2024-08-01 10:09:18 +02:00
Andres Freund	a7f107df2b	Evaluate arguments of correlated SubPlans in the referencing ExprState Until now we generated an ExprState for each parameter to a SubPlan and evaluated them one-by-one ExecScanSubPlan. That's sub-optimal as creating lots of small ExprStates a) makes JIT compilation more expensive b) wastes memory c) is a bit slower to execute This commit arranges to evaluate parameters to a SubPlan as part of the ExprState referencing a SubPlan, using the new EEOP_PARAM_SET expression step. We emit one EEOP_PARAM_SET for each argument to a subplan, just before the EEOP_SUBPLAN step. It likely is worth using EEOP_PARAM_SET in other places as well, e.g. for SubPlan outputs, nestloop parameters and - more ambitiously - to get rid of ExprContext->domainValue/caseValue/ecxt_agg*. But that's for later. Author: Andres Freund <andres@anarazel.de> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru> Discussion: https://postgr.es/m/20230225214401.346ancgjqc3zmvek@awork3.anarazel.de	2024-07-31 19:54:46 -07:00
Jeff Davis	ca2eea3ac8	Add is_create parameter to RefreshMatviewByOid(). RefreshMatviewByOid is used for both REFRESH and CREATE MATERIALIZED VIEW. This flag is currently just used for handling internal error messages, but also aimed to improve code-readability. Author: Yugo Nagata Discussion: https://postgr.es/m/20240726122630.70e889f63a4d7e26f8549de8@sraoss.co.jp	2024-07-31 16:42:19 -07:00
Jeff Davis	f683d3a4ca	Remove unused ParamListInfo argument from ExecRefreshMatView. Author: Yugo Nagata Discussion: https://postgr.es/m/20240726122630.70e889f63a4d7e26f8549de8@sraoss.co.jp	2024-07-31 16:37:53 -07:00
Nathan Bossart	c8b06bb969	Introduce pg_sequence_read_tuple(). This new function returns the data for the given sequence, i.e., the values within the sequence tuple. Since this function is a substitute for SELECT from the sequence, the SELECT privilege is required on the sequence in question. It returns all NULLs for sequences for which we lack privileges, other sessions' temporary sequences, and unlogged sequences on standbys. This function is primarily intended for use by pg_dump in a follow-up commit that will use it to optimize dumpSequenceData(). Like pg_sequence_last_value(), which is a support function for the pg_sequences system view, pg_sequence_read_tuple() is left undocumented. Bumps catversion. Reviewed-by: Michael Paquier, Tom Lane Discussion: https://postgr.es/m/20240503025140.GA1227404%40nathanxps13	2024-07-31 10:12:42 -05:00
Heikki Linnakangas	47c98035c6	Remove leftover function declaration Commit `9d9b9d46f3` removed the function (or rather, moved it to a different source file and renamed it to SendCancelRequest), but forgot the declaration in the header file.	2024-07-30 15:19:46 +03:00
Thomas Munro	83aadbeb96	Require memory barrier support. Previously we had a fallback implementation that made a harmless system call, based on the assumption that system calls must contain a memory barrier. That shouldn't be reached on any current system, and it seems highly likely that we can easily find out how to request explicit memory barriers, if we've already had to find out how to do atomics on a hypothetical new system. Removed comments and a function name referred to a spinlock used for fallback memory barriers, but that changed in `1b468a13`, which left some misleading words behind in a few places. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Suggested-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/721bf39a-ed8a-44b0-8b8e-be3bd81db748%40technowledgy.de Discussion: https://postgr.es/m/3351991.1697728588%40sss.pgh.pa.us	2024-07-30 23:01:55 +12:00
Thomas Munro	a011dc399c	Require compiler barrier support. Previously we had a fallback implementation of pg_compiler_barrier() that called an empty function across a translation unit boundary so the compiler couldn't see what it did. That shouldn't be needed on any current systems, and might not even work with a link time optimizer. Since we now require compiler-specific knowledge of how to implement atomics, we should also know how to implement compiler barriers on a hypothetical new system. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Suggested-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/721bf39a-ed8a-44b0-8b8e-be3bd81db748%40technowledgy.de Discussion: https://postgr.es/m/3351991.1697728588%40sss.pgh.pa.us	2024-07-30 22:59:30 +12:00
Thomas Munro	8138526136	Remove --disable-atomics, require 32 bit atomics. Modern versions of all relevant architectures and tool chains have atomics support. Since `edadeb07`, there is no remaining reason to carry code that simulates atomic flags and uint32 imperfectly with spinlocks. 64 bit atomics are still emulated with spinlocks, if needed, for now. Any modern compiler capable of implementing C11 <stdatomic.h> must have the underlying operations we need, though we don't require C11 yet. We detect certain compilers and architectures, so hypothetical new systems might need adjustments here. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (concept, not the patch) Reviewed-by: Andres Freund <andres@anarazel.de> (concept, not the patch) Discussion: https://postgr.es/m/3351991.1697728588%40sss.pgh.pa.us	2024-07-30 22:58:57 +12:00
Thomas Munro	e25626677f	Remove --disable-spinlocks. A later change will require atomic support, so it wouldn't make sense for a hypothetical new system not to be able to implement spinlocks. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (concept, not the patch) Reviewed-by: Andres Freund <andres@anarazel.de> (concept, not the patch) Discussion: https://postgr.es/m/3351991.1697728588%40sss.pgh.pa.us	2024-07-30 22:58:37 +12:00
Jeff Davis	72fe6d24a3	Make collation not depend on setlocale(). Now that the result of pg_newlocale_from_collation() is always non-NULL, then we can move the collate_is_c and ctype_is_c flags into pg_locale_t. That simplifies the logic in lc_collate_is_c() and lc_ctype_is_c(), removing the dependence on setlocale(). This commit also eliminates the multi-stage initialization of the collation cache. As long as we have catalog access, then it's now safe to call pg_newlocale_from_collation() without checking lc_collate_is_c() first. Discussion: https://postgr.es/m/cfd9eb85-c52a-4ec9-a90e-a5e4de56e57d@eisentraut.org Reviewed-by: Peter Eisentraut, Andreas Karlsson	2024-07-30 00:58:06 -07:00
Richard Guo	9b282a9359	Fix partitionwise join with partially-redundant join clauses To determine if the two relations being joined can use partitionwise join, we need to verify the existence of equi-join conditions involving pairs of matching partition keys for all partition keys. Currently we do that by looking through the join's restriction clauses. However, it has been discovered that this approach is insufficient, because there might be partition keys known equal by a specific EC, but they do not form a join clause because it happens that other members of the EC than the partition keys are constrained to become a join clause. To address this issue, in addition to examining the join's restriction clauses, we also check if any partition keys are known equal by ECs, by leveraging function exprs_known_equal(). To accomplish this, we enhance exprs_known_equal() to check equality per the semantics of the opfamily, if provided. It could be argued that exprs_known_equal() could be called O(N^2) times, where N is the number of partition key expressions, resulting in noticeable performance costs if there are a lot of partition key expressions. But I think this is not a problem. The number of a joinrel's partition key expressions would only be equal to the join degree, since each base relation within the join contributes only one partition key expression. That is to say, it does not scale with the number of partitions. A benchmark with a query involving 5-way joins of partitioned tables, each with 3 partition keys and 1000 partitions, shows that the planning time is not significantly affected by this patch (within the margin of error), particularly when compared to the impact caused by partitionwise join. Thanks to Tom Lane for the idea of leveraging exprs_known_equal() to check if partition keys are known equal by ECs. Author: Richard Guo, Tom Lane Reviewed-by: Tom Lane, Ashutosh Bapat, Robert Haas Discussion: https://postgr.es/m/CAN_9JTzo_2F5dKLqXVtDX5V6dwqB0Xk+ihstpKEt3a1LT6X78A@mail.gmail.com	2024-07-30 15:51:54 +09:00
Amit Langote	7f56eaff2f	SQL/JSON: Fix casting for integer EXISTS columns in JSON_TABLE The current method of coercing the boolean result value of JsonPathExists() to the target type specified for an EXISTS column, which is to call the type's input function via json_populate_type(), leads to an error when the target type is integer, because the integer input function doesn't recognize boolean literal values as valid. Instead use the boolean-to-integer cast function for coercion in that case so that using integer or domains thereof as type for EXISTS columns works. Note that coercion for ON ERROR values TRUE and FALSE already works like that because the parser creates a cast expression including the cast function, but the coercion of the actual result value is not handled by the parser. Tests by Jian He. Reported-by: Jian He <jian.universality@gmail.com> Author: Jian He <jian.universality@gmail.com> Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com Backpatch-through: 17	2024-07-30 10:34:17 +09:00
Heikki Linnakangas	9d9b9d46f3	Move cancel key generation to after forking the backend Move responsibility of generating the cancel key to the backend process. The cancel key is now generated after forking, and the backend advertises it in the ProcSignal array. When a cancel request arrives, the backend handling it scans the ProcSignal array to find the target pid and cancel key. This is similar to how this previously worked in the EXEC_BACKEND case with the ShmemBackendArray, just reusing the ProcSignal array. One notable change is that we no longer generate cancellation keys for non-backend processes. We generated them before just to prevent a malicious user from canceling them; the keys for non-backend processes were never actually given to anyone. There is now an explicit flag indicating whether a process has a valid key or not. I wrote this originally in preparation for supporting longer cancel keys, but it's a nice cleanup on its own. Reviewed-by: Jelte Fennema-Nio Discussion: https://www.postgresql.org/message-id/508d0505-8b7a-4864-a681-e7e5edfe32aa@iki.fi	2024-07-29 15:37:48 +03:00
Richard Guo	513f4472a4	Reduce memory used by partitionwise joins In try_partitionwise_join, we aim to break down the join between two partitioned relations into joins between matching partitions. To achieve this, we iterate through each pair of partitions from the two joining relations and create child-join relations for them. With potentially thousands of partitions, the local objects allocated in each iteration can accumulate significant memory usage. Therefore, we opt to eagerly free these local objects at the end of each iteration. In line with this approach, this patch frees the bitmap set that represents the relids of child-join relations at the end of each iteration. Additionally, it modifies build_child_join_rel() to reuse the AppendRelInfo structures generated within each iteration. Author: Ashutosh Bapat Reviewed-by: David Christensen, Richard Guo Discussion: https://postgr.es/m/CAExHW5s4EqY43oB=ne6B2=-xLgrs9ZGeTr1NXwkGFt2j-OmaQQ@mail.gmail.com	2024-07-29 11:35:51 +09:00
Jeff Davis	1c461a8d8d	Refactor: make default_locale internal to pg_locale.c. Discussion: https://postgr.es/m/2228884bb1f1a02614b39f71a90c94d2cc8a3a2f.camel@j-davis.com Reviewed-by: Peter Eisentraut, Andreas Karlsson	2024-07-28 13:07:25 -07:00
David Rowley	17a5871d9d	Optimize escaping of JSON strings There were quite a few places where we either had a non-NUL-terminated string or a text Datum which we needed to call escape_json() on. Many of these places required that a temporary string was created due to the fact that escape_json() needs a NUL-terminated cstring. For text types, those first had to be converted to cstring before calling escape_json() on them. Here we introduce two new functions to make escaping JSON more optimal: escape_json_text() can be given a text Datum to append onto the given buffer. This is more optimal as it foregoes the need to convert the text Datum into a cstring. A temporary allocation is only required if the text Datum needs to be detoasted. escape_json_with_len() can be used when the length of the cstring is already known or the given string isn't NUL-terminated. Having this allows various places which were creating a temporary NUL-terminated string to just call escape_json_with_len() without any temporary memory allocations. Discussion: https://postgr.es/m/CAApHDvpLXwMZvbCKcdGfU9XQjGCDm7tFpRdTXuB9PVgpNUYfEQ@mail.gmail.com Reviewed-by: Melih Mutlu, Heikki Linnakangas	2024-07-27 23:46:07 +12:00
Robert Haas	8a53539bd6	Wait for WAL summarization to catch up before creating .partial file. When a standby is promoted, CleanupAfterArchiveRecovery() may decide to rename the final WAL file from the old timeline by adding ".partial" to the name. If WAL summarization is enabled and this file is renamed before its partial contents are summarized, WAL summarization breaks: the summarizer gets stuck at that point in the WAL stream and just errors out. To fix that, first make the startup process wait for WAL summarization to catch up before renaming the file. Generally, this should be quick, and if it's not, the user can shut off summarize_wal and try again. To make this fix work, also teach the WAL summarizer that after a promotion has occurred, no more WAL can appear on the previous timeline: previously, the WAL summarizer wouldn't switch to the new timeline until we actually started writing WAL there, but that meant that when the startup process was waiting for the WAL summarizer, it was waiting for an action that the summarizer wasn't yet prepared to take. In the process of fixing these bugs, I realized that the logic to wait for WAL summarization to catch up was spread out in a way that made it difficult to reuse properly, so this code refactors things to make it easier. Finally, add a test case that would have caught this bug and the previously-fixed bug that WAL summarization sometimes needs to back up when the timeline changes. Discussion: https://postgr.es/m/CA+TgmoZGEsZodXC4f=XZNkAeyuDmWTSkpkjCEOcF19Am0mt_OA@mail.gmail.com	2024-07-26 15:00:48 -04:00
Daniel Gustafsson	161c73462b	Fix macro placement in pg_config.h.in Commit `274bbced85` accidentally placed the pg_config.h.in for SSL_CTX_set_num_tickets on the wrong line wrt where autoheader places it. Fix by re-arranging and backpatch to the same level as the original commit. Reported-by: Marina Polyakova <m.polyakova@postgrespro.ru> Discussion: https://postgr.es/m/48cebe8c3eaf308bae253b1dbf4e4a75@postgrespro.ru Backpatch-through: v12	2024-07-26 16:25:28 +02:00
Heikki Linnakangas	20e0e7da9b	Add test for early backend startup errors The new test tests the libpq fallback behavior on an early error, which was fixed in the previous commit. This adds an IS_INJECTION_POINT_ATTACHED() macro, to allow writing injected test code alongside the normal source code. In principle, the new test could've been implemented by an extra test module with a callback that sets the FrontendProtocol global variable, but I think it's more clear to have the test code right where the injection point is, because it has pretty intimate knowledge of the surrounding context it runs in. Reviewed-by: Michael Paquier Discussion: https://www.postgresql.org/message-id/CAOYmi%2Bnwvu21mJ4DYKUa98HdfM_KZJi7B1MhyXtnsyOO-PB6Ww%40mail.gmail.com	2024-07-26 15:12:21 +03:00
Heikki Linnakangas	b9e5249c29	Fix using injection points at backend startup in EXEC_BACKEND mode Commit `86db52a506` changed the locking of injection points to use only atomic ops and spinlocks, to make it possible to define injection points in processes that don't have a PGPROC entry (yet). However, it didn't work in EXEC_BACKEND mode, because the pointer to shared memory area was not initialized until the process "attaches" to all the shared memory structs. To fix, pass the pointer to the child process along with other global variables that need to be set up early. Backpatch-through: 17	2024-07-26 15:11:50 +03:00
Daniel Gustafsson	274bbced85	Disable all TLS session tickets OpenSSL supports two types of session tickets for TLSv1.3, stateless and stateful. The option we've used only turns off stateless tickets leaving stateful tickets active. Use the new API introduced in 1.1.1 to disable all types of tickets. Backpatch to all supported versions. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reported-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20240617173803.6alnafnxpiqvlh3g@awork3.anarazel.de Backpatch-through: v12	2024-07-26 11:09:45 +02:00
Tom Lane	580f8727ca	Add argument names to the regexp_XXX functions. This change allows these functions to be called using named-argument notation, which can be helpful for readability, particularly for the ones with many arguments. There was considerable debate about exactly which names to use, but in the end we settled on the names already shown in our documentation table 9.10. The citext extension provides citext-aware versions of some of these functions, so add argument names to those too. In passing, fix table 9.10's syntax synopses for regexp_match, which were slightly wrong about which combinations of arguments are allowed. Jian He, reviewed by Dian Fay and others Discussion: https://postgr.es/m/CACJufxG3NFKKsh6x4fRLv8h3V-HvN4W5dA=zNKMxsNcDwOKang@mail.gmail.com	2024-07-25 14:51:46 -04:00
David Rowley	32d3ed8165	Add path column to pg_backend_memory_contexts view "path" provides a reliable method of determining the parent/child relationships between memory contexts. Previously this could be done in a non-reliable way by writing a recursive query and joining the "parent" and "name" columns. This wasn't reliable as the names were not unique, which could result in joining to the wrong parent. To make this reliable, "path" stores an array of numerical identifiers starting with the identifier for TopLevelMemoryContext. It contains an element for each intermediate parent between that and the current context. Incompatibility: Here we also adjust the "level" column to make it 1-based rather than 0-based. A 1-based level provides a convenient way to access elements in the "path" array. e.g. path[level] gives the identifier for the current context. Identifiers are not stable across multiple evaluations of the view. In an attempt to make these more stable for ad-hoc queries, the identifiers are assigned breadth-first. Contexts closer to TopLevelMemoryContext are less likely to change between queries and during queries. Author: Melih Mutlu <m.melihmutlu@gmail.com> Discussion: https://postgr.es/m/CAGPVpCThLyOsj3e_gYEvLoHkr5w=tadDiN_=z2OwsK3VJppeBA@mail.gmail.com Reviewed-by: Andres Freund, Stephen Frost, Atsushi Torikoshi, Reviewed-by: Michael Paquier, Robert Haas, David Rowley	2024-07-25 15:03:28 +12:00
Thomas Munro	f6bef362ca	Refactor tidstore.c iterator buffering. Previously, TidStoreIterateNext() would expand the set of offsets for each block into an internal buffer that it overwrote each time. In order to be able to collect the offsets for multiple blocks before working with them, change the contract. Now, the offsets are obtained by a separate call to TidStoreGetBlockOffsets(), which can be called at a later time. TidStoreIteratorResult objects are safe to copy and store in a queue. Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/CAAKRu_bbkmwAzSBgnezancgJeXrQZXy4G4kBTd+5=cr86H5yew@mail.gmail.com	2024-07-24 17:32:35 +12:00
Amit Kapila	1462aad2e4	Allow altering of two_phase option of a SUBSCRIPTION. The two_phase option is controlled by both the publisher (as a slot option) and the subscriber (as a subscription option), so the slot option must also be modified. Changing the 'two_phase' option for a subscription from 'true' to 'false' is permitted only when there are no pending prepared transactions corresponding to that subscription. Otherwise, the changes of already prepared transactions can be replicated again along with their corresponding commit leading to duplicate data or errors. To avoid data loss, the 'two_phase' option for a subscription can only be changed from 'false' to 'true' once the initial data synchronization is completed. Therefore this is performed later by the logical replication worker. Author: Hayato Kuroda, Ajin Cherian, Amit Kapila Reviewed-by: Peter Smith, Hou Zhijie, Amit Kapila, Vitaly Davydov, Vignesh C Discussion: https://postgr.es/m/8fab8-65d74c80-1-2f28e880@39088166	2024-07-24 10:13:36 +05:30
Peter Eisentraut	774d47b6c0	Move all extern declarations for GUC variables to header files Add extern declarations in appropriate header files for global variables related to GUC. In many cases, this was handled quite inconsistently before, with some GUC variables declared in a header file and some only pulled in via ad-hoc extern declarations in various .c files. Also add PGDLLIMPORT qualifications to those variables. These were previously missing because src/tools/mark_pgdllimport.pl has only been used with header files. This also fixes -Wmissing-variable-declarations warnings for GUC variables (not yet part of the standard warning options). Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/e0a62134-83da-4ba4-8cdb-ceb0111c95ce@eisentraut.org	2024-07-24 06:31:07 +02:00
Peter Eisentraut	d3cc5ffe81	Move extern declarations for EXEC_BACKEND to header files This fixes warnings from -Wmissing-variable-declarations (not yet part of the standard warning options) under EXEC_BACKEND. The NON_EXEC_STATIC variables need a suitable declaration in a header file under EXEC_BACKEND. Also fix the inconsistent application of the volatile qualifier for PMSignalState, which was revealed by this change. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/e0a62134-83da-4ba4-8cdb-ceb0111c95ce@eisentraut.org	2024-07-23 15:07:10 +02:00
Peter Eisentraut	935e675f3c	Get rid of a global variable bootstrap_data_checksum_version can just as easily be passed to where it is used via function arguments. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/e0a62134-83da-4ba4-8cdb-ceb0111c95ce@eisentraut.org	2024-07-23 10:00:41 +02:00
Michael Paquier	ffb0603929	Improve comments in slru.{c,h} about segment name format slru.h described incorrectly how SLRU segment names are formatted depending on the segment number and if long or short segment names are used. This commit closes the gap with a better description, fitting with the reality. Reported-by: Noah Misch Author: Aleksander Alekseev Discussion: https://postgr.es/m/20240626002747.dc.nmisch@google.com Backpatch-through: 17	2024-07-23 16:54:51 +09:00
Peter Eisentraut	4d130b2872	Windows replacement for strtok_r() They spell it "strtok_s" there. There are currently no uses, but some will be added soon. Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: David Steele <david@pgmasters.net> Discussion: https://www.postgresql.org/message-id/flat/79692bf9-17d3-41e6-b9c9-fc8c3944222a@eisentraut.org	2024-07-23 09:20:22 +02:00
Richard Guo	581df21487	Fix rowcount estimate for gather (merge) paths In the case of a parallel plan, when computing the number of tuples processed per worker, we divide the total number of tuples by the parallel_divisor obtained from get_parallel_divisor(), which accounts for the leader's contribution in addition to the number of workers. Accordingly, when estimating the number of tuples for gather (merge) nodes, we should multiply the number of tuples per worker by the same parallel_divisor to reverse the division. However, currently we use parallel_workers rather than parallel_divisor for the multiplication. This could result in an underestimation of the number of tuples for gather (merge) nodes, especially when there are fewer than four workers. This patch fixes this issue by using the same parallel_divisor for the multiplication. There is one ensuing plan change in the regression tests, but it looks reasonable and does not compromise its original purpose of testing parallel-aware hash join. In passing, this patch removes an unnecessary assignment for path.rows in create_gather_merge_path, and fixes an uninitialized-variable issue in generate_useful_gather_paths. No backpatch as this could result in plan changes. Author: Anthonin Bonnefoy Reviewed-by: Rafia Sabih, Richard Guo Discussion: https://postgr.es/m/CAO6_Xqr9+51NxgO=XospEkUeAg-p=EjAWmtpdcZwjRgGKJ53iA@mail.gmail.com	2024-07-23 10:33:26 +09:00
Robert Haas	e4326fbc60	Remove grotty use of disable_cost for TID scan plans. Previously, the code charged disable_cost for CurrentOfExpr, and then subtracted disable_cost from the cost of a TID path that used CurrentOfExpr as the TID qual, effectively disabling all paths except that one. Now, we instead suppress generation of the disabled paths entirely, and generate only the one that the executor will actually understand. With this approach, we do not need to rely on disable_cost being large enough to prevent the wrong path from being chosen, and we save some CPU cycle by avoiding generating paths that we can't actually use. In my opinion, the code is also easier to understand like this. Patch by me. Review by Heikki Linnakangas. Discussion: http://postgr.es/m/591b3596-2ea0-4b8e-99c6-fad0ef2801f5@iki.fi	2024-07-22 14:57:53 -04:00
Peter Eisentraut	683be87fbb	Add port/ replacement for strsep() from OpenBSD, similar to strlcat, strlcpy There are currently no uses, but some will be added soon. Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: David Steele <david@pgmasters.net> Discussion: https://www.postgresql.org/message-id/flat/79692bf9-17d3-41e6-b9c9-fc8c3944222a@eisentraut.org	2024-07-22 09:50:30 +02:00
Noah Misch	a858be17c3	Add a way to create read stream object by using SMgrRelation. Currently read stream object can be created only by using Relation. Nazir Bilal Yavuz Discussion: https://postgr.es/m/CAN55FZ0JKL6vk1xQp6rfOXiNFV1u1H0tJDPPGHWoiO3ea2Wc=A@mail.gmail.com	2024-07-20 04:22:12 -07:00
Noah Misch	af07a827b9	Refactor PinBufferForBlock() to remove checks about persistence. There are checks in PinBufferForBlock() function to set persistence of the relation. This function is called for each block in the relation. Instead, set persistence of the relation before PinBufferForBlock(). Nazir Bilal Yavuz Discussion: https://postgr.es/m/CAN55FZ0JKL6vk1xQp6rfOXiNFV1u1H0tJDPPGHWoiO3ea2Wc=A@mail.gmail.com	2024-07-20 04:22:12 -07:00
Noah Misch	e00c45f685	Remove "smgr_persistence == 0" dead code. Reaching that code would have required multiple processes performing relation extension during recovery, which does not happen. That caller has the persistence available, so pass it. This was dead code as soon as commit `210622c60e` added it. Discussion: https://postgr.es/m/CAN55FZ0JKL6vk1xQp6rfOXiNFV1u1H0tJDPPGHWoiO3ea2Wc=A@mail.gmail.com	2024-07-20 04:22:12 -07:00
Heikki Linnakangas	5784a493f1	Move resowner from common JitContext to LLVM specific Only the LLVM specific code uses it since resource owners were made extensible in commit `b8bff07daa`. This is new in v17, so backpatch there to keep the branches from diverging just yet. Author: Andreas Karlsson <andreas@proxel.se> Discussion: https://www.postgresql.org/message-id/fd3a2a00-6605-4e30-a118-48418b478e6e@proxel.se	2024-07-19 10:27:06 +03:00
Robert Haas	402b586d0a	Do not summarize WAL if generated with wal_level=minimal. To do this, we must include the wal_level in the first WAL record covered by each summary file; so add wal_level to struct Checkpoint and the payload of XLOG_CHECKPOINT_REDO and XLOG_END_OF_RECOVERY. This, in turn, requires bumping XLOG_PAGE_MAGIC and, since the Checkpoint is also stored in the control file, also PG_CONTROL_VERSION. It's not great to do that so late in the release cycle, but the alternative seems to ship v17 without robust protections against this scenario, which could result in corrupted incremental backups. A side effect of this patch is that, when a server with wal_level=replica is started with summarize_wal=on for the first time, summarization will no longer begin with the oldest WAL that still exists in pg_wal, but rather from the first checkpoint after that. This change should be harmless, because a WAL summary for a partial checkpoint cycle can never make an incremental backup possible when it would otherwise not have been. Report by Fujii Masao. Patch by me. Review and/or testing by Jakub Wartak and Fujii Masao. Discussion: http://postgr.es/m/6e30082e-041b-4e31-9633-95a66de76f5d@oss.nttdata.com	2024-07-18 12:09:48 -04:00
Michael Paquier	a0a5869a85	Add INJECTION_POINT_CACHED() to run injection points directly from cache This new macro is able to perform a direct lookup from the local cache of injection points (refreshed each time a point is loaded or run), without touching the shared memory state of injection points at all. This works in combination with INJECTION_POINT_LOAD(), and it is better than INJECTION_POINT() in a critical section due to the fact that it would avoid all memory allocations should a concurrent detach happen since a LOAD(), as it retrieves a callback from the backend-private memory. The documentation is updated to describe in more details how to use this new macro with a load. Some tests are added to the module injection_points based on a new SQL function that acts as a wrapper of INJECTION_POINT_CACHED(). Based on a suggestion from Heikki Linnakangas. Author: Heikki Linnakangas, Michael Paquier Discussion: https://postgr.es/m/58d588d0-e63f-432f-9181-bed29313dece@iki.fi	2024-07-18 09:50:41 +09:00
Nathan Bossart	a99cc6c6b4	Use PqMsg_* macros in more places. Commit `f4b54e1ed9`, which introduced macros for protocol characters, missed updating a few places. It also did not introduce macros for messages sent from parallel workers to their leader processes. This commit adds a new section in protocol.h for those. Author: Aleksander Alekseev Discussion: https://postgr.es/m/CAJ7c6TNTd09AZq8tGaHS3LDyH_CCnpv0oOz2wN1dGe8zekxrdQ%40mail.gmail.com Backpatch-through: 17	2024-07-17 10:51:00 -05:00
Jeff Davis	4b74ebf726	When creating materialized views, use REFRESH to load data. Previously, CREATE MATERIALIZED VIEW ... WITH DATA populated the MV the same way as CREATE TABLE ... AS. Instead, reuse the REFRESH logic, which locks down security-restricted operations and restricts the search_path. This reduces the chance that a subsequent refresh will fail. Reported-by: Noah Misch Backpatch-through: 17 Discussion: https://postgr.es/m/20240630222344.db.nmisch@google.com	2024-07-16 15:41:29 -07:00
Tom Lane	a0f1fce80c	Add min and max aggregates for composite types (records). Like min/max for arrays, these are just thin wrappers around the existing btree comparison function for records. Aleksander Alekseev Discussion: https://postgr.es/m/CAO=iB8L4WYSNxCJ8GURRjQsrXEQ2-zn3FiCsh2LMqvWq2WcONg@mail.gmail.com	2024-07-11 11:50:50 -04:00
Masahiko Sawada	bb19b70081	Fix possibility of logical decoding partial transaction changes. When creating and initializing a logical slot, the restart_lsn is set to the latest WAL insertion point (or the latest replay point on standbys). Subsequently, WAL records are decoded from that point to find the start point for extracting changes in the DecodingContextFindStartpoint() function. Since the initial restart_lsn could be in the middle of a transaction, the start point must be a consistent point where we won't see the data for partial transactions. Previously, when not building a full snapshot, serialized snapshots were restored, and the SnapBuild jumps to the consistent state even while finding the start point. Consequently, the slot's restart_lsn and confirmed_flush could be set to the middle of a transaction. This could lead to various unexpected consequences. Specifically, there were reports of logical decoding decoding partial transactions, and assertion failures occurred because only subtransactions were decoded without decoding their top-level transaction until decoding the commit record. To resolve this issue, the changes prevent restoring the serialized snapshot and jumping to the consistent state while finding the start point. On v17 and HEAD, a flag indicating whether snapshot restores should be skipped has been added to the SnapBuild struct, and SNAPBUILD_VERSION has been bumpded. On backbranches, the flag is stored in the LogicalDecodingContext instead, preserving on-disk compatibility. Backpatch to all supported versions. Reported-by: Drew Callahan Reviewed-by: Amit Kapila, Hayato Kuroda Discussion: https://postgr.es/m/2444AA15-D21B-4CCE-8052-52C7C2DAFE5C%40amazon.com Backpatch-through: 12	2024-07-11 22:48:23 +09:00
Michael Paquier	9e4664d950	Add a new 'F' entry type for fixed-numbered stats in pgstats file This new entry type is used for all the fixed-numbered statistics, making possible support for custom pluggable stats. In short, we need to be able to detect more easily if a stats kind exists or not when reading back its data from the pgstats file without a dependency on the order of the entries read. The kind ID of the stats is added to the data written. The data is written in the same fashion as previously, with the fixed-numbered stats first and the dshash entries next. The read part becomes more flexible, loading fixed-numbered stats into shared memory based on the new entry type found. Bump PGSTAT_FILE_FORMAT_ID. Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/Zot5bxoPYdS7yaoy@paquier.xyz	2024-07-11 16:12:44 +09:00
Michael Paquier	21471f18e9	Add PgStat_KindInfo.init_shmem_cb This new callback gives fixed-numbered stats the possibility to take actions based on the area of shared memory allocated for them. This removes from pgstat_shmem.c any knowledge specific to the types of fixed-numbered stats, and the initializations happen in their own files. Like `b68b29bc8f`, this change is useful to make this area of the code more pluggable, so as custom fixed-numbered stats can take actions after their shared memory area is initialized. Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/Zot5bxoPYdS7yaoy@paquier.xyz	2024-07-11 09:21:40 +09:00
Michael Paquier	d898665bf7	Extend pg_get_acl() to handle sub-object IDs This patch modifies the pg_get_acl() function to accept a third argument called "objsubid", bringing it on par with similar functions in this area like pg_describe_object(). This enables the retrieval of ACLs for relation attributes when scanning dependencies. Bump catalog version. Author: Joel Jacobson Discussion: https://postgr.es/m/f2539bff-64be-47f0-9f0b-df85d3cc0432@app.fastmail.com	2024-07-10 10:14:37 +09:00
Nathan Bossart	ccd38024bc	Introduce pg_signal_autovacuum_worker. Since commit `3a9b18b309`, roles with privileges of pg_signal_backend cannot signal autovacuum workers. Many users treated the ability to signal autovacuum workers as a feature instead of a bug, so we are reintroducing it via a new predefined role. Having privileges of this new role, named pg_signal_autovacuum_worker, only permits signaling autovacuum workers. It does not permit signaling other types of superuser backends. Bumps catversion. Author: Kirill Reshke Reviewed-by: Anthony Leung, Michael Paquier, Andrey Borodin Discussion: https://postgr.es/m/CALdSSPhC4GGmbnugHfB9G0%3DfAxjCSug_-rmL9oUh0LTxsyBfsg%40mail.gmail.com	2024-07-09 13:03:40 -05:00
Michael Paquier	b68b29bc8f	Use pgstat_kind_infos to write fixed shared statistics This is similar to `9004abf620`, but this time for the write part of the stats file. The code is changed so as, rather than referring to individual members of PgStat_Snapshot in an order based on their PgStat_Kind value, a loop based on pgstat_kind_infos is used to retrieve the contents to write from the snapshot structure, for a size of PgStat_KindInfo's shared_data_len. This requires the addition to PgStat_KindInfo of an offset to track the location of each fixed-numbered stats in PgStat_Snapshot. This change is useful to make this area of the code more easily pluggable, and reduces the knowledge of specific fixed-numbered kinds in pgstat.c. Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/Zot5bxoPYdS7yaoy@paquier.xyz	2024-07-09 10:27:12 +09:00
David Rowley	5a1e6df3b8	Show Parallel Bitmap Heap Scan worker stats in EXPLAIN ANALYZE Nodes like Memoize report the cache stats for each parallel worker, so it makes sense to show the exact and lossy pages in Parallel Bitmap Heap Scan in a similar way. Likewise, Sort shows the method and memory used for each worker. There was some discussion on whether the leader stats should include the totals for each parallel worker or not. I did some analysis on this to see what other parallel node types do and it seems only Parallel Hash does anything like this. All the rest, per what's supported by ExecParallelRetrieveInstrumentation() are consistent with each other. Author: David Geier <geidav.pg@gmail.com> Author: Heikki Linnakangas <hlinnaka@iki.fi> Author: Donghang Lin <donghanglin@gmail.com> Author: Alena Rybakina <lena.ribackina@yandex.ru> Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Michael Christofides <michael@pgmustard.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Donghang Lin <donghanglin@gmail.com> Reviewed-by: Masahiro Ikeda <Masahiro.Ikeda@nttdata.com> Discussion: https://postgr.es/m/b3d80961-c2e5-38cc-6a32-61886cdf766d%40gmail.com	2024-07-09 12:15:47 +12:00
David Rowley	e41f713097	Perform forgotten cat version bump I missed this in `036bdcec9`	2024-07-09 09:56:46 +12:00
David Rowley	036bdcec9f	Teach planner how to estimate rows for timestamp generate_series This provides the planner with row estimates for generate_series(TIMESTAMP, TIMESTAMP, INTERVAL), generate_series(TIMESTAMPTZ, TIMESTAMPTZ, INTERVAL) and generate_series(TIMESTAMPTZ, TIMESTAMPTZ, INTERVAL, TEXT) when the input parameter values can be estimated during planning. Author: David Rowley Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://postgr.es/m/CAApHDvrBE%3D%2BASo_sGYmQJ3GvO8GPvX5yxXhRS%3Dt_ybd4odFkhQ%40mail.gmail.com	2024-07-09 09:54:59 +12:00
Michael Paquier	e311c6e539	Renumber pg_get_acl() in pg_proc.dat `a6417078c4` has introduced as project policy that new features committed during the development cycle should use new OIDs in the [8000,9999] range. `4564f1cebd` did not respect that rule, so let's renumber pg_get_acl() to use an OID in the correct range. Bump catalog version.	2024-07-08 15:34:33 +09:00
David Rowley	7340d9362a	Widen lossy and exact page counters for Bitmap Heap Scan Both of these counters were using the "long" data type. On MSVC that's a 32-bit type. On modern hardware, I was able to demonstrate that we can wrap those counters with a query that only takes 15 minutes to run. This issue may manifest itself either by not showing the values of the counters because they've wrapped and are less than zero, resulting in them being filtered by the > 0 checks in show_tidbitmap_info(), or bogus numbers being displayed which are modulus 2^32 of the actual number. Widen these counters to uint64. Discussion: https://postgr.es/m/CAApHDvpS_97TU+jWPc=T83WPp7vJa1dTw3mojEtAVEZOWh9bjQ@mail.gmail.com	2024-07-08 14:43:09 +12:00
Thomas Munro	2a5ef09830	Cope with <regex.h> name clashes. macOS 15's SDK pulls in headers related to <regex.h> when we include <xlocale.h>. This causes our own regex_t implementation to clash with the OS's regex_t implementation. Luckily our function names already had pg_ prefixes, but the macros and typenames did not. Include <regex.h> explicitly on all POSIX systems, and fix everything that breaks. Then we can prove that we are capable of fully hiding and replacing the system regex API with our own. 1. Deal with standard-clobbering macros by undefining them all first. POSIX says they are "symbolic constants". If they are macros, this allows us to redefine them. If they are enums or variables, our macros will hide them. 2. Deal with standard-clobbering types by giving our types pg_ prefixes, and then using macros to redirect xxx_t -> pg_xxx_t. After including our "regex/regex.h", the system <regex.h> is hidden, because we've replaced all the standard names. The PostgreSQL source tree and extensions can continue to use standard prefix-less type and macro names, but reach our implementation, if they included our "regex/regex.h" header. Back-patch to all supported branches, so that macOS 15's tool chain can build them. Reported-by: Stan Hu <stanhu@gmail.com> Suggested-by: Tom Lane <tgl@sss.pgh.pa.us> Tested-by: Aleksander Alekseev <aleksander@timescale.com> Discussion: https://postgr.es/m/CAMBWrQnEwEJtgOv7EUNsXmFw2Ub4p5P%2B5QTBEgYwiyjy7rAsEQ%40mail.gmail.com	2024-07-06 10:27:16 +12:00
Nathan Bossart	0b1fe1413e	Remove check hooks for GUCs that contribute to MaxBackends. Each of max_connections, max_worker_processes, autovacuum_max_workers, and max_wal_senders has a GUC check hook that verifies the sum of those GUCs does not exceed a hard-coded limit (see the comment for MAX_BACKENDS in postmaster.h). In general, the hooks effectively guard against egregious misconfigurations. However, this approach has some problems. Since these check hooks are called as each GUC is assigned its user-specified value, only one of the hooks will be called with all the relevant GUCs set. If one or more of the user-specified values are less than the initial values of the GUCs' underlying variables, false positives can occur. Furthermore, the error message emitted when one of the check hooks fails is not tremendously helpful. For example, the command $ pg_ctl -D . start -o "-c max_connections=262100 -c max_wal_senders=10000" fails with the following error: FATAL: invalid value for parameter "max_wal_senders": 10000 Fortunately, there is an extra copy of this check in InitializeMaxBackends() that we can rely on, so this commit removes the aforementioned GUC check hooks in favor of that one. It also enhances the error message to clearly show the values of the relevant GUCs and the hard-coded limit their sum may not exceed. The downside of this change is that server startup progresses further before failing due to such misconfigurations (thus taking longer), but these failures are expected to be rare, so we don't anticipate any real harm in practice. Reviewed-by: Tom Lane Discussion: https://postgr.es/m/ZnMr2k-Nk5vj7T7H%40nathan	2024-07-05 14:42:55 -05:00
Michael Paquier	4b211003ec	Support loading of injection points This can be used to load an injection point and prewarm the backend-level cache before running it, to avoid issues if the point cannot be loaded due to restrictions in the code path where it would be run, like a critical section where no memory allocation can happen (load_external_function() can do allocations when expanding a library name). Tests can use a macro called INJECTION_POINT_LOAD() to load an injection point. The test module injection_points gains some tests, and a SQL function able to load an injection point. Based on a request from Andrey Borodin, who has implemented a test for multixacts requiring this facility. Reviewed-by: Andrey Borodin Discussion: https://postgr.es/m/ZkrBE1e2q2wGvsoN@paquier.xyz	2024-07-05 18:09:03 +09:00

1 2 3 4 5 ...

11816 commits