postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-05-28 04:35:45 -04:00

Author	SHA1	Message	Date
Peter Geoghegan	d088ba5a5a	nbtree: Allocate new pages in separate function. Split nbtree's _bt_getbuf function in two: code that read locks or write locks existing pages remains in _bt_getbuf, while code that deals with allocating new pages is moved to a new, dedicated function called _bt_allocbuf. This simplifies most _bt_getbuf callers, since it is no longer necessary for them to pass a heaprel argument. Many of the changes to nbtree from commit `61b313e4` can be reverted. This minimizes the divergence between HEAD/PostgreSQL 16 and earlier release branches. _bt_allocbuf replaces the previous nbtree idiom of passing P_NEW to _bt_getbuf. There are only 3 affected call sites, all of which continue to pass a heaprel for recovery conflict purposes. Note that nbtree's use of P_NEW was superficial; nbtree never actually relied on the P_NEW code paths in bufmgr.c, so this change is strictly mechanical. GiST already took the same approach; it has a dedicated function for allocating new pages called gistNewBuffer(). That factor allowed commit `61b313e4` to make much more targeted changes to GiST. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CAH2-Wz=8Z9qY58bjm_7TAHgtW6RzZ5Ke62q5emdCEy9BAzwhmg@mail.gmail.com	2023-06-10 14:08:25 -07:00
Jeff Davis	2fcc7ee7af	Revert "Fix search_path to a safe value during maintenance operations." This reverts commit `05e1737351`.	2023-06-10 08:11:41 -07:00
Andres Freund	a1cd982098	meson: Add dependencies to perl modules to various script invocations Eventually it is likely worth trying to deal with this in a more expansive way, by generating dependency files within the scripts. But it's not entirely obvious how to do that in perl and is work more suitable for 17 anyway. Reported-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Reviewed-by: Tristan Partin <tristan@neon.tech> Discussion: https://postgr.es/m/87v8g7s6bf.fsf@wibble.ilmari.org	2023-06-09 20:12:16 -07:00
Jeff Davis	05e1737351	Fix search_path to a safe value during maintenance operations. While executing maintenance operations (ANALYZE, CLUSTER, REFRESH MATERIALIZED VIEW, REINDEX, or VACUUM), set search_path to 'pg_catalog, pg_temp' to prevent inconsistent behavior. Functions that are used for functional indexes, in index expressions, or in materialized views and depend on a different search path must be declared with CREATE FUNCTION ... SET search_path='...'. This change addresses a security risk introduced in commit `60684dd834`, where a role with MAINTAIN privileges on a table may be able to escalate privileges to the table owner. That commit is not yet part of any release, so no need to backpatch. Discussion: https://postgr.es/m/e44327179e5c9015c8dda67351c04da552066017.camel%40j-davis.com Reviewed-by: Greg Stark Reviewed-by: Nathan Bossart	2023-06-09 11:20:47 -07:00
David Rowley	53ea2b7ad0	Don't use _BitScanForward64/_BitScanReverse64 on 32-bit MSVC builds `677319746` added support for making use of MSVC's bit scanning functions. However, that commit failed to consider 32-bit MSVC builds where the 64-bit versions of these functions are unavailable. This resulted in compilation failures on 32-bit MSVC. Here we adjust the code so we fall back on the manual way of finding the bit positions for 64-bit integers when building on 32-bit MSVC. Bug: #17967 Reported-by: Youmiu Mo Discussion: https://postgr.es/m/17967-cd21e34a314141b2@postgresql.org	2023-06-08 10:10:34 +12:00
Peter Eisentraut	08235203dd	Remove obsolete comment OIDs are no longer system columns, since `578b229718`.	2023-06-05 15:33:08 +02:00
Tom Lane	991a3df227	Fix filtering of "cloned" outer-join quals some more. We've had multiple issues with the clause_is_computable_at logic that I introduced in `2489d76c4`: it's been known to accept more than one clone of the same qual at the same plan node, and also to accept no clones at all. It's looking impractical to get it 100% right on the basis of the currently-stored information, so fix it by introducing a new RestrictInfo field "incompatible_relids" that explicitly shows which outer joins a given clone mustn't be pushed above. In principle we could populate this field in every RestrictInfo, but that would cost space and there doesn't presently seem to be a need for it in general. Also, while deconstruct_distribute_oj_quals can easily fill the field with the remaining members of the commutative join set that it's considering, computing it in the general case seems again pretty complicated. So for now, just fill it for clone quals. Along the way, fix a bug that may or may not be only latent: equivclass.c was generating replacement clauses with is_pushed_down and has_clone/is_clone markings that didn't match their required_relids. This led me to conclude that leaving the clone flags out of make_restrictinfo's purview wasn't such a great idea after all, so add them. Per report from Richard Guo. Discussion: https://postgr.es/m/CAMbWs48EYi_9-pSd0ORes1kTmTeAjT4Q3gu49hJtYCbSn2JyeA@mail.gmail.com	2023-05-25 10:28:33 -04:00
Andres Freund	bc971f4025	Optimize walsender wake up logic using condition variables WalSndWakeup() currently loops through all the walsenders slots, with a spinlock acquisition and release for every iteration, to wake up waiting walsenders. This commonly was not a problem before `e101dfac3a`. But, to allow logical decoding on standbys, we need to wake up logical walsenders after every WAL record is applied on the standby, rather than just when flushing WAL or switching timelines. This causes a performance regression for workloads replaying a lot of WAL records. To solve this, we use a condition variable (CV) to efficiently wake up walsenders in WalSndWakeup(). Every walsender prepares to sleep on a shared memory CV. Note that it just prepares to sleep on the CV (i.e., adds itself to the CV's waitlist), but does not actually wait on the CV (IOW, it never calls ConditionVariableSleep()). It still uses WaitEventSetWait() for waiting, because CV infrastructure doesn't handle FeBe socket events currently. The processes (startup process, walreceiver etc.) wanting to wake up walsenders use ConditionVariableBroadcast(), which in turn calls SetLatch(), helping walsenders come out of WaitEventSetWait(). We use separate shared memory CVs for physical and logical walsenders for selective wake ups, see WalSndWakeup() for more details. This approach is simple and reasonably efficient. But not very elegant. But for 16 it seems to be a better path than a larger redesign of the CV mechanism. A desirable future improvement would be to add support for CVs into WaitEventSetWait(). This still leaves us with a small regression in very extreme workloads (due to the spinlock acquisition in ConditionVariableBroadcast() when there are no waiters) - but that seems acceptable. Reported-by: Andres Freund <andres@anarazel.de> Suggested-by: Andres Freund <andres@anarazel.de> Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com> Discussion: https://www.postgresql.org/message-id/20230509190247.3rrplhdgem6su6cg%40awork3.anarazel.de	2023-05-21 09:44:55 -07:00
Tom Lane	a2eb99a01e	Expand some more uses of "deleg" to "delegation" or "delegated". Complete the task begun in `9c0a0e2ed`: we don't want to use the abbreviation "deleg" for GSS delegation in any user-visible places. (For consistency, this also changes most internal uses too.) Abhijit Menon-Sen and Tom Lane Discussion: https://postgr.es/m/949048.1684639317@sss.pgh.pa.us	2023-05-21 10:55:18 -04:00
Bruce Momjian	9c0a0e2ed9	rename "gss_accept_deleg" to "gss_accept_delegation". This is more consistent with existing GUC spelling. Discussion: https://postgr.es/m/ZGdnEsGtNj7+fZoa@momjian.us	2023-05-20 21:32:54 -04:00
Tom Lane	0245f8db36	Pre-beta mechanical code beautification. Run pgindent, pgperltidy, and reformat-dat-files. This set of diffs is a bit larger than typical. We've updated to pg_bsd_indent 2.1.2, which properly indents variable declarations that have multi-line initialization expressions (the continuation lines are now indented one tab stop). We've also updated to perltidy version 20230309 and changed some of its settings, which reduces its desire to add whitespace to lines to make assignments etc. line up. Going forward, that should make for fewer random-seeming changes to existing code. Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql	2023-05-19 17:24:48 -04:00
Tom Lane	722541ead1	Do pre-release housekeeping on catalog data. Run renumber_oids.pl to move high-numbered OIDs down, as per pre-beta tasks specified by RELEASE_CHANGES. For reference, the command was ./renumber_oids.pl --first-mapped-oid 8000 --target-oid 6200	2023-05-19 16:36:38 -04:00
Tom Lane	70b42f2790	Fix misbehavior of EvalPlanQual checks with multiple result relations. The idea of EvalPlanQual is that we replace the query's scan of the result relation with a single injected tuple, and see if we get a tuple out, thereby implying that the injected tuple still passes the query quals. (In join cases, other relations in the query are still scanned normally.) This logic was not updated when commit `86dc90056` made it possible for a single DML query plan to have multiple result relations, when the query target relation has inheritance or partition children. We replaced the output for the current result relation successfully, but other result relations were still scanned normally; thus, if any other result relation contained a tuple satisfying the quals, we'd think the EPQ check passed, even if it did not pass for the injected tuple itself. This would lead to update or delete actions getting performed when they should have been skipped due to a conflicting concurrent update in READ COMMITTED isolation mode. Fix by blocking all sibling result relations from emitting tuples during an EvalPlanQual recheck. In the back branches, the fix is complicated a bit by the need to not change the size of struct EPQState (else we'd have ABI-breaking changes in offsets in struct ModifyTableState). Like the back-patches of `3f7836ff6` and `4b3e37993`, add a separately palloc'd struct to avoid that. The logic is the same as in HEAD otherwise. This is only a live bug back to v14 where `86dc90056` came in. However, I chose to back-patch the test cases further, on the grounds that this whole area is none too well tested. I skipped doing so in v11 though because none of the tests applied cleanly, and it didn't quite seem worth extra work for a branch with only six months to live. Per report from Ante Krešić (via Aleksander Alekseev) Discussion: https://postgr.es/m/CAJ7c6TMBTN3rcz4=AjYhLPD_w3FFT0Wq_C15jxCDn8U4tZnH1g@mail.gmail.com	2023-05-19 14:26:40 -04:00
Tomas Vondra	8c4040edf4	Allocate hash join files in a separate memory context Should a hash join exceed memory limit, the hashtable is split up into multiple batches. The number of batches is doubled each time a given batch is determined not to fit in memory. Each batch file is allocated with a block-sized buffer for buffering tuples and parallel hash join has additional sharedtuplestore accessor buffers. In some pathological cases requiring a lot of batches, often with skewed data, bad stats, or very large datasets, users can run out-of-memory solely from the memory overhead of all the batch files' buffers. Batch files were allocated in the ExecutorState memory context, making it very hard to identify when this batch explosion was the source of an OOM. This commit allocates the batch files in a dedicated memory context, making it easier to identify the cause of an OOM and work to avoid it. Based on initial draft by Tomas Vondra, with significant reworks and improvements by Jehan-Guillaume de Rorthais. Author: Jehan-Guillaume de Rorthais <jgdr@dalibo.com> Author: Tomas Vondra <tomas.vondra@enterprisedb.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20190421114618.z3mpgmimc3rmubi4@development Discussion: https://postgr.es/m/20230504193006.1b5b9622%40karst#273020ff4061fc7a2fbb1ba96b281f17	2023-05-19 17:17:58 +02:00
Peter Eisentraut	803b4a26ca	Remove stray mid-sentence tabs in comments	2023-05-19 16:13:16 +02:00
Michael Paquier	e7bff46e50	pageinspect: Fix gist_page_items() with included columns Non-leaf pages of GiST indexes contain key attributes, leaf pages contain both key and non-key attributes, and gist_page_items() ignored the handling of non-key attributes. This caused a few problems when using gist_page_items() on a GiST index with INCLUDE: - On a non-leaf page, the function would crash. - On a leaf page, the function would work, but fail to display all the values for included attributes. This commit fixes gist_page_items() to handle such cases in a more appropriate way, and now displays the values of key and non-key attributes for each item separately in a style consistent with what ruleutils.c would generate for the attribute list, depending on the page type dealt with. In a way similar to how a record is displayed, values would be double-quoted for key or non-key attributes if required. ruleutils.c did not provide a routine able to control if non-key attributes should be displayed, so an extended() routine for index definitions is added to work around the leaf and non-leaf page differences. While on it, this commit fixes a third problem related to the amount of data reported for key attributes. The code originally relied on BuildIndexValueDescription() (used for error reports on constraints) that would not print all the data stored in the index but the index opclass's input type, so this limited the amount of information available. This switch makes gist_page_items() much cheaper as there is no need to run ACL checks for each item printed, which is not an issue anyway as superuser rights are required to execute the functions of pageinspect. Opclasses whose data cannot be displayed can rely on gist_page_items_bytea(). The documentation of this function was slightly incorrect for the output results generated on HEAD and v15, so adjust it on these branches. Author: Alexander Lakhin, Michael Paquier Discussion: https://postgr.es/m/17884-cb8c326522977acb@postgresql.org Backpatch-through: 14	2023-05-19 12:37:58 +09:00
Tomas Vondra	3581cbdcd6	Fix handling of empty ranges and NULLs in BRIN BRIN indexes did not properly distinguish between summaries for empty (no rows) and all-NULL ranges, treating them as essentially the same thing. Summaries were initialized with allnulls=true, and opclasses simply reset allnulls to false when processing the first non-NULL value. This however produces incorrect results if the range starts with a NULL value (or a sequence of NULL values), in which case we forget the range contains NULL values when adding the first non-NULL value. This happens because the allnulls flag is used for two separate purposes - to mark empty ranges (not representing any rows yet) and ranges containing only NULL values. Opclasses don't know which of these cases it is, and so don't know whether to set hasnulls=true. Setting the flag in both cases would make it correct, but it would also make BRIN indexes useless for queries with IS NULL clauses. All ranges start empty (and thus allnulls=true), so all ranges would end up with either allnulls=true or hasnulls=true. The severity of the issue is somewhat reduced by the fact that it only happens when adding values to an existing summary with allnulls=true. This can happen e.g. for small tables (because a summary for the first range exists for all BRIN indexes), or for tables with a large fraction of NULL values in the indexed columns. Bulk summarization (e.g. during CREATE INDEX or automatic summarization) that processes all values at once is not affected by this issue. In this case the flags were updated in a slightly different way, not forgetting the NULL values. To identify empty ranges we use a new flag, stored in an unused bit in the BRIN tuple header so the on-disk format remains the same. A matching flag is added to BrinMemTuple, into a 3B gap after bt_placeholder. That means there's no risk of ABI breakage, although we don't actually pass the BrinMemTuple to any public API. We could also skip storing index tuples for empty summaries, but then we'd have to always process such ranges - even if there are no rows in large parts of the table (e.g. after a bulk DELETE), it would still require reading the pages etc. So we store them, but ignore them when building the bitmap. Backpatch to 11. The issue exists since BRIN indexes were introduced in 9.5, but older releases are already EOL. Backpatch-through: 11 Reviewed-by: Justin Pryzby, Matthias van de Meent, Alvaro Herrera Discussion: https://postgr.es/m/402430e4-7d9d-6cf1-09ef-464d80afff3b@enterprisedb.com	2023-05-19 01:29:44 +02:00
Tom Lane	8a2523ff35	Tweak API of new function clause_is_computable_at(). Pass it the RestrictInfo under consideration, not just the clause_relids. This should save some trivial amount of code at the call sites, and it gives us more flexibility about what clause_is_computable_at() does. There's no actual functional change here, though. Discussion: https://postgr.es/m/3564467.1684352557@sss.pgh.pa.us	2023-05-18 10:39:16 -04:00
Andres Freund	093e5c57d5	Add writeback to pg_stat_io `28e626bde0` added the concept of IOOps but neglected to include writeback operations. `ac8d53dae5` added time spent doing these I/O operations. Without counting writeback, checkpointer write time in the log often differed substantially from that in pg_stat_io. To fix this, add IOOp IOOP_WRITEBACK and track writeback in pg_stat_io. Bumps catversion. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reported-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20230419172326.dhgyo4wrrhulovt6%40awork3.anarazel.de	2023-05-17 11:18:35 -07:00
Andres Freund	52676dc2e0	Update parameter name context to wb_context For clarity of review, renaming the function parameter "context" in ScheduleBufferTagForWriteback() and IssuePendingWritebacks() to "wb_context" is a separate commit. The next commit adds an "io_context" parameter and "wb_context" makes it more clear which is which. Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAAKRu_acc6iL4M3hvOTeztf_ZPpsB3Pqio5aVHgZ5q=Pi3BZKg@mail.gmail.com	2023-05-17 11:18:30 -07:00
Alexander Korotkov	b9a7a82272	Revert "Add USER SET parameter values for pg_db_role_setting" This reverts commit `096dd80f3c` and its fixups `beecbe8e50`, `afdd9f7f0e`, `529da086ba`, `db93e739ac`. Catversion is bumped. Discussion: https://postgr.es/m/d46f9265-ff3c-6743-2278-6772598233c2%40pgmasters.net	2023-05-17 20:28:57 +03:00
Tom Lane	9df8f903eb	Fix some issues with improper placement of outer join clauses. After applying outer-join identity 3 in the forward direction, it was possible for the planner to mistakenly apply a qual clause from above the two outer joins at the now-lower join level. This can give the wrong answer, since a value that would get nulled by the now-upper join might not yet be null. To fix, when we perform such a transformation, consider that the now-lower join hasn't really completed the outer join it's nominally responsible for and thus its relid set should not include that OJ's relid (nor should its output Vars have that nullingrel bit set). Instead we add those bits when the now-upper join is performed. The existing rules for qual placement then suffice to prevent higher qual clauses from dropping below the now-upper join. There are a few complications from needing to consider transitive closures in case multiple pushdowns have happened, but all in all it's not a very complex patch. This is all new logic (from `2489d76c4`) so no need to back-patch. The added test cases all have the same results as in v15. Tom Lane and Richard Guo Discussion: https://postgr.es/m/0b819232-4b50-f245-1c7d-c8c61bf41827@postgrespro.ru	2023-05-17 11:14:04 -04:00
Michael Paquier	d8c3106bb6	Add back SQLValueFunction for SQL keywords This is equivalent to a revert of `f193883` and `fb32748`, with the addition that the declaration of the SQLValueFunction node needs to gain a couple of node_attr for query jumbling. The performance impact of removing the function call inlining is proving to be too huge for some workloads where these are used. A worst-case test case of involving only simple SELECT queries with a SQL keyword is proving to lead to a reduction of 10% in TPS via pgbench and prepared queries on a high-end machine. None of the tests I ran back for this set of changes saw such a huge gap, but Alexander Lakhin and Andres Freund have found that this can be noticeable. Keeping the older performance would mean to do more inlining in the executor when using COERCE_SQL_SYNTAX for a function expression, similarly to what SQLValueFunction does. This requires more redesign work and there is little time until 16beta1 is released, so for now reverting the change is the best way forward, bringing back the previous performance. Bump catalog version. Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/b32bed1b-0746-9b20-1472-4bdc9ca66d52@gmail.com	2023-05-17 10:19:17 +09:00
Thomas Munro	63932a6d38	Fix wal_writer_flush_after initializer value. Commit `a73952b795` (new in 16) required default values in guc_table.c and C variable initializers to match. This one only matched when XLOG_BLCKSZ == 8kB. Fix by using the same expression in both places with a new DEFAULT_XXX macro, as done for other GUCs. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGLNmLV=VrT==5MqnbARgx2ifRSFtdd8ofdfrdSLL3yv5A@mail.gmail.com	2023-05-15 11:19:54 +12:00
Peter Eisentraut	e32701b8d2	initdb: Set collversion for standard collation UNICODE Since the behavior of the UNICODE collation can change with new ICU/Unicode versions, we need to apply the versioning mechanism to it. We do this with an UPDATE command in initdb; this is similar to how we put the collation version into pg_database already. Reported-by: Daniel Verite <daniel@manitou-mail.org> Discussion: https://www.postgresql.org/message-id/49417853-7bdd-4b23-a4e9-04c7aff33821@manitou-mail.org	2023-05-12 10:03:05 +02:00
Michael Paquier	605994651b	Fix assertion failure when updating stats_fetch_consistency in a transaction An update of the GUC stats_fetch_consistency in a transaction would be able to trigger an assertion when doing cache->snapshot. In this case, when retrieving a pgstat entry after the switch, a new snapshot would be rebuilt, confusing pgstat_build_snapshot() because a snapshot is already cached with an unexpected mode ("cache"). In order to fix this problem, this commit adds a flag to force a snapshot clear each time this GUC is changed. Some tests are added to check, while on it. Some optimizations in avoiding the snapshot clear should be possible depending on what is cached and the current GUC value, I guess, but this solution is simple, and ensures that the state of the cache is updated each time a new pgstat entry is fetched, hence being consistent with the level wanted by the client that has set the GUC. Note that cache->none and snapshot->none would not cause issues, as fetching a pgstat entry would be retrieved from shared memory on the second attempt, however a snapshot would still be cached. Similarly, none->snapshot and none->cache would build a new snapshot on the second fetch attempt. Finally, snapshot->cache would cache a new snapshot on the second attempt. Reported-by: Alexander Lakhin Author: Kyotaro Horiguchi Discussion: https://postgr.es/m/17804-2a118cd046f2d0e5@postgresql.org backpatch-through: 15	2023-05-10 11:24:30 +09:00
Amit Kapila	3d144c6c86	Fix invalid memory access during the shutdown of the parallel apply worker. The callback function pa_shutdown() accesses MyLogicalRepWorker which may not be initialized if there is an error during the initialization of the parallel apply worker. The other problem is that by the time it is invoked even after the initialization of the worker, the MyLogicalRepWorker will be reset by another callback logicalrep_worker_onexit. So, it won't have the required information. To fix this, register the shutdown callback after we are attached to the worker slot. After this fix, we observed another issue which is that sometimes the leader apply worker tries to receive the message from the error queue that might already be detached by the parallel apply worker leading to an error. To prevent such an error, we ensure that the leader apply worker detaches from the parallel apply worker's error queue before stopping it. Reported-by: Sawada Masahiko Author: Hou Zhijie Reviewed-by: Sawada Masahiko, Amit Kapila Discussion: https://postgr.es/m/CAD21AoDo+yUwNq6nTrvE2h9bB2vZfcag=jxWc7QxuWCmkDAqcA@mail.gmail.com	2023-05-09 09:28:06 +05:30
Alvaro Herrera	5472743d9e	Revert "Move PartitionPruneInfo out of plan nodes into PlannedStmt" This reverts commit `ec38694894` and its fixup `589bb81649`. This change was intended to support query planning avoiding acquisition of locks on partitions that were going to be pruned; however, the overall project took a different direction at [1] and this bit is no longer needed. Put things back the way they were as agreed in [2], to avoid unnecessary complexity. Discussion: [1] https://postgr.es/m/4191508.1674157166@sss.pgh.pa.us Discussion: [2] https://postgr.es/m/20230502175409.kcoirxczpdha26wt@alvherre.pgsql	2023-05-04 12:09:59 +02:00
Amit Kapila	de63f8dade	Fix assertion failure in apply worker. During exit, the logical replication apply worker tries to release session level locks, if any. However, if the apply worker exits due to an error before its connection is initialized, trying to release locks can lead to assertion failure. The locks will be acquired once the worker is initialized, so we don't need to release them till the worker initialization is complete. Reported-by: Alexander Lakhin Author: Hou Zhijie based on inputs from Sawada Masahiko and Amit Kapila Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/2185d65f-5aae-3efa-c48f-fb42b173ef5c@gmail.com	2023-05-03 10:17:49 +05:30
Michael Paquier	8961cb9a03	Fix typos in comments The changes done in this commit impact comments with no direct user-visible changes, with fixes for incorrect function, variable or structure names. Author: Alexander Lakhin Discussion: https://postgr.es/m/e8c38840-596a-83d6-bd8d-cebc51111572@gmail.com	2023-05-02 12:23:08 +09:00
Michael Paquier	4dadd660f0	Fix crashes with CREATE SCHEMA AUTHORIZATION and schema elements CREATE SCHEMA AUTHORIZATION with appended schema elements can lead to crashes when comparing the schema name of the query with the schemas used in the qualification of some clauses in the elements' queries. The origin of the problem is that the transformation routine for the elements listed in a CREATE SCHEMA query uses as new, expected, schema name the one listed in CreateSchemaStmt itself. However, depending on the query, CreateSchemaStmt.schemaname may be NULL, being computed instead from the role specification of the query given by the AUTHORIZATION clause, that could be either: - A user name string, with the new schema name being set to the same value as the role given. - Guessed from CURRENT_USER, SESSION_USER or CURRENT_ROLE, with a new schema name computed from the security context where CREATE SCHEMA is running. Regression tests are added for CREATE SCHEMA with some appended elements (some of them with schema qualifications), covering also some role specification patterns. While on it, this simplifies the context structure used during the transformation of the elements listed in a CREATE SCHEMA query by removing the fields for the role specification and the role type. They were not used, and for the role specification this could be confusing as the schema name may be extracted from that at the beginning of CreateSchemaCommand(). This issue has existed for a long time, so backpatch down to all the versions supported. Reported-by: Song Hongyu Author: Michael Paquier Reviewed-by: Richard Guo Discussion: https://postgr.es/m/17909-f65c12dfc5f0451d@postgresql.org Backpatch-through: 11	2023-04-28 19:29:12 +09:00
Thomas Munro	828e93a6f2	Remove bogus #include added by `d4e71df6d7`. The recently added inclusion of guc.h in smgr.h is not necessary and introduces more server-related stuff. Removing the directive helps avoid potential issues with including smgr.h in frontends. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/20230425.115748.2130383825066921512.horikyota.ntt%40gmail.com	2023-04-26 10:43:53 +12:00
Andres Freund	1118cd37eb	Remove vacuum_defer_cleanup_age vacuum_defer_cleanup_age was introduced before hot_standby_feedback and replication slots existed. It is hard to use reasonably - commonly it will either be set too low (not preventing recovery conflicts, while still causing some bloat), or too high (causing a lot of bloat). The alternatives do not have that issue. That on its own might not be sufficient reason to remove vacuum_defer_cleanup_age, but it also complicates computation of xid horizons. See e.g. the bug fixed in `be504a3e97`. It also is untested. This commit removes TransactionIdRetreatSafely(), as there are no users anymore. There might be potential future users, hence noting that here. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20230317230930.nhsgk3qfk7f4axls@awork3.anarazel.de	2023-04-24 12:21:02 -07:00
Tom Lane	fce3b26e97	Rename ExecAggTransReparent, and improve its documentation. The name of this function suggests that it ought to reparent R/W expanded objects to be children of the persistent aggcontext, instead of copying them. In fact it does no such thing, and if you try to make it do so you will see multiple regression failures. Rename it to the less-misleading ExecAggCopyTransValue, and add commentary about why that attractive-sounding optimization won't work. Also adjust comments at call sites, some of which were describing logic that has since been moved into ExecAggCopyTransValue. Discussion: https://postgr.es/m/3004282.1681930251@sss.pgh.pa.us	2023-04-24 13:01:33 -04:00
Michael Paquier	0ecb87e1fa	Remove io prefix from pg_stat_io columns `a9c70b46` added the statistics view pg_stat_io which contained columns "io_context" and "io_object". Given that the columns are in the pg_stat_io view, the "io" prefix is somewhat redundant, so remove it. The code variables referring to these fields are kept unchanged so that they can keep their context about I/O. Bump catalog version. Author: Melanie Plageman Reviewed-by: Kyotaro Horiguchi, Fabrízio de Royes Mello Discussion: https://postgr.es/m/CAAKRu_aAQoJWrvT2BYYQvJChFKra_O-5ra3jhzKJZqWsTR1CPQ@mail.gmail.com	2023-04-21 07:21:50 +09:00
Thomas Munro	7d3d72b55e	Remove obsolete defense against strxfrm() bugs. Old versions of Solaris and illumos had buffer overrun bugs in their strxfrm() implementations. The bugs were fixed more than a decade ago and the relevant releases are long out of vendor support. It's time to remove the defense added by commit `be8b06c3`. Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA+hUKGJ-ZPJwKHVLbqye92-ZXeLoCHu5wJL6L6HhNP7FkJ=meA@mail.gmail.com	2023-04-20 13:20:14 +12:00
Peter Geoghegan	50547a3fae	Fix wal_consistency_checking enhanced desc output. Recent enhancements to rmgr desc routines that made the output summarize certain block data (added by commits `7d8219a4` and `1c453cfd`) dealt with records that lack relevant block data (and so have nothing to give a more detailed summary of) by testing !DecodedBkpBlock.has_image. As a result, more detailed descriptions of block data were not output when wal_consistency_checking was enabled. This bug affected records with summarizable block data that also happened to have an FPI that the REDO routine isn't supposed to apply (FPIs used for consistency checking purposes only). The presence of such an FPI was incorrectly taken to indicate the absence of block data. To fix, test DecodedBkpBlock.has_data, not !DecodedBkpBlock.has_image. This is the exact condition that we care about, not an inexact proxy. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-Wzm5Sc9cBg1qWV_cEBfLNJCrW9FjS-SoHVt8FLA7Ldn8yg@mail.gmail.com	2023-04-19 10:42:39 -07:00
David Rowley	3f58a4e296	Fix various typos and incorrect/outdated name references Author: Alexander Lakhin Discussion: https://postgr.es/m/699beab4-a6ca-92c9-f152-f559caf6dc25@gmail.com	2023-04-19 13:50:33 +12:00
Peter Geoghegan	06e0652750	Remove useless argument from nbtree dedup function. _bt_dedup_pass()'s heapRel argument hasn't been needed or used since commit `cf2acaf4dc` made deleting any existing LP_DEAD index tuples the caller's responsibility.	2023-04-18 10:33:15 -07:00
David Rowley	eef231e816	Fix some typos and some incorrectly duplicated words Author: Justin Pryzby Reviewed-by: David Rowley Discussion: https://postgr.es/m/ZD3D1QxoccnN8A1V@telsasoft.com	2023-04-18 14:03:49 +12:00
David Rowley	b4dbf3e924	Fix various typos This fixes many spelling mistakes in comments, but a few references to invalid parameter names, function names and option names too in comments and also some in string constants Also, fix an #undef that was undefining the incorrect definition Author: Alexander Lakhin Reviewed-by: Justin Pryzby Discussion: https://postgr.es/m/d5f68d19-c0fc-91a9-118d-7c6a5a3f5fad@gmail.com	2023-04-18 13:23:23 +12:00
Peter Geoghegan	cd7cdc550c	Fix incorrect comment about nbtree WAL record. The nbtree VACUUM WAL record stores its page offset number payload in blk 0 (just like the closely related nbtree DELETE WAL record). Commit `ebd551f5` fixed a similar issue with the DELETE WAL record, but missed this one.	2023-04-17 09:58:18 -07:00
Tom Lane	d48ac0070c	Further cleanup of autoconf output files for GSSAPI changes. Running autoheader was missed in `f7431bca8`. This is cosmetic since we aren't using these HAVE_ symbols, but let's get everything in sync while we're looking at this. Discussion: https://postgr.es/m/2422362.1681741814@sss.pgh.pa.us	2023-04-17 11:21:50 -04:00
Peter Geoghegan	d6f0f95a6b	Harmonize some more function parameter names. Make sure that function declarations use names that exactly match the corresponding names from function definitions in a few places. These inconsistencies were all introduced relatively recently, after the code base had parameter name mismatches fixed in bulk (see the commits starting with `4274dc22` and `035ce1fe`). pg_bsd_indent still has a couple of similar inconsistencies, which I (pgeoghegan) have left untouched for now. Like all earlier commits that cleaned up function parameter names, this commit was written with help from clang-tidy.	2023-04-13 10:15:20 -07:00
Stephen Frost	6633cfb216	De-Revert "Add support for Kerberos credential delegation" This reverts commit `3d03b24c3` (Revert Add support for Kerberos credential delegation) which was committed on the grounds of concern about portability, but on further review and discussion, it's clear that we are better off explicitly requiring MIT Kerberos as that appears to be the only GSSAPI library currently that's under proper maintenance and ongoing development. The API used for storing credentials was added to MIT Kerberos over a decade ago while for the other libraries which appear to be mainly based on Heimdal, which exists explicitly to be a re-implementation of MIT Kerberos, the API never made it to a released version (even though it was added to the Heimdal git repo over 5 years ago..). This post-feature-freeze change was approved by the RMT. Discussion: https://postgr.es/m/ZDDO6jaESKaBgej0%40tamriel.snowman.net	2023-04-13 08:55:07 -04:00
Alvaro Herrera	9ce04b50e1	Revert "Catalog NOT NULL constraints" and fallout This reverts commit `e056c557ae` and minor later fixes thereof. There's a few problems in this new feature -- most notably regarding pg_upgrade behavior, but others as well. This new feature is not in any way critical on its own, so instead of scrambling to fix it we revert it and try again in early 17 with these issues in mind. Discussion: https://postgr.es/m/3801207.1681057430@sss.pgh.pa.us	2023-04-12 19:29:21 +02:00
Michael Paquier	a923e21631	Fix detection of unseekable files for fseek() and ftello() with MSVC Calling fseek() or ftello() on a handle to a non-seeking device such as a pipe or a communications device is not supported. Unfortunately, MSVC's flavor of these routines, _fseeki64() and _ftelli64(), do not return an error when given a pipe as handle. Some of the logic of pg_dump and restore relies on these routines to check if a handle is seekable, causing failures when passing the contents of pg_dump to pg_restore through a pipe, for example. This commit introduces wrappers for fseeko() and ftello() on MSVC so as any callers are able to properly detect the cases of non-seekable handles. This relies mainly on GetFileType(), sharing a bit of code with the MSVC port for fstat(). The code in charge of getting a file type is refactored into a new file called win32common.c, shared by win32stat.c and the new win32fseek.c. It includes the MSVC ports for fseeko() and ftello(). Like `765f5df`, this is backpatched down to 14, where the fstat() implementation for MSVC is able to understand about files larger than 4GB in size. Using a TAP test for that is proving to be tricky as IPC::Run handles the pipes by itself, still I have been able to check the fix manually. Reported-by: Daniel Watzinger Author: Juan José Santamaría Flecha, Michael Paquier Discussion: https://postgr.es/m/CAC+AXB26a4EmxM2suXxPpJaGrqAdxracd7hskLg-zxtPB50h7A@mail.gmail.com Backpatch-through: 14	2023-04-12 09:09:38 +09:00
Peter Geoghegan	c03c2eae0a	Refine the guidelines for rmgrdesc authors. Clarify the goals of the recently added guidelines for rmgrdesc authors: to avoid gratuitous inconsistencies across resource managers, and to make it reasonably easy to write a reusable custom parser. Beyond that, the guidelines leave rmgrdesc authors with a significant amount of leeway. This even includes the leeway to invent custom conventions (in cases where it's warranted). Follow-up to commit `7d8219a4`. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkbYuvwYKm-Y-72QEh6SPMQcAo9uONv+mR3bMGcu9E_Cg@mail.gmail.com	2023-04-11 15:26:24 -07:00
Peter Geoghegan	e944063294	Fix xl_heap_lock WAL record field's data type. Make xl_heap_lock's infobits_set field of type uint8, not int8. Using int8 isn't appropriate given that the field just holds status bits. This fixes an oversight in commit `0ac5ad5134`. In passing rename the nearby TransactionId field to "xmax" to make things consistent with related records, such as xl_heap_lock_updated. Deliberately avoid a bump in XLOG_PAGE_MAGIC. No backpatch, either. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WzkCd3kOS8b7Rfxw7Mh1_6jvX=Nzo-CWR1VBTiOtVZkWHA@mail.gmail.com	2023-04-11 14:07:54 -07:00
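As a small illustration of why an unsigned byte suits a field of status bits, the sketch below widens the same raw byte through a signed and an unsigned 8-bit type. The function names are hypothetical and the example is not taken from the PostgreSQL sources; the signed conversion result is implementation-defined in C, though it can never equal 0x80.

```c
#include <stdint.h>

/*
 * Store the raw byte in a signed 8-bit field, then widen it to int.
 * When the top bit is set, the value cannot be represented as 0x80,
 * so the widened result differs from the original bit pattern.
 */
int
widen_via_int8(uint8_t raw)
{
    int8_t field = (int8_t) raw;

    return field;
}

/* Store the raw byte in an unsigned 8-bit field; the bit pattern survives. */
int
widen_via_uint8(uint8_t raw)
{
    uint8_t field = raw;

    return field;
}
```

Comparisons such as `widen_via_int8(0x80) == 0x80` fail, while the unsigned variant round-trips every flag combination.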
Peter Geoghegan	5d6728e588	Fix nbtree posting list update desc output. We cannot use the generic array_desc approach with per-tuple nbtree posting list update metadata because array_desc can only deal with fixed width elements (e.g., page offset numbers). Using array_desc led to incorrect rmgr descriptions for updates from nbtree DELETE/VACUUM WAL records. To fix, add specialized code to describe the update metadata as array elements in desc output. We now iterate over the update metadata using an approach that matches related REDO routines. Also stop showing the updates offset number array separately in nbtree DELETE/VACUUM desc output. It's redundant information, since the same page offset numbers appear in the description of each individual update element. Also make some small tweaks to the way that we format arrays in all desc routines (not just nbtree desc routines) to make arrays a little less verbose. Oversight in commit `1c453cfd`, which enhanced the nbtree rmgr desc routines. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WzkbYuvwYKm-Y-72QEh6SPMQcAo9uONv+mR3bMGcu9E_Cg@mail.gmail.com	2023-04-10 11:15:41 -07:00
Stephen Frost	3d03b24c35	Revert "Add support for Kerberos credential delegation" This reverts commit `3d4fa227bc`. Per discussion and buildfarm, this depends on APIs that seem to not be available on at least one platform (NetBSD). Should be certainly possible to rework to be optional on that platform if necessary but bit late for that at this point. Discussion: https://postgr.es/m/3286097.1680922218@sss.pgh.pa.us	2023-04-08 07:21:35 -04:00
Thomas Munro	db4f21e4a3	Redesign interrupt/cancel API for regex engine. Previously, a PostgreSQL-specific callback checked by the regex engine had a way to trigger a special error code REG_CANCEL if it detected that the next call to CHECK_FOR_INTERRUPTS() would certainly throw via ereport(). A later proposed bugfix aims to move some complex logic out of signal handlers, so that it won't run until the next CHECK_FOR_INTERRUPTS(), which makes the above design impossible unless we split CHECK_FOR_INTERRUPTS() into two phases, one to run logic and another to ereport(). We may develop such a system in the future, but for the regex code it is no longer necessary. An earlier commit moved regex memory management over to our MemoryContext system. Given that the purpose of the two-phase interrupt checking was to free memory before throwing, something we don't need to worry about anymore, it seems simpler to inject CHECK_FOR_INTERRUPTS() directly into cancelation points, and just let it throw. Since the plan is to keep PostgreSQL-specific concerns separate from the main regex engine code (with a view to being able to stay in sync with other projects), do this with a new macro INTERRUPT(), customizable in regcustom.h and defaulting to nothing. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com	2023-04-08 22:10:39 +12:00
Thomas Munro	4f51429dd7	Update tsearch regex memory management. Now that our regex engine uses palloc(), it's not necessary to set up a special memory context callback to free compiled regexes. The regex has no resources other than the memory that is already going to be freed in bulk. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com	2023-04-08 22:09:17 +12:00
Thomas Munro	bea3d7e383	Use MemoryContext API for regex memory management. Previously, regex_t objects' memory was managed with malloc() and free() directly. Switch to palloc()-based memory management instead. Advantages: * memory used by cached regexes is now visible with MemoryContext observability tools * cleanup can be done automatically in certain failure modes (something that later commits will take advantage of) * cleanup can be done in bulk On the downside, there may be more fragmentation (wasted memory) due to per-regex MemoryContext objects. This is a problem shared with other cached objects in PostgreSQL and can probably be improved with later tuning. Thanks to Noah Misch for suggesting this general approach, which unblocks later work on interrupts. Suggested-by: Noah Misch <noah@leadboat.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com	2023-04-08 22:08:41 +12:00
Andres Freund	0fdab27ad6	Allow logical decoding on standbys Unsurprisingly, this requires wal_level = logical to be set on the primary and standby. The infrastructure added in `26669757b6` ensures that slots are invalidated if the primary's wal_level is lowered. Creating a slot on a standby waits for a xl_running_xact record to be processed. If the primary is idle (and thus not emitting xl_running_xact records), that can take a while. To make that faster, this commit also introduces the pg_log_standby_snapshot() function. By executing it on the primary, completion of slot creation on the standby can be accelerated. Note that logical decoding on a standby does not itself enforce that required catalog rows are not removed. The user has to use physical replication slots + hot_standby_feedback or other measures to prevent that. If catalog rows required for a slot are removed, the slot is invalidated. See `6af1793954` for an overall design of logical decoding on a standby. Bumps catversion, for the addition of the pg_log_standby_snapshot() function. Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Author: Andres Freund <andres@anarazel.de> (in an older version) Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version) Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com>	2023-04-08 02:20:05 -07:00
Andres Freund	e101dfac3a	For cascading replication, wake physical and logical walsenders separately Physical walsenders can't send data until it's been flushed; logical walsenders can't decode and send data until it's been applied. On the standby, the WAL is flushed first, which will only wake up physical walsenders; and then applied, which will only wake up logical walsenders. Previously, all walsenders were awakened when the WAL was flushed. That was fine for logical walsenders on the primary; but on the standby the flushed WAL would have been not applied yet, so logical walsenders were awakened too early. Per idea from Jeff Davis and Amit Kapila. Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-By: Jeff Davis <pgsql@j-davis.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAA4eK1+zO5LUeisabX10c81LU-fWMKO4M9Wyg1cdkbW7Hqh6vQ@mail.gmail.com	2023-04-08 01:06:00 -07:00
Andres Freund	26669757b6	Handle logical slot conflicts on standby During WAL replay on the standby, when a conflict with a logical slot is identified, invalidate such slots. There are two sources of conflicts: 1) Using the information added in `6af1793954`, logical slots are invalidated if required rows are removed 2) wal_level on the primary server is reduced to below logical Uses the infrastructure introduced in the prior commit. FIXME: add commit reference. Change InvalidatePossiblyObsoleteSlot() to use a recovery conflict to interrupt use of a slot, if called in the startup process. The new recovery conflict is added to pg_stat_database_conflicts, as confl_active_logicalslot. See `6af1793954` for an overall design of logical decoding on a standby. Bumps catversion for the addition of the pg_stat_database_conflicts column. Bumps PGSTAT_FILE_FORMAT_ID for the same reason. Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Author: Andres Freund <andres@anarazel.de> Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version) Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de	2023-04-08 00:05:44 -07:00
Andres Freund	be87200efd	Support invalidating replication slots due to horizon and wal_level Needed for logical decoding on a standby. Slots need to be invalidated because of the horizon if rows required for logical decoding are removed. If the primary's wal_level is lowered from 'logical', logical slots on the standby need to be invalidated. The new invalidation methods will be used in a subsequent commit. Logical slots that have been invalidated can be identified via the new pg_replication_slots.conflicting column. See `6af1793954` for an overall design of logical decoding on a standby. Bumps catversion for the addition of the new pg_replication_slots column. Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Author: Andres Freund <andres@anarazel.de> Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version) Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de	2023-04-07 22:40:27 -07:00
Andres Freund	15f8203a59	Replace replication slot's invalidated_at LSN with an enum This is mainly useful because the upcoming logical-decoding-on-standby feature adds further reasons for invalidating slots, and we don't want to end up with multiple invalidated_* fields, or check different attributes. Eventually we should consider not resetting restart_lsn when invalidating a slot due to max_slot_wal_keep_size. But that's a user visible change, so left for later. Increases SLOT_VERSION, due to the changed field (with a different alignment, no less). Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de	2023-04-07 21:47:25 -07:00
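The idea of replacing a single invalidated_at LSN with an enum can be sketched as follows. All type, constant, and function names here are hypothetical; only the three invalidation reasons (WAL removed because of max_slot_wal_keep_size, rows removed past the horizon, wal_level lowered) are taken from the commit messages in this log.

```c
#include <stdbool.h>

/* Hypothetical enum: one field records why a slot was invalidated. */
typedef enum SlotInvalCauseSketch
{
    SKETCH_INVAL_NONE = 0,      /* slot is still valid */
    SKETCH_INVAL_WAL_REMOVED,   /* required WAL was removed */
    SKETCH_INVAL_HORIZON,       /* required rows were removed */
    SKETCH_INVAL_WAL_LEVEL      /* primary's wal_level dropped below logical */
} SlotInvalCauseSketch;

typedef struct SlotSketch
{
    SlotInvalCauseSketch invalidated;
} SlotSketch;

/* A single check covers every invalidation reason. */
bool
slot_is_invalidated(const SlotSketch *slot)
{
    return slot->invalidated != SKETCH_INVAL_NONE;
}

/* Map the cause to a human-readable reason, e.g. for a log message. */
const char *
slot_inval_reason(const SlotSketch *slot)
{
    switch (slot->invalidated)
    {
        case SKETCH_INVAL_NONE:
            return "valid";
        case SKETCH_INVAL_WAL_REMOVED:
            return "required WAL removed";
        case SKETCH_INVAL_HORIZON:
            return "required rows removed";
        case SKETCH_INVAL_WAL_LEVEL:
            return "wal_level too low";
    }
    return "unknown";
}
```

With this shape, adding a new invalidation reason means adding one enum value rather than another invalidated_* field.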
Thomas Munro	d4e71df6d7	Add io_direct setting (developer-only). Provide a way to ask the kernel to use O_DIRECT (or local equivalent) where available for data and WAL files, to avoid or minimize kernel caching. This hurts performance currently and is not intended for end users yet. Later proposed work would introduce our own I/O clustering, read-ahead, etc to replace the facilities the kernel disables with this option. The only user-visible change, if the developer-only GUC is not used, is that this commit also removes the obscure logic that would activate O_DIRECT for the WAL when wal_sync_method=open_[data]sync and wal_level=minimal (which also requires max_wal_senders=0). Those are non-default and unlikely settings, and this behavior wasn't (correctly) documented. The same effect can be achieved with io_direct=wal. Author: Thomas Munro <thomas.munro@gmail.com> Author: Andres Freund <andres@anarazel.de> Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGK1X532hYqJ_MzFWt0n1zt8trz980D79WbjwnT-yYLZpg%40mail.gmail.com	2023-04-08 16:35:07 +12:00
Thomas Munro	faeedbcefd	Introduce PG_IO_ALIGN_SIZE and align all I/O buffers. In order to have the option to use O_DIRECT/FILE_FLAG_NO_BUFFERING in a later commit, we need the addresses of user space buffers to be well aligned. The exact requirements vary by OS and file system (typically sectors and/or memory pages). The address alignment size is set to 4096, which is enough for currently known systems: it matches modern sectors and common memory page size. There is no standard governing O_DIRECT's requirements so we might eventually have to reconsider this with more information from the field or future systems. Aligning I/O buffers on memory pages is also known to improve regular buffered I/O performance. Three classes of I/O buffers for regular data pages are adjusted: (1) Heap buffers are now allocated with the new palloc_aligned() or MemoryContextAllocAligned() functions introduced by commit `439f6175`. (2) Stack buffers now use a new struct PGIOAlignedBlock to respect PG_IO_ALIGN_SIZE, if possible with this compiler. (3) The buffer pool is also aligned in shared memory. WAL buffers were already aligned on XLOG_BLCKSZ. It's possible for XLOG_BLCKSZ to be configured smaller than PG_IO_ALIGN_SIZE and thus for O_DIRECT WAL writes to fail to be well aligned, but that's a pre-existing condition and will be addressed by a later commit. BufFiles are not yet addressed (there's no current plan to use O_DIRECT for those, but they could potentially get some incidental speedup even in plain buffered I/O operations through better alignment). If we can't align stack objects suitably using the compiler extensions we know about, we disable the use of O_DIRECT by setting PG_O_DIRECT to 0. This avoids the need to consider systems that have O_DIRECT but can't align stack objects the way we want; such systems could in theory be supported with more work but we don't currently know of any such machines, so it's easier to pretend there is no O_DIRECT support instead. 
That's an existing and tested class of system. Add assertions that all buffers passed into smgrread(), smgrwrite() and smgrextend() are correctly aligned, unless PG_O_DIRECT is 0 (= stack alignment tricks may be unavailable) or the block size has been set too small to allow arrays of buffers to be all aligned. Author: Thomas Munro <thomas.munro@gmail.com> Author: Andres Freund <andres@anarazel.de> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/CA+hUKGK1X532hYqJ_MzFWt0n1zt8trz980D79WbjwnT-yYLZpg@mail.gmail.com	2023-04-08 16:34:50 +12:00
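A minimal sketch of the alignment idea, using standard C11 facilities rather than PostgreSQL's own helpers. The `SKETCH_` constants, `AlignedBlockSketch`, and both functions are hypothetical; the 4096-byte alignment comes from the commit message, while the 8192-byte block size is an assumption made for the example. Support for this much alignment on stack objects depends on the compiler, which is the caveat the commit message itself raises.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_IO_ALIGN_SIZE 4096   /* alignment named in the commit message */
#define SKETCH_BLCKSZ        8192   /* assumed block size for this example */

/* A block-sized buffer whose address is a multiple of the I/O alignment. */
typedef struct AlignedBlockSketch
{
    alignas(SKETCH_IO_ALIGN_SIZE) char data[SKETCH_BLCKSZ];
} AlignedBlockSketch;

/* The kind of check an assertion in an I/O path could make. */
bool
is_io_aligned(const void *ptr)
{
    return ((uintptr_t) ptr % SKETCH_IO_ALIGN_SIZE) == 0;
}

/* Heap allocation of nblocks aligned blocks; the caller frees with free(). */
void *
alloc_io_blocks(size_t nblocks)
{
    /* aligned_alloc() needs a size that is a multiple of the alignment. */
    return aligned_alloc(SKETCH_IO_ALIGN_SIZE, nblocks * SKETCH_BLCKSZ);
}
```

Because the block size is a multiple of the alignment, every element of an array of such blocks stays aligned, which matches the commit's remark about block sizes that are too small.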
Stephen Frost	3d4fa227bc	Add support for Kerberos credential delegation Support GSSAPI/Kerberos credentials being delegated to the server by a client. With this, a user authenticating to PostgreSQL using Kerberos (GSSAPI) credentials can choose to delegate their credentials to the PostgreSQL server (which can choose to accept them, or not), allowing the server to then use those delegated credentials to connect to another service, such as with postgres_fdw or dblink or theoretically any other service which is able to be authenticated using Kerberos. Both postgres_fdw and dblink are changed to allow non-superuser password-less connections but only when GSSAPI credentials have been delegated to the server by the client and GSSAPI is used to authenticate to the remote system. Authors: Stephen Frost, Peifeng Qiu Reviewed-By: David Christensen Discussion: https://postgr.es/m/CO1PR05MB8023CC2CB575E0FAAD7DF4F8A8E29@CO1PR05MB8023.namprd05.prod.outlook.com	2023-04-07 21:58:04 -04:00
Andres Freund	ac8d53dae5	Track IO times in pg_stat_io `a9c70b46db` and 8aaa04b32S added counting of IO operations to a new view, pg_stat_io. Now, add IO timing for reads, writes, extends, and fsyncs to pg_stat_io as well. This combines the tracking for pgBufferUsage with the tracking for pg_stat_io into a new function pgstat_count_io_op_time(). This should make it a bit easier to avoid the somewhat costly instr_time conversion done for pgBufferUsage. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_ay5iKmnbXZ3DsauViF3eMxu4m1oNnJXqV_HyqYeg55Ww%40mail.gmail.com	2023-04-07 17:04:56 -07:00
Peter Geoghegan	1c453cfd89	Show more detail in nbtree rmgr descriptions. Show a detailed description of the page offset number arrays that appear in certain nbtree WAL records. Also brings nbtree desc routines in line with the guidelines established by recent commit `7d8219a4`. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-By: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/flat/20230109215842.fktuhesvayno6o4g%40awork3.anarazel.de	2023-04-07 16:46:23 -07:00
Peter Geoghegan	7d8219a444	Show more detail in heapam rmgr descriptions. Add helper functions that output arrays in a standard format, and use the functions inside heapdesc routines. This allows tools like pg_walinspect to show a detailed description of the page offset number arrays for records like PRUNE and VACUUM (unless there was an FPI). Also document the conventions that desc routines should follow. Only the heapdesc routines follow the conventions for now, so they're just guidelines for the time being. Based on a suggestion from Andres Freund. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-By: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/flat/20230109215842.fktuhesvayno6o4g%40awork3.anarazel.de	2023-04-07 16:08:52 -07:00
Alvaro Herrera	e056c557ae	Catalog NOT NULL constraints We now create pg_constraint rows for NOT NULL constraints with contype='n'. We propagate these constraints during operations such as adding inheritance relationships, creating and attaching partitions, creating tables LIKE other tables. We mostly follow the well-known rules of conislocal and coninhcount that we have for CHECK constraints, with some adaptations; for example, as opposed to CHECK constraints, we don't match NOT NULL ones by name when descending a hierarchy to alter it; instead we match by column number. This means we don't require the constraint names to be identical across a hierarchy. For now, we omit them from system catalogs. Maybe this is worth reconsidering. We don't support NOT VALID nor DEFERRABLE clauses either; these can be added as separate features later (this patch is already large and complicated enough.) This has been very long in the making. The first patch was written by Bernd Helmle in 2010 to add a new pg_constraint.contype value ('n'), which I (Álvaro) then hijacked in 2011 and 2012, until that one was killed by the realization that we ought to use contype='c' instead: manufactured CHECK constraints. However, later SQL standard development, as well as nonobvious emergent properties of that design (mostly, failure to distinguish them from "normal" CHECK constraints as well as the performance implication of having to test the CHECK expression) led us to reconsider this choice, so now the current implementation uses contype='n' again. In 2016 Vitaly Burovoy also worked on this feature[1] but found no consensus for his proposed approach, which was claimed to be closer to the letter of the standard, requiring additional pg_attribute columns to track the OID of the NOT NULL constraint for that column. 
[1] https://postgr.es/m/CAKOSWNkN6HSyatuys8xZxzRCR-KL1OkHS5-b9qd9bf1Rad3PLA@mail.gmail.com Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Author: Bernd Helmle <mailings@oopsware.de> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Discussion: https://postgr.es/m/CACA0E642A0267EDA387AF2B%40%5B172.26.14.62%5D Discussion: https://postgr.es/m/AANLkTinLXMOEMz+0J29tf1POokKi4XDkWJ6-DDR9BKgU@mail.gmail.com Discussion: https://postgr.es/m/20110707213401.GA27098@alvh.no-ip.org Discussion: https://postgr.es/m/1343682669-sup-2532@alvh.no-ip.org Discussion: https://postgr.es/m/CAKOSWNkN6HSyatuys8xZxzRCR-KL1OkHS5-b9qd9bf1Rad3PLA@mail.gmail.com Discussion: https://postgr.es/m/20220817181249.q7qvj3okywctra3c@alvherre.pgsql	2023-04-07 19:59:57 +02:00
Tom Lane	888f2ea0a8	Add array_sample() and array_shuffle() functions. These are useful in Monte Carlo applications. Martin Kalcher, reviewed/adjusted by Daniel Gustafsson and myself Discussion: https://postgr.es/m/9d160a44-7675-51e8-60cf-6d64b76db831@aboutsource.net	2023-04-07 11:47:07 -04:00
Michael Paquier	8fcb32db98	Add more protections in WAL record APIs against overflows This commit adds a limit to the size of an XLogRecord at 1020MB, based on a suggestion by Heikki Linnakangas. This counts for the overhead needed by the XLogReader when allocating the memory it needs to read a record in DecodeXLogRecordRequiredSpace(), based on the record size. An assertion based on that is added to detect that any additions in the XLogReader facilities would not cause any overflows. If that's ever the case, the upper bound allowed would need to be adjusted. Before this, it was possible for an external module to create WAL records large enough to be assembled but not replayable, causing failures when replaying such WAL records on standbys. One case mentioned where this is possible is the in-core function pg_logical_emit_message() (wrapper for LogLogicalMessage), that allows to emit WAL records with an arbitrary amount of data potentially higher than the replay limit of approximately 1GB (limit of a palloc, minus the overhead needed by a XLogReader). This commit is a follow-up of `ffd1b6b` that has added similar protections for the block-level data. Here, the checks are extended to the whole record length, mainrdata_len being extended from uint32 to uint64 with the routines registering buffer and record data still limited to uint32 to minimize the checks when assembling a record. All the error messages related to overflow checks are improved to provide more context about the error happening. Author: Matthias van de Meent Reviewed-by: Andres Freund, Heikki Linnakangas, Michael Paquier Discussion: https://postgr.es/m/CAEze2WgGiw+LZt+vHf8tWqB_6VxeLsMeoAuod0N=ij1q17n5pw@mail.gmail.com	2023-04-07 10:10:17 +09:00
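The overflow protection described above can be illustrated with a short sketch: each part's length stays 32-bit, the running total is accumulated in 64 bits so it cannot wrap, and the total is compared against the limit. The constant and function names are hypothetical; only the 1020MB figure and the uint32/uint64 split come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Upper bound from the commit message: 1020MB per record. */
#define SKETCH_MAX_RECORD_SIZE (UINT64_C(1020) * 1024 * 1024)

/*
 * Each part is limited to 32 bits, but the total is computed in 64 bits,
 * so the sum cannot wrap around before it is checked against the limit.
 */
bool
record_size_ok(uint32_t header_len, uint32_t main_data_len,
               uint32_t block_data_len)
{
    uint64_t total = (uint64_t) header_len + main_data_len + block_data_len;

    return total <= SKETCH_MAX_RECORD_SIZE;
}
```

With a 32-bit total, two very large parts could wrap to a small number and pass the check; the 64-bit accumulator rules that out.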
David Rowley	ae78cae3be	Add --buffer-usage-limit option to vacuumdb `1cbbee033` added BUFFER_USAGE_LIMIT to the VACUUM and ANALYZE commands, so here we permit that option to be specified in vacuumdb. In passing, adjust the documents for vacuum_buffer_usage_limit and the BUFFER_USAGE_LIMIT VACUUM option to mention "kB" rather than "KB". Do the same for the ERROR message in ExecVacuum() and check_vacuum_buffer_usage_limit(). Without that we might tell a user that the valid minimum value is 128 KB only to reject that because we accept only "kB" and not "KB". Also, add a small reminder comment in vacuum.h to try to trigger the memory of anyone adding new fields to VacuumParams that they might want to consider if vacuumdb needs to grow a new option too. Author: Melanie Plageman Reviewed-by: Justin Pryzby Reviewed-by: David Rowley Discussion: https://postgr.es/m/ZAzTg3iEnubscvbf@telsasoft.com	2023-04-07 12:47:10 +12:00
Andres Freund	00d1e02be2	hio: Use ExtendBufferedRelBy() to extend tables more efficiently While we already had some form of bulk extension for relations, it was fairly limited. It only amortized the cost of acquiring the extension lock, the relation itself was still extended one-by-one. Bulk extension was also solely triggered by contention, not by the amount of data inserted. To address this, use ExtendBufferedRelBy(), introduced in `31966b151e`, to extend the relation. We try to extend the relation by multiple blocks in two situations: 1) The caller tells RelationGetBufferForTuple() that it will need multiple pages. For now that's only used by heap_multi_insert(), see commit FIXME. 2) If there is contention on the extension lock, use the number of waiters for the lock as a multiplier for the number of blocks to extend by. This is similar to what we already did. Previously we additionally multiplied the numbers of waiters by 20, but with the new relation extension infrastructure I could not see a benefit in doing so. Using the freespacemap to provide empty pages can cause significant contention, and adds measurable overhead, even if there is no contention. To reduce that, remember the blocks the relation was extended by in the BulkInsertState, in the extending backend. In case 1) from above, the blocks the extending backend needs are not entered into the FSM, as we know that we will need those blocks. One complication with using the FSM to record empty pages, is that we need to insert blocks into the FSM, when we already hold a buffer content lock. To avoid doing IO while holding a content lock, release the content lock before recording free space. Currently that opens a small window in which another backend could fill the block, if a concurrent VACUUM records the free space. If that happens, we retry, similar to the already existing case when otherBuffer is provided. 
In the future it might be worth closing the race by preventing VACUUM from recording the space in newly extended pages. This change provides very significant wins (3x at 16 clients, on my workstation) for concurrent COPY into a single relation. Even single threaded COPY is measurably faster, primarily due to not dirtying pages while extending, if supported by the operating system (see commit `4d330a61bb`). Even single-row INSERTs benefit, although to a much smaller degree, as the relation extension lock rarely is the primary bottleneck. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-06 16:53:17 -07:00
David Rowley	1cbbee0338	Add VACUUM/ANALYZE BUFFER_USAGE_LIMIT option Add new options to the VACUUM and ANALYZE commands called BUFFER_USAGE_LIMIT to allow users more control over how large to make the buffer access strategy that is used to limit the usage of buffers in shared buffers. Larger rings can allow VACUUM to run more quickly but have the drawback of VACUUM possibly evicting more buffers from shared buffers that might be useful for other queries running on the database. Here we also add a new GUC named vacuum_buffer_usage_limit which controls how large to make the access strategy when it's not specified in the VACUUM/ANALYZE command. This defaults to 256KB, which is the same size as the access strategy was prior to this change. This setting also controls how large to make the buffer access strategy for autovacuum. Per idea by Andres Freund. Author: Melanie Plageman Reviewed-by: David Rowley Reviewed-by: Andres Freund Reviewed-by: Justin Pryzby Reviewed-by: Bharath Rupireddy Discussion: https://postgr.es/m/20230111182720.ejifsclfwymw2reb@awork3.anarazel.de	2023-04-07 11:40:31 +12:00
Andres Freund	5279e9db8e	heapam: Pass number of required pages to RelationGetBufferForTuple() A future commit will use this information to determine how aggressively to extend the relation by. In heap_multi_insert() we know accurately how many pages we need once we need to extend the relation, providing an accurate lower bound for how much to extend. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-06 16:17:16 -07:00
Daniel Gustafsson	7d71d3dd08	Refresh cost-based delay params more frequently in autovacuum Allow autovacuum to reload the config file more often so that cost-based delay parameters can take effect while VACUUMing a relation. Previously, autovacuum workers only reloaded the config file once per relation vacuumed, so config changes could not take effect until beginning to vacuum the next table. Now, check if a reload is pending roughly once per block, when checking if we need to delay. In order for autovacuum workers to safely update their own cost delay and cost limit parameters without impacting performance, we had to rethink when and how these values were accessed. Previously, an autovacuum worker's wi_cost_limit was set only at the beginning of vacuuming a table, after reloading the config file. Therefore, at the time that autovac_balance_cost() was called, workers vacuuming tables with no cost-related storage parameters could still have different values for their wi_cost_limit_base and wi_cost_delay. Now that the cost parameters can be updated while vacuuming a table, workers will (within some margin of error) have no reason to have different values for cost limit and cost delay (in the absence of cost-related storage parameters). This removes the rationale for keeping cost limit and cost delay in shared memory. Balancing the cost limit requires only the number of active autovacuum workers vacuuming a table with no cost-based storage parameters. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ZngzqnEODc7LmS1NH04Kt6Y9huSjz5pp7%2BDXhrjDA0gw%40mail.gmail.com	2023-04-07 01:00:21 +02:00
Daniel Gustafsson	a85c60a945	Separate vacuum cost variables from GUCs Vacuum code run both by autovacuum workers and a backend doing VACUUM/ANALYZE previously inspected VacuumCostLimit and VacuumCostDelay, which are the global variables backing the GUCs vacuum_cost_limit and vacuum_cost_delay. Autovacuum workers needed to override these variables with their own values, derived from autovacuum_vacuum_cost_limit and autovacuum_vacuum_cost_delay and worker cost limit balancing logic. This led to confusing code which, in some cases, both derived and set a new value of VacuumCostLimit from VacuumCostLimit. In preparation for refreshing these GUC values more often, introduce new, independent global variables and add a function to update them using the GUCs and existing logic. Per suggestion by Kyotaro Horiguchi Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ZngzqnEODc7LmS1NH04Kt6Y9huSjz5pp7%2BDXhrjDA0gw%40mail.gmail.com	2023-04-07 00:54:53 +02:00
Daniel Gustafsson	71a825194f	Make vacuum failsafe_active globally visible While vacuuming a table in failsafe mode, VacuumCostActive should not be re-enabled. This currently isn't a problem because vacuum cost parameters are only refreshed in between vacuuming tables and failsafe status is reset for every table. In preparation for allowing vacuum cost parameters to be updated more frequently, elevate LVRelState->failsafe_active to a global, VacuumFailsafeActive, which will be checked when determining whether or not to re-enable vacuum cost-related delays. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ZngzqnEODc7LmS1NH04Kt6Y9huSjz5pp7%2BDXhrjDA0gw%40mail.gmail.com	2023-04-07 00:54:08 +02:00
Tomas Vondra	2820adf775	Support long distance matching for zstd compression zstd compression supports a special mode for finding matches in the distant past, which may result in a better compression ratio, at the expense of using more memory (the window size is 128MB). To enable this optional mode, use the "long" keyword when specifying the compression method (--compress=zstd:long). Author: Justin Pryzby Reviewed-by: Tomas Vondra, Jacob Champion Discussion: https://postgr.es/m/20230224191840.GD1653@telsasoft.com Discussion: https://postgr.es/m/20220327205020.GM28503@telsasoft.com	2023-04-06 17:18:42 +02:00
David Rowley	b9b125b9c1	Move various prechecks from vacuum() into ExecVacuum() vacuum() is used for both the VACUUM command and for autovacuum. There were many prechecks being done inside vacuum() that were just not relevant to autovacuum. Let's move the bulk of these into ExecVacuum() so that they're only executed when running the VACUUM command. This removes a small amount of overhead when autovacuum vacuums a table. While we are at it, allocate VACUUM's BufferAccessStrategy in ExecVacuum() and pass it into vacuum() instead of expecting vacuum() to make it if it's not already made by the calling function. To make this work, we need to create the vacuum memory context slightly earlier, so we now need to pass that down to vacuum() so that it's available for use in other memory allocations. Author: Melanie Plageman Reviewed-by: David Rowley Discussion: https://postgr.es/m/20230405211534.4skgskbilnxqrmxg@awork3.anarazel.de	2023-04-06 15:44:52 +12:00
Michael Paquier	1d477a907e	Fix row tracking in pg_stat_statements with extended query protocol pg_stat_statements relies on EState->es_processed to count the number of rows processed by ExecutorRun(). This proves to be a problem under the extended query protocol when the result of a query is fetched through more than one call of ExecutorRun(), as es_processed is reset each time ExecutorRun() is called. This causes pg_stat_statements to report the number of rows calculated in the last execute fetch, rather than the global sum of all the rows processed. As pquery.c tells, this is a problem when a portal does not use holdStore. For example, DMLs with RETURNING would report a correct tuple count as these do one execution cycle when the query is first executed to fill in the portal's store with one ExecutorRun(), feeding on the portal's store for each follow-up execute fetch depending on the fetch size requested by the client. The fix proposed for this issue is simple with the addition of an extra counter in EState that's preserved across multiple ExecutorRun() calls, incremented with the value calculated in es_processed. This approach is not back-patchable, unfortunately. Note that libpq does not currently give any way to control the fetch size when using the extended v3 protocol, meaning that in-core testing is not possible yet. This issue can be easily verified with the JDBC driver, though, with autocommit disabled. Hence, having in-core tests requires more features, left for future discussion: - At least two new libpq routines splitting PQsendQueryGuts(), one for the bind/describe and a second for a series of execute fetches with a custom fetch size, likely in a fashion similar to what JDBC does. - A psql meta-command for the execute phase. This part is not strictly mandatory, still it could be handy. 
Reported-by: Andrew Dunstan (original discovery by Simon Siggs) Author: Sami Imseih Reviewed-by: Tom Lane, Michael Paquier Discussion: https://postgr.es/m/EBE6C507-9EB6-4142-9E4D-38B1673363A7@amazon.com Discussion: https://postgr.es/m/c90890e7-9c89-c34f-d3c5-d5c763a34bd8@dunslane.net	2023-04-06 09:29:03 +09:00
Andres Freund	31966b151e	bufmgr: Introduce infrastructure for faster relation extension The primary bottlenecks for relation extension are: 1) The extension lock is held while acquiring a victim buffer for the new page. Acquiring a victim buffer can require writing out the old page contents including possibly needing to flush WAL. 2) When extending via ReadBuffer() et al, we write a zero page during the extension, and then later write out the actual page contents. This can nearly double the write rate. 3) The existing bulk relation extension infrastructure in hio.c just amortized the cost of acquiring the relation extension lock, but none of the other costs. Unfortunately 1) cannot currently be addressed in a central manner as the callers to ReadBuffer() need to acquire the extension lock. To address that, this commit moves the responsibility for acquiring the extension lock into bufmgr.c functions. That allows acquiring the relation extension lock for just the required time. This will also allow us to improve relation extension further, without changing callers. The reason we write all-zeroes pages during relation extension is that we hope to get ENOSPC errors earlier that way (largely works, except for CoW filesystems). It is easier to handle out-of-space errors gracefully if the page doesn't yet contain actual tuples. This commit addresses 2), by using the recently introduced smgrzeroextend(), which extends the relation, without dirtying the kernel page cache for all the extended pages. To address 3), this commit introduces a function to extend a relation by multiple blocks at a time. There are three new exposed functions: ExtendBufferedRel() for extending the relation by a single block, ExtendBufferedRelBy() to extend a relation by multiple blocks at once, and ExtendBufferedRelTo() for extending a relation up to a certain size. 
To avoid duplicating code between ReadBuffer(P_NEW) and the new functions, ReadBuffer(P_NEW) now implements relation extension with ExtendBufferedRel(), using a flag to tell ExtendBufferedRel() that the relation lock is already held. Note that this commit does not yet lead to a meaningful performance or scalability improvement - for that uses of ReadBuffer(P_NEW) will need to be converted to ExtendBuffered*(), which will be done in subsequent commits. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-05 16:21:09 -07:00
Andres Freund	12f3867f55	bufmgr: Support multiple in-progress IOs by using resowner A future patch will add support for extending relations by multiple blocks at once. To be concurrency safe, the buffers for those blocks need to be marked as BM_IO_IN_PROGRESS. Until now we only had infrastructure for recovering from an IO error for a single buffer. This commit extends that infrastructure to multiple buffers by using the resource owner infrastructure. This commit increases the size of the ResourceOwnerData struct, which appears to have a just about measurable overhead in very extreme workloads. Medium term we are planning to substantially shrink the size of ResourceOwnerData. Short term the increase is small enough to not worry about it for now. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de Discussion: https://postgr.es/m/20221029200025.w7bvlgvamjfo6z44@awork3.anarazel.de	2023-04-05 14:17:55 -07:00
Tom Lane	16dc2703c5	Support "Right Anti Join" plan shapes. Merge and hash joins can support antijoin with the non-nullable input on the right, using very simple combinations of their existing logic for right join and anti join. This gives the planner more freedom about how to order the join. It's particularly useful for hash join, since we may now have the option to hash the smaller table instead of the larger. Richard Guo, reviewed by Ronan Dunklau and myself Discussion: https://postgr.es/m/CAMbWs48xh9hMzXzSy3VaPzGAz+fkxXXTUbCLohX1_L8THFRm2Q@mail.gmail.com	2023-04-05 16:59:09 -04:00
Andres Freund	794f259447	bufmgr: Add Pin/UnpinLocalBuffer() So far these were open-coded in quite a few places, without a good reason. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-05 10:42:17 -07:00
Andres Freund	819b69a81d	bufmgr: Add some more error checking [infrastructure] around pinning This adds a few more assertions against a buffer being local in places we don't expect, and extracts the check for a buffer being pinned exactly once from LockBufferForCleanup() into its own function. Later commits will use this function. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: http://postgr.es/m/419312fd-9255-078c-c3e3-f0525f911d7f@iki.fi	2023-04-05 10:42:17 -07:00
Andres Freund	4d330a61bb	Add smgrzeroextend(), FileZero(), FileFallocate() smgrzeroextend() uses FileFallocate() to efficiently extend files by multiple blocks. When extending by a small number of blocks, use FileZero() instead, as using posix_fallocate() for small numbers of blocks is inefficient for some file systems / operating systems. FileZero() is also used as the fallback for FileFallocate() on platforms / filesystems that don't support fallocate. A big advantage of using posix_fallocate() is that it typically won't cause dirty buffers in the kernel pagecache. So far the most common pattern in our code is that we smgrextend() a page full of zeroes and put the corresponding page into shared buffers, from where we later write out the actual contents of the page. If the kernel, e.g. due to memory pressure or elapsed time, already wrote back the all-zeroes page, this can lead to doubling the amount of writes reaching storage. There are no users of smgrzeroextend() as of this commit. That will follow in future commits. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: John Naylor <john.naylor@enterprisedb.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-05 10:06:39 -07:00
Peter Eisentraut	c9f57541d9	doc: Update SQL features/conformance information to SQL:2023 Optional subfeatures have been changed to top-level features, so there is a bit of a churn in the list for that. Some existing functions have been added to the standard, so they are moved from the "other" to the "standard" lists in their sections. Discussion: https://www.postgresql.org/message-id/flat/63f285d9-4ec8-0c9e-4bf5-e76334ddc0af@enterprisedb.com	2023-04-05 09:20:25 +02:00
Jeff Davis	ea1db8ae70	Canonicalize ICU locale names to language tags. Convert to BCP47 language tags before storing in the catalog, except during binary upgrade or when the locale comes from an existing collation or template database. The resulting language tags can vary slightly between ICU versions. For instance, "@colBackwards=yes" is converted to "und-u-kb-true" in older versions of ICU, and to the simpler (but equivalent) "und-u-kb" in newer versions. The process of canonicalizing to a language tag also understands more input locale string formats than ucol_open(). For instance, "fr_CA.UTF-8" is misinterpreted by ucol_open() and the region is ignored; effectively treating it the same as the locale "fr" and opening the wrong collator. Canonicalization properly interprets the language and region, resulting in the language tag "fr-CA", which can then be understood by ucol_open(). This commit fixes a problem in prior versions due to ucol_open() misinterpreting locale strings as described above. For instance, creating an ICU collation with locale "fr_CA.UTF-8" would store that string directly in the catalog, which would later be passed to (and misinterpreted by) ucol_open(). After this commit, the locale string will be canonicalized to language tag "fr-CA" in the catalog, which will be properly understood by ucol_open(). Because this fix affects the resulting collator, we cannot change the locale string stored in the catalog for existing databases or collations; otherwise we'd risk corrupting indexes. Therefore, only canonicalize locales for newly-created (not upgraded) collations/databases. For similar reasons, do not backport. Discussion: https://postgr.es/m/8c7af6820aed94dc7bc259d2aa7f9663518e6137.camel@j-davis.com Reviewed-by: Peter Eisentraut	2023-04-04 10:38:58 -07:00
Robert Haas	482675987b	Add a run_as_owner option to subscriptions. This option is normally false, but can be set to true to obtain the legacy behavior where the subscription runs with the permissions of the subscription owner rather than the permissions of the table owner. The advantages of this mode are (1) it doesn't require that the subscription owner have permission to SET ROLE to each table owner and (2) since no role switching occurs, the SECURITY_RESTRICTED_OPERATION restrictions do not apply. On the downside, it allows any table owner to easily usurp the privileges of the subscription owner - basically, to take over their account. Because that's generally quite undesirable, we don't make this mode the default, but we do make it available, just in case the new behavior causes too many problems for someone. Discussion: http://postgr.es/m/CA+TgmoZ-WEeG6Z14AfH7KhmpX2eFh+tZ0z+vf0=eMDdbda269g@mail.gmail.com	2023-04-04 12:03:03 -04:00
Robert Haas	1e10d49b65	Perform logical replication actions as the table owner. Up until now, logical replication actions have been performed as the subscription owner, who will generally be a superuser. Commit `cec57b1a0f` documented hazards associated with that situation, namely, that any user who owns a table on the subscriber side could assume the privileges of the subscription owner by attaching a trigger, expression index, or some other kind of executable code to it. As a remedy, it suggested not creating configurations where users who are not fully trusted own tables on the subscriber. Although that will work, it basically precludes using logical replication in the way that people typically want to use it, namely, to replicate a database from one node to another without necessarily having any restrictions on which database users can own tables. So, instead, change logical replication to execute INSERT, UPDATE, DELETE, and TRUNCATE operations as the table owner when they are replicated. Since this involves switching the active user frequently within a session that is authenticated as the subscription user, also impose SECURITY_RESTRICTED_OPERATION restrictions on logical replication code. As an exception, if the table owner can SET ROLE to the subscription owner, these restrictions have no security value, so don't impose them in that case. Subscription owners are now required to have the ability to SET ROLE to every role that owns a table that the subscription is replicating. If they don't, replication will fail. Superusers, who normally own subscriptions, satisfy this property by default. Non-superusers who own subscriptions will need to be granted the roles that own relevant tables. Patch by me, reviewed (but not necessarily in its entirety) by Jelte Fennema, Jeff Davis, and Noah Misch. Discussion: http://postgr.es/m/CA+TgmoaSCkg9ww9oppPqqs+9RVqCexYCE6Aq=UsYPfnOoDeFkw@mail.gmail.com	2023-04-04 11:25:23 -04:00
Alvaro Herrera	71bfd1543f	Code review for recent SQL/JSON commits - At the last minute and for no particularly good reason, I changed the WITHOUT token to be marked especially for lookahead, from the one in WITHOUT TIME to the one in WITHOUT UNIQUE. Study of upcoming patches (where a new WITHOUT ARRAY WRAPPER clause is added) showed me that the former was better, so put it back the way the original patch had it. - update exprTypmod() for JsonConstructorExpr to return the typmod of the RETURNING clause, as a comment there suggested. Perhaps it's possible for this to make a difference with datetime types, but I didn't try to build a test case. - The nodeFuncs.c support code for new nodes was calling walker() directly instead of the WALK() macro as introduced by commit `1c27d16e6e`. Modernize that. Also add exprLocation() support for a couple of nodes that missed it. Lastly, reorder the code more sensibly. The WITHOUT_LA -> WITHOUT change means that stored rules containing either WITHOUT TIME ZONE or WITHOUT UNIQUE KEYS would change representation. Therefore, bump catversion. Discussion: https://postgr.es/m/20230329181708.e64g2tpy7jyufqkr@alvherre.pgsql	2023-04-04 14:04:30 +02:00
Peter Geoghegan	e48c817395	Recycle deleted nbtree pages more aggressively. Commit `61b313e4` made nbtree consistently pass down a heaprel to low level routines like _bt_getbuf(). Although this was primarily intended as preparation for logical decoding on standbys, it also made it easy to correct an old deficiency in how nbtree VACUUM determines whether or not it's now safe to recycle deleted pages. Pass the heaprel to GlobalVisTestFor() in nbtree routines that deal with recycle safety. nbtree now makes less pessimistic assumptions about recycle safety within non-catalog relations. This enhancement complements the recycling enhancement added by commit `9dd963ae25`. nbtree remains just as pessimistic as ever when it comes to recycle safety within indexes on catalog relations. There is no fundamental reason why we need to treat catalog relations differently, though. The behavioral inconsistency is a consequence of the way that nbtree uses nextXID values to implement what Lanin and Shasha call "the drain technique". Note in particular that it has nothing to do with whether or not index tuples might still be required for an older MVCC snapshot. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkaiDxCje0yPuH=3Uh2p1V_2pFGY==xfbZoZu7Ax_NB8g@mail.gmail.com	2023-04-03 11:31:43 -07:00
Peter Geoghegan	a349b86603	Move heaprel struct field next to index rel field. Commit `61b313e4` added a heaprel struct member to IndexVacuumInfo, but placed it last. Move the heaprel struct member next to the index struct member to improve the code's readability. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WznG=TV6S9d3VA=y0vBHbXwnLs9_LLdiML=aNJuHeriwxg@mail.gmail.com	2023-04-03 11:01:11 -07:00
Alexander Korotkov	2b65bf046d	Revert `11470f544e` Discussion: https://postgr.es/m/20230323003003.plgaxjqahjgkuxrk%40awork3.anarazel.de	2023-04-03 16:54:31 +03:00
Andres Freund	6af1793954	Add info in WAL records in preparation for logical slot conflict handling This commit only implements one prerequisite part for allowing logical decoding. The commit message contains an explanation of the overall design, which later commits will refer back to. Overall design: 1. We want to enable logical decoding on standbys, but replay of WAL from the primary might remove data that is needed by logical decoding, causing error(s) on the standby. To prevent those errors, a new replication conflict scenario needs to be addressed (as much as hot standby does). 2. Our chosen strategy for dealing with this type of replication slot is to invalidate logical slots for which needed data has been removed. 3. To do this we need the latestRemovedXid for each change, just as we do for physical replication conflicts, but we also need to know whether any particular change was to data that logical replication might access. That way, during WAL replay, we know when there is a risk of conflict and, if so, if there is a conflict. 4. We can't rely on the standby's relcache entries for this purpose in any way, because the startup process can't access catalog contents. 5. Therefore every WAL record that potentially removes data from the index or heap must carry a flag indicating whether or not it is one that might be accessed during logical decoding. Why do we need this for logical decoding on standby? First, let's forget about logical decoding on standby and recall that on a primary database, any catalog rows that may be needed by a logical decoding replication slot are not removed. This is done thanks to the catalog_xmin associated with the logical replication slot. But, with logical decoding on standby, in the following cases: - hot_standby_feedback is off - hot_standby_feedback is on but there is no physical slot between the primary and the standby. 
Then, hot_standby_feedback will work, but only while the connection is alive (for example a node restart would break it) Then, the primary may delete system catalog rows that could be needed by the logical decoding on the standby (as it does not know about the catalog_xmin on the standby). So, it’s mandatory to identify those rows and invalidate the slots that may need them if any. Identifying those rows is the purpose of this commit. Implementation: When a WAL replay on standby indicates that a catalog table tuple is to be deleted by an xid that is greater than a logical slot's catalog_xmin, then that means the slot's catalog_xmin conflicts with the xid, and we need to handle the conflict. While subsequent commits will do the actual conflict handling, this commit adds a new field isCatalogRel in such WAL records (and a new bit set in the xl_heap_visible flags field), that is true for catalog tables, so as to arrange for conflict handling. The affected WAL records are the ones that already contain the snapshotConflictHorizon field, namely: - gistxlogDelete - gistxlogPageReuse - xl_hash_vacuum_one_page - xl_heap_prune - xl_heap_freeze_page - xl_heap_visible - xl_btree_reuse_page - xl_btree_delete - spgxlogVacuumRedirect Due to this new field being added, xl_hash_vacuum_one_page and gistxlogDelete do now contain the offsets to be deleted as a FLEXIBLE_ARRAY_MEMBER. This is needed to ensure correct alignment. It's not needed on the other structs where isCatalogRel has been added. This commit just introduces the WAL format changes mentioned above. Handling the actual conflicts will follow in future commits. Bumps XLOG_PAGE_MAGIC as several WAL records are changed. 
Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Author: Andres Freund <andres@anarazel.de> (in an older version) Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version) Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>	2023-04-02 12:32:19 -07:00
Andres Freund	61b313e47e	Pass down table relation into more index relation functions This is done in preparation for logical decoding on standby, which needs to include whether visibility affecting WAL records are about a (user) catalog table. Which is only known for the table, not the indexes. It's also nice to be able to pass the heap relation to GlobalVisTestFor() in vacuumRedirectAndPlaceholder(). Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/21b700c3-eecf-2e05-a699-f8c78dd31ec7@gmail.com	2023-04-01 20:18:29 -07:00
Alvaro Herrera	6ee30209a6	SQL/JSON: support the IS JSON predicate This patch introduces the SQL standard IS JSON predicate. It operates on text and bytea values representing JSON, as well as on the json and jsonb types. Each test has IS and IS NOT variants and supports a WITH UNIQUE KEYS flag. The tests are: IS JSON [VALUE] IS JSON ARRAY IS JSON OBJECT IS JSON SCALAR These should be self-explanatory. The WITH UNIQUE KEYS flag makes these return false when duplicate keys exist in any object within the value, not necessarily directly contained in the outermost object. Author: Nikita Glukhov <n.gluhov@postgrespro.ru> Author: Teodor Sigaev <teodor@sigaev.ru> Author: Oleg Bartunov <obartunov@gmail.com> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Amit Langote <amitlangote09@gmail.com> Author: Andrew Dunstan <andrew@dunslane.net> Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby. Discussion: https://postgr.es/m/CAF4Au4w2x-5LTnN_bxky-mq4=WOqsGsxSpENCzHRAzSnEd8+WQ@mail.gmail.com Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org	2023-03-31 22:34:04 +02:00
Alvaro Herrera	9b058f6b0d	Move ExecEvalJsonConstructor new function to a more natural place Commit `7081ac46ac` put it at the end of the file, but that doesn't look very nice.	2023-03-31 12:55:25 +02:00
Andres Freund	f95c1cd6b2	Bump PGSTAT_FILE_FORMAT_ID, omitted in `8aaa04b32d` I forgot to do so in the referenced commit. While the consequences of omitting the version change are likely to be harmless (besides discarding stats, as a PGSTAT_FILE_FORMAT_ID bump also does), it still seems worth doing.	2023-03-30 19:48:01 -07:00
Andres Freund	8aaa04b32d	Track shared buffer hits in pg_stat_io Among other things, this should make it easier to calculate a useful cache hit ratio by excluding buffer reads via buffer access strategies. As buffer access strategies reuse buffers (and thus evict the prior buffer contents), it is normal to see reads on repeated scans of the same data. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAAKRu_beMa9Hzih40%3DXPYqhDVz6tsgUGTrhZXRo%3Dunp%2Bszb%3DUA%40mail.gmail.com	2023-03-30 19:24:21 -07:00
Thomas Munro	11c2d6fdf5	Parallel Hash Full Join. Full and right outer joins were not supported in the initial implementation of Parallel Hash Join because of deadlock hazards (see discussion). Therefore FULL JOIN inhibited parallelism, as the other join strategies can't do that in parallel either. Add a new PHJ phase PHJ_BATCH_SCAN that scans for unmatched tuples on the inner side of one batch's hash table. For now, sidestep the deadlock problem by terminating parallelism there. The last process to arrive at that phase emits the unmatched tuples, while others detach and are free to go and work on other batches, if there are any, but otherwise they finish the join early. That unfairness is considered acceptable for now, because it's better than no parallelism at all. The build and probe phases are run in parallel, and the new scan-for-unmatched phase, while serial, is usually applied to the smaller of the two relations and is either limited by some multiple of work_mem, or it's too big and is partitioned into batches and then the situation is improved by batch-level parallelism. Author: Melanie Plageman <melanieplageman@gmail.com> Author: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKG%2BA6ftXPz4oe92%2Bx8Er%2BxpGZqto70-Q_ERwRaSyA%3DafNg%40mail.gmail.com	2023-03-31 11:34:03 +13:00
Andres Freund	ca7b3c4c00	pg_stat_wal: Accumulate time as instr_time instead of microseconds In instr_time.h it is stated that: * When summing multiple measurements, it's recommended to leave the * running sum in instr_time form (ie, use INSTR_TIME_ADD or * INSTR_TIME_ACCUM_DIFF) and convert to a result format only at the end. The reason for that is that converting to microseconds is not cheap, and can lose precision. Therefore this commit changes 'PendingWalStats' to use 'instr_time' instead of 'PgStat_Counter' while accumulating 'wal_write_time' and 'wal_sync_time'. Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/1feedb83-7aa9-cb4b-5086-598349d3f555@gmail.com	2023-03-30 14:23:14 -07:00
Alvaro Herrera	60966f56c3	Fix inconsistencies and style issues in new SQL/JSON code Reported by Alexander Lakhin. Discussion: https://postgr.es/m/60483139-5c34-851d-baee-6c0d014e1710@gmail.com	2023-03-30 21:06:31 +02:00
Robert Haas	c3afe8cf5a	Add new predefined role pg_create_subscription. This role can be granted to non-superusers to allow them to issue CREATE SUBSCRIPTION. The non-superuser must additionally have CREATE permissions on the database in which the subscription is to be created. Most forms of ALTER SUBSCRIPTION, including ALTER SUBSCRIPTION .. SKIP, now require only that the role performing the operation own the subscription, or inherit the privileges of the owner. However, to use ALTER SUBSCRIPTION ... RENAME or ALTER SUBSCRIPTION ... OWNER TO, you also need CREATE permission on the database. This is similar to what we do for schemas. To change the owner of a schema, you must also have permission to SET ROLE to the new owner, similar to what we do for other object types. Non-superusers are required to specify a password for authentication and the remote side must use the password, similar to what is required for postgres_fdw and dblink. A superuser who wants a non-superuser to own a subscription that does not rely on password authentication may set the new password_required=false property on that subscription. A non-superuser may not set password_required=false and may not modify a subscription that already has password_required=false. This new password_required subscription property works much like the eponymous postgres_fdw property. In both cases, the actual semantics are that a password is not required if either (1) the property is set to false or (2) the relevant user is the superuser. Patch by me, reviewed by Andres Freund, Jeff Davis, Mark Dilger, and Stephen Frost (but some of those people did not fully endorse all of the decisions that the patch makes). Discussion: http://postgr.es/m/CA+TgmoaDH=0Xj7OBiQnsHTKcF2c4L+=gzPBUKSJLh8zed2_+Dg@mail.gmail.com	2023-03-30 11:37:19 -04:00
David Rowley	902ecd3bd4	Fix outdated comments regarding TupleTableSlots The tts_flag is named TTS_FLAG_SHOULDFREE, so use that instead of TTS_SHOULDFREE, which is the name of the macro that checks for that flag. Additionally, `4da597edf` got rid of the TupleTableSlot.tts_tuple field but forgot to update a comment which referenced that field. Fix that. Reported-by: Zhen Mingyang <zhenmingyang@yeah.net> Reported-by: Richard Guo <guofenglinux@gmail.com> Discussion: https://postgr.es/m/1a96696c.9d3.187193989c3.Coremail.zhenmingyang@yeah.net	2023-03-30 16:37:03 +13:00
Daniel Gustafsson	44d85ba5a3	Copy and store addrinfo in libpq-owned private memory This refactors libpq to copy addrinfos returned by getaddrinfo to memory owned by libpq such that future improvements can alter for example the order of entries. As a nice side effect of this refactor the mechanism for iteration over addresses in PQconnectPoll is now identical to its iteration over hosts. Author: Jelte Fennema <postgres@jeltef.nl> Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Reviewed-by: Michael Banck <mbanck@gmx.net> Reviewed-by: Andrey Borodin <amborodin86@gmail.com> Discussion: https://postgr.es/m/PR3PR83MB04768E2FF04818EEB2179949F7A69@PR3PR83MB0476.EURPRD83.prod.outlook.com	2023-03-29 21:41:27 +02:00
Tom Lane	58c9600a9f	Remove empty function BufmgrCommit(). This function has been a no-op for over a decade. Even if bufmgr regains a need to be called during commit, it seems unlikely that the most appropriate call points would be precisely here, so it's not doing us much good as a placeholder either. Now, removing it probably doesn't save any noticeable number of cycles --- but the main call is inside the commit critical section, and the less work done there the better. Matthias van de Meent Discussion: https://postgr.es/m/CAEze2Wi1=tLKbxZnXzcD+8fYKyKqBtivVakLQC_mYBsP4Y8qVA@mail.gmail.com	2023-03-29 09:13:57 -04:00
Alvaro Herrera	7081ac46ac	SQL/JSON: add standard JSON constructor functions This commit introduces the SQL/JSON standard-conforming constructors for JSON types: JSON_ARRAY() JSON_ARRAYAGG() JSON_OBJECT() JSON_OBJECTAGG() Most of the functionality was already present in PostgreSQL-specific functions, but these include some new functionality such as the ability to skip or include NULL values, and to allow duplicate keys or throw error when they are found, as well as the standard specified syntax to specify output type and format. Author: Nikita Glukhov <n.gluhov@postgrespro.ru> Author: Teodor Sigaev <teodor@sigaev.ru> Author: Oleg Bartunov <obartunov@gmail.com> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Amit Langote <amitlangote09@gmail.com> Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby. Discussion: https://postgr.es/m/CAF4Au4w2x-5LTnN_bxky-mq4=WOqsGsxSpENCzHRAzSnEd8+WQ@mail.gmail.com Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org	2023-03-29 12:11:36 +02:00
Peter Eisentraut	563f21cda8	Move definition of standard collations from initdb to pg_collation.dat The standard collations "ucs_basic" and "unicode" were defined in initdb, even though pg_collation.dat seems like the correct place for them. It seems this was just forgotten during various reorganizations of initdb and pg_collation.dat/.h over time. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/08b58ecd-0d50-9395-ed51-dc8294e3fd2b%40enterprisedb.com	2023-03-29 09:45:21 +02:00
Amit Kapila	062a844424	Avoid syncing data twice for the 'publish_via_partition_root' option. When there are multiple publications for a subscription and one of those publishes via the parent table by using publish_via_partition_root and the other one directly publishes the child table, we end up copying the same data twice during initial synchronization. The reason for this was that we get both the parent and child tables from the publisher and try to copy the data for both of them. This patch extends the function pg_get_publication_tables() to take a publication list as its input parameter. This allows us to exclude a partition table whose ancestor is published by the same publication list. This problem does exist in back-branches but we decide to fix it there in a separate commit if required. The fix for back-branches requires quite complicated changes to fetch the required table information from the publisher as we can't update the function pg_get_publication_tables() in back-branches. We are not sure whether we want to deviate and complicate the code in back-branches for this problem as there are no field reports yet. Author: Wang wei Reviewed-by: Peter Smith, Jacob Champion, Kuroda Hayato, Vignesh C, Osumi Takamichi, Amit Kapila Discussion: https://postgr.es/m/OS0PR01MB57167F45D481F78CDC5986F794B99@OS0PR01MB5716.jpnprd01.prod.outlook.com	2023-03-29 10:46:58 +05:30
Jeff Davis	1671f990dd	Validate ICU locales. For ICU collations, ensure that the locale's language exists in ICU, and that the locale can be opened. Basic validation helps avoid minor mistakes and misspellings, which often fall back to the root locale instead of the intended locale. It's even more important to avoid such mistakes in ICU versions 54 and earlier, where the same (misspelled) locale string could fall back to different locales depending on the environment. Discussion: https://postgr.es/m/11b1eeb7e7667fdd4178497aeb796c48d26e69b9.camel@j-davis.com Discussion: https://postgr.es/m/df2efad0cae7c65180df8e5ebb709e5eb4f2a82b.camel@j-davis.com Reviewed-by: Peter Eisentraut	2023-03-28 16:34:29 -07:00
Peter Eisentraut	90189eefc1	Save a few bytes in pg_attribute Change the columns attndims, attstattarget, and attinhcount from int32 to int16, and reorder a bit. This saves some space (currently 4 bytes) in pg_attribute and tuple descriptors, which translates into small performance benefits and/or room for new columns in pg_attribute needed by future features. attndims and attinhcount are never realistically used with values larger than int16. Just to be sure, add some overflow checks. attstattarget is currently limited explicitly to 10000. For consistency, pg_constraint.coninhcount is also changed like attinhcount. Discussion: https://www.postgresql.org/message-id/flat/d07ffc2b-e0e8-77f7-38fb-be921dff71af%40enterprisedb.com	2023-03-28 10:05:56 +02:00
Daniel Gustafsson	b577743000	Make SCRAM iteration count configurable Replace the hardcoded value with a GUC such that the iteration count can be raised in order to increase protection against brute-force attacks. The hardcoded value for SCRAM iteration count was defined to be 4096, which is taken from RFC 7677, so set the default for the GUC to 4096 to match. In RFC 7677 the recommendation is at least 15000 iterations but 4096 is listed as a SHOULD requirement given that it's estimated to yield a 0.5s processing time on a mobile handset of the time of RFC writing (late 2015). Raising the iteration count of SCRAM will make stored passwords more resilient to brute-force attacks at a higher computational cost during connection establishment. Lowering the count will reduce computational overhead during connections at the tradeoff of reducing strength against brute-force attacks. There are however platforms where even a modest iteration count yields a too high computational overhead, with weaker password encryption schemes chosen as a result. In these situations, SCRAM with a very low iteration count still gives benefits over weaker schemes like md5, so we allow the iteration count to be set to one at the low end. The new GUC is intentionally generically named such that it can be made to support future SCRAM standards should they emerge. At that point the value can be made into key:value pairs with an undefined key as a default which will be backwards compatible with this. Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org> Discussion: https://postgr.es/m/F72E7BC7-189F-4B17-BF47-9735EB72C364@yesql.se	2023-03-27 09:46:29 +02:00
Daniel Gustafsson	d435f15fff	Add SysCacheGetAttrNotNull for guaranteed not-null attrs When extracting an attr from a cached tuple in the syscache with SysCacheGetAttr the isnull parameter must be checked in case the attr cannot be NULL. For cases when this is known beforehand, a wrapper is introduced which performs the error handling internally on behalf of the caller, invoking an elog in case of a NULL attr. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/AD76405E-DB45-46B6-941F-17B1EB3A9076@yesql.se	2023-03-25 22:49:33 +01:00
Tom Lane	27f5c712b2	Fix CREATE INDEX progress reporting for multi-level partitioning. The "partitions_total" and "partitions_done" fields were updated as though the current level of partitioning was the only one. In multi-level cases, not only could partitions_total change over the course of the command, but partitions_done could go backwards or exceed the currently-reported partitions_total. Fix by setting partitions_total to the total number of direct and indirect children once at command start, and then just incrementing partitions_done at appropriate points. Invent a new progress monitoring function "pgstat_progress_incr_param" to simplify doing the latter. We can avoid adding cost for the former when doing CREATE INDEX, because ProcessUtility already enumerates the children and it's pretty easy to pass the count down to DefineIndex. In principle the same could be done in ALTER TABLE, but that's structurally difficult; for now, just eat the cost of an extra find_all_inheritors scan in that case. Ilya Gladyshev and Justin Pryzby Discussion: https://postgr.es/m/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel@gmail.com	2023-03-25 15:34:03 -04:00
Tom Lane	3c05284d83	Invent GENERIC_PLAN option for EXPLAIN. This provides a very simple way to see the generic plan for a parameterized query. Without this, it's necessary to define a prepared statement and temporarily change plan_cache_mode, which is a bit tedious. One thing that's a bit of a hack perhaps is that we disable execution-time partition pruning when the GENERIC_PLAN option is given. That's because the pruning code may attempt to fetch the value of one of the parameters, which would fail. Laurenz Albe, reviewed by Julien Rouhaud, Christoph Berg, Michel Pelletier, Jim Jones, and myself Discussion: https://postgr.es/m/0a29b954b10b57f0d135fe12aa0909bd41883eb0.camel@cybertec.at	2023-03-24 17:07:22 -04:00
Michael Paquier	36f40ce2dc	libpq: Add sslcertmode option to control client certificates The sslcertmode option controls whether the server is allowed and/or required to request a certificate from the client. There are three modes: - "allow" is the default and follows the current behavior, where a configured client certificate is sent if the server requests one (via one of its default locations or sslcert). With the current implementation, this will happen whenever TLS is negotiated. - "disable" causes the client to refuse to send a client certificate even if sslcert is configured or if a client certificate is available in one of its default locations. - "require" causes the client to fail if a client certificate is never sent and the server opens a connection anyway. This doesn't add any additional security, since there is no guarantee that the server is validating the certificate correctly, but it may be helpful to troubleshoot more complicated TLS setups. sslcertmode=require requires SSL_CTX_set_cert_cb(), available since OpenSSL 1.0.2. Note that LibreSSL does not include it. Using a connection parameter different than require_auth has come up as the simplest design because certificate authentication does not rely directly on any of the AUTH_REQ_* codes, and one may want to require a certificate to be sent in combination of a given authentication method, like SCRAM-SHA-256. TAP tests are added in src/test/ssl/, some of them relying on sslinfo to check if a certificate has been set. These are compatible across all the versions of OpenSSL supported on HEAD (currently down to 1.0.1). Author: Jacob Champion Reviewed-by: Aleksander Alekseev, Peter Eisentraut, David G. Johnston, Michael Paquier Discussion: https://postgr.es/m/9e5a8ccddb8355ea9fa4b75a1e3a9edc88a70cd3.camel@vmware.com	2023-03-24 13:34:26 +09:00
Michael Paquier	8089517ab8	Rename fields in pgstat structures for functions and relations This commit renames the members of a few pgstat structures related to functions and relations, by respectively removing their prefix "f_" and "t_". The statistics for functions and relations are handled in their own file, and pgstatfuncs.c associates each field in a structure variable named based on the object type handled, so no information is lost with this rename. This will help with some of the refactoring aimed for pgstatfuncs.c, as this makes the field names more consistent with the SQL functions retrieving them. Author: Bertrand Drouvot Reviewed-by: Michael Paquier, Melanie Plageman Discussion: https://postgr.es/m/9142f62a-a422-145c-bde0-b5bc498a4ada@gmail.com	2023-03-24 08:46:29 +09:00
Peter Geoghegan	ae4fdde135	Count updates that move row to a new page. Add pgstat counter to track row updates that result in the successor version going to a new heap page, leaving behind an original version whose t_ctid points to the new version. The current count is shown by the n_tup_newpage_upd column of each of the pg_stat_*_tables views. The new n_tup_newpage_upd column complements the existing n_tup_hot_upd and n_tup_upd columns. Tables that have high n_tup_newpage_upd values (relative to n_tup_upd) are good candidates for tuning heap fillfactor. Corey Huinker, with small tweaks by me. Author: Corey Huinker <corey.huinker@gmail.com> Reviewed-By: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CADkLM=ded21M9iZ36hHm-vj2rE2d=zcKpUQMds__Xm2pxLfHKA@mail.gmail.com	2023-03-23 11:16:17 -07:00
Thomas Munro	8fba928fd7	Improve the naming of Parallel Hash Join phases. * Commit `3048898e` dropped -ING from PHJ wait event names. Update the corresponding barrier phases names to match. * Rename the "DONE" phases to "FREE". That's symmetrical with "ALLOCATE", and names the activity that actually happens in that phase (as we do for the other phases) rather than a state. The bug fixed by commit `8d578b9b` might have been more obvious with this name. * Rename the batch/bucket growth barriers' "ALLOCATE" phases to "REALLOCATE", a better description of what they do. * Update the high level comments about phases to highlight phases are executed by a single process with an asterisk (mostly memory management phases). No behavior change, as this is just improving internal identifiers. The only user-visible sign of this is that a couple of wait events' display names change from "...Allocate" to "...Reallocate" in pg_stat_activity, to stay in sync with the internal names. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKG%2BMDpwF2Eo2LAvzd%3DpOh81wUTsrwU1uAwR-v6OGBB6%2B7g%40mail.gmail.com	2023-03-23 13:14:25 +13:00
Alexander Korotkov	11470f544e	Allow locking updated tuples in tuple_update() and tuple_delete() Currently, in read committed transaction isolation mode (default), we have the following sequence of actions when tuple_update()/tuple_delete() finds the tuple updated by concurrent transaction. 1. Attempt to update/delete tuple with tuple_update()/tuple_delete(), which returns TM_Updated. 2. Lock tuple with tuple_lock(). 3. Re-evaluate plan qual (recheck if we still need to update/delete and calculate the new tuple for update). 4. Second attempt to update/delete tuple with tuple_update()/tuple_delete(). This attempt should be successful, since the tuple was previously locked. This patch eliminates step 2 by taking the lock during first tuple_update()/tuple_delete() call. Heap table access method saves some efforts by checking the updated tuple once instead of twice. Future undo-based table access methods, which will start from the latest row version, can immediately place a lock there. The code in nodeModifyTable.c is simplified by removing the nested switch/case. Discussion: https://postgr.es/m/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com Reviewed-by: Aleksander Alekseev, Pavel Borisov, Vignesh C, Mason Sharp Reviewed-by: Andres Freund, Chris Travers	2023-03-23 00:26:59 +03:00
Tom Lane	b0d8f2d983	Add SHELL_ERROR and SHELL_EXIT_CODE magic variables to psql. These are set after a \! command or a backtick substitution. SHELL_ERROR is just "true" for error (nonzero exit status) or "false" for success, while SHELL_EXIT_CODE records the actual exit status following standard shell/system(3) conventions. Corey Huinker, reviewed by Maxim Orlov and myself Discussion: https://postgr.es/m/CADkLM=cWao2x2f+UDw15W1JkVFr_bsxfstw=NGea7r9m4j-7rQ@mail.gmail.com	2023-03-21 13:03:56 -04:00
Thomas Munro	8d578b9b2e	Fix race in parallel hash join batch cleanup, take II. With unlucky timing and parallel_leader_participation=off (not the default), PHJ could attempt to access per-batch shared state just as it was being freed. There was code intended to prevent that by checking for a cleared pointer, but it was racy. Fix, by introducing an extra barrier phase. The new phase PHJ_BUILD_RUNNING means that it's safe to access the per-batch state to find a batch to help with, and PHJ_BUILD_DONE means that it is too late. The last to detach will free the array of per-batch state as before, but now it will also atomically advance the phase, so that late attachers can avoid the hazard. This mirrors the way per-batch hash tables are freed (see phases PHJ_BATCH_PROBING and PHJ_BATCH_DONE). An earlier attempt to fix this (commit `3b8981b6`, later reverted) missed one special case. When the inner side is empty (the "empty inner" optimization), the build barrier would only make it to PHJ_BUILD_HASHING_INNER phase before workers attempted to detach from the hashtable. In that case, fast-forward the build barrier to PHJ_BUILD_RUNNING before proceeding, so that our later assertions hold and we can still negotiate who is cleaning up. Revealed by build farm failures, where BarrierAttach() failed a sanity check assertion, because the memory had been clobbered by dsa_free(). In non-assert builds, the result could be a segmentation fault. Back-patch to all supported releases. Author: Thomas Munro <thomas.munro@gmail.com> Author: Melanie Plageman <melanieplageman@gmail.com> Reported-by: Michael Paquier <michael@paquier.xyz> Reported-by: David Geier <geidav.pg@gmail.com> Tested-by: David Geier <geidav.pg@gmail.com> Discussion: https://postgr.es/m/20200929061142.GA29096%40paquier.xyz	2023-03-21 14:29:34 +13:00
Tomas Vondra	19d8e2308b	Ignore BRIN indexes when checking for HOT updates When determining whether an index update may be skipped by using HOT, we can ignore attributes indexed by block summarizing indexes without references to individual tuples that need to be cleaned up. A new type TU_UpdateIndexes provides a signal to the executor to determine which indexes to update - no indexes, all indexes, or only the summarizing indexes. This also removes rd_indexattr list, and replaces it with rd_attrsvalid flag. The list was not used anywhere, and a simple flag is sufficient. This was originally committed as `5753d4ee32`, but then got reverted by `e3fcca0d0d` because of correctness issues. Original patch by Josef Simanek, various fixes and improvements by Tomas Vondra and me. Authors: Matthias van de Meent, Josef Simanek, Tomas Vondra Reviewed-by: Tomas Vondra, Alvaro Herrera Discussion: https://postgr.es/m/05ebcb44-f383-86e3-4f31-0a97a55634cf@enterprisedb.com Discussion: https://postgr.es/m/CAFp7QwpMRGcDAQumN7onN9HjrJ3u4X3ZRXdGFT0K5G2JWvnbWg%40mail.gmail.com	2023-03-20 11:02:42 +01:00
Tom Lane	75bd846b68	Add functions to do timestamptz arithmetic in a non-default timezone. Add versions of timestamptz + interval, timestamptz - interval, and generate_series(timestamptz, ...) in which a timezone can be specified explicitly instead of defaulting to the TimeZone GUC setting. The new functions for the first two are named date_add and date_subtract. This might seem too generic, but we could use overloading to add additional variants if that seems useful. Along the way, improve the docs' pretty inadequate explanation of how timestamptz +- interval works. Przemysław Sztoch and Gurjeet Singh; cosmetic changes and most of the docs work by me Discussion: https://postgr.es/m/01a84551-48dd-1359-bf7e-f6b0203a6bd0@sztoch.pl	2023-03-18 14:12:16 -04:00
Michael Paquier	0e681cf039	Add files related to query jumbling in src/include/nodes/ for meson This caused ninja clean to not remove the two files generated by gen_node_support.pl for the query jumbling, for example: queryjumblefuncs.funcs.c and queryjumblefuncs.switch.c. Reported-by: Pavel Stehule Discussion: https://postgr.es/m/CAFj8pRBFiWVRyGYSPziyFuXJbHirNmfWwzbfTyCf8YOdiwK74w@mail.gmail.com	2023-03-18 18:04:04 +09:00
Tom Lane	3e59e5048d	Refactor datetime functions' timezone lookup code to reduce duplication. We already had five copies of essentially the same logic, and an upcoming patch introduces yet another use-case. That's past my threshold of pain, so introduce a common subroutine. There's not that much net code savings, but the chance of typos should go down. Inspired by a patch from Przemysław Sztoch, but different in detail. Discussion: https://postgr.es/m/01a84551-48dd-1359-bf7e-f6b0203a6bd0@sztoch.pl	2023-03-17 17:47:19 -04:00
Jeff Davis	f413941f41	Fix t_isspace(), etc., when datlocprovider=i and datctype=C. Check whether the datctype is C to determine whether t_isspace() and related functions use isspace() or iswspace(). Previously, t_isspace() checked whether the database default collation was C; which is incorrect when the default collation uses the ICU provider. Discussion: https://postgr.es/m/79e4354d9eccfdb00483146a6b9f6295202e7890.camel@j-davis.com Reviewed-by: Peter Eisentraut Backpatch-through: 15	2023-03-17 12:08:46 -07:00
Amit Kapila	e709596b25	Add macros for ReorderBufferTXN toptxn. Currently, there are quite a few places in reorderbuffer.c that try to access top-transaction for a subtransaction. This makes the code to access top-transaction consistent and easier to follow. Author: Peter Smith Reviewed-by: Vignesh C, Sawada Masahiko Discussion: https://postgr.es/m/CAHut+PuCznOyTqBQwjRUu-ibG-=KHyCv-0FTcWQtZUdR88umfg@mail.gmail.com	2023-03-17 08:29:41 +05:30
Michael Paquier	98ae2c84a4	libpq: Remove code for SCM credential authentication Support for SCM credential authentication has been removed in the backend in 9.1, and libpq has kept some code to handle it for compatibility. Commit `be4585b`, that did the cleanup of the backend code, has done so because the code was not really portable originally. And, as there are likely little chances that this is used these days, this removes the remaining code from libpq. An error will now be raised by libpq if attempting to connect to a server that returns AUTH_REQ_SCM_CREDS, instead. References to SCM credential authentication are removed from the protocol documentation. This removes some meson and configure checks. Author: Michael Paquier Reviewed-by: Tom Lane Discussion: https://postgr.es/m/ZBLH8a4otfqgd6Kn@paquier.xyz	2023-03-17 10:52:26 +09:00
Michael Paquier	e731aeac89	Remove PgStat_BackendFunctionEntry This structure included only PgStat_FunctionCounts, and removing it facilitates some upcoming refactoring for pgstatfuncs.c to use more macros rather than mostly-duplicated functions. Author: Bertrand Drouvot Reviewed-by: Nathan Bossart Discussion: https://postgr.es/m/11d531fe-52fc-c6ea-7e8e-62f1b6ec626e@gmail.com	2023-03-16 14:22:34 +09:00
Tom Lane	483bdb2afe	Support [NO] INDENT option in XMLSERIALIZE(). This adds the ability to pretty-print XML documents ... according to libxml's somewhat idiosyncratic notions of what's pretty, anyway. One notable divergence from a strict reading of the spec is that libxml is willing to collapse empty nodes "<node></node>" to just "<node/>", whereas SQL and the underlying XML spec say that this option should only result in whitespace tweaks. Nonetheless, it seems close enough to justify using the SQL-standard syntax. Jim Jones, reviewed by Peter Smith and myself Discussion: https://postgr.es/m/2f5df461-dad8-6d7d-4568-08e10608a69b@uni-muenster.de	2023-03-15 16:59:09 -04:00
Andrew Dunstan	419a8dd814	Add a hook for modifying the ldapbind password The hook can be installed by a shared_preload library. A similar mechanism could be used for radius passwords, for example, and the type name auth_password_hook_typ has been chosen with that in mind. John Naylor and Andrew Dunstan Discussion: https://postgr.es/m/469b06ed-69de-ba59-c13a-91d2372e52a9@dunslane.net	2023-03-15 16:37:28 -04:00
Amit Kapila	89e46da5e5	Allow the use of indexes other than PK and REPLICA IDENTITY on the subscriber. Using REPLICA IDENTITY FULL on the publisher can lead to a full table scan per tuple change on the subscription when REPLICA IDENTITY or PK index is not available. This makes REPLICA IDENTITY FULL impractical to use apart from some small number of use cases. This patch allows using indexes other than PRIMARY KEY or REPLICA IDENTITY on the subscriber during apply of update/delete. The index that can be used must be a btree index, not a partial index, and it must have at least one column reference (i.e. cannot consist of only expressions). We can uplift these restrictions in the future. There is no smart mechanism to pick the index. If there is more than one index that satisfies these requirements, we just pick the first one. We discussed using some of the optimizer's low-level APIs for this but ruled it out as that can be a maintenance burden in the long run. This patch improves the performance in the vast majority of cases and the improvement is proportional to the amount of data in the table. However, there could be some regression in a small number of cases where the indexes have a lot of duplicate and dead rows. It was discussed that those are mostly impractical cases but we can provide a table or subscription level option to disable this feature if required. Author: Onder Kalaci, Amit Kapila Reviewed-by: Peter Smith, Shi yu, Hou Zhijie, Vignesh C, Kuroda Hayato, Amit Kapila Discussion: https://postgr.es/m/CACawEhVLqmAAyPXdHEPv1ssU2c=dqOniiGz7G73HfyS7+nGV4w@mail.gmail.com	2023-03-15 08:49:04 +05:30
Dean Rasheed	d5d574146d	Add support for the error functions erf() and erfc(). Expose the standard error functions as SQL-callable functions. These are expected to be useful to people working with normal distributions, and we use them here to test the distribution from random_normal(). Since these functions are defined in the POSIX and C99 standards, they should in theory be available on all supported platforms. If that turns out not to be the case, more work will be needed. On all platforms tested so far, using extra_float_digits = -1 in the regression tests is sufficient to allow for variations between implementations. However, past experience has shown that there are almost certainly going to be additional unexpected portability issues, so these tests may well need further adjustments, based on the buildfarm results. Dean Rasheed, reviewed by Nathan Bossart and Thomas Munro. Discussion: https://postgr.es/m/CAEZATCXv5fi7+Vu-POiyai+ucF95+YMcCMafxV+eZuN1B-=MkQ@mail.gmail.com	2023-03-14 09:17:36 +00:00
Michael Paquier	3a465cc678	libpq: Add support for require_auth to control authorized auth methods The new connection parameter require_auth allows a libpq client to define a list of comma-separated acceptable authentication types for use with the server. There is no negotiation: if the server does not present one of the allowed authentication requests, the connection attempt done by the client fails. The following keywords can be defined in the list: - password, for AUTH_REQ_PASSWORD. - md5, for AUTH_REQ_MD5. - gss, for AUTH_REQ_GSS[_CONT]. - sspi, for AUTH_REQ_SSPI and AUTH_REQ_GSS_CONT. - scram-sha-256, for AUTH_REQ_SASL[_CONT|_FIN]. - creds, for AUTH_REQ_SCM_CREDS (perhaps this should be removed entirely now). - none, to control unauthenticated connections. All the methods that can be defined in the list can be negated, like "!password", in which case the server must NOT use the listed authentication type. The special method "none" allows/disallows the use of unauthenticated connections (but it does not govern transport-level authentication via TLS or GSSAPI). Internally, the patch logic is tied to check_expected_areq(), that was used for channel_binding, ensuring that an incoming request is compatible with conn->require_auth. It also introduces a new flag, conn->client_finished_auth, which is set by various authentication routines when the client side of the handshake is finished. This signals to check_expected_areq() that an AUTH_REQ_OK from the server is expected, and allows the client to complain if the server bypasses authentication entirely, with for example the reception of a too-early AUTH_REQ_OK message. Regression tests are added in authentication TAP tests for all the keywords supported (except "creds", because it is around only for compatibility reasons). A new TAP script has been added for SSPI, as there was no script dedicated to it yet. It relies on SSPI being the default authentication method on Windows, as set by pg_regress. Author: Jacob Champion Reviewed-by: Peter Eisentraut, David G. Johnston, Michael Paquier Discussion: https://postgr.es/m/9e5a8ccddb8355ea9fa4b75a1e3a9edc88a70cd3.camel@vmware.com	2023-03-14 14:00:05 +09:00
Andrew Dunstan	9f8377f7a2	Add a DEFAULT option to COPY FROM This allows for a string which, if an input field matches it, causes the column's default value to be inserted. The advantage of this is that the default can be inserted in some rows and not others, for which non-default data is available. The file_fdw extension is also modified to allow use of this option. Israel Barth Rubio Discussion: https://postgr.es/m/CAO_rXXAcqesk6DsvioOZ5zmeEmpUN5ktZf-9=9yu+DTr0Xr8Uw@mail.gmail.com	2023-03-13 10:01:56 -04:00
Dean Rasheed	9321c79c86	Fix concurrent update issues with MERGE. If MERGE attempts an UPDATE or DELETE on a table with BEFORE ROW triggers, or a cross-partition UPDATE (with or without triggers), and a concurrent UPDATE or DELETE happens, the merge code would fail. In some cases this would lead to a crash, while in others it would cause the wrong merge action to be executed, or no action at all. The immediate cause of the crash was the trigger code calling ExecGetUpdateNewTuple() as part of the EPQ mechanism, which fails because during a merge ri_projectNew is NULL, since merge has its own per-action projection information, which ExecGetUpdateNewTuple() knows nothing about. Fix by arranging for the trigger code to exit early, returning the TM_Result and TM_FailureData information, if a concurrent modification is detected, allowing the merge code to do the necessary EPQ handling in its own way. Similarly, prevent the cross-partition update code from doing any EPQ processing for a merge, allowing the merge code to work out what it needs to do. This leads to a number of simplifications in nodeModifyTable.c. Most notably, the ModifyTableContext->GetUpdateNewTuple() callback is no longer needed, and mergeGetUpdateNewTuple() can be deleted, since there is no longer any requirement for get-update-new-tuple during a merge. Similarly, ModifyTableContext->cpUpdateRetrySlot is no longer needed. Thus ExecGetUpdateNewTuple() and the retry_slot handling of ExecCrossPartitionUpdate() can be restored to how they were in v14, before the merge code was added, and ExecMergeMatched() no longer needs any special-case handling for cross-partition updates. While at it, tidy up ExecUpdateEpilogue() a bit, making it handle recheckIndexes locally, rather than passing it in as a parameter, ensuring that it is freed properly. This dates back to when it was split off from ExecUpdate() to support merge. 
Per bug #17809 from Alexander Lakhin, and follow-up investigation of bug #17792, also from Alexander Lakhin. Back-patch to v15, where MERGE was introduced, taking care to preserve backwards-compatibility of the trigger API in v15 for any extensions that might use it. Discussion: https://postgr.es/m/17809-9e6650bef133f0fe%40postgresql.org https://postgr.es/m/17792-0f89452029662c36%40postgresql.org	2023-03-13 10:22:22 +00:00
Peter Eisentraut	d72900bded	Improve support for UNICODE collation on older ICU The recently added standard collation UNICODE (`0d21d4b9bc`) doesn't give consistent results on some build farm members with old ICU versions. Apparently, the ICU locale specification 'und' (language tag style) misbehaves on some older ICU versions. Replacing it with '' (ICU locale ID style) fixes it at least on some OS versions. Let's see what the build farm says.	2023-03-13 09:08:58 +01:00
Peter Eisentraut	0d21d4b9bc	Add standard collation UNICODE This adds a new predefined collation named UNICODE, which sorts by the default Unicode collation algorithm specifications, per SQL standard. This only works if ICU support is built. Reviewed-by: Jeff Davis <pgsql@j-davis.com> Discussion: https://www.postgresql.org/message-id/flat/1293e382-2093-a2bf-a397-c04e8f83d3c2@enterprisedb.com	2023-03-10 13:35:43 +01:00
Michael Paquier	6ad5793a49	Include headers of archive/ in installation These new headers have been recently added in `35739b8`, but they were not installed. Sravan has provided the patch for configure/make, while I have fixed the meson part. Author: Sravan Kumar, Michael Paquier Discussion: https://postgr.es/m/CA+=NbjguiQy-MbVqfQ-jQ=2Fcmx3Zs36OkKb-vjt28jMTG0OOg@mail.gmail.com	2023-03-10 20:08:10 +09:00
Peter Eisentraut	30a53b7929	Allow tailoring of ICU locales with custom rules This exposes the ICU facility to add custom collation rules to a standard collation. New options are added to CREATE COLLATION, CREATE DATABASE, createdb, and initdb to set the rules. Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Daniel Verite <daniel@manitou-mail.org> Discussion: https://www.postgresql.org/message-id/flat/821c71a4-6ef0-d366-9acf-bb8e367f739f@enterprisedb.com	2023-03-08 16:56:37 +01:00
Peter Eisentraut	822e8e2951	Update comment There was apparently an attempt here to list all the object types that ACL_USAGE applies to, but it wasn't complete. So instead of trying to keep up, put in a more timeless comment.	2023-03-08 14:22:06 +01:00
Michael Paquier	a4e003338d	Refine query jumbling handling for CallStmt Previously, all the nodes of CallStmt were included in the jumbling, causing a duplicate in the computation as the transformed state of the CALL query was included as well as the parsed state (transformed FuncCall with all the input arguments and potential output arguments). Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/Y+MRdEq9W9XVa2AB@paquier.xyz	2023-03-08 14:38:35 +09:00
Michael Paquier	d69cd3a2e2	Ignore IntoClause.viewQuery in query jumbling IntoClause.viewQuery is a copy of the parsed-but-not-rewritten SELECT clause copied to IntoClause when transforming CreateTableAsStmt for a materialized view. Including a second copy of the SELECT Query into the query jumbling was leading to an incorrect numbering of the Const node locations, as these would be counted twice instead of once. This becomes visible once the query normalization is applied to CREATE MATERIALIZED VIEW in pg_stat_statements in the shape of a query string using only odd numbers for the normalized constants, (regression tests added in pg_stat_statements as of `de2aca2` would show the difference). Including the original Query from CreateTableAsStmt is enough for the query jumbling. Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/Y+MRdEq9W9XVa2AB@paquier.xyz	2023-03-08 11:41:52 +09:00
Michael Paquier	e20b1ea157	Make get_extension_schema() available This routine is able to retrieve the OID of the schema used with an extension (pg_extension.extnamespace), or InvalidOid if this information is not available. plpgsql_check embeds a copy of this code when performing checks on functions, as one out-of-core example. Author: Pavel Stehule Reviewed-by: Julien Rouhaud Discussion: https://postgr.es/m/CAFj8pRD+9x55hjDoi285jCcjPc8uuY_D+FLn5RpXggdz+4O2sQ@mail.gmail.com	2023-03-07 14:18:20 +09:00
Tom Lane	7fee7871b4	Fix some more cases of missed GENERATED-column updates. If UPDATE is forced to retry after an EvalPlanQual check, it neglected to repeat GENERATED-column computations, even though those might well have changed since we're dealing with a different tuple than before. Fixing this is mostly a matter of looping back a bit further when we retry. In v15 and HEAD that's most easily done by altering the API of ExecUpdateAct so that it includes computing GENERATED expressions. Also, if an UPDATE in a partitioned table turns into a cross-partition INSERT operation, we failed to recompute GENERATED columns. That's a bug since `8bf6ec3ba` allowed partitions to have different generation expressions; although it seems to have no ill effects before that. Fixing this is messier because we can now have situations where the same query needs both the UPDATE-aligned set of GENERATED columns and the INSERT-aligned set, and it's unclear which set will be generated first (else we could hack things by forcing the INSERT-aligned set to be generated, which is indeed how `fe9e658f4` made it work for MERGE). The best fix seems to be to build and store separate sets of expressions for the INSERT and UPDATE cases. That would create ABI issues in the back branches, but so far it seems we can leave this alone in the back branches. Per bug #17823 from Hisahiro Kauchi. The first part of this affects all branches back to v12 where GENERATED columns were added. Discussion: https://postgr.es/m/17823-b64909cf7d63de84@postgresql.org	2023-03-06 18:31:27 -05:00
Tom Lane	b803b7d132	Fill EState.es_rteperminfos more systematically. While testing a fix for bug #17823, I discovered that EvalPlanQualStart failed to copy es_rteperminfos from the parent EState, resulting in failure if anything in EPQ execution wanted to consult that information. This led me to conclude that commit `a61b1f748` had been too haphazard about where to fill es_rteperminfos, and that we need to be sure that that happens exactly where es_range_table gets filled. So I changed the signature of ExecInitRangeTable to help ensure that this new requirement doesn't get missed. (Indeed, pgoutput.c was also failing to fill it. Maybe we don't ever need it there, but I wouldn't bet on that.) No test case yet; one will arrive with the fix for #17823. But that needs to be back-patched, while this fix is HEAD-only. Discussion: https://postgr.es/m/17823-b64909cf7d63de84@postgresql.org	2023-03-06 13:10:57 -05:00
Michael Paquier	4211fbd841	Add PROCESS_MAIN to VACUUM Disabling this option is useful to run VACUUM (with or without FULL) on only the toast table of a relation, bypassing the main relation. This option is enabled by default. Running directly VACUUM on a toast table was already possible without this feature, by using the non-deterministic name of a toast relation (as of pg_toast.pg_toast_N, where N would be the OID of the parent relation) in the VACUUM command, and it required a scan of pg_class to know the name of the toast table. So this feature is basically a shortcut to be able to run VACUUM or VACUUM FULL on a toast relation, using only the name of the parent relation. A new switch called --no-process-main is added to vacuumdb, to work as an equivalent of PROCESS_MAIN. Regression tests are added to cover VACUUM and VACUUM FULL, looking at pg_stat_all_tables.vacuum_count to see how many vacuums have run on each table, main or toast. Author: Nathan Bossart Reviewed-by: Masahiko Sawada Discussion: https://postgr.es/m/20221230000028.GA435655@nathanxps13	2023-03-06 16:41:05 +09:00
Michael Paquier	ce340e530d	Revise pg_pwrite_zeros() The following changes are made to pg_pwrite_zeros(), the API able to write series of zeros using vectored I/O: - Add an "offset" parameter, to write the size from this position (the 'p' of "pwrite" seems to mean position, though POSIX does not outline that directly), hence the name of the routine is incorrect if it is not able to handle offsets. - Avoid memset() of "zbuffer" on every call. - Avoid initialization of the whole IOV array if not needed. - Group the trailing write() call with the main write() call, simplifying the function logic. Author: Andres Freund Reviewed-by: Michael Paquier, Bharath Rupireddy Discussion: https://postgr.es/m/20230215005525.mrrlmqrxzjzhaipl@awork3.anarazel.de	2023-03-06 13:21:33 +09:00
Tom Lane	6949b921d5	Avoid failure when altering state of partitioned foreign-key triggers. Beginning in v15, if you apply ALTER TABLE ENABLE/DISABLE TRIGGER to a partitioned table, it also affects the partitions' cloned versions of the affected trigger(s). The initial implementation of this located the clones by name, but that fails on foreign-key triggers which have names incorporating their own OIDs. We can fix that, and also make the behavior more bulletproof in the face of user-initiated trigger renames, by identifying the cloned triggers by tgparentid. Following the lead of earlier commits in this area, I took care not to break ABI in the v15 branch, even though I rather doubt there are any external callers of EnableDisableTrigger. While here, update the documentation, which was not touched when the semantics were changed. Per bug #17817 from Alan Hodgson. Back-patch to v15; older versions do not have this behavior. Discussion: https://postgr.es/m/17817-31dfb7c2100d9f3d@postgresql.org	2023-03-04 13:32:35 -05:00
Robert Haas	ebd551f586	Update some incorrect comments about xlog records. The comments claim that certain pieces of data are part of the main WAL record data when in reality they are part of the data for block 0. Repair. Bertrand Drouvot, reviewed by Amit Kapila. Originally reported by me. Discussion: http://postgr.es/m/80db7836-4415-d54a-64c3-66b88b1430e7@gmail.com	2023-03-03 12:52:04 -05:00

11067 commits