postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-03-11 02:34:28 -04:00

Author	SHA1	Message	Date
Andres Freund	27cc7cd2bc	Reorder EPQ work, to fix rowmark related bugs and improve efficiency. In `ad0bda5d24` I changed the EvalPlanQual machinery to store substitution tuples in slot, instead of using plain HeapTuples. The main motivation for that was that using HeapTuples will be inefficient for future tableams. But it turns out that that conversion was buggy for non-locking rowmarks - the wrong tuple descriptor was used to create the slot. As a secondary issue `5db6df0c0` changed ExecLockRows() to begin EPQ earlier, to allow to fetch the locked rows directly into the EPQ slots, instead of having to copy tuples around. Unfortunately, as Tom complained, that forces some expensive initialization to happen earlier. As a third issue, the test coverage for EPQ was clearly insufficient. Fixing the first issue is unfortunately not trivial: Non-locked row marks were fetched at the start of EPQ, and we don't have the type information for the rowmarks available at that point. While we could change that, it's not easy. It might be worthwhile to change that at some point, but to fix this bug, it seems better to delay fetching non-locking rowmarks when they're actually needed, rather than eagerly. They're referenced at most once, and in cases where EPQ fails, might never be referenced. Fetching them when needed also increases locality a bit. To be able to fetch rowmarks during execution, rather than initialization, we need to be able to access the active EPQState, as that contains necessary data. To do so move EPQ related data from EState to EPQState, and, only for EStates created as part of EPQ, reference the associated EPQState from EState. To fix the second issue, change EPQ initialization to allow EvalPlanQualSlot() to be used before EvalPlanQualBegin() (but obviously still requiring EvalPlanQualInit() to have been done). As these changes made struct EState harder to understand, e.g. by adding multiple EStates, significantly reorder the members, and add a lot more comments. Also add a few more EPQ tests, including one that fails for the first issue above. More is needed. Reported-By: yi huang Author: Andres Freund Reviewed-By: Tom Lane Discussion: https://postgr.es/m/CAHU7rYZo_C4ULsAx_LAj8az9zqgrD8WDd4hTegDTMM1LMqrBsg@mail.gmail.com https://postgr.es/m/24530.1562686693@sss.pgh.pa.us Backpatch: 12-, where the EPQ changes were introduced	2019-09-09 05:14:11 -07:00
Tom Lane	b1907d6882	Set application_name per-test in isolation and ecpg tests. Commit `a4327296d` taught pg_regress proper to do this, but missed the opportunity to do likewise in the isolationtester and ecpg variants of pg_regress. Seems like this might be helpful for tracking down issues exposed by those tests.	2019-08-27 19:49:09 -04:00
Michael Paquier	989d23b04b	Detect unused steps in isolation specs and do some cleanup This is useful for developers to find out if an isolation spec is over-engineered or if it needs more work by warning at the end of a test run if a step is not used, generating a failure with extra diffs. While on it, clean up all the specs which include steps not used in any permutations to simplify them. Author: Michael Paquier Reviewed-by: Asim Praveen, Melanie Plageman Discussion: https://postgr.es/m/20190819080820.GG18166@paquier.xyz	2019-08-24 11:45:05 +09:00
Michael Paquier	9903338b5e	Remove dry-run mode from isolationtester The original purpose of the dry-run mode is to be able to print all the possible permutations from a spec file, but it has become less useful since isolation tests have improved regarding deadlock detection as one step not wanted by the author could block indefinitely now (originally the step blocked would have been detected rather quickly). Per discussion, let's remove it. Author: Michael Paquier Reviewed-by: Asim Praveen, Melanie Plageman Discussion: https://postgr.es/m/20190819080820.GG18166@paquier.xyz	2019-08-24 11:35:43 +09:00
Michael Paquier	c96581abe4	Fix inconsistencies and typos in the tree, take 11 This fixes various typos in docs and comments, and removes some orphaned definitions. Author: Alexander Lakhin Discussion: https://postgr.es/m/5da8e325-c665-da95-21e0-c8a99ea61fbf@gmail.com	2019-08-19 16:21:39 +09:00
Tom Lane	9be4ce4fa3	Make deadlock-parallel isolation test more robust. This test failed fairly reproducibly on some CLOBBER_CACHE_ALWAYS buildfarm animals. The cause seems to be that if a parallel worker is slow enough to reach its lock wait, it may not be released by the first deadlock check run, and then later deadlock checks might decide to unblock the d2 session instead of the d1 session, leaving us in an undetected deadlock state (since the isolationtester client is waiting for d1 to complete first). Fix by introducing an additional lock wait at the end of the d2a1 step, ensuring that the deadlock checker will recognize that d1 has to be unblocked before d2a1 completes. Also reduce max_parallel_workers_per_gather to 3 in this test. With the default max_worker_processes value, we were only getting one parallel worker for the d2a1 step, which is not the case I hoped to test. We should get 3 for d1a2 and 2 for d2a1, as the code stands; and maybe 3 for d2a1 if somebody figures out why the last parallel worker slot isn't free already. Discussion: https://postgr.es/m/22195.1566077308@sss.pgh.pa.us	2019-08-17 18:15:38 -04:00
Tom Lane	bb5ae8f6c4	Use a hash table to de-duplicate NOTIFY events faster. Previously, async.c got rid of duplicate notifications by scanning the list of pending events to compare each one to the proposed new event. This works okay for very small numbers of distinct events, but degrades as O(N^2) for many events. We can improve matters by using a hash table to probe for duplicates. So as not to add a lot of overhead for the simple cases that the code did handle well before, create the hash table only once a (sub)transaction has queued more than 16 distinct notify events. A downside is that we now have to do per-event work to propagate a successful subtransaction's notify events up to its parent. (But this isn't significant unless the subtransaction had many events, in which case the O(N^2) behavior would have been in play already, so we still come out ahead.) We can make some lemonade out of this lemon, though: since we must examine each event anyway, it's now possible to de-duplicate events fully, rather than skipping that for events merged up from subtransactions. Hence, remove the old weasel wording in notify.sgml about whether de-duplication happens or not, and adjust the test case in async-notify.spec that exhibited the old behavior. While at it, rearrange the definition of struct Notification to make it more compact and require just one palloc per event, rather than two or three. This saves space when there are a lot of events, in fact more than enough to buy back the space needed for the hash table. Patch by me, based on discussions around a different patch submitted by Filip Rembiałkowski. Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us	2019-08-15 12:22:12 -04:00
Michael Paquier	66bde49d96	Fix inconsistencies and typos in the tree, take 10 This addresses some issues with unnecessary code comments, fixes various typos in docs and comments, and removes some orphaned structures and definitions. Author: Alexander Lakhin Discussion: https://postgr.es/m/9aabc775-5494-b372-8bcb-4dfc0bd37c68@gmail.com	2019-08-13 13:53:41 +09:00
Heikki Linnakangas	1169fcf129	Fix predicate-locking of HOT updated rows. In serializable mode, heap_hot_search_buffer() incorrectly acquired a predicate lock on the root tuple, not the returned tuple that satisfied the visibility checks. As explained in README-SSI, the predicate lock does not need to be copied or extended to other tuple versions, but for that to work, the correct, visible, tuple version must be locked in the first place. The original SSI commit had this bug in it, but it was fixed back in 2013, in commit `81fbbfe335`. But unfortunately, it was reintroduced a few months later in commit `b89e151054`. Wising up from that, add a regression test to cover this, so that it doesn't get reintroduced again. Also, move the code that sets 't_self', so that it happens at the same time that the other HeapTuple fields are set, to make it more clear that all the code in the loop operates on the "current" tuple in the chain, not the root tuple. Bug spotted by Andres Freund, analysis and original fix by Thomas Munro, test case and some additional changes to the fix by Heikki Linnakangas. Backpatch to all supported versions (9.4). Discussion: https://www.postgresql.org/message-id/20190731210630.nqhszuktygwftjty%40alap3.anarazel.de	2019-08-07 12:40:49 +03:00
Michael Paquier	8548ddc61b	Fix inconsistencies and typos in the tree, take 9 This addresses more issues with code comments, variable names and unreferenced variables. Author: Alexander Lakhin Discussion: https://postgr.es/m/7ab243e0-116d-3e44-d120-76b3df7abefd@gmail.com	2019-08-05 12:14:58 +09:00
Tom Lane	da9456d22a	Add an isolation test to exercise parallel-worker deadlock resolution. Commit `a1c1af2a1` added logic in the deadlock checker to handle lock grouping, but it was very poorly tested, as evidenced by the bug fixed in `3420851a2`. Add a test case that exercises that a bit better (and catches the bug --- if you revert `3420851a2`, this will hang). Since it's pretty hard to get parallel workers to take exclusive regular locks that their parents don't already have, this test operates by creating a deadlock among advisory locks taken in parallel workers. To make that happen, we must override the parallel-safety labeling of the advisory-lock functions, which we do by putting them in mislabeled, non-inlinable wrapper functions. We also have to remove the redundant PreventAdvisoryLocksInParallelMode checks in lockfuncs.c. That seems fine though; if some user accidentally does what this test is intentionally doing, not much harm will ensue. (If there are any remaining bugs that are reachable that way, they're probably reachable in other ways too.) Discussion: https://postgr.es/m/3243.1564437314@sss.pgh.pa.us	2019-08-01 11:50:00 -04:00
Tom Lane	b10f40bf0e	Improve test coverage for LISTEN/NOTIFY. We had no actual end-to-end test of NOTIFY message delivery. In the core async.sql regression test, testing this is problematic because psql traditionally prints the PID of the sending backend, making the output unstable. We also have an isolation test script, but it likewise failed to prove that delivery worked, because isolationtester.c had no provisions for detecting/reporting NOTIFY messages. Hence, add such provisions to isolationtester.c, and extend async-notify.spec to include direct tests of basic NOTIFY functionality. I also added tests showing that NOTIFY de-duplicates messages normally, but not across subtransaction boundaries. (That's the historical behavior since we introduced subtransactions, though perhaps we ought to change it.) Patch by me, with suggestions/review by Andres Freund. Discussion: https://postgr.es/m/31304.1564246011@sss.pgh.pa.us	2019-07-28 12:02:27 -04:00
Tom Lane	30717637c1	Fix isolationtester race condition for notices sent before blocking. If a test sends a notice just before blocking, it's possible on slow machines for isolationtester to detect the blocked state before it's consumed the notice. (For this to happen, the notice would have to arrive after isolationtester has waited for data for 10ms, so on fast/lightly-loaded machines it's hard to reproduce the failure.) But, if we have seen the backend as blocked, it's certainly already sent any notices it's going to send. Therefore, one more round of PQconsumeInput and PQisBusy should be enough to collect and process any such notices. This appears to explain the instability noted in commit `ebd499282`, so undo the hack therein to not print notices from insert-conflict-specconflict. Patch by me, diagnosis by Andres Freund. Discussion: https://postgr.es/m/14616.1564251339@sss.pgh.pa.us	2019-07-27 20:21:54 -04:00
Tom Lane	ebd4992821	Don't drop NOTICE messages in isolation tests. For its entire existence, isolationtester.c has forced client_min_messages to WARNING, but that seems like a very poor choice of test design. It should be up to individual test scripts to manage whether they emit notices and to ensure that the results are stable. (There were no NOTICE messages in the original set of isolation tests, so this was certainly dead code when committed, but perhaps it was needed at some earlier point.) It's possible that the original motivation was due to platform-dependent variations in the timing of stdout vs. stderr output. That should be moot since commits 73bcb76b7/6eda3e9c2, but just in case, adjust isotesterNoticeProcessor to print to stdout not stderr. (stderr seems like the wrong thing anyway: it should be for error printouts not expected test output.) Testing shows that the notices in insert-conflict-specconflict are indeed a bit timing-unstable on very slow machines, so hide them; maybe we can improve that later. Also, make the notices in plpgsql-toast a bit less verbose than the original code would've had them. Discussion: https://postgr.es/m/14616.1564251339@sss.pgh.pa.us	2019-07-27 15:59:57 -04:00
Alvaro Herrera	8b21b416ed	Avoid spurious deadlocks when upgrading a tuple lock This puts back reverted commit `de87a084c0`, with some bug fixes. When two (or more) transactions are waiting for transaction T1 to release a tuple-level lock, and transaction T1 upgrades its lock to a higher level, a spurious deadlock can be reported among the waiting transactions when T1 finishes. The simplest example case seems to be: T1: select id from job where name = 'a' for key share; Y: select id from job where name = 'a' for update; -- starts waiting for T1 Z: select id from job where name = 'a' for key share; T1: update job set name = 'b' where id = 1; Z: update job set name = 'c' where id = 1; -- starts waiting for T1 T1: rollback; At this point, transaction Y is rolled back on account of a deadlock: Y holds the heavyweight tuple lock and is waiting for the Xmax to be released, while Z holds part of the multixact and tries to acquire the heavyweight lock (per protocol) and goes to sleep; once T1 releases its part of the multixact, Z is awakened only to be put back to sleep on the heavyweight lock that Y is holding while sleeping. Kaboom. This can be avoided by having Z skip the heavyweight lock acquisition. As far as I can see, the biggest downside is that if there are multiple Z transactions, the order in which they resume after T1 finishes is not guaranteed. Backpatch to 9.6. The patch applies cleanly on 9.5, but the new tests don't work there (because isolationtester is not smart enough), so I'm not going to risk it. Author: Oleksii Kliukin Discussion: https://postgr.es/m/B9C9D7CD-EB94-4635-91B6-E558ACEC0EC3@hintbits.com Discussion: https://postgr.es/m/2815.1560521451@sss.pgh.pa.us	2019-06-18 18:23:16 -04:00
Michael Paquier	3412030205	Fix more typos and inconsistencies in the tree Author: Alexander Lakhin Discussion: https://postgr.es/m/0a5419ea-1452-a4e6-72ff-545b1a5a8076@gmail.com	2019-06-17 16:13:16 +09:00
Alvaro Herrera	9d20b0ec8f	Revert "Avoid spurious deadlocks when upgrading a tuple lock" This reverts commits `3da73d6839` and `de87a084c0`. This code has some tricky corner cases that I'm not sure are correct and not properly tested anyway, so I'm reverting the whole thing for next week's releases (reintroducing the deadlock bug that we set to fix). I'll try again afterwards. Discussion: https://postgr.es/m/E1hbXKQ-0003g1-0C@gemulon.postgresql.org	2019-06-16 22:24:21 -04:00
Alvaro Herrera	de87a084c0	Avoid spurious deadlocks when upgrading a tuple lock When two (or more) transactions are waiting for transaction T1 to release a tuple-level lock, and transaction T1 upgrades its lock to a higher level, a spurious deadlock can be reported among the waiting transactions when T1 finishes. The simplest example case seems to be: T1: select id from job where name = 'a' for key share; Y: select id from job where name = 'a' for update; -- starts waiting for T1 Z: select id from job where name = 'a' for key share; T1: update job set name = 'b' where id = 1; Z: update job set name = 'c' where id = 1; -- starts waiting for T1 T1: rollback; At this point, transaction Y is rolled back on account of a deadlock: Y holds the heavyweight tuple lock and is waiting for the Xmax to be released, while Z holds part of the multixact and tries to acquire the heavyweight lock (per protocol) and goes to sleep; once T1 releases its part of the multixact, Z is awakened only to be put back to sleep on the heavyweight lock that Y is holding while sleeping. Kaboom. This can be avoided by having Z skip the heavyweight lock acquisition. As far as I can see, the biggest downside is that if there are multiple Z transactions, the order in which they resume after T1 finishes is not guaranteed. Backpatch to 9.6. The patch applies cleanly on 9.5, but the new tests don't work there (because isolationtester is not smart enough), so I'm not going to risk it. Author: Oleksii Kliukin Discussion: https://postgr.es/m/B9C9D7CD-EB94-4635-91B6-E558ACEC0EC3@hintbits.com	2019-06-13 17:28:24 -04:00
Tom Lane	8255c7a5ee	Phase 2 pgindent run for v12. Switch to 2.1 version of pg_bsd_indent. This formats multiline function declarations "correctly", that is with additional lines of parameter declarations indented to match where the first line's left parenthesis is. Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com	2019-05-22 13:04:48 -04:00
Andres Freund	08e2edc076	Add isolation test for INSERT ON CONFLICT speculative insertion failure. This path previously was not reliably covered. There was some heuristic coverage via insert-conflict-toast.spec, but that test is not deterministic, and only tested for a somewhat specific bug. Backpatch, as this is a complicated and otherwise untested code path. Unfortunately 9.5 cannot handle two waiting sessions, and thus cannot execute this test. Triggered by a conversation with Melanie Plageman. Author: Andres Freund Discussion: https://postgr.es/m/CAAKRu_a7hbyrk=wveHYhr4LbcRnRCG=yPUVoQYB9YO1CdUBE9Q@mail.gmail.com Backpatch: 9.5-	2019-05-14 11:51:29 -07:00
Tom Lane	fc9a62af3f	Move logging.h and logging.c from src/fe_utils/ to src/common/. The original placement of this module in src/fe_utils/ is ill-considered, because several src/common/ modules have dependencies on it, meaning that libpgcommon and libpgfeutils now have mutual dependencies. That makes it pointless to have distinct libraries at all. The intended design is that libpgcommon is lower-level than libpgfeutils, so only dependencies from the latter to the former are acceptable. We already have the precedent that fe_memutils and a couple of other modules in src/common/ are frontend-only, so it's not stretching anything out of whack to treat logging.c as a frontend-only module in src/common/. To the extent that such modules help provide a common frontend/backend environment for the rest of common/ to use, it's a reasonable design. (logging.c does not yet provide an ereport() emulation, but one can dream.) Hence, move these files over, and revert basically all of the build-system changes made by commit `cc8d41511`. There are no places that need to grow new dependencies on libpgcommon, further reinforcing the idea that this is the right solution. Discussion: https://postgr.es/m/a912ffff-f6e4-778a-c86a-cf5c47a12933@2ndquadrant.com	2019-05-14 14:20:10 -04:00
Tom Lane	a2418f9e23	Test some more cases with partitioned tables in EvalPlanQual. We weren't testing anything involving EPQ on UPDATEs that move tuples into different partitions. Depending on the implementation, it might be that these cases aren't actually very interesting ... but given our thin coverage of EPQ in general, I think it's a good idea to have a test case. Amit Langote, minor tweak by me Discussion: https://postgr.es/m/7889df35-ad1a-691a-00e3-4d4b18f364e3@lab.ntt.co.jp	2019-04-09 11:43:03 -04:00
Tom Lane	a8cb8f1246	Fix EvalPlanQualStart to handle partitioned result rels correctly. The es_root_result_relations array needs to be shallow-copied in the same way as the main es_result_relations array, else EPQ rechecks on partitioned result relations fail, as seen in bug #15677 from Norbert Benkocs. Amit Langote, isolation test case added by me Discussion: https://postgr.es/m/15677-0bf089579b4cd02d@postgresql.org Discussion: https://postgr.es/m/19321.1554567786@sss.pgh.pa.us	2019-04-08 12:20:22 -04:00
Andres Freund	41f5e04aec	Fix a number of issues around modifying a previously updated row. This commit fixes three, unfortunately related, issues: 1) Since `5db6df0c01`, the introduction of DML via tableam, it was possible to trigger "ERROR: unexpected table_lock_tuple status: 1" when updating a row that was previously updated in the same transaction - but only when the previously updated row was before updated in a concurrent transaction (and READ COMMITTED was used). The reason for that was that that case simply wasn't expected. Fixing that led to: 2) Even before the above commit, there were error checks (introduced in `6868ed7491`) preventing a row being updated by different commands within the same statement (say in a function called by an UPDATE) - but that check wasn't performed when the row was first updated in a concurrent transaction - instead the second update was silently skipped in that case. After this change we throw the same error as we'd without the concurrent transaction. 3) The error messages (introduced in `6868ed7491`) preventing such updates emitted the same error message for both DELETE and UPDATE ("tuple to be updated was already modified by an operation triggered by the current command"). While that could be changed separately, it made it hard to write tests that verify the correct behavior of the code. This commit changes heap's implementation of table_lock_tuple() to return TM_SelfModified instead of TM_Invisible (previously loosely modeled after EvalPlanQualFetch), and teaches nodeModifyTable.c to handle that in response to table_lock_tuple() and not just in response to table_(delete|update). Additionally it fixes the wrong error message (see 3 above). The comment for table_lock_tuple() is also adjusted to state that TM_Deleted won't return information in TM_FailureData - it'll not always be available. 
This also adds tests to ensure that DELETE/UPDATE correctly error out when affecting a row that concurrently was modified by another transaction. Author: Andres Freund Reported-By: Tom Lane, when investigating a bug fix to another bug by Amit Langote Discussion: https://postgr.es/m/19321.1554567786@sss.pgh.pa.us	2019-04-07 22:14:47 -07:00
Alvaro Herrera	f56f8f8da6	Support foreign keys that reference partitioned tables Previously, while primary keys could be made on partitioned tables, it was not possible to define foreign keys that reference those primary keys. Now it is possible to do that. Author: Álvaro Herrera Reviewed-by: Amit Langote, Jesper Pedersen Discussion: https://postgr.es/m/20181102234158.735b3fevta63msbj@alvherre.pgsql	2019-04-03 14:40:21 -03:00
Peter Eisentraut	cc8d415117	Unified logging system for command-line programs This unifies the various ad hoc logging (message printing, error printing) systems used throughout the command-line programs. Features: - Program name is automatically prefixed. - Message string does not end with newline. This removes a common source of inconsistencies and omissions. - Additionally, a final newline is automatically stripped, simplifying use of PQerrorMessage() etc., another common source of mistakes. - I converted error message strings to use %m where possible. - As a result of the above several points, more translatable message strings can be shared between different components and between frontends and backend, without gratuitous punctuation or whitespace differences. - There is support for setting a "log level". This is not meant to be user-facing, but can be used internally to implement debug or verbose modes. - Lazy argument evaluation, so no significant overhead if logging at some level is disabled. - Some color in the messages, similar to gcc and clang. Set PG_COLOR=auto to try it out. Some colors are predefined, but can be customized by setting PG_COLORS. - Common files (common/, fe_utils/, etc.) can handle logging much more simply by just using one API without worrying too much about the context of the calling program, requiring callbacks, or having to pass "progname" around everywhere. - Some programs called setvbuf() to make sure that stderr is unbuffered, even on Windows. But not all programs did that. This is now done centrally. Soft goals: - Reduces vertical space use and visual complexity of error reporting in the source code. - Encourages more deliberate classification of messages. For example, in some cases it wasn't clear without analyzing the surrounding code whether a message was meant as an error or just an info. - Concepts and terms are vaguely aligned with popular logging frameworks such as log4j and Python logging. 
This is all just about printing stuff out. Nothing affects program flow (e.g., fatal exits). The uses are just too varied to do that. Some existing code had wrappers that do some kind of print-and-exit, and I adapted those. I tried to keep the output mostly the same, but there is a lot of historical baggage to unwind and special cases to consider, and I might not always have succeeded. One significant change is that pg_rewind used to write all error messages to stdout. That is now changed to stderr. Reviewed-by: Donald Dong <xdong@csumb.edu> Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru> Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com	2019-04-01 20:01:35 +02:00
Peter Eisentraut	5dc92b844e	REINDEX CONCURRENTLY This adds the CONCURRENTLY option to the REINDEX command. A REINDEX CONCURRENTLY on a specific index creates a new index (like CREATE INDEX CONCURRENTLY), then renames the old index away and the new index in place and adjusts the dependencies, and then drops the old index (like DROP INDEX CONCURRENTLY). The REINDEX command also has the capability to run its other variants (TABLE, DATABASE) with the CONCURRENTLY option (but not SYSTEM). The reindexdb command gets the --concurrently option. Author: Michael Paquier, Andreas Karlsson, Peter Eisentraut Reviewed-by: Andres Freund, Fujii Masao, Jim Nasby, Sergei Kornilov Discussion: https://www.postgresql.org/message-id/flat/60052986-956b-4478-45ed-8bd119e9b9cf%402ndquadrant.com#74948a1044c56c5e817a5050f554ddee	2019-03-29 08:26:33 +01:00
Andres Freund	5db6df0c01	tableam: Add tuple_{insert, delete, update, lock} and use. This adds new, required, table AM callbacks for insert/delete/update and lock_tuple. To be able to reasonably use those, the EvalPlanQual mechanism had to be adapted, moving more logic into the AM. Previously both delete/update/lock call-sites and the EPQ mechanism had to have awareness of the specific tuple format to be able to fetch the latest version of a tuple. Obviously that needs to be abstracted away. To do so, move the logic that finds the latest row version into the AM. lock_tuple has a new flag argument, TUPLE_LOCK_FLAG_FIND_LAST_VERSION, that forces it to lock the last version, rather than the current one. It'd have been possible to do so via a separate callback as well, but finding the last version usually also necessitates locking the newest version, making it sensible to combine the two. This replaces the previous use of EvalPlanQualFetch(). Additionally HeapTupleUpdated, which previously signaled either a concurrent update or delete, is now split into two, to avoid callers needing AM specific knowledge to differentiate. The move of finding the latest row version into tuple_lock means that encountering a row concurrently moved into another partition will now raise an error about "tuple to be locked" rather than "tuple to be updated/deleted" - which is accurate, as that always happens when locking rows. While possibly slightly less helpful for users, it seems like an acceptable trade-off. As part of this commit HTSU_Result has been renamed to TM_Result, and its members have been expanded to differentiate between updating and deleting. HeapUpdateFailureData has been renamed to TM_FailureData. The interface to speculative insertion is changed so nodeModifyTable.c does not have to set the speculative token itself anymore. 
Instead there's a version of tuple_insert, tuple_insert_speculative, that performs the speculative insertion (without requiring a flag to signal that fact), and the speculative insertion is either made permanent with table_complete_speculative(succeeded = true) or aborted with succeeded = false. Note that multi_insert is not yet routed through tableam, nor is COPY. Changing multi_insert requires changes to copy.c that are large enough to better be done separately. Similarly, although simpler, CREATE TABLE AS and CREATE MATERIALIZED VIEW are also only going to be adjusted in a later commit. Author: Andres Freund and Haribabu Kommi Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de https://postgr.es/m/20190313003903.nwvrxi7rw3ywhdel@alap3.anarazel.de https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql	2019-03-23 19:55:57 -07:00
Andres Freund	cdcffe2263	Expand EPQ tests for UPDATEs and DELETEs Previously there was basically no coverage for UPDATEs encountering deleted rows, and no coverage for DELETE having to perform EPQ. That's problematic for an upcoming commit in which EPQ is taught to integrate with tableams. Also, there was no test for UPDATE to encounter a row UPDATEd into another partition. Author: Andres Freund	2019-03-22 19:55:23 -07:00
Thomas Munro	bb16aba50c	Enable parallel query with SERIALIZABLE isolation. Previously, the SERIALIZABLE isolation level prevented parallel query from being used. Allow the two features to be used together by sharing the leader's SERIALIZABLEXACT with parallel workers. An extra per-SERIALIZABLEXACT LWLock is introduced to make it safe to share, and new logic is introduced to coordinate the early release of the SERIALIZABLEXACT required for the SXACT_FLAG_RO_SAFE optimization, as follows: The first backend to observe the SXACT_FLAG_RO_SAFE flag (set by some other transaction) will 'partially release' the SERIALIZABLEXACT, meaning that the conflicts and locks it holds are released, but the SERIALIZABLEXACT itself will remain active because other backends might still have a pointer to it. Whenever any backend notices the SXACT_FLAG_RO_SAFE flag, it clears its own MySerializableXact variable and frees local resources so that it can skip SSI checks for the rest of the transaction. In the special case of the leader process, it transfers the SERIALIZABLEXACT to a new variable SavedSerializableXact, so that it can be completely released at the end of the transaction after all workers have exited. Remove the serializable_okay flag added to CreateParallelContext() by commit `9da0cc35`, because it's now redundant. Author: Thomas Munro Reviewed-by: Haribabu Kommi, Robert Haas, Masahiko Sawada, Kevin Grittner Discussion: https://postgr.es/m/CAEepm=0gXGYhtrVDWOTHS8SQQy_=S9xo+8oCxGLWZAOoeJ=yzQ@mail.gmail.com	2019-03-15 17:47:04 +13:00
Tom Lane	4be058fe9e	In the planner, replace an empty FROM clause with a dummy RTE. The fact that "SELECT expression" has no base relations has long been a thorn in the side of the planner. It makes it hard to flatten a sub-query that looks like that, or is a trivial VALUES() item, because the planner generally uses relid sets to identify sub-relations, and such a sub-query would have an empty relid set if we flattened it. prepjointree.c contains some baroque logic that works around this in certain special cases --- but there is a much better answer. We can replace an empty FROM clause with a dummy RTE that acts like a table of one row and no columns, and then there are no such corner cases to worry about. Instead we need some logic to get rid of useless dummy RTEs, but that's simpler and covers more cases than what was there before. For really trivial cases, where the query is just "SELECT expression" and nothing else, there's a hazard that adding the extra RTE makes for a noticeable slowdown; even though it's not much processing, there's not that much for the planner to do overall. However testing says that the penalty is very small, close to the noise level. In more complex queries, this is able to find optimizations that we could not find before. The new RTE type is called RTE_RESULT, since the "scan" plan type it gives rise to is a Result node (the same plan we produced for a "SELECT expression" query before). To avoid confusion, rename the old ResultPath path type to GroupResultPath, reflecting that it's only used in degenerate grouping cases where we know the query produces just one grouped row. (It wouldn't work to unify the two cases, because there are different rules about where the associated quals live during query_planner.) Note: although this touches readfuncs.c, I don't think a catversion bump is required, because the added case can't occur in stored rules, only plans. 
Patch by me, reviewed by David Rowley and Mark Dilger Discussion: https://postgr.es/m/15944.1521127664@sss.pgh.pa.us	2019-01-28 17:54:23 -05:00
Peter Eisentraut	be2e329f2e	isolationtester: Use atexit() Replace exit_nicely() calls with standard exit() and register the cleanup actions using atexit(). Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/ec4135ba-84e9-28bf-b584-0e78d47448d5@2ndquadrant.com/	2019-01-07 16:25:16 +01:00
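The atexit() pattern this commit describes can be sketched as follows. This is a minimal illustration only, not isolationtester's actual code: `disconnect_all`, `register_cleanup` and `cleanup_registered` are hypothetical names, and the handler body is a placeholder for whatever cleanup the program needs.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical flag: set once the handler has been registered. */
static int cleanup_registered = 0;

/*
 * Hypothetical cleanup action. The C library calls it automatically when
 * the program terminates via exit() or by returning from main().
 */
static void
disconnect_all(void)
{
	/* Sanity check: this handler only runs if registration succeeded. */
	assert(cleanup_registered);
	fputs("cleanup: closing connections\n", stderr);
}

/*
 * Register the cleanup action once. Returns 0 on success, -1 if atexit()
 * could not register the handler.
 */
static int
register_cleanup(void)
{
	if (cleanup_registered)
		return 0;
	if (atexit(disconnect_all) != 0)
		return -1;
	cleanup_registered = 1;
	return 0;
}
```

With the handler registered early in main(), error paths can call plain exit(1) instead of a wrapper that performs the cleanup by hand before exiting.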
Bruce Momjian	97c39498e5	Update copyright for 2019 Backpatch-through: certain files through 9.4	2019-01-02 12:44:25 -05:00
Alexander Korotkov	0c6f4f9212	Reduce length of GIN predicate locking isolation test suite Isolation test suite of GIN predicate locking was criticized for being too slow, especially under Valgrind. This commit is intended to accelerate it. Tests are simplified in the following ways. 1) Amount of data is reduced. We're now close to the minimal amount of data, which produces at least one posting tree and at least two pages of entry tree. 2) Three isolation tests are merged into one. 3) Only one tuple is queried from posting tree. So, locking of index is the same, but tuple locks are not propagated to relation lock. Also, it is faster. 4) Test cases themselves are simplified. Now each test case runs just one INSERT and one SELECT involving GIN, which either conflict or not. Discussion: https://postgr.es/m/20181204000740.ok2q53nvkftwu43a%40alap3.anarazel.de Reported-by: Andres Freund Tested-by: Andrew Dunstan Author: Alexander Korotkov Backpatch-through: 11	2018-12-28 03:33:10 +03:00
Michael Paquier	1e504f01da	Ignore inherited temp relations from other sessions when truncating Inheritance trees can include temporary tables if the parent is permanent, which makes possible the presence of multiple temporary children from different sessions. Trying to issue a TRUNCATE on the parent in this scenario causes a failure, so similarly to any other queries just ignore such cases, which makes TRUNCATE work transparently. This makes truncation behave similarly to any other DML query working on the parent table with queries which need to work on the children. A set of isolation tests is added to cover basic cases. Reported-by: Zhou Digoal Author: Amit Langote, Michael Paquier Discussion: https://postgr.es/m/15565-ce67a48d0244436a@postgresql.org Backpatch-through: 9.4	2018-12-27 10:16:19 +09:00
Noah Misch	1db439ad49	Raise some timeouts to 180s, in test code. Slow runs of buildfarm members chipmunk, hornet and mandrill saw the shorter timeouts expire. The 180s timeout in poll_query_until has been trouble-free since `2a0f89cd71` introduced it two years ago, so use 180s more widely. Back-patch to 9.6, where the first of these timeouts was introduced. Reviewed by Michael Paquier. Discussion: https://postgr.es/m/20181209001601.GC2973271@rfd.leadboat.com	2018-12-10 20:15:42 -08:00
Michael Paquier	ee2b37ae04	Add some missing schema qualifications This does not improve the security and reliability of the touched areas, but it makes the style more consistent. Author: Michael Paquier Reviewed-by: Noah Misch Discussion: https://postgr.es/m/20180309075538.GD9376@paquier.xyz	2018-12-03 14:21:52 +09:00
Alvaro Herrera	a28e10e82e	Indicate session name in isolationtester notices When a session under isolationtester produces printable notices (NOTICE, WARNING) we were just printing them unadorned, which can be confusing when debugging. Prefix them with the session name, which makes things clearer. Author: Álvaro Herrera Reviewed-by: Hari Babu Kommi Discussion: https://postgr.es/m/20181024213451.75nh3f3dctmcdbfq@alvherre.pgsql	2018-11-09 13:08:00 -03:00
Tom Lane	9958b2b2a8	Fix minor bug in isolationtester. If the lock wait query failed, isolationtester would report the PQerrorMessage from some other connection, meaning there would be no message or an unrelated one. This seems like a pretty unlikely occurrence, but if it did happen, this bug could make it really difficult/confusing to figure out what happened. That seems to justify patching all the way back. In passing, clean up another place where the "wrong" conn was used for an error report. That one's not actually buggy because it's a different alias for the same connection, but it's still confusing to the reader.	2018-10-17 15:06:57 -04:00
Michael Paquier	803b1301e8	Add option SKIP_LOCKED to VACUUM and ANALYZE When specified, this option allows VACUUM to skip the work on a relation if there is a conflicting lock on it when trying to open it at the beginning of its processing. Similarly to autovacuum, this comes with a couple of limitations while the relation is processed which can cause the process to still block: - when opening the relation indexes. - when acquiring row samples for table inheritance trees, partition trees or certain types of foreign tables, and that a lock is taken on some leaves of such trees. Author: Nathan Bossart Reviewed-by: Michael Paquier, Andres Freund, Masahiko Sawada Discussion: https://postgr.es/m/9EF7EBE4-720D-4CF1-9D0E-4403D7E92990@amazon.com Discussion: https://postgr.es/m/20171201160907.27110.74730@wrigleys.postgresql.org	2018-10-04 09:00:33 +09:00
Tom Lane	1f4a920b73	Fix failure with initplans used conditionally during EvalPlanQual rechecks. The EvalPlanQual machinery assumes that any initplans (that is, uncorrelated sub-selects) used during an EPQ recheck would have already been evaluated during the main query; this is implicit in the fact that execPlan pointers are not copied into the EPQ estate's es_param_exec_vals. But it's possible for that assumption to fail, if the initplan is only reached conditionally. For example, a sub-select inside a CASE expression could be reached during a recheck when it had not been previously, if the CASE test depends on a column that was just updated. This bug is old, appearing to date back to my rewrite of EvalPlanQual in commit `9f2ee8f28`, but was not detected until Kyle Samson reported a case. To fix, force all not-yet-evaluated initplans used within the EPQ plan subtree to be evaluated at the start of the recheck, before entering the EPQ environment. This could be inefficient, if such an initplan is expensive and goes unused again during the recheck --- but that's piling one layer of improbability atop another. It doesn't seem worth adding more complexity to prevent that, at least not in the back branches. It was convenient to use the new-in-v11 ExecEvalParamExecParams function to implement this, but I didn't like either its name or the specifics of its API, so revise that. Back-patch all the way. Rather than rewrite the patch to avoid depending on bms_next_member() in the oldest branches, I chose to back-patch that function into 9.4 and 9.3. (This isn't the first time back-patches have needed that, and it exhausted my patience.) I also chose to back-patch some test cases added by commits `71404af2a` and `342a1ffa2` into 9.4 and 9.3, so that the 9.x versions of eval-plan-qual.spec are all the same. Andrew Gierth diagnosed the problem and contributed the added test cases, though the actual code changes are by me. 
Discussion: https://postgr.es/m/A033A40A-B234-4324-BE37-272279F7B627@tripadvisor.com	2018-09-15 13:42:33 -04:00
Michael Paquier	a556549d7e	Improve VACUUM and ANALYZE by avoiding early lock queue A caller of VACUUM can perform early lookup obtention which can cause other sessions to block on the request done, potentially causing DoS attacks as even a non-privileged user can attempt a VACUUM FULL of a critical catalog table to block even all incoming connection attempts. Contrary to TRUNCATE, a client could attempt a system-wide VACUUM after building the list of relations to VACUUM, which can cause vacuum_rel() or analyze_rel() to try to lock the relation but the operation would just block. When the client specifies a list of relations and the relation needs to be skipped, ownership checks are done when building the list of relations to work on, preventing a later lock attempt. vacuum_rel() already had the sanity checks needed, except that those were applied too late. This commit refactors the code so that relation skips are checked beforehand, making it safer to avoid too early locks, for both manual VACUUM with and without a list of relations specified. An isolation test is added emulating the fact that early locks do not happen anymore, issuing a WARNING message earlier if the user calling VACUUM is not a relation owner. When a partitioned table is listed in a manual VACUUM or ANALYZE command, its full list of partitions is fetched, all partitions get added to the list to work on, and then each one of them is processed one by one, with ownership checks happening at the later phase of vacuum_rel() or analyze_rel(). Trying to do early ownership checks for each partition is proving to be tedious as this would result in deadlock risks with lock upgrades, and skipping all partitions if the listed partitioned table is not owned would result in a behavior change compared to how Postgres 10 has implemented vacuum for partitioned tables. 
The original problem reported related to early lock queue for critical relations is fixed anyway, so priority is given to avoiding a backward-incompatible behavior. Reported-by: Lloyd Albin, Jeremy Schneider Author: Michael Paquier Reviewed by: Nathan Bossart, Kyotaro Horiguchi Discussion: https://postgr.es/m/152512087100.19803.12733865831237526317@wrigleys.postgresql.org Discussion: https://postgr.es/m/20180812222142.GA6097@paquier.xyz	2018-08-27 09:11:12 +09:00
Tom Lane	cc4f6b7786	Clean up assorted misuses of snprintf()'s result value. Fix a small number of places that were testing the result of snprintf() but doing so incorrectly. The right test for buffer overrun, per C99, is "result >= bufsize" not "result > bufsize". Some places were also checking for failure with "result == -1", but the standard only says that a negative value is delivered on failure. (Note that this only makes these places correct if snprintf() delivers C99-compliant results. But at least now these places are consistent with all the other places where we assume that.) Also, make psql_start_test() and isolation_start_test() check for buffer overrun while constructing their shell commands. There seems like a higher risk of overrun, with more severe consequences, here than there is for the individual file paths that are made elsewhere in the same functions, so this seemed like a worthwhile change. Also fix guc.c's do_serialize() to initialize errno = 0 before calling vsnprintf. In principle, this should be unnecessary because vsnprintf should have set errno if it returns a failure indication ... but the other two places this coding pattern is cribbed from don't assume that, so let's be consistent. These errors are all very old, so back-patch as appropriate. I think that only the shell command overrun cases are even theoretically reachable in practice, but there's not much point in erroneous error checks. Discussion: https://postgr.es/m/17245.1534289329@sss.pgh.pa.us	2018-08-15 16:29:31 -04:00
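The overrun test described in this commit can be illustrated with a small sketch. It assumes a C99-compliant snprintf(), as the commit message itself does; `build_command` is a hypothetical helper written for this example and is not taken from the PostgreSQL tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a shell command into buf and report whether
 * the whole command fit.
 */
static bool
build_command(char *buf, size_t bufsize, const char *prog, const char *arg)
{
	int			result;

	assert(bufsize > 0);

	result = snprintf(buf, bufsize, "%s \"%s\"", prog, arg);

	/* C99 only promises a negative value on failure, not exactly -1. */
	if (result < 0)
		return false;

	/*
	 * C99 returns the length that would have been written, not counting
	 * the terminating NUL. A result equal to bufsize therefore already
	 * means the output was truncated, so the test is ">=", not ">".
	 */
	if ((size_t) result >= bufsize)
		return false;

	return true;
}
```

With an 8-byte buffer, a command of exactly 8 characters makes snprintf() return 8; a "result > bufsize" test would accept it even though the last character was dropped to make room for the NUL.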
Michael Paquier	f841ceb26d	Improve TRUNCATE by avoiding early lock queue A caller of TRUNCATE could previously queue for an access exclusive lock on a relation it may not have permission to truncate, potentially interfering with users authorized to work on it. This can be very intrusive depending on the lock attempted to be taken. For example, pg_authid could be blocked, preventing any authentication attempt from happening on a PostgreSQL instance. This commit fixes the case of TRUNCATE so that RangeVarGetRelidExtended is used with a callback doing the necessary ACL checks at an earlier stage, avoiding lock queuing issues, so that an immediate failure happens for unprivileged users instead of waiting on a lock that would not be taken. This is rather similar to the type of work done in `cbe24a6` for CLUSTER, and the code of TRUNCATE is this time refactored so that there are no user-facing changes. As with the commit for CLUSTER, no back-patch is done. Reported-by: Lloyd Albin, Jeremy Schneider Author: Michael Paquier Reviewed by: Nathan Bossart, Kyotaro Horiguchi Discussion: https://postgr.es/m/152512087100.19803.12733865831237526317@wrigleys.postgresql.org Discussion: https://postgr.es/m/20180806165816.GA19883@paquier.xyz	2018-08-10 18:26:59 +02:00
Amit Kapila	40ca70ebcc	Allow using the updated tuple while moving it to a different partition. An update that causes the tuple to be moved to a different partition was missing out on re-constructing the to-be-updated tuple, based on the latest tuple in the update chain. Instead, it's simply deleting the latest tuple and inserting a new tuple in the new partition based on the old tuple. Commit `2f17844104` didn't consider this case, so some of the updates were getting lost. In passing, change the argument order for output parameter in ExecDelete and add some commentary about it. Reported-by: Pavan Deolasee Author: Amit Khandekar, with minor changes by me Reviewed-by: Dilip Kumar, Amit Kapila and Alvaro Herrera Backpatch-through: 11 Discussion: https://postgr.es/m/CAJ3gD9fRbEzDqdeDq1jxqZUb47kJn+tQ7=Bcgjc8quqKsDViKQ@mail.gmail.com	2018-07-12 12:51:39 +05:30
Peter Eisentraut	6b30d1386f	Fix whitespace	2018-05-17 23:04:41 -04:00
Tom Lane	2efc924180	Detoast plpgsql variables if they might live across a transaction boundary. Up to now, it's been safe for plpgsql to store TOAST pointers in its variables because the ActiveSnapshot for whatever query called the plpgsql function will surely protect such TOAST values from being vacuumed away, even if the owning table rows are committed dead. With the introduction of procedures, that assumption is no longer good in "non atomic" executions of plpgsql code. We adopt the slightly brute-force solution of detoasting all TOAST pointers at the time they are stored into variables, if we're in a non-atomic context, just in case the owning row goes away. Some care is needed to avoid long-term memory leaks, since plpgsql tends to run with CurrentMemoryContext pointing to its call-lifespan context, but we shouldn't assume that no memory is leaked by heap_tuple_fetch_attr. In plpgsql proper, we can do the detoasting work in the "eval_mcontext". Most of the code thrashing here is due to the need to add this capability to expandedrecord.c as well as plpgsql proper. In expandedrecord.c, we can't assume that the caller's context is short-lived, so make use of the short-term sub-context that was already invented for checking domain constraints. In view of this repurposing, it seems good to rename that variable and associated code from "domain_check_cxt" to "short_term_cxt". Peter Eisentraut and Tom Lane Discussion: https://postgr.es/m/5AC06865.9050005@anastigmatix.net	2018-05-16 14:56:52 -04:00
Teodor Sigaev	0bef1c0678	Re-think predicate locking on GIN indexes. The principle behind the locking was not very well thought-out, and not documented. Add a section in the README to explain how it's supposed to work, and change the code so that it actually works that way. This fixes two bugs: 1. If fast update was turned on concurrently, subsequent inserts to the pending list would not conflict with predicate locks that were acquired earlier, on entry pages. The included 'predicate-gin-fastupdate' test demonstrates that. To fix, make all scans acquire a predicate lock on the metapage. That lock represents a scan of the pending list, whether or not there is a pending list at the moment. Forget about the optimization to skip locking/checking for locks, when fastupdate=off. 2. If a scan finds no match, it still needs to lock the entry page. The point of predicate locks is to lock the gaps between values, whether or not there is a match. The included 'predicate-gin-nomatch' test tests that case. In addition to those two bug fixes, this removes some unnecessary locking, following the principle laid out in the README. Because all items in a posting tree have the same key value, a lock on the posting tree root is enough to cover all the items. (With a very large posting tree, it would possibly be better to lock the posting tree leaf pages instead, so that with a "skip scan" for a query like "A & B", you could avoid an unnecessary conflict if a new tuple is inserted with A but !B. But let's keep this simple.) Also, some spelling fixes. Author: Heikki Linnakangas with some editorization by me Review: Andrey Borodin, Alexander Korotkov Discussion: https://www.postgresql.org/message-id/0b3ad2c2-2692-62a9-3a04-5724f2af9114@iki.fi	2018-05-04 11:27:50 +03:00
Tom Lane	b39fd897e0	Improve regression test coverage of expand_tuple(). I was dissatisfied with the code coverage report for expand_tuple() in the wake of commit `7c44c46de`: while better than no coverage at all, it was still not exercising the core function of inserting out-of-line default values, nor was the HeapTuple-output path covered. So far as I can find, the only code path that reaches the latter at present is EvalPlanQual fetches for non-locked tables. Hence, extend eval-plan-qual.spec to test cases where out-of-line defaults must be inserted into a tuple fetched from a non-locked table. Discussion: https://postgr.es/m/87woxi24uw.fsf@ansel.ydns.eu	2018-04-14 19:02:30 -04:00
Simon Riggs	08ea7a2291	Revert MERGE patch This reverts commits `d204ef6377`, `83454e3c2b` and a few more commits thereafter (complete list at the end) related to MERGE feature. While the feature was fully functional, with sufficient test coverage and necessary documentation, it was felt that some parts of the executor and parse-analyzer can use a different design and it wasn't possible to do that in the available time. So it was decided to revert the patch for PG11 and retry again in the future. Thanks again to all reviewers and bug reporters. List of commits reverted, in reverse chronological order: `f1464c5380` Improve parse representation for MERGE `ddb4158579` MERGE syntax diagram correction `530e69e59b` Allow cpluspluscheck to pass by renaming variable `01b88b4df5` MERGE minor errata `3af7b2b0d4` MERGE fix variable warning in non-assert builds `a5d86181ec` MERGE INSERT allows only one VALUES clause `4b2d44031f` MERGE post-commit review `4923550c20` Tab completion for MERGE `aa3faa3c7a` WITH support in MERGE `83454e3c2b` New files for MERGE `d204ef6377` MERGE SQL Command following SQL:2016 Author: Pavan Deolasee Reviewed-by: Michael Paquier	2018-04-12 11:22:56 +01:00

233 commits