postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-04-22 22:59:54 -04:00

Author	SHA1	Message	Date
Peter Geoghegan	d6f0f95a6b	Harmonize some more function parameter names. Make sure that function declarations use names that exactly match the corresponding names from function definitions in a few places. These inconsistencies were all introduced relatively recently, after the code base had parameter name mismatches fixed in bulk (see commits starting with commits `4274dc22` and `035ce1fe`). pg_bsd_indent still has a couple of similar inconsistencies, which I (pgeoghegan) have left untouched for now. Like all earlier commits that cleaned up function parameter names, this commit was written with help from clang-tidy.	2023-04-13 10:15:20 -07:00
Andres Freund	e101dfac3a	For cascading replication, wake physical and logical walsenders separately Physical walsenders can't send data until it's been flushed; logical walsenders can't decode and send data until it's been applied. On the standby, the WAL is flushed first, which will only wake up physical walsenders; and then applied, which will only wake up logical walsenders. Previously, all walsenders were awakened when the WAL was flushed. That was fine for logical walsenders on the primary; but on the standby the flushed WAL would have been not applied yet, so logical walsenders were awakened too early. Per idea from Jeff Davis and Amit Kapila. Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-By: Jeff Davis <pgsql@j-davis.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAA4eK1+zO5LUeisabX10c81LU-fWMKO4M9Wyg1cdkbW7Hqh6vQ@mail.gmail.com	2023-04-08 01:06:00 -07:00
Andres Freund	be87200efd	Support invalidating replication slots due to horizon and wal_level Needed for logical decoding on a standby. Slots need to be invalidated because of the horizon if rows required for logical decoding are removed. If the primary's wal_level is lowered from 'logical', logical slots on the standby need to be invalidated. The new invalidation methods will be used in a subsequent commit. Logical slots that have been invalidated can be identified via the new pg_replication_slots.conflicting column. See `6af1793954` for an overall design of logical decoding on a standby. Bumps catversion for the addition of the new pg_replication_slots column. Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Author: Andres Freund <andres@anarazel.de> Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version) Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de	2023-04-07 22:40:27 -07:00
Andres Freund	15f8203a59	Replace replication slot's invalidated_at LSN with an enum This is mainly useful because the upcoming logical-decoding-on-standby feature adds further reasons for invalidating slots, and we don't want to end up with multiple invalidated_* fields, or check different attributes. Eventually we should consider not resetting restart_lsn when invalidating a slot due to max_slot_wal_keep_size. But that's a user visible change, so left for later. Increases SLOT_VERSION, due to the changed field (with a different alignment, no less). Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de	2023-04-07 21:47:25 -07:00
Robert Haas	c3afe8cf5a	Add new predefined role pg_create_subscription. This role can be granted to non-superusers to allow them to issue CREATE SUBSCRIPTION. The non-superuser must additionally have CREATE permissions on the database in which the subscription is to be created. Most forms of ALTER SUBSCRIPTION, including ALTER SUBSCRIPTION .. SKIP, now require only that the role performing the operation own the subscription, or inherit the privileges of the owner. However, to use ALTER SUBSCRIPTION ... RENAME or ALTER SUBSCRIPTION ... OWNER TO, you also need CREATE permission on the database. This is similar to what we do for schemas. To change the owner of a schema, you must also have permission to SET ROLE to the new owner, similar to what we do for other object types. Non-superusers are required to specify a password for authentication and the remote side must use the password, similar to what is required for postgres_fdw and dblink. A superuser who wants a non-superuser to own a subscription that does not rely on password authentication may set the new password_required=false property on that subscription. A non-superuser may not set password_required=false and may not modify a subscription that already has password_required=false. This new password_required subscription property works much like the eponymous postgres_fdw property. In both cases, the actual semantics are that a password is not required if either (1) the property is set to false or (2) the relevant user is the superuser. Patch by me, reviewed by Andres Freund, Jeff Davis, Mark Dilger, and Stephen Frost (but some of those people did not fully endorse all of the decisions that the patch makes). Discussion: http://postgr.es/m/CA+TgmoaDH=0Xj7OBiQnsHTKcF2c4L+=gzPBUKSJLh8zed2_+Dg@mail.gmail.com	2023-03-30 11:37:19 -04:00
Amit Kapila	e709596b25	Add macros for ReorderBufferTXN toptxn. Currently, there are quite a few places in reorderbuffer.c that tries to access top-transaction for a subtransaction. This makes the code to access top-transaction consistent and easier to follow. Author: Peter Smith Reviewed-by: Vignesh C, Sawada Masahiko Discussion: https://postgr.es/m/CAHut+PuCznOyTqBQwjRUu-ibG-=KHyCv-0FTcWQtZUdR88umfg@mail.gmail.com	2023-03-17 08:29:41 +05:30
Amit Kapila	89e46da5e5	Allow the use of indexes other than PK and REPLICA IDENTITY on the subscriber. Using REPLICA IDENTITY FULL on the publisher can lead to a full table scan per tuple change on the subscription when REPLICA IDENTITY or PK index is not available. This makes REPLICA IDENTITY FULL impractical to use apart from some small number of use cases. This patch allows using indexes other than PRIMARY KEY or REPLICA IDENTITY on the subscriber during apply of update/delete. The index that can be used must be a btree index, not a partial index, and it must have at least one column reference (i.e. cannot consist of only expressions). We can uplift these restrictions in the future. There is no smart mechanism to pick the index. If there is more than one index that satisfies these requirements, we just pick the first one. We discussed using some of the optimizer's low-level APIs for this but ruled it out as that can be a maintenance burden in the long run. This patch improves the performance in the vast majority of cases and the improvement is proportional to the amount of data in the table. However, there could be some regression in a small number of cases where the indexes have a lot of duplicate and dead rows. It was discussed that those are mostly impractical cases but we can provide a table or subscription level option to disable this feature if required. Author: Onder Kalaci, Amit Kapila Reviewed-by: Peter Smith, Shi yu, Hou Zhijie, Vignesh C, Kuroda Hayato, Amit Kapila Discussion: https://postgr.es/m/CACawEhVLqmAAyPXdHEPv1ssU2c=dqOniiGz7G73HfyS7+nGV4w@mail.gmail.com	2023-03-15 08:49:04 +05:30
Amit Kapila	8c58624df4	Fix the logical replication timeout during large DDLs. The DDLs like Refresh Materialized views that generate lots of temporary data due to rewrite rules may not be processed by output plugins (for example pgoutput). So, we won't send keep-alive messages for a long time while processing such commands and that can lead the subscriber side to timeout. We have previously fixed a similar case for large transactions in commit `f95d53eded` where the output plugin filters all or most of the changes but missed to handle the DDLs. We decided not to backpatch this as this adds a new callback in the existing exposed structure and moreover, users can increase the wal_sender_timeout and wal_receiver_timeout to avoid this problem. Author: Wang wei, Hou Zhijie Reviewed-by: Peter Smith, Ashutosh Bapat, Shi yu, Amit Kapila Discussion: https://postgr.es/m/OS3PR01MB6275478E5D29E4A563302D3D9E2B9@OS3PR01MB6275.jpnprd01.prod.outlook.com Discussion: https://postgr.es/m/CAA5-nLARN7-3SLU_QUxfy510pmrYK6JJb=bk3hcgemAM_pAv+w@mail.gmail.com	2023-02-08 07:58:25 +05:30
Amit Kapila	1e8b61735c	Rename GUC logical_decoding_mode to logical_replication_mode. Rename the developer option 'logical_decoding_mode' to the more flexible name 'logical_replication_mode' because doing so will make it easier to extend this option in the future to help test other areas of logical replication. Currently, it is used on the publisher side to allow streaming or serializing each change in logical decoding. In the upcoming patch, we are planning to use it on the subscriber. On the subscriber, it will allow serializing the changes to file and notifies the parallel apply workers to read and apply them at the end of the transaction. We discussed exposing this parameter as a subscription option but it did not seem advisable since it is primarily used for testing/debugging and there is no other such parameter. We also discussed having separate GUCs for publisher and subscriber but for current testing/debugging requirements, one GUC is sufficient. Author: Hou Zhijie Reviewed-by: Peter Smith, Kuroda Hayato, Sawada Masahiko, Amit Kapila Discussion: https://postgr.es/m/CAD21AoAy2c=Mx=FTCs+EwUsf2kQL5MmU3N18X84k0EmCXntK4g@mail.gmail.com Discussion: https://postgr.es/m/CAA4eK1+wyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw@mail.gmail.com	2023-01-30 08:02:08 +05:30
Tom Lane	5a3a95385b	Track logrep apply workers' last start times to avoid useless waits. Enforce wal_retrieve_retry_interval on a per-subscription basis, rather than globally, and arrange to skip that delay in case of an intentional worker exit. This probably makes little difference in the field, where apply workers wouldn't be restarted often; but it has a significant impact on the runtime of our logical replication regression tests (even though those tests use artificially-small wal_retrieve_retry_interval settings already). Nathan Bossart, with mostly-cosmetic editorialization by me Discussion: https://postgr.es/m/20221122004119.GA132961@nathanxps13	2023-01-22 14:08:46 -05:00
Andres Freund	12605414a7	Use dlists instead of SHM_QUEUE for syncrep queue Part of a series to remove SHM_QUEUE. ilist.h style lists are more widely used and have an easier to use interface. Reviewed-by: Thomas Munro <thomas.munro@gmail.com> (in an older version) Discussion: https://postgr.es/m/20221120055930.t6kl3tyivzhlrzu2@awork3.anarazel.de Discussion: https://postgr.es/m/20200211042229.msv23badgqljrdg2@alap3.anarazel.de	2023-01-18 12:15:05 -08:00
Amit Kapila	d540a02a72	Display the leader apply worker's PID for parallel apply workers. Add leader_pid to pg_stat_subscription. leader_pid is the process ID of the leader apply worker if this process is a parallel apply worker. If this field is NULL, it indicates that the process is a leader apply worker or a synchronization worker. The new column makes it easier to distinguish parallel apply workers from other kinds of workers and helps to identify the leader for the parallel workers corresponding to a particular subscription. Additionally, update the leader_pid column in pg_stat_activity as well to display the PID of the leader apply worker for parallel apply workers. Author: Hou Zhijie Reviewed-by: Peter Smith, Sawada Masahiko, Amit Kapila, Shveta Mallik Discussion: https://postgr.es/m/CAA4eK1+wyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw@mail.gmail.com	2023-01-18 09:03:12 +05:30
Amit Kapila	216a784829	Perform apply of large transactions by parallel workers. Currently, for large transactions, the publisher sends the data in multiple streams (changes divided into chunks depending upon logical_decoding_work_mem), and then on the subscriber-side, the apply worker writes the changes into temporary files and once it receives the commit, it reads from those files and applies the entire transaction. To improve the performance of such transactions, we can instead allow them to be applied via parallel workers. In this approach, we assign a new parallel apply worker (if available) as soon as the xact's first stream is received and the leader apply worker will send changes to this new worker via shared memory. The parallel apply worker will directly apply the change instead of writing it to temporary files. However, if the leader apply worker times out while attempting to send a message to the parallel apply worker, it will switch to "partial serialize" mode - in this mode, the leader serializes all remaining changes to a file and notifies the parallel apply workers to read and apply them at the end of the transaction. We use a non-blocking way to send the messages from the leader apply worker to the parallel apply to avoid deadlocks. We keep this parallel apply assigned till the transaction commit is received and also wait for the worker to finish at commit. This preserves commit ordering and avoid writing to and reading from files in most cases. We still need to spill if there is no worker available. This patch also extends the SUBSCRIPTION 'streaming' parameter so that the user can control whether to apply the streaming transaction in a parallel apply worker or spill the change to disk. The user can set the streaming parameter to 'on/off', or 'parallel'. The parameter value 'parallel' means the streaming will be applied via a parallel apply worker, if available. The parameter value 'on' means the streaming transaction will be spilled to disk. The default value is 'off' (same as current behaviour). In addition, the patch extends the logical replication STREAM_ABORT message so that abort_lsn and abort_time can also be sent which can be used to update the replication origin in parallel apply worker when the streaming transaction is aborted. Because this message extension is needed to support parallel streaming, parallel streaming is not supported for publications on servers < PG16. Author: Hou Zhijie, Wang wei, Amit Kapila with design inputs from Sawada Masahiko Reviewed-by: Sawada Masahiko, Peter Smith, Dilip Kumar, Shi yu, Kuroda Hayato, Shveta Mallik Discussion: https://postgr.es/m/CAA4eK1+wyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw@mail.gmail.com	2023-01-09 07:52:45 +05:30
Tom Lane	c6e1f62e2c	Wake up a subscription's replication worker processes after DDL. Waken related worker processes immediately at commit of a transaction that has performed ALTER SUBSCRIPTION (including the RENAME and OWNER variants). This reduces the response time for such operations. In the real world that might not be worth much, but it shaves several seconds off the runtime for the subscription test suite. In the case of PREPARE, we just throw away this notification state; it doesn't seem worth the work to preserve it. The workers will still react after the eventual COMMIT PREPARED, but not as quickly. Nathan Bossart Discussion: https://postgr.es/m/20221122004119.GA132961@nathanxps13	2023-01-06 17:27:58 -05:00
Bruce Momjian	c8e1ba736b	Update copyright for 2023 Backpatch-through: 11	2023-01-02 15:00:37 -05:00
Amit Kapila	5de94a041e	Add 'logical_decoding_mode' GUC. This enables streaming or serializing changes immediately in logical decoding. This parameter is intended to be used to test logical decoding and replication of large transactions for which otherwise we need to generate the changes till logical_decoding_work_mem is reached. This helps in reducing the timing of existing tests related to logical replication of in-progress transactions and will help in writing tests for for the upcoming feature for parallelly applying large in-progress transactions. Author: Shi yu Reviewed-by: Sawada Masahiko, Shveta Mallik, Amit Kapila, Dilip Kumar, Kuroda Hayato, Kyotaro Horiguchi Discussion: https://postgr.es/m/OSZPR01MB63104E7449DBE41932DB19F1FD1B9@OSZPR01MB6310.jpnprd01.prod.outlook.com	2022-12-26 08:58:16 +05:30
Amit Kapila	bf07ab492c	Avoid unnecessary streaming of transactions during logical replication. After restart, we don't perform streaming of an in-progress transaction if it was previously decoded and confirmed by the client. To achieve that we were comparing the END location of the WAL record being decoded with the WAL location we have already decoded and confirmed by the client. While decoding the commit record, to decide whether to process and send the complete transaction, we compare its START location with the WAL location we have already decoded and confirmed by the client. Now, if we need to queue some change in the transaction while decoding the commit record (e.g. snapshot), it is possible that we decide to stream the transaction but later commit processing decides to skip it. In such a case, we would needlessly send the changes and later when we decide to skip it, we will send stream abort. We also sometimes decide to stream the changes when we actually just need to process them locally like a change for invalidations. This will lead us to send empty streams. To avoid this, while queuing each change for decoding, we remember whether the transaction has any change that actually needs to be sent downstream and use that information later to decide whether to stream the transaction or not. Note, we can't avoid all cases where we have to send empty streams like the case where the plugin later decides that the change is not publishable. However, we will no longer need to send stream_abort when we skip sending a particular transaction. Author: Dilip Kumar Reviewed-by: Hou Zhijie, Ashutosh Bapat, Shi yu, Amit Kapila Discussion: https://postgr.es/m/CAFiTN-tHK=7LzfrPs8fbT2ksrOJGQbzywcgXst2bM9-rJJAAUg@mail.gmail.com	2022-12-08 06:05:09 +05:30
Amit Kapila	40b1491357	Fix incorrect output from pgoutput when using column lists. For Updates and Deletes, we were not honoring the columns list for old tuple values while sending tuple data via pgoutput. This results in pgoutput emitting more columns than expected. This is not a problem for built-in logical replication as we simply ignore additional columns based on the relation information sent previously which didn't have those columns. However, some other users of pgoutput plugin may expect the columns as per the column list. Also, sending extra columns unnecessarily consumes network bandwidth defeating the purpose of the column list feature. Reported-by: Gunnar Morling Author: Hou Zhijie Reviewed-by: Amit Kapila Backpatch-through: 15 Discussion: https://postgr.es/m/CADGJaX9kiRZ-OH0EpWF5Fkyh1ZZYofoNRCrhapBfdk02tj5EKg@mail.gmail.com	2022-12-02 10:52:58 +05:30
David Rowley	7c335b7a20	Add doubly linked count list implementation We have various requirements when using a dlist_head to keep track of the number of items in the list. This, traditionally, has been done by maintaining a counter variable in the calling code. Here we tidy this up by adding "dclist", which is very similar to dlist but also keeps track of the number of items stored in the list. Callers may use the new dclist_count() function when they need to know how many items are stored. Obtaining the count is an O(1) operation. For simplicity reasons, dclist and dlist both use dlist_node as their node type and dlist_iter/dlist_mutable_iter as their iterator type. dclists have all of the same functionality as dlists except there is no function named dclist_delete(). To remove an item from a list dclist_delete_from() must be used. This requires knowing which dclist the given item is stored in. Additionally, here we also convert some dlists where additional code exists to keep track of the number of items stored and to make these use dclists instead. Author: David Rowley Reviewed-by: Bharath Rupireddy, Aleksander Alekseev Discussion: https://postgr.es/m/CAApHDvrtVxr+FXEX0VbViCFKDGxA3tWDgw9oFewNXCJMmwLjLg@mail.gmail.com	2022-11-02 14:06:05 +13:00
Amit Kapila	776e1c8a5d	Add a common function to generate the origin name. Make a common replication origin name formatting function to replace multiple snprintf() expressions. This also includes logic previously done by ReplicationOriginNameForTablesync(). This makes the code to generate the origin name consistent among apply worker and tablesync worker. Author: Peter Smith Reviewed-By: Aleksander Alekseev Discussion: https://postgr.es/m/CAHut%2BPsa8hhfSE6ozUK-ih7GkQziAVAf4f3bqiXEj2nQiu-43g%40mail.gmail.com	2022-10-11 10:37:52 +05:30
Andres Freund	06dbd619bf	pgstat: Prevent stats reset from corrupting slotname by removing slotname Previously PgStat_StatReplSlotEntry contained the slotname, which was mainly used when writing out the stats during shutdown, to identify the slot in the serialized data (at runtime the index in ReplicationSlotCtl->replication_slots is used, but that can change during a restart). Unfortunately the slotname was overwritten when the slot's stats were reset. That turned out to only cause "real" problems if the slot was active during the reset, triggering an assertion failure at the next pgstat_report_replslot(). In other paths the stats were re-initialized during pgstat_acquire_replslot(). Fix this by removing slotname from PgStat_StatReplSlotEntry. Instead we can get the slot's name from the slot itself. Besides fixing a bug, this also is architecturally cleaner (a name is not really statistics). This is safe because stats, for a slot removed while shut down, will not be restored at startup. In 15 the slotname is not removed, but renamed, to avoid changing the stats format. In master, bump PGSTAT_FILE_FORMAT_ID. This commit does not contain a test for the fix. I think this can only be tested by a tap test starting pg_recvlogical in the background and checking pg_recvlogical's output. That type of test is notoriously hard to be reliable, so committing it shortly before the release is wrapped seems like a bad idea. Reported-by: Jaime Casanova <jcasanov@systemguards.com.ec> Author: Andres Freund <andres@anarazel.de> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/YxfagaTXUNa9ggLb@ahch-to Backpatch: 15-, where the bug was introduced in `5891c7a8ed`	2022-10-08 09:43:29 -07:00
Amit Kapila	af51b2f042	Remove unused xid parameter. Commit `6c2003f8a1` removes the use of transaction id's for exporting snapshots. This commit removes one unused xid parameter left behind in SnapBuildGetOrBuildSnapshot. Author: Melih Mutlu Reviewed-By: Zhang Mingli Discussion: https://postgr.es/m/CAGPVpCTqZRoDKgCycw+eYi+Gq41rN9pU-gntgTd7wfsNDpPL3Q@mail.gmail.com	2022-09-26 08:47:00 +05:30
Peter Geoghegan	8fb4e001e9	Harmonize more lexer function parameter names. Make sure that function declarations use names that exactly match the corresponding names from function definitions for several "lexer adjacent" backend functions. These were missed by commit `aab06442`. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com	2022-09-22 13:27:16 -07:00
Peter Geoghegan	aab06442d4	Harmonize lexer adjacent function parameter names. Make sure that function declarations use names that exactly match the corresponding names from function definitions for several "lexer adjacent" backend functions. These functions were missed by recent commits because they were obscured by clang-tidy warnings about functions whose signature is directly under the control of the lexer (flex seems to always generate function declarations with unnamed parameters). We probably can't fix most of the warnings it generates for translation units that get built from .l and .y files, but we can at least do this much. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com	2022-09-21 13:21:36 -07:00
Amit Kapila	a932824dfe	Pass Size as a 2nd argument for snprintf() in tablesync.c. Previously the following snprintf() wrappers: * ReplicationSlotNameForTablesync() * ReplicationOriginNameForTablesync() ... used int as a second argument of snprintf() while the actual type of it is size_t. Although it doesn't fail at present better replace it with Size for consistency with the rest of the system. Author: Aleksander Alekseev Reviewed-By: Peter Smith Discussion: https://postgr.es/m/CAHut%2BPsa8hhfSE6ozUK-ih7GkQziAVAf4f3bqiXEj2nQiu-43g%40mail.gmail.com	2022-09-21 10:20:37 +05:30
Peter Geoghegan	bfcf1b3480	Harmonize parameter names in storage and AM code. Make sure that function declarations use names that exactly match the corresponding names from function definitions in storage, catalog, access method, executor, and logical replication code, as well as in miscellaneous utility/library code. Like other recent commits that cleaned up function parameter names, this commit was written with help from clang-tidy. Later commits will do the same for other parts of the codebase. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com	2022-09-19 19:18:36 -07:00
Peter Geoghegan	4bac9600f0	Harmonize heapam and tableam parameter names. Make sure that function declarations use names that exactly match the corresponding names from function definitions. Having parameter names that are reliably consistent in this way will make it easier to reason about groups of related C functions from the same translation unit as a module. It will also make certain refactoring tasks easier. Like other recent commits that cleaned up function parameter names, this commit was written with help from clang-tidy. Later commits will do the same for other parts of the codebase. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com	2022-09-19 16:46:23 -07:00
Peter Geoghegan	f66d997fd0	Harmonize missed reorderbuffer parameter names. The function ReorderBufferCommitChild() was overlooked by initial work from commit `035ce1fe`. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WzkhzFESnRo+VaGqyEZuzc33Dw09BdZBVmW896Sa22ci_A@mail.gmail.com	2022-09-18 12:05:07 -07:00
Peter Geoghegan	035ce1feb2	Harmonize reorderbuffer parameter names. Make reorderbuffer.h function declarations consistently use named parameters. Also make sure that the declarations use names that match corresponding names from function definitions in reorderbuffer.c. This makes the definitions easier to follow, especially in the case of functions that happen to have adjoining arguments of the same type. This patch was written with help from clang-tidy. Specifically, its "readability-inconsistent-declaration-parameter-name" check and its "readability-named-parameter" check were used. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/3955318.1663377656@sss.pgh.pa.us	2022-09-17 17:20:17 -07:00
Tom Lane	0a20ff54f5	Split up guc.c for better build speed and ease of maintenance. guc.c has grown to be one of our largest .c files, making it a bottleneck for compilation. It's also acquired a bunch of knowledge that'd be better kept elsewhere, because of our not very good habit of putting variable-specific check hooks here. Hence, split it up along these lines: * guc.c itself retains just the core GUC housekeeping mechanisms. * New file guc_funcs.c contains the SET/SHOW interfaces and some SQL-accessible functions for GUC manipulation. * New file guc_tables.c contains the data arrays that define the built-in GUC variables, along with some already-exported constant tables. * GUC check/assign/show hook functions are moved to the variable's home module, whenever that's clearly identifiable. A few hard- to-classify hooks ended up in commands/variable.c, which was already a home for miscellaneous GUC hook functions. To avoid cluttering a lot more header files with #include "guc.h", I also invented a new header file utils/guc_hooks.h and put all the GUC hook functions' declarations there, regardless of their originating module. That allowed removal of #include "guc.h" from some existing headers. The fallout from that (hopefully all caught here) demonstrates clearly why such inclusions are best minimized: there are a lot of files that, for example, were getting array.h at two or more levels of remove, despite not having any connection at all to GUCs in themselves. There is some very minor code beautification here, such as renaming a couple of inconsistently-named hook functions and improving some comments. But mostly this just moves code from point A to point B and deals with the ensuing needs for #include adjustments and exporting a few functions that previously weren't exported. Patch by me, per a suggestion from Andres Freund; thanks also to Michael Paquier for the idea to invent guc_funcs.c. Discussion: https://postgr.es/m/587607.1662836699@sss.pgh.pa.us	2022-09-13 11:11:45 -04:00
Michael Paquier	49e525a08f	Fix comment in walsender_private.h All the members of the stucture are protected by the spinlock WalSnd, but a comment referred to "replyTime" and "latch" as not being in the set of what gets protected, contrary to what walsender.c does. Author: Bharath Rupireddy Discussion: https://postgr.es/m/CALj2ACWE_7srye4_GZ=N=-rD=qr2WHL9GZrMnhWJOJ5RdnNS2A@mail.gmail.com	2022-08-22 10:02:53 +09:00
Thomas Munro	5579388d2d	Remove replacement code for getaddrinfo. SUSv3, all targeted Unixes and modern Windows have getaddrinfo() and related interfaces. Drop the replacement implementation, and adjust some headers slightly to make sure that the APIs are visible everywhere using standard POSIX headers and names. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKG%2BL_3brvh%3D8e0BW_VfX9h7MtwgN%3DnFHP5o7X2oZucY9dg%40mail.gmail.com	2022-08-14 09:53:28 +12:00
Amit Kapila	7f13ac8123	Fix catalog lookup with the wrong snapshot during logical decoding. Previously, we relied on HEAP2_NEW_CID records and XACT_INVALIDATION records to know if the transaction has modified the catalog, and that information is not serialized to snapshot. Therefore, after the restart, if the logical decoding decodes only the commit record of the transaction that has actually modified a catalog, we will miss adding its XID to the snapshot. Thus, we will end up looking at catalogs with the wrong snapshot. To fix this problem, this change adds the list of transaction IDs and sub-transaction IDs, that have modified catalogs and are running during snapshot serialization, to the serialized snapshot. After restart or otherwise, when we restore from such a serialized snapshot, the corresponding list is restored in memory. Now, when decoding a COMMIT record, we check both the list and the ReorderBuffer to see if the transaction has modified catalogs. Since this adds additional information to the serialized snapshot, we cannot backpatch it. For back branches, we took another approach. We remember the last-running-xacts list of the decoded RUNNING_XACTS record after restoring the previously serialized snapshot. Then, we mark the transaction as containing catalog changes if it's in the list of initial running transactions and its commit record has XACT_XINFO_HAS_INVALS. This doesn't require any file format changes but the transaction will end up being added to the snapshot even if it has only relcache invalidations. But that won't be a problem since we use snapshot built during decoding only to read system catalogs. This commit bumps SNAPBUILD_VERSION because of a change in SnapBuild. Reported-by: Mike Oh Author: Masahiko Sawada Reviewed-by: Amit Kapila, Shi yu, Takamichi Osumi, Kyotaro Horiguchi, Bertrand Drouvot, Ahsan Hadi Backpatch-through: 10 Discussion: https://postgr.es/m/81D0D8B0-E7C4-4999-B616-1E5004DBDCD2%40amazon.com	2022-08-11 10:09:24 +05:30
Robert Haas	a8c0128697	Move basebackup code to new directory src/backend/backup Reviewed by David Steele and Justin Pryzby Discussion: http://postgr.es/m/CA+TgmoafqboATDSoXHz8VLrSwK_MDhjthK4hEpYjqf9_1Fmczw%40mail.gmail.com	2022-08-10 14:03:23 -04:00
John Naylor	bcabbfc6a9	Fix formatting and comment typos Justin Pryzby Discussion: https://www.postgresql.org/message-id/20220801181136.GJ15006%40telsasoft.com	2022-08-04 16:41:29 +07:00
Michael Paquier	245e14e28b	Fix inconsistent comments for some function declarations in headers Some of the headers list a couple of function prototypes located in a different file than what is referred to. This fixes a couple of places where this inconsistency exists. Author: Richard Guo Discussion: https://postgr.es/m/CAMbWs4__RdcSNXPa7L62Ozvo_Q4LvT60o3Bnp8yrQ_m9y5CKvg@mail.gmail.com	2022-08-04 17:36:21 +09:00
Amit Kapila	366283961a	Allow users to skip logical replication of data having origin. This patch adds a new SUBSCRIPTION parameter "origin". It specifies whether the subscription will request the publisher to only send changes that don't have an origin or send changes regardless of origin. Setting it to "none" means that the subscription will request the publisher to only send changes that have no origin associated. Setting it to "any" means that the publisher sends changes regardless of their origin. The default is "any". Usage: CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres port=9999' PUBLICATION pub1 WITH (origin = none); This can be used to avoid loops (infinite replication of the same data) among replication nodes. This feature allows filtering only the replication data originating from WAL but for initial sync (initial copy of table data) we don't have such a facility as we can only distinguish the data based on origin from WAL. As a follow-up patch, we are planning to forbid the initial sync if the origin is specified as none and we notice that the publication tables were also replicated from other publishers to avoid duplicate data or loops. We forbid to allow creating origin with names 'none' and 'any' to avoid confusion with the same name options. Author: Vignesh C, Amit Kapila Reviewed-By: Peter Smith, Amit Kapila, Dilip Kumar, Shi yu, Ashutosh Bapat, Hayato Kuroda Discussion: https://postgr.es/m/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubhjcQ@mail.gmail.com	2022-07-21 08:47:38 +05:30
Andres Freund	f2b73c8d75	Add central declarations for dlsym()ed symbols This is in preparation for defaulting to -fvisibility=hidden in extensions, instead of exporting all symbols. For that symbols intended to be exported need to be tagged with PGDLLEXPORT. Most extensions only need to do so for _PG_init() and functions defined with PG_FUNCTION_INFO_V1. Adding central declarations avoids each extension having to add PGDLLEXPORT. Any existing declarations in extensions will continue to work if fmgr.h is included before them, otherwise compilation for Windows will fail. Author: Andres Freund <andres@anarazel.de> Reviewed-By: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/20211101020311.av6hphdl6xbjbuif@alap3.anarazel.de	2022-07-17 17:23:42 -07:00
Robert Haas	b0a55e4329	Change internal RelFileNode references to RelFileNumber or RelFileLocator. We have been using the term RelFileNode to refer to either (1) the integer that is used to name the sequence of files for a certain relation within the directory set aside for that tablespace/database combination; or (2) that value plus the OIDs of the tablespace and database; or occasionally (3) the whole series of files created for a relation based on those values. Using the same name for more than one thing is confusing. Replace RelFileNode with RelFileNumber when we're talking about just the single number, i.e. (1) from above, and with RelFileLocator when we're talking about all the things that are needed to locate a relation's files on disk, i.e. (2) from above. In the places where we refer to (3) as a relfilenode, instead refer to "relation storage". Since there is a ton of SQL code in the world that knows about pg_class.relfilenode, don't change the name of that column, or of other SQL-facing things that derive their name from it. On the other hand, do adjust closely-related internal terminology. For example, the structure member names dbNode and spcNode appear to be derived from the fact that the structure itself was called RelFileNode, so change those to dbOid and spcOid. Likewise, various variables with names like rnode and relnode get renamed appropriately, according to how they're being used in context. Hopefully, this is clearer than before. It is also preparation for future patches that intend to widen the relfilenumber fields from its current width of 32 bits. Variables that store a relfilenumber are now declared as type RelFileNumber rather than type Oid; right now, these are the same, but that can now more easily be changed. Dilip Kumar, per an idea from me. Reviewed also by Andres Freund. I fixed some whitespace issues, changed a couple of words in a comment, and made one other minor correction. Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com	2022-07-06 11:39:09 -04:00
Amit Kapila	b7658c24c7	Fix data inconsistency between publisher and subscriber. We were not updating the partition map cache in the subscriber even when the corresponding remote rel is changed. Due to this data was getting incorrectly replicated for partition tables after the publisher has changed the table schema. Fix it by resetting the required entries in the partition map cache after receiving a new relation mapping from the publisher. Reported-by: Shi Yu Author: Shi Yu, Hou Zhijie Reviewed-by: Amit Langote, Amit Kapila Backpatch-through: 13, where it was introduced Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com	2022-06-16 08:45:07 +05:30
Andres Freund	905c020bef	Add missing 'extern' to function prototypes. Postgres style is to spell out extern. Noticed while scripting adding PGDLLIMPORT markers to functions. Discussion: https://postgr.es/m/20220512164513.vaheofqp2q24l65r@alap3.anarazel.de	2022-05-12 12:39:33 -07:00
Tom Lane	23e7b38bfe	Pre-beta mechanical code beautification. Run pgindent, pgperltidy, and reformat-dat-files. I manually fixed a couple of comments that pgindent uglified.	2022-05-12 15:17:30 -04:00
Amit Kapila	f95d53eded	Fix the logical replication timeout during large transactions. The problem is that we don't send keep-alive messages for a long time while processing large transactions during logical replication where we don't send any data of such transactions. This can happen when the table modified in the transaction is not published or because all the changes got filtered. We do try to send the keep_alive if necessary at the end of the transaction (via WalSndWriteData()) but by that time the subscriber-side can timeout and exit. To fix this we try to send the keepalive message if required after processing certain threshold of changes. Reported-by: Fabrice Chapuis Author: Wang wei and Amit Kapila Reviewed By: Masahiko Sawada, Euler Taveira, Hou Zhijie, Hayato Kuroda Backpatch-through: 10 Discussion: https://postgr.es/m/CAA5-nLARN7-3SLU_QUxfy510pmrYK6JJb=bk3hcgemAM_pAv+w@mail.gmail.com	2022-05-11 11:11:44 +05:30
David Rowley	a00fd066b1	Add missing spaces after single-line comments Only 1 of 3 of these changes appear to be handled by pgindent. That change is new to v15. The remaining two appear to be left alone by pgindent. The exact reason for that is not 100% clear to me. It seems related to the fact that it's a line that contains only a single line comment and no actual code. It does not seem worth investigating this in too much detail. In any case, these do not conform to our usual practices, so fix them. Author: Justin Pryzby Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com	2022-04-14 09:28:56 +12:00
Michael Paquier	a4b57543ac	Rename backup_compression.{c,h} to compression.{c,h} Compression option handling (level, algorithm or even workers) can be used across several parts of the system and not only base backups. Structures, objects and routines are renamed in consequence, to remove the concept of base backups from this part of the code making this change straight-forward. pg_receivewal, that has gained support for LZ4 since `babbbb5`, will make use of this infrastructure for its set of compression options, bringing more consistency with pg_basebackup. This cleanup needs to be done before releasing a beta of 15. pg_dump is a potential future target, as well, and adding more compression options to it may happen in 16~. Author: Michael Paquier Reviewed-by: Robert Haas, Georgios Kokolatos Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz	2022-04-12 13:38:54 +09:00
Robert Haas	8ec569479f	Apply PGDLLIMPORT markings broadly. Up until now, we've had a policy of only marking certain variables in the PostgreSQL header files with PGDLLIMPORT, but now we've decided to mark them all. This means that extensions running on Windows should no longer operate at a disadvantage as compared to extensions running on Linux: if the variable is present in a header file, it should be accessible. Discussion: http://postgr.es/m/CA+TgmoYanc1_FSfimhgiWSqVyP5KKmh5NP2BWNwDhO8Pg2vGYQ@mail.gmail.com	2022-04-08 08:16:38 -04:00
Tomas Vondra	2c7ea57e56	Revert "Logical decoding of sequences" This reverts a sequence of commits, implementing features related to logical decoding and replication of sequences: - `0da92dc530` - `80901b3291` - `b779d7d8fd` - `d5ed9da41d` - `a180c2b34d` - `75b1521dae` - `2d2232933b` - `002c9dd97a` - `05843b1aa4` The implementation has issues, mostly due to combining transactional and non-transactional behavior of sequences. It's not clear how this could be fixed, but it'll require reworking significant part of the patch. Discussion: https://postgr.es/m/95345a19-d508-63d1-860a-f5c2f41e8d40@enterprisedb.com	2022-04-07 20:06:36 +02:00
Andres Freund	e41aed674f	pgstat: revise replication slot API in preparation for shared memory stats. Previously the pgstat <-> replication slots API was done with on the basis of names. However, the upcoming move to storing stats in shared memory makes it more convenient to use a integer as key. Change the replication slot functions to take the slot rather than the slot name, and expose ReplicationSlotIndex() to compute the index of an replication slot. Special handling will be required for restarts, as the index is not stable across restarts. For now pgstat internally still uses names. Rename pgstat_report_replslot_{create,drop}() to pgstat_{create,drop}_replslot() to match the functions for other kinds of stats. Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/20220404041516.cctrvpadhuriawlq@alap3.anarazel.de	2022-04-06 18:38:24 -07:00
Amit Kapila	d5a9d86d8f	Skip empty transactions for logical replication. The current logical replication behavior is to send every transaction to subscriber even if the transaction is empty. This can happen because transaction doesn't contain changes from the selected publications or all the changes got filtered. It is a waste of CPU cycles and network bandwidth to build/transmit these empty transactions. This patch addresses the above problem by postponing the BEGIN message until the first change is sent. While processing a COMMIT message, if there was no other change for that transaction, do not send the COMMIT message. This allows us to skip sending BEGIN/COMMIT messages for empty transactions. When skipping empty transactions in synchronous replication mode, we send a keepalive message to avoid delaying such transactions. Author: Ajin Cherian, Hou Zhijie, Euler Taveira Reviewed-by: Peter Smith, Takamichi Osumi, Shi Yu, Masahiko Sawada, Greg Nancarrow, Vignesh C, Amit Kapila Discussion: https://postgr.es/m/CAMkU=1yohp9-dv48FLoSPrMqYEyyS5ZWkaZGD41RJr10xiNo_Q@mail.gmail.com	2022-03-30 07:41:05 +05:30
Tomas Vondra	923def9a53	Allow specifying column lists for logical replication This allows specifying an optional column list when adding a table to logical replication. The column list may be specified after the table name, enclosed in parentheses. Columns not included in this list are not sent to the subscriber, allowing the schema on the subscriber to be a subset of the publisher schema. For UPDATE/DELETE publications, the column list needs to cover all REPLICA IDENTITY columns. For INSERT publications, the column list is arbitrary and may omit some REPLICA IDENTITY columns. Furthermore, if the table uses REPLICA IDENTITY FULL, column list is not allowed. The column list can contain only simple column references. Complex expressions, function calls etc. are not allowed. This restriction could be relaxed in the future. During the initial table synchronization, only columns included in the column list are copied to the subscriber. If the subscription has several publications, containing the same table with different column lists, columns specified in any of the lists will be copied. This means all columns are replicated if the table has no column list at all (which is treated as column list with all columns), or when of the publications is defined as FOR ALL TABLES (possibly IN SCHEMA that matches the schema of the table). For partitioned tables, publish_via_partition_root determines whether the column list for the root or the leaf relation will be used. If the parameter is 'false' (the default), the list defined for the leaf relation is used. Otherwise, the column list for the root partition will be used. Psql commands \dRp+ and \d <table-name> now display any column lists. Author: Tomas Vondra, Alvaro Herrera, Rahila Syed Reviewed-by: Peter Eisentraut, Alvaro Herrera, Vignesh C, Ibrar Ahmed, Amit Kapila, Hou zj, Peter Smith, Wang wei, Tang, Shi yu Discussion: https://postgr.es/m/CAH2L28vddB_NFdRVpuyRBJEBWjz4BSyTB=_ektNRH8NJ1jf95g@mail.gmail.com	2022-03-26 01:01:27 +01:00
Tomas Vondra	75b1521dae	Add decoding of sequences to built-in replication This commit adds support for decoding of sequences to the built-in replication (the infrastructure was added by commit `0da92dc530`). The syntax and behavior mostly mimics handling of tables, i.e. a publication may be defined as FOR ALL SEQUENCES (replicating all sequences in a database), FOR ALL SEQUENCES IN SCHEMA (replicating all sequences in a particular schema) or individual sequences. To publish sequence modifications, the publication has to include 'sequence' action. The protocol is extended with a new message, describing sequence increments. A new system view pg_publication_sequences lists all the sequences added to a publication, both directly and indirectly. Various psql commands (\d and \dRp) are improved to also display publications including a given sequence, or sequences included in a publication. Author: Tomas Vondra, Cary Huang Reviewed-by: Peter Eisentraut, Amit Kapila, Hannu Krosing, Andres Freund, Petr Jelinek Discussion: https://postgr.es/m/d045f3c2-6cfb-06d3-5540-e63c320df8bc@enterprisedb.com Discussion: https://postgr.es/m/1710ed7e13b.cd7177461430746.3372264562543607781@highgo.ca	2022-03-24 18:49:27 +01:00
Robert Haas	ffd53659c4	Replace BASE_BACKUP COMPRESSION_LEVEL option with COMPRESSION_DETAIL. There are more compression parameters that can be specified than just an integer compression level, so rename the new COMPRESSION_LEVEL option to COMPRESSION_DETAIL before it gets released. Introduce a flexible syntax for that option to allow arbitrary options to be specified without needing to adjust the main replication grammar, and common code to parse it that is shared between the client and the server. This commit doesn't actually add any new compression parameters, so the only user-visible change is that you can now type something like pg_basebackup --compress gzip:level=5 instead of writing just pg_basebackup --compress gzip:5. However, it should make it easy to add new options. If for example gzip starts offering fries, we can support pg_basebackup --compress gzip:level=5,fries=true for the benefit of users who want fries with that. Along the way, this fixes a few things in pg_basebackup so that the pg_basebackup can be used with a server-side compression algorithm that pg_basebackup itself does not understand. For example, pg_basebackup --compress server-lz4 could still succeed even if only the server and not the client has LZ4 support, provided that the other options to pg_basebackup don't require the client to decompress the archive. Patch by me. Reviewed by Justin Pryzby and Dagfinn Ilmari Mannsåker. Discussion: http://postgr.es/m/CA+TgmoYvpetyRAbbg1M8b3-iHsaN4nsgmWPjOENu5-doHuJ7fA@mail.gmail.com	2022-03-23 09:19:14 -04:00
Robert Haas	e4ba69f3f4	Allow extensions to add new backup targets. Commit `3500ccc39b` allowed for base backup targets, meaning that we could do something with the backup other than send it to the client, but all of those targets had to be baked in to the core code. This commit makes it possible for extensions to define additional backup targets. Patch by me, reviewed by Abhijit Menon-Sen. Discussion: http://postgr.es/m/CA+TgmoaqvdT-u3nt+_kkZ7bgDAyqDB0i-+XOMmr5JN2Rd37hxw@mail.gmail.com	2022-03-15 13:22:04 -04:00
Robert Haas	7cf085f077	Add support for zstd base backup compression. Both client-side compression and server-side compression are now supported for zstd. In addition, a backup compressed by the server using zstd can now be decompressed by the client in order to accommodate the use of -Fp. Jeevan Ladhe, with some edits by me. Discussion: http://postgr.es/m/CA+Tgmobyzfbz=gyze2_LL1ZumZunmaEKbHQxjrFkOR7APZGu-g@mail.gmail.com	2022-03-08 09:52:43 -05:00
Amit Kapila	52e4f0cd47	Allow specifying row filters for logical replication of tables. This feature adds row filtering for publication tables. When a publication is defined or modified, an optional WHERE clause can be specified. Rows that don't satisfy this WHERE clause will be filtered out. This allows a set of tables to be partially replicated. The row filter is per table. A new row filter can be added simply by specifying a WHERE clause after the table name. The WHERE clause must be enclosed by parentheses. The row filter WHERE clause for a table added to a publication that publishes UPDATE and/or DELETE operations must contain only columns that are covered by REPLICA IDENTITY. The row filter WHERE clause for a table added to a publication that publishes INSERT can use any column. If the row filter evaluates to NULL, it is regarded as "false". The WHERE clause only allows simple expressions that don't have user-defined functions, user-defined operators, user-defined types, user-defined collations, non-immutable built-in functions, or references to system columns. These restrictions could be addressed in the future. If you choose to do the initial table synchronization, only data that satisfies the row filters is copied to the subscriber. If the subscription has several publications in which a table has been published with different WHERE clauses, rows that satisfy ANY of the expressions will be copied. If a subscriber is a pre-15 version, the initial table synchronization won't use row filters even if they are defined in the publisher. The row filters are applied before publishing the changes. If the subscription has several publications in which the same table has been published with different filters (for the same publish operation), those expressions get OR'ed together so that rows satisfying any of the expressions will be replicated. This means all the other filters become redundant if (a) one of the publications have no filter at all, (b) one of the publications was created using FOR ALL TABLES, (c) one of the publications was created using FOR ALL TABLES IN SCHEMA and the table belongs to that same schema. If your publication contains a partitioned table, the publication parameter publish_via_partition_root determines if it uses the partition's row filter (if the parameter is false, the default) or the root partitioned table's row filter. Psql commands \dRp+ and \d <table-name> will display any row filters. Author: Hou Zhijie, Euler Taveira, Peter Smith, Ajin Cherian Reviewed-by: Greg Nancarrow, Haiying Tang, Amit Kapila, Tomas Vondra, Dilip Kumar, Vignesh C, Alvaro Herrera, Andres Freund, Wei Wang Discussion: https://www.postgresql.org/message-id/flat/CAHE3wggb715X%2BmK_DitLXF25B%3DjE6xyNCH4YOwM860JR7HarGQ%40mail.gmail.com	2022-02-22 08:11:50 +05:30
Andres Freund	2f6501fa3c	Move replication slot release to before_shmem_exit(). Previously, replication slots were released in ProcKill() on error, resulting in reporting replication slot drop of ephemeral slots after the stats subsystem was already shut down. To fix this problem, move replication slot release to a before_shmem_exit() hook that is called before the stats collector shuts down. There wasn't really a good reason for the slot handling to be in ProcKill() anyway. Patch by Masahiko Sawada, with very minor polishing by me. I, Andres, wrote a test for dropping slots during process exit, but there may be some OS dependent issues around the number of times FATAL error messages are displayed due to a still debated libpq issue. So that test will be committed separately / later. Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-By: Andres Freund <andres@anarazel.de> Author: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAD21AoDAeEpAbZEyYJsPZJUmSPaRicVSBObaL7sPaofnKz+9zg@mail.gmail.com	2022-02-14 17:08:17 -08:00
Robert Haas	dab298471f	Add suport for server-side LZ4 base backup compression. LZ4 compression can be a lot faster than gzip compression, so users may prefer it even if the compression ratio is not as good. We will want pg_basebackup to support LZ4 compression and decompression on the client side as well, and there is a pending patch for that, but it's by a different author, so I am committing this part separately for that reason. Jeevan Ladhe, reviewed by Tushar Ahuja and by me. Discussion: http://postgr.es/m/CANm22Cg9cArXEaYgHVZhCnzPLfqXCZLAzjwTq7Fc0quXRPfbxA@mail.gmail.com	2022-02-11 08:29:38 -05:00
Tomas Vondra	0da92dc530	Logical decoding of sequences This extends the logical decoding to also decode sequence increments. We differentiate between sequences created in the current (in-progress) transaction, and sequences created earlier. This mixed behavior is necessary because while sequences are not transactional (increments are not subject to ROLLBACK), relfilenode changes are. So we do this: * Changes for sequences created in the same top-level transaction are treated as transactional, i.e. just like any other change from that transaction, and discarded in case of a rollback. * Changes for sequences created earlier are applied immediately, as if performed outside any transaction. This applies also after ALTER SEQUENCE, which may create a new relfilenode. Moreover, if we ever get support for DDL replication, the sequence won't exist until the transaction gets applied. Sequences created in the current transaction are tracked in a simple hash table, identified by a relfilenode. That means a sequence may already exist, but if a transaction does ALTER SEQUENCE then the increments for the new relfilenode will be treated as transactional. For each relfilenode we track the XID of (sub)transaction that created it, which is needed for cleanup at transaction end. We don't need to check the XID to decide if an increment is transactional - if we find a match in the hash table, it has to be the same transaction. This requires two minor changes to WAL-logging. Firstly, we need to ensure the sequence record has a valid XID - until now the the increment might have XID 0 if it was the first change in a subxact. But the sequence might have been created in the same top-level transaction. So we ensure the XID is assigned when WAL-logging increments. The other change is addition of "created" flag, marking increments for newly created relfilenodes. This makes it easier to maintain the hash table of sequences that need transactional handling. Note: This is needed because of subxacts. A XID 0 might still have the sequence created in a different subxact of the same top-level xact. This does not include any changes to test_decoding and/or the built-in replication - those will be committed in separate patches. A patch adding decoding of sequences was originally submitted by Cary Huang. This commit reworks various important aspects (e.g. the WAL logging and transactional/non-transactional handling). However, the original patch and reviews were very useful. Author: Tomas Vondra, Cary Huang Reviewed-by: Peter Eisentraut, Hannu Krosing, Andres Freund Discussion: https://postgr.es/m/d045f3c2-6cfb-06d3-5540-e63c320df8bc@enterprisedb.com Discussion: https://postgr.es/m/1710ed7e13b.cd7177461430746.3372264562543607781@highgo.ca	2022-02-10 18:43:51 +01:00
Robert Haas	0d4513b613	Remove server support for the previous base backup protocol. Commit `cc333f3233` added a new COPY sub-protocol for taking base backups, but retained support for the previous protocol. For the same reasons articulated in the message for commit `9cd28c2e5f`, remove support for the previous protocol from the server. Discussion: http://postgr.es/m/CA+TgmoazKcKUWtqVa0xZqSzbKgTH+X-aw4V7GyLD68EpDLMh8A@mail.gmail.com	2022-02-10 12:12:43 -05:00
Michael Paquier	410aa248e5	Fix various typos, grammar and code style in comments and docs This fixes a set of issues that have accumulated over the past months (or years) in various code areas. Most fixes are related to some recent additions, as of the development of v15. Author: Justin Pryzby Discussion: https://postgr.es/m/20220124030001.GQ23027@telsasoft.com	2022-01-25 09:40:04 +09:00
Tom Lane	6aa5186146	Fix limitations on what SQL commands can be issued to a walsender. In logical replication mode, a WalSender is supposed to be able to execute any regular SQL command, as well as the special replication commands. Poor design of the replication-command parser caused it to fail in various cases, notably: * semicolons embedded in a command, or multiple SQL commands sent in a single message; * dollar-quoted literals containing odd numbers of single or double quote marks; * commands starting with a comment. The basic problem here is that we're trying to run repl_scanner.l across the entire input string even when it's not a replication command. Since repl_scanner.l does not understand all of the token types known to the core lexer, this is doomed to have failure modes. We certainly don't want to make repl_scanner.l as big as scan.l, so instead rejigger stuff so that we only lex the first token of a non-replication command. That will usually look like an IDENT to repl_scanner.l, though a comment would end up getting reported as a '-' or '/' single-character token. If the token is a replication command keyword, we push it back and proceed normally with repl_gram.y parsing. Otherwise, we can drop out of exec_replication_command() without examining the rest of the string. (It's still theoretically possible for repl_scanner.l to fail on the first token; but that could only happen if it's an unterminated single- or double-quoted string, in which case you'd have gotten largely the same error from the core lexer too.) In this way, repl_gram.y isn't involved at all in handling general SQL commands, so we can get rid of the SQLCmd node type. (In the back branches, we can't remove it because renumbering enum NodeTag would be an ABI break; so just leave it sit there unused.) I failed to resist the temptation to clean up some other sloppy coding in repl_scanner.l while at it. The only externally-visible behavior change from that is it now accepts \r and \f as whitespace, same as the core lexer. Per bug #17379 from Greg Rychlewski. Back-patch to all supported branches. Discussion: https://postgr.es/m/17379-6a5c6cfb3f1f5e77@postgresql.org	2022-01-24 15:33:38 -05:00
Robert Haas	0ad8032910	Server-side gzip compression. pg_basebackup's --compression option now lets you write either "client-gzip" or "server-gzip" instead of just "gzip" to specify where the compression should be performed. If you write simply "gzip" it's taken to mean "client-gzip" unless you also use --target, in which case it is interpreted to mean "server-gzip", because that's the only thing that makes any sense in that case. To make this work, the BASE_BACKUP command now takes new COMPRESSION and COMPRESSION_LEVEL options. At present, pg_basebackup cannot decompress .gz files, so server-side compression will cause a failure if (1) -Ft is not used or (2) -R is used or (3) -D- is used without --no-manifest. Along the way, I removed the information message added by commit `5c649fe153` which occurred if you specified no compression level and told you that the default level had been used instead. That seemed like more output than most people would want. Also along the way, this adds a check to the server for unrecognized base backup options. This repairs a bug introduced by commit `0ba281cb4b`. This commit also adds some new test cases for pg_verifybackup. They take a server-side backup with and without compression, and then extract the backup if we have the OS facilities available to do so, and then run pg_verifybackup on the extracted directory. That is a good test of the functionality added by this commit and also improves test coverage for the backup target patch (commit `3500ccc39b`) and for pg_verifybackup itself. Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which this is a part has also had review and/or testing from Tushar Ahuja, Suraj Kharage, Dipesh Pandit, and Mark Dilger. Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com	2022-01-24 15:13:18 -05:00
Robert Haas	3500ccc39b	Support base backup targets. pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied, it is sent to the server as the value of the TARGET option to the BASE_BACKUP command. If DETAIL is included, it is sent as the value of the new TARGET_DETAIL option to the BASE_BACKUP command. If the target is anything other than 'client', pg_basebackup assumes that it will now be the server's job to write the backup in a location somehow defined by the target, and that it therefore needs to write nothing locally. However, the server will still send messages to the client for progress reporting purposes. On the server side, we now support two additional types of backup targets. There is a 'blackhole' target, which just throws away the backup data without doing anything at all with it. Naturally, this should only be used for testing and debugging purposes, since you will not actually have a backup when it finishes running. More usefully, there is also a 'server' target, so you can now use something like 'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some location on the server. We can extend this to more types of targets in the future, and might even want to create an extensibility mechanism for adding new target types. Since WAL fetching is handled with separate client-side logic, it's not part of this mechanism; thus, backups with non-default targets must use -Xnone or -Xfetch. Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which this is a part has also had review and/or testing from Tushar Ahuja, Suraj Kharage, Dipesh Pandit, and Mark Dilger. Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com	2022-01-20 10:46:33 -05:00
Jeff Davis	7a5f6b4748	Make logical decoding a part of the rmgr. Add a new rmgr method, rm_decode, and use that rather than a switch statement. In preparation for rmgr extensibility. Reviewed-by: Julien Rouhaud Discussion: https://postgr.es/m/ed1fb2e22d15d3563ae0eb610f7b61bb15999c0a.camel%40j-davis.com Discussion: https://postgr.es/m/20220118095332.6xtlcjoyxobv6cbk@jrouhaud	2022-01-19 14:58:49 -08:00
Robert Haas	cc333f3233	Modify pg_basebackup to use a new COPY subprotocol for base backups. In the new approach, all files across all tablespaces are sent in a single COPY OUT operation. The CopyData messages are no longer raw archive content; rather, each message is prefixed with a type byte that describes its purpose, e.g. 'n' signifies the start of a new archive and 'd' signifies archive or manifest data. This protocol is significantly more extensible than the old approach, since we can later create more message types, though not without concern for backward compatibility. The new protocol sends a few things to the client that the old one did not. First, it sends the name of each archive explicitly, instead of letting the client compute it. This is intended to make it easier to write future patches that might send archives in a format other that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress messages rather than allowing the client to assume that progress is defined by the number of bytes received. This will help with future features where the server compresses the data, or sends it someplace directly rather than transmitting it to the client. The old protocol is still supported for compatibility with previous releases. The new protocol is selected by means of a new TARGET option to the BASE_BACKUP command. Currently, the only supported target is 'client'. Support for additional targets will be added in a later commit. Patch by me. The patch set of which this is a part has had review and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage, Dipesh Pandit, and Mark Dilger. Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com	2022-01-18 13:47:49 -05:00
Bruce Momjian	27b77ecf9f	Update copyright for 2022 Backpatch-through: 10	2022-01-07 19:04:57 -05:00
Alvaro Herrera	ad26ee2825	Fix headerscheck failure in replication/worker_internal.h Broken by `31c389d8de`	2021-11-16 13:30:37 -03:00
Robert Haas	bef47ff85d	Introduce 'bbsink' abstraction to modularize base backup code. The base backup code has accumulated a healthy number of new features over the years, but it's becoming increasingly difficult to maintain and further enhance that code because there's no real separation of concerns. For example, the code that understands knows the details of how we send data to the client using the libpq protocol is scattered throughout basebackup.c, rather than being centralized in one place. To try to improve this situation, introduce a new 'bbsink' object which acts as a recipient for archives generated during the base backup progress and also for the backup manifest. This commit introduces three types of bbsink: a 'copytblspc' bbsink forwards the backup to the client using one COPY OUT operation per tablespace and another for the manifest, a 'progress' bbsink performs command progress reporting, and a 'throttle' bbsink performs rate-limiting. The 'progress' and 'throttle' bbsink types also forward the data to a successor bbsink; at present, the last bbsink in the chain will always be of type 'copytblspc'. There are plans to add more types of 'bbsink' in future commits. This abstraction is a bit leaky in the case of progress reporting, but this still seems cleaner than what we had before. Patch by me, reviewed and tested by Andres Freund, Sumanta Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja, Mark Dilger, and Jeevan Ladhe. Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com	2021-11-05 10:08:30 -04:00
Michael Paquier	409f9ca447	Reset properly snapshot export state during transaction abort During a replication slot creation, an ERROR generated in the same transaction as the one creating a to-be-exported snapshot would have left the backend in an inconsistent state, as the associated static export snapshot state was not being reset on transaction abort, but only on the follow-up command received by the WAL sender that created this snapshot on replication slot creation. This would trigger inconsistency failures if this session tried to export again a snapshot, like during the creation of a replication slot. Note that a snapshot export cannot happen in a transaction block, so there is no need to worry resetting this state for subtransaction aborts. Also, this inconsistent state would very unlikely show up to users. For example, one case where this could happen is an out-of-memory error when building the initial snapshot to-be-exported. Dilip found this problem while poking at a different patch, that caused an error in this code path for reasons unrelated to HEAD. Author: Dilip Kumar Reviewed-by: Michael Paquier, Zhihong Yu Discussion: https://postgr.es/m/CAFiTN-s0zA1Kj0ozGHwkYkHwa5U0zUE94RSc_g81WrpcETB5=w@mail.gmail.com Backpatch-through: 9.6	2021-10-18 11:55:42 +09:00
Michael Paquier	026ed8efd6	Remove code duplication for permission checks with replication slots Two functions, both named check_permissions(), used the same checks to verify if a user had required privileges to work on replication slots. This commit removes the duplication, and moves the function doing the checks to slot.c to be centralized. Author: Bharath Rupireddy Reviewed-by: Nathan Bossart, Euler Taveira Discussion: https://postgr.es/m/CALj2ACUPpVw1u7sQocFVWrSs0n10pt_G_4NPZKSxXK6cW1dErw@mail.gmail.com	2021-09-14 10:15:49 +09:00
Amit Kapila	31c389d8de	Optimize fileset usage in apply worker. Use one fileset for the entire worker lifetime instead of using separate filesets for each streaming transaction. Now, the changes/subxacts files for every streaming transaction will be created under the same fileset and the files will be deleted after the transaction is completed. This patch extends the BufFileOpenFileSet and BufFileDeleteFileSet APIs to allow users to specify whether to give an error on missing files. Author: Dilip Kumar, based on suggestion by Thomas Munro Reviewed-by: Hou Zhijie, Masahiko Sawada, Amit Kapila Discussion: https://postgr.es/m/E1mCC6U-0004Ik-Fs@gemulon.postgresql.org	2021-09-02 08:13:46 +05:30
Amit Kapila	dcac5e7ac1	Refactor sharedfileset.c to separate out fileset implementation. Move fileset related implementation out of sharedfileset.c to allow its usage by backends that don't want to share filesets among different processes. After this split, fileset infrastructure is used by both sharedfileset.c and worker.c for the named temporary files that survive across transactions. Author: Dilip Kumar, based on suggestion by Andres Freund Reviewed-by: Hou Zhijie, Masahiko Sawada, Amit Kapila Discussion: https://postgr.es/m/E1mCC6U-0004Ik-Fs@gemulon.postgresql.org	2021-08-30 08:48:15 +05:30
Amit Kapila	abc0910e2e	Add logical change details to logical replication worker errcontext. Previously, on the subscriber, we set the error context callback for the tuple data conversion failures. This commit replaces the existing error context callback with a comprehensive one so that it shows not only the details of data conversion failures but also the details of logical change being applied by the apply worker or table sync worker. The additional information displayed will be the command, transaction id, and timestamp. The error context is added to an error only when applying a change but not while doing other work like receiving data etc. This will help users in diagnosing the problems that occur during logical replication. It also can be used for future work that allows skipping a particular transaction on the subscriber. Author: Masahiko Sawada Reviewed-by: Hou Zhijie, Greg Nancarrow, Haiying Tang, Amit Kapila Tested-by: Haiying Tang Discussion: https://postgr.es/m/CAD21AoDeScrsHhLyEPYqN3sydg6PxAPVBboK=30xJfUVihNZDA@mail.gmail.com	2021-08-27 08:30:23 +05:30
Amit Kapila	4cd7a18968	Rename LOGICAL_REP_MSG_STREAM_END to LOGICAL_REP_MSG_STREAM_STOP. In the code, most places used the term "Stream Stop" for the logical stream message. This commit improves consistency by renaming LogicalRepMsgType "LOGICAL_REP_MSG_STREAM_END" to "LOGICAL_REP_MSG_STREAM_STOP". Author: Masahiko Sawada Reviewed-by: Hou Zhijie, Amit Kapila Discussion: https://postgr.es/m/CAD21AoDeScrsHhLyEPYqN3sydg6PxAPVBboK=30xJfUVihNZDA@mail.gmail.com	2021-08-19 09:34:26 +05:30
Amit Kapila	63cf61cdeb	Add prepare API support for streaming transactions in logical replication. Commit `a8fd13cab0` added support for prepared transactions to built-in logical replication via a new option "two_phase" for a subscription. The "two_phase" option was not allowed with the existing streaming option. This commit permits the combination of "streaming" and "two_phase" subscription options. It extends the pgoutput plugin and the subscriber side code to add the prepare API for streaming transactions which will apply the changes accumulated in the spool-file at prepare time. Author: Peter Smith and Ajin Cherian Reviewed-by: Vignesh C, Amit Kapila, Greg Nancarrow Tested-By: Haiying Tang Discussion: https://postgr.es/m/02DA5F5E-CECE-4D9C-8B4B-418077E2C010@postgrespro.ru Discussion: https://postgr.es/m/CAMGcDxeqEpWj3fTXwqhSwBdXd2RS9jzwWscO-XbeCfso6ts3+Q@mail.gmail.com	2021-08-04 07:47:06 +05:30
Alvaro Herrera	ead9e51e82	Advance old-segment horizon properly after slot invalidation When some slots are invalidated due to the max_slot_wal_keep_size limit, the old segment horizon should move forward to stay within the limit. However, in commit `c655077639` we forgot to call KeepLogSeg again to recompute the horizon after invalidating replication slots. In cases where other slots remained, the limits would be recomputed eventually for other reasons, but if all slots were invalidated, the limits would not move at all afterwards. Repair. Backpatch to 13 where the feature was introduced. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reported-by: Marcin Krupowicz <mk@071.ovh> Discussion: https://postgr.es/m/17103-004130e8f27782c9@postgresql.org	2021-07-16 12:07:30 -04:00
Amit Kapila	a8fd13cab0	Add support for prepared transactions to built-in logical replication. To add support for streaming transactions at prepare time into the built-in logical replication, we need to do the following things: * Modify the output plugin (pgoutput) to implement the new two-phase API callbacks, by leveraging the extended replication protocol. * Modify the replication apply worker, to properly handle two-phase transactions by replaying them on prepare. * Add a new SUBSCRIPTION option "two_phase" to allow users to enable two-phase transactions. We enable the two_phase once the initial data sync is over. We however must explicitly disable replication of two-phase transactions during replication slot creation, even if the plugin supports it. We don't need to replicate the changes accumulated during this phase, and moreover, we don't have a replication connection open so we don't know where to send the data anyway. The streaming option is not allowed with this new two_phase option. This can be done as a separate patch. We don't allow to toggle two_phase option of a subscription because it can lead to an inconsistent replica. For the same reason, we don't allow to refresh the publication once the two_phase is enabled for a subscription unless copy_data option is false. Author: Peter Smith, Ajin Cherian and Amit Kapila based on previous work by Nikhil Sontakke and Stas Kelvich Reviewed-by: Amit Kapila, Sawada Masahiko, Vignesh C, Dilip Kumar, Takamichi Osumi, Greg Nancarrow Tested-By: Haiying Tang Discussion: https://postgr.es/m/02DA5F5E-CECE-4D9C-8B4B-418077E2C010@postgrespro.ru Discussion: https://postgr.es/m/CAA4eK1+opiV4aFTmWWUF9h_32=HfPOW9vZASHarT0UA5oBrtGw@mail.gmail.com	2021-07-14 07:33:50 +05:30
Tom Lane	50371df266	Don't try to print data type names in slot_store_error_callback(). The existing code tried to do syscache lookups in an already-failed transaction, which is problematic to say the least. After some consideration of alternatives, the best fix seems to be to just drop type names from the error message altogether. The table and column names seem like sufficient localization. If the user is unsure what types are involved, she can check the local and remote table definitions. Having done that, we can also discard the LogicalRepTypMap hash table, which had no other use. Arguably, LOGICAL_REP_MSG_TYPE replication messages are now obsolete as well; but we should probably keep them in case some other use emerges. (The complexity of removing something from the replication protocol would likely outweigh any savings anyhow.) Masahiko Sawada and Bharath Rupireddy, per complaint from Andres Freund. Back-patch to v10 where this code originated. Discussion: https://postgr.es/m/20210106020229.ne5xnuu6wlondjpe@alap3.anarazel.de	2021-07-02 16:04:54 -04:00
Amit Kapila	4daa140a2f	Fix decoding of speculative aborts. During decoding for speculative inserts, we were relying for cleaning toast hash on confirmation records or next change records. But that could lead to multiple problems (a) memory leak if there is neither a confirmation record nor any other record after toast insertion for a speculative insert in the transaction, (b) error and assertion failures if the next operation is not an insert/update on the same table. The fix is to start queuing spec abort change and clean up toast hash and change record during its processing. Currently, we are queuing the spec aborts for both toast and main table even though we perform cleanup while processing the main table's spec abort record. Later, if we have a way to distinguish between the spec abort record of toast and the main table, we can avoid queuing the change for spec aborts of toast tables. Reported-by: Ashutosh Bapat Author: Dilip Kumar Reviewed-by: Amit Kapila Backpatch-through: 9.6, where it was introduced Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com	2021-06-15 08:28:36 +05:30
Alvaro Herrera	1632ea4368	Return ReplicationSlotAcquire API to its original form Per 96540f80f833; the awkward API introduced by `c655077639` is no longer needed. Author: Andres Freund <andres@anarazel.de> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20210408020913.zzprrlvqyvlt5cyy@alap3.anarazel.de	2021-06-11 15:48:26 -04:00
Peter Eisentraut	b29fa951ec	Add some const decorations One of these functions is new in PostgreSQL 14; might as well start it out right.	2021-06-10 16:21:48 +02:00
Amit Kapila	6f4bdf8152	Fix assertion during streaming of multi-insert toast changes. While decoding the multi-insert WAL we can't clean the toast untill we get the last insert of that WAL record. Now if we stream the changes before we get the last change, the memory for toast chunks won't be released and we expect the txn to have streamed all changes after streaming. This restriction is mainly to ensure the correctness of streamed transactions and it doesn't seem worth uplifting such a restriction just to allow this case because anyway we will stream the transaction once such an insert is complete. Previously we were using two different flags (one for toast tuples and another for speculative inserts) to indicate partial changes. Now instead we replaced both of them with a single flag to indicate partial changes. Reported-by: Pavan Deolasee Author: Dilip Kumar Reviewed-by: Pavan Deolasee, Amit Kapila Discussion: https://postgr.es/m/CABOikdN-_858zojYN-2tNcHiVTw-nhxPwoQS4quExeweQfG1Ug@mail.gmail.com	2021-05-27 07:59:43 +05:30
Alvaro Herrera	db16c65647	Rename the logical replication global "wrconn" The worker.c global wrconn is only meant to be used by logical apply/ tablesync workers, but there are other variables with the same name. To reduce future confusion rename the global from "wrconn" to "LogRepWorkerWalRcvConn". While this is just cosmetic, it seems better to backpatch it all the way back to 10 where this code appeared, to avoid future backpatching issues. Author: Peter Smith <smithpb2250@gmail.com> Discussion: https://postgr.es/m/CAHut+Pu7Jv9L2BOEx_Z0UtJxfDevQSAUW2mJqWU+CtmDrEZVAg@mail.gmail.com	2021-05-12 19:13:54 -04:00
Thomas Munro	c2dc19342e	Revert recovery prefetching feature. This set of commits has some bugs with known fixes, but at this late stage in the release cycle it seems best to revert and resubmit next time, along with some new automated test coverage for this whole area. Commits reverted: `dc88460c`: Doc: Review for "Optionally prefetch referenced data in recovery." `1d257577`: Optionally prefetch referenced data in recovery. `f003d9f8`: Add circular WAL decoding buffer. `323cbe7c`: Remove read_page callback from XLogReader. Remove the new GUC group WAL_RECOVERY recently added by `a55a9847`, as the corresponding section of config.sgml is now reverted. Discussion: https://postgr.es/m/CAOuzzgrn7iKnFRsB4MHp3UisEQAGgZMbk_ViTN4HV4-Ksq8zCg%40mail.gmail.com	2021-05-10 16:06:09 +12:00
Amit Kapila	592f00f8de	Update replication statistics after every stream/spill. Currently, replication slot statistics are updated at prepare, commit, and rollback. Now, if the transaction is interrupted the stats might not get updated. Fixed this by updating replication statistics after every stream/spill. In passing update the docs to change the description of some of the slot stats. Author: Vignesh C, Sawada Masahiko Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de	2021-05-06 11:21:26 +05:30
Amit Kapila	3fa17d3771	Use HTAB for replication slot statistics. Previously, we used to use the array of size max_replication_slots to store stats for replication slots. But that had two problems in the cases where a message for dropping a slot gets lost: 1) the stats for the new slot are not recorded if the array is full and 2) writing beyond the end of the array if the user reduces the max_replication_slots. This commit uses HTAB for replication slot statistics, resolving both problems. Now, pgstat_vacuum_stat() search for all the dead replication slots in stats hashtable and tell the collector to remove them. To avoid showing the stats for the already-dropped slots, pg_stat_replication_slots view searches slot stats by the slot name taken from pg_replication_slots. Also, we send a message for creating a slot at slot creation, initializing the stats. This reduces the possibility that the stats are accumulated into the old slot stats when a message for dropping a slot gets lost. Reported-by: Andres Freund Author: Sawada Masahiko, test case by Vignesh C Reviewed-by: Amit Kapila, Vignesh C, Dilip Kumar Discussion: https://postgr.es/m/20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de	2021-04-27 09:09:11 +05:30
Amit Kapila	f5fc2f5b23	Add information of total data processed to replication slot stats. This adds the statistics about total transactions count and total transaction data logically sent to the decoding output plugin from ReorderBuffer. Users can query the pg_stat_replication_slots view to check these stats. Suggested-by: Andres Freund Author: Vignesh C and Amit Kapila Reviewed-by: Sawada Masahiko, Amit Kapila Discussion: https://postgr.es/m/20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de	2021-04-16 07:34:43 +05:30
Thomas Munro	34399a670a	Remove duplicate typedef. Thinko in commit `323cbe7c`, per complaint from BF animal locust's older GCC compiler.	2021-04-09 00:39:12 +12:00
Thomas Munro	323cbe7c7d	Remove read_page callback from XLogReader. Previously, the XLogReader module would fetch new input data using a callback function. Redesign the interface so that it tells the caller to insert more data with a special return value instead. This API suits later patches for prefetching, encryption and maybe other future projects that would otherwise require continually extending the callback interface. As incidental cleanup work, move global variables readOff, readLen and readSegNo inside XlogReaderState. Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> Author: Heikki Linnakangas <hlinnaka@iki.fi> (parts of earlier version) Reviewed-by: Antonin Houska <ah@cybertec.at> Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Takashi Menjo <takashi.menjo@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/20190418.210257.43726183.horiguchi.kyotaro%40lab.ntt.co.jp	2021-04-08 23:20:42 +12:00
Amit Kapila	ac4645c015	Allow pgoutput to send logical decoding messages. The output plugin accepts a new parameter (messages) that controls if logical decoding messages are written into the replication stream. It is useful for those clients that use pgoutput as an output plugin and needs to process messages that were written by pg_logical_emit_message(). Although logical streaming replication protocol supports logical decoding messages now, logical replication does not use this feature yet. Author: David Pirotte, Euler Taveira Reviewed-by: Euler Taveira, Andres Freund, Ashutosh Bapat, Amit Kapila Discussion: https://postgr.es/m/CADK3HHJ-+9SO7KuRLH=9Wa1rAo60Yreq1GFNkH_kd0=CdaWM+A@mail.gmail.com	2021-04-06 08:40:47 +05:30
Amit Kapila	531737ddad	Refactor function parse_output_parameters. Instead of using multiple parameters in parse_ouput_parameters function signature, use the struct PGOutputData that encapsulates all pgoutput options. It will be useful for future work where we need to add other options in pgoutput. Author: Euler Taveira Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/CADK3HHJ-+9SO7KuRLH=9Wa1rAo60Yreq1GFNkH_kd0=CdaWM+A@mail.gmail.com	2021-04-06 08:26:31 +05:30
Amit Kapila	f64ea6dc5c	Add a xid argument to the filter_prepare callback for output plugins. Along with gid, this provides a different way to identify the transaction. The users that use xid in some way to prepare the transactions can use it to filter prepare transactions. The later commands COMMIT PREPARED or ROLLBACK PREPARED carries both identifiers, providing an output plugin the choice of what to use. Author: Markus Wanner Reviewed-by: Vignesh C, Amit Kapila Discussion: https://postgr.es/m/ee280000-7355-c4dc-e47b-2436e7be959c@enterprisedb.com	2021-03-30 10:34:43 +05:30
Tom Lane	8620a7f6db	Code review for server's handling of "tablespace map" files. While looking at Robert Foggia's report, I noticed a passel of other issues in the same area: * The scheme for backslash-quoting newlines in pathnames is just wrong; it will misbehave if the last ordinary character in a pathname is a backslash. I'm not sure why we're bothering to allow newlines in tablespace paths, but if we're going to do it we should do it without introducing other problems. Hence, backslashes themselves have to be backslashed too. * The author hadn't read the sscanf man page very carefully, because this code would drop any leading whitespace from the path. (I doubt that a tablespace path with leading whitespace could happen in practice; but if we're bothering to allow newlines in the path, it sure seems like leading whitespace is little less implausible.) Using sscanf for the task of finding the first space is overkill anyway. * While I'm not 100% sure what the rationale for escaping both \r and \n is, if the idea is to allow Windows newlines in the file then this code failed, because it'd throw an error if it saw \r followed by \n. * There's no cross-check for an incomplete final line in the map file, which would be a likely apparent symptom of the improper-escaping bug. On the generation end, aside from the escaping issue we have: * If needtblspcmapfile is true then do_pg_start_backup will pass back escaped strings in tablespaceinfo->path values, which no caller wants or is prepared to deal with. I'm not sure if there's a live bug from that, but it looks like there might be (given the dubious assumption that anyone actually has newlines in their tablespace paths). * It's not being very paranoid about the possibility of random stuff in the pg_tblspc directory. IMO we should ignore anything without an OID-like name. The escaping rule change doesn't seem back-patchable: it'll require doubling of backslashes in the tablespace_map file, which is basically a basebackup format change. The odds of that causing trouble are considerably more than the odds of the existing bug causing trouble. The rest of this seems somewhat unlikely to cause problems too, so no back-patch.	2021-03-17 16:18:46 -04:00
Thomas Munro	de829ddf23	Add condition variable for walreceiver shutdown. Use this new CV to wait for walreceiver shutdown without a sleep/poll loop, while also benefiting from standard postmaster death handling. Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com	2021-03-12 19:45:42 +13:00
Amit Kapila	19890a064e	Add option to enable two_phase commits via pg_create_logical_replication_slot. Commit `0aa8a01d04` extends the output plugin API to allow decoding of prepared xacts and allowed the user to enable/disable the two-phase option via pg_logical_slot_get_changes(). This can lead to a problem such that the first time when it gets changes via pg_logical_slot_get_changes() without two_phase option enabled it will not get the prepared even though prepare is after consistent snapshot. Now next time during getting changes, if the two_phase option is enabled it can skip prepare because by that time start decoding point has been moved. So the user will only get commit prepared. Allow to enable/disable this option at the create slot time and default will be false. It will break the existing slots which is fine in a major release. Author: Ajin Cherian Reviewed-by: Amit Kapila and Vignesh C Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com	2021-03-03 07:34:11 +05:30
Amit Kapila	8bdb1332eb	Avoid repeated decoding of prepared transactions after a restart. In commit `a271a1b50e`, we allowed decoding at prepare time and the prepare was decoded again if there is a restart after decoding it. It was done that way because we can't distinguish between the cases where we have not decoded the prepare because it was prior to consistent snapshot or we have decoded it earlier but restarted. To distinguish between these two cases, we have introduced an initial_consistent_point at the slot level which is an LSN at which we found a consistent point at the time of slot creation. This is also the point where we have exported a snapshot for the initial copy. So, prepare transaction prior to this point are sent along with commit prepared. This commit bumps SNAPBUILD_VERSION because of change in SnapBuild. It will break existing slots which is fine in a major release. Author: Ajin Cherian, based on idea by Andres Freund Reviewed-by: Amit Kapila and Vignesh C Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com	2021-03-01 09:11:18 +05:30
Amit Kapila	d9b0767bec	Fix the warnings introduced in commit `ce0fdbfe97`. Author: Amit Kapila Reviewed-by: Tom Lane Discussion: https://postgr.es/m/1610789.1613170207@sss.pgh.pa.us	2021-02-15 07:28:02 +05:30
Amit Kapila	ce0fdbfe97	Allow multiple xacts during table sync in logical replication. For the initial table data synchronization in logical replication, we use a single transaction to copy the entire table and then synchronize the position in the stream with the main apply worker. There are multiple downsides of this approach: (a) We have to perform the entire copy operation again if there is any error (network breakdown, error in the database operation, etc.) while we synchronize the WAL position between tablesync worker and apply worker; this will be onerous especially for large copies, (b) Using a single transaction in the synchronization-phase (where we can receive WAL from multiple transactions) will have the risk of exceeding the CID limit, (c) The slot will hold the WAL till the entire sync is complete because we never commit till the end. This patch solves all the above downsides by allowing multiple transactions during the tablesync phase. The initial copy is done in a single transaction and after that, we commit each transaction as we receive. To allow recovery after any error or crash, we use a permanent slot and origin to track the progress. The slot and origin will be removed once we finish the synchronization of the table. We also remove slot and origin of tablesync workers if the user performs DROP SUBSCRIPTION .. or ALTER SUBSCRIPTION .. REFERESH and some of the table syncs are still not finished. The commands ALTER SUBSCRIPTION ... REFRESH PUBLICATION and ALTER SUBSCRIPTION ... SET PUBLICATION ... with refresh option as true cannot be executed inside a transaction block because they can now drop the slots for which we have no provision to rollback. This will also open up the path for logical replication of 2PC transactions on the subscriber side. Previously, we can't do that because of the requirement of maintaining a single transaction in tablesync workers. Bump catalog version due to change of state in the catalog (pg_subscription_rel). Author: Peter Smith, Amit Kapila, and Takamichi Osumi Reviewed-by: Ajin Cherian, Petr Jelinek, Hou Zhijie and Amit Kapila Discussion: https://postgr.es/m/CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com	2021-02-12 07:41:51 +05:30
Amit Kapila	cd142e032e	Make pg_replication_origin_drop safe against concurrent drops. Currently, we get the origin id from the name and then drop the origin by taking ExclusiveLock on ReplicationOriginRelationId. So, two concurrent sessions can get the id from the name at the same time and then when they try to drop the origin, one of the sessions will get the either "tuple concurrently deleted" or "cache lookup failed for replication origin ..". To prevent this race condition we do the entire operation under lock. This obviates the need for replorigin_drop() API and we have removed it so if any extension authors are using it they need to instead use replorigin_drop_by_name. See it's usage in pg_replication_origin_drop(). Author: Peter Smith Reviewed-by: Amit Kapila, Euler Taveira, Petr Jelinek, and Alvaro Herrera Discussion: https://www.postgresql.org/message-id/CAHut%2BPuW8DWV5fskkMWWMqzt-x7RPcNQOtJQBp6SdwyRghCk7A%40mail.gmail.com	2021-02-10 07:17:09 +05:30
Amit Kapila	a271a1b50e	Allow decoding at prepare time in ReorderBuffer. This patch allows PREPARE-time decoding of two-phase transactions (if the output plugin supports this capability), in which case the transactions are replayed at PREPARE and then committed later when COMMIT PREPARED arrives. Now that we decode the changes before the commit, the concurrent aborts may cause failures when the output plugin consults catalogs (both system and user-defined). We detect such failures with a special sqlerrcode ERRCODE_TRANSACTION_ROLLBACK introduced by commit `7259736a6e` and stop decoding the remaining changes. Then we rollback the changes when rollback prepared is encountered. Author: Ajin Cherian and Amit Kapila based on previous work by Nikhil Sontakke and Stas Kelvich Reviewed-by: Amit Kapila, Peter Smith, Sawada Masahiko, Arseny Sher, and Dilip Kumar Tested-by: Takamichi Osumi Discussion: https://postgr.es/m/02DA5F5E-CECE-4D9C-8B4B-418077E2C010@postgrespro.ru https://postgr.es/m/CAMGcDxeqEpWj3fTXwqhSwBdXd2RS9jzwWscO-XbeCfso6ts3+Q@mail.gmail.com	2021-01-04 08:34:50 +05:30
Bruce Momjian	ca3b37487b	Update copyright for 2021 Backpatch-through: 9.5	2021-01-02 13:06:25 -05:00
Amit Kapila	0aa8a01d04	Extend the output plugin API to allow decoding of prepared xacts. This adds six methods to the output plugin API, adding support for streaming changes of two-phase transactions at prepare time. * begin_prepare * filter_prepare * prepare * commit_prepared * rollback_prepared * stream_prepare Most of this is a simple extension of the existing methods, with the semantic difference that the transaction is not yet committed and maybe aborted later. Until now two-phase transactions were translated into regular transactions on the subscriber, and the GID was not forwarded to it. None of the two-phase commands were communicated to the subscriber. This patch provides the infrastructure for logical decoding plugins to be informed of two-phase commands Like PREPARE TRANSACTION, COMMIT PREPARED and ROLLBACK PREPARED commands with the corresponding GID. This also extends the 'test_decoding' plugin, implementing these new methods. This commit simply adds these new APIs and the upcoming patch to "allow the decoding at prepare time in ReorderBuffer" will use these APIs. Author: Ajin Cherian and Amit Kapila based on previous work by Nikhil Sontakke and Stas Kelvich Reviewed-by: Amit Kapila, Peter Smith, Sawada Masahiko, and Dilip Kumar Discussion: https://postgr.es/m/02DA5F5E-CECE-4D9C-8B4B-418077E2C010@postgrespro.ru https://postgr.es/m/CAMGcDxeqEpWj3fTXwqhSwBdXd2RS9jzwWscO-XbeCfso6ts3+Q@mail.gmail.com	2020-12-30 16:17:26 +05:30
Michael Paquier	87ae9691d2	Move SHA2 routines to a new generic API layer for crypto hashes Two new routines to allocate a hash context and to free it are created, as these become necessary for the goal behind this refactoring: switch the all cryptohash implementations for OpenSSL to use EVP (for FIPS and also because upstream does not recommend the use of low-level cryptohash functions for 20 years). Note that OpenSSL hides the internals of cryptohash contexts since 1.1.0, so it is necessary to leave the allocation to OpenSSL itself, explaining the need for those two new routines. This part is going to require more work to properly track hash contexts with resource owners, but this not introduced here. Still, this refactoring makes the move possible. This reduces the number of routines for all SHA2 implementations from twelve (SHA{224,256,386,512} with init, update and final calls) to five (create, free, init, update and final calls) by incorporating the hash type directly into the hash context data. The new cryptohash routines are moved to a new file, called cryptohash.c for the fallback implementations, with SHA2 specifics becoming a part internal to src/common/. OpenSSL specifics are part of cryptohash_openssl.c. This infrastructure is usable for more hash types, like MD5 or HMAC. Any code paths using the internal SHA2 routines are adapted to report correctly errors, which are most of the changes of this commit. The zones mostly impacted are checksum manifests, libpq and SCRAM. Note that `e21cbb4` was a first attempt to switch SHA2 to EVP, but it lacked the refactoring needed for libpq, as done here. This patch has been tested on Linux and Windows, with and without OpenSSL, and down to 1.0.1, the oldest version supported on HEAD. Author: Michael Paquier Reviewed-by: Daniel Gustafsson Discussion: https://postgr.es/m/20200924025314.GE7405@paquier.xyz	2020-12-02 10:37:20 +09:00
Amit Kapila	9653f24ad8	Fix 'skip-empty-xacts' option in test_decoding for streaming mode. In streaming mode, the transaction can be decoded in multiple streams and those streams can be interleaved with streams of other transactions. So, we can't remember the transaction's write status in the logical decoding context because that might get changed due to some other transactions and lead to wrong answers for 'skip-empty-xacts' option. We decided to keep each transaction's write status in the ReorderBufferTxn to avoid interleaved streams changing the status of some unrelated transactions. Diagnosed-by: Amit Kapila Author: Dilip Kumar Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/CAA4eK1LR7=XNM_TLmpZMFuV8ZQpoxkem--NZJYf8YXmesbvwLA@mail.gmail.com	2020-11-17 12:14:53 +05:30
Michael Paquier	788dd0b839	Fix some typos Author: Daniel Gustafsson Discussion: https://postgr.es/m/C36ADFDF-D09A-4EE5-B186-CB46C3653F4C@yesql.se	2020-11-14 11:43:10 +09:00
Amit Kapila	644f0d7cc9	Use Enum for top level logical replication message types. Logical replication protocol uses a single byte character to identify a message type in logical replication protocol. The code uses string literals for the same. Use Enum so that 1. All the string literals used can be found at a single place. This makes it easy to add more types without the risk of conflicts. 2. It's easy to locate the code handling a given message type. 3. When used with switch statements, it is easy to identify the missing cases using -Wswitch. Author: Ashutosh Bapat Reviewed-by: Kyotaro Horiguchi, Andres Freund, Peter Smith and Amit Kapila Discussion: https://postgr.es/m/CAExHW5uPzQ7L0oAd_ENyvaiYMOPgkrAoJpE+ZY5-obdcVT6NPg@mail.gmail.com	2020-11-02 08:18:18 +05:30
Amit Kapila	8e90ec5580	Track statistics for streaming of changes from ReorderBuffer. This adds the statistics about transactions streamed to the decoding output plugin from ReorderBuffer. Users can query the pg_stat_replication_slots view to check these stats and call pg_stat_reset_replication_slot to reset the stats of a particular slot. Users can pass NULL in pg_stat_reset_replication_slot to reset stats of all the slots. Commit `9868167500` has added the basic infrastructure to capture the stats of slot and this commit extends the statistics collector to track additional information about slots. Bump the catversion as we have added new columns in the catalog entry. Author: Ajin Cherian and Amit Kapila Reviewed-by: Sawada Masahiko and Dilip Kumar Discussion: https://postgr.es/m/CAA4eK1+chpEomLzgSoky-D31qev19AmECNiEAietPQUGEFhtVA@mail.gmail.com	2020-10-29 09:11:51 +05:30
Amit Kapila	d7eb52d718	Execute invalidation messages for each XLOG_XACT_INVALIDATIONS message during logical decoding. Prior to commit `c55040ccd0` we have no way of knowing the invalidations before commit. So, while decoding we use to execute all the invalidations at each command end as we had no way of knowing which invalidations happened before that command. Due to this, transactions involving large amounts of DDLs use to take more time and also lead to high CPU usage. But now we know specific invalidations at each command end so we execute only required invalidations. It has been observed that decoding of a transaction containing truncation of a table with 1000 partitions would be finished in 1s whereas before this patch it used to take 4-5 minutes. Author: Dilip Kumar Reviewed-by: Amit Kapila and Keisuke Kuroda Discussion: https://postgr.es/m/CANDwggKYveEtXjXjqHA6RL3AKSHMsQyfRY6bK+NqhAWJyw8psQ@mail.gmail.com	2020-10-15 08:17:51 +05:30
Amit Kapila	9868167500	Track statistics for spilling of changes from ReorderBuffer. This adds the statistics about transactions spilled to disk from ReorderBuffer. Users can query the pg_stat_replication_slots view to check these stats and call pg_stat_reset_replication_slot to reset the stats of a particular slot. Users can pass NULL in pg_stat_reset_replication_slot to reset stats of all the slots. This commit extends the statistics collector to track this information about slots. Author: Sawada Masahiko and Amit Kapila Reviewed-by: Amit Kapila and Dilip Kumar Discussion: https://postgr.es/m/CA+fd4k5_pPAYRTDrO2PbtTOe0eHQpBvuqmCr8ic39uTNmR49Eg@mail.gmail.com	2020-10-08 09:09:08 +05:30
Amit Kapila	079d0cacf4	Fix the logical replication from HEAD to lower versions. Commit `464824323e` changed the logical replication protocol to allow the streaming of in-progress transactions and used the new version of protocol irrespective of the server version. Use the appropriate version of the protocol based on the server version. Reported-by: Ashutosh Sharma Author: Dilip Kumar Reviewed-by: Ashutosh Sharma and Amit Kapila Discussion: https://postgr.es/m/CAE9k0P=9OpXcNrcU5Gsvd5MZ8GFpiN833vNHzX6Uc=8+h1ft1Q@mail.gmail.com	2020-09-26 10:13:51 +05:30
Tom Lane	3d65b0593c	Fix bogus cache-invalidation logic in logical replication worker. The code recorded cache invalidation events by zeroing the "localreloid" field of affected cache entries. However, it's possible for an inval event to occur even while we have the entry open and locked. So an ill-timed inval could result in "cache lookup failed for relation 0" errors, if the worker's code tried to use the cleared field. We can fix that by creating a separate bool field to record whether the entry needs to be revalidated. (In the back branches, cram the bool into what had been padding space, to avoid an ABI break in the somewhat unlikely event that any extension is looking at this struct.) Also, rearrange the logic in logicalrep_rel_open so that it does the right thing in cases where table_open would fail. We should retry the lookup by name in that case, but we didn't. The real-world impact of this is probably small. In the first place, the error conditions are very low probability, and in the second place, the worker would just exit and get restarted. We only noticed because in a CLOBBER_CACHE_ALWAYS build, the failure can occur repeatedly, preventing the worker from making progress. Nonetheless, it's clearly a bug, and it impedes a useful type of testing; so back-patch to v10 where this code was introduced. Discussion: https://postgr.es/m/1032727.1600096803@sss.pgh.pa.us	2020-09-16 12:07:31 -04:00
Amit Kapila	ddd5f6d260	Remove unused function declaration in logicalproto.h. In the passing, fix a typo in pgoutput.c. Reported-by: Tomas Vondra Author: Tomas Vondra Reviewed-by: Dilip Kumar Discussion: https://postgr.es/m/20200909084353.pncuclpbwlr7vylh@development	2020-09-12 07:47:53 +05:30
Alvaro Herrera	9f1cf97bb5	Print WAL logical message contents in pg_waldump This helps debuggability when looking at WAL streams containing logical messages. Author: Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/CAExHW5sWx49rKmXbg5H1Xc1t+nRv9PaYKQmgw82HPt6vWDVmDg@mail.gmail.com	2020-09-10 19:37:02 -03:00
Amit Kapila	464824323e	Add support for streaming to built-in logical replication. To add support for streaming of in-progress transactions into the built-in logical replication, we need to do three things: * Extend the logical replication protocol, so identify in-progress transactions, and allow adding additional bits of information (e.g. XID of subtransactions). * Modify the output plugin (pgoutput) to implement the new stream API callbacks, by leveraging the extended replication protocol. * Modify the replication apply worker, to properly handle streamed in-progress transaction by spilling the data to disk and then replaying them on commit. We however must explicitly disable streaming replication during replication slot creation, even if the plugin supports it. We don't need to replicate the changes accumulated during this phase, and moreover we don't have a replication connection open so we don't have where to send the data anyway. Author: Tomas Vondra, Dilip Kumar and Amit Kapila Reviewed-by: Amit Kapila, Kuntal Ghosh and Ajin Cherian Tested-by: Neha Sharma, Mahendra Singh Thalor and Ajin Cherian Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com	2020-09-03 07:54:07 +05:30
Peter Eisentraut	a13421c96c	Add some const decorations	2020-08-08 07:31:52 +02:00
Amit Kapila	7259736a6e	Implement streaming mode in ReorderBuffer. Instead of serializing the transaction to disk after reaching the logical_decoding_work_mem limit in memory, we consume the changes we have in memory and invoke stream API methods added by commit `45fdc9738b`. However, sometimes if we have incomplete toast or speculative insert we spill to the disk because we can't generate the complete tuple and stream. And, as soon as we get the complete tuple we stream the transaction including the serialized changes. We can do this incremental processing thanks to having assignments (associating subxact with toplevel xacts) in WAL right away, and thanks to logging the invalidation messages at each command end. These features are added by commits `0bead9af48` and `c55040ccd0` respectively. Now that we can stream in-progress transactions, the concurrent aborts may cause failures when the output plugin consults catalogs (both system and user-defined). We handle such failures by returning ERRCODE_TRANSACTION_ROLLBACK sqlerrcode from system table scan APIs to the backend or WALSender decoding a specific uncommitted transaction. The decoding logic on the receipt of such a sqlerrcode aborts the decoding of the current transaction and continue with the decoding of other transactions. We have ReorderBufferTXN pointer in each ReorderBufferChange by which we know which xact it belongs to. The output plugin can use this to decide which changes to discard in case of stream_abort_cb (e.g. when a subxact gets discarded). We also provide a new option via SQL APIs to fetch the changes being streamed. Author: Dilip Kumar, Tomas Vondra, Amit Kapila, Nikhil Sontakke Reviewed-by: Amit Kapila, Kuntal Ghosh, Ajin Cherian Tested-by: Neha Sharma, Mahendra Singh Thalor and Ajin Cherian Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com	2020-08-08 07:47:06 +05:30
Amit Kapila	45fdc9738b	Extend the logical decoding output plugin API with stream methods. This adds seven methods to the output plugin API, adding support for streaming changes of large in-progress transactions. * stream_start * stream_stop * stream_abort * stream_commit * stream_change * stream_message * stream_truncate Most of this is a simple extension of the existing methods, with the semantic difference that the transaction (or subtransaction) is incomplete and may be aborted later (which is something the regular API does not really need to deal with). This also extends the 'test_decoding' plugin, implementing these new stream methods. The stream_start/start_stop are used to demarcate a chunk of changes streamed for a particular toplevel transaction. This commit simply adds these new APIs and the upcoming patch to "allow the streaming mode in ReorderBuffer" will use these APIs. Author: Tomas Vondra, Dilip Kumar, Amit Kapila Reviewed-by: Amit Kapila Tested-by: Neha Sharma and Mahendra Singh Thalor Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com	2020-07-28 08:09:44 +05:30
Amit Kapila	c55040ccd0	WAL Log invalidations at command end with wal_level=logical. When wal_level=logical, write invalidations at command end into WAL so that decoding can use this information. This patch is required to allow the streaming of in-progress transactions in logical decoding. The actual work to allow streaming will be committed as a separate patch. We still add the invalidations to the cache and write them to WAL at commit time in RecordTransactionCommit(). This uses the existing XLOG_INVALIDATIONS xlog record type, from the RM_STANDBY_ID resource manager (see LogStandbyInvalidations for details). So existing code relying on those invalidations (e.g. redo) does not need to be changed. The invalidations written at command end uses a new xlog record type XLOG_XACT_INVALIDATIONS, from RM_XACT_ID resource manager. See LogLogicalInvalidations for details. These new xlog records are ignored by existing redo procedures, which still rely on the invalidations written to commit records. The invalidations are decoded and accumulated in top-transaction, and then executed during replay. This obviates the need to decode the invalidations as part of a commit record. Bump XLOG_PAGE_MAGIC, since this introduces XLOG_XACT_INVALIDATIONS. Author: Dilip Kumar, Tomas Vondra, Amit Kapila Reviewed-by: Amit Kapila Tested-by: Neha Sharma and Mahendra Singh Thalor Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com	2020-07-23 08:34:48 +05:30
Tom Lane	d5daae47db	Fix construction of updated-columns bitmap in logical replication. Commit `b9c130a1f` failed to apply the publisher-to-subscriber column mapping while checking which columns were updated. Perhaps less significantly, it didn't exclude dropped columns either. This could result in an incorrect updated-columns bitmap and thus wrong decisions about whether to fire column-specific triggers on the subscriber while applying updates. In HEAD (since commit `9de77b545`), it could also result in accesses off the end of the colstatus array, as detected by buildfarm member skink. Fix the logic, and adjust 003_constraints.pl so that the problem is exposed in unpatched code. In HEAD, also add some assertions to check that we don't access off the ends of these newly variable-sized arrays. Back-patch to v10, as `b9c130a1f` was. Discussion: https://postgr.es/m/CAH2-Wz=79hKQ4++c5A060RYbjTHgiYTHz=fw6mptCtgghH2gJA@mail.gmail.com	2020-07-20 13:40:16 -04:00
Tom Lane	9b14280b20	Fix replication/worker_internal.h to compile without other headers. This header hasn't changed recently, so the fact that it now fails headerscheck/cpluspluscheck testing must be due to changes in what it includes. Probably `f21916791` is to blame, but I didn't try to verify that. Discussion: https://postgr.es/m/3699703.1595016554@sss.pgh.pa.us	2020-07-18 14:58:18 -04:00
Tom Lane	9de77b5453	Allow logical replication to transfer data in binary format. This patch adds a "binary" option to CREATE/ALTER SUBSCRIPTION. When that's set, the publisher will send data using the data type's typsend function if any, rather than typoutput. This is generally faster, if slightly less robust. As committed, we won't try to transfer user-defined array or composite types in binary, for fear that type OIDs won't match at the subscriber. This might be changed later, but it seems like fit material for a follow-on patch. Dave Cramer, reviewed by Daniel Gustafsson, Petr Jelinek, and others; adjusted some by me Discussion: https://postgr.es/m/CADK3HH+R3xMn=8t3Ct+uD+qJ1KD=Hbif5NFMJ+d5DkoCzp6Vgw@mail.gmail.com	2020-07-18 12:44:51 -04:00
Amit Kapila	d973747281	Revert "Track statistics for spilling of changes from ReorderBuffer". The stats with this commit was available only for WALSenders, however, users might want to see for backends doing logical decoding via SQL API. Then, users might want to reset and access these stats across server restart which was not possible with the current patch. List of commits reverted: `caa3c4242c` Don't call elog() while holding spinlock. `e641b2a995` Doc: Update the documentation for spilled transaction statistics. `5883f5fe27` Fix unportable printf format introduced in commit `9290ad198`. `9290ad198b` Track statistics for spilling of changes from ReorderBuffer. Additionaly, remove the release notes entry for this feature. Backpatch-through: 13, where it was introduced Discussion: https://postgr.es/m/CA+fd4k5_pPAYRTDrO2PbtTOe0eHQpBvuqmCr8ic39uTNmR49Eg@mail.gmail.com	2020-07-13 08:53:23 +05:30
Michael Paquier	641dd167a3	Move description of libpqwalreceiver hooks out of the replication's README src/backend/replication/README includes since `32bc08b` a basic description of the WAL receiver hooks available in walreceiver.h for a module like libpqwalreceiver, but the README has never been updated to reflect changes done to the hooks, so the contents of the README have rotten with the time. This commit moves the description from the README to walreceiver.h, where it will be hard to miss that a description update or addition is needed depending on the modifications done to the hooks. Each hook now includes a description of what it does in walreceiver.h, and the replication's README mentions walreceiver.h. Thanks also to Amit Kapila for the discussion. Author: Michael Paquier Reviewed-by: Peter Eisentraut Discussion: https://postgr.es/m/20200502024606.GA471944@paquier.xyz	2020-07-02 13:57:03 +09:00
Alvaro Herrera	0188bb8253	Save slot's restart_lsn when invalidated due to size We put it aside as invalidated_at, which let us show "lost" in pg_replication slot. Prior to this change, the state value was reported as NULL. Backpatch to 13. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20200617.101707.1735599255100002667.horikyota.ntt@gmail.com Discussion: https://postgr.es/m/20200407.120905.1507671100168805403.horikyota.ntt@gmail.com	2020-06-24 14:15:17 -04:00
Robert Haas	1fa092913d	Don't export basebackup.c's sendTablespace(). Commit `72d422a522` made xlog.c call sendTablespace() with the 'sizeonly' argument set to true, which required basebackup.c to export sendTablespace(). However, that's kind of ugly, so instead defer the call to sendTablespace() until basebackup.c regains control. That way, it can still be a static function. Patch by me, reviewed by Amit Kapila and Kyotaro Horiguchi. Discussion: http://postgr.es/m/CA+TgmoYq+59SJ2zBbP891ngWPA9fymOqntqYcweSDYXS2a620A@mail.gmail.com	2020-06-17 10:57:34 -04:00
Peter Eisentraut	0fd2a79a63	Spelling adjustments	2020-06-07 15:06:51 +02:00
Tom Lane	f88bd3139f	Don't call palloc() while holding a spinlock, either. Fix some more violations of the "only straight-line code inside a spinlock" rule. These are hazardous not only because they risk holding the lock for an excessively long time, but because it's possible for palloc to throw elog(ERROR), leaving a stuck spinlock behind. copy_replication_slot() had two separate places that did pallocs while holding a spinlock. We can make the code simpler and safer by copying the whole ReplicationSlot struct into a local variable while holding the spinlock, and then referencing that copy. (While that's arguably more cycles than we really need to spend holding the lock, the struct isn't all that big, and this way seems far more maintainable than copying fields piecemeal. Anyway this is surely much cheaper than a palloc.) That bug goes back to v12. InvalidateObsoleteReplicationSlots() not only did a palloc while holding a spinlock, but for extra sloppiness then leaked the memory --- probably for the lifetime of the checkpointer process, though I didn't try to verify that. Fortunately that silliness is new in HEAD. pg_get_replication_slots() had a cosmetic violation of the rule, in that it only assumed it's safe to call namecpy() while holding a spinlock. Still, that's a hazard waiting to bite somebody, and there were some other cosmetic coding-rule violations in the same function, so clean it up. I back-patched this as far as v10; the code exists before that but it looks different, and this didn't seem important enough to adapt the patch further back. Discussion: https://postgr.es/m/20200602.161518.1399689010416646074.horikyota.ntt@gmail.com	2020-06-03 12:36:23 -04:00
Tom Lane	5cbfce562f	Initial pgindent and pgperltidy run for v13. Includes some manual cleanup of places that pgindent messed up, most of which weren't per project style anyway. Notably, it seems some people didn't absorb the style rules of commit `c9d297751`, because there were a bunch of new occurrences of function calls with a newline just after the left paren, all with faulty expectations about how the rest of the call would get indented.	2020-05-14 13:06:50 -04:00
Alvaro Herrera	b060dbe000	Rework XLogReader callback system Code review for `0dc8ead463`, prompted by a bug closed by `91c40548d5`. XLogReader's system for opening and closing segments had gotten too complicated, with callbacks being passed at both the XLogReaderAllocate level (read_page) as well as at the WALRead level (segment_open). This was confusing and hard to follow, so restructure things so that these callbacks are passed together at XLogReaderAllocate time, and add another callback to the set (segment_close) to make it a coherent whole. Also, ensure XLogReaderState is an argument to all the callbacks, so that they can grab at the ->private data if necessary. Document the whole arrangement more clearly. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/20200422175754.GA19858@alvherre.pgsql	2020-05-08 15:40:11 -04:00
Robert Haas	ab7646ff92	Also rename 'struct manifest_info'. The previous commit renamed the struct's typedef, but not the struct name itself.	2020-04-23 09:47:50 -04:00
Robert Haas	3989dbdf12	Rename exposed identifiers to say "backup manifest". Function names declared "extern" now use BackupManifest in the name rather than just Manifest, and data types use backup_manifest rather than just manifest. Per note from Michael Paquier. Discussion: http://postgr.es/m/20200418125713.GG350229@paquier.xyz	2020-04-23 08:44:06 -04:00
Tom Lane	2117c3cb3d	Fix duplicate typedef from commit `0d8c9c121`. Older gcc versions don't like duplicate typedefs, so get rid of that in favor of doing it like we do it elsewhere, ie just use a "struct" declaration when trying to avoid importing a whole header file. Also, there seems no reason to include stringinfo.h here at all, so get rid of that addition too. Discussion: https://postgr.es/m/27239.1587415696@sss.pgh.pa.us	2020-04-21 11:13:05 -04:00
Robert Haas	079ac29d4d	Move the server's backup manifest code to a separate file. basebackup.c is already a pretty big and complicated file, so it makes more sense to keep the backup manifest support routines in a separate file, for clarity and ease of maintenance. Discussion: http://postgr.es/m/CA+TgmoavRak5OdP76P8eJExDYhPEKWjMb0sxW7dF01dWFgE=uA@mail.gmail.com	2020-04-20 14:38:15 -04:00
Tom Lane	f332241a60	Fix race conditions in synchronous standby management. We have repeatedly seen the buildfarm reach the Assert(false) in SyncRepGetSyncStandbysPriority. This apparently is due to failing to consider the possibility that the sync_standby_priority values in shared memory might be inconsistent; but they will be whenever only some of the walsenders have updated their values after a change in the synchronous_standby_names setting. That function is vastly too complex for what it does, anyway, so rewriting it seems better than trying to apply a band-aid fix. Furthermore, the API of SyncRepGetSyncStandbys is broken by design: it returns a list of WalSnd array indexes, but there is nothing guaranteeing that the contents of the WalSnd array remain stable. Thus, if some walsender exits and then a new walsender process takes over that WalSnd array slot, a caller might make use of WAL position data that it should not, potentially leading to incorrect decisions about whether to release transactions that are waiting for synchronous commit. To fix, replace SyncRepGetSyncStandbys with a new function SyncRepGetCandidateStandbys that copies all the required data from shared memory while holding the relevant mutexes. If the associated walsender process then exits, this data is still safe to make release decisions with, since we know that that much WAL was sent to a valid standby server. This incidentally means that we no longer need to treat sync_standby_priority as protected by the SyncRepLock rather than the per-walsender mutex. SyncRepGetSyncStandbys is no longer used by the core code, so remove it entirely in HEAD. However, it seems possible that external code is relying on that function, so do not remove it from the back branches. Instead, just remove the known-incorrect Assert. When the bug occurs, the function will return a too-short list, which callers should treat as meaning there are not enough sync standbys, which seems like a reasonably safe fallback until the inconsistent state is resolved. Moreover it's bug-compatible with what has been happening in non-assert builds. We cannot do anything about the walsender-replacement race condition without an API/ABI break. The bogus assertion exists back to 9.6, but 9.6 is sufficiently different from the later branches that the patch doesn't apply at all. I chose to just remove the bogus assertion in 9.6, feeling that the probability of a bad outcome from the walsender-replacement race condition is too low to justify rewriting the whole patch for 9.6. Discussion: https://postgr.es/m/21519.1585272409@sss.pgh.pa.us	2020-04-18 14:02:44 -04:00
Thomas Munro	d140f2f3e2	Rationalize GetWalRcv{Write,Flush}RecPtr(). GetWalRcvWriteRecPtr() previously reported the latest flushed location. Adopt the conventional terminology used elsewhere in the tree by renaming it to GetWalRcvFlushRecPtr(), and likewise for some related variables that used the term "received". Add a new definition of GetWalRcvWriteRecPtr(), which returns the latest written value. This will allow later patches to use the value for non-data-integrity purposes, without having to wait for the flush pointer to advance. Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com	2020-04-08 23:45:09 +12:00
Alvaro Herrera	c655077639	Allow users to limit storage reserved by replication slots Replication slots are useful to retain data that may be needed by a replication system. But experience has shown that allowing them to retain excessive data can lead to the primary failing because of running out of space. This new feature allows the user to configure a maximum amount of space to be reserved using the new option max_slot_wal_keep_size. Slots that overrun that space are invalidated at checkpoint time, enabling the storage to be released. Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20170228.122736.123383594.horiguchi.kyotaro@lab.ntt.co.jp	2020-04-07 18:35:00 -04:00
Peter Eisentraut	f1ac27bfda	Add logical replication support to replicate into partitioned tables Mainly, this adds support code in logical/worker.c for applying replicated operations whose target is a partitioned table to its relevant partitions. Author: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Reviewed-by: Petr Jelinek <petr@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com	2020-04-06 15:15:52 +02:00
Robert Haas	0d8c9c1210	Generate backup manifests for base backups, and validate them. A manifest is a JSON document which includes (1) the file name, size, last modification time, and an optional checksum for each file backed up, (2) timelines and LSNs for whatever WAL will need to be replayed to make the backup consistent, and (3) a checksum for the manifest itself. By default, we use CRC-32C when checksumming data files, because we are trying to detect corruption and user error, not foil an adversary. However, pg_basebackup and the server-side BASE_BACKUP command now have options to select a different algorithm, so users wanting a cryptographic hash function can select SHA-224, SHA-256, SHA-384, or SHA-512. Users not wanting file checksums at all can disable them, or disable generating of the backup manifest altogether. Using a cryptographic hash function in place of CRC-32C consumes significantly more CPU cycles, which may slow down backups in some cases. A new tool called pg_validatebackup can validate a backup against the manifest. If no checksums are present, it can still check that the right files exist and that they have the expected sizes. If checksums are present, it can also verify that each file has the expected checksum. Additionally, it calls pg_waldump to verify that the expected WAL files are present and parseable. Only plain format backups can be validated directly, but tar format backups can be validated after extracting them. Robert Haas, with help, ideas, review, and testing from David Steele, Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan Chalke, Amit Kapila, Andres Freund, and Noah Misch. Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com	2020-04-03 15:05:59 -04:00
Alvaro Herrera	092c6936de	Set wal_receiver_create_temp_slot PGC_POSTMASTER Commit `3297308278` gave walreceiver the ability to create and use a temporary replication slot, and made it controllable by a GUC (enabled by default) that can be changed with SIGHUP. That's useful but has two problems: one, it's possible to cause the origin server to fill its disk if the slot doesn't advance in time; and also there's a disconnect between state passed down via the startup process and GUCs that walreceiver reads directly. We handle the first problem by setting the option to disabled by default. If the user enables it, its on their head to make sure that disk doesn't fill up. We handle the second problem by passing the flag via startup rather than having walreceiver acquire it directly, and making it PGC_POSTMASTER (which ensures a walreceiver always has the fresh value). A future commit can relax this (to PGC_SIGHUP again) by having the startup process signal walreceiver to shutdown whenever the value changes. Author: Sergei Kornilov <sk@zsrv.org> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20200122055510.GH174860@paquier.xyz	2020-03-27 16:20:33 -03:00
Peter Eisentraut	c314c147c0	Prepare to support non-tables in publications This by itself doesn't change any functionality but prepares the way for having relations other than base tables in publications. Make arrangements for COPY handling the initial table sync. For non-tables we have to use COPY (SELECT ...) instead of directly copying from the table, but then we have to take care to omit generated columns from the column list. Also, remove a hardcoded reference to relkind = 'r' and rely on the publisher to send only what it can actually publish, which will be correct even in future cross-version scenarios. Reviewed-by: Amit Langote <amitlangote09@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com	2020-03-19 08:25:07 +01:00
Alvaro Herrera	5d0c2d5eba	Remove logical_read_local_xlog_page It devolved into a content-less wrapper over read_local_xlog_page, with nothing to add, plus it's easily confused with walsender's logical_read_xlog_page. There doesn't seem to be any reason for it to stay. src/include/replication/logicalfuncs.h becomes empty, so remove it too. The prototypes it initially had were absorbed by generated fmgrprotos.h. Discussion: https://postgr.es/m/20191115214102.GA15616@alvherre.pgsql	2020-03-17 18:18:01 -03:00
Alvaro Herrera	15cac3a523	Set ReorderBufferTXN->final_lsn more eagerly ... specifically, set it incrementally as each individual change is spilled down to disk. This way, it is set correctly when the transaction disappears without trace, ie. without leaving an XACT_ABORT wal record. (This happens when the server crashes midway through a transaction.) Failing to have final_lsn prevents ReorderBufferRestoreCleanup() from working, since it needs the final_lsn in order to know the endpoint of its iteration through spilled files. Commit `df9f682c7b` already tried to fix the problem, but it didn't set the final_lsn in all cases. Revert that, since it's no longer needed. Author: Vignesh C Reviewed-by: Amit Kapila, Dilip Kumar Discussion: https://postgr.es/m/CALDaNm2CLk+K9JDwjYST0sPbGg5AQdvhUt0jbKyX_HdAE0jk3A@mail.gmail.com	2020-01-17 18:00:39 -03:00
Peter Eisentraut	3297308278	walreceiver uses a temporary replication slot by default If no permanent replication slot is configured using primary_slot_name, the walreceiver now creates and uses a temporary replication slot. A new setting wal_receiver_create_temp_slot can be used to disable this behavior, for example, if the remote instance is out of replication slots. Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/CA%2Bfd4k4dM0iEPLxyVyme2RAFsn8SUgrNtBJOu81YqTY4V%2BnqZA%40mail.gmail.com	2020-01-14 14:40:41 +01:00
Peter Eisentraut	ee4ac46c8e	Expose PQbackendPID() through walreceiver API This will be used by a subsequent patch. Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/CA%2Bfd4k4dM0iEPLxyVyme2RAFsn8SUgrNtBJOu81YqTY4V%2BnqZA%40mail.gmail.com	2020-01-14 14:40:41 +01:00
Alvaro Herrera	a7b6ab5db1	Clean up representation of flags in struct ReorderBufferTXN This simplifies addition of further flags. Author: Nikhil Sontakke Discussion: https://postgr.es/m/CAMGcDxeViP+R-OL7QhzUV9eKCVjURobuY1Zijik4Ay_Ddwo4Cg@mail.gmail.com	2020-01-10 17:46:57 -03:00
Bruce Momjian	7559d8ebfa	Update copyrights for 2020 Backpatch-through: update all files in master, backpatch legal files through 9.4	2020-01-01 12:21:45 -05:00
Michael Paquier	e1551f96e6	Refactor attribute mappings used in logical tuple conversion Tuple conversion support in tupconvert.c is able to convert rowtypes between two relations, inner and outer, which are logically equivalent but have a different ordering or even dropped columns (used mainly for inheritance tree and partitions). This makes use of attribute mappings, which are simple arrays made of AttrNumber elements with a length matching the number of attributes of the outer relation. The length of the attribute mapping has been treated as completely independent of the mapping itself until now, making it easy to pass down an incorrect mapping length. This commit refactors the code related to attribute mappings and moves it into an independent facility called attmap.c, extracted from tupconvert.c. This merges the attribute mapping with its length, avoiding to try to guess what is the length of a mapping to use as this is computed once, when the map is built. This will avoid mistakes like what has been fixed in `dc816e58`, which has used an incorrect mapping length by matching it with the number of attributes of an inner relation (a child partition) instead of an outer relation (a partitioned table). Author: Michael Paquier Reviewed-by: Amit Langote Discussion: https://postgr.es/m/20191121042556.GD153437@paquier.xyz	2019-12-18 16:23:02 +09:00
Amit Kapila	e0487223ec	Make the order of the header file includes consistent. Similar to commits `14aec03502`, `7e735035f2` and `dddf4cdc33`, this commit makes the order of header file inclusion consistent in more places. Author: Vignesh C Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com	2019-11-25 08:08:57 +05:30
Amit Kapila	9290ad198b	Track statistics for spilling of changes from ReorderBuffer. This adds the statistics about transactions spilled to disk from ReorderBuffer. Users can query the pg_stat_replication view to check these stats. Author: Tomas Vondra, with bug-fixes and minor changes by Dilip Kumar Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com	2019-11-21 08:06:51 +05:30
Amit Kapila	cec2edfa78	Add logical_decoding_work_mem to limit ReorderBuffer memory usage. Instead of deciding to serialize a transaction merely based on the number of changes in that xact (toplevel or subxact), this makes the decisions based on amount of memory consumed by the changes. The memory limit is defined by a new logical_decoding_work_mem GUC, so for example we can do this SET logical_decoding_work_mem = '128kB' to reduce the memory usage of walsenders or set the higher value to reduce disk writes. The minimum value is 64kB. When adding a change to a transaction, we account for the size in two places. Firstly, in the ReorderBuffer, which is then used to decide if we reached the total memory limit. And secondly in the transaction the change belongs to, so that we can pick the largest transaction to evict (and serialize to disk). We still use max_changes_in_memory when loading changes serialized to disk. The trouble is we can't use the memory limit directly as there might be multiple subxact serialized, we need to read all of them but we don't know how many are there (and which subxact to read first). We do not serialize the ReorderBufferTXN entries, so if there is a transaction with many subxacts, most memory may be in this type of objects. Those records are not included in the memory accounting. We also do not account for INTERNAL_TUPLECID changes, which are kept in a separate list and not evicted from memory. Transactions with many CTID changes may consume significant amounts of memory, but we can't really do much about that. The current eviction algorithm is very simple - the transaction is picked merely by size, while it might be useful to also consider age (LSN) of the changes for example. With the new Generational memory allocator, evicting the oldest changes would make it more likely the memory gets actually pfreed. The logical_decoding_work_mem can be set in postgresql.conf, in which case it serves as the default for all publishers on that instance. Author: Tomas Vondra, with changes by Dilip Kumar and Amit Kapila Reviewed-by: Dilip Kumar and Amit Kapila Tested-By: Vignesh C Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com	2019-11-19 07:32:36 +05:30

1 2 3 4 5 ...

454 commits