postgresql/src/include
Tomas Vondra 4b8af2bf8c Fix ordering of XIDs in ProcArrayApplyRecoveryInfo
Commit 8431e296ea reworked ProcArrayApplyRecoveryInfo to sort XIDs
before adding them to KnownAssignedXids. But the XIDs are sorted using
xidComparator, which compares the XIDs simply as uint32 values, not
logically. KnownAssignedXidsAdd() however expects XIDs in logical order,
and calls TransactionIdFollowsOrEquals() to enforce that. If there are
XIDs for which the two orderings disagree, an error is raised and the
recovery fails/restarts.

Hitting this issue is fairly easy - you just need two transactions, one
started before the 4B limit (e.g. XID 4294967290), the other sometime
after it (e.g. XID 1000). Logically, 4294967290 precedes 1000, but
xidComparator sorts 1000 first, so we try to add them in the opposite
order. This makes KnownAssignedXidsAdd() fail with an error like this:

  ERROR: out-of-order XID insertion in KnownAssignedXids
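
The disagreement between the two orderings can be reproduced outside the
server. The following is a minimal standalone sketch, not the PostgreSQL
source: raw_xid_cmp and logical_xid_precedes are hypothetical names, and
the logical check assumes the usual modulo-2^32 rule for normal XIDs (the
signed 32-bit difference decides which one comes first).

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Assumption for this sketch: normal XIDs start at 3. */
#define FIRST_NORMAL_XID ((TransactionId) 3)

/* Raw comparison: treats XIDs as plain unsigned 32-bit integers. */
static int
raw_xid_cmp(TransactionId a, TransactionId b)
{
    return (a > b) - (a < b);
}

/*
 * Logical comparison (hypothetical helper): a precedes b when the
 * difference, reinterpreted as a signed 32-bit value, is negative.
 * Only meaningful for normal XIDs less than 2^31 apart.
 */
static int
logical_xid_precedes(TransactionId a, TransactionId b)
{
    int32_t diff;

    assert(a >= FIRST_NORMAL_XID && b >= FIRST_NORMAL_XID);
    diff = (int32_t) (a - b);
    return diff < 0;
}
```

For the XIDs from the example, the raw comparison places 1000 before
4294967290, while the logical check reports that 4294967290 precedes 1000.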

This only happens during replica startup, while processing RUNNING_XACTS
records to build the snapshot. Once we reach STANDBY_SNAPSHOT_READY, we
skip these records. So this does not affect already running replicas,
but if you restart (or create) a replica while there are transactions
with XIDs for which the two orderings disagree, you may hit this.

Long-running transactions and frequent replica restarts increase the
likelihood of hitting this issue. Once the replica gets into this state,
it can't be started (even if the old transactions are terminated).

Fixed by sorting the XIDs logically - this is fine because we're dealing
with normal XIDs (they are XIDs assigned to backends) and from the
same wraparound epoch (otherwise the backends could not be running at
the same time on the primary node). So the logical ordering has no
problems with the triangle inequality here; in the general case it does,
which is why xidComparator compares raw values.
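
A logical sort of this kind can be sketched with qsort(). This is an
illustration under the assumptions stated in this message (normal XIDs
only, all within one wraparound epoch), not the committed patch:
xid_logical_cmp and sort_xids_logically are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

/* Assumption for this sketch: normal XIDs start at 3. */
#define FIRST_NORMAL_XID ((TransactionId) 3)

/*
 * qsort comparator ordering XIDs logically (modulo 2^32).
 * Valid only when all inputs are normal XIDs less than 2^31 apart;
 * otherwise the ordering is not transitive and qsort may misbehave.
 */
static int
xid_logical_cmp(const void *p1, const void *p2)
{
    TransactionId x1 = *(const TransactionId *) p1;
    TransactionId x2 = *(const TransactionId *) p2;
    int32_t     diff;

    assert(x1 >= FIRST_NORMAL_XID && x2 >= FIRST_NORMAL_XID);

    /* The signed difference is negative when x1 logically precedes x2. */
    diff = (int32_t) (x1 - x2);
    return (diff > 0) - (diff < 0);
}

/* Sort an array of XIDs into logical order, oldest first. */
static void
sort_xids_logically(TransactionId *xids, size_t n)
{
    qsort(xids, n, sizeof(TransactionId), xid_logical_cmp);
}
```

With this comparator, XIDs assigned just before the 4B limit sort ahead of
those assigned just after it, which is the order in which they would then
be added to KnownAssignedXids.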

Investigation and root cause analysis by Abhijit Menon-Sen. Patch by me.

This issue is present in all releases since 9.4; however, releases up to
9.6 are already EOL, so backpatch to 10 only.

Reviewed-by: Abhijit Menon-Sen
Reviewed-by: Alvaro Herrera
Backpatch-through: 10
Discussion: https://postgr.es/m/36b8a501-5d73-277c-4972-f58a4dce088a%40enterprisedb.com
2022-01-27 20:17:36 +01:00
..
access Fix CREATE INDEX CONCURRENTLY for the newest prepared transactions. 2021-10-23 18:36:42 -07:00
bootstrap Phase 2 pgindent run for v12. 2019-05-22 13:04:48 -04:00
catalog Correct comment and some documentation about REPLICA_IDENTITY_INDEX 2021-12-22 16:38:49 +09:00
commands Fix toast rewrites in logical decoding. 2021-08-25 09:32:56 +05:30
common Move connect.h from fe_utils to src/include/common. 2020-08-10 09:22:58 -07:00
datatype Update copyright for 2019 2019-01-02 12:44:25 -05:00
executor Restore the portal-level snapshot after procedure COMMIT/ROLLBACK. 2021-05-21 14:03:53 -04:00
fe_utils Fix parallel restore of FKs to partitioned tables 2019-10-17 09:58:01 +02:00
foreign Phase 2 pgindent run for v12. 2019-05-22 13:04:48 -04:00
jit jit: Do not try to shut down LLVM state in case of LLVM triggered errors. 2021-09-13 18:26:18 -07:00
lib Fix incorrect hash table resizing code in simplehash.h 2021-08-13 16:43:13 +12:00
libpq Harden be-gssapi-common.h for headerscheck 2021-11-26 17:00:29 -03:00
mb Phase 2 pgindent run for v12. 2019-05-22 13:04:48 -04:00
nodes Fix index-only scan plans, take 2. 2022-01-03 15:42:27 -05:00
optimizer Fix pull_varnos' miscomputation of relids set for a PlaceHolderVar. 2021-01-21 15:37:23 -05:00
parser Calculate extraUpdatedCols in query rewriter, not parser. 2020-10-28 13:47:02 -04:00
partitioning Fix hash partition pruning with asymmetric partition sets. 2021-01-28 13:41:55 -05:00
port Remove unnecessary declaration in win32_port.h 2021-06-08 13:40:10 +09:00
portability Update copyright for 2019 2019-01-02 12:44:25 -05:00
postmaster Fix race condition between shutdown and unstarted background workers. 2020-12-24 17:00:43 -05:00
regex Phase 2 pgindent run for v12. 2019-05-22 13:04:48 -04:00
replication Fix limitations on what SQL commands can be issued to a walsender. 2022-01-24 15:33:34 -05:00
rewrite Calculate extraUpdatedCols in query rewriter, not parser. 2020-10-28 13:47:02 -04:00
snowball Update copyright for 2019 2019-01-02 12:44:25 -05:00
statistics Fix choose_best_statistics to check clauses individually 2019-11-28 22:26:25 +01:00
storage Fix CREATE INDEX CONCURRENTLY for the newest prepared transactions. 2021-10-23 18:36:42 -07:00
tcop Restore the portal-level snapshot after procedure COMMIT/ROLLBACK. 2021-05-21 14:03:53 -04:00
tsearch Don't leak compiled regex(es) when an ispell cache entry is dropped. 2021-03-18 21:44:43 -04:00
utils Fix ordering of XIDs in ProcArrayApplyRecoveryInfo 2022-01-27 20:17:36 +01:00
.gitignore Refactor dlopen() support 2018-09-06 11:33:04 +02:00
c.h pg_attribute_no_sanitize_alignment() macro 2021-02-13 17:49:08 -05:00
fmgr.h Fix minor violations of FunctionCallInvoke usage protocol. 2020-04-21 14:23:58 -04:00
funcapi.h Avoid holding a directory FD open across assorted SRF calls. 2020-03-16 21:05:53 -04:00
getaddrinfo.h Phase 2 pgindent run for v12. 2019-05-22 13:04:48 -04:00
getopt_long.h Phase 2 pgindent run for v12. 2019-05-22 13:04:48 -04:00
Makefile Get rid of jsonpath_gram.h and jsonpath_scanner.h 2019-03-20 11:13:34 +03:00
miscadmin.h Refactor CHECK_FOR_INTERRUPTS() to add flexibility. 2021-05-14 12:54:26 -04:00
pg_config.h.in Update configure's probe for libldap to work with OpenLDAP 2.5. 2021-07-09 12:38:55 -04:00
pg_config.h.win32 Stamp 12.9. 2021-11-08 17:02:19 -05:00
pg_config_ext.h.in Autoconfiscate selection of 64-bit int type for 64-bit large object API. 2012-10-07 21:52:43 -04:00
pg_config_ext.h.win32 Autoconfiscate selection of 64-bit int type for 64-bit large object API. 2012-10-07 21:52:43 -04:00
pg_config_manual.h Update copyright for 2019 2019-01-02 12:44:25 -05:00
pg_getopt.h Use our own getopt() on OpenBSD. 2019-01-18 15:06:26 -05:00
pg_trace.h Update copyright for 2019 2019-01-02 12:44:25 -05:00
pgstat.h Add GUC variables for stat tracking and timeout as PGDLLIMPORT 2020-01-21 13:46:55 +09:00
pgtar.h Phase 2 pgindent run for v12. 2019-05-22 13:04:48 -04:00
pgtime.h Phase 2 pgindent run for v12. 2019-05-22 13:04:48 -04:00
port.h Add fallback implementation for setenv() 2021-06-01 09:27:31 +09:00
postgres.h Change function call information to be variable length. 2019-01-26 14:17:52 -08:00
postgres_ext.h Phase 2 of pgindent updates. 2017-06-21 15:19:25 -04:00
postgres_fe.h Update copyright for 2019 2019-01-02 12:44:25 -05:00
rusagestub.h Update copyright for 2019 2019-01-02 12:44:25 -05:00
windowapi.h Phase 2 pgindent run for v12. 2019-05-22 13:04:48 -04:00