postgresql/src
Michael Paquier e6767c0ed1 Fix header check for continuation records where standbys could be stuck
XLogPageRead() checks immediately for an invalid WAL record header on a
standby, to be able to handle the case of continuation records that need
to be read across two different sources.  As written, the check was too
generic, applying to any target LSN.  Based on an analysis by Kyotaro
Horiguchi, what really matters is to make sure that the page header is
checked when attempting to read an LSN at the boundary of a segment, to
handle the case of a continuation record that spans multiple pages
across multiple segments, because WAL receivers request WAL from the
beginning of a segment when they are spawned.  This fix has been
proposed by Kyotaro Horiguchi.
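
The narrowed condition described above can be sketched in C as follows.
This is a minimal illustration, not the code of the actual patch: the
type wal_lsn and the functions lsn_is_segment_start() and
needs_immediate_header_check() are hypothetical names introduced here,
and the segment size is passed in explicitly as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit WAL position (not PostgreSQL's own type). */
typedef uint64_t wal_lsn;

/* Returns true when lsn is the first byte of a WAL segment of seg_size bytes. */
static bool
lsn_is_segment_start(wal_lsn lsn, uint64_t seg_size)
{
	assert(seg_size > 0);
	return (lsn % seg_size) == 0;
}

/*
 * Hypothetical helper: decide whether the page header should be validated
 * immediately.  Per the commit message, this only matters on a standby and
 * only when the page to read sits at a segment boundary, rather than for
 * any target LSN.
 */
static bool
needs_immediate_header_check(bool standby_mode, wal_lsn target_page,
							 uint64_t seg_size)
{
	return standby_mode && lsn_is_segment_start(target_page, seg_size);
}
```

With a 16MB segment size, a page at 0x3000000 starts a segment and would
be checked, while a page at 0x3002000 lies inside a segment and would not.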

The overly generic check could cause standbys to loop infinitely when dealing with a
continuation record during a timeline jump, in the case where the
contents of the record in the follow-up page are invalid.

Some regression tests are added to check such scenarios, able to
reproduce the original problem.  In the test, the contents of a
continuation record are overwritten with junk zeros on its follow-up
page, and replayed on standbys.  This is inspired by 039_end_of_wal.pl,
and is enough to show that standbys should not get stuck on
promotion.  Without the fix, the test would fail with a timeout.  The
test to reproduce the problem has been written by Alexander Kukushkin.

The original check was introduced in 0668719801, for a similar
problem.

Author: Kyotaro Horiguchi, Alexander Kukushkin
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CAFh8B=mozC+e1wGJq0H=0O65goZju+6ab5AU7DEWCSUA2OtwDg@mail.gmail.com
Backpatch-through: 13
2025-01-20 09:30:33 +09:00
backend Fix header check for continuation records where standbys could be stuck 2025-01-20 09:30:33 +09:00
bin Fix off_t overflow in pg_basebackup on Windows. 2025-01-09 16:05:01 +13:00
common meson: Export all libcommon functions in Windows builds 2024-12-25 18:14:26 +02:00
fe_utils Prevent mis-encoding of "trailing junk after numeric literal" errors. 2024-09-05 12:42:33 -04:00
include Avoid symbol collisions between pqsignal.c and legacy-pqsignal.c. 2025-01-14 18:50:24 -05:00
interfaces Avoid symbol collisions between pqsignal.c and legacy-pqsignal.c. 2025-01-14 18:50:24 -05:00
makefiles Optimize pg_popcount() with AVX-512 instructions. 2024-04-06 21:56:23 -05:00
pl Repair memory leaks in plpython. 2025-01-11 11:45:56 -05:00
port Avoid symbol collisions between pqsignal.c and legacy-pqsignal.c. 2025-01-14 18:50:24 -05:00
template Remove AIX support 2024-02-28 15:17:23 +04:00
test Fix header check for continuation records where standbys could be stuck 2025-01-20 09:30:33 +09:00
timezone Update time zone data files to tzdata release 2024b. 2024-10-29 11:49:50 -04:00
tools Fix catcache invalidation of a list entry that's being built 2025-01-14 14:35:11 +02:00
tutorial Update copyright for 2024 2024-01-03 20:49:05 -05:00
.gitignore
DEVELOPERS
Makefile Remove distprep 2023-11-06 15:18:04 +01:00
Makefile.global.in Update Unicode data to CLDR 45 2024-04-22 09:16:33 +02:00
Makefile.shlib Remove AIX support 2024-02-28 15:17:23 +04:00
meson.build Update copyright for 2024 2024-01-03 20:49:05 -05:00
nls-global.mk Remove distprep 2023-11-06 15:18:04 +01:00