Fix bug due to confusion about what IsMVCCSnapshot means

In 0b96e734c5 I (Andres) relied on page_collect_tuples() being called only
with an MVCC snapshot, and added assertions to that end, but did not realize
that IsMVCCSnapshot() allows both proper MVCC snapshots and historical
snapshots, which behave quite similarly to MVCC snapshots.

Unfortunately that can lead to incorrect visibility results during logical
decoding, as a historical snapshot is interpreted as a plain MVCC
snapshot. The only reason this wasn't noticed earlier is that it's hard to
reach as most of the time there are no sequential scans during logical
decoding.

To fix the bug and avoid issues like this in the future, split
IsMVCCSnapshot() into IsMVCCSnapshot() and IsMVCCLikeSnapshot(), where now
only the latter includes historic snapshots.

One effect of this is that during logical decoding no page-at-a-time snapshots
are used, as otherwise runtime branches to handle historic snapshots would be
needed in some performance critical paths. Given how uncommon sequential scans
are during logical decoding, that seems acceptable.

Author: Antonin Houska <ah@cybertec.at>
Reported-by: Antonin Houska <ah@cybertec.at>
Discussion: https://postgr.es/m/61812.1770637345@localhost
This commit is contained in:
Andres Freund 2026-03-13 12:11:05 -04:00
parent b634c4e0e8
commit ce5d489166
4 changed files with 21 additions and 6 deletions

View file

@ -159,7 +159,7 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Only in a non-MVCC snapshot can more than one member of the HOT
* chain be visible.
*/
*call_again = !IsMVCCSnapshot(snapshot);
*call_again = !IsMVCCLikeSnapshot(snapshot);
slot->tts_tableOid = RelationGetRelid(scan->rel);
ExecStoreBufferHeapTuple(&bslot->base.tupdata, slot, hscan->xs_cbuf);

View file

@ -445,7 +445,7 @@ index_markpos(IndexScanDesc scan)
void
index_restrpos(IndexScanDesc scan)
{
Assert(IsMVCCSnapshot(scan->xs_snapshot));
Assert(IsMVCCLikeSnapshot(scan->xs_snapshot));
SCAN_CHECKS;
CHECK_SCAN_PROCEDURE(amrestrpos);

View file

@ -423,7 +423,7 @@ btrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
* Note: so->dropPin should never change across rescans.
*/
so->dropPin = (!scan->xs_want_itup &&
IsMVCCSnapshot(scan->xs_snapshot) &&
IsMVCCLikeSnapshot(scan->xs_snapshot) &&
RelationNeedsWAL(scan->indexRelation) &&
scan->heapRelation != NULL);

View file

@ -51,14 +51,29 @@ extern PGDLLIMPORT SnapshotData SnapshotToastData;
((snapshotdata).snapshot_type = SNAPSHOT_NON_VACUUMABLE, \
(snapshotdata).vistest = (vistestp))
/* This macro encodes the knowledge of which snapshots are MVCC-safe */
/*
* Is the snapshot implemented as an MVCC snapshot (i.e. it uses
* SNAPSHOT_MVCC)? If so, there will be at most one visible tuple in a chain
* of updated tuples, and each visible tuple will be seen exactly once.
*/
#define IsMVCCSnapshot(snapshot) \
((snapshot)->snapshot_type == SNAPSHOT_MVCC || \
(snapshot)->snapshot_type == SNAPSHOT_HISTORIC_MVCC)
((snapshot)->snapshot_type == SNAPSHOT_MVCC)
/*
* Special kind of MVCC snapshot, used during logical decoding. The visibility
* is checked from the perspective of an already committed transaction that is
* being decoded.
*/
#define IsHistoricMVCCSnapshot(snapshot) \
((snapshot)->snapshot_type == SNAPSHOT_HISTORIC_MVCC)
/*
* Is the snapshot either an MVCC snapshot or has equivalent visibility
* semantics (see IsMVCCSnapshot())?
*/
#define IsMVCCLikeSnapshot(snapshot) \
(IsMVCCSnapshot(snapshot) || IsHistoricMVCCSnapshot(snapshot))
extern Snapshot GetTransactionSnapshot(void);
extern Snapshot GetLatestSnapshot(void);
extern void SnapshotSetCommandId(CommandId curcid);