mirror of
https://github.com/postgres/postgres.git
synced 2026-05-28 04:35:45 -04:00
Fix race between ProcSignalInit() and EmitProcSignalBarrier().
Previously, ProcSignalInit() read the global barrier generation before publishing its PID into pss_pid. This created a race condition: a process could initialize its local generation with an older global value, while a concurrent EmitProcSignalBarrier() might skip that process because its pss_pid was still zero. This resulted in WaitForProcSignalBarrier() hanging indefinitely. Fix this by publishing pss_pid before reading psh_barrierGeneration with a memory barrier so that the store to pss_pid is ordered before the load. A concurrent EmitProcSignalBarrier() then either observes the published PID and signals this slot, or completes its generation increment before we load it. While this race has become more visible due to recent features using signal barriers in more places (such as online wal_level changes), the issue is theoretically present since signal barriers were introduced to release smgr caches (e.g., in DROP DATABASE). v14 has the procsiangl barrier infrastricutre but no in-tree caller that actually emits a barrier, so the case is unreachable there. This issue was also reported by buildfarm member flaviventris. Reported-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAEze2WgAJmWReDN7Chtba8Er2YBvKCoa0KVN25-1evnTrHsLyA@mail.gmail.com Backpatch-through: 15
This commit is contained in:
parent
c8cd3d6976
commit
1a9b1cc18e
1 changed files with 9 additions and 1 deletions
|
|
@ -185,6 +185,15 @@ ProcSignalInit(const uint8 *cancel_key, int cancel_key_len)
|
|||
/* Clear out any leftover signal reasons */
|
||||
MemSet(slot->pss_signalFlags, 0, NUM_PROCSIGNALS * sizeof(sig_atomic_t));
|
||||
|
||||
/*
|
||||
* Publish the PID before reading the global barrier generation to ensure
|
||||
* that EmitProcSignalBarrier() doesn't skip us while we are grabbing an
|
||||
* older generation. We need a memory barrier here to make sure that the
|
||||
* update of pss_pid is ordered before the subsequent load of
|
||||
* psh_barrierGeneration.
|
||||
*/
|
||||
pg_atomic_write_membarrier_u32(&slot->pss_pid, MyProcPid);
|
||||
|
||||
/*
|
||||
* Initialize barrier state. Since we're a brand-new process, there
|
||||
* shouldn't be any leftover backend-private state that needs to be
|
||||
|
|
@ -204,7 +213,6 @@ ProcSignalInit(const uint8 *cancel_key, int cancel_key_len)
|
|||
if (cancel_key_len > 0)
|
||||
memcpy(slot->pss_cancel_key, cancel_key, cancel_key_len);
|
||||
slot->pss_cancel_key_len = cancel_key_len;
|
||||
pg_atomic_write_u32(&slot->pss_pid, MyProcPid);
|
||||
|
||||
SpinLockRelease(&slot->pss_mutex);
|
||||
|
||||
|
|
|
|||
Loading…
Reference in a new issue