MEDIUM: stats-file/clock: automatically update now_offset based on shared clock

We no longer rely on now_offset stored in the shm-stats-file. Instead
haproxy automatically computes the now_offset relative to the monotonic
clock and the shared global clock.

Indeed, the previous model, based on a static now_offset when the
monotonic clock is available, proved to be insufficient when used in
combination with shm-stats-file (that is, when the monotonic clock is
shared between multiple co-processes). In an ideal situation, co-processes
would correctly apply the offset to their local monotonic clock and end up
with a consistent now_ns. But when restarting from an existing
shm-stats-file left by a previous session (i.e. prior to a reboot), the
local monotonic clock would no longer be consistent with the one used
to update the file previously, so applying a static offset would fail
to restore clock consistency.
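
As a rough illustration of the failure described above, here is a minimal
sketch with made-up numbers. The helper names apply_static_offset() and
clock_skew() are hypothetical and do not exist in haproxy; units are
arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: date seen by a process that adds a fixed offset
 * to its local monotonic clock.
 */
static int64_t apply_static_offset(int64_t mono, int64_t offset)
{
	return mono + offset;
}

/* Hypothetical helper: distance between the shared date and a local one
 * (0 means both clocks agree).
 */
static int64_t clock_skew(int64_t local_now, int64_t shared_now)
{
	return shared_now - local_now;
}
```

Before a reboot, with a monotonic clock at 5000, a stored offset of 100 and
a shared date of 5100, clock_skew(apply_static_offset(5000, 100), 5100) is
0. After a reboot the monotonic clock restarts near zero (say 30) while the
file still holds 5100, so the same static offset leaves a skew of 4970.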

For this specific issue, a workaround was brought by 09bf116
("BUG/MEDIUM: stats-file: detect and fix inconsistent shared clock when resuming from shm-stats-file")
but the solution implemented there was deemed too fragile, because there
is a 60-second window in which the fix fails to detect an inconsistent
clock and leaves haproxy with a clock that is off by anywhere between 0
and 60 seconds, which can be huge.
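
The blind spot of a threshold-based check can be sketched as follows. This
is a simplified model, not the code from 09bf116: drift_detected() is a
hypothetical name, and HEARTBEAT_TIMEOUT_S stands in for
SHM_STATS_FILE_HEARTBEAT_TIMEOUT, assumed here to be 60 based on the
window mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value, taken from the 60-second window mentioned in the text */
#define HEARTBEAT_TIMEOUT_S 60

/* Hypothetical helper: returns 1 when the shared date is ahead of the
 * local one by more than the timeout, 0 otherwise. Any smaller drift
 * goes unnoticed.
 */
static int drift_detected(int64_t shared_now_ns, int64_t local_now_ns)
{
	return shared_now_ns >
	       local_now_ns + (int64_t)HEARTBEAT_TIMEOUT_S * 1000000000LL;
}
```

With such a check, a shared clock that is 61 seconds ahead is caught, but
one that is 45 seconds ahead is not, and the process keeps running with
that error.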

Now, each time we learn from another process (through the shared map,
by reading global_now_ns), we simply recompute our local offset (the
difference between OUR monotonic clock and the SHARED one). Also, in
clock_update_global_date(), we make sure we always recompute the
now_offset, as now_ms may have been updated from the shared clock if
the shared clock was ahead of us.
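
The self-adjusting offset can be modeled roughly as below. This is a
single-threaded sketch under simplifying assumptions: struct local_clock
and sync_with_shared() are hypothetical names, and the atomic loads and
CAS loop used by the real clock_update_global_date() are left out.

```c
#include <assert.h>
#include <stdint.h>

struct local_clock {
	int64_t mono_ns;    /* process-local monotonic reading */
	int64_t now_offset; /* offset added to mono_ns to obtain now_ns */
	int64_t now_ns;     /* local view of the date */
};

/* Simplified sync step: the shared date only moves forward, the local
 * date catches up when it is behind, and the offset is then re-derived
 * from the resulting date and the local monotonic clock.
 */
static void sync_with_shared(struct local_clock *c, int64_t *shared_now_ns)
{
	c->now_ns = c->mono_ns + c->now_offset;
	if (c->now_ns < *shared_now_ns)
		c->now_ns = *shared_now_ns;  /* shared clock was ahead of us */
	else
		*shared_now_ns = c->now_ns;  /* we move the shared clock forward */
	c->now_offset = c->now_ns - c->mono_ns;
}
```

In this model, a process whose monotonic clock reads 30 with a stale offset
of 100 and a shared date of 5100 ends up with now_ns = 5100 and a
recomputed offset of 5070, whatever offset it started with.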

Thanks to that new logic, interrupted processes, resumed processes, and
processes started with a shm-stats-file from a previous session now
correctly recover from those various situations, and multiple
co-processes with diverging clocks on startup end up converging to
the same values.

Since it is no longer relevant to save now_offset in the map, it was
removed. However, to prevent shm-stats-file incompatibility with previous
versions, an 8-byte hole was forced in its place, and we deliberately
didn't bump the shm-stats-file version.
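
The layout argument can be checked with a small sketch. The two structs
below are cut-down, hypothetical stand-ins for the header before and after
the change (a plain byte array replaces ALWAYS_PAD(), whose definition is
not shown here, and "slots" is a placeholder for whatever follows):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced header, before the change */
struct hdr_old {
	uint32_t global_now_ms;
	uint64_t global_now_ns;
	int64_t  now_offset;   /* field being removed */
	uint8_t  slots[64];    /* placeholder for the members that follow */
};

/* Hypothetical reduced header, after the change */
struct hdr_new {
	uint32_t global_now_ms;
	uint64_t global_now_ns;
	uint8_t  pad[8];       /* 8-byte hole kept where now_offset was */
	uint8_t  slots[64];
};

/* Members after the hole keep their offsets, so both layouts match */
static_assert(offsetof(struct hdr_old, slots) == offsetof(struct hdr_new, slots),
              "members following the hole must not move");
static_assert(sizeof(struct hdr_old) == sizeof(struct hdr_new),
              "overall size must be unchanged");
```

Because nothing after the hole moves, a process built before the change and
one built after it can map the same file, which is consistent with leaving
the version untouched.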

This patch may be backported to 3.3 after a solid period of observation
to ensure we didn't break things.
Aurelien DARRAGON 2026-02-20 22:45:11 +01:00
parent 29592cb330
commit 4319c20363
3 changed files with 36 additions and 47 deletions


@@ -36,7 +36,7 @@ struct shm_stats_file_hdr {
 /* 2 bytes hole */
 uint global_now_ms; /* global monotonic date (ms) common to all processes using the shm */
 ullong global_now_ns; /* global monotonic date (ns) common to all processes using the shm */
-llong now_offset; /* offset applied to global monotonic date on startup */
+ALWAYS_PAD(8); // 8 bytes hole
 /* each process uses one slot and is identified using its pid, max 64 in order
  * to be able to use bitmask to refer to a process and then look its pid in the
  * "slots.pid" map


@@ -266,6 +266,7 @@ void clock_update_global_date()
 {
 ullong old_now_ns;
 uint old_now_ms;
+int now_ns_changed = 0;
 /* now that we have bounded the local time, let's check if it's
  * realistic regarding the global date, which only moves forward,
@@ -275,8 +276,10 @@ void clock_update_global_date()
 old_now_ms = _HA_ATOMIC_LOAD(global_now_ms);
 do {
-if (now_ns < old_now_ns)
+if (now_ns < old_now_ns) {
+now_ns_changed = 1;
 now_ns = old_now_ns;
+}
 /* now <now_ns> is expected to be the most accurate date,
  * equal to <global_now_ns> or newer. Updating the global
@@ -295,8 +298,11 @@ void clock_update_global_date()
 if (unlikely(now_ms == TICK_ETERNITY))
 now_ms++;
-if (!((now_ns ^ old_now_ns) & ~0x7FFFULL))
+if (!((now_ns ^ old_now_ns) & ~0x7FFFULL)) {
+if (now_ns_changed)
+goto end;
 return;
+}
 /* let's try to update the global_now_ns (both in nanoseconds
  * and ms forms) or loop again.
@@ -305,6 +311,7 @@ void clock_update_global_date()
 (now_ms != old_now_ms && !_HA_ATOMIC_CAS(global_now_ms, &old_now_ms, now_ms))) &&
 __ha_cpu_relax());
+end:
 if (!th_ctx->curr_mono_time) {
 /* Only update the offset when monotonic time is not available.
  * <now_ns> and <now_ms> are now updated to the last value of
@@ -314,6 +321,16 @@ void clock_update_global_date()
  */
 HA_ATOMIC_STORE(&now_offset, now_ns - tv_to_ns(&date));
 }
+else if (global_now_ns != &_global_now_ns) {
+/*
+ * or global_now_ns is shared with other processes: this results
+ * in the now_offset requiring to self-adjust so that it is consistent
+ * with now_offset used by other processes, as we may have learned from
+ * a new global_now_ns that was used in pair with a different offset from
+ * ours
+ */
+HA_ATOMIC_STORE(&now_offset, now_ns - th_ctx->curr_mono_time);
+}
 }
 /* must be called once at boot to initialize some global variables */


@@ -898,7 +898,6 @@ int shm_stats_file_prepare(void)
 /* set global clock for the first time */
 shm_stats_file_hdr->global_now_ms = *global_now_ms;
 shm_stats_file_hdr->global_now_ns = *global_now_ns;
-shm_stats_file_hdr->now_offset = clock_get_now_offset();
 }
 else if (!shm_stats_file_check_ver(shm_stats_file_hdr))
 goto err_version;
@@ -912,65 +911,38 @@ int shm_stats_file_prepare(void)
 global_now_ns = &shm_stats_file_hdr->global_now_ns;
 if (!first) {
-llong adjt_offset;
+llong new_offset, adjt_offset;
+/* Given the clock from the shared map and our current clock which is considered
+ * up-to-date, we can now compute the now_offset that we will be using instead
+ * of the default one in order to make our clock consistent with the shared one
+ *
+ * First we remove the original offset from now_ns to get pure now_ns
+ * then we compare now_ns with the shared clock, which gives us the
+ * relative offset we should be using to make our monotonic clock
+ * coincide with the shared one.
+ */
+new_offset = HA_ATOMIC_LOAD(global_now_ns) - (now_ns - clock_get_now_offset());
 /* set adjusted offset which corresponds to the corrected offset
- * relative to the initial offset stored in the shared memory instead
- * of our process-local one
+ * relative to the new offset we calculated instead or the default
+ * one
  */
-adjt_offset = -clock_get_now_offset() + shm_stats_file_hdr->now_offset;
+adjt_offset = -clock_get_now_offset() + new_offset;
 /* we now rely on global_now_* from the shm, so the boot
  * offset that was initially applied in clock_init_process_date()
  * is no longer relevant. So we fix it by applying the one from the
  * initial process instead
  */
-if (HA_ATOMIC_LOAD(global_now_ns) >
-(now_ns + adjt_offset) +
-(unsigned long)SHM_STATS_FILE_HEARTBEAT_TIMEOUT * 1000 * 1000 * 1000) {
-/* global_now_ns (which is supposed to be monotonic, as
- * with now_ns) is inconsistent with local now_ns (off by
- * more than SHM_STATS_FILE_HEARTBEAT_TIMEOUT seconds): global
- * is too ahead from local while they are supposed to be close
- * to each other. A possible cause for that is that we are
- * resuming from a shm-state-file which was generated on another
- * host or after a system reboot (monotonic clock is reset
- * between reboots)
- *
- * Since we cannot work with inconsistent global and local
- * now_ns, to prevent existing shared records that depend on
- * the shared global_now_ns to become obsolete, we manually
- * adjust the now_offset so that local now_ns is consistent
- * with the global one.
- */
-now_ns -= clock_get_now_offset();
-adjt_offset = HA_ATOMIC_LOAD(global_now_ns) - now_ns;
-/*
- * While we are normally not supposed to change the shm-stats-file
- * offset once it was set, we make an exception here as we
- * can safely consider we are the only process working on the
- * file (after reboot or host migration). Doing this ensure
- * future processes from the same host will use the corrected
- * offset right away.
- */
-shm_stats_file_hdr->now_offset = adjt_offset;
-}
 now_ns = now_ns + adjt_offset;
 start_time_ns = start_time_ns + adjt_offset;
-clock_set_now_offset(shm_stats_file_hdr->now_offset);
+clock_set_now_offset(new_offset);
 /* ensure global_now_* is consistent before continuing */
 clock_update_global_date();
 }
-/* now that global_now_ns is accurate, recompute precise now_offset
- * if needed (in case it is dynamic when monotonic clock not available)
- */
-if (!th_ctx->curr_mono_time)
-clock_set_now_offset(HA_ATOMIC_LOAD(global_now_ns) - tv_to_ns(&date));
 /* sync local and global clocks, so all clocks are consistent */
 clock_update_date(0, 1);