BUG/MEDIUM: peers: Don't use resync timer when local resync is in progress

When a worker is stopped, the resync timer limits how long the old worker
may spend connecting to the new worker to perform the local resync.
However, this timer must be stopped while the resync is in progress and
re-armed if the resync is interrupted (for instance because of another
reload). Otherwise, if the resync takes a while, the old worker may be
killed too early.

This bug was introduced by commit 160fff665 ("BUG/MEDIUM: peers: limit
reconnect attempts of the old process on reload"). It must be backported as
far as 2.0.
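
The timer handling described above can be modeled in isolation. What follows
is only a minimal sketch, not the HAProxy code: every sketch_* name is
hypothetical, the tick helpers are simplified stand-ins for tick_isset(),
tick_add() and tick_is_expired(), and the 5000 ms value is an assumption
used for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins, not HAProxy definitions. */
#define SKETCH_TICK_ETERNITY     0u    /* "timer not set" marker */
#define SKETCH_RESYNC_TIMEOUT_MS 5000u /* assumed value, for illustration */

enum sketch_conn_state {
    SKETCH_NO_CONN,     /* no active connection to the new worker */
    SKETCH_CONNECTING,  /* connection attempt in flight */
    SKETCH_ESTABLISHED  /* connected: local resync in progress */
};

struct sketch_peers {
    uint32_t resync_timeout; /* expiration date in ms, or eternity */
};

static bool sketch_tick_isset(uint32_t t)
{
    return t != SKETCH_TICK_ETERNITY;
}

static uint32_t sketch_tick_add(uint32_t now, uint32_t delay)
{
    uint32_t t = now + delay;

    return t ? t : 1; /* never return the "not set" marker */
}

static bool sketch_tick_is_expired(uint32_t t, uint32_t now)
{
    /* An unset timer never expires; the signed difference tolerates wrap. */
    return sketch_tick_isset(t) && (int32_t)(t - now) <= 0;
}

/* Update the resync timer for the current connection state.
 * Returns true while the old worker is still allowed to go on.
 */
static bool sketch_update_resync_timer(struct sketch_peers *p,
                                       enum sketch_conn_state st,
                                       uint32_t now)
{
    switch (st) {
    case SKETCH_ESTABLISHED:
        /* Resync in progress: stop the timer so a long resync is not cut. */
        p->resync_timeout = SKETCH_TICK_ETERNITY;
        return true;
    case SKETCH_NO_CONN:
        /* Resync interrupted or not started: re-arm the timer if needed. */
        if (!sketch_tick_isset(p->resync_timeout))
            p->resync_timeout = sketch_tick_add(now, SKETCH_RESYNC_TIMEOUT_MS);
        return !sketch_tick_is_expired(p->resync_timeout, now);
    default:
        return !sketch_tick_is_expired(p->resync_timeout, now);
    }
}
```

With this model, an established connection keeps the old worker alive no
matter how long the resync lasts, and a lost connection grants a fresh
timeout window instead of reusing a deadline that may already have passed.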
Christopher Faulet 2022-08-26 18:40:46 +02:00
parent 13db4bdbc6
commit 19a82b9495

@@ -3467,6 +3467,10 @@ struct task *process_peer_sync(struct task * task, void *context, unsigned int s
}
}
else if (!ps->appctx) {
/* Re-arm resync timeout if necessary */
if (!tick_isset(peers->resync_timeout))
peers->resync_timeout = tick_add(now_ms, MS_TO_TICKS(PEER_RESYNC_TIMEOUT));
/* If there's no active peer connection */
if (!tick_is_expired(peers->resync_timeout, now_ms) &&
(ps->statuscode == 0 ||
@@ -3502,6 +3506,9 @@ struct task *process_peer_sync(struct task * task, void *context, unsigned int s
}
}
else if (ps->statuscode == PEER_SESS_SC_SUCCESSCODE ) {
/* Reset resync timeout during a resync */
peers->resync_timeout = TICK_ETERNITY;
/* current peer connection is active and established
* wake up all peer handlers to push remaining local updates */
for (st = ps->tables; st ; st = st->next) {