From cfa61290400211f585e6669262f464bc01705f3a Mon Sep 17 00:00:00 2001
From: Yuan Wang
Date: Mon, 19 Jan 2026 19:57:20 +0800
Subject: [PATCH] Minor fixes for ASM (#14707)

- **TCL test failure**

https://github.com/redis/redis/actions/runs/21121021310/job/60733781853#step:6:5705
```
[err]: Test cluster module notifications when replica restart with RDB during importing in tests/unit/cluster/atomic-slot-migration.tcl
Expected '{sub: cluster-slot-migration-import-started, source_node_id:28c64b3f462f3c29aa3c96c2ba5dff948dfe315b, destination_node_id:1382a4b4ca86621e39068ee8b25524a44a21bbc1, task_id:4d185a5398be94edac0dd77fff094eb7f5c73ec4, slots:0-100}' to be equal to '{sub: cluster-slot-migration-import-started, source_node_id:28c64b3f462f3c29aa3c96c2ba5dff948dfe315b, destination_node_id:1382a4b4ca86621e39068ee8b25524a44a21bbc1, task_id:4d185a5398be94edac0dd77fff094eb7f5c73ec4, slots:0-100} {sub: cluster-slot-migration-import-completed, source_node_id:28c64b3f462f3c29aa3c96c2ba5dff948dfe315b, destination_node_id:1382a4b4ca86621e39068ee8b25524a44a21bbc1, task_id:4d185a5398be94edac0dd77fff094eb7f5c73ec4, slots:0-100}' (context: type eval line 29 cmd {assert_equal [list "sub: cluster-slot-migration-import-started, source_node_id:$src_id, destination_node_id:$dest_id, task_id:$task_id, slots:0-100" ] [R 4 asm.get_cluster_event_log]} proc ::test)
```
If there is a delay before the check runs, the ASM task may already have
completed, so the event log contains both the `started` and `completed`
entries instead of only `started`. The check is fragile, so delete it; all
logs are checked later in the test.
```
restart_server -4 true false true save ;# rdb save
---> if there is a delay, the ASM task should complete
# the asm task info in rdb will fire module event
assert_equal [list \
    "sub: cluster-slot-migration-import-started, source_node_id:$src_id, destination_node_id:$dest_id, task_id:$task_id, slots:0-100" \
] [R 4 asm.get_cluster_event_log]
```

- **Start BGSAVE for slot snapshot ASAP**

The migrating client is treated as a replica that wants diskless
replication, so it waits for `repl-diskless-sync-delay` before a new fork
is started after the last child exits. However, a slot snapshot cannot be
shared with other slaves, so we can start BGSAVE for it immediately.

Also resolves internal ticket RED-177974.
---
 src/replication.c                            | 4 ++++
 tests/unit/cluster/atomic-slot-migration.tcl | 4 ----
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/replication.c b/src/replication.c
index 14241c79e..d4ea0aa6d 100644
--- a/src/replication.c
+++ b/src/replication.c
@@ -5002,6 +5002,10 @@ int shouldStartChildReplication(int *mincapa_out, int *req_out) {
                 continue;
             }
             idle = server.unixtime - slave->lastinteraction;
+            /* If the slave requests a slots snapshot, we should start BGSAVE
+             * immediately since it can't share the RDB with other slaves. */
+            if (slave->slave_req & SLAVE_REQ_SLOTS_SNAPSHOT)
+                idle = server.repl_diskless_sync_delay; /* Threshold for BGSAVE */
             if (idle > max_idle) max_idle = idle;
             slaves_waiting++;
             mincapa = first ? slave->slave_capa : (mincapa & slave->slave_capa);
diff --git a/tests/unit/cluster/atomic-slot-migration.tcl b/tests/unit/cluster/atomic-slot-migration.tcl
index 6c398ccc5..f04257fe5 100644
--- a/tests/unit/cluster/atomic-slot-migration.tcl
+++ b/tests/unit/cluster/atomic-slot-migration.tcl
@@ -2630,10 +2630,6 @@ start_cluster 3 6 [list tags {external:skip cluster modules} config_lines [list
         # restart node 4
         if {$with_rdb eq "with"} {
             restart_server -4 true false true save ;# rdb save
-            # the asm task info in rdb will fire module event
-            assert_equal [list \
-                "sub: cluster-slot-migration-import-started, source_node_id:$src_id, destination_node_id:$dest_id, task_id:$task_id, slots:0-100" \
-            ] [R 4 asm.get_cluster_event_log]
         } else {
             restart_server -4 true false true nosave ;# no rdb saved
         }
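
The effect of the `src/replication.c` hunk can be illustrated with a small standalone model. This is only a sketch, not the Redis implementation: the `waiter` struct, `REQ_SLOTS_SNAPSHOT`, and `should_start_child` are hypothetical stand-ins, the request-matching and capability logic of the real function is omitted, and the final `max_idle >= delay` comparison is an assumption about how the threshold is checked.

```c
/* Hypothetical flag; stands in for SLAVE_REQ_SLOTS_SNAPSHOT. */
#define REQ_SLOTS_SNAPSHOT (1 << 0)

/* Hypothetical, simplified view of a replica waiting for BGSAVE. */
typedef struct {
    int req;                /* request flags */
    long last_interaction;  /* unix time of the last interaction */
} waiter;

/* Returns 1 if a child should be forked now, 0 otherwise.
 * `delay` plays the role of repl-diskless-sync-delay. */
int should_start_child(const waiter *w, int n, long now, long delay) {
    long max_idle = 0;
    int waiting = 0;

    for (int i = 0; i < n; i++) {
        long idle = now - w[i].last_interaction;

        /* A slots snapshot cannot be shared with other waiters, so
         * pretend this waiter has already been idle for the full delay. */
        if (w[i].req & REQ_SLOTS_SNAPSHOT) idle = delay;

        if (idle > max_idle) max_idle = idle;
        waiting++;
    }

    /* Assumed threshold check: start once the longest wait reaches the delay. */
    return waiting > 0 && max_idle >= delay;
}
```

Under these assumptions, an ordinary waiter that has been idle for 2 seconds with a 5 second delay does not trigger a fork, while a waiter carrying the slots snapshot flag triggers one even with zero idle time.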