redis

mirror of https://github.com/redis/redis.git synced 2026-06-08 16:24:26 -04:00

Author	SHA1	Message	Date
Vitah Lin	31896140d1	Fix diskless replicas drop during rdb pipe test (#15131 ) Some checks are pending CI / test-ubuntu-latest (push) Waiting to run Details CI / test-sanitizer-address (push) Waiting to run Details CI / build-debian-old (push) Waiting to run Details CI / build-macos-latest (push) Waiting to run Details CI / build-32bit (push) Waiting to run Details CI / build-libc-malloc (push) Waiting to run Details CI / build-centos-jemalloc (push) Waiting to run Details CI / build-old-chain-jemalloc (push) Waiting to run Details Codecov / code-coverage (push) Waiting to run Details External Server Tests / test-external-standalone (push) Waiting to run Details External Server Tests / test-external-cluster (push) Waiting to run Details External Server Tests / test-external-nodebug (push) Waiting to run Details Spellcheck / Spellcheck (push) Waiting to run Details This PR is based on: valkey-io/valkey#3511 Close https://github.com/redis/redis/issues/14983 ## Summary During diskless replication, if any single replica cannot accept a write (TCP send buffer full / `EAGAIN`), the master stops reading the RDB pipe entirely, stalling data delivery to all replicas — including fast ones that are ready to receive data. The failure reason is similar to https://github.com/redis/redis/pull/14946, the socket buffer is more easy to fill. ## Root Cause In `rdbPipeReadHandler`, the master reads from the child's RDB pipe and writes to all replica sockets in a loop. When `connWrite` to any replica returns a partial write (socket send buffer full), the handler: 1. Installs a per-replica `rdbPipeWriteHandler` and increments `rdb_pipe_numconns_writing` 2. Removes the pipe read event via `aeDeleteFileEvent(server.el, server.rdb_pipe_read, AE_READABLE)`, stopping all pipe reads The pipe read event is only re-enabled when all pending write handlers complete (`rdb_pipe_numconns_writing == 0`), meaning the slowest replica dictates the throughput for all replicas. ## Observed Behavior With one slow replica (consuming at ~290 KB/s due to `key-load-delay`): - Master bursts ~1.3 MB of RDB data until the slow replica's socket send buffer fills - `rdbPipeReadHandler` disables the pipe read event - All replicas starve for 4–5 seconds while the slow replica drains its buffer - Cycle repeats: burst → stall → burst → stall Ultimately, it leads to a very slow synchronization process of the entire master and replica. ### Changes 1. Skip the entire `diskless replicas drop during rdb pipe` test under Valgrind to avoid timing flakiness on slow env. 2. Move `start_server` inside the `foreach all_drop` loop so each subcase gets a fresh master instead of sharing state across subcases. 3. For `no / slow / fast / all` subcases, replica 0 runs with `key-load-delay 500`, which combined with the blocked-writer TCP back-pressure can stall the RDB-saving child indefinitely; shrink the dataset to ~40 MB so the transfer still exercises the blocked-writer path but completes in reasonable time instead of hanging on the TCP deadlock. For the timeout subcase, replica 0 does not run with `key-load-delay 500`, so to avoid the TCP deadlock we still reduce the dataset somewhat, but keep it larger than the other subcases. Otherwise the kernel TCP send buffer can absorb the whole RDB, and we'd miss the repl_last_partial_write != 0 "(full sync)" timeout path and only hit the "(streaming sync)" path instead. 5. For the `all` subcase, set `rdb-key-save-delay 1000` on the master so the RDB child keeps generating data while both replicas are killed, ensuring the last-replica-drop path is exercised rather than racing with normal completion. 6. Move the slow-replica `pause_process()` so it happens only in the timeout subcase, not after killing replicas, so Redis observes the disconnect promptly in non-timeout flows. 7. In the timeout subcase, set `repl-timeout` 2, wait inline for `Disconnecting timedout replica (full sync)`, then restore `repl-timeout` 60 so the remaining replica can finish the streamed RDB. --------- Co-authored-by: Sarthak Aggarwal <sarthagg@amazon.com> Co-authored-by: debing.sun <debing.sun@redis.com>	2026-05-19 18:27:33 +08:00
Mincho Paskalev	2e46d2e735	Hold GCRA out of the release (#15191 ) Some checks failed CI / test-ubuntu-latest (push) Waiting to run Details CI / test-sanitizer-address (push) Waiting to run Details CI / build-debian-old (push) Waiting to run Details CI / build-macos-latest (push) Waiting to run Details CI / build-32bit (push) Waiting to run Details CI / build-libc-malloc (push) Waiting to run Details CI / build-centos-jemalloc (push) Waiting to run Details CI / build-old-chain-jemalloc (push) Waiting to run Details Codecov / code-coverage (push) Waiting to run Details External Server Tests / test-external-standalone (push) Waiting to run Details External Server Tests / test-external-cluster (push) Waiting to run Details External Server Tests / test-external-nodebug (push) Waiting to run Details Spellcheck / Spellcheck (push) Waiting to run Details Reply-schemas linter / reply-schemas-linter (push) Has been cancelled Details After introducing GCRA algorithm into redis https://github.com/redis/redis/pull/14826 and subsequent introduction of new RATE_LIMIT object type - https://github.com/redis/redis/pull/14905. It was internally decided not to introduce GCRA into the new release. As still no decision is made on whether it will be kept or not in the future, this PR only makes the code related to GCRA dead - commands are inaccessible and AOF/RDB load+save is disabled. --------- Co-authored-by: debing.sun <debing.sun@redis.com>	2026-05-14 16:31:25 +03:00
Scott Lasley	dbe5d1e605	Fix invalid repl-diskless-load value in replication test (#15178 ) Close #15177 Follow [Fix use-after-free when fullsync happens while replica is running a timed out script (CVE-2026-23631)](`0cca172a17`) Remove the `repl-diskless-load yes` test configuration because this option exists only in the Redis fork and is not available in Redis OSS. (cherry picked from commit `5033e15143`)	2026-05-14 14:10:41 +03:00
Ozan Tezcan	837ca7f8f4	Fix use-after-free when fullsync happens while replica is running a timed out script (CVE-2026-23631) Fullsync triggers emptyData and scriptingReset which free the scripting/function engine. If a timed out script is still running on the replica, this causes a use-after-free. Delay fullsync processing in readSyncBulkPayload until the script finishes.	2026-05-14 14:10:41 +03:00
Sergei Georgiev	c4e3405704	Invalid Memory Access in Redis RESTORE Command (CVE-2026-25243)	2026-05-14 14:10:41 +03:00
Salvatore Sanfilippo	0d9576435f	Implement the new Redis Array type (#15162 ) Some checks are pending CI / test-ubuntu-latest (push) Waiting to run Details CI / test-sanitizer-address (push) Waiting to run Details CI / build-debian-old (push) Waiting to run Details CI / build-macos-latest (push) Waiting to run Details CI / build-32bit (push) Waiting to run Details CI / build-libc-malloc (push) Waiting to run Details CI / build-centos-jemalloc (push) Waiting to run Details CI / build-old-chain-jemalloc (push) Waiting to run Details Codecov / code-coverage (push) Waiting to run Details External Server Tests / test-external-standalone (push) Waiting to run Details External Server Tests / test-external-cluster (push) Waiting to run Details External Server Tests / test-external-nodebug (push) Waiting to run Details Reply-schemas linter / reply-schemas-linter (push) Waiting to run Details Spellcheck / Spellcheck (push) Waiting to run Details # Redis Array For years, Redis has been missing a real indexed data structure for the use cases where the index and the spatial relationship of elements are semantic. Hashes give you random lookups, but you have to store an index as a key, and have no range visibility. Lists give you appending and trimming, but what is in the middle remains hard to access. Streams give you append-only events, which is another (useful, indeed) beast. None of these is what you want when the position itself has business meaning — slot 37, step 4, row 18552, day from 2934 to 2949, file line 11, 12, 15 and so forth. And, all those types, for different reasons, are all suboptimal when you want a ring buffer able to store the latest N observed samples of something. Up to now, users found ways (they always do \o/) using the fact that the data structures that are obvious in this universe are also extremely powerful, if well implemented. But this forces compromises. Arrays handle these index-first requirements natively, and usually with much better memory and CPU usage than the workarounds. If the use case is the right one, Arrays often provide much better space, time and usability at the same time. ## Internal encoding 1. When dense, an Array is essentially a more fancy C array. You don't pay anything for storing the index. 2. Yet, instead of going really flat, arrays are sliced into 4096-element slices, and each slice, when it contains just a few elements, uses a special sparse encoding. When a slice is empty it's just a `NULL` stored in the directory. 3. Small ints, floats, and short strings are pointer-tagged, so they cost zero additional memory beyond the pointer slot itself. 4. When very sparse, a super-directory of windowed directories is used. This allows the data type to be safe, instead of exhibiting pathological space or time behavior. This representation is only triggered when there are more than 8 million elements or very high indexes set. ## Use cases Arrays are mostly stateless if not for the fact that each array remembers the index of the latest added item, allowing `ARINSERT` and `ARRING` to work properly. Otherwise it is a set/get at this index game, with solid support for both setting / getting ranges, server-side scanning, returning only populated elements in a time which is proportional not to the range size, but to the population size. A few concrete examples, that may work as mental models for the set of problems that are similar to them (from the POV of the data modeling). Thermometer. A sensor reporting once per minute, with gaps: ``` ARSET temp:room12:day7 123 22.3 ARGETRANGE temp:room12:day7 600 660 # the 10:00–11:00 window, with NULLs ARSCAN temp:room12:day7 600 660 # only populated elements AROP temp:room12:day7 0 1439 MAX # peak of the day, server-side ``` Missing minutes cost little to nothing. Numeric aggregation runs inside Redis. Telemetry, IoT, meter readings, KPI rollups. Calendar. A clinic with 96 fifteen-minute slots per day: ``` ARSET sched:room12:day 32 booking:991 ARSCAN sched:room12:day 0 95 # only occupied slots ARGETRANGE sched:room12:day 48 63 # the afternoon full view to render ``` The slot number is the business key in this case. Room booking, parking spaces, warehouse bins, lockers, ... Ring buffer. ARRING replaces the classic LPUSH+LTRIM pattern. Imagine remote `dmesg`. ``` ARRING machine:123 200 "[141087.430123]: arm_cpu_init(): cpu 14 online" # Capped to 200 entries ARLASTITEMS machine:123 50 REV # 50 newest first ``` Faster than LPUSH+LTRIM, keep indexed access to past elements. Last-N alarms, recent fraud scores, access history, remote logs, device events. Ok here the use cases are mainly the ones of the old pattern: it is just a better fit and allows to access random items in the middle, aggregate server-side, and so forth. Workflow. Step number is the index, value is the status. Gaps are meaningful: ``` ARSET claim:99172 0 received ARSET claim:99172 3 waiting:reviewer42 ARSET claim:99172 5 approved ARGETRANGE claim:99172 0 5 # full workflow view, with NULLs for missing steps ARSCAN claim:99172 0 5 # only steps that have a state ARCOUNT claim:99172 # number of recorded steps ARLEN claim:99172 # highest reached step + 1 ``` Skills knowledge base for agents. Arrays are good at representing / grepping into Markdown files: ``` ARSET skill:metal_gpu 0 "...." ARSET skill:metal_gpu 1 "...." ARSET skill:metal_gpu 2 "...." ARGREP skill:metal_gpu - + RE "M3\|M4" WITHVALUES ``` ARGREP has EXACT, MATCH, GLOB, RE, you can have multiple predicates, can select AND or OR behavior. Bulk import results. Sparse row annotations over millions of rows / CSV / ...: ``` ARSET import:job551 18552 ERR:bad_email ARSCAN import:job551 0 1000000 # Provides only rows that have something ``` ## TLDR If the position is part of the meaning, use an Array. If you want to aggregate or grep remotely, use an Array. Feedback welcome :) --------- Co-authored-by: debing.sun <debing.sun@redis.com> Co-authored-by: Shubham S Taple <155555100+ShubhamTaple@users.noreply.github.com> Co-authored-by: Yuan Wang <yuan.wang@redis.com> Co-authored-by: Marc Gravell <marc.gravell@gmail.com>	2026-05-14 00:56:44 +08:00
Sergei Georgiev	5e155593d9	Reject corrupt stream payloads with mismatched entry counts (#15124 ) ## Summary The stream RDB loader and listpack integrity validator had two gaps that allowed corrupted payloads to be silently accepted, potentially leading to crashes or incorrect behaviour at query time. 1. `deleted_count` in the listpack header was trusted without verification (`t_stream.c`) `streamValidateListpackIntegrity` already walks every entry in a stream listpack during deep validation, inspecting each entry's flags. However, the `deleted_count` value stored in the listpack header was never cross-checked against the actual number of entries carrying `STREAM_ITEM_FLAG_DELETED`. A corrupted or crafted payload could set an arbitrary `deleted_count`, causing the `entry_count` (live) and `deleted_count` to be inconsistent with the real data. This PR adds a running `actual_deleted` counter during the entry walk and rejects the listpack when it disagrees with the header. 2. `s->length` was only loosely validated against the rax (`rdb.c`) The old check (`s->length && !raxSize(s->rax)`) only caught the degenerate case where the stream claimed a non-zero length but had zero rax nodes. It did not detect a mismatch where the stream's `length` field differed from the sum of live (non-tombstone) entries across all listpacks. A corrupted payload could, for example, report `length = 2` while every listpack's live-entry count adds up to only 1, bypassing the check entirely and causing incorrect results from `XLEN`, `XRANGE`, `XREAD`, etc. This PR accumulates `live_entries` from each listpack's first element (the live-entry count) during the rax-loading loop and then performs an exact equality check (`s->length != live_entries`). The live-entry count itself is also validated against the in-memory invariant maintained by `streamIteratorRemoveEntry`: when the last live entry of a listpack is deleted, the whole rax node is removed, so a node in the rax must have `lp_live >= 1`. The loader enforces this by rejecting `lp_live <= 0` (not just `< 0`). Without this stricter bound, a payload where every listpack has `lp_live = 0` and `s->length = 0` would pass the equality check (`0 == 0`) and load into an inconsistent state: non-empty rax containing only-tombstone listpacks, with `XLEN` reporting 0. 3. Tests Three new `corrupt-dump.tcl` tests exercise the new rejection paths: - stream listpack with wrong deleted count in header — a crafted payload where the header says `deleted_count = 1` but the only entry is live, caught by the new `actual_deleted` check in `streamValidateListpackIntegrity`. Requires `sanitize-dump-payload yes` since the check runs only during deep validation. - stream length inconsistent with live entries — a crafted payload where the listpack reports `lp_live = 1` (so the `lp_live <= 0` guard passes) but `s->length = 2`, caught by the `s->length != live_entries` check in `rdbLoadObject`. - stream all-tombstone listpack with zero length — a crafted payload where the listpack header reports `lp_live = 0` and `s->length = 0`. This case slips past the equality check (`0 == 0`) and is uniquely caught by the tightened `lp_live <= 0` rejection at the listpack header.	2026-05-13 10:28:31 +03:00
Moti Cohen	5a05863e97	t_string: rewrite SET GET propagation in place (#15114 ) Optimize SET key value GET propagation rewriting in setGenericCommand() by removing GET arguments in-place with rewriteClientCommandArgument(). This avoids the overhead of allocating a new argv vector and incrementing reference counts for every retained argument. The optimization is scoped to the no-expire SET ... GET rewrite path. It also adds test coverage for cases with repeated GET tokens to ensure robust string semantics and consistent replication behavior. Changes: - Use rewriteClientCommandArgument(c, j, NULL) for in-place removal. - Eliminate redundant argv allocations and refcount increments. - Improve performance of SET GET in high-throughput write streams.	2026-04-28 10:05:59 +03:00
sggeorgiev	47c51369ee	Reject corrupt stream RDB with shared NACK across consumers (#15081 ) Some checks failed CI / test-ubuntu-latest (push) Has been cancelled Details CI / test-sanitizer-address (push) Has been cancelled Details CI / build-debian-old (push) Has been cancelled Details CI / build-macos-latest (push) Has been cancelled Details CI / build-32bit (push) Has been cancelled Details CI / build-libc-malloc (push) Has been cancelled Details CI / build-centos-jemalloc (push) Has been cancelled Details CI / build-old-chain-jemalloc (push) Has been cancelled Details Codecov / code-coverage (push) Has been cancelled Details External Server Tests / test-external-standalone (push) Has been cancelled Details External Server Tests / test-external-cluster (push) Has been cancelled Details External Server Tests / test-external-nodebug (push) Has been cancelled Details Spellcheck / Spellcheck (push) Has been cancelled Details Summary Detects and rejects corrupt stream RDB payloads where the same NACK (pending entry) is referenced by more than one consumer, which violates a stream data-structure. Changes - `rdbLoadObject` (stream consumer PEL loading): Added a guard that checks `nack->consumer != NULL` before assigning the consumer pointer. When a second consumer's PEL references a NACK that was already claimed by a prior consumer, the loader now reports a corrupt RDB error and aborts instead of silently overwriting the pointer. Without this check, two consumers share the same `streamNACK`, and freeing the first consumer's PEL leaves the second with a dangling pointer. - `corrupt-dump.tcl`: Added a regression test that crafts a stream with two consumers (`consumerA`, `consumerB`) whose PELs both reference the same entry (`1-0`). The `RESTORE` command is expected to fail with `"Bad data format"`, and the server must remain responsive (`PING` succeeds). Benefits - Fail-fast on corrupt data: The invariant violation is caught at load time with a clear diagnostic message rather than manifesting as a crash later during normal operation. - Regression coverage: The crafted payload in the test ensures this class of corruption is permanently guarded against.	2026-04-23 15:46:48 +03:00
Vitah Lin	15cb40dac2	Fix command-docs and corrupt-dump-fuzzer of OBJ_GCRA type (#15055 ) ### Problem While the new type `OBJ_GCRA` was added, several related code paths were not updated accordingly, leading to failures in the `reply-schemas-validator` CI job and `corrupt-dump-fuzzer.tcl` ##### reply-schemas-validator Failed CI: https://github.com/redis/redis/actions/runs/24485248057/job/71558533290#step:10:903 ```shell Traceback (most recent call last): File "/home/runner/work/redis/redis/./utils/req-res-log-validator.py", line 238, in process_file jsonschema.validate(instance=res.json, schema=req.schema, cls=schema_validator) File "/home/runner/.local/lib/python3.12/site-packages/jsonschema/validators.py", line 1121, in validate raise error jsonschema.exceptions.ValidationError: 'rate_limit' is not valid under any of the given schemas Failed validating 'oneOf' in schema['patternProperties']['^.*$']['properties']['group']: {'description': 'the functional group to which the command belongs', 'oneOf': [{'const': 'bitmap'}, {'const': 'cluster'}, {'const': 'connection'}, {'const': 'generic'}, {'const': 'geo'}, {'const': 'hash'}, {'const': 'hyperloglog'}, {'const': 'list'}, {'const': 'module'}, {'const': 'pubsub'}, {'const': 'scripting'}, {'const': 'sentinel'}, {'const': 'server'}, {'const': 'set'}, {'const': 'sorted-set'}, {'const': 'stream'}, {'const': 'string'}, {'const': 'transactions'}]} On instance['gcrasetvalue']['group']: 'rate_limit' ``` ##### `corrupt-dump-fuzzer.tcl` Also fixed `: Fuzzer corrupt restore payloads - sanitize_dump: yes in tests/integration/corrupt-dump-fuzzer.tcl` Failed daily test : https://github.com/redis/redis/actions/runs/24485248057/job/71558533312#step:6:8652 ```shell Server crashed (by signal: 0, err: key "gcra" not known in dictionary), with payload: "\x1C\x0A\x02\x5F\x37\xC0\x06\xC0\x00\x02\x5F\x39\xC0\x08\x02\x5F\x33\x02\x5F\x35\x02\x5F\x31\xC0\x02\xC0\x04\x0E\x00\xA9\x71\xBF\xEE\x6F\x46\xEF\xA6" violating commands: Done 1434 cycles in 600 seconds. RESTORE: successful: 601, rejected: 833 Total commands sent in traffic: 1194776, crashes during traffic: 1 (0 by signal). [: Fuzzer corrupt restore payloads - sanitize_dump: yes in tests/integration/corrupt-dump-fuzzer.tcl Expected '1' to be equal to '0' (context: type eval line 155 cmd {assert_equal $stat_terminated_in_traffic 0} proc ::test) [147/147 done]: integration/corrupt-dump-fuzzer (1201 seconds) ``` ### Changed This change completes the necessary updates across all relevant components to ensure consistent handling of the rate_limit group and restores CI stability.	2026-04-17 10:30:43 +03:00
debing.sun	ca6e471a3f	Fix decrRefCount on NULL robj on corrupt KEY_META payload (#15034 ) ## Summary This PR fixes two issues when processing corrupt data in rdbLoadCheckModuleValue(): 1. When handling `RDB_MODULE_OPCODE_STRING` opcode, rdbGenericLoadStringObject() can return NULL on a corrupt payload. The code called decrRefCount(o) unconditionally without a NULL check, resulting in a NULL pointer dereference crash. 2. The while loop condition was `!= RDB_MODULE_OPCODE_EOF`, which means a truncated payload (causing rdbLoadLen to return RDB_LENERR) would never exit the loop, since `RDB_LENERR != RDB_MODULE_OPCODE_EOF` is always true, potentially causing an infinite hang.	2026-04-16 21:50:49 +08:00
Yuan Wang	2f1a8b2bad	Dismiss dict bucket arrays in fork child to reduce CoW (#14979 ) During RDB saving and AOF rewriting, the fork child already dismisses (madvise(MADV_DONTNEED)) individual key-value objects after serializing them. However, the hash table bucket arrays of each dict were never dismissed, leaving large contiguous allocations subject to CoW when the parent modifies them. This PR extends the dismiss mechanism to cover dict bucket arrays, reducing CoW memory overhead. - Expires kvstore — dismissed upfront before saving starts, since the child never accesses expires directly, after embeding expire time in the key object. - Slot dicts (cluster mode) — dismissed per-slot as the iterator moves to the next slot during RDB saving or AOF rewriting. - DB keys kvstore (standalone mode) — dismissed per-DB after each DB is fully serialized during RDB saving or AOF rewriting.	2026-04-15 20:34:36 +08:00
Alessio Attilio	1abd489d07	Skip memory prefetch during loading to avoid crash in dictEmpty callback (#14848 ) Some checks failed CI / test-ubuntu-latest (push) Has been cancelled Details CI / test-sanitizer-address (push) Has been cancelled Details CI / build-debian-old (push) Has been cancelled Details CI / build-macos-latest (push) Has been cancelled Details CI / build-32bit (push) Has been cancelled Details CI / build-libc-malloc (push) Has been cancelled Details CI / build-centos-jemalloc (push) Has been cancelled Details CI / build-old-chain-jemalloc (push) Has been cancelled Details Codecov / code-coverage (push) Has been cancelled Details External Server Tests / test-external-standalone (push) Has been cancelled Details External Server Tests / test-external-cluster (push) Has been cancelled Details External Server Tests / test-external-nodebug (push) Has been cancelled Details Spellcheck / Spellcheck (push) Has been cancelled Details Fixes #14838 ## Summary Fix a crash in `prefetchCommands()` that occurs during replica full sync when the replica has existing data that needs to be emptied. ## Problem Description During `emptyData()` → `kvstoreEmpty()` → `dictEmpty()` → `_dictClear()`, the first hash table is cleared and `d->ht_table[0]` is set to NULL via `_dictReset`. Then while clearing the second hash table, every 65536 buckets it invokes `replicationEmptyDbCallback()` → `processEventsWhileBlocked()` → `readQueryFromClient()` → `prefetchCommands()`. At this point, `dictSize() > 0` still holds (because the second hash table isn't fully cleared yet), but `ht_table[0]` is already NULL. The prefetch code assumed `ht_table[0]` is always valid when `dictSize() > 0`, leading to a crash. ## Solution 1. Skip prefetch during loading: Added a `server.loading` check at the top of `prefetchCommands()` to return early. During RDB loading, the main dictionary is being rebuilt, so prefetching keys from it is useless anyway. 2. Add defensive assertion: Added `serverAssert(batch->current_dicts[i]->ht_table[0])` in `initBatchInfo()` to catch any future cases where `ht_table[0]` is NULL while `dictSize() > 0` (which should only happen mid-`dictEmpty` via `_dictReset`). --------- Co-authored-by: kairosci <kairosci@users.noreply.github.com> Co-authored-by: debing.sun <debing.sun@redis.com> Co-authored-by: Yuan Wang <yuan.wang@redis.com>	2026-03-24 10:22:58 +08:00
h.o.t. neglected	4ecc07fcff	Fix divide-by-zero in redis-benchmark and redis-cli (#14371 ) Some checks failed CI / test-ubuntu-latest (push) Waiting to run Details CI / test-sanitizer-address (push) Waiting to run Details CI / build-debian-old (push) Waiting to run Details CI / build-macos-latest (push) Waiting to run Details CI / build-32bit (push) Waiting to run Details CI / build-libc-malloc (push) Waiting to run Details CI / build-centos-jemalloc (push) Waiting to run Details CI / build-old-chain-jemalloc (push) Waiting to run Details Codecov / code-coverage (push) Waiting to run Details External Server Tests / test-external-standalone (push) Waiting to run Details External Server Tests / test-external-cluster (push) Waiting to run Details External Server Tests / test-external-nodebug (push) Waiting to run Details Spellcheck / Spellcheck (push) Waiting to run Details Reply-schemas linter / reply-schemas-linter (push) Has been cancelled Details Resolve https://github.com/redis/redis/issues/14358 and https://github.com/redis/redis/issues/14359. Fix division by zero in redis-benchmark and redis-cli when histograms are empty or elapsed time is zero. Guard all affected divisions in showLatencyReport(), showThroughput(), and displayKeyStatsSizeDist(). Added integration tests for both cases. --------- Co-authored-by: debing.sun <debing.sun@redis.com>	2026-03-18 19:57:58 +08:00
Zijie Zhao	4f0d3118e1	Fix listpack memory leak in zipmap-to-hash conversion on error path (#14878 ) When loading RDB_TYPE_HASH_ZIPMAP objects, a listpack is allocated for the zipmap-to-listpack conversion. If the conversion loop encounters a duplicate key, allocation failure, or oversized listpack, the error handler frees the dict, field, encoded buffer, and object — but not the listpack. Add the missing lpFree(lp) call and a regression test that RESTOREs a zipmap with duplicate keys to exercise this path.	2026-03-16 14:12:03 +08:00
Cong Chen	b3ce4c28ca	Fix test assertion except from TSAN case (#14852 ) Some checks failed CI / test-ubuntu-latest (push) Has been cancelled Details CI / test-sanitizer-address (push) Has been cancelled Details CI / build-debian-old (push) Has been cancelled Details CI / build-macos-latest (push) Has been cancelled Details CI / build-32bit (push) Has been cancelled Details CI / build-libc-malloc (push) Has been cancelled Details CI / build-centos-jemalloc (push) Has been cancelled Details CI / build-old-chain-jemalloc (push) Has been cancelled Details Codecov / code-coverage (push) Has been cancelled Details External Server Tests / test-external-standalone (push) Has been cancelled Details External Server Tests / test-external-cluster (push) Has been cancelled Details External Server Tests / test-external-nodebug (push) Has been cancelled Details Spellcheck / Spellcheck (push) Has been cancelled Details Fix #14835 Executed `failover` with `force` argument in slow environments (TSAN, IO threads, etc) can force a full resync instead of a partial one. `e3c38aab6` adds TSAN-only workaround in integration/failover test, test failure occurs in `test‑ubuntu‑io‑threads` job, drop TSAN-only condition and always check the sum of partial+full syncs will fix it. Steps to reproduce: My setup: Hardware: MacBook Pro (Apple chip) 1. Build docker image with Dockerfile below and run the container: ``` FROM ubuntu:22.04 # avoid interactive prompts ENV DEBIAN_FRONTEND=noninteractive RUN apt-get update && \ apt-get install -y --no-install-recommends \ build-essential \ tcl8.6 \ tclx \ tcl-tls \ pkg-config \ ca-certificates \ git \ bash \ vim \ wget \ curl \ && rm -rf /var/lib/apt/lists/* WORKDIR /opt/redis # if you prefer to copy the source during build uncomment the next line # COPY . /opt/redis # configure a non-root user if desired; we stay root for simplicity # default command keeps you in a shell CMD ["/bin/bash"] ``` Execute the command: ``` cd /path/to/redis docker build -f Dockerfile.io-threads -t redis-test-io . docker run --rm -it --cpuset-cpus=0 --cpus=0.1 -v $(pwd):/opt/redis redis-test-io /bin/bash ``` 2. Inside the container you can then run the same commands as the CI job: ``` make REDIS_CFLAGS='-Werror' ./runtest --single integration/failover --config io-threads 4 --tags "failover" --verbose --accurate --loop --stop ``` If the container still seems too fast, lower `--cpus` further (e.g. 0.05), or run the whole driver under `cpulimit`: ``` apt-get update && apt-get install -y cpulimit cpulimit -l 1 -- ./runtest --single integration/failover --config io-threads 4 --tags "failover" --verbose --accurate --loop --stop ```	2026-03-06 13:11:38 +02:00
Vitah Lin	6088387b6f	Capture and report server crash after RESTORE in corrupt-dump-fuzzer test (#14836 ) Some checks are pending CI / test-ubuntu-latest (push) Waiting to run Details CI / test-sanitizer-address (push) Waiting to run Details CI / build-debian-old (push) Waiting to run Details CI / build-macos-latest (push) Waiting to run Details CI / build-32bit (push) Waiting to run Details CI / build-libc-malloc (push) Waiting to run Details CI / build-centos-jemalloc (push) Waiting to run Details CI / build-old-chain-jemalloc (push) Waiting to run Details Codecov / code-coverage (push) Waiting to run Details External Server Tests / test-external-standalone (push) Waiting to run Details External Server Tests / test-external-cluster (push) Waiting to run Details External Server Tests / test-external-nodebug (push) Waiting to run Details Spellcheck / Spellcheck (push) Waiting to run Details ## Problem In `corrupt-dump-fuzzer.tcl`, after a successful `RESTORE` with a corrupted payload, the test did a bare `r ping` to check whether the server was still alive: If the server crashed at this point, the uncaught exception would silently terminate the entire test without printing the payload that caused the crash, making the issue extremely hard to reproduce and debug. ## Fix Wrap `r ping` with catch to ensure failures trigger the same reporting and restart logic as `RESTORE`. This ensures offending payloads are logged for reproduction.	2026-03-03 09:33:52 +08:00
Yuan Wang	099203cb2a	Fix DB hash tables not expanding during RDB load (#14789 ) In standalone mode, we create dicts on demand. by default, there is no dicts, and RESIZEDB hint doesn't expand dict actually. so we should create dict first when expanding if the dict does not exist.	2026-02-14 15:18:08 +08:00
Mincho Paskalev	e3c38aab66	Handle primary/replica clients in IO threads (#14335 ) # Problem While introducing Async IO threads(https://github.com/redis/redis/pull/13695) primary and replica clients were left to be handled inside main thread due to data race and synchronization issues. This PR solves this issue with the additional hope it increases performance of replication. # Overview ## Moving the clients to IO threads Since clients first participate in a handshake and an RDB replication phases it was decided they are moved to IO-thread after RDB replication is done. For primary client this was trivial as the master client is created only after RDB sync (+ some additional checks one can see in `isClientMustHandledByMainThread`). Replica clients though are moved to IO threads immediately after connection (as are all clients) so currently in `unstable` replication happens while this client is in IO-thread. In this PR it was moved to main thread after receiving the first `REPLCONF` message from the replica, but it is a bit hacky and we can remove it. I didn't find issues between the two versions. ## Primary client (replica node) We have few issues here: - during `serverCron` a `replicationCron` is ran which periodically sends `REPLCONF ACK` message to the master, also checks for timed-out master. In order to prevent data races we utilize`IOThreadClientsCron`. The client is periodically sent to main thread and during `processClientsFromIOThread` it's checked if it needs to run the replication cron behaviour. - data races with main thread - specifically `lastinteraction` and `read_reploff` members of the primary client that are written to in `readQueryFromClient` could be accessed at the same time from main thread during execution of `INFO REPLICATION`(`genRedisInfoString`). To solve this the members were duplicated so if the client is in IO-thread it writes to the duplicates and they are synced with the original variables each time the client is send to main thread ( that means `INFO REPLICATION` could potentially return stale values). - During `freeClient` the primary client is fetched to main thread but when caching it(`replicationCacheMaster`) the thread id will remain the id of the IO thread it was from. This creates problems when resurrecting the master client. Here the call to `unbindClientFromIOThreadEventLoop` in `freeClient` was rewritten to call `keepClientInMainThread` which automatically fixes the problem. - During `exitScriptTimedoutMode` the master is queued for reprocessing (specifically process any pending commands ASAP after it's unblocked). We do that by putting it in the `server.unblocked_clients` list, which are processed in the next `beforeSleep` cycle in main thread. Since this will create a contention between main and IO thread, we just skip this queueing in `unblocked_clients` and just queue the client to main thread - the `processClientsFromIOThread` will process the pending commands just as main would have. ## Replica clients (primary node) We move the client after RDB replication is done and after replication backlog is fed with its first message. We do that so that the client's reference to the first replication backlog node is initialized before it's read from IO-thread, hence no contention with main thread on it. ### Shared replication buffer Currently in unstable the replication buffer is shared amongst clients. This is done via clients holding references to the nodes inside the buffer. A node from the buffer can be trimmed once each replica client has read it and send its contents. The reference is `client->ref_repl_buf_node`. The replication buffer is written to by main thread in `feedReplicationBuffer` and the refcounting is intrusive - it's inside the replication-buffer nodes themselves. Since the replica client changes the refcount (decreases the refcount of the node it has just read, and increases the refcount of the next node it starts to read) during `writeToClient` we have a data race with main thread when it feeds the replication buffer. Moreover, main thread also updates the `used` size of the node - how much it has written to it, compared to its capacity which the replica client relies on to know how much to read. Obviously replica being in IO-thread creates another data race here. To mitigate these issues a few new variables were added to the client's struct: - `io_curr_repl_node` - starting node this replica is reading from inside IO-thread - `io_bound_repl_node` - the last node in the replication buffer the replica sees before being send to IO-thread. These values are only allowed to be updated in main thread. The client keeps track of how much it has read into the buffer via the old `ref_repl_buf_node`. Generally while in IO-thread the replica client will now keep refcount of the `io_curr_repl_node` until it's processed all the nodes up to `io_bound_repl_node` - at that point its returned to main thread which can safely update the refcounts. The `io_bound_repl_node` reference is there so the replica knows when to stop reading from the repl buffer - imagine that replica reads from the last node of the replication buffer while main thread feeds data to it - we will create a data race on the `used` value (`_writeToClientSlave`(IO-thread) vs `feedReplicationBuffer`(main)). That's why this value is updated just before the replica is being send to IO thread. NOTE, this means that when replicas are handled by IO threads they will hold more than one node at a time (i.e `io_curr_repl_node` up to `io_bound_repl_node`) meaning trimming will happen a bit less frequently. Tests show no significant problems with that. (tnx to @ShooterIT for the `io_curr_repl_node` and `io_bound_repl_node` mechanism as my initial implementation had similar semantics but was way less clear) Example of how this works: * Replication buffer state at time N: \| node 0\| ... \| node M, used_size K \| * replica caches `io_curr_repl_node`=0, `io_bound_repl_node`=M and `io_bound_block_pos`=K * replica moves to IO thread and processes all the data it sees * Replication buffer state at time N + 1: \| node 0\| ... \| node M, used_size Full \| \|node M + 1\| \|node M + 2, used_size L\|, where Full > M * replica moves to main thread at time N + 1, at this point following happens - refcount to node 0 (io_curr_repl_node) is decreased - `ref_repl_buf_node` becomes node M(io_bound_repl_node) (we still have size-K bytes to process from there) - refcount to node M is increased (now all nodes from 0 up to M-1 including can be trimmed unless some other replica holds reference to them) - And just before the replica is send back to IO thread the following are updated: - `io_bound_repl_node` ref becomes node M+2 - `io_bound_block_pos` becomes L Note that replica client is only moved to main if it has processed all the data it knows about (i.e up to `io_bound_repl_node` + `io_bound_block_pos`) ### Replica clients kept in main as much as possible During implementation an issue arose - how fast is the replica client able to get knowledge about new data from the replication buffer and how fast can it trim it. In order for that to happen ASAP whenever a replica is moved to main it remains there until the replication buffer is fed new data. At that point its put in the pending write queue and special cased in handleClientsWithPendingWrites so that its send to IO thread ASAP to write the new data to replica. Also since each time the replica writes its whole repl data it knows about that means after it's send to main thread `processClientsFromIOThread` is able to immediately update the refcounts and trim whatever it can. ### ACK messages from primary Slave clients need to periodically read `REPLCONF ACK` messages from client. Since replica can remain in main thread indefinitely if no DB change occurs, a new atomic `pending_read` was added during `readQueryFromClient`. If a replica client has a pending read it's returned back to IO-thread in order to process the read even if there is no pending repl data to write. ### Replicas during shutdown During shutdown the main thread pauses write actions and periodically checks if all replicas have reached the same replication offset as the primary node. During `finishShutdown` that may or may not be the case. Either way a client data may be read from the replicas and even we may try to write any pending data to them inside `flushSlavesOutputBuffers`. In order to prevent races all the replicas from IO threads are moved to main via `fetchClientFromIOThread`. A cancel of the shutdown should be ok, since the mechanism employed by `handleClientsWithPendingWrites` should return the client back to IO thread when needed. ## Notes While adding new tests timing issues with Tsan tests were found and fixed. Also there is a data race issue caught by Tsan on the `last_error` member of the `client` struct. It happens when both IO-thread and main thread make a syscall using a `client` instance - this can happen only for primary and replica clients since their data can be accessed by commands send from other clients. Specific example is the `INFO REPLICATION` command. Although other such races were fixed, as described above, this once is insignificant and it was decided to be ignored in `tsan.sup`. --------- Co-authored-by: Yuan Wang <wangyuancode@163.com> Co-authored-by: Yuan Wang <yuan.wang@redis.com>	2026-01-21 16:19:12 +02:00
debing.sun	e76e3af5b7	Fix some test timing issues in replication.tcl and maxmemory.tcl (#14718 ) 1) Replace fixed sleep with wait_for_condition to avoid flaky test failures when checking master_current_sync_attempts counter. 2) Similar to https://github.com/redis/redis/pull/14674, use assert_lessthan_equal instead of assert_lessthan to verify the idle time.	2026-01-20 19:25:15 +08:00
debing.sun	4eda670de9	Fix infinite loop during reverse iteration due to invalid numfields of corrupted stream (#14472 ) Follow https://github.com/redis/redis/pull/14423 In https://github.com/redis/redis/pull/14423, I thought the last lpNext operation of the iterator occurred at the end of streamIteratorGetID. However, I overlooked the fact that after calling `streamIteratorGetID()`, we might still use `streamIteratorGetField()` to continue moving within the current entry. This means that during reverse iteration, the iterator could move back to a previous entry position. To fix this, in this PR I record the current position at the beginning of streamIteratorGetID(). When we enter it again next time, we ensure that the entry position does not exceed the previous one, that is, during forward iteration the entry must be greater than the last entry position, and during reverse iteration it must be smaller than the last entry position. Note that the fix for https://github.com/redis/redis/pull/14423 has been replaced by this fix.	2026-01-05 21:16:53 +08:00
Yuan Wang	f3316c3a1a	Introduce flushdb option for repl-diskless-load (#14596 ) Some checks are pending CI / test-ubuntu-latest (push) Waiting to run Details CI / test-sanitizer-address (push) Waiting to run Details CI / build-debian-old (push) Waiting to run Details CI / build-macos-latest (push) Waiting to run Details CI / build-32bit (push) Waiting to run Details CI / build-libc-malloc (push) Waiting to run Details CI / build-centos-jemalloc (push) Waiting to run Details CI / build-old-chain-jemalloc (push) Waiting to run Details Codecov / code-coverage (push) Waiting to run Details External Server Tests / test-external-standalone (push) Waiting to run Details External Server Tests / test-external-cluster (push) Waiting to run Details External Server Tests / test-external-nodebug (push) Waiting to run Details Spellcheck / Spellcheck (push) Waiting to run Details `repl-diskless-load` feature can effectively reduce the time of full synchronization, but maybe it is not widely used. `swapdb` option needs double `maxmemory`, and `on-empty-db` only works on the first full sync (the replica must have no data). This PR introduce a new option: `flushdb` - Always flush the entire dataset before diskless load. If the diskless load fails, the replica will lose all existing data. Of course, it brings the risk of data loss, but it provides a choice if you want to reduce full sync time and accept this risk.	2025-12-15 11:25:53 +08:00
Yuan Wang	cb71dec0c3	Disable RDB compression when diskless replication is used (#14575 ) Some checks failed CI / test-ubuntu-latest (push) Has been cancelled Details CI / test-sanitizer-address (push) Has been cancelled Details CI / build-debian-old (push) Has been cancelled Details CI / build-macos-latest (push) Has been cancelled Details CI / build-32bit (push) Has been cancelled Details CI / build-libc-malloc (push) Has been cancelled Details CI / build-centos-jemalloc (push) Has been cancelled Details CI / build-old-chain-jemalloc (push) Has been cancelled Details Codecov / code-coverage (push) Has been cancelled Details External Server Tests / test-external-standalone (push) Has been cancelled Details External Server Tests / test-external-cluster (push) Has been cancelled Details External Server Tests / test-external-nodebug (push) Has been cancelled Details Spellcheck / Spellcheck (push) Has been cancelled Details Fixes #14538 If the master uses diskless synchronization and the replica uses diskless load, we can disable RDB compression to reduce full sync time. I tested on AWS and found we could reduce time by 20-40%. In terms of implementation, when the replica can use diskless load, the replica will send `replconf rdb-no-compress 1` to master to deliver a RDB without compression. If your network is slow, please disable repl-diskless-load, and maybe even repl-diskless-sync --------- Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>	2025-12-04 09:24:23 +08:00
lihp	0288d70820	Fixes an issue where EXEC checks ACL during AOF loading (#14545 ) Some checks failed CI / test-ubuntu-latest (push) Has been cancelled Details CI / test-sanitizer-address (push) Has been cancelled Details CI / build-debian-old (push) Has been cancelled Details CI / build-macos-latest (push) Has been cancelled Details CI / build-32bit (push) Has been cancelled Details CI / build-libc-malloc (push) Has been cancelled Details CI / build-centos-jemalloc (push) Has been cancelled Details CI / build-old-chain-jemalloc (push) Has been cancelled Details Codecov / code-coverage (push) Has been cancelled Details External Server Tests / test-external-standalone (push) Has been cancelled Details External Server Tests / test-external-cluster (push) Has been cancelled Details External Server Tests / test-external-nodebug (push) Has been cancelled Details Spellcheck / Spellcheck (push) Has been cancelled Details This PR fixes an issue(#14541) where EXEC’s ACL recheck was still being performed during AOF loading, that may cause AOF loading failed, if ACL rules are changed and don't allow some commands in MULTI-EXEC.	2025-11-22 11:52:31 +08:00
Alexey Weyl	b5f106110a	Add rdb_consecutive_write_failures metric (#14493 ) Some checks failed CI / test-ubuntu-latest (push) Has been cancelled Details CI / test-sanitizer-address (push) Has been cancelled Details CI / build-debian-old (push) Has been cancelled Details CI / build-macos-latest (push) Has been cancelled Details CI / build-32bit (push) Has been cancelled Details CI / build-libc-malloc (push) Has been cancelled Details CI / build-centos-jemalloc (push) Has been cancelled Details CI / build-old-chain-jemalloc (push) Has been cancelled Details Codecov / code-coverage (push) Has been cancelled Details External Server Tests / test-external-standalone (push) Has been cancelled Details External Server Tests / test-external-cluster (push) Has been cancelled Details External Server Tests / test-external-nodebug (push) Has been cancelled Details Spellcheck / Spellcheck (push) Has been cancelled Details Currently, it is difficult to determine whether an RDB persistence failure is caused by a persistent issue in the process or by a temporary error. Tracking the number of consecutive RDB persistence failures would help clarify this distinction. Multiple failures in a row could indicate an ongoing problem, while isolated cases may point to transient or environmental issues.	2025-10-31 14:03:40 +08:00
Stav-Levi	6431de3c38	Refactor and change config for broken AOF tail on startup (#14349 ) This PR follows https://github.com/redis/redis/pull/14058 When Redis is shut down uncleanly (e.g., due to power loss or system crashes), invalid bytes may remain at the end of AOF files. This PR refactors the AOF corruption handling system and introduces automatic recovery capabilities to improve resilience and reduce downtime. The feature only handles corruption at the end of files, respects the configured size limit to prevent excessive data loss, maintains all valid commands before the corruption point. Removed aof-load-broken configuration option Rename aof-load-broken-max-size to aof-load-corrupt-tail-max-size. New Config: aof-load-corrupt-tail-max-size - enables recovery for corruption up to specified size	2025-10-23 11:33:59 +03:00
debing.sun	166ae60734	Fix infinite loop during reverse iteration due to corrupted stream (#14423 ) Some checks are pending CI / test-ubuntu-latest (push) Waiting to run Details CI / test-sanitizer-address (push) Waiting to run Details CI / build-debian-old (push) Waiting to run Details CI / build-macos-latest (push) Waiting to run Details CI / build-32bit (push) Waiting to run Details CI / build-libc-malloc (push) Waiting to run Details CI / build-centos-jemalloc (push) Waiting to run Details CI / build-old-chain-jemalloc (push) Waiting to run Details Codecov / code-coverage (push) Waiting to run Details External Server Tests / test-external-standalone (push) Waiting to run Details External Server Tests / test-external-cluster (push) Waiting to run Details External Server Tests / test-external-nodebug (push) Waiting to run Details Spellcheck / Spellcheck (push) Waiting to run Details When the `lp_count` of a stream entry is incorrect, and we are performing a reverse iteration, we can normally move backward by `lp_count` to locate the previous entry. However, if `lp_count` is wrong, during forward iteration, after obtaining the flag, ID, and other fields, we may step beyond the current entry into the start position of the next entry(where we came from), which could cause an infinite loop. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-10-15 14:45:55 +08:00
debing.sun	6d89370cd6	Fix shutdown blocked client not being properly reset after shutdown cancellation (#14420 ) This issue was introduced by https://github.com/redis/redis/pull/10440 In that PR, we avoided resetting the current user during processCommand, but overlooked the fact that this client might not be the current one, it could be a client that was previously blocked due to shutdown. If we don’t reset these clients, and the shutdown is canceled, then when these clients continue executing other commands, they will trigger an assertion. This PR delays the operation of resetting the client to processUnblockedClients and no longer skips SHUTDOWN_BLOCKED clients.	2025-10-15 14:13:40 +08:00
debing.sun	60adba48aa	Introduce DEBUG_DEFRAG compilation option to allow run test with activedefrag when allocator is not jemalloc (#14326 ) This PR is based on https://github.com/valkey-io/valkey/pull/1303 This PR introduces a DEBUG_DEFRAG compilation option that enables activedefrag functionality even when the allocator is not jemalloc, and always forces defragmentation regardless of the amount or ratio of fragmentation. ## Using ``` make SANITIZER=address DEBUG_DEFRAG=<force\|fully> ./runtest --debug-defrag ``` * DEBUG_DEFRAG=force * Ignore the threshold for defragmentation to ensure that defragmentation is always triggered. * Always reallocate pointers to probe for correctness issues in pointer reallocation. * DEBUG_DEFRAG=fully * Includes everything in the option `force`. * Additionally performs a full defrag on every defrag cycle, which is significantly slower but more accurate. --------- Co-authored-by: Ran Shidlansik <ranshid@amazon.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: oranagra <oran@redislabs.com>	2025-09-10 12:52:20 +08:00
Stav-Levi	eac48279ad	Add auto-repair options for broken AOF tail on startup (#14058 ) When Redis is shut down uncleanly (e.g., due to power loss), invalid bytes may remain at the end of the AOF file. Currently, Redis detects such corruption only after parsing most of the AOF, leading to delayed error detection and increased downtime. Manual recovery via `redis-check-aof --fix` is also time-consuming. This fix introduces two new options to improve resilience and reduce downtime: - `aof-load-broken`: Enables automatic detection and repair of broken AOF tails. - `aof-load-broken-max-size`: Sets a maximum threshold (in bytes) for the corrupted tail size that Redis will attempt to fix automatically without requiring user intervention.	2025-08-12 10:55:16 +03:00
Ozan Tezcan	e03a4584fa	Fix flaky repl test by increasing key count to capture LOADING reply (#14265 ) Increased replica key count to make flush take longer, ensuring the test catches the `-LOADING` reply reliably.	2025-08-11 09:26:05 +03:00
Stav-Levi	7b40dbacdf	Fix timing issue for correct replication disconnection time counters behavior test (#14221 ) Co-authored-by: debing.sun <debing.sun@redis.com>	2025-08-06 15:07:30 +08:00
Ozan Tezcan	e9d2bf48e0	Skip replication test due to flakiness under TSAN (#14225 )	2025-07-28 17:55:09 +08:00
Yuan Wang	4313d7ff23	Stabilize tests for IO threading (#14138 ) - tests/unit/maxmemory.tcl If multithreaded, we need to let IO threads have chance to reply output buffer, to avoid next commands causing eviction. After eviction is performed, the next command becomes ready immediately in IO threads, and now we enqueue the client to be processed in main thread’s beforeSleep without notification. However, invalidation messages generated by eviction may not have been fully delivered by that time. As a result, executing the command in beforeSleep of the event loop (running eviction) can cause additional keys to be evicted. ``` Expected '73' to be between to '200' and '300' (context: type source line 473 file redis/tests/unit/maxmemory.tcl cmd {assert_range [r dbsize] 200 300} proc ::test) ``` the reason why CI doesn't find this issue is that we skill this test `tsan:skip` as below `start_server {tags {"maxmemory external:skip tsan:skip"}} `,so remove this tag. - tests/integration/aof.tcl Because IO and the main thread are working in better parallelism without notification, the main thread may haven't write AOF buffer into file, but the IO thread just writes the reply, so the clients receive the reply before AOF file is changed. We should use `appendfsync always` policy to make the command is written into AOF file when receiving reply. ``` Expected '0' to be equal to '54' (context: type source line 249 file redis/tests/integration/aof.tcl cmd {assert_equal $before $after} proc ::test) ``` #13969 makes these scenarios easy to appear.	2025-06-25 15:36:40 +08:00
Ozan Tezcan	03816c15f7	Fix short read of hfe key that causes exit() on replica (#14143 ) If replica detects broken connection while reading min expiration time of hfe key, it calls exit(). Fixed it to handle the error gracefully without calling exit. To reproduce the issue, the short-read test was modified to generate many small hfe keys, increasing the likelihood of a connection drop while reading min expiration time: ```tcl for {set k 0} {$k < 50000} {incr k} { for {set i 0} {$i < 1} {incr i} { r hsetex "$k hfe_small" EX [expr {int(rand()10)}] FIELDS 1 [string repeat A [expr {int(rand()10)}]] 0[string repeat A [expr {int(rand()*10)}]] } } ``` We can't have the test use only hfe keys, so a few were added alongside other data. I couldn't reproduce the issue this way but with the test's randomization, it should hit this scenario in one of the runs.	2025-06-23 07:41:30 +03:00
Stav-Levi	51239f75d0	Record the time a replica attempts to connect with master (#13990 ) Merge fork counters with https://github.com/redis/redis/pull/12957 repl_current_sync_attempts - Total number of attempts to connect to a master since the last time we disconnected from a good connection (or a configuration change). any number greater than 1 (even if the link is currently up), indicates an issue. repl_total_sync_attempts - Number of times in current configuration, the replica attempted to sync to a master. (dosent reset on master reconnect.) repl_total_disconnect_time - Total cumulative time we've been disconnected as a replica, visible when the link is up too. master_link_up_since_seconds - Number of seconds since the link is down, just maintain symmetry with master_link_down_since_seconds.	2025-06-22 09:19:26 +03:00
Slavomir Kaslev	b7c6755b1b	Add thread sanitizer run to daily CI (#13964 ) Add thread sanitizer run to daily CI. Few tests are skipped in tsan runs for two reasons: * Stack trace producing tests (oom, `unit/moduleapi/crash`, etc) are tagged `tsan:skip` because redis calls `backtrace()` in signal handler which turns out to be signal-unsafe since it might allocate memory (e.g. glibc 2.39 does it through a call to `_dl_map_object_deps()`). * Few tests become flaky with thread sanitizer builds and don't finish in expected deadlines because of the additional tsan overhead. Instead of skipping those tests, this can improved in the future by allowing more iterations when waiting for tsan builds. Deadlock detection is disabled for now because of tsan limitation where max 64 locks can be taken at once. There is one outstanding (false-positive?) race in jemalloc which is suppressed in `tsan.sup`. Fix few races thread sanitizer reported having to do with writes from signal handlers. Since in multi-threaded setting signal handlers might be called on any thread (modulo pthread_sigmask) while the main thread is running, `volatile sig_atomic_t` type is not sufficient and atomics are used instead.	2025-06-02 10:13:23 +03:00
Ozan Tezcan	a0b22576b8	Fix flaky replication test (#14034 ) - Fix flaky replication test which checks memory usage on master - Fix comments in another replication test	2025-05-13 13:29:27 +03:00
Salvatore Sanfilippo	65e164caff	[Vector sets] More rdb loading fixes (#14032 ) Hi all, this PR fixes two things: 1. An assertion, that prevented the RDB loading from recovery if there was a quantization type mismatch (with regression test). 2. Two code paths that just returned NULL without proper cleanup during RDB loading.	2025-05-12 21:57:38 +03:00
Moti Cohen	e1789e4368	keyspace - Unify key and value & use dict no_value=1 (#13806 ) The idea of packing the key (`sds`), value (`robj`) and optionally TTL into a single struct in memory was mentioned a few times in the past by the community in various flavors. This approach improves memory efficiency, reduces pointer dereferences for faster lookups, and simplifies expiration management by keeping all relevant data in one place. This change goes along with setting keyspace's dict to no_value=1, and saving considerable amount of memory. Two more motivations that well aligned with this unification are: - Prepare the groundwork for replacing EXPIRE scan based implementation and evaluate instead new `ebuckets` data structure that was introduced as part of [Hash Field Expiration feature](https://redis.io/blog/hash-field-expiration-architecture-and-benchmarks/). Using this data structure requires embedding the ExpireMeta structure within each object. - Consider replacing dict with a more space efficient open addressing approach hash table that might rely on keeping a single pointer to object. Before this PR, I POC'ed on a variant of open addressing hash-table and was surprised to find that dict with no_value actually could provide a good balance between performance, memory efficiency, and simplicity. This realization prompted the separation of the unification step from the evaluation of a new hash table to avoid introducing too many changes at once and to evaluate its impact independently before considering replacement of existing hash-table. On an earlier [commit](https://github.com/redis/redis/pull/13683) I extended dict no_value optimization (which saves keeping dictEntry where possible) to be relevant also for objects with even addresses in memory. Combining it with this unification saves a considerable amount of memory for keyspace. # kvobj This PR adopts Valkey’s [packing](`3eb8314be6`) layout and logic for key, value, and TTL. However, unlike Valkey implementation, which retained a common `robj` throughout the project, this PR distinguishes between the general-purpose, overused `robj`, and the new `kvobj`, which embeds both the key and value and used by the keyspace. Conceptually, `robj` serves as a base class, while `kvobj` acts as a derived class. Two new flags introduced into redis object, `iskvobj` and `expirable`: ``` struct redisObject { unsigned type:4; unsigned encoding:4; unsigned lru:LRU_BITS; unsigned iskvobj : 1; /* new flag / unsigned expirable : 1; / new flag / unsigned refcount : 30; / modified: 32bits->30bits / void ptr; }; typedef struct redisObject robj; typedef struct redisObject kvobj; ``` When the `iskvobj` flag is set, the object includes also the key and it is appended to the end of the object. If the `expirable` flag is set, an additional 8 bytes are added to the object. If the object is of type string, and the string is rather short, then it will be embedded as well. As a result, all keys in the keyspace are promoted to be of type `kvobj`. This term attempts to align with the existing Redis object, robj, and the kvstore data structure. # EXPIRE Implementation As `kvobj` embeds expiration time as well, looking up expiration times is now an O(1) operation. And the hash-table of EXPIRE is set now to be `no_value` mode, directly referencing `kvobj` entries, and in turn, saves memory. Next, I plan to evaluate replacing the EXPIRE implementation with the [ebuckets](https://github.com/redis/redis/blob/unstable/src/ebuckets.h) data structure, which would eliminate keyspace scans for expired keys. This requires embedding `ExpireMeta` within each `kvobj` of each key with expiration. In such implementation, the `expirable` flag will be shifted to indicate whether `ExpireMeta` is attached. # Implementation notes ## Manipulating keyspace (find, modify, insert) Initially, unifying the key and value into a single object and storing it in dict with `no_value` optimization seemed like a quick win. However, it (quickly) became clear that this change required deeper modifications to how keys are manipulated. The challenge was handling cases where a dictEntry is opt-out due to no_value optimization. In such cases, many of the APIs that return the dictEntry from a lookup become insufficient, as it just might be the key itself. To address this issue, a new-old approach of returning a "link" to the looked-up key's `dictEntry` instead of the `dictEntry` itself. The term `link` was already somewhat available in dict API, and is well aligned with the new dictEntLink declaration: ``` typedef dictEntry *dictEntLink; ``` This PR introduces two new function APIs to dict to leverage returned link from the search: ``` dictEntLink dictFindLink(dict d, const void key, dictEntLink bucket); void dictSetKeyAtLink(dict d, void key, dictEntLink link, int newItem); ``` After calling `link = dictFindLink(...)`, any necessary updates must be performed immediately after by calling `dictSetKeyAtLink()` without any intervening operations on given dict. Otherwise, `dictEntLink` may become invalid. Example: ``` / replace existing key / link = dictFindLink(d, key, &bucket, 0); // ... Do something, but don't modify the dict ... // assert(link != NULL); dictSetKeyAtLink(d, kv, &link, 0); / Add new value (If no space for the new key, dict will be expanded and bucket will be looked up again.) / link = dictFindLink(d, key, &bucket); // ... Do something, but don't modify the dict ... // assert(link == NULL); dictSetKeyAtLink(d, kv, &bucket, 1); ``` ## dict.h - The dict API has became cluttered with many unused functions. I have removed these from dict.h. - Additionally, APIs specifically related to hash maps (no_value=0), primarily those handling key-value access, have been gathered and isolated. - Removed entirely internal functions ending with “ByHash()” that were originally added for optimization and not required any more. - Few other legacy dict functions were adapted at API level to work with the term dictEntLink as well. - Simplified and generalized an optimization that related to comparison of length of keys of type strings. ## Hash Field Expiration Until now each hash object with expiration on fields needed to maintain a reference to its key-name (of the hash object), such that in case it will be active-expired, then it will be possible to resolve the key-name for the notification sake. Now there is no need anymore. --------- Co-authored-by: debing.sun <debing.sun@redis.com>	2025-05-12 10:15:17 +03:00
Yuan Wang	6a436b6f72	Redis-cli gets RDB by RDB channel (#13809 ) Now we have RDB channel in https://github.com/redis/redis/pull/13732, child process can transfer RDB in a background method, instead of handled by main thread. So when redis-cli gets RDB from server, we can adopt this way to reduce the main thread load. --------- Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>	2025-05-08 08:47:29 +08:00
Salvatore Sanfilippo	a46624e10e	[Vector sets] RDB IO errors handling (#13978 ) This PR adds support for REDISMODULE_OPTIONS_HANDLE_IO_ERRORS. and tests for short read and corrupted RESTORE payload. Please: note that I also removed the comment about async loading support since we should be already covered. No manipulation of global data structures in Vector Sets, if not for the unique ID used to create new vector sets with different IDs.	2025-05-07 21:49:00 +03:00
Pieter Cailliau	d65102861f	Adding AGPLv3 as a license option to Redis! (#13997 ) Read more about [the new license option](http://redis.io/blog/agplv3/) and [the Redis 8 release](http://redis.io/blog/redis-8-ga/).	2025-05-01 14:04:22 +01:00
Oran Agra	725c71b87a	fix flaky replication test (#13945 ) from the master's perspective, the replica can become online before it's actually done loading the rdb file. this was always like that, in disk-based repl, and thus ok with diskless and rdb channel. in this test, because all the keys are added before the backlog is created, the replication offset is 0, so the test proceeds and could get a LOADING error when trying to run the function.	2025-04-15 17:18:00 +03:00
Ozan Tezcan	eafc365040	Fix flaky replication test (#13909 ) Test fails time to time: ``` *** [err]: Slave is able to detect timeout during handshake in tests/integration/replication.tcl Replica is not able to detect timeout ``` Depending on the timing, "debug sleep" may occur during rdbchannel handshake and required log statement won't be printed to the log in that case. Better to wait after rdbchannel handshake.	2025-04-07 13:12:33 +03:00
Ozan Tezcan	3cdb8c6046	Improve replication buffering on replica and fix a related bug (#13904 ) With RDB channel replication, we introduced parallel replication stream and RDB delivery to the replica during a full sync. Currently, after the replica loads the RDB and begins streaming the accumulated buffer to the database, it does not read from the master connection during this period. Although streaming the local buffer is generally a fast operation, it can take some time if the buffer is large. This PR introduces buffering during the streaming of the local buffer. One important consideration is ensuring that we consume more than we read during this operation; otherwise, it could take indefinitely. To guarantee that it will eventually complete, we limit the read to at most half of what we consume, e.g. read at most 1 mb once we consume at least 2 mb. Additional changes Bug fix - Currently, when replica starts draining accumulated buffer, we call protectClient() for the master client as we occasionally yield back to event loop via processEventsWhileBlocked(). So, it prevents freeing the master client. While we are in this loop, if replica receives "replicaof newmaster" command, we call replicaSetMaster() which expects to free the master client and trigger a new connection attempt. As the client object is protected, its destruction will happen asynchronously. Though, a new connection attempt to new master will be made immediately. Later, when the replication buffer is drained, we realize master client was marked as CLOSE_ASAP, and freeing master client triggers another connection attempt to the new master. In most cases, we realize something is wrong in the replication state machine and abort the second attempt later. So, the bug may go undetected. Fix is not calling protectClient() for the master client. Instead, trying to detect if master client is disconnected during processEventsWhileBlocked() and if so, breaking the loop immediately. Related improvement: - Currently, the replication buffer is a linked list of buffers, each of which is 1 MB in size. While consuming the buffer, we process one buffer at a time and check if we need to yield back to `processEventsWhileBlocked()`. However, if `loading-process-events-interval-bytes` is set to less than 1 MB, this approach doesn't handle it well. To improve this, I've modified the code to process 16KB at a time and check `loading-process-events-interval-bytes` more frequently. This way, depending on the configuration, we may yield back to networking more often. - In replication.c, `disklessLoadingRio` will be set before a call to `emptyData()`. This change should not introduce any behavioral change but it is logically more correct as emptyData() may yield to networking and we may need to call rioAbort() on disklessLoadingRio. Otherwise, failure of main channel may go undetected until a failure on rdb channel on a corner case. Config changes - The default value for the `loading-process-events-interval-bytes` configuration is being lowered from 2MB to 512KB. This configuration primarily used for testing and controls the frequency of networking during the loading phase, specifically when loading the RDB or applying accumulated buffers during a full sync on the replica side. Before the introduction of RDB channel replication, the 2MB value was sufficient for occasionally yielding to networking, mainly to reply -loading to the clients. However, with RDB channel replication, during a full sync on the replica side (either while loading the RDB or applying the accumulated buffer), we need to yield back to networking more frequently to continue accumulating the replication stream. If this doesn’t happen often enough, the replication stream can accumulate on the master side, which is undesirable. To address this, we’ve decided to lower the default value to 512KB. One concern with frequent yielding to networking is the potential performance impact, as each call to processEventsWhileBlocked() involves 4 syscalls, which could slow down the RDB loading phase. However, benchmarking with various configuration values has shown that using 512KB or higher does not negatively impact RDB loading performance. Based on these results, 512KB is now selected as the default value. Test changes - Added improved version of a replication test which checks memory usage on master during full sync. --------- Co-authored-by: Oran Agra <oran@redislabs.com>	2025-04-03 10:04:29 +03:00
Ozan Tezcan	366c6aff81	Put replica online when bgsave is done (#13895 ) Before https://github.com/redis/redis/pull/13732, replicas were brought online immediately after master wrote the last bytes of the RDB file to the socket. This behavior remains unchanged if rdbchannel replication is not used. However, with rdbchannel replication, the replica is brought online after receiving the first ack which is sent by replica after rdb is loaded. To align the behavior, reverting this change to put replica online once bgsave is done. Additonal changes: - INFO field `mem_total_replication_buffers` will also contain `server.repl_full_sync_buffer.mem_used` which shows accumulated replication stream during rdbchannel replication on replica side. - Deleted debug level logging from some replication tests. These tests generate thousands of keys and it may cause per key logging on some cases.	2025-03-31 13:48:49 +03:00
Ozan Tezcan	a0da8390a2	Fix use-after-free when diskless load config is not swapdb (#13887 ) When the diskless load configuration is set to on-empty-db, we retain a pointer to the function library context. When emptyData() is called, it frees this function library context pointer, leading to a use-after-free situation. I refactored code to ensure that emptyData() is called first, followed by retrieving the valid pointer to the function library context. Refactored code should not introduce any runtime implications. Bug introduced by https://github.com/redis/redis/pull/13495 (Redis 8.0) Co-authored-by: Oran Agra <oran@redislabs.com>	2025-03-26 21:50:10 +03:00
Yuan Wang	f1d6542b1a	Stabilize tcl test cases (#13829 ) Recently encountered some errors as bellow, HGETEX/HSETEX with PXAT/EXAT options, after getting ttl, we calculate current time by `[clock seconds]` that may have a delay that causes results greater than expected. Dismiss memory test error, now we introduced rdb-channel replication, the full synchronization might finish before the child process exits. So we may fail if calling `bgsave` immediately after full sync.	2025-02-25 16:31:53 +08:00
Yuan Wang	7f5f588232	AOF offset info (#13773 ) ### Background AOF is often used as an effective data recovery method, but now if we have two AOFs from different nodes, it is hard to learn which one has latest data. Generally, we determine whose data is more up-to-date by reading the latest modification time of the AOF file, but because of replication delay, even if both master and replica write to the AOF at the same time, the data in the master is more up-to-date (there are commands that didn't arrive at the replica yet, or a large number of commands have accumulated on replica side ), so we may make wrong decision. ### Solution The replication offset always increments when AOF is enabled even if there is no replica, we think replication offset is better method to determine which one has more up-to-date data, whoever has a larger offset will have newer data, so we add the start replication offset info for AOF, as bellow. ``` file appendonly.aof.2.base.rdb seq 2 type b file appendonly.aof.2.incr.aof seq 2 type i startoffset 224 ``` And if we close gracefully the AOF file, not a crash, such as `shutdown`, `kill signal 15` or `config set appendonly no`, we will add the end replication offset, as bellow. ``` file appendonly.aof.2.base.rdb seq 2 type b file appendonly.aof.2.incr.aof seq 2 type i startoffset 224 endoffset 532 ``` #### Things to pay attention to - For BASE AOF, we do not add `startoffset` and `endoffset` info, since we could not know the start replication replication of data, and it is useless to help us to determine which one has more up-to-date data. - For AOFs from old version, we also don't add `startoffset` and `endoffset` info, since we also don't know start replication replication of them. If we add the start offset from 0, we might make the judgment even less accurate. For example, if the master has just rewritten the AOF, its INCR AOF will inevitably be very small. However, if the replica has not rewritten AOF for a long time, its INCR AOF might be much larger. By applying the following method, we might make incorrect decisions, so we still just check timestamp instead of adding offset info - If the last INCR AOF has `startoffset` or `endoffset`, we need to restore `server.master_repl_offset` according to them to avoid the rollback of the `startoffset` of next INCR AOF. If it has `endoffset`, we just use this value as `server.master_repl_offset`, and a very important thing is to remove this information from the manifest file to avoid the next time we load the manifest file with wrong `endoffset`. If it only has `startoffset`, we calculate `server.master_repl_offset` by the `startoffset` plus the file size. ### How to determine which one has more up-to-date data If AOF has a larger replication offset, it will have more up-to-date data. The following is how to get AOF offset: Read the AOF manifest file to obtain information about the last INCR AOF 1. If the last INCR AOF has `endoffset` field, we can directly use the `endoffset` to present the replication offset of AOF 2. If there is no `endoffset`(such as redis crashes abnormally), but there is `startoffset` filed of the last INCR AOF, we can get the replication offset of AOF by `startoffset` plus the file size 3. Finally, if the AOF doesn’t have both `startoffset` and `endoffset`, maybe from old version, and new version redis has not rewritten AOF yet, we still need to check the modification timestamp of the last INCR AOF ### TODO Fix ping causing inconsistency between AOF size and replication offset in the future PR. Because we increment the replication offset when sending PING/REPLCONF to the replica but do not write data to the AOF file, this might cause the starting offset of the AOF file plus its size to be inconsistent with the actual replication offset.	2025-02-13 17:31:40 +08:00

1 2 3 4 5 ...

428 commits