On macOS, running `make test` often fails with "too many open files" due
to the low default limit (usually 256).
This PR increases the limit by adding `ulimit -n 4096` so that the tests
have enough file descriptors for concurrent connections.
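`ulimit -n` changes the shell's `RLIMIT_NOFILE` limit, and child processes such as the test servers inherit it. The C sketch below does the same thing in-process with the POSIX `getrlimit`/`setrlimit` calls. `raise_nofile_limit` is a hypothetical helper and is not part of Redis or this PR. It raises only the soft limit and caps it at the hard limit, because an unprivileged process cannot go beyond the hard limit.

```c
#include <sys/resource.h>

/* Raise the soft RLIMIT_NOFILE limit to at least `target`, capped at the
 * hard limit. Returns 0 on success, -1 on error. Hypothetical helper. */
int raise_nofile_limit(rlim_t target) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return -1;
    if (rl.rlim_cur >= target) return 0;          /* already high enough */
    rlim_t want = target;
    if (rl.rlim_max != RLIM_INFINITY && want > rl.rlim_max)
        want = rl.rlim_max;                       /* cannot exceed hard limit */
    rl.rlim_cur = want;
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```

Capping at the hard limit means the call still succeeds on hosts whose hard limit is below 4096. In that case the effective limit is simply lower.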
Follow https://github.com/redis/redis/issues/15045
## Summary
Simplify INCREX's out-of-bounds policy:
The original INCREX shipped with three out-of-bounds policies — OVERFLOW
FAIL, OVERFLOW SAT, OVERFLOW REJECT — but FAIL and REJECT are
functionally redundant: both leave the key untouched when the result is
out of bounds. They differ only in how the caller is notified (error
reply vs. [current_value, 0] array reply), which forces the user to make
a stylistic choice with no real semantic difference.
This PR collapses the three policies into one clear behavior:
* Default: the operation is rejected; the key value and TTL are left
unchanged, and the reply is [current_value, 0]. Callers detect
non-application by checking the applied-increment field; no
error-handling branch is required.
* SATURATE: the result is saturated to UBOUND / LBOUND, or to the type
limits (LLONG_MAX/MIN for BYINT, ±LDBL_MAX for BYFLOAT) when no explicit
bound is given.
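The following is a minimal C sketch of the integer-mode decision described above. `increx_int` and `increx_reply` are hypothetical names, and this is not the Redis implementation. The sketch relies on the GCC/Clang `__builtin_add_overflow`/`__builtin_sub_overflow` builtins. If no `LBOUND`/`UBOUND` is given, the caller passes `LLONG_MIN`/`LLONG_MAX`. Reporting an error when the saturated delta does not fit in 64 bits is an assumption here. It is carried over from the behavior documented for the earlier `OVERFLOW SAT` policy.

```c
#include <limits.h>
#include <stdbool.h>

/* Reply of a hypothetical integer-mode INCREX step: [value, applied]. */
typedef struct {
    bool ok;           /* false if the saturated delta is not representable */
    long long value;   /* new value, or the unchanged current value */
    long long applied; /* increment actually applied; 0 when rejected */
} increx_reply;

static increx_reply increx_int(long long cur, long long incr, long long lbound,
                               long long ubound, bool saturate) {
    long long sum;
    bool overflow = __builtin_add_overflow(cur, incr, &sum);
    /* Overflowing the type counts as violating the implicit type bound. */
    bool too_high = overflow ? incr > 0 : sum > ubound;
    bool too_low = overflow ? incr < 0 : sum < lbound;

    if (!too_high && !too_low)
        return (increx_reply){true, sum, incr};
    if (!saturate) /* default: reject, value and TTL untouched */
        return (increx_reply){true, cur, 0};

    long long target = too_high ? ubound : lbound, delta;
    if (__builtin_sub_overflow(target, cur, &delta))
        return (increx_reply){false, cur, 0};
    return (increx_reply){true, target, delta};
}
```

A caller only needs to check `applied == 0` to detect a rejected increment. The error branch is reached only in the saturating edge case.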
New syntax:
INCREX <key> [BYFLOAT increment | BYINT increment]
[LBOUND lowerbound] [UBOUND upperbound] [SATURATE]
[EX seconds | PX milliseconds | EXAT seconds-timestamp | PXAT
milliseconds-timestamp | PERSIST] [ENX]
---------
Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
This PR is based on: valkey-io/valkey#3511
Close https://github.com/redis/redis/issues/14983
## Summary
During diskless replication, if **any single replica** cannot accept a
write (TCP send buffer full / `EAGAIN`), the master stops reading the
RDB pipe entirely, stalling data delivery to **all** replicas —
including fast ones that are ready to receive data.
The failure reason is similar to
https://github.com/redis/redis/pull/14946: the socket send buffer fills
more easily.
## Root Cause
In `rdbPipeReadHandler`, the master reads from the child's RDB pipe and
writes to all replica sockets in a loop. When `connWrite` to any replica
returns a partial write (socket send buffer full), the handler:
1. Installs a per-replica `rdbPipeWriteHandler` and increments
`rdb_pipe_numconns_writing`
2. **Removes the pipe read event** via `aeDeleteFileEvent(server.el,
server.rdb_pipe_read, AE_READABLE)`, stopping all pipe reads
The pipe read event is only re-enabled when **all** pending write
handlers complete (`rdb_pipe_numconns_writing == 0`), meaning the
**slowest replica dictates the throughput for all replicas**.
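Below is a toy C model of the gating described above. It is a simplification and not Redis code. The counter mirrors `rdb_pipe_numconns_writing`, and the boolean stands in for the `AE_READABLE` event on the pipe. A single partial write disables pipe reads for everyone, and reads resume only when the last pending write handler finishes.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t numconns_writing; /* like rdb_pipe_numconns_writing */
    bool pipe_read_enabled;  /* is the AE_READABLE event on the pipe set? */
} pipe_gate;

/* After fanning one chunk out to n replicas: partial[i] is true when
 * replica i's socket accepted only part of the chunk. */
static void after_fanout(pipe_gate *g, const bool *partial, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (partial[i]) g->numconns_writing++; /* per-replica write handler */
    if (g->numconns_writing > 0)
        g->pipe_read_enabled = false;          /* stop reading the pipe */
}

/* A per-replica write handler finished flushing its pending data. */
static void on_write_handler_done(pipe_gate *g) {
    if (g->numconns_writing > 0 && --g->numconns_writing == 0)
        g->pipe_read_enabled = true;           /* only now resume reads */
}
```

While the slow replica drains, the fast replicas receive no new data, even though they have nothing pending.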
## Observed Behavior
With one slow replica (consuming at ~290 KB/s due to `key-load-delay`):
- Master bursts ~1.3 MB of RDB data until the slow replica's socket send
buffer fills
- `rdbPipeReadHandler` disables the pipe read event
- **All replicas starve for 4–5 seconds** while the slow replica drains
its buffer
- Cycle repeats: burst → stall → burst → stall
Ultimately, this makes the full synchronization between the master and
all of its replicas very slow.
### Changes
1. Skip the entire `diskless replicas drop during rdb pipe` test under
Valgrind to avoid timing flakiness in slow environments.
2. Move `start_server` inside the `foreach all_drop` loop so each
subcase gets a fresh master instead of sharing state across subcases.
3. For `no / slow / fast / all` subcases, replica 0 runs with
`key-load-delay 500`, which combined with the blocked-writer TCP
back-pressure can stall the RDB-saving child indefinitely; shrink the
dataset to ~40 MB so the transfer still exercises the blocked-writer
path but completes in reasonable time instead of hanging on the TCP
deadlock.
4. For the timeout subcase, replica 0 does not run with `key-load-delay
500`, so to avoid the TCP deadlock we still reduce the dataset somewhat,
but keep it larger than in the other subcases. Otherwise the kernel TCP
send buffer could absorb the whole RDB, and we would miss the
`repl_last_partial_write != 0` "(full sync)" timeout path and only hit
the "(streaming sync)" path instead.
5. For the `all` subcase, set `rdb-key-save-delay 1000` on the master so
the RDB child keeps generating data while both replicas are killed,
ensuring the last-replica-drop path is exercised rather than racing with
normal completion.
6. Move the slow-replica `pause_process()` so it happens only in the
timeout subcase, not after killing replicas, so Redis observes the
disconnect promptly in non-timeout flows.
7. In the timeout subcase, set `repl-timeout` to 2, wait inline for
`*Disconnecting timedout replica (full sync)*`, then restore
`repl-timeout` to 60 so the remaining replica can finish the streamed RDB.
---------
Co-authored-by: Sarthak Aggarwal <sarthagg@amazon.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
Fixes:
1. After #15096, we pass `-flto` to jemalloc. On Azure Linux, the linker
cannot handle the resulting jemalloc library and the build fails. Add
`-ffat-lto-objects` so that the compiler also emits regular object code
that the linker can fall back to when it cannot handle the
LTO-compiled library.
2. Fixed a warning about `path` being NULL in
`moduleLoadInternalModules()`.
3. Fixed compile warnings on older GCC versions introduced by #15162
(reported on Ubuntu 20.04)
Co-authored-by: debing.sun <debing.sun@redis.com>
Enabling memory tracking at runtime is forbidden if it is already
disabled. In non-clustered mode, however, the checks were incorrect; this
PR enforces the correct behavior in non-clustered environments.
Ensure backward compatibility and consistent behavior across different
architectures by explicitly setting the default value.
Fixes #15175
Co-authored-by: ofiryanai <ofiryanai1@gmail.com>
After the GCRA algorithm was introduced into Redis
(https://github.com/redis/redis/pull/14826) and the new RATE_LIMIT object
type was subsequently added (https://github.com/redis/redis/pull/14905),
it was decided internally not to ship GCRA in the new release.
Since no decision has yet been made on whether it will be kept in the
future, this PR only makes the GCRA-related code dead: the commands are
inaccessible and AOF/RDB load and save are disabled.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Close #15177
Follow [Fix use-after-free when fullsync happens while replica is
running a timed out script
(CVE-2026-23631)](0cca172a17)
Remove the `repl-diskless-load yes` test configuration because this
option exists only in the Redis fork and is not available in Redis OSS.
(cherry picked from commit 5033e15143)
Fixes [#15183](https://github.com/redis/redis/issues/15183).
## Motivation
Commit
[cf668f2c2](cf668f2c2c)
tightened cluster-announce-ip validation to require a valid IPv4 or IPv6
address, which is a regression for users that legitimately announce a
hostname.
## Changes
* isValidClusterAnnounceIp() now accepts either:
* A valid IPv4/IPv6 address
* A valid hostname — same character rules as cluster-announce-hostname,
length-bounded by NET_IP_STR_LEN to match the storage buffer.
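The C sketch below shows the accept-IP-or-hostname logic in isolation. Several details are assumptions. `is_valid_announce_ip` is a hypothetical stand-in for `isValidClusterAnnounceIp()`. The hostname character rule (letters, digits, `-`, `.`) is a simplified placeholder for the real `cluster-announce-hostname` rules. `NET_IP_STR_LEN` is assumed to be 46, matching `INET6_ADDRSTRLEN`. For simplicity, the sketch also treats the empty string as invalid.

```c
#include <arpa/inet.h>
#include <ctype.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

#define NET_IP_STR_LEN 46 /* assumed value, see above */

/* Placeholder character rule; the real hostname rules may differ. */
static bool is_hostname_char(unsigned char c) {
    return isalnum(c) || c == '-' || c == '.';
}

static bool is_valid_announce_ip(const char *s) {
    unsigned char addr[sizeof(struct in6_addr)];
    size_t len = strlen(s);
    if (len == 0 || len >= NET_IP_STR_LEN) return false; /* must fit with NUL */
    if (inet_pton(AF_INET, s, addr) == 1 || inet_pton(AF_INET6, s, addr) == 1)
        return true;                                     /* IPv4 or IPv6 */
    for (size_t i = 0; i < len; i++)
        if (!is_hostname_char((unsigned char)s[i])) return false;
    return true;                                         /* hostname */
}
```

The sketch tries the address parsers first, so the hostname rule is only consulted for strings that are not literal IPs.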
(cherry picked from commit 21f2569f9b)
* Limit VADD REDUCE dim so that it does not exceed the original dim
Make `VADD key [REDUCE dim]` reject a `dim` bigger than the HNSW original dim, since dimension reduction makes no sense when reduce_dim > original_dim.
This also avoids an OOM and a possible heap overflow in later allocations that use reduce_dim.
This should be backported to Redis 8.0, 8.2 and 8.4.
Fullsync triggers emptyData and scriptingReset, which free the scripting/function engine. If a timed-out script is still running on the replica, this causes a use-after-free. Delay fullsync processing in readSyncBulkPayload until the script finishes.
Add the DENYOOM flag to SUBSCRIBE, PSUBSCRIBE, and SSUBSCRIBE commands
to bring their memory protection behavior in line with other Redis commands.
Problem:
Currently, subscribe commands lack memory protection when Redis reaches its
memory limit. This becomes problematic in two specific scenarios:
1. When the eviction policy doesn't allow eviction (e.g., noeviction)
2. When there are no evictable keys remaining in the database
In these cases, memory usage from pub/sub subscribers can keep growing
unchecked, potentially causing the Redis server to run out of memory. This
behavior is inconsistent with other Redis commands, which are protected by
the DENYOOM flag.
Solution:
Add the DENYOOM flag to all subscribe commands. When memory limits are
reached, these commands will be rejected, preventing uncontrolled memory
growth and aligning their behavior with other Redis commands.
Fixes #15082
## Problem
Loading a stream from RDB/RESTORE with a malformed consumer PEL (the
same pending ID listed twice for one consumer) hit an error path that
called streamFreeNACK() on a nack that was still referenced from the
group’s global PEL (cgroup->pel). Teardown then freed that nack again
while destroying the stream, causing a double free and a possible server
crash.
## Fix
On the duplicate consumer PEL branch in src/rdb.c, stop calling
streamFreeNACK(s, nack) when raxTryInsert(consumer->pel, …) fails. Keep
reporting corruption and rely on decrRefCount(o) for cleanup, consistent
with other paths where the nack is owned only by cgroup->pel.
# Redis Array
For years, Redis has been missing a real indexed data structure for the
use cases where the index and the spatial relationship of elements are
semantic. Hashes give you random lookups, but you have to store an index
as a key, and have no range visibility. Lists give you appending and
trimming, but what is in the middle remains hard to access. Streams give
you append-only events, which is another (useful, indeed) beast. None of
these is what you want when the *position itself* has business meaning —
slot 37, step 4, row 18552, day from 2934 to 2949, file line 11, 12, 15
and so forth. And all of those types, for different reasons, are
suboptimal when you want a **ring buffer** able to store the latest N
observed samples of something.
Up to now, users have found ways (they always do \o/), exploiting the
fact that the data structures that are obvious in this universe are also
extremely powerful if well implemented. But this forces compromises.
Arrays handle these index-first requirements natively, usually with much
better memory and CPU usage than the workarounds. For the right use
case, Arrays often provide much better space, time and usability at the
same time.
## Internal encoding
1. When dense, an Array is essentially a fancier C array. You don't
pay anything for storing the index.
2. Yet, instead of going really flat, arrays are sliced into
4096-element slices, and each slice, when it contains just a few
elements, uses a special sparse encoding. When a slice is empty it's
just a `NULL` stored in the directory.
3. Small ints, floats, and short strings are pointer-tagged, so they
cost zero additional memory beyond the pointer slot itself.
4. When very sparse, a super-directory of windowed directories is used.
This allows the data type to be safe, instead of exhibiting pathological
space or time behavior. This representation is only triggered when there
are more than 8 million elements or very high indexes set.
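The following toy C model illustrates two of the ideas above: a directory of 4096-element slices in which an untouched slice is just `NULL`, and low-bit pointer tagging for small integers. It is a sketch under assumptions and not the Arrays implementation. The sparse per-slice encoding and the super-directory are omitted. The tagging scheme (low bit set marks an inline non-negative integer, relying on at least 2-byte alignment of real pointers) is one common approach, chosen here for illustration. All names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define SLICE_SIZE 4096 /* elements per slice, as in the description */

typedef struct {
    void ***dir;     /* dir[s]: NULL or an array of SLICE_SIZE slots */
    size_t nslices;
} toy_array;

static void *toy_get(const toy_array *a, uint64_t idx) {
    uint64_t s = idx / SLICE_SIZE, off = idx % SLICE_SIZE;
    if (s >= a->nslices || a->dir[s] == NULL) return NULL; /* empty slice */
    return a->dir[s][off];
}

static int toy_set(toy_array *a, uint64_t idx, void *val) {
    uint64_t s = idx / SLICE_SIZE, off = idx % SLICE_SIZE;
    if (s >= a->nslices) { /* grow the directory with NULL slices */
        void ***d = realloc(a->dir, (s + 1) * sizeof(*d));
        if (d == NULL) return -1;
        for (size_t i = a->nslices; i <= s; i++) d[i] = NULL;
        a->dir = d;
        a->nslices = s + 1;
    }
    if (a->dir[s] == NULL &&
        (a->dir[s] = calloc(SLICE_SIZE, sizeof(void *))) == NULL)
        return -1;
    a->dir[s][off] = val;
    return 0;
}

/* Tagged small integers: no allocation beyond the slot itself. */
static void *tag_uint(uintptr_t v) { return (void *)((v << 1) | 1u); }
static int is_tagged(const void *p) { return ((uintptr_t)p & 1u) != 0; }
static uintptr_t untag_uint(const void *p) { return (uintptr_t)p >> 1; }
```

Setting index 10000 allocates only slice 2. Slices 0 and 1 stay `NULL` in the directory.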
## Use cases
Arrays are mostly stateless, except that each array remembers the index
of the latest added item, which allows `ARINSERT` and `ARRING` to work
properly. Otherwise it is a game of setting and getting at an index,
with solid support for setting / getting ranges and for server-side
scanning, which returns only populated elements in time proportional to
the population size rather than to the range size.
A few concrete examples follow. Each may work as a mental model for the
set of problems that are similar to it, from a data modeling point of view.
**Thermometer.** A sensor reporting once per minute, with gaps:
```
ARSET temp:room12:day7 123 22.3
ARGETRANGE temp:room12:day7 600 660 # the 10:00–11:00 window, with NULLs
ARSCAN temp:room12:day7 600 660 # only populated elements
AROP temp:room12:day7 0 1439 MAX # peak of the day, server-side
```
Missing minutes cost little to nothing. Numeric aggregation runs inside
Redis. Telemetry, IoT, meter readings, KPI rollups.
**Calendar.** A clinic with 96 fifteen-minute slots per day:
```
ARSET sched:room12:day 32 booking:991
ARSCAN sched:room12:day 0 95 # only occupied slots
ARGETRANGE sched:room12:day 48 63 # the afternoon full view to render
```
The slot number is the business key in this case. Room booking, parking
spaces, warehouse bins, lockers, ...
**Ring buffer.** ARRING replaces the classic LPUSH+LTRIM pattern.
Imagine remote `dmesg`.
```
ARRING machine:123 200 "[141087.430123]: arm_cpu_init(): cpu 14 online" # Capped to 200 entries
ARLASTITEMS machine:123 50 REV # 50 newest first
```
Faster than LPUSH+LTRIM, and it keeps indexed access to past elements.
Last-N alarms, recent fraud scores, access history, remote logs, device
events. The use cases here are mainly those of the old pattern: Arrays
are simply a better fit, and they also allow accessing random items in
the middle, aggregating server-side, and so forth.
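The ring-buffer semantics can be modeled in C as shown below. This is a sketch of the behavior, not the server implementation. It assumes that `ARRING` overwrites the oldest slot once the cap is reached and that `ARLASTITEMS ... REV` returns the newest items first, as the comment in the example suggests. `ring`, `ring_push`, `ring_last_rev`, and the small cap are illustrative choices.

```c
#include <stddef.h>

#define RING_CAP 4 /* small cap for illustration; the example above uses 200 */

typedef struct {
    const char *slot[RING_CAP];
    size_t next;  /* index the next item will be written to */
    size_t count; /* populated slots, at most RING_CAP */
} ring;

static void ring_push(ring *r, const char *item) {
    r->slot[r->next] = item; /* overwrites the oldest item when full */
    r->next = (r->next + 1) % RING_CAP;
    if (r->count < RING_CAP) r->count++;
}

/* Copy up to k newest items into out, newest first. */
static size_t ring_last_rev(const ring *r, size_t k, const char **out) {
    if (k > r->count) k = r->count;
    for (size_t i = 0; i < k; i++)
        out[i] = r->slot[(r->next + RING_CAP - 1 - i) % RING_CAP];
    return k;
}
```

Unlike LPUSH+LTRIM, nothing is shifted or trimmed. A write is a single slot store, and any past slot remains addressable by index.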
**Workflow.** Step number is the index, value is the status. Gaps are
meaningful:
```
ARSET claim:99172 0 received
ARSET claim:99172 3 waiting:reviewer42
ARSET claim:99172 5 approved
ARGETRANGE claim:99172 0 5 # full workflow view, with NULLs for missing steps
ARSCAN claim:99172 0 5 # only steps that have a state
ARCOUNT claim:99172 # number of recorded steps
ARLEN claim:99172 # highest reached step + 1
```
**Skills knowledge base for agents.** Arrays are good at representing /
grepping into Markdown files:
```
ARSET skill:metal_gpu 0 "...."
ARSET skill:metal_gpu 1 "...."
ARSET skill:metal_gpu 2 "...."
ARGREP skill:metal_gpu - + RE "M3|M4" WITHVALUES
```
ARGREP supports EXACT, MATCH, GLOB and RE predicates; you can combine
multiple predicates and select AND or OR behavior.
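Here is a hedged C sketch of combining several predicates with AND or OR per element. It uses POSIX `fnmatch` for GLOB and POSIX extended regular expressions for RE. The server may use different matching engines. Treating MATCH as a plain substring test is an assumption, and all names are hypothetical.

```c
#include <fnmatch.h>
#include <regex.h>
#include <stdbool.h>
#include <string.h>

typedef enum { P_EXACT, P_MATCH, P_GLOB, P_RE } pred_kind;
typedef struct { pred_kind kind; const char *pattern; } pred;

static bool pred_eval(const pred *p, const char *value) {
    switch (p->kind) {
    case P_EXACT: return strcmp(value, p->pattern) == 0;
    case P_MATCH: return strstr(value, p->pattern) != NULL; /* assumed */
    case P_GLOB:  return fnmatch(p->pattern, value, 0) == 0;
    case P_RE: {
        regex_t re;
        if (regcomp(&re, p->pattern, REG_EXTENDED | REG_NOSUB) != 0)
            return false;
        bool hit = regexec(&re, value, 0, NULL, 0) == 0;
        regfree(&re);
        return hit;
    }
    }
    return false;
}

/* AND: every predicate must hit. OR: any single hit is enough. */
static bool grep_element(const pred *ps, size_t n, bool use_and,
                         const char *value) {
    for (size_t i = 0; i < n; i++) {
        bool hit = pred_eval(&ps[i], value);
        if (use_and && !hit) return false;
        if (!use_and && hit) return true;
    }
    return use_and;
}
```

The loop short-circuits: under AND it stops at the first miss, and under OR it stops at the first hit. A real implementation would compile each regex once per scan rather than once per element.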
**Bulk import results.** Sparse row annotations over millions of rows /
CSV / ...:
```
ARSET import:job551 18552 ERR:bad_email
ARSCAN import:job551 0 1000000 # Provides only rows that have something
```
## TLDR
If the position is part of the meaning, use an Array. If you want to
aggregate or grep remotely, use an Array. Feedback welcome :)
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Shubham S Taple <155555100+ShubhamTaple@users.noreply.github.com>
Co-authored-by: Yuan Wang <yuan.wang@redis.com>
Co-authored-by: Marc Gravell <marc.gravell@gmail.com>
## Summary
Fixes a ±1 ULP rounding mismatch between `fast_float_strtod()` and libc
`strtod()` in the widened (mantissa > 2^53) fast path introduced by
#15061. Reported by @vitahlin in
https://github.com/redis/redis/pull/14661#issuecomment-4320058616 with
two minimal reproducers:
```
input: 9007199255094284e-19
fast_float_strtod: 0x3f4d83c94fbbcb8a
libc strtod : 0x3f4d83c94fbbcb8b delta -1 ULP
input: 2489830482329185244e1
fast_float_strtod: 0x43f59888c51e5b4c
libc strtod : 0x43f59888c51e5b4b delta +1 ULP
```
Redis treats `strtod()` as the fallback for `fast_float_strtod()`, so
every fast-path-accepted input is contractually expected to be bit-exact
with `strtod()`. The two cases above are accepted by the widened branch
but produce a different IEEE-754 representation, breaking the contract.
## Root cause
The widened branch added in #15061 used a homebrew shortcut to convert a
128-bit integer product to a double:
```c
value = (double)hi * 18446744073709551616.0 + (double)lo;
```
This is **not** a single-rounding operation — `(double)hi` rounds when
`hi > 2^53`, and the subsequent `+ (double)lo` rounds again. For inputs
near the round-half-to-even boundary, the two roundings can compose into
the wrong direction.
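The double rounding can be reproduced with hand-picked values. These are not the reported reproducers. Take `hi = 2^53 + 1` and `lo = 1`. The exact value `2^117 + 2^64 + 1` lies just above the midpoint between the neighboring doubles `2^117` and `2^117 + 2^65`, so a correctly rounded conversion rounds up. However, `(double)hi` first rounds `2^53 + 1` down to `2^53` (ties-to-even), and the `+ (double)lo` step cannot recover. The sketch assumes SSE2-style evaluation (`FLT_EVAL_METHOD == 0`) and uses the GCC/Clang `unsigned __int128` extension for the single-rounding comparison.

```c
#include <stdint.h>

/* The two-rounding conversion quoted above. */
static double two_step(uint64_t hi, uint64_t lo) {
    return (double)hi * 18446744073709551616.0 + (double)lo;
}

/* One rounding: convert the exact 128-bit integer directly. */
static double one_step(uint64_t hi, uint64_t lo) {
    return (double)(((unsigned __int128)hi << 64) | lo);
}

/* Exact power of two by repeated doubling (avoids libm). */
static double pow2(int e) {
    double r = 1.0;
    while (e-- > 0) r *= 2.0;
    return r;
}
```

With these inputs, `two_step` yields `2^117` and `one_step` yields `2^117 + 2^65`. The two results differ by exactly 1 ULP, which is the kind of mismatch reported.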
The negative-exponent branch is doubly affected: integer division
`scaled / divisor` truncates the remainder before the conversion, so
even a hypothetical correct `hi*2^64+lo` step would round down on inputs
that should round up.
## Fix
Replace the homebrew widened branch with the **Eisel-Lemire** algorithm
from upstream `fast_float`. This is the same algorithm that
`fast_float`'s own widened path uses; its bit-exactness with `strtod` for
inputs with a mantissa of at most 19 digits is proved in:
> Noble Mushtak and Daniel Lemire, "Fast Number Parsing Without
Fallback".
Pieces ported (MIT-licensed, from
`fast_float/include/fast_float/fast_table.h`, `decimal_to_binary.h`,
`float_common.h`):
- 128-bit precomputed extended-precision powers of five (`5^-342 ...
5^308`, 651 entries) — pure data.
- `compute_product_approximation` — 128-bit multiply with high-half
rounding-boundary fixup.
- `compute_float` — the main algorithm; returns `(mantissa, power2)`
ready to be packed.
- `am_to_double` — IEEE-754 binary64 bit-pack.
- `__builtin_clzll` and `__uint128_t` wrappers (with a 32-bit fallback
for portability).
Clinger's strict fast path (`mantissa ≤ 2^53` and `|exp| ≤ 22`) is
**kept** unchanged — it's a single double multiply/divide and is faster
than Eisel-Lemire on its domain. Only the buggy widened branch is
replaced.
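Clinger's fast path works because both operands are exact doubles when `mantissa ≤ 2^53` and `|exp| ≤ 22`. In that case a single IEEE multiply or divide rounds only once and therefore matches a correctly rounded `strtod`. The C sketch below illustrates that domain check. `clinger_fast_path` and `agrees_with_strtod` are illustrative names, not the functions in Redis's port. The sketch assumes `FLT_EVAL_METHOD == 0`.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* 10^0 .. 10^22 are all exactly representable as doubles. */
static const double kPow10[] = {
    1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,  1e8,  1e9,  1e10, 1e11,
    1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22};

/* Returns false when the input is outside the fast-path domain. */
static bool clinger_fast_path(uint64_t mantissa, int exp10, double *out) {
    if (mantissa > (1ULL << 53) || exp10 < -22 || exp10 > 22) return false;
    double m = (double)mantissa; /* exact */
    *out = exp10 < 0 ? m / kPow10[-exp10] : m * kPow10[exp10]; /* 1 rounding */
    return true;
}

/* Cross-check: on its domain the fast path must agree with strtod. */
static bool agrees_with_strtod(uint64_t mantissa, int exp10, const char *text) {
    double fast;
    return clinger_fast_path(mantissa, exp10, &fast) &&
           fast == strtod(text, NULL);
}
```

Inputs just outside the domain (for example `2^53 + 1`, or an exponent of 23) are refused. Under the design described above, those inputs are handed to Eisel-Lemire instead.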
The port stays minimal:
- **double-only** (no `float`, no `long double`)
- No bigint slow path. The rare "indeterminate" inputs that upstream
resolves with `digit_comp` are unreachable from `parse_number_string`'s
≤19-digit mantissa per the Mushtak-Lemire proof, but a defensive
`am.power2 < 0` check is preserved that falls back to libc `strtod()` if
any future caller widens the input domain.
## Why not just revert #15061?
Considered. Reverting restores correctness at the cost of the +73-84 %
zset listpack-load wins #15061 measured on 17-19 digit double scores.
Eisel-Lemire is *the* algorithm that gives both correctness *and* the
wider mantissa range — preserving #15061's wins while fixing the
rounding regression.
A "tightened admission filter" (only accept widened-path inputs where
the conversion happens to be single-rounding) was also considered. The
math shows the filter conditions are essentially unsatisfied for typical
inputs (`lo == 0` requires the 128-bit product be divisible by 2^64;
only ~1 in 10^13 random inputs qualify), making it equivalent to a
revert with extra dead code. Eisel-Lemire is the only widened-path
solution that preserves perf on the typical case.
## Summary
The stream RDB loader and listpack integrity validator had two gaps that
allowed corrupted payloads to be silently accepted, potentially leading
to crashes or incorrect behaviour at query time.
**1. `deleted_count` in the listpack header was trusted without
verification (`t_stream.c`)**
`streamValidateListpackIntegrity` already walks every entry in a stream
listpack during deep validation, inspecting each entry's flags. However,
the `deleted_count` value stored in the listpack header was never
cross-checked against the actual number of entries carrying
`STREAM_ITEM_FLAG_DELETED`. A corrupted or crafted payload could set an
arbitrary `deleted_count`, causing the `entry_count` (live) and
`deleted_count` to be inconsistent with the real data. This PR adds a
running `actual_deleted` counter during the entry walk and rejects the
listpack when it disagrees with the header.
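A simplified C model of the new check follows. It counts tombstones during the entry walk and compares the count with the header. `lp_view` is a hypothetical flattened view of one listpack, not a real Redis type. `STREAM_ITEM_FLAG_DELETED` is defined locally with an assumed value so that the sketch is self-contained.

```c
#include <stdbool.h>
#include <stddef.h>

#define STREAM_ITEM_FLAG_DELETED (1 << 0) /* assumed flag value */

typedef struct {
    long long hdr_deleted; /* deleted_count read from the header */
    const int *flags;      /* flags of every entry seen by the walk */
    size_t nentries;
} lp_view;

static bool deleted_count_matches(const lp_view *lp) {
    long long actual_deleted = 0;
    for (size_t i = 0; i < lp->nentries; i++)
        if (lp->flags[i] & STREAM_ITEM_FLAG_DELETED) actual_deleted++;
    return actual_deleted == lp->hdr_deleted; /* reject on mismatch */
}
```

The first test case corresponds to the crafted payload described below: the header claims one deleted entry, but the only entry is live.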
**2. `s->length` was only loosely validated against the rax (`rdb.c`)**
The old check (`s->length && !raxSize(s->rax)`) only caught the
degenerate case where the stream claimed a non-zero length but had zero
rax nodes. It did not detect a mismatch where the stream's `length`
field differed from the sum of live (non-tombstone) entries across all
listpacks. A corrupted payload could, for example, report `length = 2`
while every listpack's live-entry count adds up to only 1, bypassing the
check entirely and causing incorrect results from `XLEN`, `XRANGE`,
`XREAD`, etc.
This PR accumulates `live_entries` from each listpack's first element
(the live-entry count) during the rax-loading loop and then performs an
exact equality check (`s->length != live_entries`).
The live-entry count itself is also validated against the in-memory
invariant maintained by `streamIteratorRemoveEntry`: when the last live
entry of a listpack is deleted, the whole rax node is removed, so a node
in the rax must have `lp_live >= 1`. The loader enforces this by
rejecting `lp_live <= 0` (not just `< 0`). Without this stricter bound,
a payload where every listpack has `lp_live = 0` and `s->length = 0`
would pass the equality check (`0 == 0`) and load into an inconsistent
state: non-empty rax containing only-tombstone listpacks, with `XLEN`
reporting 0.
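The two loader rules (`lp_live >= 1` per rax node, and an exact match between `s->length` and the sum of live entries) can be modeled in C as follows. `stream_length_matches` is a hypothetical helper that takes the per-listpack live counts as a plain array. It is not the `rdbLoadObject` code.

```c
#include <stdbool.h>
#include <stddef.h>

/* lp_live[i]: live-entry count from the first element of listpack i.
 * stream_length: s->length as read from the payload. */
static bool stream_length_matches(const long long *lp_live, size_t nlistpacks,
                                  unsigned long long stream_length) {
    unsigned long long live_entries = 0;
    for (size_t i = 0; i < nlistpacks; i++) {
        if (lp_live[i] <= 0) return false; /* a rax node needs a live entry */
        live_entries += (unsigned long long)lp_live[i];
    }
    return stream_length == live_entries;
}
```

Without the `<= 0` guard, the all-tombstone payload (`lp_live = 0`, `s->length = 0`) would pass the equality check as `0 == 0`.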
**3. Tests**
Three new `corrupt-dump.tcl` tests exercise the new rejection paths:
- **stream listpack with wrong deleted count in header** — a crafted
payload where the header says `deleted_count = 1` but the only entry is
live, caught by the new `actual_deleted` check in
`streamValidateListpackIntegrity`. Requires `sanitize-dump-payload yes`
since the check runs only during deep validation.
- **stream length inconsistent with live entries** — a crafted payload
where the listpack reports `lp_live = 1` (so the `lp_live <= 0` guard
passes) but `s->length = 2`, caught by the `s->length != live_entries`
check in `rdbLoadObject`.
- **stream all-tombstone listpack with zero length** — a crafted payload
where the listpack header reports `lp_live = 0` and `s->length = 0`.
This case slips past the equality check (`0 == 0`) and is uniquely
caught by the tightened `lp_live <= 0` rejection at the listpack header.
## Summary
Optimize the HyperLogLog register histogram functions (`hllRawRegHisto` and
`hllDenseRegHisto`) by splitting the single histogram accumulator into 4
independent accumulators that are merged at the end. This breaks
store→load dependency chains when consecutive register bytes map to the same
histogram bin, allowing the CPU's out-of-order engine to overlap the increments.
Profiling shows `hllRawRegHisto` as the single hottest function on multi-key
PFCOUNT. TMA analysis on x86 Sapphire Rapids reveals Core_Bound at 25.1%
with Heavy_Operations at 9.6%, indicating serialized memory operations
from the histogram update pattern.
**When it helps:**
- Multi-key PFCOUNT (the primary use case — each key triggers a full
16384-register histogram build)
- PFMERGE followed by PFCOUNT (same histogram path on the merged result)
- Any PFCOUNT on a dense HLL representation (most production HLLs)
**When it doesn't help:**
- PFADD (register updates, not histogram reads)
- Sparse HLL representations (small cardinalities use a different path)
- Single-key PFCOUNT on sparse encoding
## How it works
The original code builds one `reghisto[64]` array by incrementing bins for
each of the 16384 registers (processed 8 at a time from 64-bit words):
```c
reghisto[bytes[0]]++; // store→load hazard if bytes[0] == bytes[1]
reghisto[bytes[1]]++; // must wait for previous store to complete
...
```
When two bytes in the same word have the same value (common — register
values cluster around log2(cardinality)), the CPU serializes the increment
chain because each `reghisto[x]++` is a load-modify-store that depends
on the previous store to the same address.
The fix splits into 4 independent arrays — `h0` through `h3` — each handling
2 of the 8 bytes per word, interleaved so consecutive bytes go to different accumulators:
```c
h0[r[0]]++; // independent
h1[r[1]]++; // independent — different array, no hazard
h2[r[2]]++;
h3[r[3]]++;
h0[r[4]]++;
h1[r[5]]++;
h2[r[6]]++;
h3[r[7]]++;
```
`hllDenseRegHisto` applies the same pattern across 16 registers per
iteration, interleaved by index mod 4 (`r0,r4,r8,r12 → h0; r1,r5,r9,r13
→ h1; …`).
After the loop, the 4 arrays are summed into the final histogram. The extra
stack space for the 4×64 counters is negligible, and the merge loop is ~1% of the function's cost.
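The C sketch below contrasts the single-accumulator histogram with the 4-accumulator variant and can be checked for equivalence. It works on unpacked one-byte registers (values below 64), similar to the raw representation. It does not handle the 6-bit packing of the dense encoding, and the function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define HLL_REGISTERS 16384
#define HLL_BINS 64

/* One accumulator: equal consecutive values hit the same counter. */
static void histo_naive(const uint8_t *regs, int *out) {
    for (int b = 0; b < HLL_BINS; b++) out[b] = 0;
    for (size_t i = 0; i < HLL_REGISTERS; i++) out[regs[i]]++;
}

/* Four accumulators, byte j of each 8-byte group going to h[j % 4],
 * merged at the end. Assumes every register value is < 64. */
static void histo_split4(const uint8_t *regs, int *out) {
    int h0[HLL_BINS] = {0}, h1[HLL_BINS] = {0};
    int h2[HLL_BINS] = {0}, h3[HLL_BINS] = {0};
    for (size_t i = 0; i < HLL_REGISTERS; i += 8) {
        const uint8_t *r = regs + i;
        h0[r[0]]++; h1[r[1]]++; h2[r[2]]++; h3[r[3]]++;
        h0[r[4]]++; h1[r[5]]++; h2[r[6]]++; h3[r[7]]++;
    }
    for (int b = 0; b < HLL_BINS; b++)
        out[b] = h0[b] + h1[b] + h2[b] + h3[b];
}
```

Both functions produce identical histograms. The split version only changes which counter each increment touches, so adjacent equal bytes no longer serialize on one memory location.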
## Benchmark Results
Tested on `x86-aws-m7i.metal-24xl-2` (Intel Sapphire Rapids, bare metal),
`oss-standalone` topology. **Median of 2 datapoints, ±0.4% std.dev on PFCOUNT.**
| Test | Baseline (`unstable @ 9c1ecd044`) | PR (`5401108`) | Change |
|------|---------------------------------:|---------------:|--------|
| **multiple-hll-pfcount-100B-values** | 63,338 | 78,809 ±0.4% | **+24.4%** |
| multiple-hll-pfmerge-100B-values | 106,881 | 106,832 ±2.0% | -0.0% (flat) |
PFMERGE is flat — the SIMD merge path is unchanged, only the histogram
accumulation is modified.
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Add configurations for the `SLOWLOG_ENTRY_MAX_ARGC` and
`SLOWLOG_ENTRY_MAX_STRING` values, which are currently hardcoded.
Two new configurations:
* `slowlog-entry-max-argc` - maximum number of command arguments kept in
a slowlog entry. Default: 32
* `slowlog-entry-max-string-len` - maximum length of a command argument
in a slowlog entry. Default: 128
Useful for better diagnostics of slow commands with numerous and long
arguments.
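The C sketch below shows how the two limits could shape a logged entry. It is modeled on the placeholders that the Redis slowlog has used when truncating, namely `"... (N more arguments)"` and `"... (N more bytes)"`. Treat the exact wording as an assumption. `max_argc` and `max_len` stand in for `slowlog-entry-max-argc` and `slowlog-entry-max-string-len`, and the function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Number of slots kept in the slowlog entry for a command with argc args. */
static int slowlog_kept_argc(int argc, int max_argc) {
    return argc < max_argc ? argc : max_argc;
}

/* Render logged slot i into buf. */
static void slowlog_slot(char *buf, size_t bufsz, const char **argv, int argc,
                         int i, int max_argc, size_t max_len) {
    int kept = slowlog_kept_argc(argc, max_argc);
    if (kept != argc && i == kept - 1) { /* last slot summarizes the rest */
        snprintf(buf, bufsz, "... (%d more arguments)", argc - kept + 1);
        return;
    }
    size_t len = strlen(argv[i]);
    if (len > max_len) /* keep a prefix, report the dropped byte count */
        snprintf(buf, bufsz, "%.*s... (%zu more bytes)", (int)max_len, argv[i],
                 len - max_len);
    else
        snprintf(buf, bufsz, "%s", argv[i]);
}
```

Raising either limit trades slowlog memory for more complete diagnostics of commands with many or long arguments.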
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
vecClear() reset the logical size without releasing element ownership,
leaking elements whenever a free callback had been registered via
vecSetFreeMethod(). This PR makes vecClear() free the elements with the
same loop that vecRelease() uses, while still preserving the backing storage.
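The fix pattern looks roughly like the minimal C sketch below. The `vec` type and the `vec_push`/`vec_clear` functions are hypothetical stand-ins, not the Redis `vec` API. `counting_free` exists only to make the release observable.

```c
#include <stdlib.h>

typedef struct {
    void **items;
    size_t len, cap;
    void (*free_fn)(void *); /* set by a vecSetFreeMethod()-style call */
} vec;

static int vec_push(vec *v, void *item) {
    if (v->len == v->cap) {
        size_t ncap = v->cap ? v->cap * 2 : 4;
        void **p = realloc(v->items, ncap * sizeof(*p));
        if (p == NULL) return -1;
        v->items = p;
        v->cap = ncap;
    }
    v->items[v->len++] = item;
    return 0;
}

/* Fixed clear: release each element, then reset only the logical length;
 * items and cap are kept so the storage can be reused. */
static void vec_clear(vec *v) {
    if (v->free_fn)
        for (size_t i = 0; i < v->len; i++) v->free_fn(v->items[i]);
    v->len = 0;
}

/* Demo callback that counts how many elements were released. */
static int freed_count = 0;
static void counting_free(void *p) { freed_count++; free(p); }
```

Keeping `items` and `cap` intact is what distinguishes clear from release. The next pushes reuse the same buffer without reallocating.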
Close #14278
## Overview
Rate limiters, sliding windows, request counters, and numerous other
network-facing patterns share a common primitive: **atomically increment
a counter and set its expiration**. Achieving this in Redis requires
either multiple round-trips or a Lua script that bundles `INCR` /
`INCRBY` / `INCRBYFLOAT` with `EXPIRE` / `PEXPIRE`.
We propose a new command, **`INCREX`**, that collapses this two-step
pattern into a single, native, O(1) command. `INCREX` atomically:
1. Increments (or decrements) a key's numeric value — by integer or
float.
2. Optionally enforces lower and/or upper bounds, with a configurable
overflow policy (error out, saturate, or no-op), enabling built-in cap
enforcement (e.g., max request count) without additional client logic.
3. Optionally sets or removes the key's expiration.
4. Returns both the **new value** and the **actual increment applied**,
giving the caller immediate feedback on whether the operation was
saturated or skipped.
## Use Cases
### Basic Usage
```
# Increment by 1 (default) and set a 60-second TTL.
> SET mykey 10
> INCREX mykey EX 60
1) (integer) 11 # new value
2) (integer) 1 # actual increment
# Use 0 as initial value if the key doesn't exist.
> DEL mykey
> INCREX mykey
1) (integer) 1 # new value
2) (integer) 1 # actual increment
# Default policy (OVERFLOW FAIL): exceeding a bound returns an error.
> SET mykey 5
> INCREX mykey BYINT 20 UBOUND 10
(error) value is out of bounds
# Opt into saturation with OVERFLOW SAT.
> INCREX mykey BYINT 20 UBOUND 10 OVERFLOW SAT
1) (integer) 10 # saturated to upper bound
2) (integer) 5 # only 5 was actually applied
# Skip the operation with OVERFLOW REJECT — the key and its TTL are
# untouched, and the reply reports the current value with a zero delta.
> SET mykey 5
> INCREX mykey BYINT 20 UBOUND 10 OVERFLOW REJECT
1) (integer) 5 # current (unchanged) value
2) (integer) 0 # nothing was applied
# Increment by a float
> SET mykey 1
> INCREX mykey BYFLOAT 0.5
1) "1.5"
2) "0.5"
```
### Use Case: Rate Limiter
**Before (Lua script):**
```lua
-- KEYS[1] = rate limit key, ARGV[1] = limit, ARGV[2] = window in seconds
local current = redis.call('INCR', KEYS[1])
if current > tonumber(ARGV[1]) then
    return 0 -- rejected
end
if current == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[2])
end
return 1 -- allowed
```
Client invocation:
```python
result = redis.eval(LUA_SCRIPT, 1, f"ratelimit:{user_id}", 100, 60)
if result == 0:
    reject_request()
```
**After (INCREX):**
```python
new_val, actual_incr = redis.execute_command(
    "INCREX", f"ratelimit:{user_id}", "UBOUND", 100, "OVERFLOW", "REJECT", "EX", 60, "ENX"
)
if actual_incr == 0:
    # Rate limit exceeded: key left unchanged.
    reject_request()
```
`ENX` means: set the expiration only if the key doesn't already have
one. This ensures the rate-limit window's TTL is set only on the first
request of the window.
### Use Case: Token Bucket Refill
Refill tokens periodically up to a capacity ceiling, saturating at the
cap instead of erroring:
```
> INCREX tokens:user123 BYINT 10 UBOUND 100 OVERFLOW SAT EX 3600 ENX
1) (integer) 10
2) (integer) 10
```
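Tokens cannot exceed 100. Because of `ENX`, the TTL is set only while the key has no expiration, so the key expires 3600 seconds after the call that first set its TTL; later refills do not extend it.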
### Use Case: Countdown / Resource Consumption
Decrement a resource counter down to zero, saturating at the floor:
```
> SET credits:user123 50
> INCREX credits:user123 BYINT -1 LBOUND 0 OVERFLOW SAT
1) (integer) 49
2) (integer) -1
```
When credits are exhausted, `OVERFLOW SAT` prevents negative balances
without client-side checks.
## Parameter Reference
### Syntax
```
INCREX key
[BYFLOAT increment | BYINT increment]
[LBOUND lowerbound] [UBOUND upperbound] [OVERFLOW <FAIL | SAT | REJECT>]
[EX seconds | PX milliseconds | EXAT unix-time-seconds | PXAT unix-time-milliseconds | PERSIST] [ENX]
```
### Parameters
| Parameter | Description |
|-----------|-------------|
| `key` | The key to increment. Created with value `0` if it does not exist. |
| `BYFLOAT increment` | Increment the value by the given long-double float. |
| `BYINT increment` | Increment the value by the given 64-bit signed integer. |
| `LBOUND lowerbound` | Set lower bound for the increment result. Defaults to `LLONG_MIN` (integer) or `-LDBL_MAX` (float). |
| `UBOUND upperbound` | Set upper bound for the increment result. Defaults to `LLONG_MAX` (integer) or `LDBL_MAX` (float). |
| `OVERFLOW <FAIL \| SAT \| REJECT>` | Set the overflow policy when the result would be out of bounds. `FAIL` rejects the operation with an error (default). `SAT` saturates the result to the bound. `REJECT` leaves the key and its TTL untouched and replies with the current value and a zero delta. |
| `EX seconds` | Set the key's TTL to `seconds` seconds. |
| `PX milliseconds` | Set the key's TTL to `milliseconds` milliseconds. |
| `EXAT unix-time-seconds` | Set the key's expiration to the absolute Unix timestamp in seconds. |
| `PXAT unix-time-milliseconds` | Set the key's expiration to the absolute Unix timestamp in milliseconds. |
| `PERSIST` | Remove the key's existing TTL. |
| `ENX` | Set the key's TTL/expiration if it has No eXpiration |
If neither `BYINT` nor `BYFLOAT` is specified, the increment defaults to
integer `1`.
### Return Value
An **array of two elements**:
1. **New value** — the value of the key after the increment (or the
unchanged current value under `OVERFLOW REJECT`).
2. **Actual increment** — the increment that was actually applied. May
differ from the requested increment when `OVERFLOW SAT` saturates the
result to a bound, and is always `0` when `OVERFLOW REJECT` skipped the
operation.
- In integer mode (default or `BYINT`): both elements are **integers**.
- In float mode (`BYFLOAT`): both elements are **bulk strings**
representing the float values on RESP2, and **RESP3 Doubles** on RESP3.
### Overflow Policy (FAIL vs. SAT vs. REJECT)
Controlled by the optional `OVERFLOW` argument. A bound violation
includes both exceeding an explicit `LBOUND`/`UBOUND` and overflowing
the type limits when no explicit bound is given.
- **`OVERFLOW FAIL` (default)**: if the computed result would violate a
bound, the command returns an error and the key is left unchanged. This
matches the existing semantics of `INCRBY` / `INCRBYFLOAT` on overflow.
- **`OVERFLOW SAT`**: the result is silently capped at `UBOUND` /
floored at `LBOUND` (or saturated to the type limits when no explicit
bound is given). The second element of the reply reflects the saturated
delta. If the delta cannot be represented as a 64-bit signed
integer (default or `BYINT`), or would produce Infinity (`BYFLOAT`), an
error is returned.
- **`OVERFLOW REJECT`**: the operation is silently skipped — the key
value and its TTL are left unchanged, no keyspace notification is fired,
and nothing is replicated. The reply is `[current_value, 0]`, allowing
the caller to detect the rejection without handling an error.
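The three policies can be sketched in integer mode as follows. The function returns `-1` wherever the text above specifies an error reply. `increx_apply` and `overflow_policy` are hypothetical names, and this is not the Redis implementation. The sketch uses the GCC/Clang overflow builtins. If no explicit bound is given, the caller passes `LLONG_MIN`/`LLONG_MAX`.

```c
#include <limits.h>

typedef enum { OVERFLOW_FAIL, OVERFLOW_SAT, OVERFLOW_REJECT } overflow_policy;

/* Returns 0 and fills *value/*applied, or -1 for an error reply. */
static int increx_apply(long long cur, long long incr, long long lbound,
                        long long ubound, overflow_policy pol,
                        long long *value, long long *applied) {
    long long sum;
    int overflow = __builtin_add_overflow(cur, incr, &sum);
    int high = overflow ? incr > 0 : sum > ubound;
    int low = overflow ? incr < 0 : sum < lbound;
    if (!high && !low) { *value = sum; *applied = incr; return 0; }

    switch (pol) {
    case OVERFLOW_FAIL:
        return -1;                      /* "value is out of bounds" */
    case OVERFLOW_REJECT:
        *value = cur; *applied = 0;     /* key and TTL untouched */
        return 0;
    case OVERFLOW_SAT: {
        long long target = high ? ubound : lbound, delta;
        if (__builtin_sub_overflow(target, cur, &delta))
            return -1;                  /* delta not representable */
        *value = target; *applied = delta;
        return 0;
    }
    }
    return -1;
}
```

The tests reuse the `SET mykey 5` / `UBOUND 10` and credits examples from earlier in this description.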
### Notes
- If **no expiration option** is given, the key's existing TTL is
preserved (like `INCR`).
- `ENX` requires one of `EX`/`PX`/`EXAT`/`PXAT`.
- If the result is saturated by `OVERFLOW SAT`, the expiration is still
applied as specified.
- Under `OVERFLOW REJECT` the expiration option is ignored on the
rejected branch — TTL is preserved exactly as it was before the call.
- **`BYINT` requires an integer-typed existing value; `BYFLOAT` accepts
both.**
Integers can be promoted to floats losslessly, but a stored float (e.g.
`"1.5"`) cannot be parsed back as an integer. This is consistent with
`INCR`/`INCRBY` (integer-only) and `INCRBYFLOAT` (accepts both).
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
Co-authored-by: Moti Cohen <moti.cohen@redis.com>
Co-authored-by: oranagra <oran@redislabs.com>
PR #14440 changed `mstate.commands` from an array of `multiCmd` structs
to an array of `pendingCommand` pointers. This PR fixes the overhead
calculation in multiStateMemOverhead to account for both the pointer
array and the actual structs:
- The pointer array: `alloc_count * sizeof(pendingCommand*)`
- The individually allocated structs: `count * sizeof(pendingCommand)`
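A minimal sketch of the corrected formula, using hypothetical struct layouts. Only `commands`, `count` and `alloc_count` matter here; the real `pendingCommand` has different fields, and argument objects are not included in this sketch.

```c
#include <stddef.h>

/* Placeholder layout: the real pendingCommand has different fields. */
typedef struct pendingCommand {
    int argc;
    void **argv;
    unsigned long flags;
} pendingCommand;

/* Hypothetical slice of the MULTI state relevant to the calculation. */
typedef struct multiState {
    pendingCommand **commands; /* pointer array with alloc_count slots */
    int count;                 /* number of queued commands */
    int alloc_count;           /* capacity of the pointer array */
} multiState;

/* Pointer array (sized by capacity) plus one separately allocated
 * pendingCommand per queued command. */
static size_t multiStateMemOverheadSketch(const multiState *ms) {
    size_t mem = (size_t)ms->alloc_count * sizeof(pendingCommand *);
    mem += (size_t)ms->count * sizeof(pendingCommand);
    return mem;
}
```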
Reduce MGET / MSET latency by overlapping the dict-lookup memory accesses
across the keys of a single multi-key command. Builds on the cross-command
batched prefetch framework introduced in #14017 and the dict-prefetch state
machine in `memory_prefetch.c`, and lifts the kvobject-aware bits out of the
state machine into two new `dictType` callbacks so the same machinery can
be reused for other dict-encoded types later (hash hashtable, sets, sorted
sets) without paying for `kvobj`-specific code paths in the core loop.
Bundles the work originally proposed in #14899 (MGET prefetch framework,
by @mpozniak95) and #15043 (MSET batch prefetch).
## Design
Two new optional callbacks on `dictType`:
```c
typedef struct dictType {
...
/* Bring the entry's key payload into cache before keyCompare runs.
* Returns the address to prefetch, or NULL if the entry alone is enough. */
void *(*prefetchEntryKey)(const dictEntry *de);
/* Called only after a key match. Returns the value-side payload to
* prefetch (or NULL). */
void *(*prefetchEntryValue)(const dictEntry *de);
} dictType;
```
`dbDictType` registers both. The kv-aware logic — the `dictEntryIsKey()`
shortcut for embedded kvobjs, and `kv->ptr` for `OBJ_STRING` /
`OBJ_ENCODING_RAW` values — now lives in two small helpers in
`server.c`:
```c
static void *dbDictPrefetchEntryKey(const dictEntry *de) {
return dictEntryIsKey(de) ? NULL : dictGetKey(de);
}
static void *dbDictPrefetchEntryValue(const dictEntry *de) {
kvobj *kv = dictGetKey(de);
return (kv->type == OBJ_STRING && kv->encoding == OBJ_ENCODING_RAW)
? kv->ptr : NULL;
}
```
The `PrefetchGetValueDataFunc` typedef and the per-call `get_val_data`
parameter on `dictPrefetchKeys()` / `dictPrefetch()` are removed — the
dict's own type drives both ends. This also removes the foot-gun where
callers (like `mgetCommand`) had to remember whether to pass
`prefetchGetObjectValuePtr` or `NULL`. `memory_prefetch.c` no longer
references `kvobj`, `kvobjGetKey`, or any specific value layout.
## State machine
Two file-local types in `memory_prefetch.c`:
| Type | Role |
|---|---|
| `dictPrefetchLookup` | Per-key snapshot of an in-flight, software-pipelined `dictFind` (mirrors the locals a synchronous `dictFind` would carry across one bucket walk). |
| `dictPrefetcher` | Driver that advances a batch of `dictPrefetchLookup`s through the FSM, yielding to the next in-flight lookup each time a prefetch is issued. |
Five-stage lifecycle for each lookup, driven by the prefetcher:
```text
│
start
│
┌────────▼─────────┐
┌─────────►│ PREFETCH_BUCKET ├────►────────┐
│ └────────┬─────────┘ no more tables
│ bucket│found │
│ │ │
entry not found - goto next table ┌────────▼────────┐ │
└────◄─────┤ PREFETCH_ENTRY │ ▼
┌────────────►└────────┬────────┘ │
│ entry│found │
│ │ │
│ ┌───────────▼─────────────┐ │
│ │ PREFETCH_ENTRY_KEY │ ◄── dictType->prefetchEntryKey(de)
│ └───────────┬─────────────┘ │
│ │ │
key mismatch - goto next entry │ │
│ ┌───────────▼─────────────┐ │
└──────◄───│ PREFETCH_ENTRY_VALUE │ ◄── keyCompare; on match,
└───────────┬─────────────┘ dictType->prefetchEntryValue(de)
│ │
┌─────────▼─────────────┐ │
│ PREFETCH_DONE │◄────────┘
└───────────────────────┘
```
`PREFETCH_BUCKET` first picks `ht_table[0]`, then flips to `ht_table[1]`
if the dict is mid-rehash, then transitions to `PREFETCH_DONE` if no
more tables remain.
`memory_prefetch.c` exposes a small lifecycle that any caller can drive:
```c
dictPrefetcherInit(p, max_keys); /* one-shot heap alloc of lookups[] */
dictPrefetcherReset(p, dicts, keys, nkeys); /* configure for one batch */
dictPrefetcherRun(p); /* drive FSM until all PREFETCH_DONE */
dictPrefetcherFree(p); /* release */
```
Each FSM stage is a named static function (`dictPrefetchBucket`,
`dictPrefetchEntry`, `dictPrefetchEntryKey`, `dictPrefetchEntryValue`),
so the `dictPrefetcherRun` driver is a four-line `switch` over the
state.
The state machine is dict-pure: no `kvobj` field on
`dictPrefetchLookup`,
no `kvobjGetKey` reach-through. Round-robin advance semantics — a state
only advances the cursor if a prefetch was actually issued — are
preserved, so the embedded-kvobj fast path
(`dictEntryIsKey(de) == 1` → callback returns NULL) still skips the
extra prefetch and falls straight into the compare on the next loop
iteration.
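The round-robin idea can be illustrated with a toy, self-contained C sketch. It uses a hypothetical chained hash table instead of `dict`, ignores the second (rehashing) table and the key/value callbacks, and uses GCC/Clang's `__builtin_prefetch` when available. Each lookup issues a prefetch and then yields, so the memory latency of one lookup overlaps with work on the others.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy chained hash table standing in for dict (not Redis's dict). */
typedef struct node {
    uint64_t key;
    struct node *next;
} node;

typedef struct {
    node **buckets;
    size_t mask; /* table size - 1; size is a power of two */
} table;

typedef enum { ST_BUCKET, ST_HEAD, ST_ENTRY, ST_DONE } lookup_state;

/* Per-key snapshot of an in-flight lookup. */
typedef struct {
    uint64_t key;
    lookup_state state;
    node **slot;  /* bucket slot being loaded */
    node *cur;    /* entry being examined */
    node *found;  /* result; NULL if the key is absent */
} lookup;

#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(p) __builtin_prefetch(p)
#else
#define PREFETCH(p) ((void)(p))
#endif

/* Advance one lookup by one stage. Returns 1 if a prefetch was issued. */
static int step(const table *t, lookup *l) {
    switch (l->state) {
    case ST_BUCKET: /* locate the bucket slot and start loading it */
        l->slot = &t->buckets[l->key & t->mask];
        PREFETCH(l->slot);
        l->state = ST_HEAD;
        return 1;
    case ST_HEAD: /* slot should now be cached: read the chain head */
        l->cur = *l->slot;
        if (l->cur == NULL) { l->state = ST_DONE; return 0; }
        PREFETCH(l->cur);
        l->state = ST_ENTRY;
        return 1;
    case ST_ENTRY: /* entry should now be cached: compare or move on */
        if (l->cur->key == l->key) {
            l->found = l->cur;
            l->state = ST_DONE;
            return 0;
        }
        l->cur = l->cur->next;
        if (l->cur == NULL) { l->state = ST_DONE; return 0; }
        PREFETCH(l->cur);
        return 1;
    case ST_DONE:
        break;
    }
    return 0;
}

/* Drive all lookups round-robin until every one is done. */
static void runLookups(const table *t, lookup *ls, size_t n) {
    size_t remaining = n, i = 0;
    for (size_t k = 0; k < n; k++) {
        ls[k].state = ST_BUCKET;
        ls[k].found = NULL;
    }
    while (remaining > 0) {
        lookup *l = &ls[i];
        if (l->state == ST_DONE) { i = (i + 1) % n; continue; }
        int prefetched = step(t, l);
        if (l->state == ST_DONE) remaining--;
        /* Yield to the next lookup only once memory is in flight (or this
         * lookup finished); otherwise keep advancing the same one. */
        if (prefetched || l->state == ST_DONE) i = (i + 1) % n;
    }
}
```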
The cross-command path (`prefetchCommands` / `PrefetchCommandsBatch`)
embeds a `dictPrefetcher` initialized once at startup and reset per
batch, so cross-command prefetching no longer allocates per call.
## Intra-command API
```c
void dictPrefetchKeys(dict **dicts, void **keys, size_t nkeys);
```
A single multi-key command (e.g. MGET) can prefetch dict data for a
batch of its own keys, reusing the same state machine that the
cross-command path uses. Single-key calls (`nkeys <= 1`) early-return —
nothing to interleave with. The implementation stack-allocates a
fixed-size lookup array bounded by `DICT_PREFETCH_MAX_SIZE = 64` (no
VLA, predictable stack usage), so the intra-command path doesn't touch
the heap.
## Notes on the call sites
A shared helper picks the next prefetch batch and warms it via
`dictPrefetchKeys`:
```c
/* Pick the next prefetch batch starting at argv[start] and warm it via
* dictPrefetchKeys. 'stride' is 1 for keys-only args (MGET) or 2 for
* key/value pairs (MSET). Returns the chosen batch size in items. */
static int prefetchKeysBatch(client *c, int slot, int start, int stride);
```
Adaptive batch sizing inside the helper: if at least two full batches
(`PREFETCH_BATCH_SIZE * 2 = 32` items) remain, take one batch
(`PREFETCH_BATCH_SIZE = 16`); otherwise take all remaining items in one
call. This generalizes the small-request fast path so the trailing
batch of a large request also gets the single-call benefit.
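The adaptive sizing rule fits in a few lines. `nextPrefetchBatchSize` is a hypothetical helper, not the actual `prefetchKeysBatch` body; `remaining` counts items (keys for MGET, key/value pairs for MSET).

```c
#define PREFETCH_BATCH_SIZE 16

/* Items to prefetch next, given how many are still unprocessed. While at
 * least two full batches remain, take one batch; otherwise take the whole
 * tail in one call. */
static int nextPrefetchBatchSize(int remaining) {
    if (remaining >= PREFETCH_BATCH_SIZE * 2) return PREFETCH_BATCH_SIZE;
    return remaining;
}
```

For a 100-key MGET this yields batches of 16, 16, 16, 16, 16 and a final 20.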
- **MGET (`mgetCommand`)** — gated by
`do_prefetch = server.prefetch_batch_max_size && !already_prefetched && numkeys > 1`,
with `already_prefetched = c->current_pending_cmd &&
(c->current_pending_cmd->flags & PENDING_CMD_KEYS_PREFETCHED)`.
When `do_prefetch` is set, each iteration calls
`prefetchKeysBatch(c, slot, j, 1)` and then sequentially
`lookupKeyRead`s + replies the chosen batch. When `do_prefetch` is
clear (cross-command path already warmed the keys, or batch
prefetching is off), the loop takes all remaining items in one go
and skips the prefetch.
- **MSET / MSETNX (`msetGenericCommand`)** — same `do_prefetch` gate as
MGET with `stride = 2`. For the NX flag the NX-check loop runs
`lookupKeyWrite` (which already warmed everything via
`prefetchKeysBatch`); the SET loop then disables further prefetch
(`do_prefetch = do_prefetch && !nx`) so we don't re-prefetch on the second pass.
Going through the full state machine (rather than bucket-only) means
`dbDictType`'s `prefetchEntryValue` callback runs on a key match —
warming the old kvobj's payload, which `setKey -> dbReplaceValue ->
updateKeysizesHist(oldlen, newlen)` then reads to compute the
histogram delta. The slot dict is re-fetched per batch — in cluster
mode the slot dict can be freed mid-MSET (`KVSTORE_FREE_EMPTY_DICTS`
+ `expireIfNeeded`), so a cached pointer would otherwise dangle.
- **Cross-command batch path (`addCommandToBatch`)** — sets
`PENDING_CMD_KEYS_PREFETCHED` on every command added to the batch,
even on partial-batch overflow (was: only when ALL keys fit). The
intra-command path then uniformly skips supplemental prefetching for
any command the batch touched. Rationale: running both paths
(cross-command warm + intra-command supplement) caused a measured
−9.6% regression on x86 with pipeline-10, and the partial cross-command
warmup is sufficient for the head of the keyset; the cold
tail goes through normal lookup, which is still cheaper than running
the FSM a second time on already-warm keys.
- **Future types**: each dict's `dictType` can register its own
`prefetchEntryKey` / `prefetchEntryValue` (e.g. for the hashtable hash
encoding, the field-sds and value-sds payloads), without touching
`memory_prefetch.c`.
## Benchmark validation
On x86, performance improvements are significant for larger batch sizes:
- 5Mkeys-string-mget-10B-100keys-pipeline-10: +89.44%
- 5Mkeys-string-mget-100B-100keys: +37.33%
- 5Mkeys-string-mget-100B-30keys: +22.40%
On ARM (Graviton4), the gains are even more pronounced:
- 5Mkeys-string-mget-10B-100keys-pipeline-10: +128.34%
- 5Mkeys-string-mget-100B-100keys-pipeline-10: +46.76%
Overall, the improvement scales with batch size, while a few small-batch cases show marginal gains or slight regressions.
---------
Co-authored-by: Marcin Poźniak <marcin.pozniak@intel.com>
Co-authored-by: Yuan Wang <yuan.wang@redis.com>
## Problem
At the back-length width boundaries 16383 / 2097151 / 268435455,
lpEncodeBacklen falls through to the next-wider encoding instead of
using the narrower one, because the threshold checks use < where they
should use <=. The listpack stays valid (the decoder is self-delimiting)
but each such entry wastes one byte.
## Fix
Update the threshold checks to use inclusive ≤, ensuring boundary values
are encoded with the minimal number of bytes, consistent with
lpDecodeBacklen.
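Each back-length byte carries 7 payload bits, so the largest value that fits in n bytes is 2^(7n) - 1; that is where the boundary values 127, 16383, 2097151 and 268435455 come from. A minimal sketch of the width selection with inclusive checks (`backlenBytes` is a hypothetical helper; the real `lpEncodeBacklen` also writes the bytes):

```c
#include <stdint.h>

/* Number of bytes needed to encode back-length l. Inclusive comparisons
 * keep each boundary value in the narrower encoding. */
static int backlenBytes(uint64_t l) {
    if (l <= 127) return 1;        /* 2^7  - 1 */
    if (l <= 16383) return 2;      /* 2^14 - 1 */
    if (l <= 2097151) return 3;    /* 2^21 - 1 */
    if (l <= 268435455) return 4;  /* 2^28 - 1 */
    return 5;
}
```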
### Problem
Node.js 20 actions are deprecated. CI emits warnings like this:
> Node.js 20 actions are deprecated. The following actions are running
on Node.js 20 and may not work as expected: actions/checkout@v4. Actions
will be forced to run with Node.js 24 by default starting June 2nd,
2026. Please check if updated versions of these actions are available
that support Node.js 24. To opt into Node.js 24 now, set the
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the
runner or in your workflow file. Once Node.js 24 becomes the default,
you can temporarily opt out by setting
ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see:
https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
### Changed
Upgrade actions to their latest stable versions:
1. `actions/upload-artifact` v4 => v7
2. `actions/checkout` v4 => v6
3. `actions/checkout` main => v6
4. `actions/create-github-app-token` v1 => v3
5. `github/codeql-action` v3 => v4
6. `actions/cache` v4 => v5
7. `actions/setup-node` v4 => v6
# Description
There is an array corruption bug in LDB caused by an incorrect size
argument being passed to `memmove()` inside the `ldbDelBreakpoint()`
function.
When deleting a breakpoint, `memmove()` is used to shift the remaining
breakpoints in the `ldb.bp` integer array down by one slot. However, the size
argument is the number of elements rather than the number of bytes.
Because `ldb.bp` is an array of `int`, only a fraction of the bytes that
should move are copied (one quarter where `int` is 4 bytes), leaving the
array corrupted.
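A minimal sketch of a correct deletion, using a hypothetical stand-in for the LDB breakpoint state (`bpState`, `delBreakpoint` and the capacity are illustrative; only the `memmove` size expression is the point):

```c
#include <string.h>

#define MAX_BREAKPOINTS 64 /* hypothetical capacity */

/* Hypothetical stand-in for the ldb breakpoint state. */
typedef struct {
    int bp[MAX_BREAKPOINTS];
    int bpcount;
} bpState;

/* Delete the breakpoint at `line`; returns 1 if it was found. memmove()
 * takes a byte count, so the element count is scaled by sizeof(int). */
static int delBreakpoint(bpState *s, int line) {
    for (int j = 0; j < s->bpcount; j++) {
        if (s->bp[j] == line) {
            s->bpcount--;
            memmove(s->bp + j, s->bp + j + 1,
                    (size_t)(s->bpcount - j) * sizeof(s->bp[0]));
            return 1;
        }
    }
    return 0;
}
```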
While profiling command execution, I noticed that command argv object
alloc/free overhead is quite high for workloads with many small
arguments (e.g. `HSET` with many fields). The effect is much more
visible with pipelining when Redis becomes CPU bound.
I experimented with replacing argv object alloc/free with a simple
object pool and saw significant speedups.
(Note: related effort around this topic:
https://github.com/redis/redis/pull/13726)
In this PR, I tried to improve the main hotspots in the memory
allocation path (focusing on command arg allocations) to close the gap
with custom pool performance, so we can avoid having a dedicated memory
pool and let the whole codebase benefit from these optimizations.
## Changes
### 1) Faster dealloc via passing size hint to jemalloc (separate PR #15071)
Jemalloc does more work than an object pool on free (a lookup on a tree
to find the allocation's size class). For some deallocations, we can
reduce free path overhead by passing a size hint to jemalloc (via
`sdallocx()`), which lets it skip the metadata lookup in the common case.
This PR introduces `zfree_with_size()` and uses it where the allocation
size is known, i.e. `OBJ_ENCODING_EMBSTR` objects in `decrRefCount()`
and the SDS free path.
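A minimal sketch of what such a wrapper can look like. The name `zfree_with_size_sketch` and the `USE_JEMALLOC` guard are assumptions for illustration, and Redis builds jemalloc with a symbol prefix, so the real call site may use a prefixed name. Without jemalloc, the sketch simply falls back to `free()`.

```c
#include <stdlib.h>
#ifdef USE_JEMALLOC
#include <jemalloc/jemalloc.h>
#endif

/* Free ptr, passing its size to the allocator when it can use it. With
 * jemalloc, sdallocx() skips the size-class lookup free() must do; size
 * must lie between the originally requested size and the usable size. */
static void zfree_with_size_sketch(void *ptr, size_t size) {
    if (ptr == NULL) return; /* sdallocx() must not be given NULL */
#ifdef USE_JEMALLOC
    sdallocx(ptr, size, 0);
#else
    (void)size; /* libc has no portable sized free; use free() */
    free(ptr);
#endif
}
```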
### 2) Reduce atomic operation cost for stat updates
`update_zmalloc_stat_alloc()` / `update_zmalloc_stat_free()` previously
used atomic read-modify-write (RMW) operations (`atomicIncrGet` /
`atomicDecr`) which can emit expensive locked instructions on x86.
When we can guarantee a single writer to a counter, we can use a cheaper
load+add+store sequence instead of a locked RMW. This PR gives the first
16 threads dedicated slots for used_memory stats (intended to cover the
main thread and I/O threads) so they can use this single-writer fast path.
Threads beyond that fall back to a shared pool and continue to use full
atomic RMW.
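The single-writer idea can be sketched with C11 atomics. The slot layout and function names below are hypothetical. A thread that owns its slot can use a relaxed load plus a relaxed store (plain moves on x86), while the shared fallback needs a real atomic read-modify-write.

```c
#include <stdatomic.h>
#include <stddef.h>

#define STAT_SLOTS 16

/* One counter per dedicated thread slot plus a shared fallback. */
static _Atomic size_t used_memory_slot[STAT_SLOTS];
static _Atomic size_t used_memory_shared;

/* slot is the caller's dedicated index, or -1 if it has none. */
static void statAlloc(int slot, size_t size) {
    if (slot >= 0 && slot < STAT_SLOTS) {
        /* Single writer: no locked instruction needed, and readers on
         * other threads still never observe a torn value. */
        size_t v = atomic_load_explicit(&used_memory_slot[slot], memory_order_relaxed);
        atomic_store_explicit(&used_memory_slot[slot], v + size, memory_order_relaxed);
    } else {
        /* Many writers share this counter: a full atomic RMW is required. */
        atomic_fetch_add_explicit(&used_memory_shared, size, memory_order_relaxed);
    }
}

static void statFree(int slot, size_t size) {
    if (slot >= 0 && slot < STAT_SLOTS) {
        size_t v = atomic_load_explicit(&used_memory_slot[slot], memory_order_relaxed);
        atomic_store_explicit(&used_memory_slot[slot], v - size, memory_order_relaxed);
    } else {
        atomic_fetch_sub_explicit(&used_memory_shared, size, memory_order_relaxed);
    }
}

/* A slot may wrap below zero when a thread frees memory another thread
 * allocated; unsigned arithmetic is modular, so the sum is still right. */
static size_t usedMemory(void) {
    size_t total = atomic_load_explicit(&used_memory_shared, memory_order_relaxed);
    for (int i = 0; i < STAT_SLOTS; i++)
        total += atomic_load_explicit(&used_memory_slot[i], memory_order_relaxed);
    return total;
}
```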
### 3) Improve jemalloc tcache hit rate
With the default `lookahead=16` config, a pipelined HSET with ~20 fields
does ~40 small allocations per command (fields + values), so a single
lookahead window can issue 16 x 40 = ~640 allocations. When args are
small, many of these land in the 32-byte size class (often `EMBSTR`).
Jemalloc's default per-bin tcache cap is 200, so this kind of burst
overflows the cache and triggers frequent flushes. I raised the small-bin
tcache limits (lg_tcache_nslots_mul:3, tcache_nslots_small_max:1000) to
handle these bursts better. In the worst case, this change may increase
tcache memory usage. Another option would have been to lower `lookahead`
instead.
### 4) Inlining
A simple pool has a few small functions, and it is easy for the compiler
to inline them. In comparison, the jemalloc alloc/free path has a deeper
call stack. Also, jemalloc was not compiled with `-flto`, which prevented
jemalloc functions from being inlined. As part of this PR, I added the
`-flto` flag to jemalloc when it is enabled for Redis.
The compiler also chooses not to inline some hot path functions in Redis.
This suggests PGO (profile-guided optimization) could provide additional
wins, and perhaps we can start experimenting with it at some point. We
could try to force inlining with attributes like `always_inline`, but it
is hard to apply across a deep call stack and misuse can cause code bloat.
So, rather than going in this direction, I added the `inline` keyword to
some functions for now. This doesn't force the compiler to inline all hot
path functions, but it is at least a step forward. (If we can further
improve this in the future, performance gets very close to a custom
memory pool implementation.)
## Benchmark results
The benchmark commands looked like this (HSET with 15 fields, pipeline 50):
```
memtier_benchmark --command="HSET __key__ username john_doe email john@example.com password hashed_pwd_123 created_at 1709125200 updated_at 1709125200 first_name John last_name Doe phone_number +1234567890 address 123_Main_St city NewYork country USA postal_code 10001 company Acme_Corp job_title Engineer bio Loves_coding" --command-ratio=1 --command-key-pattern=P --key-prefix="hsetkey" --key-minimum=1 --key-maximum=100000 -n 1000000 -c 50 -t 2 --hide-histogram --pipeline 50
```
| Benchmark | Improvement |
| --- | ---: |
| SET | +0% |
| SET (pipeline) | +8% |
| HSET 15 fields | +2% |
| HSET 15 fields (pipeline) | +17% |
| ZADD 15 elements| +3% |
| ZADD 15 elements (pipeline) | +15% |
This PR is based on https://github.com/valkey-io/valkey/pull/453 and
https://github.com/valkey-io/valkey/pull/694
When jemalloc frees memory, it performs a lookup to find the
allocation's size class. `sdallocx()` lets us skip this lookup by
passing the size we already know. Introduced a new free function wrapper
for this: `zfree_with_size()`.
Note: the impact of this optimization is only visible on hot paths, e.g.
repeated memory deallocations.
For the initial phase, I integrated this at `sdsfree()` only. Over time,
we may expand the usage of this new API for other performance sensitive
paths.
For testing, added jemalloc config `--enable-opt-size-checks` to the
daily fortify build. This makes jemalloc validate that the size passed
to `sdallocx()` matches the actual allocation's size class, aborting on
mismatch.
----
Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: ranshid <ranshid@amazon.com>
Test 15-config-set-config-get.tcl was leaving announce-port and
announce-hostnames at non-default values, which breaks auto-discovery in
subsequent test units. Add reset lines at the end of each test that
modifies config. This PR fixes failures in Daily CI tests.
Close #15134
## Description
In cliLegacyIntegrateHelp(), malformed COMMAND entries could return
early and skip freeReplyObject(reply). Use break instead so the loop
exits and the existing cleanup still frees the reply.
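The pattern behind the fix, as a hypothetical minimal sketch: `toyReply` stands in for the hiredis reply, `freeToyReply` for `freeReplyObject`, and an entry of `0` marks a malformed COMMAND entry.

```c
#include <stdlib.h>

/* Hypothetical reply type; entries[i] == 0 marks a malformed entry. */
typedef struct {
    size_t elements;
    int *entries;
} toyReply;

static int freed = 0; /* counts cleanups, for illustration only */

static void freeToyReply(toyReply *r) {
    free(r->entries);
    freed++;
}

/* Process entries until a malformed one is seen. `break` (instead of an
 * early `return`) keeps control flowing to the single cleanup below. */
static size_t integrateHelp(toyReply *reply) {
    size_t processed = 0;
    for (size_t i = 0; i < reply->elements; i++) {
        if (reply->entries[i] == 0) break; /* an early return would leak */
        processed++;
    }
    freeToyReply(reply);
    return processed;
}
```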
Signed-off-by: huynhanx03 <157712338+huynhanx03@users.noreply.github.com>
Co-authored-by: IamShreshth <IamShreshth@users.noreply.github.com>
## Problem
After #14608 (Reply Copy Avoidance), when copy avoidance kicks in, bulk
string replies are sent by reference instead of being copied into the
output buffer.
The referenced bytes are not counted in `reply_bytes`, which causes:
1. `getClientOutputBufferMemoryUsage()` underestimates the actual memory
usage, so output buffer limits may not be triggered in time, allowing
clients to consume unbounded memory.
2. Client eviction does not account for the referenced bytes, making it
ineffective when copy avoidance is used.
3. `omem` reported in `CLIENT LIST` / `CLIENT INFO` does not reflect the
true output buffer memory footprint.
## Solution
Track the bytes of referenced bulk strings in the output buffer with two
per-client counters:
1. reply_bytes_shared - the logical size of all BULK_STR_REF payloads in
the output buffer.
Updated incrementally whenever a reference is added/removed.
Represents memory the client is "charged" for even though it is shared
with the keyspace.
2. reply_bytes_unshared — the subset of the above where the referenced
object's refcount == 1 (i.e. the key has been deleted from the
keyspace), so the memory is kept alive solely by this client's output
buffer and would actually be freed on disconnect.
Maintained as a lazy cache refreshed via
updateClientUnsharedReplyBytes().
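A minimal sketch of how the two counters relate, with hypothetical types standing in for `robj` and `client` and a fixed-capacity array instead of the real reply list. It assumes omem is roughly the copied bytes plus `reply_bytes_shared`, and that the output-buffer part of tot-mem is the copied bytes plus `reply_bytes_unshared`.

```c
#include <stddef.h>

#define MAX_REFS 8 /* toy capacity */

/* Hypothetical reference-counted payload standing in for a robj. */
typedef struct {
    int refcount;
    size_t len;
} payload;

/* Hypothetical slice of client state for referenced replies. */
typedef struct {
    size_t reply_bytes;          /* bytes copied into the reply list */
    payload *refs[MAX_REFS];     /* payloads queued by reference */
    int nrefs;
    size_t reply_bytes_shared;   /* all referenced bytes; kept up to date */
    size_t reply_bytes_unshared; /* subset with refcount == 1; lazy cache */
} clientSketch;

/* Queue a payload by reference: take a refcount and charge its size. */
static int addRef(clientSketch *c, payload *p) {
    if (c->nrefs == MAX_REFS) return -1;
    p->refcount++;
    c->refs[c->nrefs++] = p;
    c->reply_bytes_shared += p->len;
    return 0;
}

/* Refresh the lazy cache: bytes kept alive only by this client. */
static void refreshUnshared(clientSketch *c) {
    size_t unshared = 0;
    for (int i = 0; i < c->nrefs; i++)
        if (c->refs[i]->refcount == 1) unshared += c->refs[i]->len;
    c->reply_bytes_unshared = unshared;
}

/* Logical output memory (omem): copied bytes plus every referenced byte. */
static size_t omemSketch(const clientSketch *c) {
    return c->reply_bytes + c->reply_bytes_shared;
}

/* Output-buffer share of actual memory (tot-mem): shared bytes already
 * belong to the keyspace, so only the unshared subset is added. */
static size_t actualOutputMem(const clientSketch *c) {
    return c->reply_bytes + c->reply_bytes_unshared;
}
```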
## Info field
CLIENT LIST / CLIENT INFO — two new fields, plus refined semantics for
existing ones:
Field | Meaning
-- | --
omem | (semantics changed) logical output-buffer memory, now including shared memory referenced from the keyspace. Still excludes `client->buf`, so static clients show 0.
omem-shared | (new) shared output-buffer memory (referenced bulk strings, not solely owned by this client).
omem-unshared | (new) unshared output-buffer memory (referenced bulk strings solely owned by this client; freed on disconnect).
tot-mem | (semantics refined) actual memory usage: includes omem-unshared, excludes omem-shared to avoid double-counting keyspace memory.
INFO memory — two new fields mirroring the above:
Field | Meaning
-- | --
mem_clients_normal | (semantics changed) actual memory usage of normal clients (includes unshared, excludes shared).
mem_clients_normal_shared | (new) aggregate shared output-buffer memory across normal clients.
mem_clients_normal_unshared | (new) aggregate unshared output-buffer memory across normal clients.
MEMORY STATS — schema extended with the matching keys:
Field | Meaning
-- | --
clients.normal.shared | (new) aggregate shared output-buffer memory across normal clients.
clients.normal.unshared | (new) aggregate unshared output-buffer memory across normal clients.
## Bug Fix
Fix a missing `closeClientOnOutputBufferLimitReached()` call when adding a
referenced robj to the reply.
---------
Co-authored-by: oranagra <oran@redislabs.com>
> [!NOTE]
> **Low Risk**
> Low risk Makefile change that only alters build flags for the
RediSearch module; primary risk is build/compatibility issues on some
toolchains when LTO is enabled.
>
> **Overview**
> **RediSearch module builds now default to link-time optimization.**
The `modules/redisearch/Makefile` introduces `LTO ?= 1` and exports it
so the upstream RediSearch build can pick it up, with an escape hatch to
disable via `LTO=0`.
Reject control characters (0x00-0x1F, 0x7F) in user-controlled string
arguments to SENTINEL SET, SENTINEL MONITOR, and SENTINEL CONFIG SET to
prevent newline injection into the persisted config file. An attacker
with access to SENTINEL SET could inject arbitrary config directives
(e.g. notification-script) by embedding \r\n in auth-pass or similar
fields, leading to code execution on restart.
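A minimal sketch of the rejection check (`hasControlChars` is a hypothetical name). It takes an explicit length because Redis arguments are binary-safe and may contain embedded NUL bytes.

```c
#include <stddef.h>

/* Returns 1 if the len bytes at s contain a control character
 * (0x00-0x1F or 0x7F), which could break a line-based config file. */
static int hasControlChars(const char *s, size_t len) {
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c < 0x20 || c == 0x7F) return 1;
    }
    return 0;
}
```

Spaces and quotes pass this check; they are instead handled by the escaped serialization described below.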
As a defense-in-depth measure, config serialization now uses sdscatrepr
(via sentinelSdscatConfigArg) for all user-controlled string fields when
they contain characters that require escaping. Simple values remain
unquoted for backward compatibility with older config parsers.
Add comprehensive Sentinel tests (16-config-injection.tcl) covering
control character rejection for all affected commands, verification that
injection payloads do not pollute the config file, round-trip
persistence of values containing spaces and quotes through restart, and
backward compatibility with the old unquoted config format.
PR https://github.com/redis/redis/pull/14937 updates the Codecov
workflow configuration for `codecov/codecov-action` v6.
The action no longer accepts the singular `file` input, so this switches
to `files` to ensure `./src/redis.info` is uploaded correctly.