## Motivation
Redis's existing keyspace notification system operates at the **key
level** only. When a hash field is modified via `HSET`, `HDEL`, or
`HEXPIRE`, the subscriber receives the key name and the event type, but
not **which fields** were affected. As a result, these notifications
have very little practical value for consumers that care about fields.
This PR introduces a subkey notification system that extends keyspace
events to include field-level (subkey) details for hash operations,
through both Pub/Sub channels and the Module API.
## New Pub/Sub Notification Channels
Four new channels are added:
| Channel Format | Payload |
|---------------|---------|
| `__subkeyspace@<db>__:<key>` | `<event>\|<len>:<subkey>[,...]` |
| `__subkeyevent@<db>__:<event>` | `<key_len>:<key>\|<len>:<subkey>[,...]` |
| `__subkeyspaceitem@<db>__:<key>\n<subkey>` | `<event>` |
| `__subkeyspaceevent@<db>__:<event>\|<key>` | `<len>:<subkey>[,...]` |
**Design rationale for 4 channels:**
- **Subkeyspace**: Subscribe to a specific key, receive all field
changes in a single message — efficient for key-centric consumers.
- **Subkeyevent**: Subscribe to a specific event type, receive
key+fields — efficient for event-centric consumers.
- **Subkeyspaceitem**: Subscribe to a specific key+field combination —
the most selective, one message per field, no parsing needed.
- **Subkeyspaceevent**: Subscribe to event+key combination, receiving
only the affected fields — server-side filtering on both dimensions.
Subkeys are encoded in a length-prefixed format (`<len>:<subkey>`) to
support binary-safe field names containing delimiters.
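As a rough illustration of the length-prefixed encoding described above, the following sketch builds a `<len>:<subkey>[,...]` list. The helper name `encode_subkeys` and its buffer-based signature are hypothetical and are not taken from this PR.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not the PR's code): writes "<len>:<subkey>[,...]"
 * into out and NUL-terminates it. Returns the number of bytes written
 * (excluding the NUL), or -1 if out is too small.
 * Subkeys are copied by length, so they may contain ',' or ':' bytes. */
static int encode_subkeys(const char **subkeys, const size_t *lens, int count,
                          char *out, size_t cap) {
    size_t pos = 0;
    for (int i = 0; i < count; i++) {
        char prefix[32];
        /* A comma separates entries; the first entry has no leading comma. */
        int n = snprintf(prefix, sizeof(prefix), "%s%zu:", i ? "," : "", lens[i]);
        if (n < 0 || pos + (size_t)n + lens[i] >= cap) return -1;
        memcpy(out + pos, prefix, (size_t)n);
        pos += (size_t)n;
        memcpy(out + pos, subkeys[i], lens[i]);
        pos += lens[i];
    }
    if (pos >= cap) return -1;
    out[pos] = '\0';
    return (int)pos;
}
```

Because a reader consumes exactly `<len>` bytes after each `:`, a field name such as `a,b` does not need escaping.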
**Safety guards:**
- Events containing `|` are skipped for the `__subkeyspace` and
`__subkeyspaceevent` channels (to avoid parsing ambiguity).
- Keys containing `\n` are skipped for the `__subkeyspaceitem` channel
(newline is the key/subkey separator).
- Subkey channels are only published when `subkeys != NULL && count >
0`.
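The guards above can be sketched as three small predicates. The function names are hypothetical; only the conditions themselves come from the list above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical predicates mirroring the safety guards; not the PR's code. */

/* '|' separates the event from the rest of the message in the
 * __subkeyspace and __subkeyspaceevent channels. */
static int event_ok_for_pipe_channels(const char *event) {
    return strchr(event, '|') == NULL;
}

/* '\n' separates key and subkey in the __subkeyspaceitem channel name.
 * Keys are binary-safe, so scan by length instead of as a C string. */
static int key_ok_for_item_channel(const char *key, size_t len) {
    return memchr(key, '\n', len) == NULL;
}

/* Subkey channels are only published when at least one subkey exists. */
static int has_subkeys(const void *subkeys, int count) {
    return subkeys != NULL && count > 0;
}
```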
## Hash Command Integration
The following hash operations now emit subkey-level notifications with
the affected field names:
| Command | Event | Subkeys |
|---------|-------|---------|
| `HSET` / `HMSET` | `hset` | All fields being set |
| `HSETNX` | `hset` | The field (if set) |
| `HDEL` | `hdel` | All fields deleted |
| `HGETDEL` | `hdel` / `hexpired` | Deleted or lazily expired fields |
| `HGETEX` | `hexpire` / `hpersist` / `hdel` / `hexpired` | Affected fields per event |
| `HINCRBY` | `hincrby` | The field |
| `HINCRBYFLOAT` | `hincrbyfloat` | The field |
| `HEXPIRE` / `HPEXPIRE` / `HEXPIREAT` / `HPEXPIREAT` | `hexpire` | Updated fields |
| `HPERSIST` | `hpersist` | Persisted fields |
| `HSETEX` | `hset` / `hdel` / `hexpire` / `hexpired` | Affected fields per event |
| Field expiration (active/lazy) | `hexpired` | All expired fields (batched) |
For field expiration, expired fields are collected into a dynamic array
and sent as a single batched notification after the expiration loop,
rather than one notification per field.
## Module API
Three new APIs and one new callback type:
```c
/* Function pointer type for keyspace event notifications with subkeys from modules. */
typedef void (*RedisModuleNotificationWithSubkeysFunc)(
RedisModuleCtx *ctx, int type, const char *event,
RedisModuleString *key, RedisModuleString **subkeys, int count);
/* Subscribe to keyspace notifications with subkey information.
*
* This is the extended version of RM_SubscribeToKeyspaceEvents. When subkeys
* are available, the `subkeys` array and `count` are passed to the callback.
* `subkeys` contains only the names of affected subkeys (values are not included),
* and `count` is the number of elements. The array may contain duplicates when
* the same subkey appears more than once in a command (e.g. HSET key f1 v1 f1 v2
* produces subkeys=["f1","f1"], count=2). When no subkeys are present, `subkeys`
* will be NULL and `count` will be 0. Whether events without subkeys are delivered
* depends on the `flags` parameter (see below).
*
* `types` is a bit mask of event types the module is interested in
* (using the same REDISMODULE_NOTIFY_* flags as RM_SubscribeToKeyspaceEvents).
*
* `flags` controls delivery filtering:
* - REDISMODULE_NOTIFY_FLAG_NONE: The callback is invoked for all matching
* events regardless of whether subkeys are present, so a separate
* RM_SubscribeToKeyspaceEvents registration can be omitted.
* - REDISMODULE_NOTIFY_FLAG_SUBKEYS_REQUIRED: The callback is only invoked
* when subkeys are not empty. Events without subkey information (e.g. SET,
* EXPIRE, DEL) are skipped.
*
* The callback signature is:
* void callback(RedisModuleCtx *ctx, int type, const char *event,
* RedisModuleString *key, RedisModuleString **subkeys, int count);
*
* The subkeys array and its contents are only valid during the callback.
* The underlying objects may be stack-allocated or temporary, so
* RM_RetainString must NOT be used on them. To keep a subkey beyond
* the callback (e.g. in a RM_AddPostNotificationJob callback), use
* RM_HoldString (which handles static objects by copying) or
* RM_CreateStringFromString to make a deep copy before returning.
*/
int RM_SubscribeToKeyspaceEventsWithSubkeys(RedisModuleCtx *ctx, int types, int flags, RedisModuleNotificationWithSubkeysFunc callback);
/* Unregister a module's callback from keyspace notifications with subkeys
* for specific event types.
*
* This function removes a previously registered subscription identified by
* the event mask, delivery flags, and the callback function.
*
* Parameters:
* - ctx: The RedisModuleCtx associated with the calling module.
* - types: The event mask representing the notification types to unsubscribe from.
* - flags: The delivery flags that were used during registration.
* - callback: The callback function pointer that was originally registered.
*
* Returns:
* - REDISMODULE_OK on successful removal of the subscription.
* - REDISMODULE_ERR if no matching subscription was found. */
int RM_UnsubscribeFromKeyspaceEventsWithSubkeys(
RedisModuleCtx *ctx, int types, int flags,
RedisModuleNotificationWithSubkeysFunc cb);
/* Like RM_NotifyKeyspaceEvent, but also triggers subkey-level notifications
* when subkeys are provided. Both key-level (keyspace/keyevent) and
* subkey-level (subkeyspace/subkeyevent/subkeyspaceitem/subkeyspaceevent)
* channels are published to, depending on the server configuration.
*
* This is the extended version of RM_NotifyKeyspaceEvent and can actually
* replace it. When called with subkeys=NULL and count=0, it behaves
* identically to RM_NotifyKeyspaceEvent. */
int RM_NotifyKeyspaceEventWithSubkeys(
RedisModuleCtx *ctx, int type, const char *event,
RedisModuleString *key, RedisModuleString **subkeys, int count);
```
## Configuration
Subkey notifications are controlled via the existing
`notify-keyspace-events` configuration string with four new characters,
for example `notify-keyspace-events "STIV"`:
- **S**: Subkeyspace events, published with the
`__subkeyspace@<db>__:<key>` prefix.
- **T**: Subkeyevent events, published with the
`__subkeyevent@<db>__:<event>` prefix.
- **I**: Subkeyspaceitem events, published per subkey with the
`__subkeyspaceitem@<db>__:<key>\n<subkey>` prefix.
- **V**: Subkeyspaceevent events, published with the
`__subkeyspaceevent@<db>__:<event>|<key>` prefix.
These flags are **independent** from the existing key-level flags (`K`,
`E`, etc.). Enabling subkey notifications does **not** implicitly enable
or depend on keyspace/keyevent notifications, and vice versa.
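To make the independence of the four flags concrete, here is a minimal sketch that maps the new characters to a bitmask. The constant names and bit values are hypothetical; the real `NOTIFY_*` definitions are not shown in this description.

```c
/* Hypothetical bit values for illustration only. */
enum {
    SK_NOTIFY_SUBKEYSPACE      = 1 << 0, /* S */
    SK_NOTIFY_SUBKEYEVENT      = 1 << 1, /* T */
    SK_NOTIFY_SUBKEYSPACEITEM  = 1 << 2, /* I */
    SK_NOTIFY_SUBKEYSPACEEVENT = 1 << 3  /* V */
};

/* Extracts only the four subkey characters from a notify-keyspace-events
 * style string. Other characters (K, E, A, ...) are ignored here; a real
 * parser would handle them separately and reject unknown ones. */
static int parse_subkey_flags(const char *s) {
    int flags = 0;
    for (; *s; s++) {
        switch (*s) {
        case 'S': flags |= SK_NOTIFY_SUBKEYSPACE; break;
        case 'T': flags |= SK_NOTIFY_SUBKEYEVENT; break;
        case 'I': flags |= SK_NOTIFY_SUBKEYSPACEITEM; break;
        case 'V': flags |= SK_NOTIFY_SUBKEYSPACEEVENT; break;
        default: break;
        }
    }
    return flags;
}
```

In this model a string such as `KEA` yields no subkey bits at all, and `KS` yields only the subkeyspace bit, matching the statement that neither family of flags implies the other.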
## Known Limitations
- **Duplicate fields in subkey notifications**: Subkey notification
payloads may contain duplicate field names when the same field is
affected more than once within a single command. Since duplicate fields
are not the common case and deduplication would introduce significant
overhead on every notification, we chose not to deduplicate at this
time.
- **Subkeys must be sds-encoded objects**: The code assumes each subkey
is an sds-encoded object and accesses it via `subkey->ptr`. An assertion
enforces this, so Redis will crash if a subkey uses another encoding.
M_CreateKeyMetaClass() allows registration only on:
- 'DEBUG enable-module-keymeta-runtime-registration 1' (replaces server.enable_debug_cmd)
- REDISMODULE_CTX_FLAGS_SERVER_STARTUP, in addition to module->onload
As part of KSN, modules must not modify keys. However, RediSearch
modifies key metadata in some flows, which may invalidate the local
kvobj pointer.
Introduce KSN_INVALIDATE_KVOBJ() to explicitly invalidate kvobj after
notifications, preventing further access by Redis core. Currently
relevant for hash keys without HFE.
Changes:
- Add KSN_INVALIDATE_KVOBJ() to guard unsafe flows
- Apply invalidation beyond hash-specific paths
- Extend KSN side-effect coverage for DELEX and MOVE
- Rearrange flows to avoid kvobj access after notification
- Include additional tests from @JoanFM (#14939)
Behavior:
No intended behavior change and no reordering of notifications.
In the HSETNX command handler, the keyspace notification is sent before
the handler has finished using the kvobject held on the stack, unlike
the rest of the HSET* family of command handlers.
The problem is that when a module writes to Key Metadata, this kvobject
may be reallocated, so using it after the notification is dangerous and
can lead to a crash.
This issue surfaced when we started integrating Key Metadata support in
a module and observed a crash while handling HSETNX to update some
metadata.
The recent PR https://github.com/redis/redis/pull/14896 introduced new
slowlog statistics in `INFO` commandstats. This causes `assert_match` to
fail in CI (especially under Valgrind) when commands trigger the
slowlog, adding extra fields after failed_calls.
This PR updates the test patterns to append a trailing `*` to tolerate
these optional fields.
# What
Add global and per-command slowlog metrics.
`INFO STATS` now shows:
- slowlog_commands_count - total count of commands written to slowlog
(including trimmed ones)
- slowlog_commands_time_ms_sum - sum of execution times of the commands
from the slowlog.
- slowlog_commands_time_ms_max - maximum execution time of a command
from the slowlog (useful for calculating average values)
`INFO COMMANDSTATS` adds the equivalent 3 metrics, but per command. Only
shown for a command if it was added at least once in the slowlog:
- slowlog_count - how many times the command was written in the slowlog
- slowlog_time_ms_sum - sum of execution time of the command (only from
the slowlog)
- slowlog_time_ms_max - maximum execution time of the command (only from
the slowlog)
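The three counters are cheap to maintain and are enough to derive an average. The sketch below models the per-command variant; the struct and function names are hypothetical and only the three metric names come from the list above.

```c
#include <stdint.h>

/* Hypothetical per-command accumulator for the three slowlog metrics. */
typedef struct {
    uint64_t count;       /* slowlog_count */
    uint64_t time_ms_sum; /* slowlog_time_ms_sum */
    uint64_t time_ms_max; /* slowlog_time_ms_max */
} slowlog_stats;

/* Called whenever a command execution is written to the slowlog. */
static void slowlog_stats_record(slowlog_stats *s, uint64_t duration_ms) {
    s->count++;
    s->time_ms_sum += duration_ms;
    if (duration_ms > s->time_ms_max) s->time_ms_max = duration_ms;
}

/* Average time of slowlogged executions, derived from sum and count. */
static double slowlog_stats_avg_ms(const slowlog_stats *s) {
    return s->count ? (double)s->time_ms_sum / (double)s->count : 0.0;
}
```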
# Why
More fine-grained slowlog metrics and easier creation of alerts based on
the slowlog.
`luaRedisAclCheckCmdPermissionsCommand` and
`RM_ACLCheckCommandPermissions` now call `commandCheckArity()` to check
command arity before calling `ACLCheckAllUserCommandPerm`, matching the
behavior of `processCommand`, `scriptCall`, and `RM_Call`. Without this,
KEYNUM keyspec commands like EVAL with wrong arity cause out-of-bounds
argv access during key extraction.
Also fix KEYNUM index calculation (`first + keynumidx`) and add a bounds
check in genericGetKeys().
Add scripting and module ACL tests for wrong-arity `EVAL` to lock in the
non-crashing behavior.
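The two checks can be sketched as follows. This is a simplified model, not the code of `commandCheckArity()` or `genericGetKeys()`; the function names and parameters are hypothetical. It relies on the Redis arity convention that a positive arity means an exact argument count and a negative arity means "at least `-arity`".

```c
/* Positive arity: argc must match exactly.
 * Negative arity: argc must be at least -arity. */
static int arity_ok(int arity, int argc) {
    if (arity > 0) return argc == arity;
    return argc >= -arity;
}

/* Hypothetical bounds check for a KEYNUM keyspec: argv[keynum_pos] holds
 * the number of keys (already parsed into numkeys) and the keys start at
 * argv[first_key]. Returns 1 only if every key index lies inside argv. */
static int keynum_in_bounds(int argc, int keynum_pos, int first_key, long numkeys) {
    if (keynum_pos < 0 || keynum_pos >= argc) return 0;
    if (first_key < 0 || numkeys < 0) return 0;
    return numkeys <= (long)argc - first_key;
}
```

For `EVAL` (arity -3) called with only two arguments, `arity_ok` fails first, so key extraction never gets to read `argv[2]`.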
Fixes #14843
Add RM_GetContextUser to retrieve the RedisModuleUser set via
RM_SetContextUser, allowing modules to access the user associated
with RM_Call ACL checks.
In addition, add a new API to get the user name from RM_RedisModuleUser.
Each command having subcommands needs a HELP subcommand which is
currently missing for HOTKEYS.
Also the newly added section "Hotkeys" for INFO was messing up modules
INFOs in some cases.
Fixed both issues in this PR.
# Problem
While introducing async IO threads
(https://github.com/redis/redis/pull/13695), primary and replica clients
were left to be handled inside the main thread due to data race and
synchronization issues. This PR solves this issue, with the additional
hope that it increases replication performance.
# Overview
## Moving the clients to IO threads
Since clients first go through a handshake and an RDB replication
phase, it was decided to move them to an IO thread after RDB replication
is done. For the primary client this was trivial, as the master client
is created only after RDB sync (plus some additional checks one can see
in `isClientMustHandledByMainThread`). Replica clients, though, are
moved to IO threads immediately after connection (as are all clients),
so currently in `unstable` replication happens while this client is in
an IO thread. In this PR the client is moved to the main thread after
receiving the first `REPLCONF` message from the replica, but this is a
bit hacky and can be removed. I didn't find issues between the two
versions.
## Primary client (replica node)
We have a few issues here:
- During `serverCron`, `replicationCron` is run, which periodically
sends a `REPLCONF ACK` message to the master and also checks for a
timed-out master. To prevent data races we utilize
`IOThreadClientsCron`. The client is periodically sent to the main
thread, and during `processClientsFromIOThread` it's checked whether it
needs to run the replication cron behaviour.
- Data races with the main thread: specifically, the `lastinteraction`
and `read_reploff` members of the primary client, which are written to
in `readQueryFromClient`, could be accessed at the same time from the
main thread during execution of `INFO REPLICATION`
(`genRedisInfoString`). To solve this the members were duplicated, so if
the client is in an IO thread it writes to the duplicates, and they are
synced with the original variables each time the client is sent to the
main thread (that means `INFO REPLICATION` could potentially return
stale values).
- During `freeClient` the primary client is fetched to the main thread,
but when caching it (`replicationCacheMaster`) the thread id will remain
the id of the IO thread it came from. This creates problems when
resurrecting the master client. Here the call to
`unbindClientFromIOThreadEventLoop` in `freeClient` was rewritten to
call `keepClientInMainThread`, which automatically fixes the problem.
- During `exitScriptTimedoutMode` the master is queued for reprocessing
(specifically, to process any pending commands ASAP after it's
unblocked). We do that by putting it in the `server.unblocked_clients`
list, which is processed in the next `beforeSleep` cycle in the main
thread. Since this would create contention between the main and IO
threads, we skip this queueing in `unblocked_clients` and just queue the
client to the main thread; `processClientsFromIOThread` will process the
pending commands just as the main thread would have.
## Replica clients (primary node)
We move the client after RDB replication is done and after replication
backlog is fed with its first message.
We do that so that the client's reference to the first replication
backlog node is initialized before it's read from IO-thread, hence no
contention with main thread on it.
### Shared replication buffer
Currently in unstable the replication buffer is shared amongst clients.
This is done via clients holding references to the nodes inside the
buffer. A node from the buffer can be trimmed once each replica client
has read it and sent its contents. The reference is
`client->ref_repl_buf_node`. The replication buffer is written to by the
main thread in `feedReplicationBuffer`, and the refcounting is
intrusive: it's inside the replication-buffer nodes themselves.
Since the replica client changes the refcount (decreases the refcount of
the node it has just read, and increases the refcount of the next node
it starts to read) during `writeToClient`, we have a data race with the
main thread when it feeds the replication buffer. Moreover, the main
thread also updates the `used` size of the node (how much it has written
to it, compared to its capacity), which the replica client relies on to
know how much to read. The replica being in an IO thread creates another
data race here. To mitigate these issues a few new variables were added
to the client's struct:
- `io_curr_repl_node`: the starting node this replica is reading from
inside the IO thread
- `io_bound_repl_node`: the last node in the replication buffer the
replica sees before being sent to the IO thread.

These values are only allowed to be updated in the main thread. The
client keeps track of how much it has read into the buffer via the old
`ref_repl_buf_node`. Generally, while in an IO thread the replica client
will now keep the refcount of `io_curr_repl_node` until it has processed
all the nodes up to `io_bound_repl_node`; at that point it's returned to
the main thread, which can safely update the refcounts.
The `io_bound_repl_node` reference is there so the replica knows when to
stop reading from the repl buffer. Imagine that the replica reads from
the last node of the replication buffer while the main thread feeds data
to it: we would create a data race on the `used` value
(`_writeToClientSlave` (IO thread) vs `feedReplicationBuffer` (main)).
That's why this value is updated just before the replica is sent to the
IO thread.
*NOTE*, this means that when replicas are handled by IO threads they
will hold more than one node at a time (i.e `io_curr_repl_node` up to
`io_bound_repl_node`) meaning trimming will happen a bit less
frequently. Tests show no significant problems with that.
(tnx to @ShooterIT for the `io_curr_repl_node` and `io_bound_repl_node`
mechanism as my initial implementation had similar semantics but was way
less clear)
Example of how this works:
* Replication buffer state at time N:
`| node 0 | ... | node M, used_size K |`
* replica caches `io_curr_repl_node`=0, `io_bound_repl_node`=M and
`io_bound_block_pos`=K
* replica moves to IO thread and processes all the data it sees
* Replication buffer state at time N + 1:
`| node 0 | ... | node M, used_size Full | node M + 1 | node M + 2, used_size L |`,
where Full > K
* replica moves to main thread at time N + 1; at this point the
following happens:
  - refcount of node 0 (`io_curr_repl_node`) is decreased
  - `ref_repl_buf_node` becomes node M (`io_bound_repl_node`) (we still
    have size-K bytes to process from there)
  - refcount of node M is increased (now all nodes from 0 up to and
    including M-1 can be trimmed unless some other replica holds a
    reference to them)
  - just before the replica is sent back to the IO thread the following
    are updated:
    - `io_bound_repl_node` becomes node M+2
    - `io_bound_block_pos` becomes L

Note that the replica client is only moved to main if it has processed
all the data it knows about (i.e. up to `io_bound_repl_node` +
`io_bound_block_pos`)
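The effect of the snapshotted bound can be modelled with a small sketch. It is a deliberately simplified model in which the replication buffer is an array of `used` sizes; the `replica_view` struct and `readable_bytes` function are hypothetical and do not correspond to code in this PR.

```c
#include <stddef.h>

/* Simplified, hypothetical view a replica has while in the IO thread. */
typedef struct {
    size_t curr_node;  /* node currently being read (ref_repl_buf_node) */
    size_t curr_pos;   /* bytes already sent from curr_node */
    size_t bound_node; /* io_bound_repl_node, snapshotted in main thread */
    size_t bound_pos;  /* io_bound_block_pos, snapshotted in main thread */
} replica_view;

/* Bytes the replica may send before returning to the main thread.
 * used[i] is the used size of node i. Nodes before bound_node are read up
 * to used[i]; bound_node is read only up to the snapshotted bound_pos, so
 * bytes appended to it after the snapshot are never touched. */
static size_t readable_bytes(const replica_view *r, const size_t *used) {
    size_t total = 0;
    for (size_t i = r->curr_node; i <= r->bound_node; i++) {
        size_t end = (i == r->bound_node) ? r->bound_pos : used[i];
        size_t start = (i == r->curr_node) ? r->curr_pos : 0;
        if (end > start) total += end - start;
    }
    return total;
}
```

In this model, data that the main thread appends after the snapshot does not change the result until the view is refreshed, which mirrors the reason for updating the bound only in the main thread.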
### Replica clients kept in main as much as possible
During implementation an issue arose: how fast is the replica client
able to learn about new data in the replication buffer, and how fast can
it trim the buffer? For that to happen ASAP, whenever a replica is moved
to main it remains there until the replication buffer is fed new data.
At that point it's put in the pending write queue and special-cased in
`handleClientsWithPendingWrites` so that it's sent to an IO thread ASAP
to write the new data to the replica. Also, since the replica always
writes all the repl data it knows about, once it's sent to the main
thread `processClientsFromIOThread` is able to immediately update the
refcounts and trim whatever it can.
### ACK messages from primary
Replica clients need to periodically read `REPLCONF ACK` messages from
the replica. Since a replica client can remain in the main thread
indefinitely if no DB change occurs, a new atomic `pending_read` was
added during `readQueryFromClient`. If a replica client has a pending
read, it's returned to an IO thread in order to process the read even if
there is no pending repl data to write.
### Replicas during shutdown
During shutdown the main thread pauses write actions and periodically
checks if all replicas have reached the same replication offset as the
primary node. During `finishShutdown` that may or may not be the case.
Either way, client data may be read from the replicas, and we may even
try to write any pending data to them inside `flushSlavesOutputBuffers`.
In order to prevent races, all the replicas in IO threads are moved to
main via `fetchClientFromIOThread`. Cancelling the shutdown should be
ok, since the mechanism employed by `handleClientsWithPendingWrites`
should return the client to an IO thread when needed.
## Notes
While adding new tests, timing issues with TSan tests were found and
fixed.
There is also a data race caught by TSan on the `last_error` member of
the `client` struct. It happens when both an IO thread and the main
thread make a syscall using a `client` instance. This can happen only
for primary and replica clients, since their data can be accessed by
commands sent from other clients. A specific example is the `INFO
REPLICATION` command.
Although other such races were fixed, as described above, this one is
insignificant and it was decided to ignore it in `tsan.sup`.
---------
Co-authored-by: Yuan Wang <wangyuancode@163.com>
Co-authored-by: Yuan Wang <yuan.wang@redis.com>
Addresses crash and clarifies errors around container commands.
- Update server.c to handle container commands with no subcommand: emit
"missing subcommand. Try HELP."; keep "unknown subcommand" for invalid
subcommands; for unknown commands, include args preview only when
present
- Add a test module command subcommands.internal_container with a
subcommand for validation
- Add unit test asserting missing subcommand error when calling the
internal container command without arguments
Modules KeyMeta (Keys Metadata)
Redis modules often need to associate additional metadata with keys in
the keyspace. The objective is to create a unified and extensible
interface, usable by modules, Redis core, and maybe later by users,
that facilitates the association and management of metadata with keys.
While extending RedisModuleTypes might be an easier path, this proposal
goes one step further: a general-purpose mechanism that lets modules
attach metadata to any key, independent of the underlying data type.
A major part of this feature involves defining how metadata is managed
throughout a key’s lifecycle. Modules will be able to optionally
register distinct metadata classes, each with its own lifecycle
callbacks and capable of storing an arbitrary 8-byte value per key.
These metadata values will be embedded directly within Redis’s core
key-value objects to ensure fast access and automatic callback execution
as keys are created, updated, or deleted. Each 8 bytes of metadata can
represent either a simple primitive value or a pointer/handle to more
complex data that is managed externally by the module and serialized to
RDB along with the key.
Key Features:
- Modules can register up to 7 metadata classes (8 total, 1 reserved)
- Each class: 4-char name + 5-bit version (e.g., "SRC1" v1)
- Each class attaches 8 bytes per key (value or pointer/handle)
- Separate namespace from module data types
Module API:
- RedisModule_CreateKeyMetaClass() - Register metadata class
- RedisModule_ReleaseKeyMetaClass() - Release metadata class
- RedisModule_SetKeyMeta() - Attach/update metadata
- RedisModule_GetKeyMeta() - Retrieve metadata
Lifecycle Callbacks:
- copy, rename, move - Handle key operations
- unlink, free - Handle key deletion/expiration
- rdb_save, rdb_load - RDB persistence
- aof_rewrite - AOF rewrite support
Implementation:
- Metadata slots allocated before kvobj in reverse class ID order
- 8-bit metabits bitmap tracks active classes per key
- Minimal memory overhead - only allocated slots consume memory
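One way to picture the bitmap-driven layout is the sketch below. It assumes, purely for illustration, that the slot of the lowest active class ID sits closest to the object and that higher IDs sit further away; the actual in-memory order and all function names here are assumptions, not taken from the implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the set bits in the 8-bit metabits bitmap. */
static int popcount8(uint8_t v) {
    int n = 0;
    while (v) {
        n += v & 1;
        v >>= 1;
    }
    return n;
}

/* Metadata bytes stored in front of the object: 8 per active class. */
static size_t meta_bytes(uint8_t metabits) {
    return (size_t)popcount8(metabits) * 8;
}

/* Assumed layout: returns how many 8-byte slots in front of the object
 * the slot of class_id sits (1 = adjacent), or -1 if the class is not
 * active for this key. Inactive classes consume no slot. */
static int meta_slot_distance(uint8_t metabits, int class_id) {
    if (class_id < 0 || class_id > 7) return -1;
    if (!(metabits & (1u << class_id))) return -1;
    uint8_t lower = (uint8_t)(metabits & ((1u << class_id) - 1));
    return popcount8(lower) + 1;
}
```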
RDB Serialization (v13):
- New opcode RDB_OPCODE_KEY_METADATA
- Compact 32-bit class spec: 24-bit name + 5-bit ver + 3-bit flags
- Self-contained format: [META,] TYPE, KEY, VALUE
- Portable across cluster nodes
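A 4-character name fits in 24 bits if each character takes 6 bits. The sketch below packs a class spec under two assumptions that are not confirmed by this description: the 64-character alphabet (the one used for module type names) and the bit order (name in the top 24 bits, then version, then flags). `pack_class_spec` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Assumed 64-symbol alphabet, 6 bits per character. */
static const char SPEC_CHARSET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/* Packs name (exactly 4 chars), a 5-bit version and 3-bit flags into 32
 * bits using the assumed layout [name:24][ver:5][flags:3].
 * Returns 0 on success, -1 on invalid input. */
static int pack_class_spec(const char *name, unsigned ver, unsigned flags,
                           uint32_t *out) {
    if (strlen(name) != 4 || ver > 31 || flags > 7) return -1;
    uint32_t packed_name = 0;
    for (int i = 0; i < 4; i++) {
        const char *p = strchr(SPEC_CHARSET, name[i]);
        if (p == NULL) return -1;
        packed_name = (packed_name << 6) | (uint32_t)(p - SPEC_CHARSET);
    }
    *out = (packed_name << 8) | ((uint32_t)ver << 3) | (uint32_t)flags;
    return 0;
}
```

With this layout the version is recovered as `(spec >> 3) & 31` and the flags as `spec & 7`.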
Integration:
- Core ops: dbAdd, dbSet, COPY, MOVE, RENAME, DELETE
- DUMP/RESTORE support
- AOF rewrite via module callbacks
- Defragmentation support
- Module type I/O refactored to ModuleEntityId
Fix flaky test failures in `tests/unit/moduleapi/blockedclient.tcl`
caused by clock precision issues with the monotonic clock.
The test runs a command that blocks for 200ms and then asserts the
elapsed time is >= 200ms. Due to clock skew and timing precision
differences, the measured time occasionally comes back as 199ms, causing
spurious test failures.
### Summary
This PR introduces two new maxmemory eviction policies: `volatile-lrm`
and `allkeys-lrm`.
LRM (Least Recently Modified) is similar to LRU but only updates the
timestamp on write operations, not read operations. This makes it useful
for evicting keys that haven't been modified recently, regardless of how
frequently they are read.
### Core Implementation
The LRM implementation reuses the existing LRU infrastructure but with a
key difference in when timestamps are updated:
- **LRU**: Updates timestamp on both read and write operations
- **LRM**: Updates timestamp only on write operations via `updateLRM()`
### Key changes:
Extend `keyModified()` to accept an optional `robj *val` parameter and
call `updateLRM()` when a value is provided. Since `keyModified()`
serves as the unified entry point for all key modifications, placing the
LRM update here ensures timestamps are consistently updated across all
write operations.
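The difference between the two policies can be shown with a toy model. It keeps separate `lru` and `lrm` timestamps per key purely for illustration, whereas the implementation described above reuses the existing LRU infrastructure; all names in the sketch are hypothetical.

```c
#include <stdint.h>

/* Toy per-key clocks; two fields are used here only to compare policies. */
typedef struct {
    uint64_t lru; /* last access, updated on reads and writes */
    uint64_t lrm; /* last modification, updated on writes only */
} key_clock;

static void on_read(key_clock *k, uint64_t now) {
    k->lru = now;
}

static void on_write(key_clock *k, uint64_t now) {
    k->lru = now;
    k->lrm = now;
}

/* Returns the index of the key with the oldest timestamp under the chosen
 * policy (use_lrm != 0 selects LRM), or -1 if there are no keys. */
static int pick_victim(const key_clock *keys, int n, int use_lrm) {
    int victim = -1;
    uint64_t oldest = 0;
    for (int i = 0; i < n; i++) {
        uint64_t t = use_lrm ? keys[i].lrm : keys[i].lru;
        if (victim < 0 || t < oldest) {
            victim = i;
            oldest = t;
        }
    }
    return victim;
}
```

A key that is written once and then read frequently stays "young" under LRU but ages under LRM, so the two policies can pick different victims from the same history.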
---------
Co-authored-by: oranagra <oran@redislabs.com>
Co-authored-by: Yuan Wang <yuan.wang@redis.com>
Revert a breaking change introduced in #14051 described in this comment
https://github.com/redis/redis/pull/14051#discussion_r2281765769
The non-negative check inside `checkNumericBoundaries` ignored the fact
that a big unsigned long long value (> 2^63-1) arrives as a negative
value and therefore never reached the lower/upper boundary check.
The check is removed, reverting the breaking change.
This allows RedisModule_ConfigSetNumeric to pass a big unsigned number, albeit via a `long long` parameter. Added comments about this behaviour.
Added tests for https://github.com/redis/redis/pull/14051#discussion_r2281765769
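The following sketch contrasts the two behaviours. It is not the code of `checkNumericBoundaries`; both function names are hypothetical. It relies only on the fact that converting a negative `long long` to `unsigned long long` is well-defined in C (the value wraps modulo `ULLONG_MAX + 1`).

```c
#include <limits.h>

/* Boundary check that reinterprets the long long carrier as unsigned
 * before comparing, so values above LLONG_MAX are handled correctly. */
static int in_unsigned_bounds(long long carried, unsigned long long lower,
                              unsigned long long upper) {
    unsigned long long v = (unsigned long long)carried;
    return v >= lower && v <= upper;
}

/* Same check with an up-front non-negative guard, modelling the removed
 * behaviour: a large unsigned value carried as a negative long long is
 * rejected before the boundaries are ever consulted. */
static int in_bounds_with_guard(long long carried, unsigned long long lower,
                                unsigned long long upper) {
    if (carried < 0) return 0;
    return in_unsigned_bounds(carried, lower, upper);
}
```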
This PR is based on https://github.com/valkey-io/valkey/pull/1303
This PR introduces a DEBUG_DEFRAG compilation option that enables
activedefrag functionality even when the allocator is not jemalloc, and
always forces defragmentation regardless of the amount or ratio of
fragmentation.
## Using
```
make SANITIZER=address DEBUG_DEFRAG=<force|fully>
./runtest --debug-defrag
```
* DEBUG_DEFRAG=force
* Ignore the threshold for defragmentation to ensure that
defragmentation is always triggered.
* Always reallocate pointers to probe for correctness issues in pointer
reallocation.
* DEBUG_DEFRAG=fully
* Includes everything in the option `force`.
* Additionally performs a full defrag on every defrag cycle, which is
significantly slower but more accurate.
---------
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: oranagra <oran@redislabs.com>
These two tests often fail in slow environments.
1. `Module defrag: late defrag with cursor works` test
`defragtest_datatype_resumes` in a defrag cycle does not always reach 10
times, so increase the threshold and move the assertion of
`defragtest_datatype_resumes` to `wait_for_condition`.
2. `Module defrag: global defrag works` test
Increase the waiting time for this test.
integrate module API tests into default test suite
- Add module_tests target to main Makefile to build test modules
- Include unit/moduleapi in test_dirs to run module tests with ./runtest
- Module API tests now run by default instead of requiring
runtest-moduleapi
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
This commit adds support for the "touches-arbitrary-keys" command flag
in Redis modules, allowing module commands to be properly marked when
they modify keys not explicitly provided as arguments, to avoid wrapping
replicated commands with MULTI/EXEC.
Changes:
- Added "touches-arbitrary-keys" flag parsing in
commandFlagsFromString()
- Updated module command documentation to describe the new flag
- Added test implementation in zset module with zset.delall command to
demonstrate and verify the flag functionality
The zset.delall command serves as a test case that scans the keyspace
and deletes all zset-type keys, properly using the new flag since it
modifies keys not provided via argv.
This commit adds a new `zset.delall` command to the zset test module
that iterates through the keyspace and deletes all keys of type "zset".
Key changes:
- Added zset_delall() function that uses RedisModule_Scan to iterate
through all keys in the keyspace
- Added zset_delall_callback() that checks each key's type and deletes
zset keys using RedisModule_Call with "DEL" command
- Registered the new command with "write touches-arbitrary-keys" flags
since it modifies arbitrary keys not provided via argv
- Added support for "touches-arbitrary-keys" flag in module command
parsing
- Added comprehensive tests for the new functionality
The command returns the number of deleted zset keys and properly handles
replication by using the "s!" format specifier with RedisModule_Call to
ensure DEL commands are replicated to slaves and AOF.
Usage: ZSET.DELALL
Returns: Integer count of deleted zset keys
Fix https://github.com/redis/redis/issues/14208
As mentioned in the above issue, RM_GetCommandKeysWithFlags could have
memory leak when the number of keys is larger than MAX_KEYS_BUFFER. This
PR fixes it by calling getKeysFreeResult before the function's return. A
TCL testcase is created to verify the fix.
In cluster mode with modules, the slot resolution for the KEYSIZES
histogram update of a given key was incorrect. As a result, the
histogram update might silently ignore those keys or update the wrong
slot's histogram.
This API complements module subscribe by enabling modules to unsubscribe
from specific keyspace event notifications when they are no longer
needed. This helps reduce performance overhead and unnecessary callback
invocations.
The function matches subscriptions based on event mask, callback
pointer, and module identity. If a matching subscription is found, it is
removed.
Returns REDISMODULE_OK if a subscription was successfully removed,
otherwise REDISMODULE_ERR.
After #13816, we added defragmentation support for moduleDict, which
significantly increased global data size.
As a result, the defragmentation tests for non-global data were
affected.
Now, we move the creation of global data to before the global data test
to avoid it interfering with other tests.
Fixed the simple key test failure due to forgetting to reset stats.
# Problem
Some Redis modules need to call `CONFIG GET/SET` commands. The server
may be run with `rename-command CONFIG ""` (or something similar), which
leaves the module unable to access the config.
# Solution
Added new API functions for use by modules
```
RedisModuleConfigIterator* RedisModule_GetConfigIterator(RedisModuleCtx *ctx, const char *pattern);
void RedisModule_ReleaseConfigIterator(RedisModuleCtx *ctx, RedisModuleConfigIterator *iter);
const char *RedisModule_ConfigIteratorNext(RedisModuleConfigIterator *iter);
int RedisModule_GetConfigType(const char *name, RedisModuleConfigType *res);
int RedisModule_GetBoolConfig(RedisModuleCtx *ctx, const char *name, int *res);
int RedisModule_GetConfig(RedisModuleCtx *ctx, const char *name, RedisModuleString **res);
int RedisModule_GetEnumConfig(RedisModuleCtx *ctx, const char *name, RedisModuleString **res);
int RedisModule_GetNumericConfig(RedisModuleCtx *ctx, const char *name, long long *res);
int RedisModule_SetBoolConfig(RedisModuleCtx *ctx, const char *name, int value, RedisModuleString **err);
int RedisModule_SetConfig(RedisModuleCtx *ctx, const char *name, RedisModuleString *value, RedisModuleString **err);
int RedisModule_SetEnumConfig(RedisModuleCtx *ctx, const char *name, RedisModuleString *value, RedisModuleString **err);
int RedisModule_SetNumericConfig(RedisModuleCtx *ctx, const char *name, long long value, RedisModuleString **err);
```
## Implementation
The work is mostly done inside `config.c` as I didn't want to expose the
config dict outside of it. That means each of these module functions has
a corresponding function in `config.c` that actually does the job. For
example, `RedisModule_SetEnumConfig` calls `moduleSetEnumConfig`, which
is implemented in `config.c`.
## Notes
Also refactored the `configSetCommand` and `restoreBackupConfig` functions
for the following reasons:
- The code and logic are now much clearer in `configSetCommand`. The only
caveat is the removal of an optimization that skipped apply functions
that had already run, in favour of code clarity.
- Both functions needlessly separated the logic for module configs and
normal configs, whereas no such separation is needed. This also had the
side effect of removing some allocations.
- `restoreBackupConfig` now has a clearer interface and can be reused with
ease. One of the places I reused it is in the individual
`moduleSet*Config` functions, each of which needs the restoration
functionality but for a single config only.
## Future
Additionally, a couple of considerations were made for potentially
extending the API in the future:
- If need be, an API for atomically setting multiple config values can be
added: `RedisModule_SetConfigsTransactionStart/End` or similar, which can
be put around `RedisModule_Set*Config` calls.
- If performance is an issue, an API such as
`RedisModule_GetConfigIteratorNextWithTypehint` may be added
in order not to incur the additional cost of calling
`RedisModule_GetConfigType`.
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
## Description
`updateClientMemUsageAndBucket` is called from the main thread to update
the memory usage and memory bucket of a client. That's why it has an
assertion that it's being called by the main thread.
But it may also be called from a thread spawned by a module.
Specifically, this happens when a module calls `RedisModule_Call`, which
in turn calls
`call`->`replicationFeedMonitors`->`updateClientMemUsageAndBucket`.
This is generally safe, as module calls inside a spawned thread should be
guarded by a call to `ThreadSafeContextLock`, i.e., the module is holding
the GIL at this point.
This commit fixes the assertion inside `updateClientMemUsageAndBucket`
so that it also covers that case. Generally, calls from
module-spawned threads are safe to operate on clients that are not
running on IO threads when the module is holding the GIL.
---------
Co-authored-by: Yuan Wang <wangyuancode@163.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
Add thread sanitizer run to daily CI.
A few tests are skipped in tsan runs for two reasons:
* Stack trace producing tests (oom, `unit/moduleapi/crash`, etc.) are
tagged `tsan:skip` because Redis calls `backtrace()` in a signal handler,
which turns out to be signal-unsafe since it might allocate memory (e.g.
glibc 2.39 does it through a call to `_dl_map_object_deps()`).
* A few tests become flaky with thread sanitizer builds and don't finish
within the expected deadlines because of the additional tsan overhead.
Instead of skipping those tests, this can be improved in the future by
allowing more iterations when waiting in tsan builds.
Deadlock detection is disabled for now because of a tsan limitation where
at most 64 locks can be held at once.
There is one outstanding (false-positive?) race in jemalloc which is
suppressed in `tsan.sup`.
Fix a few races reported by thread sanitizer that involve writes from
signal handlers. In a multi-threaded setting, signal handlers might
be called on any thread (modulo pthread_sigmask) while the main thread
is running, so the `volatile sig_atomic_t` type is not sufficient and
atomics are used instead.
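The pattern can be sketched in C11 as follows. This is a minimal illustration rather than the code from this change: the flag `sketch_shutdown_requested`, the handler, and the helper are hypothetical names, and the sketch assumes `atomic_int` is lock-free on the target platform (which is what makes the store safe inside a handler).

```c
#include <signal.h>
#include <stdatomic.h>

/* Hypothetical flag written by the handler and read by other threads.
 * `volatile sig_atomic_t` only covers a handler interrupting the same
 * thread; an atomic also makes the cross-thread access race-free. */
atomic_int sketch_shutdown_requested;

/* The handler performs a single atomic store and nothing else. */
void sketch_on_sigterm(int sig) {
    (void)sig;
    atomic_store_explicit(&sketch_shutdown_requested, 1, memory_order_relaxed);
}

/* Installs the handler, raises the signal in the calling thread, and
 * returns the flag value (or -1 if setup failed). */
int sketch_raise_and_check(void) {
    if (signal(SIGTERM, sketch_on_sigterm) == SIG_ERR) return -1;
    if (raise(SIGTERM) != 0) return -1;
    return atomic_load_explicit(&sketch_shutdown_requested, memory_order_relaxed);
}
```

In a real server the reader would typically be the main loop polling the flag, while the handler may run on whichever thread receives the signal.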
When `repl-diskless-load` is enabled on a replica, and it is in the
process of loading an RDB file, a broken connection detected by the main
channel may trigger a call to rioAbort(). This sets a flag to cause the
rdb channel to fail on the next rioRead() call, allowing it to perform
necessary cleanup.
However, there are specific scenarios where the error is checked using
rioGetReadError(), which does not account for the RIO_ABORT flag (see
[source](79b37ff535/src/rdb.c (L3098))).
As a result, the error goes undetected. The code then proceeds to
validate a module type, fails to find a match, and calls
rdbReportCorruptRDB() which logs the following error and exits the
process:
```
The RDB file contains module data I can't load: no matching module type '_________'
```
To fix this issue, the RIO_ABORT flag has been removed. Now, rioAbort()
sets both read and write error flags, so that subsequent operations and
error checks properly detect the failure.
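The effect of the fix can be illustrated with a simplified model. The `sketch_rio` type, the flag names, and the helpers below are hypothetical stand-ins, not the actual `rio` implementation; the point is only that once abort sets the ordinary error flags, every existing error check sees the failure.

```c
#include <stddef.h>

/* Simplified, hypothetical model of a rio object: only the error flags. */
#define SKETCH_RIO_READ_ERROR  (1u << 0)
#define SKETCH_RIO_WRITE_ERROR (1u << 1)

typedef struct { unsigned flags; } sketch_rio;

/* After the fix: aborting marks the stream as failed for reads and writes,
 * instead of setting a separate abort flag that some checks ignore. */
void sketch_rio_abort(sketch_rio *r) {
    r->flags |= SKETCH_RIO_READ_ERROR | SKETCH_RIO_WRITE_ERROR;
}

int sketch_rio_get_read_error(const sketch_rio *r) {
    return (r->flags & SKETCH_RIO_READ_ERROR) != 0;
}

int sketch_rio_get_write_error(const sketch_rio *r) {
    return (r->flags & SKETCH_RIO_WRITE_ERROR) != 0;
}

/* Reads fail fast once the read error flag is set.
 * Returns 1 on success, 0 on failure (no real I/O in this sketch). */
int sketch_rio_read(sketch_rio *r, void *buf, size_t len) {
    (void)buf; (void)len;
    if (r->flags & SKETCH_RIO_READ_ERROR) return 0;
    return 1;
}
```

With a dedicated abort flag, a caller that only consulted the read-error check would have continued past the failure; in this model both the read path and the error getters agree.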
Additional keys were added to the short read test; with this change, the
test reproduces the issue. We hit that problematic line once per key. My
guess is that with many smaller keys, the likelihood of the connection
being killed at just the right moment increases.
There are several issues with maintaining histogram counters.
Ideally, the hooks would be placed in the low-level datatype
implementations. However, this logic is triggered in various contexts
and doesn’t always map directly to a stored DB key. As a result, the
hooks sit closer to the high-level commands layer. It’s a bit messy, but
the right way to ensure histogram counters behave correctly is through
broad test coverage.
* Fix inaccuracies around deletion scenarios.
* Fix inaccuracies around modules calls. Added corresponding tests.
* The info-keysizes.tcl test has been extended to operate on meaningful
datasets.
* Validate histogram correctness in edge cases involving collection
deletions.
* Add new macro debugServerAssert(). Effective only if compiled with
DEBUG_ASSERTIONS.
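One common way to build an assertion that is only active in debug builds is shown below. This is a hedged sketch: `SKETCH_DEBUG_ASSERT` and `sketch_bucket_decr` are hypothetical names, and the actual definition of `debugServerAssert()` is not shown in this description.

```c
#include <assert.h>

/* Hypothetical stand-in for debugServerAssert(): checks the condition only
 * when compiled with -DDEBUG_ASSERTIONS, and compiles to nothing otherwise. */
#ifdef DEBUG_ASSERTIONS
#define SKETCH_DEBUG_ASSERT(e) assert(e)
#else
#define SKETCH_DEBUG_ASSERT(e) ((void)0)
#endif

/* Example use: a histogram bucket counter should never be decremented
 * below zero. Debug builds abort on a violation; release builds clamp. */
long sketch_bucket_decr(long count) {
    SKETCH_DEBUG_ASSERT(count > 0);
    return count > 0 ? count - 1 : 0;
}
```

Because the macro expands to nothing in regular builds, the checked expression must not have side effects that the surrounding code relies on.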
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
In #13505, we changed the code to use the string value of the key rather
than the integer value on the stack, but we have a test in
unit/moduleapi/keyspace_events that uses a keyspace notification hook to
modify the value with RM_StringDMA, which can cause this value to be
released before it is used. The reason it didn't happen so far is that we
were using shared integers, so releasing the object doesn't free it.
After #13840, the data we populate became more complex and slower to
process. We always wait for a defragmentation cycle to end before
verifying that the test is okay.
However, in some slow environments, an entire defragmentation cycle can
exceed 5 seconds, and in my local test using 'taskset -c 0' it can reach
6 seconds, so increase the threshold to avoid test failures.
### Background
A program may run normally in standalone mode but hit errors after
migrating to cluster mode, because some cross-slot commands cannot
run in cluster mode. We should provide a way to detect this issue
while running in standalone mode, and expose a metric that
indicates the usage of incompatible commands.
### Solution
To avoid a performance impact, we introduce a new config
`cluster-compatibility-sample-ratio`, which defines the sampling ratio
(0-100) for checking command compatibility in cluster mode. When a
command is executed, it is sampled at the specified ratio to determine
whether it complies with Redis cluster constraints, such as cross-slot
restrictions.
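The sampling decision can be sketched as below. This is an assumption about the general shape of ratio-based sampling, not the code in this PR: `sketch_should_sample` is a hypothetical helper, and the random roll is passed in by the caller so the behavior is deterministic.

```c
/* Hypothetical helper: decides whether one command execution is sampled.
 * ratio: configured sampling ratio, 0-100.
 * roll:  a caller-supplied random number (e.g. from a PRNG). */
int sketch_should_sample(int ratio, unsigned roll) {
    if (ratio <= 0) return 0;      /* 0 disables the compatibility check */
    if (ratio >= 100) return 1;    /* 100 checks every command */
    return (int)(roll % 100u) < ratio;
}
```

With a ratio of 25, roughly one in four commands would pay the cost of the compatibility check, which is how the performance impact is bounded.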
A new metric is exposed: `cluster_incompatible_ops` in `info stats`
output.
The following operations are considered incompatible:
- Cross-slot commands
If a command's keys map to more than one slot, it is incompatible.
- `swap, copy, move, select` commands
These commands involve multiple databases in some cases, and cluster mode
does not allow multiple DBs, so they are not compatible.
- Module commands with the `no-cluster` flag
If a module command has the `no-cluster` flag, loading the module fails
with an error when cluster mode is enabled, so this is incompatible.
- Scripts/functions with the `no-cluster` flag
Similar to module commands, a script/function that declares `no-cluster`
in its shebang cannot run in cluster mode.
- `sort` command with a by/get pattern
When the `sort` command has a `by/get` pattern option, the slot of the
pattern must equal the slot of the keys; otherwise it is incompatible in
cluster mode.
- Script/function commands whose accessed keys and declared keys have
different slots
For script/function commands, we check not only the slots of the declared
keys but also the slots of the keys actually accessed; if they differ,
the command is considered incompatible.
**Besides**, commands like `keys, scan, flushall, script/function
flush`, which in standalone mode iterate over all data to perform the
operation, only apply in cluster mode to the server that executes the
command and are not broadcast. However, this does not lead to
errors, so we do not consider them incompatible commands.
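For reference, the cross-slot rule follows from how Redis Cluster maps keys to slots: CRC16 of the key modulo 16384, hashing only the `{tag}` part when a non-empty hash tag is present. The sketch below implements that mapping and a cross-slot check; the `sketch_*` function names are hypothetical and this is not the detection code added by this PR.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, bit by bit. */
uint16_t sketch_crc16(const char *buf, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)(unsigned char)buf[i] << 8);
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Slot of a key: CRC16 mod 16384. If the key contains "{...}" with a
 * non-empty body, only the body of the first such tag is hashed. */
unsigned sketch_key_hash_slot(const char *key, size_t keylen) {
    size_t s, e;
    for (s = 0; s < keylen; s++)
        if (key[s] == '{') break;
    if (s == keylen) return sketch_crc16(key, keylen) & 0x3FFF;
    for (e = s + 1; e < keylen; e++)
        if (key[e] == '}') break;
    if (e == keylen || e == s + 1) return sketch_crc16(key, keylen) & 0x3FFF;
    return sketch_crc16(key + s + 1, e - s - 1) & 0x3FFF;
}

/* Returns 1 if the keys map to more than one slot, 0 otherwise. */
int sketch_is_cross_slot(const char **keys, size_t count) {
    for (size_t i = 1; i < count; i++) {
        if (sketch_key_hash_slot(keys[i], strlen(keys[i])) !=
            sketch_key_hash_slot(keys[0], strlen(keys[0])))
            return 1;
    }
    return 0;
}
```

Under this mapping, an `MSET` over `{user1}:name` and `{user1}:email` stays in one slot because both keys share the hash tag, while an `MSET` over `foo` and `bar` is cross-slot.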
### Performance impact test
**cross slot test**
Below are the test commands and results. When using MSET with 8 keys,
performance drops by approximately 3%.
**single key test**
Single-key commands may see a 1-2% performance drop, possibly due to the
overhead of the sampling function.
After https://github.com/redis/redis/pull/13816, we added a new API to
defrag RedisModuleDict.
Currently, we only support incremental defragmentation of the dictionary
itself, but the defragmentation of values is still not incremental. If
the values are very large, it could lead to significant blocking.
Therefore, in this PR, we have added incremental defragmentation for the
values.
The main change is to `RedisModuleDefragDictValueCallback`: we
modified the return value of this callback.
When the callback returns 1, we save the key of the current unfinished
node in `seekTo`, and the next time we enter, we continue
defragmenting this node.
When the return value is 0, we proceed to the next node.
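The resume logic can be illustrated with a small self-contained simulation. It does not use the Redis module API: `sketch_node`, `sketch_value_cb`, and `sketch_defrag_pass` are hypothetical, an array index stands in for the saved `seekTo` key, and a per-call work budget stands in for the time limit.

```c
#include <stddef.h>

/* Hypothetical node: a value needing `remaining` units of defrag work. */
typedef struct { int remaining; } sketch_node;

/* Hypothetical value callback: does at most `budget` units of work.
 * Returns 1 if the value is still unfinished, 0 when it is done. */
int sketch_value_cb(sketch_node *n, int budget) {
    int step = n->remaining < budget ? n->remaining : budget;
    n->remaining -= step;
    return n->remaining > 0;
}

/* One defrag pass. *seek_to is the index of the node to resume from.
 * Returns 1 if more work remains, 0 once every node is finished. */
int sketch_defrag_pass(sketch_node *nodes, size_t count,
                       size_t *seek_to, int budget) {
    size_t i = *seek_to;
    while (i < count) {
        if (sketch_value_cb(&nodes[i], budget)) {
            *seek_to = i;   /* callback returned 1: resume on this node */
            return 1;
        }
        i++;                /* callback returned 0: proceed to next node */
    }
    *seek_to = 0;
    return 0;
}
```

A caller would invoke `sketch_defrag_pass` repeatedly until it returns 0, so a single large value no longer has to be processed in one blocking step.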
## Test
Each dictionary in the global dict originally contained only 10
strings. It is now a nested dictionary: each dictionary has 10
sub-dictionaries, with each sub-dictionary containing 10 strings. This
has led to a corresponding reduction in the defragmentation time
observed by other tests.
Therefore, the other tests have been modified to always wait for
defragmentation to be turned off before the test begins, then start it
after creating fragmentation, ensuring that they can always run for a
full defragmentation cycle.
---------
Co-authored-by: ephraimfeldblum <ephraim.feldblum@redis.com>
1) Allow the callback to be NULL for RM_DefragRedisModuleDict(),
because the dictionary may store only keys without values.
2) Reduce the system calls made by RM_DefragShouldStop()
The API now performs the time check only after one of the following
thresholds is reached: over 512 defrag hits, or over 1024 defrag
misses.
3) Added defragmentation statistics for dictionary items to cover the
associated code for RM_DefragRedisModuleDict().
4) Removed `module_ctx` from `defragModuleCtx` struct, which can be
replaced by a temporary variable.
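The threshold check from item 2 can be sketched as follows. The struct, field names, and fake clock are hypothetical, and measuring the thresholds since the last clock read is an assumption of this sketch rather than something stated above.

```c
/* Hypothetical state; not the actual Redis defrag context. */
typedef struct {
    long hits, misses;            /* running defrag counters */
    long last_hits, last_misses;  /* counter values at the last clock read */
    long long endtime;            /* deadline for the current cycle */
    long clock_reads;             /* how many times the clock was read */
} sketch_defrag_state;

typedef long long (*sketch_clock_fn)(void);

/* Fake clock so the sketch is deterministic. */
long long sketch_fake_now;
long long sketch_fake_clock(void) { return sketch_fake_now; }

/* Reads the clock only after more than 512 hits or more than 1024 misses
 * have accumulated; otherwise reports "keep going" without a clock read. */
int sketch_should_stop(sketch_defrag_state *s, sketch_clock_fn clock) {
    if (s->hits - s->last_hits > 512 || s->misses - s->last_misses > 1024) {
        s->last_hits = s->hits;
        s->last_misses = s->misses;
        s->clock_reads++;
        return clock() >= s->endtime;
    }
    return 0;
}
```

The trade-off is that the stop decision can lag slightly behind the deadline, in exchange for far fewer clock reads in tight defrag loops.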
---------
Co-authored-by: oranagra <oran@redislabs.com>
1) Fix a bug where an incorrect endtime was passed to the module.
This bug was found by @ShooterIT.
After #13814, every endtime is a monotonic time, and we should no
longer convert it to a relative ustime.
Add assertions to prevent endtime from being much larger than the
current time.
2) Fix a race in test `Reduce defrag CPU usage when module data can't be
defragged`
---------
Co-authored-by: ShooterIT <wangyuancode@163.com>
After #13815, we introduced incremental defragmentation for global
module data.
Now we add a new module API, `RM_DefragRedisModuleDict`, to incrementally
defragment a `RedisModuleDict`.
This PR adds a new API and a new defrag callback:
```c
RedisModuleDict *RM_DefragRedisModuleDict(RedisModuleDefragCtx *ctx, RedisModuleDict *dict, RedisModuleDefragDictValueCallback valueCB, RedisModuleString **seekTo);
typedef void *(*RedisModuleDefragDictValueCallback)(RedisModuleDefragCtx *ctx, void *data, unsigned char *key, size_t keylen);
```
Usage:
```c
RedisModuleString *seekTo = NULL;
RedisModuleDict *dict = RedisModule_CreateDict(ctx);
/* ... populate the dict ... */

/* Defragment the dictionary completely. The loop ends once seekTo is
 * NULL again, i.e. there is no unfinished position left to resume from. */
do {
    RedisModuleDict *new = RedisModule_DefragRedisModuleDict(ctx, dict, defragGlobalDictValueCB, &seekTo);
    if (new != NULL) {
        dict = new; /* keep using the returned pointer when one is given */
    }
} while (seekTo);
```
---------
Co-authored-by: ShooterIT <wangyuancode@163.com>
Co-authored-by: oranagra <oran@redislabs.com>
## Description
Currently, when performing defragmentation on non-key data within the
module, we cannot process the defragmentation incrementally. This
limitation affects the efficiency and flexibility of defragmentation in
certain scenarios.
The primary goal of this PR is to introduce support for incremental
defragmentation of global module data.
## Interface Change
New module API `RegisterDefragFunc2`
This is a more advanced version of `RM_RegisterDefragFunc`: it
takes a new callback (`RegisterDefragFunc2`) that has a return value.
The callback can use `RM_DefragShouldStop` and use its return value to
indicate whether it should be called again later or is done (returns 0).
## Note
The `RegisterDefragFunc` API remains available.
---------
Co-authored-by: ShooterIT <wangyuancode@163.com>
Co-authored-by: oranagra <oran@redislabs.com>