redis

mirror of https://github.com/redis/redis.git synced 2026-04-22 22:57:00 -04:00

Author	SHA1	Message	Date
debing.sun	fa040a72c0	Add XDELEX and XACKDEL commands for stream (#14130) ## Summary and detailed design for new stream commands ## XDELEX ### Syntax ``` XDELEX key [KEEPREF | DELREF | ACKED] IDS numids id [id ...] ``` ### Description The `XDELEX` command extends the Redis Streams `XDEL` command, offering finer control over entry deletion with respect to consumer groups. It accepts an optional `KEEPREF`, `DELREF`, or `ACKED` parameter to modify its behavior: - KEEPREF: Deletes the specified entries from the stream, but preserves existing references to these entries in all consumer groups' PEL. This behavior is similar to XDEL. - DELREF: Deletes the specified entries from the stream and also removes all references to these entries from all consumer groups' pending entry lists, effectively cleaning up all traces of the messages. - ACKED: Only deletes entries that were read and acknowledged by all consumer groups. Note: The `IDS` block can appear at any position in the command, consistent with other commands. ### Reply Array reply, for each `id`: - `-1`: No such `id` exists in the provided stream `key`. - `1`: Entry was deleted from the stream. - `2`: Entry was not deleted because there are still dangling references (ACKED option). ## XACKDEL ### Syntax ``` XACKDEL key group [KEEPREF | DELREF | ACKED] IDS numids id [id ...] ``` ### Description The `XACKDEL` command combines the `XACK` and `XDEL` functionality of Redis Streams. It acknowledges the specified message IDs in the given consumer group and attempts to delete the corresponding stream entries. It accepts an optional `KEEPREF`, `DELREF`, or `ACKED` parameter: - KEEPREF: Acknowledges the messages in the specified consumer group and deletes the entries from the stream, but preserves existing references to these entries in all consumer groups' PEL. 
- DELREF: Acknowledges the messages in the specified consumer group, deletes the entries from the stream, and also removes all references to these entries from all consumer groups' pending entry lists, effectively cleaning up all traces of the messages. - ACKED: Acknowledges the messages in the specified consumer group and only deletes entries that were read and acknowledged by all consumer groups. ### Reply Array reply, for each `id`: - `-1`: No such `id` exists in the provided stream `key`. - `1`: Entry was acknowledged and deleted from the stream. - `2`: Entry was acknowledged but not deleted because there are still dangling references (ACKED option). # Redis Streams Commands Extension ## XTRIM ### Syntax ``` XTRIM key <MAXLEN | MINID> [= | ~] threshold [LIMIT count] [KEEPREF | DELREF | ACKED] ``` ### Description The `XTRIM` command trims a stream by removing entries based on the specified criteria, extended with an optional `KEEPREF`, `DELREF`, or `ACKED` parameter for consumer group handling: - KEEPREF: Trims the stream according to the specified strategy (MAXLEN or MINID) regardless of whether entries are referenced by any consumer groups, but preserves existing references to these entries in all consumer groups' PEL. - DELREF: Trims the stream according to the specified strategy and also removes all references to the trimmed entries from all consumer groups' PEL. - ACKED: Only trims entries that were read and acknowledged by all consumer groups. ### Reply No change. ## XADD ### Syntax ``` XADD key [NOMKSTREAM] [<MAXLEN | MINID> [= | ~] threshold [LIMIT count]] [KEEPREF | DELREF | ACKED] <* | id> field value [field value ...] 
``` ### Description The `XADD` command appends a new entry to a stream and optionally trims it in the same operation, extended with an optional `KEEPREF`, `DELREF`, or `ACKED` parameter for trimming behavior: - KEEPREF: When trimming, removes entries from the stream according to the specified strategy (MAXLEN or MINID), regardless of whether they are referenced by any consumer groups, but preserves existing references to these entries in all consumer groups' PEL. - DELREF: When trimming, removes entries from the stream according to the specified strategy and also removes all references to these entries from all consumer groups' PEL. - ACKED: When trimming, only removes entries that were read and acknowledged by all consumer groups. Note that if the number of referenced entries is larger than MAXLEN, trimming still stops, so the stream may remain longer than MAXLEN. ### Reply No change. ## Key implementation Since we currently have no simple way to track the association between an entry and consumer groups without iterating over all groups, we introduce two mechanisms to establish this link. They allow us to determine whether an entry has been seen by all consumer groups, and to identify which groups are referencing it. With these links, we can break the association when the entry is either acknowledged or deleted. 1) Added reference tracking between stream messages and consumer groups using `cgroups_ref`. The cgroups_ref is implemented as a rax that maps stream message IDs to lists of consumer groups that reference those messages, and each streamNACK stores its corresponding node in this list, so that the group can be removed from the list when the message is acknowledged. In this way, we can determine whether an entry has been seen but not yet acknowledged. 2) Store a cached minimum last_id in the stream structure. The reason is that an entry may never have been seen by some consumer group, in which case we consider it not consumed either. 
With the "ACKED" option, such an entry cannot be deleted either. When a consumer group updates its last_id, we don’t immediately update the cached minimum last_id. Instead, we check whether the group’s previous last_id was equal to the current minimum, or whether the new last_id is smaller than the current minimum (when using `XGROUP SETID`). If either is true, we mark the cached minimum last_id as invalid and defer the actual update until the next time it’s needed. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: moticless <moticless@github.com> Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com> Co-authored-by: Slavomir Kaslev <slavomir.kaslev@gmail.com> Co-authored-by: Yuan Wang <yuan.wang@redis.com>	2025-07-01 21:00:42 +08:00
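The three deletion policies and the cached minimum `last_id` are easier to follow in a small model. What follows is a minimal Python sketch of the semantics described in this commit, not Redis's C implementation. Stream IDs are plain integers instead of `ms-seq` pairs. The `refs` dict stands in for `cgroups_ref`. Every name here (`Stream`, `Group`, `deliver`, `ack`, `xdelex`, `xackdel`) is hypothetical. The sketch also assumes that a stream with no consumer groups has no references, so in that case ACKED behaves like KEEPREF.

```python
from dataclasses import dataclass, field

# Per-ID replies listed in the commit message for XDELEX / XACKDEL.
NOT_FOUND, DELETED, DANGLING = -1, 1, 2


@dataclass
class Group:
    last_id: int = 0                          # highest ID delivered to this group
    pel: set = field(default_factory=set)     # delivered but not yet acknowledged


@dataclass
class Stream:
    entries: dict = field(default_factory=dict)  # ID -> payload
    groups: dict = field(default_factory=dict)   # group name -> Group
    refs: dict = field(default_factory=dict)     # ID -> groups holding it pending
    min_cache: int = 0                           # cached minimum last_id
    min_valid: bool = False                      # False: recompute on next use

    def add_group(self, name, last_id=0):
        self.groups[name] = Group(last_id=last_id)
        self.min_valid = False

    def min_last_id(self):
        if not self.min_valid:                   # deferred recomputation
            self.min_cache = min(g.last_id for g in self.groups.values())
            self.min_valid = True
        return self.min_cache

    def set_last_id(self, name, new_id):
        g = self.groups[name]
        # Invalidate only if this group held the minimum, or the new ID
        # (e.g. from XGROUP SETID) falls below the cached minimum.
        if self.min_valid and (g.last_id == self.min_cache or new_id < self.min_cache):
            self.min_valid = False
        g.last_id = new_id

    def deliver(self, name, entry_id):
        # XREADGROUP-like delivery: record the pending entry in both directions.
        self.groups[name].pel.add(entry_id)
        self.refs.setdefault(entry_id, set()).add(name)
        if entry_id > self.groups[name].last_id:
            self.set_last_id(name, entry_id)

    def ack(self, name, entry_id):
        self.groups[name].pel.discard(entry_id)
        holders = self.refs.get(entry_id)
        if holders is not None:
            holders.discard(name)
            if not holders:
                del self.refs[entry_id]

    def is_referenced(self, entry_id):
        if not self.groups:                      # assumption: no groups, no refs
            return False
        if entry_id in self.refs:                # seen but not yet acknowledged
            return True
        return entry_id > self.min_last_id()     # never delivered to some group

    def delete(self, entry_id, policy="KEEPREF"):
        if entry_id not in self.entries:
            return NOT_FOUND
        if policy == "ACKED" and self.is_referenced(entry_id):
            return DANGLING
        del self.entries[entry_id]
        if policy == "DELREF":                   # also drop it from every PEL
            for name in self.refs.pop(entry_id, ()):
                self.groups[name].pel.discard(entry_id)
        return DELETED


def xdelex(stream, ids, policy="KEEPREF"):
    return [stream.delete(i, policy) for i in ids]


def xackdel(stream, group, ids, policy="KEEPREF"):
    for i in ids:
        stream.ack(group, i)                     # the XACK half
    return [stream.delete(i, policy) for i in ids]
```

The minimum is recomputed only in two cases: when the group that held it moves, or when some group moves below it. This mirrors the deferred-update strategy the commit describes. Because `refs` is keyed by ID, it answers "is this entry still pending anywhere?" without scanning every group's PEL.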
debing.sun	5ff81f68a3	Fix XPENDING reply schema for empty reply (#14129) When the PEL is empty, the reply of `XPENDING` without the `start` option will be: ``` 1) (integer) 0 2) (nil) 3) (nil) 4) (nil) ``` It is not an empty array, so we need to create a separate reply schema for it.	2025-07-01 17:35:09 +08:00
Mincho Paskalev	8dfb823c51	Implement DIFF, DIFF1, ANDOR and ONE for BITOP (#13898) This PR adds 4 new operators to the `BITOP` command: `DIFF`, `DIFF1`, `ANDOR` and `ONE`. They enable Redis clients to atomically perform non-trivial logical operations that are useful for checking membership of a bitmap against a group of bitmaps. * DIFF `BITOP DIFF dest srckey1 srckey2 [key...]` Description DIFF(X, A1, A2, ..., AN) = X ∧ ¬(A1 ∨ A2 ∨ ... ∨ AN), i.e. the bits set in X that are not set in any of A1, A2, …, AN NOTE Command expects at least 2 source keys. * DIFF1 `BITOP DIFF1 dest srckey1 srckey2 [key...]` Description DIFF1(X, A1, A2, ..., AN) = ¬X ∧ (A1 ∨ A2 ∨ ... ∨ AN), i.e. the bits set in one or more of A1, A2, …, AN that are not set in X NOTE Command expects at least 2 source keys. * ANDOR `BITOP ANDOR dest srckey1 srckey2 [key...]` Description ANDOR(X, A1, A2, ..., AN) = X ∧ (A1 ∨ A2 ∨ ... ∨ AN), i.e. the bits set in X that are also set in at least one of A1, A2, …, AN NOTE Command expects at least 2 source keys. * ONE `BITOP ONE dest key [key...]` Description ONE(A1, A2, ..., AN) = X, where X[i], the i-th bit of X, is 1 if and only if there is an m such that A_m[i] = 1 and A_n[i] = 0 for all n != m, i.e. bit X[i] is set only if it is set in exactly one of A1, A2, ..., AN Return value As with all other `BITOP` operators, the return value of the new ones is the size in bytes of the longest key. EDIT: Besides adding the new operators, a couple more changes were made: - Added an AVX2 path for faster computation of the BITOP operations (including the new ones) - Removed the hard limit of at most 16 source keys for using the fast path: the fast path can now be used regardless of the number of keys, as long as the keys are long enough. --------- Co-authored-by: debing.sun <debing.sun@redis.com>	2025-05-20 10:45:50 +03:00
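A small bytewise model makes it easy to check the four new operators against their formulas. This Python sketch is illustrative only. It processes one byte at a time rather than using the word-sized or AVX2 paths mentioned above, and `bitop` is a hypothetical helper. As with the existing operators, shorter inputs are treated as zero-padded to the length of the longest key.

```python
from functools import reduce


def bitop(op, *keys):
    """Bytewise model of the new BITOP operators on Python bytes values."""
    op = op.upper()
    if not keys:
        raise ValueError("BITOP needs at least 1 source key")
    if op in ("DIFF", "DIFF1", "ANDOR") and len(keys) < 2:
        raise ValueError(f"BITOP {op} needs at least 2 source keys")
    n = max(len(k) for k in keys)                  # result length = longest key
    padded = [k.ljust(n, b"\x00") for k in keys]   # shorter keys read as zeros
    out = bytearray(n)
    for i in range(n):
        col = [k[i] for k in padded]
        if op == "ONE":
            once = twice = 0
            for b in col:
                twice |= once & b                  # bits seen in an earlier key too
                once |= b
            out[i] = once & ~twice & 0xFF          # bits set in exactly one key
            continue
        x = col[0]
        others = reduce(lambda a, b: a | b, col[1:], 0)   # A1 | A2 | ... | AN
        if op == "DIFF":
            out[i] = x & ~others & 0xFF
        elif op == "DIFF1":
            out[i] = ~x & others & 0xFF
        elif op == "ANDOR":
            out[i] = x & others
        else:
            raise ValueError(f"unsupported operator {op}")
    return bytes(out)
```

With only two inputs, `ONE` is just XOR. The `once`/`twice` accumulators generalise it to N keys in a single pass, without counting bits individually.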
debing.sun	658424fc83	Revert "Update history for ban-list propagation (#13749)" (#13827) As discussed in https://github.com/redis/redis/pull/13749#issuecomment-2673612941. After #10398, we should record only argument and output changes in the command history and place everything else in redis-doc, so revert #13749.	2025-02-24 17:40:25 +08:00
Ozan Tezcan	6c202f495c	Remove DENYOOM flag from hexpire command (#13800) Remove DENYOOM flag from hexpire / hexpireat / hpexpire / hpexpireat commands. h(p)expire(at) commands may allocate some memory, but not much. Similarly, we don't have the DENYOOM flag for the EXPIRE command. This change aligns the EXPIRE and HEXPIRE commands in this respect.	2025-02-16 20:07:29 +03:00
Ozan Tezcan	e2608478b6	Add HGETDEL, HGETEX and HSETEX hash commands (#13798) This PR adds three new hash commands: HGETDEL, HGETEX and HSETEX. These commands enable users to perform multiple operations atomically in one step, e.g. set a hash field and update its TTL with a single command. Previously, this was only possible by calling the HSET and HEXPIRE commands one after the other. - HGETDEL command ``` HGETDEL <key> FIELDS <numfields> field [field ...] ``` Description Get and delete the value of one or more fields of a given hash key. Reply Array reply: list of the value associated with each field, or nil if the field doesn’t exist. - HGETEX command ``` HGETEX <key> [EX seconds | PX milliseconds | EXAT unix-time-seconds | PXAT unix-time-milliseconds | PERSIST] FIELDS <numfields> field [field ...] ``` Description Get the value of one or more fields of a given hash key, and optionally set their expiration. Options: EX seconds: Set the specified expiration time, in seconds. PX milliseconds: Set the specified expiration time, in milliseconds. EXAT timestamp-seconds: Set the specified Unix time at which the field will expire, in seconds. PXAT timestamp-milliseconds: Set the specified Unix time at which the field will expire, in milliseconds. PERSIST: Remove the time to live associated with the field. Reply Array reply: list of the value associated with each field, or nil if the field doesn’t exist. - HSETEX command ``` HSETEX <key> [FNX | FXX] [EX seconds | PX milliseconds | EXAT unix-time-seconds | PXAT unix-time-milliseconds | KEEPTTL] FIELDS <numfields> field value [field value...] ``` Description Set the value of one or more fields of a given hash key, and optionally set their expiration. Options: FNX: Only set the fields if none of them already exist. FXX: Only set the fields if all of them already exist. EX seconds: Set the specified expiration time, in seconds. PX milliseconds: Set the specified expiration time, in milliseconds. 
EXAT timestamp-seconds: Set the specified Unix time at which the field will expire, in seconds. PXAT timestamp-milliseconds: Set the specified Unix time at which the field will expire, in milliseconds. KEEPTTL: Retain the time to live associated with the field. Note: If no expiration option is provided, any associated expiration time is discarded, similar to how the SET command behaves. Reply Integer reply: 0 if no fields were set. Integer reply: 1 if all the fields were set.	2025-02-14 17:13:35 +03:00
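A small in-memory model can summarise the HSETEX option rules:

- FNX and FXX are all-or-nothing.
- EX and PX are relative expiries; EXAT and PXAT are absolute.
- KEEPTTL keeps the existing TTL.
- With no expiration option, any existing TTL is discarded.

This Python sketch is hypothetical: `ToyHash`, its method names and the explicit `now` parameter are illustrative. It drops expired fields eagerly on each call instead of following Redis's expiration machinery. It also does not check that KEEPTTL and an expiration option are mutually exclusive.

```python
import time


class ToyHash:
    """Hash fields with optional absolute expiry times in milliseconds."""

    def __init__(self):
        self.values = {}    # field -> value
        self.expires = {}   # field -> absolute expiry time (ms)

    def _drop_expired(self, now):
        for f in [f for f, at in self.expires.items() if at <= now]:
            del self.expires[f]
            del self.values[f]

    def hsetex(self, pairs, cond=None, expire=None, keepttl=False, now=None):
        """cond: None, 'FNX' or 'FXX'; expire: (unit, n) with unit one of
        'EX', 'PX', 'EXAT', 'PXAT'. Returns 1 if all fields were set, else 0."""
        now = int(time.time() * 1000) if now is None else now
        self._drop_expired(now)
        names = [f for f, _ in pairs]
        if cond == "FNX" and any(f in self.values for f in names):
            return 0
        if cond == "FXX" and not all(f in self.values for f in names):
            return 0
        at = None
        if expire is not None:
            unit, n = expire
            at = {"EX": now + n * 1000, "PX": now + n,
                  "EXAT": n * 1000, "PXAT": n}[unit]
        for f, v in pairs:
            self.values[f] = v
            if at is not None:
                self.expires[f] = at
            elif not keepttl:
                self.expires.pop(f, None)   # no option given: discard old TTL
        return 1

    def hgetdel(self, fields, now=None):
        """Return each field's value (None if missing) and delete the fields."""
        now = int(time.time() * 1000) if now is None else now
        self._drop_expired(now)
        out = []
        for f in fields:
            out.append(self.values.pop(f, None))
            self.expires.pop(f, None)
        return out
```

A single HSETEX call replaces an HSET followed by HEXPIRE. The FNX and FXX checks run before any field is written, so a failed condition leaves the hash untouched.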
Ozan Tezcan	09f8a2f374	Start AOFRW before streaming repl buffer during fullsync (#13758) During fullsync, before loading the RDB on the replica, we stop the aof child to prevent a copy-on-write disaster. Once the rdb is loaded, aof is started again, which triggers an aof rewrite. With https://github.com/redis/redis/pull/13732, this behavior was changed for rdbchannel replication: currently, we start aof after the replication buffer is streamed to the db. This PR changes it back to start aof just after the rdb is loaded (before the repl buffer is streamed). Both approaches have pros and cons. If we start aof before streaming repl buffers, we may still face copy-on-write issues, as repl buffers may include a large amount of changes. If we wait until the replication buffer is drained, we delay starting aof persistence. Additional changes are introduced as part of this PR: - Interface change: Added the `mem_replica_full_sync_buffer` field to the `INFO MEMORY` command reply. During full sync, it shows the total memory consumed by the accumulated replication stream buffer on the replica. Added the same metric to the `MEMORY STATS` command reply as the `replica.fullsync.buffer` field. - Fixes: - Count the replica's repl stream buffer size as part of the 'memory overhead' calculation for fields in "INFO MEMORY" and "MEMORY STATS" outputs. Before this PR, the repl buffer was not counted as part of the memory overhead calculation, causing misreports for fields like `used_memory_overhead` and `used_memory_dataset` in "INFO MEMORY" and for the `overhead.total` field in the "MEMORY STATS" command reply. - Dismiss the replica's replication stream buffer memory in the fork child to reduce COW impact during a fork. - Fixed a few time-sensitive flaky tests, deleted a no-op statement, and fixed some comments and failure messages in rdbchannel tests.	2025-02-04 21:40:18 +03:00
Mason	f5e046a730	Update history for ban-list propagation (#13749) Update CLUSTER FORGET docs for changes in https://github.com/redis/redis/pull/10869 Docs PR: https://github.com/redis/docs/pull/1057 --------- Co-authored-by: debing.sun <debing.sun@redis.com>	2025-01-27 21:05:37 +08:00
Yuan Wang	64a40b20d9	Async IO Threads (#13695) ## Introduction Redis introduced IO threads in 6.0, allowing them to handle reading client requests, parsing commands and writing replies, thereby improving performance. The current IO thread implementation has a few drawbacks. - The main thread is blocked during IO thread read/write operations and must wait for all IO threads to complete their current tasks before it can continue execution. In other words, the entire process is synchronous. This prevents the efficient utilization of multi-core CPUs for parallel processing. - When the number of clients and requests increases moderately, all IO threads reach full CPU utilization because of the busy-wait mechanism they use. This makes it hard to determine which part of Redis has become the bottleneck. - When IO threads are enabled with TLS and io-threads-do-reads, disconnecting a connection that has pending data may result in it being assigned to multiple IO threads simultaneously. This can cause race conditions and trigger assertion failures. Related issue: redis#12540 Therefore, we designed an asynchronous IO threads solution. The IO threads adopt an event-driven model: the main thread is dedicated to command processing, while the IO threads handle client read and write operations in parallel. ## Implementation ### Overall As before, all client commands must be executed on the main thread, because Redis was originally designed to be single-threaded, and processing commands in a multi-threaded manner would inevitably introduce numerous race and synchronization issues. But now each IO thread has its own event loop, so IO threads can use I/O multiplexing to handle client read and write operations, eliminating the CPU overhead caused by busy-waiting. 
The execution process can be briefly described as follows: the main thread assigns clients to IO threads after accepting connections; IO threads notify the main thread when clients finish reading and parsing queries; the main thread then processes those queries and generates replies; IO threads write the replies to clients after receiving the client list from the main thread, and then continue to handle client read and write events. ### Each IO thread has its own event loop We now assign each IO thread its own event loop. This approach eliminates the need for the main thread to perform the costly `epoll_wait` operation for handling connections (except for specific ones). Instead, the main thread processes requests from the IO threads and hands them back once completed, fully offloading read and write events to the IO threads. Additionally, all TLS operations, including handling pending data, have been moved entirely to the IO threads. This resolves the issue where io-threads-do-reads could not be used with TLS. ### Event-notified client queue To facilitate communication between the IO threads and the main thread, we designed an event-notified client queue. Each IO thread and the main thread have two such queues to store clients waiting to be processed. These queues are also integrated with the event loop so that they can be handled in an event-driven way. We use pthread_mutex to ensure the safety of queue operations as well as data visibility and ordering. Contention is minimized because each IO thread and the main thread operate on independent queues, which avoids thread suspension due to lock contention. We also implemented an event notifier based on `eventfd` or `pipe` to support event-driven handling. ### Thread safety Since the main thread and IO threads can execute in parallel, we must handle data races carefully. client->flags The primary tasks of IO threads are reading and writing, i.e. `readQueryFromClient` and `writeToClient`. 
However, IO threads and the main thread may concurrently modify or access `client->flags`, leading to potential race conditions. To address this, we introduced an io-flags variable to record operations performed by IO threads, thereby avoiding race conditions on `client->flags`. Pause IO thread In the main thread, we may want to operate on IO thread data, e.g. uninstall an event handler, access or modify the query/output buffer, or resize the event loop, and we need a clean and safe context to do so. We pause the IO thread in `IOThreadBeforeSleep`, do the work, and then resume it. To avoid suspending the thread, we use busy waiting to confirm the target state. In addition, we use atomic variables to ensure memory visibility and ordering. We introduce the following functions to pause/resume IO threads: ``` pauseIOThread, resumeIOThread pauseAllIOThreads, resumeAllIOThreads pauseIOThreadsRange, resumeIOThreadsRange ``` Testing has shown that `pauseIOThread` is highly efficient, allowing the main thread to execute nearly 200,000 operations per second during stress tests. Similarly, `pauseAllIOThreads` with 8 IO threads can handle up to nearly 56,000 operations per second. But operations performed between pausing and resuming IO threads must be quick; otherwise, they could drive the IO threads to full CPU utilization. freeClient and freeClientAsync The main thread may need to terminate a client currently running on an IO thread, for example due to ACL rule changes, reaching the output buffer limit, or client eviction. In such cases, we need to pause the IO thread to safely operate on the client. maxclients and maxmemory-clients updating When adjusting `maxclients`, we need to resize the event loop for all IO threads. Similarly, when modifying `maxmemory-clients`, we need to traverse all clients to calculate their memory usage. To ensure safe operations, we pause all IO threads during these adjustments. 
Client info reading The main thread may need to read a client’s fields to generate a descriptive string, such as for the `CLIENT LIST` command or for logging. In such cases, we need to pause the IO thread handling that client. If information for all clients needs to be displayed, all IO threads must be paused. Tracking redirect Redis supports the tracking feature and can even send invalidation messages to a connection with a specified ID. But the target client may be running on an IO thread: directly manipulating the client’s output buffer is not thread-safe, and the IO thread may not be aware that the client requires a response. In such cases, we pause the IO thread handling the client, modify the output buffer, and install a write event handler to ensure proper handling. clientsCron In the `clientsCron` function, the main thread needs to traverse all clients to perform operations such as timeout checks, checking whether they have reached the soft output buffer limit, resizing the output/query buffer, or updating memory usage. To safely operate on a client, the IO thread handling that client must be paused. Pausing the IO thread for each client individually would be very inefficient. Conversely, pausing all IO threads simultaneously would be costly, especially when there are many IO threads, as clientsCron is invoked relatively frequently. To address this, we adopted a batched approach: at most 8 IO threads are paused at a time, and the operations mentioned above are only performed on clients running in the paused IO threads, significantly reducing overhead while maintaining safety. ### Observability In the current design, the main thread always assigns clients to the IO thread with the fewest clients. To make the number of clients handled by each IO thread observable, we added a new section to the INFO output: the `INFO THREADS` section shows the client count for each IO thread. 
``` # Threads io_thread_0:clients=0 io_thread_1:clients=2 io_thread_2:clients=2 ``` Additionally, we added a field to the `CLIENT LIST` output indicating the thread to which each client is assigned. `id=244 addr=127.0.0.1:41870 laddr=127.0.0.1:6379 ... resp=2 lib-name= lib-ver= io-thread=1` ## Trade-off ### Special Clients For certain special types of clients, keeping them on IO threads would result in severe race issues that are difficult to resolve. Therefore, we chose not to offload these clients to the IO threads. For replica, monitor, subscribe, and tracking clients, the main thread may write a reply to them directly when certain conditions are met. Since the resulting races are difficult to resolve, these clients are processed in the main thread. This also includes Lua debug clients, since we may operate on their connections directly. For blocking clients, after the IO thread reads and parses a command and hands it over to the main thread, if the client is identified as blocked, it remains in the main thread. Once the blocking operation completes and the reply is generated, the client is transferred back to the IO thread to send the reply and wait for event triggers. ### Clients Eviction To support client eviction, each client’s memory usage must be updated promptly during operations such as read, write, or command execution. However, when a client operates on an IO thread, it is not feasible to update the memory usage immediately due to the risk of data races. As a result, memory usage can only be updated either in the main thread while processing commands or periodically in `clientsCron`. The downside of this approach is that updates may be delayed by up to one second, which could reduce the precision of memory management for eviction. To avoid incorrectly evicting clients, we adopted a best-effort compensation: when we decide to evict a client, we update its memory usage again before evicting it; if the memory used by the client has not decreased or its memory usage bucket has not changed, we evict it, otherwise we do not. However, this does not completely solve the problem: because memory usage updates are delayed, we may still make incorrect decisions about the need to evict clients. ### Defragment In the majority of cases we do NOT use the data from argv directly in the db. 1. key names We store a copy that we allocate in the main thread, see `sdsdup()` in `dbAdd()`. 2. hash key and value We store the key as an hfield and the value as an sds, see `hfieldNew()` and `sdsdup()` in `hashTypeSet()`. 3. other datatypes They don't even use SDS, so there are no reference issues. But in some cases, data from a client's argv may be retained by the main thread. As a result, during fragmentation cleanup, we need to move allocations from the IO thread’s arena to the main thread’s arena. We always allocate new memory in the main thread’s arena, but the memory released by IO threads may not yet have been reclaimed. This ultimately makes the fragmentation rate higher than when memory is allocated and freed entirely within a single thread. The following cases lead to memory allocated by an IO thread being kept by the main thread. 1. string-related commands: `append`, `getset`, `mset` and `set`. If `tryObjectEncoding()` does not change argv, we keep it directly in the main thread; see the code in `tryObjectEncoding()` (specifically `trimStringObjectIfNeeded()`). 2. blocking commands: the key names will be kept in `c->db->blocking_keys`. 3. watch command: the key names will be kept in `c->db->watched_keys`. 4. [s]subscribe commands: the channel names will be kept in `serverPubSubChannels`. 5. script load command: the script will be kept in `server.lua_scripts`. 6. 
some module APIs: `RM_RetainString`, `RM_HoldString`. These issues will be handled in other PRs. ## Testing ### Functional Testing The commit enabling IO threads passed all TCL tests, but we made some changes: Client query buffer: In the original code, when using a reusable query buffer, ownership of the query buffer would be released after the command was processed. However, with IO threads enabled, the client transitions from an IO thread to the main thread for processing, which causes the ownership release to occur before the command is executed. As a result, when IO threads are enabled, the client's information will never indicate that a shared query buffer is in use, so we skip the corresponding query buffer tests in this case. Defragment: Added a new defragmentation test to verify the effect of IO threads on defragmentation. Command delay: For deferred clients in TCL tests, delays may occur because clients are assigned to different threads for execution. To address this, we introduced conditional waiting: the test proceeds to the next step only when the `client list` output contains the corresponding commands. ### Sanitizer Testing The commit passed all TCL tests and reported no errors when compiled with the `-fsanitize=thread` and `-fsanitize=address` options enabled. However, we made the following modification: we suppressed the sanitizer warnings for clients with watched keys when updating `client->flags`. IO threads read `client->flags` but never modify it or read the `CLIENT_DIRTY_CAS` bit, and only the main thread modifies this bit, so there is no actual data race. ## Others ### IO thread number In the new multi-threaded design, the main thread is primarily focused on command processing to improve performance. Typically, the main thread does not handle regular client I/O operations but is responsible for clients such as replication and tracking clients. To avoid breaking changes, we still count the main thread as the first IO thread. 
When the io-threads configuration is set to a low value (e.g., 2), performance does not improve significantly compared to a single-threaded setup for simple commands (such as SET or GET), as the main thread does not consume much CPU for these simple operations. This results in underutilized multi-core capacity. However, for more complex commands, a low number of IO threads may still be beneficial. Therefore, it’s important to tune `io-threads` based on your own performance tests. Additionally, you can monitor the CPU utilization of the main thread and IO threads using `top -H -p $redis_pid`, which makes it easy to identify where the bottleneck is. If the IO threads are the bottleneck, increasing `io-threads` will improve performance. If the main thread is the bottleneck, overall performance can only be scaled by increasing the number of shards or replicas. --------- Co-authored-by: debing.sun <debing.sun@redis.com> Co-authored-by: oranagra <oran@redislabs.com>	2024-12-23 14:16:40 +08:00
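The event-notified client queue can be sketched with Python's standard library. According to the commit, Redis pairs a `pthread_mutex` with an `eventfd` or `pipe` notifier. This analogue uses `threading.Lock` and `os.pipe` instead, and registers the pipe's read end in a `selectors` event loop. One design choice is an assumption made for this sketch and is not stated in the commit: a wake-up byte is written only when the queue goes from empty to non-empty. `NotifiedQueue` and `run_demo` are hypothetical names. The sketch requires a POSIX system.

```python
import os
import selectors
import threading
from collections import deque


class NotifiedQueue:
    """A lock-protected client queue that wakes an event loop through a pipe."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = deque()
        self._rfd, self._wfd = os.pipe()
        os.set_blocking(self._rfd, False)

    def fileno(self):
        # Lets a selector watch the pipe's read end like any socket.
        return self._rfd

    def push(self, client):
        with self._lock:
            was_empty = not self._items
            self._items.append(client)
        if was_empty:
            # Only the empty -> non-empty transition needs a wake-up byte.
            os.write(self._wfd, b"\x01")

    def drain(self):
        # Consume pending wake-up bytes first, then take everything queued.
        try:
            while os.read(self._rfd, 64):
                pass
        except BlockingIOError:
            pass
        with self._lock:
            items = list(self._items)
            self._items.clear()
        return items

    def close(self):
        os.close(self._rfd)
        os.close(self._wfd)


def run_demo():
    q = NotifiedQueue()
    sel = selectors.DefaultSelector()
    sel.register(q, selectors.EVENT_READ)

    def io_thread():
        # Stand-in for an IO thread handing parsed clients to the main thread.
        for name in ("client-1", "client-2", "client-3"):
            q.push(name)

    t = threading.Thread(target=io_thread)
    t.start()
    received = []
    for _ in range(50):                      # bounded wait for the demo
        for key, _mask in sel.select(timeout=0.1):
            received += key.fileobj.drain()
        if len(received) == 3:
            break
    t.join()
    sel.close()
    q.close()
    return received
```

The reader drains the pipe before it swaps out the queue, so any push that lands after the swap finds an empty queue and leaves a byte behind. No wake-up is lost. At worst the loop sees a spurious wake-up and `drain()` returns an empty list.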
Oran Agra	79fd255828	Add Lua VM memory to memory overhead, now that it's part of zmalloc (#13660) To complement the work done in #13133: that PR made the script VMs' memory count as part of zmalloc, which means it should also be counted as part of the non-value overhead. This commit contains some refactoring to make variable and function names less confusing. It also adds a new field named `script.VMs` to the `MEMORY STATS` command. Additionally, it clears scripts and stats between tests in external mode (which is related to how this issue was discovered).	2024-11-21 08:22:17 +02:00
YaacovHazan	6c5e263d7b	Temporarily hide the new SFLUSH command by marking it as experimental (#13600) - Add a new 'EXPERIMENTAL' command flag, which causes the command generator to skip the command, making it unavailable for execution - Skip experimental tests by default - Move the SFLUSH tests from the old framework to the new one --------- Co-authored-by: YaacovHazan <yaacov.hazan@redislabs.com>	2024-10-15 11:02:51 +03:00
Moti Cohen	d092d64d7a	Add new SFLUSH command to cluster for slot-based FLUSH (#13564) This PR introduces a new `SFLUSH` command to cluster mode that allows partial flushing of nodes based on specified slot ranges. The current implementation is designed to flush all slots of a shard, but future extensions could allow for more granular flushing. Command Usage: `SFLUSH <start-slot> <end-slot> [<start-slot> <end-slot>]* [SYNC|ASYNC]` This command removes all data from the specified slots, either synchronously or asynchronously depending on the optional SYNC/ASYNC argument. Functionality: The current implementation of `SFLUSH` verifies that the provided slot ranges are valid and cover all of the node's slots before proceeding. If slots are partially or incorrectly specified, the command fails and returns an error, so all of a node's slots must be fully covered for the flush to proceed. The command supports both synchronous (default) and asynchronous flushing. In addition, when possible, SFLUSH SYNC is run as a blocking ASYNC flush as an optimization.	2024-09-29 09:13:21 +03:00
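The validation rule described above says the ranges must be well formed and must cover the node's slots. It can be sketched as follows. This is a hypothetical Python helper, not the server's parser. It assumes 16384 cluster slots, and it treats a trailing SYNC/ASYNC token as the mode, defaulting to SYNC. It also assumes the ranges must match the node's owned slots exactly. The real command's error replies are not modelled.

```python
CLUSTER_SLOTS = 16384


def check_sflush(argv, node_slots):
    """Return (mode, ok) for SFLUSH-style arguments.

    argv: tokens after the command name, e.g. ["0", "5460", "ASYNC"].
    node_slots: the slots this node owns.
    """
    mode = "SYNC"                                  # SYNC is the default
    if argv and argv[-1].upper() in ("SYNC", "ASYNC"):
        mode, argv = argv[-1].upper(), argv[:-1]
    if not argv or len(argv) % 2:
        return mode, False                         # ranges come in pairs
    bounds = [int(tok) for tok in argv]
    covered = set()
    for start, end in zip(bounds[0::2], bounds[1::2]):
        if not 0 <= start <= end < CLUSTER_SLOTS:
            return mode, False                     # malformed range
        covered.update(range(start, end + 1))
    # Assumption: the ranges must match the node's slots exactly.
    return mode, covered == set(node_slots)
```

For example, on a node owning slots 0-5460, `["0", "100", "101", "5460", "async"]` passes with mode ASYNC. `["0", "100"]` is rejected because it covers only part of the node's slots.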
Filipe Oliveira (Redis)	00a8e72cfc	Created specific SMEMBERS command logic which avoids sinterGenericCommand, and minimizes processing and memory overhead (#13499) This PR introduces a dedicated implementation for the SMEMBERS command that avoids using the more generalized sinterGenericCommand function. By tailoring the logic specifically for SMEMBERS, we reduce unnecessary processing and memory overheads that were previously incurred by handling more complex cases like set intersections. --------- Co-authored-by: debing.sun <debing.sun@redis.com>	2024-09-03 18:32:43 +08:00
Moti Cohen	4dd8b1faa9	Fix HTTL/HPTTL to be NONDETERMINISTIC_OUTPUT (#13461) H[P]TTL should be marked as NONDETERMINISTIC_OUTPUT just like [P]TTL.	2024-08-04 17:42:50 +03:00
Jo	871c985919	Update `FIELDS` argument to block type for HFE commands schema (#13339) I reviewed the `XREAD` command syntax: ``` XREAD [COUNT count] [BLOCK milliseconds] STREAMS key [key ...] id [id ...] ``` Here’s the structure for `XREAD`: ```json "arguments": [ { "token": "COUNT", "name": "count", "type": "integer", "optional": true }, { "token": "BLOCK", "name": "milliseconds", "type": "integer", "optional": true }, { "name": "streams", "token": "STREAMS", "type": "block", "arguments": [ { "name": "key", "type": "key", "key_spec_index": 0, "multiple": true }, { "name": "ID", "type": "string", "multiple": true } ] } ] ``` Now, consider the `HEXPIRE` syntax: ``` HEXPIRE key seconds [NX | XX | GT | LT] FIELDS numfields field [field ...] ``` Since the `FIELDS` token functions similarly to `STREAMS`, and given that `STREAMS` is defined as a block, I believe `FIELDS` in `HEXPIRE` should also be defined as a block.	2024-06-14 13:51:49 +08:00
debing.sun	7b9e960690	Hash Field Expiration (#13303 ) ## Background This PR introduces support for field-level expiration in Redis hashes. Previously, Redis supported expiration only at the key level, but this enhancement allows setting expiration times for individual fields within a hash. ## New commands * HEXPIRE * HEXPIREAT * HEXPIRETIME * HPERSIST * HPEXPIRE * HPEXPIREAT * HPEXPIRETIME * HPTTL * HTTL ## Short example from @moticless ```sh 127.0.0.1:6379> hset myhash f1 v1 f2 v2 f3 v3 (integer) 3 127.0.0.1:6379> hpexpire myhash 10000 NX fields 2 f2 f3 1) (integer) 1 2) (integer) 1 127.0.0.1:6379> hpttl myhash fields 3 f1 f2 f3 1) (integer) -1 2) (integer) 9997 3) (integer) 9997 127.0.0.1:6379> hgetall myhash 1) "f3" 2) "v3" 3) "f2" 4) "v2" 5) "f1" 6) "v1" ... after 10 seconds ... 127.0.0.1:6379> hgetall myhash 1) "f1" 2) "v1" 127.0.0.1:6379> ``` ## Expiration strategy 1. Active expiration: Redis periodically performs active expiration and deletion of hash keys that contain expired fields, with a maximum attempt limit. 2. Lazy expiration: When a client touches fields within a hash, Redis checks whether the fields are expired. If a field is expired, it is deleted. However, we do not delete expired fields during a traversal; we implicitly skip over them. ## RDB changes Add two new RDB types, `RDB_TYPE_HASH_METADATA` and `RDB_TYPE_HASH_LISTPACK_EX`. ## Notification 1. Add `hpersist` notification for the `HPERSIST` command. 2. Add `hexpire` notification for the `HEXPIRE`, `HEXPIREAT`, `HPEXPIRE` and `HPEXPIREAT` commands. ## Internal 1. Add a new data structure, `ebuckets`, which stores TTLs and keys and enables quick retrieval of keys based on TTL. 2. Add a new sds-like data structure, `mstr`, which stores a string with a TTL. This work was done by @moticless, @tezc, @ronen-kalish, @sundb; I'm just releasing it.	2024-05-30 15:26:19 +08:00
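The two expiration strategies can be illustrated with a toy, in-memory Python model. This is not how Redis stores expirations, since Redis uses the `ebuckets` and `mstr` structures. The class and method names are hypothetical, TTLs are in milliseconds from an injected clock, only the NX condition is modeled, and `ms` is assumed to be positive.

```python
class ToyFieldTTLHash:
    """Toy model of a hash with per-field TTLs (illustration only)."""

    def __init__(self, clock):
        self.clock = clock  # callable returning the current time in ms
        self.fields = {}    # field -> value
        self.expires = {}   # field -> absolute expire time in ms

    def _expire_if_needed(self, field):
        # Lazy expiration: a field is deleted only when touched after its TTL.
        when = self.expires.get(field)
        if when is not None and when <= self.clock():
            del self.fields[field]
            del self.expires[field]

    def hset(self, *pairs):
        added = 0
        for field, value in zip(pairs[::2], pairs[1::2]):
            self._expire_if_needed(field)
            added += field not in self.fields
            self.fields[field] = value
            self.expires.pop(field, None)  # assumption: overwriting drops the TTL
        return added

    def hpexpire(self, ms, fields, nx=False):
        replies = []
        for field in fields:
            self._expire_if_needed(field)
            if field not in self.fields:
                replies.append(-2)  # no such field
            elif nx and field in self.expires:
                replies.append(0)   # NX: the field already has a TTL
            else:
                self.expires[field] = self.clock() + ms
                replies.append(1)
        return replies

    def hpttl(self, fields):
        replies = []
        for field in fields:
            self._expire_if_needed(field)
            if field not in self.fields:
                replies.append(-2)
            elif field not in self.expires:
                replies.append(-1)  # the field exists but has no TTL
            else:
                replies.append(self.expires[field] - self.clock())
        return replies

    def hgetall(self):
        # Traversal skips expired fields without deleting them.
        now = self.clock()
        return {f: v for f, v in self.fields.items()
                if f not in self.expires or self.expires[f] > now}

    def active_expire(self, max_fields):
        # Active expiration: reclaim at most max_fields expired fields per call.
        now = self.clock()
        due = sorted((t, f) for f, t in self.expires.items() if t <= now)
        for _, field in due[:max_fields]:
            del self.fields[field]
            del self.expires[field]
        return min(len(due), max_fields)


now_ms = [0]
h = ToyFieldTTLHash(lambda: now_ms[0])
h.hset("f1", "v1", "f2", "v2", "f3", "v3")
print(h.hpexpire(10000, ["f2", "f3"], nx=True))  # [1, 1]
print(h.hpttl(["f1", "f2", "f3"]))               # [-1, 10000, 10000]
now_ms[0] = 10001                                # "after 10 seconds"
print(h.hgetall())                               # {'f1': 'v1'}
```

After the last call, `h.fields` still holds all three entries: the traversal only skipped the expired ones, and they are reclaimed later by `active_expire` or by a lazy check on access.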
Ozan Tezcan	f0389f2823	Fix position of numfields in H(P)EXPIRE json files (#13301 ) Fix position of numfields in H(P)EXPIRE json files	2024-05-29 16:35:47 +03:00
Ozan Tezcan	e2918705c8	Fix hfe reply schemas (#13295 ) In https://github.com/redis/redis/pull/13291, we changed the HFE commands to return an empty array if the key does not exist, but forgot to update the JSON schemas.	2024-05-27 16:07:01 +03:00
debing.sun	2d1bb42cba	Update version references from 8.0 to 7.4 for upcoming release (#13294 )	2024-05-27 16:47:23 +08:00
Ozan Tezcan	2f34f6f0b9	Delete hsetf and hgetf (#13291 ) Changes: - Delete hsetf and hgetf commands - HFE commands will return an empty array instead of nil. --------- Co-authored-by: Moti Cohen <moticless@gmail.com>	2024-05-26 13:30:45 +03:00
Moti Cohen	71676513dd	Fix commands HEXPIRE and H*TTL to include `FIELDS` constant (#13270 ) The same applies to: HPEXPIRE, HEXPIREAT, HPEXPIREAT, HEXPIRETIME, HPEXPIRETIME, HPTTL, HTTL, HPERSIST	2024-05-16 19:35:58 +03:00
Ozan Tezcan	5066e6e9cd	Fix hgetf/hsetf reply type by returning string (#13263 ) If the encoding is listpack, the hgetf and hsetf commands reply with the field value as an integer. This PR fixes it by returning a string. Problematic cases: ``` 127.0.0.1:6379> hset hash one 1 (integer) 1 127.0.0.1:6379> hgetf hash fields 1 one 1) (integer) 1 127.0.0.1:6379> hsetf hash GETOLD fvs 1 one 2 1) (integer) 1 127.0.0.1:6379> hsetf hash DOF GETNEW fvs 1 one 2 1) (integer) 2 ``` Additional fixes: - hgetf/hsetf command description text Fixes #13261, #13262	2024-05-13 11:09:49 +03:00
debing.sun	7010f41c96	Add notification support for HFE (#13237 ) 1. Add `hpersist` notification for `hpersist` command. 2. Add `pexpire` notification for `hexpire`, `hexpireat` and `hpexpire`.	2024-05-09 22:23:00 +08:00
Ozan Tezcan	ca4ed48db6	Add listpack support, hgetf and hsetf commands (#13209 ) Changes: - Adds listpack support to hash field expiration - Implements hgetf/hsetf commands Listpack support for hash field expiration We keep field name and value pairs in a listpack for the hash type. With this PR, the first time a hash field expiration command is called on a key, the listpack layout is converted to triplets holding the field name, value and TTL of each field. If a field does not have a TTL, we store zero as the TTL value. Zero is encoded as two bytes in the listpack, so once the listpack has been converted to triplets, each field without a TTL consumes those extra 2 bytes. Fields are ordered by TTL in the listpack so that the field with the minimum expiry time can be found efficiently. New command implementations as part of this PR: - HGETF command For each specified field, get its value and optionally set the field's expiration time in sec/msec/unix-sec/unix-msec: ``` HGETF key [NX | XX | GT | LT] [EX seconds | PX milliseconds | EXAT unix-time-seconds | PXAT unix-time-milliseconds | PERSIST] <FIELDS count field [field ...]> ``` - HSETF command For each specified field-value pair, set the field to the value and optionally set the field's expiration time in sec/msec/unix-sec/unix-msec: ``` HSETF key [DC] [DCF | DOF] [NX | XX | GT | LT] [GETNEW | GETOLD] [EX seconds | PX milliseconds | EXAT unix-time-seconds | PXAT unix-time-milliseconds | KEEPTTL] <FVS count field value [field value …]> ``` Todo: - Performance improvement. - rdb load/save - aof - defrag	2024-05-08 23:11:32 +03:00
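A small Python sketch of the triplet layout, using a plain list of `(field, value, ttl)` tuples in place of a real listpack. The helper names are hypothetical. The message only states that zero marks "no TTL" and that fields are ordered by TTL, so placing fields without a TTL after all fields that have one is an assumption of this sketch.

```python
import bisect

NO_TTL = 0  # per the message, a field without a TTL stores zero as its TTL


def to_triplets(pairs):
    """Convert a flat [field, value, field, value, ...] layout into triplets."""
    return [(pairs[i], pairs[i + 1], NO_TTL) for i in range(0, len(pairs), 2)]


def order_key(triplet):
    ttl = triplet[2]
    # Assumption: fields with a TTL come first, ascending; NO_TTL fields go last.
    return (ttl == NO_TTL, ttl)


def set_ttl(triplets, field, ttl):
    """Return a new layout where `field` has `ttl`, keeping the TTL order."""
    value = next(v for f, v, _ in triplets if f == field)
    rest = [t for t in triplets if t[0] != field]
    keys = [order_key(t) for t in rest]  # still sorted, since `triplets` was
    new = (field, value, ttl)
    rest.insert(bisect.bisect_right(keys, order_key(new)), new)
    return rest


def min_expire(triplets):
    """The earliest expiry is simply the first triplet's TTL, if it has one."""
    if triplets and triplets[0][2] != NO_TTL:
        return triplets[0][2]
    return None


lp = to_triplets(["a", "1", "b", "2", "c", "3"])
lp = set_ttl(lp, "c", 500)
lp = set_ttl(lp, "a", 100)
print([f for f, _, _ in lp], min_expire(lp))  # ['a', 'c', 'b'] 100
```

Keeping the layout sorted makes every insert a positioned write, but it turns "which field expires next?" into a read of the first entry.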
debing.sun	03cd525ffa	Fix reply schema for hfe related commands (#13238 )	2024-05-03 11:11:41 +08:00
Moti Cohen	c18ff05665	Hash Field Expiration - Basic support - Add ebuckets & mstr data structures - Integrate active & lazy expiration - Add most of the commands - Add support for dict (listpack is missing) TODOs: RDB, notification, listpack, HSET, HGETF, defrag, aof	2024-04-18 16:06:30 +03:00
Chen Tianjie	4cae99e785	Add overhead of all DBs and rehashing dict count to info. (#12913 ) Sometimes we need to quickly judge why Redis is suddenly taking more memory. One possible reason is that the main DB's dicts are rehashing. We can use `MEMORY STATS` to monitor the overhead memory of each DB, but there is still no total sum that shows the overall trend. So this PR adds the total overhead of all DBs to the `INFO MEMORY` section, together with the total count of rehashing DB dicts, providing some intuitive metrics about main dict rehashing. This PR adds the following metrics to INFO MEMORY * `mem_overhead_db_hashtable_rehashing` - only the size of ht[0] in dictionaries we're rehashing (i.e. the memory that is going to be released soon) and similar ones to MEMORY STATS: * `overhead.db.hashtable.lut` (complements the existing `overhead.hashtable.main` and `overhead.hashtable.expires`, which also count the `dictEntry` structs) * `overhead.db.hashtable.rehashing` - temporary rehashing overhead. * `db.dict.rehashing.count` - number of top level dictionaries being rehashed. --------- Co-authored-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com> Co-authored-by: Oran Agra <oran@redislabs.com>	2024-03-01 13:41:24 +08:00
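The two rehashing metrics can be read as simple aggregates over the top-level dicts. A rough Python sketch, assuming a 64-bit build where each hash table bucket is one 8-byte pointer. `DictInfo` and `rehashing_metrics` are hypothetical names, not Redis internals.

```python
from dataclasses import dataclass

POINTER_SIZE = 8  # assumption: 64-bit build, one pointer per bucket


@dataclass
class DictInfo:
    ht0_buckets: int         # buckets in the old table, ht[0]
    rehashing: bool = False  # True while entries move from ht[0] to ht[1]


def rehashing_metrics(dicts):
    """Return (overhead of ht[0] in rehashing dicts, number of rehashing dicts)."""
    rehashing = [d for d in dicts if d.rehashing]
    # Only ht[0] is counted: that is the memory released once rehashing ends.
    overhead = sum(d.ht0_buckets * POINTER_SIZE for d in rehashing)
    return overhead, len(rehashing)


print(rehashing_metrics([DictInfo(1024, True), DictInfo(64), DictInfo(16, True)]))  # (8320, 2)
```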
Binbin	f17381a38d	Fix propagation of entries_read by calling streamPropagateGroupID unconditionally (#12898 ) In XREADGROUP ACK, because streamPropagateXCLAIM does not propagate entries-read, entries-read will be inconsistent between master and replicas. I.e. if no entries were claimed, it would have propagated correctly, but if some were claimed, then the entries-read field would be inconsistent on the replica. The fix, suggested by guybe7, is to call streamPropagateGroupID unconditionally, so that we normalize entries_read on the replicas. In the past, we would only set propagate_last_id when NOACK was specified. And in #9127, XCLAIM did not propagate entries_read in ACK, which would cause entries_read to be inconsistent between master and replicas. Another approach is to add another arg to XCLAIM and let it propagate entries_read, but we decided not to use it, because we want minimal damage in case there's an old target and new source (in the worst case scenario, the old target doesn't recognize XGROUP SETID ... ENTRIESREAD and the lag is lost. If we change XCLAIM, the damage is much more severe). With this patch, if the user uses XREADGROUP .. COUNT 1, there will be an additional overhead of MULTI, EXEC and XGROUP SETID. We assume the extra command in case of COUNT 1 (4x factor, changing from one XCLAIM to MULTI+XCLAIM+XGROUP SETID+EXEC) is probably ok, since reading just one entry is in any case very inefficient (a client round trip per record), so we're hoping it's not a common case. Issue was introduced in #9127.	2024-02-29 09:48:20 +02:00
guybe7	820a4e45f1	Edit the history field of xinfo-consumers (#13078 ) Now it matches the information in xinfo-stream.json	2024-02-22 09:44:29 +02:00
Binbin	5b9fc46523	Add new allocator.muzzy field to memory-stats reply schema (#13076 ) This field was added in #12996, but we forgot to add it to the JSON file. This also causes reply-schemas-validator to fail.	2024-02-21 08:35:10 +02:00
Binbin	ca5cac998e	xinfo-stream add minimum to seen-time, skip logreqres in fuzzer (#13056 ) Recently I saw in CI that reply-schemas-validator fails here: ``` Failed validating 'minimum' in schema[1]['properties']['groups']['items']['properties']['consumers']['items']['properties']['active-time']: {'description': 'Last time this consumer was active (successful ' 'reading/claiming).', 'minimum': 0, 'type': 'integer'} On instance['groups'][0]['consumers'][0]['active-time']: -1729380548878722639 ``` The reason is that the fuzzer may restore a corrupted active-time, which causes the reply schema CI to fail. The fuzzer can corrupt the state in many places, which can cause bugs that mess up the reply, so we decided to skip logreqres. Also, since seen-time is the same type as active-time, we add the minimum to it as well. --------- Co-authored-by: Oran Agra <oran@redislabs.com>	2024-02-20 12:21:10 +02:00
guybe7	6df42df291	Adds a README to the command JSON files (#13066 ) Add a README about the command JSON folder, what it does, and who should (not) use it. See discussion https://github.com/redis/redis/issues/9359#issuecomment-1936420698 --------- Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Binbin <binloveplay1314@qq.com>	2024-02-19 18:49:31 +02:00
Daz	02a87885e6	Add missing structural API changes to JSON file (#12434 ) The JSON file lacks the following structural API changes: - GEORADIUSBYMEMBER: add the ANY option for COUNT since 6.2.0. - GEORADIUSBYMEMBER_RO: add the ANY option for COUNT since 6.2.0. - GEORADIUS_RO: Added support for uppercase unit names since 7.0.0. - GEORADIUSBYMEMBER_RO: Added support for uppercase unit names since 7.0.0. --------- Signed-off-by: daz-3ux <daz-3ux@proton.me> Co-authored-by: bodong.ybd <bodong.ybd@alibaba-inc.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Co-authored-by: yangpengda.333 <yangpengda.333@bytedance.com> Co-authored-by: Oran Agra <oran@redislabs.com>	2024-02-04 08:42:15 +02:00
Chen Tianjie	f469dd8ca6	Add novalues option to command HSCAN. (#12765 ) Add a way to HSCAN a hash key and get only the field names. Command syntax is now: ``` HSCAN key cursor [MATCH pattern] [COUNT count] [NOVALUES] ``` When `NOVALUES` is on, the command only returns the field names in the hash. --------- Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-01-30 20:32:58 +02:00
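A toy Python model of the reply shape with and without NOVALUES. It assumes an offset-based cursor for simplicity (real HSCAN cursors are opaque and are not offsets), and `hscan_page` is a hypothetical name. Each page is `[cursor, items]`, where the items alternate field and value unless NOVALUES is given.

```python
def hscan_page(hash_items, cursor, count, novalues=False):
    """Return [next_cursor, flat_items]; a next_cursor of 0 means the scan is done."""
    fields = list(hash_items)  # toy: dict order stands in for bucket order
    page = fields[cursor:cursor + count]
    next_cursor = cursor + count if cursor + count < len(fields) else 0
    items = []
    for field in page:
        items.append(field)
        if not novalues:
            items.append(hash_items[field])
    return [next_cursor, items]


h = {"f1": "v1", "f2": "v2", "f3": "v3"}
print(hscan_page(h, 0, 10))                # [0, ['f1', 'v1', 'f2', 'v2', 'f3', 'v3']]
print(hscan_page(h, 0, 2, novalues=True))  # [2, ['f1', 'f2']]
```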
Slava Koyfman	24f6d08b3f	Implement `CLIENT KILL MAXAGE <maxage>` (#12299 ) Adds the ability to kill clients older than a specified age. Also fixed the age calculation in `catClientInfoString` to use `commandTimeSnapshot` instead of the old `server.unixtime`, and added the missing documentation for `CLIENT KILL ID` to the output of `CLIENT help`. --------- Co-authored-by: Oran Agra <oran@redislabs.com>	2024-01-30 20:24:36 +02:00
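A minimal Python sketch of the MAXAGE filter. It assumes a client matches when its age in seconds is at least `maxage`, where age is measured against a single time snapshot taken for the whole command. `Client` and `clients_to_kill` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Client:
    id: int
    ctime: int  # connection creation time, in unix seconds


def clients_to_kill(clients, maxage, snapshot_ms):
    """IDs of clients whose age is at least maxage seconds."""
    now = snapshot_ms // 1000  # one snapshot shared by every client in this call
    return [c.id for c in clients if now - c.ctime >= maxage]


clients = [Client(1, 100), Client(2, 150), Client(3, 140)]
print(clients_to_kill(clients, 60, 200_000))  # [1, 3]
```

Using one snapshot means every client is compared against the same "now", even if iterating over many clients takes a while.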
Binbin	85c31e0cff	Allow running WAITAOF in scripts, remove NOSCRIPT flag (#12977 ) In #11568 we removed the NOSCRIPT flag from commands, e.g. removing the NOSCRIPT flag from WAIT, aiming to allow them in scripts and let them implicitly behave in the non-blocking way. This PR removes the NOSCRIPT flag from WAITAOF, just like WAIT (to be symmetrical). This PR also adds the BLOCKING flag to WAIT and WAITAOF.	2024-01-23 15:19:41 +02:00
debing.sun	4730563e93	Change destination key's key-spec flag from RW to OW for SINTERSTORE command (#12917 ) In #10122, we set the destination key's flag of SINTERSTORE to `RW`, however, this command doesn't actually read or modify the destination key, just overwrites it. Therefore, we change it to `OW` similarly to all other *STORE commands.	2024-01-08 10:17:13 +02:00
Binbin	7410d985bc	Remove overhead.hashtable.slot-to-keys from memory-stats reply_schema (#12784 ) overhead.hashtable.slot-to-keys was added in 7.0 in #10017, then removed in #11695. Now remove it from reply_schema.	2023-12-10 09:46:21 +02:00
zhaozhao.zz	77a65e82b2	support XREAD[GROUP] with BLOCK option in scripts (#12596 ) In #11568 we removed the NOSCRIPT flag from commands and kept the BLOCKING flag, aiming to allow them in scripts and let them implicitly behave in the non-blocking way. In that sense, the old behavior was to allow LPOP and reject BLPOP, and the new behavior is to allow BLPOP too, and fail it only if it ends up blocking. Likewise, so far we allowed XREAD and rejected XREAD BLOCK; now we allow that too, and only reject it if it ends up blocking.	2023-10-12 10:54:50 +03:00
Binbin	8d92f7f2b7	Support NO ONE block in REPLICAOF command json (#12633 ) The current commands.json doesn't mention the special NO ONE arguments. This change is also applied to SLAVEOF	2023-10-10 11:10:40 +03:00
Binbin	4031a18732	Fix that slot return in CLUSTER SHARDS should be integer (#12561 ) An unintentional change was introduced in #10536: we used to use addReplyLongLong, but it was changed to addReplyBulkLonglong. This reverts it back to the previous behavior.	2023-09-09 23:33:00 -07:00
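The difference is visible on the wire. A Python sketch of the two RESP encodings involved. The function names only mirror the C helpers named in the message and are not part of any Python library.

```python
def reply_long_long(n):
    """RESP integer, the kind of reply addReplyLongLong emits: ':<n>\\r\\n'."""
    return b":" + str(n).encode() + b"\r\n"


def reply_bulk_long_long(n):
    """RESP bulk string holding the number's digits: '$<len>\\r\\n<n>\\r\\n'."""
    digits = str(n).encode()
    return b"$" + str(len(digits)).encode() + b"\r\n" + digits + b"\r\n"


print(reply_long_long(5460))       # b':5460\r\n'
print(reply_bulk_long_long(5460))  # b'$4\r\n5460\r\n'
```

A client parsing CLUSTER SHARDS sees the first form as an integer and the second as a string, which is why the change was user-visible.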
nihohit	90e9fc387c	Update command tips on more admin / configuration commands (#12545 ) Updated the command tips for ACL SAVE / SETUSER / DELUSER, CLIENT SETNAME / SETINFO, and LATENCY RESET. The tips now match CONFIG SET, since there's a similar behavior for all of these commands - the user expects to update the various configurations & states on all nodes, not only on a single, random node. For LATENCY RESET the response tip is now agg_sum. Co-authored-by: Shachar Langbeheim <shachlan@amazon.com>	2023-09-04 21:30:42 +03:00
Binbin	9ce8c54d74	Update sort_ro reply_schema to mention the null reply (#12534 ) Also added a test to cover this case, so this can cover the reply schemas check.	2023-08-31 06:36:35 +03:00
nihohit	4b281ce519	Align CONFIG RESETSTAT/REWRITE tips with SET. (#12530 ) Since the three commands have similar behavior (change config, return OK), the tips that govern how they should behave should be similar. Co-authored-by: Shachar Langbeheim <shachlan@amazon.com>	2023-08-30 21:49:02 +03:00
Binbin	f4549d1cf4	Fix CLUSTER REPLICAS time complexity, should be O(N) (#12477 ) We iterate over all replicas to get the result, so the time complexity should be O(N), like CLUSTER NODES, whose complexity is O(N).	2023-08-14 20:57:55 -07:00
Binbin	7af9f4b36e	Fix GEOHASH / GEODIST / GEOPOS time complexity, should be O(1) (#12445 ) GEOHASH / GEODIST / GEOPOS use zsetScore to get the score. In skiplist encoding, zsetScore uses dictFind to get the score, which is O(1), the same as the ZSCORE command. It is not clear why these commands had O(Log(N)) and O(N) until now.	2023-08-05 07:29:24 +03:00
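This is why the lookup is O(1): in the skiplist encoding, a sorted set pairs a dict (member to score) with the skiplist that keeps the ordering. A toy Python model follows, in which a sorted list stands in for the skiplist. `ToySortedSet` is a hypothetical name.

```python
import bisect


class ToySortedSet:
    """Toy model of the skiplist encoding: a dict for scores plus an ordered index."""

    def __init__(self):
        self.scores = {}   # member -> score: the structure a score lookup consults
        self.ordered = []  # sorted (score, member) pairs, standing in for the skiplist

    def zadd(self, score, member):
        old = self.scores.get(member)
        if old is not None:
            self.ordered.remove((old, member))
        self.scores[member] = score
        bisect.insort(self.ordered, (score, member))

    def zscore(self, member):
        # A single hash lookup per member, independent of the set's size.
        return self.scores.get(member)


z = ToySortedSet()
z.zadd(1.5, "Palermo")
z.zadd(0.5, "Catania")
z.zadd(2.0, "Palermo")
print(z.zscore("Palermo"), z.ordered)  # 2.0 [(0.5, 'Catania'), (2.0, 'Palermo')]
```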
nihohit	9f512017aa	Update request/response policies. (#12417 ) Changing the response and request policy of a few commands, see https://redis.io/docs/reference/command-tips 1. RANDOMKEY used to have no response policy, which means that when sent to multiple shards, the responses should be aggregated. This normally applies to commands that return arrays, but since RANDOMKEY replies with a single bulk string, it actually requires a SPECIAL response policy (for the client to select just one) 2. SCAN used to have no response policy, but although the key names part of the response can be aggregated, the cursor part certainly can't. 3. MSETNX had a request policy of MULTI_SHARD and a response policy of AGG_MIN, but in fact the contract with MSETNX is that when one key exists, it returns 0 and doesn't set any key. Routing it to multiple shards would mean that if one failed and another succeeded, its atomicity is broken and it's impossible to return a valid response to the caller. Co-authored-by: Shachar Langbeheim <shachlan@amazon.com> Co-authored-by: Oran Agra <oran@redislabs.com>	2023-07-25 10:21:23 +03:00
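A Python sketch of why AGG_MIN over a multi-shard MSETNX cannot be trusted, with two dicts standing in for shards. The helper names are hypothetical. Each shard applies MSETNX's all-or-nothing rule only to its own keys.

```python
def msetnx_on_shard(shard, pairs):
    """Single-node MSETNX: set all keys only if none of them exists."""
    if any(key in shard for key, _ in pairs):
        return 0
    shard.update(pairs)
    return 1


def fan_out_agg_min(shards, pairs_by_shard):
    """Send each shard its keys, then combine the replies with AGG_MIN."""
    replies = [msetnx_on_shard(shards[i], pairs) for i, pairs in pairs_by_shard.items()]
    return min(replies)


shards = [{"a": "old"}, {}]
reply = fan_out_agg_min(shards, {0: [("a", "1")], 1: [("b", "2")]})
print(reply, shards)  # 0 [{'a': 'old'}, {'b': '2'}]
```

The aggregated reply is 0, yet key `b` was written on the second shard, so the all-or-nothing contract is broken.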
Binbin	d306d86146	Fix ZRANK/ZREVRANK reply_schema description (#12331 ) The parameter name is WITHSCORE instead of WITHSCORES.	2023-06-20 11:15:40 +03:00
Binbin	b510624978	Optimize PSUBSCRIBE and PUNSUBSCRIBE from O(NM) to O(N) (#12298 ) In the original implementation, the time complexity of the commands is actually O(NM), where N is the number of patterns the client is already subscribed to and M is the number of patterns to subscribe to. The docs are all wrong about this. Specifically, because the original client->pubsub_patterns is a list, we need to call listSearchKey, which is O(N). In this PR, we change it to a dict, so the search becomes O(1). Since pubsub_channels and pubsubshard_channels are already dicts, changing pubsub_patterns to a dict also improves the readability and maintainability of the code.	2023-06-19 16:31:18 +03:00
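A Python sketch contrasting the two container choices for a client's pattern subscriptions. The function names are hypothetical. Both produce the same result, but the list version scans the existing subscriptions for every new pattern (like listSearchKey), while the dict version does an average O(1) lookup.

```python
def psubscribe_with_list(subscribed, patterns):
    """Old layout: each `in` check scans the list, O(N) per pattern."""
    added = 0
    for pattern in patterns:
        if pattern not in subscribed:
            subscribed.append(pattern)
            added += 1
    return added


def psubscribe_with_dict(subscribed, patterns):
    """New layout: each `in` check is a hash lookup, O(1) on average."""
    added = 0
    for pattern in patterns:
        if pattern not in subscribed:
            subscribed[pattern] = None
            added += 1
    return added


as_list, as_dict = ["news.*"], {"news.*": None}
print(psubscribe_with_list(as_list, ["news.*", "sport.*", "sport.*"]))  # 1
print(psubscribe_with_dict(as_dict, ["news.*", "sport.*", "sport.*"]))  # 1
```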
Harkrishn Patro	a9e32767f7	Allow cluster slots/shards api to respond during loading (#12269 ) It would be helpful for clients to get cluster slots/shards information while a node is failing over and loading data.	2023-06-13 18:16:32 +03:00


164 commits