redis

mirror of https://github.com/redis/redis.git synced 2026-05-28 04:02:46 -04:00

Author	SHA1	Message	Date
Salvatore Sanfilippo	11947d8892	[Vector sets] fast JSON filter (#13959) This PR replaces cJSON with a home-made parser designed for the kind of access pattern the FILTER option of VSIM performs on JSON objects. The main points here are: * cJSON forces us to parse the whole JSON, create a graph of cJSON objects, then we need to seek in O(N) to find the right field. * The cJSON object associated with the value is not of the same format as the expr.c virtual machine. We needed a conversion function doing more allocation and work. * Right now we only support top level fields in the JSON object, so a full parser is not needed. With all these things in mind, and after carefully profiling the old code, I realized that a specialized parser able to parse JSON in a zero-allocation fashion and only actually parse the value associated to our key would be much more efficient. Moreover, after this change, the dependencies of Vector Sets on external code drop to zero, and the count of lines of code is 3000 lines less. The new line count with LOC is 4200, making Vector Sets easily the smallest full featured implementation of a Vector store available. # Speedup achieved In a dataset with JSON objects with 30 fields, 1 million elements, the following query shows a 3.5x speedup: vsim vectors:million ele ele943903 FILTER ".field29 > 1000 and .field15 < 50" Please note that we get 3.5x speedup in the VSIM command itself. This means that the actual JSON parsing speedup is significantly greater than that. However, in Redis land, under my past kingdom of many years ago, the rule was that an improvement would produce speedups that are user facing. This PR definitely qualifies. What is interesting is that even with a JSON containing a single element the speedup is about 70%, so we are faster even in the worst case. 
# Further info Note that the new skipping parser may happily process JSON objects that are not perfectly valid, as long as they look valid from the POV of balancing [] and {} and so forth. This should not be an issue. Anyway, invalid JSON produces random results (the element is skipped entirely even if it would pass the filter). Please feel free to ask me anything about the new implementation before merging.	2025-05-05 09:52:42 +03:00
Pieter Cailliau	d65102861f	Adding AGPLv3 as a license option to Redis! (#13997 ) Read more about [the new license option](http://redis.io/blog/agplv3/) and [the Redis 8 release](http://redis.io/blog/redis-8-ga/).	2025-05-01 14:04:22 +01:00
YaacovHazan	de16bee70a	Limiting output buffer for unauthenticated client (CVE-2025-21605) (#13993 ) For unauthenticated clients the output buffer is limited to prevent them from abusing it by not reading the replies	2025-04-30 09:58:51 +03:00
Yuan Wang	14dd59ab12	Remove io-threads-do-reads from normal config list (#13987 ) Since after https://github.com/redis/redis/pull/13695, `io-threads-do-reads` config is deprecated, we should remove it from normal config list and only keep it in deprecated config list, but we forgot to do this, this PR fixes this. thanks @YaacovHazan for reporting this	2025-04-28 12:55:47 +03:00
Vitah Lin	bd3c1e1bd7	Delete redundant declaration of handleDebugClusterCommand() (#13974 )	2025-04-24 10:50:35 +08:00
Vitah Lin	9f99dd5f6d	Fix tls port update not reflected in CLUSTER SLOTS (#13966 ) ### Problem A previous PR (https://github.com/redis/redis/pull/13932) fixed the TCP port issue in CLUSTER SLOTS, but it seems the handling of the TLS port was overlooked. There is this comment in the `addNodeToNodeReply` function in the `cluster.c` file: ```c /* Report TLS ports to TLS client, and report non-TLS port to non-TLS client. */ addReplyLongLong(c, clusterNodeClientPort(node, shouldReturnTlsInfo())); addReplyBulkCBuffer(c, clusterNodeGetName(node), CLUSTER_NAMELEN); ``` ### Fixed This PR fixes the TLS port issue and adds relevant tests.	2025-04-24 09:36:45 +08:00
nesty92	8468ded667	Fix incorrect lag due to trimming stream via XTRIM or XADD command (#13958) This PR fixes the lag calculation by ensuring that when the consumer group's last_id is behind the first entry, the consumer group's entries read is considered invalid and recalculated from the start of the stream. Supplement to PR #13473 Close #13957 Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>	2025-04-22 10:11:10 +08:00
Alexandre Antonio Juca	a51918209c	Fix grammar and typos (#13803 ) This MR includes minor improvements and grammatical fixes in the documentation. Specifically: • Corrected grammatical mistakes in sentences for better clarity. • Fixed typos and improved phrasing to enhance readability. • Ensured consistency in terminology and sentence structure. --------- Co-authored-by: debing.sun <debing.sun@redis.com>	2025-04-22 09:16:10 +08:00
Stav-Levi	a257b6b4ba	Fix port update not reflected in CLUSTER SLOTS (#13932 ) Close https://github.com/redis/redis/issues/13892 config set port cmd updates server.port. cluster slot retrieves information about cluster slots and their associated nodes. the fix updates this info when config set port cmd is done, so cluster slots cmd returns the right value.	2025-04-21 17:13:55 +08:00
Ozan Tezcan	c9be4fbd72	Fix order of KSN for hgetex command (#13931) If the HGETEX command deletes the only field due to lazy expiry, Redis currently sends the `del` KSN (Keyspace Notification) first, followed by the `hexpired` KSN. The order should be reversed: `hexpired` should be sent first and `del` later. Additional changes: More test coverage for HGETDEL KSN --------- Co-authored-by: hristosko <hristosko.chaushev@redis.com>	2025-04-14 13:31:31 +03:00
chx9	90fa80f372	Delete redundant declaration of clusterNodeIsMaster() (#13937 )	2025-04-13 20:30:34 +08:00
Ozan Tezcan	ec31156b58	Fix a couple of compiler warnings (#13911 ) Fix a couple of compiler warnings 1. gcc-14 prints a warning: ``` In function ‘memcpy’, inlined from ‘zipmapSet’ at zipmap.c:255:5: /usr/include/x86_64-linux-gnu/bits/string_fortified.h:29:10: warning: ‘__builtin_memcpy’ writing between 254 and 4294967295 bytes into a region of size 0 overflows the destination [-Wstringop-overflow=] 29 \| return __builtin___memcpy_chk (__dest, __src, __len, \| ^ In function ‘zipmapSet’: lto1: note: destination object is likely at address zero ``` 2. I occasionally get another warning while building with different options: ``` redis-cli.c: In function ‘clusterManagerNodeMasterRandom’: redis-cli.c:6053:1: warning: control reaches end of non-void function [-Wreturn-type] 6053 \| } ```	2025-04-07 13:09:47 +03:00
YaacovHazan	5582a41bb6	Few fixes around make for modules (#13922 ) - Suppress errors when removing .so files that may not exist - Fix -DINCLUDE_VEC_SETS duplication	2025-04-06 11:09:07 +03:00
Slava Koyfman	fd4b5cb3fa	Improve refcount check in 'decrRefCount' (#13888 ) The code of 'decrRefCount' included a validity check that would panic the server if the refcount ever became invalid. However, due to the way it was written, this could only happen if a corrupted value was written to the field, or we attempted to decrement a newly-allocated and never-incremented object. Incorrectly-tracked refcounts would not be caught, as the code would never actually reduce the refcount from 1 to 0. This left potential use-after-free errors unhandled. Improved the code so that incorrect tracking of refcounts causes a panic, even if the freed memory happens to still be owned by the application and not re-allocated.	2025-04-03 21:29:06 +08:00
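The refcount check described in the row above can be illustrated with a minimal sketch. It is not the Redis implementation: `toy_obj`, `toy_decr_refcount`, and the `TOY_*` return codes are hypothetical names, and the sketch returns an error code where the server would panic. The point it shows is that the count is really written down to 0 on release, so a later decrement on the same object is detected instead of silently seeing a stale 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted object. */
typedef struct toy_obj {
    int refcount;
    int freed; /* set instead of calling free(), so the effect is observable */
} toy_obj;

enum { TOY_OK = 0, TOY_RELEASED = 1, TOY_INVALID = -1 };

/* Decrement the refcount; release the object when it reaches zero.
 * A refcount that is already <= 0 means the tracking went wrong
 * (for example a double decrement), which is reported as TOY_INVALID.
 * A real server would panic at that point. */
int toy_decr_refcount(toy_obj *o) {
    assert(o != NULL);
    if (o->refcount <= 0) return TOY_INVALID;
    o->refcount--; /* the 1 -> 0 transition is actually stored */
    if (o->refcount == 0) {
        o->freed = 1;
        return TOY_RELEASED;
    }
    return TOY_OK;
}
```

With an object that starts at a refcount of 2, the first call only decrements, the second releases, and a third (erroneous) call is flagged because the stored count is 0.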
Ozan Tezcan	3cdb8c6046	Improve replication buffering on replica and fix a related bug (#13904 ) With RDB channel replication, we introduced parallel replication stream and RDB delivery to the replica during a full sync. Currently, after the replica loads the RDB and begins streaming the accumulated buffer to the database, it does not read from the master connection during this period. Although streaming the local buffer is generally a fast operation, it can take some time if the buffer is large. This PR introduces buffering during the streaming of the local buffer. One important consideration is ensuring that we consume more than we read during this operation; otherwise, it could take indefinitely. To guarantee that it will eventually complete, we limit the read to at most half of what we consume, e.g. read at most 1 mb once we consume at least 2 mb. Additional changes Bug fix - Currently, when replica starts draining accumulated buffer, we call protectClient() for the master client as we occasionally yield back to event loop via processEventsWhileBlocked(). So, it prevents freeing the master client. While we are in this loop, if replica receives "replicaof newmaster" command, we call replicaSetMaster() which expects to free the master client and trigger a new connection attempt. As the client object is protected, its destruction will happen asynchronously. Though, a new connection attempt to new master will be made immediately. Later, when the replication buffer is drained, we realize master client was marked as CLOSE_ASAP, and freeing master client triggers another connection attempt to the new master. In most cases, we realize something is wrong in the replication state machine and abort the second attempt later. So, the bug may go undetected. Fix is not calling protectClient() for the master client. Instead, trying to detect if master client is disconnected during processEventsWhileBlocked() and if so, breaking the loop immediately. 
Related improvement: - Currently, the replication buffer is a linked list of buffers, each of which is 1 MB in size. While consuming the buffer, we process one buffer at a time and check if we need to yield back to `processEventsWhileBlocked()`. However, if `loading-process-events-interval-bytes` is set to less than 1 MB, this approach doesn't handle it well. To improve this, I've modified the code to process 16KB at a time and check `loading-process-events-interval-bytes` more frequently. This way, depending on the configuration, we may yield back to networking more often. - In replication.c, `disklessLoadingRio` will be set before a call to `emptyData()`. This change should not introduce any behavioral change but it is logically more correct as emptyData() may yield to networking and we may need to call rioAbort() on disklessLoadingRio. Otherwise, failure of main channel may go undetected until a failure on rdb channel on a corner case. Config changes - The default value for the `loading-process-events-interval-bytes` configuration is being lowered from 2MB to 512KB. This configuration primarily used for testing and controls the frequency of networking during the loading phase, specifically when loading the RDB or applying accumulated buffers during a full sync on the replica side. Before the introduction of RDB channel replication, the 2MB value was sufficient for occasionally yielding to networking, mainly to reply -loading to the clients. However, with RDB channel replication, during a full sync on the replica side (either while loading the RDB or applying the accumulated buffer), we need to yield back to networking more frequently to continue accumulating the replication stream. If this doesn’t happen often enough, the replication stream can accumulate on the master side, which is undesirable. To address this, we’ve decided to lower the default value to 512KB. 
One concern with frequent yielding to networking is the potential performance impact, as each call to processEventsWhileBlocked() involves 4 syscalls, which could slow down the RDB loading phase. However, benchmarking with various configuration values has shown that using 512KB or higher does not negatively impact RDB loading performance. Based on these results, 512KB is now selected as the default value. Test changes - Added improved version of a replication test which checks memory usage on master during full sync. --------- Co-authored-by: Oran Agra <oran@redislabs.com>	2025-04-03 10:04:29 +03:00
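The "read at most half of what we consume" rule from the row above can be sketched as follows. This is an illustration under assumptions, not the code from the PR: `toy_read_budget` and `toy_drain_steps` are hypothetical helpers, and the simulation assumes the worst case where the master always has enough data to fill the whole budget.

```c
#include <assert.h>
#include <stddef.h>

/* How many more bytes may be read from the master connection, given the
 * bytes consumed from the local buffer and the bytes already read during
 * this drain. Reads are capped at half of what was consumed. */
size_t toy_read_budget(size_t consumed, size_t already_read) {
    size_t allowed = consumed / 2;
    return allowed > already_read ? allowed - already_read : 0;
}

/* Worst-case simulation: after every consumed chunk, read as much as the
 * budget allows and append it to the backlog. Returns the number of steps
 * until the backlog is empty, showing that the drain still terminates. */
size_t toy_drain_steps(size_t backlog, size_t chunk) {
    assert(chunk > 0);
    size_t consumed = 0, read_total = 0, steps = 0;
    while (backlog > 0) {
        size_t n = backlog < chunk ? backlog : chunk;
        backlog -= n;
        consumed += n;
        size_t r = toy_read_budget(consumed, read_total);
        backlog += r;
        read_total += r;
        steps++;
    }
    return steps;
}
```

Because the bytes read never exceed half of the bytes consumed, the backlog shrinks by at least half of every consumed chunk, which is why the loop is guaranteed to finish.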
YaacovHazan	41b1b5df18	Add vector-sets module The vector-sets module is a part of Redis Core and is available by default, just like any other data type in Redis. As a result, when building Redis from the source, the vector-sets module is also compiled as part of the Redis binary and loaded at server start-up. This new data type, added as a preview, currently doesn't support all the capabilities in Redis, such as: 32-bit OS, C99, short reads (which might end with a memory leak), AOF rewrite, and defrag.	2025-04-02 15:06:24 +00:00
Ozan Tezcan	366c6aff81	Put replica online when bgsave is done (#13895) Before https://github.com/redis/redis/pull/13732, replicas were brought online immediately after master wrote the last bytes of the RDB file to the socket. This behavior remains unchanged if rdbchannel replication is not used. However, with rdbchannel replication, the replica is brought online after receiving the first ack which is sent by replica after rdb is loaded. To align the behavior, this PR reverts that change and puts the replica online once bgsave is done. Additional changes: - INFO field `mem_total_replication_buffers` will also contain `server.repl_full_sync_buffer.mem_used` which shows accumulated replication stream during rdbchannel replication on replica side. - Deleted debug level logging from some replication tests. These tests generate thousands of keys and it may cause per key logging in some cases.	2025-03-31 13:48:49 +03:00
Jason	aa8e2d1712	Ignore shardId updates from replica nodes (#13877 ) Close https://github.com/redis/redis/issues/13868 This bug was introduced by https://github.com/redis/redis/pull/13468 ## Issue To maintain compatibility with older versions that do not support shardid, when a replica passes a shardid, we also update the master’s shardid accordingly. However, when both the master and replica support shardid, an issue arises: in one moment, the master may pass a shardid, causing us to update both the master and all its replicas to match the master’s shardid. But if the replica later passes a different shardid, we would then update the master’s shardid again, leading to continuous changes in shardid. ## Solution Regardless of the situation, we always ensure that the replica’s shardid remains consistent with the master’s shardid.	2025-03-30 15:15:04 +08:00
debing.sun	87d8e71708	Fix defrag when type/encoding changes during scan (#13883 ) This PR is based on: https://github.com/valkey-io/valkey/pull/1801 [SoftlyRaining](https://github.com/SoftlyRaining) was hunting for defrag bugs with Jim and found a couple of improvements to make. Jim pointed out that in several of the callbacks, if the encoding were to change it simply returns without doing anything to `cursor` to make it reach 0, meaning that it would continue no-op working on that item without making any progress. Type and encoding can change while the defrag scan is in progress if the value is mutated or replaced by something else with the same key. --------- Signed-off-by: Rain Valentine <rsg000@gmail.com> Co-authored-by: Rain Valentine <rsg000@gmail.com>	2025-03-27 08:58:57 +08:00
Ozan Tezcan	a0da8390a2	Fix use-after-free when diskless load config is not swapdb (#13887 ) When the diskless load configuration is set to on-empty-db, we retain a pointer to the function library context. When emptyData() is called, it frees this function library context pointer, leading to a use-after-free situation. I refactored code to ensure that emptyData() is called first, followed by retrieving the valid pointer to the function library context. Refactored code should not introduce any runtime implications. Bug introduced by https://github.com/redis/redis/pull/13495 (Redis 8.0) Co-authored-by: Oran Agra <oran@redislabs.com>	2025-03-26 21:50:10 +03:00
Oran Agra	2a189709e0	avoid possible use-after-free with module KSN changes (#13875 ) in #13505, we changed the code to use the string value of the key rather than the integer value on the stack, but we have a test in unit/moduleapi/keyspace_events that uses keyspace notification hook to modify the value with RM_StringDMA, which can cause this value to be released before used. the reason it didn't happen so far is because we were using shared integers, so releasing the object doesn't free it.	2025-03-24 12:24:52 +02:00
Yuan Wang	319bbcc1a7	Fix sdscatprintf error in the output of `info stats` (#13871) CI failed: https://github.com/redis/redis/actions/runs/13981749993/job/39148249096, because `info` was not reassigned after `sdscatprintf(info, xxx)`. Thanks to @sundb for spotting this. Introduced in https://github.com/redis/redis/pull/13846	2025-03-24 09:17:58 +08:00
debing.sun	87b7c3ac1a	Fix rax node defragmentation being skipped (#13847) First, when we do `raxSeek()` and then call `raxNext()`, we will get the `RAX_ITER_JUST_SEEKED` flag and return success directly. We always set the node defrag callback after `raxSeek()`, which means that when we break from defragmentation, the first node that comes in again will never be defragged. In this PR, we save the last as the next node to be processed, not the last node to be completed. This way we defrag the next node when we exit to avoid it being skipped on the next resume. --------- Co-authored-by: oranagra <oran@redislabs.com>	2025-03-24 08:57:08 +08:00
Benson-li	427c36888e	Fix potential infinite loop of RANDOMKEY during client pause (#13863 ) The bug mentioned in this [#13862](https://github.com/redis/redis/issues/13862) has been fixed. --------- Signed-off-by: li-benson <1260437731@qq.com> Signed-off-by: youngmore1024 <youngmore1024@outlook.com> Co-authored-by: youngmore1024 <youngmore1024@outlook.com>	2025-03-20 21:32:12 +08:00
Yuan Wang	951ec79654	Cluster compatibility check (#13846) ### Background A program may run normally in standalone mode, but migrating it to cluster mode may cause errors, because some cross-slot commands cannot run in cluster mode. We should provide an approach to detect this issue when running in standalone mode, and need to expose a metric which indicates the usage of incompatible commands. ### Solution To avoid perf impact, we introduce a new config `cluster-compatibility-sample-ratio` which defines the sampling ratio (0-100) for checking command compatibility in cluster mode. When a command is executed, it is sampled at the specified ratio to determine if it complies with Redis cluster constraints, such as cross-slot restrictions. A new metric is exposed: `cluster_incompatible_ops` in `info stats` output. The following operations will be considered incompatible operations. - cross-slot command If a command has multiple cross-slot keys, it is incompatible - `swap, copy, move, select` command These commands involve multiple databases in some cases; we don't allow multiple DBs in cluster mode, so they are not compatible - Module command with `no-cluster` flag If a module command has the `no-cluster` flag, we will encounter an error when loading the module, which fails to load if cluster is enabled, so this is incompatible. - Script/function with `no-cluster` flag Similar to module commands, if we declare `no-cluster` in the shebang of a script/function, we also cannot run it in cluster mode - `sort` command by/get pattern When the `sort` command has a `by/get` pattern option, we must require that the pattern slot is equal to the slot of the keys, otherwise it is incompatible in cluster mode. - The script/function command accesses keys whose slots differ from those of the declared keys For the script/function command, we not only check the slots of the declared keys, but also check the slots of the accessed keys; if they are different, we consider it incompatible. 
Besides, commands like `keys, scan, flushall, script/function flush`, that in standalone mode iterate over all data to perform the operation, are only valid for the server that executes the command in cluster mode and are not broadcasted. However, this does not lead to errors, so we do not consider them as incompatible commands. ### Performance impact test cross slot test Below are the test commands and results. When using MSET with 8 keys, performance drops by approximately 3%. single key test It may be due to the overhead of the sampling function, and single-key commands could cause a 1-2% performance drop.	2025-03-20 10:35:53 +08:00
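The cross-slot condition from the row above can be sketched in a few lines. This is an illustration, not the code added by the PR: the `toy_*` names are hypothetical, the slot computation follows the key-to-slot mapping described in the Redis Cluster specification (CRC16/XMODEM modulo 16384, with `{...}` hash tags), and the sampling by `cluster-compatibility-sample-ratio` is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC16 (XMODEM variant: polynomial 0x1021, initial value 0). */
uint16_t toy_crc16(const char *buf, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)buf[i] << 8);
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Map a key to one of 16384 slots. If the key contains a non-empty
 * "{tag}", only the tag is hashed, so related keys can share a slot. */
unsigned toy_key_slot(const char *key, size_t len) {
    size_t s, e;
    for (s = 0; s < len; s++)
        if (key[s] == '{') break;
    if (s == len) return toy_crc16(key, len) & 16383;
    for (e = s + 1; e < len; e++)
        if (key[e] == '}') break;
    if (e == len || e == s + 1) return toy_crc16(key, len) & 16383;
    return toy_crc16(key + s + 1, e - s - 1) & 16383;
}

/* Return 1 if the keys of one command map to more than one slot. */
int toy_is_cross_slot(const char **keys, size_t nkeys) {
    assert(keys != NULL || nkeys == 0);
    for (size_t i = 1; i < nkeys; i++) {
        if (toy_key_slot(keys[i], strlen(keys[i])) !=
            toy_key_slot(keys[0], strlen(keys[0])))
            return 1;
    }
    return 0;
}
```

A command touching `{user1000}.following` and `{user1000}.followers` would not count as incompatible under this check, while one touching `foo` and `bar` would.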
Filipe Oliveira (Redis)	3e012c9260	Fix string2d usage in case of hexadecimal strings parsing and overflow (#13845) Since https://github.com/redis/redis/pull/11884, what was previously accepted as a valid input (hexadecimal string) before 8.0 returned an error. This PR addresses it. To avoid performance penalties, it hints to the compiler that the fallbacks are not likely to happen. Furthermore, we were ignoring `std::errc::result_out_of_range` outputs from fast_float. This PR addresses it as well and includes tests for both identified scenarios. --------- Co-authored-by: debing.sun <debing.sun@redis.com>	2025-03-19 20:08:45 +08:00
debing.sun	26dcec4812	Fix messed-up unblocked clients in flush command (#13865 ) Fix https://github.com/redis/redis/pull/13853#pullrequestreview-2675227138 This PR ensures that the client's current command is not reset by unblockClient(), while still needing to be handled after `unblockclient()`. The FLUSH command still requires reprocessing (update the replication offset) after unblockClient(). Therefore, we mark such blocked clients with the CLIENT_PENDING_COMMAND flag to prevent the command from being reset during unblockClient().	2025-03-19 10:22:47 +08:00
debing.sun	a5a3afd923	Fix crash during SLAVEOF when clients are blocked on lazyfree (#13853 ) After https://github.com/redis/redis/pull/13167, when a client calls `FLUSHDB` command, we still async empty database, and the client was blocked until the lazyfree completes. 1) If another client calls `SLAVEOF` command during this time, the server will unblock all blocked clients, including those blocked by the lazyfree. However, when unblocking a lazyfree blocked client, we forgot to call `updateStatsOnUnblock()`, which ultimately triggered the following assertion. 2) If a client blocked by Lazyfree is unblocked midway, and at this point the `bio_comp_list` has already received the completion notification for the bio, we might end up processing a client that has already been unblocked in `flushallSyncBgDone()`. Therefore, we need to filter it out. --------- Co-authored-by: oranagra <oran@redislabs.com>	2025-03-17 20:27:05 +08:00
debing.sun	f364dcca2d	Make RM_DefragRedisModuleDict API support incremental defragmentation for dict leaf (#13840 ) After https://github.com/redis/redis/pull/13816, we make a new API to defrag RedisModuleDict. Currently, we only support incremental defragmentation of the dictionary itself, but the defragmentation of values is still not incremental. If the values are very large, it could lead to significant blocking. Therefore, in this PR, we have added incremental defragmentation for the values. The main change is to the `RedisModuleDefragDictValueCallback`, we modified the return value of this callback. When the callback returns 1, we will save the `seekTo` as the key of the current unfinished node, and the next time we enter, we will continue defragmenting this node. When the return value is 0, we will proceed to the next node. ## Test Since each dictionary in the global dict originally contained only 10 strings, but now it has been changed to a nested dictionary, each dictionary now has 10 sub-dictionaries, with each sub-dictionary containing 10 strings, this has led to a corresponding reduction in the defragmentation time obtained from other tests. Therefore, the other tests have been modified to always wait for defragmentation to be turned off before the test begins, then start it after creating fragmentation, ensuring that they can always run for a full defragmentation cycle. --------- Co-authored-by: ephraimfeldblum <ephraim.feldblum@redis.com>	2025-03-04 17:19:41 +08:00
debing.sun	7939ba031d	Enable the callback to be NULL for RM_DefragRedisModuleDict() and reduce the system calls of RM_DefragShouldStop() (#13830 ) 1) Enable the callback to be NULL for RM_DefragRedisModuleDict() Because the dictionary may store only the key without the value. 2) Reduce the system calls of RM_DefragShouldStop() The API checks the following thresholds before performing a time check: over 512 defrag hits, or over 1024 defrag misses, and performs the time judgment if any of these thresholds are reached. 3) Added defragmentation statistics for dictionary items to cover the associated code for RM_DefragRedisModuleDict(). 4) Removed `module_ctx` from `defragModuleCtx` struct, which can be replaced by a temporary variable. --------- Co-authored-by: oranagra <oran@redislabs.com>	2025-02-26 20:04:29 +08:00
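The threshold gating described in point 2 of the row above can be sketched like this. The struct and function names are hypothetical and the real `RM_DefragShouldStop()` is not reproduced here; the sketch only shows the idea of skipping the clock read until more than 512 hits or more than 1024 misses have accumulated since the previous time check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters kept by a defrag context. */
typedef struct toy_defrag_ctx {
    uint64_t hits, misses;           /* running totals */
    uint64_t last_hits, last_misses; /* totals at the previous time check */
} toy_defrag_ctx;

/* Return 1 when enough work has been done since the last time check to
 * justify reading the clock (a system call), and remember the current
 * totals. Return 0 when the caller should keep working without checking. */
int toy_time_check_due(toy_defrag_ctx *c) {
    assert(c != NULL);
    if (c->hits - c->last_hits > 512 || c->misses - c->last_misses > 1024) {
        c->last_hits = c->hits;
        c->last_misses = c->misses;
        return 1;
    }
    return 0;
}
```

A caller would read the clock and compare it with its deadline only when this function returns 1, which keeps the number of time lookups proportional to the work done rather than to the number of calls.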
Denis Nevmerzhitskii	33f03f6fc8	Fix wrong behavior of XREAD + after last entry of stream has been removed (#13632) Close #13628 This PR changes the behavior of the special `+` id of the XREAD command. Now it uses `streamLastValidID` to find the last entry instead of the `last_id` field of the stream object. This PR adds a test for the issue. Notes: The initial idea to update `last_id` while executing XDEL seems to be wrong. `last_id` is used to store the last generated id and not the id of the last entry. --------- Co-authored-by: debing.sun <debing.sun@redis.com> Co-authored-by: guybe7 <guy.benoish@redislabs.com>	2025-02-25 13:40:24 +08:00
Filipe Oliveira (Redis)	985bf68f34	Reduce redundant key slot calculations on expiration checks (#13796 ) On high-pipeline/fast commands use-cases, expireIfNeeded can take up to 3% cpu cycles. This PR introduces an optimization where key expiration checks leverage key slots to improve efficiency. --------- Co-authored-by: debing.sun <debing.sun@redis.com> Co-authored-by: ShooterIT <wangyuancode@163.com>	2025-02-25 11:55:30 +08:00
Moti Cohen	0200e8ada6	Fix multiple issues with "INFO KEYSIZES" (#13825) This commit addresses several issues related to the `INFO KEYSIZES` feature: - HyperLogLog commands: `KEYSIZES` hooks were not properly set or tested. - HFE lazy expiration: `KEYSIZES` hooks were not properly set or tested. - Empty DB & SYNC flow: On the `blocking_async=0` flow, the global `keysizes` histogram was not reset (can be reproduced using `DEBUG RELOAD`). - Empty string handling: Fix histogram for strings of size 0. Not relevant to other data-types.	2025-02-25 00:38:44 +02:00
Filipe Oliveira (Redis)	1848809f66	Optimize dictFind by leveraging key length functions to avoid redundant computations. (#13792 ) This PR enhances dictFind by introducing support for key length functions, allowing the use of keyCompareWithLen when available. This avoids redundant key length computations, improving efficiency, especially when the dictionary is rehashing or there are a significant number of hash collisions. Additionally, it maintains backward compatibility and optimizes key lookups without altering existing behavior. Performance improvement on 100% GETs use-case benchmark command used ``` taskset -c 1-11 memtier_benchmark --ratio 0:1 --key-maximum 1000000 --key-minimum 1 -c 1 -t 5 --pipeline 100 --key-pattern P:P --test-time 30 --hide-histogram -d 1024 -S /tmp/1.socket -x 3 ``` In unstable dictFindByHash takes 29% (and sdslen within it takes 8.9%) of CPU cycles for a high-pipeline 100% gets use-case. After this change dictFindByHash takes 27.8% (and sdslen within it takes 7.7%) --------- Co-authored-by: debing.sun <debing.sun@redis.com> Co-authored-by: Yuan Wang <wangyuancode@163.com>	2025-02-24 22:27:06 +08:00
Filipe Oliveira (Redis)	d7a448f9ae	Avoid redundant calls to sdslen(c->querybuf) in processMultibulkBuffer (#13787 ) Optimize processMultibulkBuffer by avoiding redundant calls to sdslen(c->querybuf). The cached length is updated only when querybuf is modified.	2025-02-24 20:06:14 +08:00
debing.sun	658424fc83	Revert "Update history for ban-list propagation (#13749 )" (#13827 ) As discussed in https://github.com/redis/redis/pull/13749#issuecomment-2673612941. After #10398 we should record only the arguments and output changes in the command history, while placing all others in the redis-doc, so revert #13749.	2025-02-24 17:40:25 +08:00
Filipe Oliveira (Redis)	3f06ddfb7b	Reuse lookupCommand data on consecutive same command calls on main thread (#13764) We can see that on fast commands and fast pipeline use-cases, lookupCommand() takes 1.9% to 3.4% of total CPU cycles (depending on pipeline). In cases in which consecutive commands are the same we can avoid the call to lookupCommand() completely without changing or adding new fields to the client struct (we simply reuse the info already available in lastcmd). This change can represent an improvement of around 4.4% in QPS on the high pipeline use-cases. --------- Co-authored-by: debing.sun <debing.sun@redis.com>	2025-02-24 12:33:14 +08:00
debing.sun	ee933d9e2b	Fixed passing incorrect endtime value for module context (#13822) 1) Fix a bug where an incorrect endtime was passed to the module. This bug was found by @ShooterIT. After #13814, all endtime values are monotonic time, and we should no longer convert them to relative ustime. Add assertions to prevent endtime from being much larger than the current time. 2) Fix a race in test `Reduce defrag CPU usage when module data can't be defragged` --------- Co-authored-by: ShooterIT <wangyuancode@163.com>	2025-02-23 12:58:48 +08:00
debing.sun	032357ec0f	Add RM_DefragRedisModuleDict module API (#13816) After #13815, we introduced incremental defragmentation for global data for module. Now we added a new module API `RM_DefragRedisModuleDict` to incrementally defrag `RedisModuleDict`. This PR adds a new API and a new defrag callback: ```c RedisModuleDict *RM_DefragRedisModuleDict(RedisModuleDefragCtx *ctx, RedisModuleDict *dict, RedisModuleDefragDictValueCallback valueCB, RedisModuleString **seekTo); typedef void *(*RedisModuleDefragDictValueCallback)(RedisModuleDefragCtx *ctx, void *data, unsigned char *key, size_t keylen); ``` Usage: ```c RedisModuleString *seekTo = NULL; RedisModuleDict *dict = RedisModule_CreateDict(ctx); ... populate the dict code ... /* Defragment a dictionary completely */ do { RedisModuleDict *new = RedisModule_DefragRedisModuleDict(ctx, dict, defragGlobalDictValueCB, &seekTo); if (new != NULL) { dict = new; } } while (seekTo); ``` --------- Co-authored-by: ShooterIT <wangyuancode@163.com> Co-authored-by: oranagra <oran@redislabs.com>	2025-02-20 21:09:29 +08:00
debing.sun	695126ccce	Add support for incremental defragmentation of global module data (#13815) ## Description Currently, when performing defragmentation on non-key data within the module, we cannot process the defragmentation incrementally. This limitation affects the efficiency and flexibility of defragmentation in certain scenarios. The primary goal of this PR is to introduce support for incremental defragmentation of global module data. ## Interface Change New module API `RegisterDefragFunc2` This is a more advanced version of `RM_RegisterDefragFunc`: it takes a new callback (`RegisterDefragFunc2`) that has a return value, so the callback can use RM_DefragShouldStop and indicate whether it should be called again later or is done (returns 0). ## Note The `RegisterDefragFunc` API remains available. --------- Co-authored-by: ShooterIT <wangyuancode@163.com> Co-authored-by: oranagra <oran@redislabs.com>	2025-02-20 00:28:16 +08:00
debing.sun	725cd268e6	Refactor of ActiveDefrag to reduce latencies (#13814 ) This PR is based on: https://github.com/valkey-io/valkey/pull/1462 ## Issue/Problems Duty Cycle: Active Defrag has configuration values which determine the intended percentage of CPU to be used based on a gradient of the fragmentation percentage. However, Active Defrag performs its work on the 100ms serverCron timer. It then computes a duty cycle and performs a single long cycle. For example, if the intended CPU is computed to be 10%, Active Defrag will perform 10ms of work on this 100ms timer cron. * This type of cycle introduces large latencies on the client (up to 25ms with default configurations) * This mechanism is subject to starvation when slow commands delay the serverCron Maintainability: The current Active Defrag code is difficult to read & maintain. Refactoring of the high level control mechanisms and functions will allow us to more seamlessly adapt to new defragmentation needs. Specific examples include: * A single function (activeDefragCycle) includes the logic to start/stop/modify the defragmentation as well as performing one "step" of the defragmentation. This should be separated out, so that the actual defrag activity can be performed on an independent timer (see duty cycle above). * The code is focused on kvstores, with other actions just thrown in at the end (defragOtherGlobals). There's no mechanism to break this up to reduce latencies. * For the main dictionary (only), there is a mechanism to set aside large keys to be processed in a later step. However this code creates a separate list in each kvstore (main dict or not), bleeding/exposing internal defrag logic. We only need 1 list - inside defrag. This logic should be more contained for the main key store. * The structure is not well suited towards other non-main-dictionary items. 
For example, pub-sub and pub-sub-shard was added, but it's added in such a way that in CMD mode, with multiple DBs, we will defrag pub-sub repeatedly after each DB. ## Description of the feature Primarily, this feature will split activeDefragCycle into 2 functions. 1. One function will be called from serverCron to determine if a defrag cycle (a complete scan) needs to be started. It will also determine if the CPU expenditure needs to be adjusted. 2. The 2nd function will be a timer proc dedicated to performing defrag. This will be invoked independently from serverCron. Once the functions are split, there is more control over the latency created by the defrag process. A new configuration will be used to determine the running time for the defrag timer proc. The default for this will be 500us (one-half of the current minimum time). Then the timer will be adjusted to achieve the desired CPU. As an example, 5% of CPU will run the defrag process for 500us every 10ms. This is much better than running for 5ms every 100ms. The timer function will also adjust to compensate for starvation. If a slow command delays the timer, the process will run proportionately longer to ensure that the configured CPU is achieved. Given the presence of slow commands, the proportional extra time is insignificant to latency. This also addresses the overload case. At 100% CPU, if the event loop slows, defrag will run proportionately longer to achieve the configured CPU utilization. Optionally, in low CPU situations, there would be little impact in utilizing more than the configured CPU. We could optionally allow the timer to pop more often (even with a 0ms delay) and the (tail) latency impact would not change. And we add a time limit for the defrag duty cycle to prevent excessive latency. When latency is already high (indicated by a long time between calls), we don't want to make it worse by running defrag for too long. 
Addressing maintainability: * The basic code structure can more clearly be organized around a "cycle". * Have clear begin/end functions and a set of "stages" to be executed. * Rather than stages being limited to "kvstore" type data, a cycle should be more flexible, incorporating the ability to incrementally perform arbitrary work. This will likely be necessary in the future for certain module types. It can be used today to address oddballs like defragOtherGlobals. * We reduced some of the globals and some of the coupling. defrag_later should be removed from serverDb. * Each stage should begin on a fresh cycle. So if there are non-time-bounded operations like kvstoreDictLUTDefrag, these would be less likely to introduce additional latency. --------- Signed-off-by: Jim Brunner <brunnerj@amazon.com> Signed-off-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: ShooterIT <wangyuancode@163.com>	2025-02-20 00:05:24 +08:00
guybe7	66df58f961	Do not send NL if replica client is already closed (#13813 ) In case a replica connection was closed mid-RDB, we should not send a \n to that replica, otherwise, it may reach the replica BEFORE it realizes that the RDB transfer failed, causing it to treat the \n as if it was read from the RDB stream	2025-02-19 15:04:28 +07:00
luozongle01	b045fe4e17	Fix overflow on 32-bit systems when calculating idle time for eviction (#13804 ) the `dictGetSignedIntegerVal` function should be used here, because in some cases (especially on 32-bit systems) long may be 4 bytes, and the ttl time saved in expires is a unix timestamp (millisecond value), which is more than 4 bytes. In this case, we may not be able to get the correct idle time, which may cause eviction disorder, in other words, keys that should be evicted later may be evicted earlier.	2025-02-19 11:01:15 +08:00
Yunxiao Du	c5f91abaf7	Fix syntax issue in comments of src/module.c (#13802) Closes https://github.com/redis/redis/issues/13797. This only fixes a syntax issue in comments; no real code is changed.	2025-02-19 10:58:14 +08:00
Ozan Tezcan	6c202f495c	Remove DENYOOM flag from hexpire command (#13800) Remove DENYOOM flag from hexpire / hexpireat / hpexpire / hpexpireat commands. h(p)expire(at) commands may allocate some memory, but not much. Similarly, we don't have the DENYOOM flag for the EXPIRE command. This change aligns the EXPIRE and HEXPIRE commands in this respect.	2025-02-16 20:07:29 +03:00
Ozan Tezcan	e2608478b6	Add HGETDEL, HGETEX and HSETEX hash commands (#13798 ) This PR adds three new hash commands: HGETDEL, HGETEX and HSETEX. These commands enable user to do multiple operations in one step atomically e.g. set a hash field and update its TTL with a single command. Previously, it was only possible to do it by calling hset and hexpire commands subsequently. - HGETDEL command ``` HGETDEL <key> FIELDS <numfields> field [field ...] ``` Description Get and delete the value of one or more fields of a given hash key Reply Array reply: list of the value associated with each field or nil if the field doesn’t exist. - HGETEX command ``` HGETEX <key> [EX seconds \| PX milliseconds \| EXAT unix-time-seconds \| PXAT unix-time-milliseconds \| PERSIST] FIELDS <numfields> field [field ...] ``` Description Get the value of one or more fields of a given hash key, and optionally set their expiration Options: EX seconds: Set the specified expiration time, in seconds. PX milliseconds: Set the specified expiration time, in milliseconds. EXAT timestamp-seconds: Set the specified Unix time at which the field will expire, in seconds. PXAT timestamp-milliseconds: Set the specified Unix time at which the field will expire, in milliseconds. PERSIST: Remove the time to live associated with the field. Reply Array reply: list of the value associated with each field or nil if the field doesn’t exist. - HSETEX command ``` HSETEX <key> [FNX \| FXX] [EX seconds \| PX milliseconds \| EXAT unix-time-seconds \| PXAT unix-time-milliseconds \| KEEPTTL] FIELDS <numfields> field value [field value...] ``` Description Set the value of one or more fields of a given hash key, and optionally set their expiration Options: FNX: Only set the fields if all do not already exist. FXX: Only set the fields if all already exist. EX seconds: Set the specified expiration time, in seconds. PX milliseconds: Set the specified expiration time, in milliseconds. 
EXAT timestamp-seconds: Set the specified Unix time at which the field will expire, in seconds. PXAT timestamp-milliseconds: Set the specified Unix time at which the field will expire, in milliseconds. KEEPTTL: Retain the time to live associated with the field. Note: If no option is provided, any associated expiration time will be discarded similar to how SET command behaves. Reply Integer reply: 0 if no fields were set Integer reply: 1 if all the fields were set	2025-02-14 17:13:35 +03:00
Ofir Luzon	57807cd338	Memory Usage command LIST accuracy fix (#13783) MEMORY USAGE on a List samples quicklist entries, but does not account for how many elements are in each sampled node. This can skew the calculation when the sampled nodes are not balanced. The fix calculates the average element size in the sampled nodes instead of the average node size.	2025-02-14 09:18:47 +08:00
Yuan Wang	7f5f588232	AOF offset info (#13773) ### Background AOF is often used as an effective data recovery method, but if we have two AOFs from different nodes, it is hard to tell which one has the latest data. Generally, we determine whose data is more up-to-date by reading the latest modification time of the AOF file, but because of replication delay, even if both master and replica write to the AOF at the same time, the data in the master is more up-to-date (there are commands that haven't arrived at the replica yet, or a large number of commands have accumulated on the replica side), so we may make a wrong decision. ### Solution The replication offset always increments when AOF is enabled, even if there is no replica. We think the replication offset is a better way to determine which one has more up-to-date data: whoever has a larger offset has newer data. So we add the start replication offset info for AOF, as below. ``` file appendonly.aof.2.base.rdb seq 2 type b file appendonly.aof.2.incr.aof seq 2 type i startoffset 224 ``` And if we close the AOF file gracefully rather than by a crash, such as with `shutdown`, `kill signal 15` or `config set appendonly no`, we will add the end replication offset, as below. ``` file appendonly.aof.2.base.rdb seq 2 type b file appendonly.aof.2.incr.aof seq 2 type i startoffset 224 endoffset 532 ``` #### Things to pay attention to - For BASE AOF, we do not add `startoffset` and `endoffset` info, since we cannot know the start replication offset of the data, and it would not help us determine which one has more up-to-date data. - For AOFs from old versions, we also don't add `startoffset` and `endoffset` info, since we also don't know their start replication offset. If we add the start offset from 0, we might make the judgment even less accurate. For example, if the master has just rewritten the AOF, its INCR AOF will inevitably be very small. 
However, if the replica has not rewritten AOF for a long time, its INCR AOF might be much larger. By applying the following method, we might make incorrect decisions, so we still just check the timestamp instead of adding offset info. - If the last INCR AOF has `startoffset` or `endoffset`, we need to restore `server.master_repl_offset` according to them to avoid the rollback of the `startoffset` of the next INCR AOF. If it has `endoffset`, we just use this value as `server.master_repl_offset`, and a very important thing is to remove this information from the manifest file, so that the next time we load the manifest file we don't read a wrong `endoffset`. If it only has `startoffset`, we calculate `server.master_repl_offset` as the `startoffset` plus the file size. ### How to determine which one has more up-to-date data If an AOF has a larger replication offset, it has more up-to-date data. The following is how to get the AOF offset: Read the AOF manifest file to obtain information about the last INCR AOF 1. If the last INCR AOF has an `endoffset` field, we can directly use the `endoffset` to represent the replication offset of the AOF 2. If there is no `endoffset` (such as when redis crashes abnormally), but there is a `startoffset` field for the last INCR AOF, we can get the replication offset of the AOF as `startoffset` plus the file size 3. Finally, if the AOF has neither `startoffset` nor `endoffset`, maybe because it comes from an old version and the new version of redis has not rewritten the AOF yet, we still need to check the modification timestamp of the last INCR AOF ### TODO Fix ping causing inconsistency between AOF size and replication offset in a future PR. Because we increment the replication offset when sending PING/REPLCONF to the replica but do not write data to the AOF file, this might cause the starting offset of the AOF file plus its size to be inconsistent with the actual replication offset.	2025-02-13 17:31:40 +08:00
Yuan Wang	662cb2fe75	Don't send unnecessary PING to replicas (#13790) The master sends PING to keep the connection with the replica active, so it need not send PING to replicas if it has already sent replication stream data within the past `repl_ping_slave_period`. Now the master only sends PINGs and increases `master_repl_offset` if there is no traffic, so this PR also reduces the impact of the issue noted in https://github.com/redis/redis/pull/13773, although it does not resolve it completely. > Fix ping causing inconsistency between AOF size and replication offset in a future PR. Because we increment the replication offset when sending PING/REPLCONF to the replica but do not write data to the AOF file, this might cause the starting offset of the AOF file plus its size to be inconsistent with the actual replication offset.	2025-02-13 10:52:19 +08:00
Yuan Wang	87124a38b6	Fix wrongly updating fsynced_reploff_pending when appendfsync=everysecond (#13793) ``` if (server.aof_fsync == AOF_FSYNC_EVERYSEC && server.aof_last_incr_fsync_offset != server.aof_last_incr_size && server.mstime - server.aof_last_fsync >= 1000 && !(sync_in_progress = aofFsyncInProgress())) { goto try_fsync; ``` Since https://github.com/redis/redis/pull/12622, when appendfsync=everysecond, if redis has written some data to AOF but has not yet called `fsync`, and less than 1 second has passed since the last `fsync`, redis won't fsync the AOF but will still update `fsynced_reploff_pending`, which causes `WAITAOF` to return prematurely. This bug was introduced in https://github.com/redis/redis/pull/12622, starting from 7.4. The bug fix `1bd6688bca` is as follows: ```diff diff --git a/src/aof.c b/src/aof.c index 8ccd8d8f8..521b30449 100644 --- a/src/aof.c +++ b/src/aof.c @@ -1096,8 +1096,11 @@ void flushAppendOnlyFile(int force) { * in which case master_repl_offset will increase but fsynced_reploff_pending won't be updated * (because there's no reason, from the AOF POV, to call fsync) and then WAITAOF may wait on * the higher offset (which contains data that was only propagated to replicas, and not to AOF) */ - if (!sync_in_progress && server.aof_fsync != AOF_FSYNC_NO) + if (server.aof_last_incr_fsync_offset == server.aof_last_incr_size && + !(sync_in_progress = aofFsyncInProgress())) + { atomicSet(server.fsynced_reploff_pending, server.master_repl_offset); + } return; ``` Additionally, we slightly refactored fsync AOF to make it simpler, as in `584f008d1c`	2025-02-13 10:48:29 +08:00
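
As a rough illustration of the rule described in the "AOF offset info" commit (7f5f588232) listed above, the following C sketch derives a replication offset from the manifest fields of the last INCR AOF. The struct `incr_aof_info`, the functions `aof_repl_offset` and `aof_compare`, and the use of -1 for a missing field are hypothetical names and conventions chosen for this example, not code from Redis. How to compare two files when only one of them carries offset info is not stated in the commit message; falling back to the modification time in that case is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of the last INCR AOF entry in a manifest.
 * A negative startoffset/endoffset means the field is absent. */
typedef struct {
    long long startoffset; /* manifest "startoffset", or -1 if missing */
    long long endoffset;   /* manifest "endoffset", or -1 if missing */
    long long file_size;   /* size of the INCR AOF file in bytes */
    long long mtime_ms;    /* last modification time of the file */
} incr_aof_info;

/* Stores the derived replication offset in *out and returns 1, or
 * returns 0 when the manifest carries no offset info at all. */
int aof_repl_offset(const incr_aof_info *info, long long *out) {
    assert(info != NULL && out != NULL);
    if (info->endoffset >= 0) {          /* 1. graceful close: use endoffset */
        *out = info->endoffset;
        return 1;
    }
    if (info->startoffset >= 0) {        /* 2. crash: startoffset + file size */
        *out = info->startoffset + info->file_size;
        return 1;
    }
    return 0;                            /* 3. old-format AOF: no offset */
}

/* Returns >0 if a looks newer, <0 if b looks newer, 0 if they tie.
 * Assumption: if either side lacks offset info, compare by mtime. */
int aof_compare(const incr_aof_info *a, const incr_aof_info *b) {
    long long oa, ob;
    if (aof_repl_offset(a, &oa) && aof_repl_offset(b, &ob))
        return (oa > ob) - (oa < ob);
    return (a->mtime_ms > b->mtime_ms) - (a->mtime_ms < b->mtime_ms);
}
```

With the manifest example from that commit (`startoffset 224 endoffset 532`), the first branch yields 532 directly. If the `endoffset` were missing and the file were, hypothetically, 308 bytes long, the second branch would arrive at the same value.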

9089 commits