redis/src
Salvatore Sanfilippo 11947d8892
[Vector sets] fast JSON filter (#13959)
This PR replaces cJSON with a home-made parser designed for the kind of
access pattern the FILTER option of VSIM performs on JSON objects. The
main points here are:

* cJSON forces us to parse the whole JSON, create a graph of cJSON
objects, then we need to seek in O(N) to find the right field.
* The cJSON object associated with the value is not of the same format
as the expr.c virtual machine. We needed a conversion function doing
more allocation and work.
* Right now we only support top level fields in the JSON object, so a
full parser is not needed.

With all these things in mind, and after carefully profiling the old
code, I realized that a specialized parser able to parse JSON in a
zero-allocation fashion and only actually parse the value associated to
our key would be much more efficient. Moreover, after this change, the
dependencies of Vector Sets on external code drop to zero, and the
code base is 3000 lines smaller. The new total is 4200 lines of code,
making Vector Sets easily the smallest full-featured implementation of
a vector store available.
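
To illustrate the idea of a skipping, zero-allocation lookup of a top-level field, here is a minimal sketch. It is not the code added by this PR: the names `skip_ws`, `skip_string`, `skip_value` and `json_find_field` are hypothetical, keys are compared as raw bytes (escapes inside keys are not decoded), and the value is returned as a pointer and length into the original buffer rather than converted for the expr.c virtual machine.

```c
#include <stddef.h>
#include <string.h>
#include <ctype.h>

/* Skip whitespace; never moves past 'end'. */
static const char *skip_ws(const char *p, const char *end) {
    while (p < end && isspace((unsigned char)*p)) p++;
    return p;
}

/* 'p' points at an opening quote. Returns the position right after the
 * closing quote, or NULL if the string is not terminated. */
static const char *skip_string(const char *p, const char *end) {
    p++; /* Skip the opening quote. */
    while (p < end) {
        if (*p == '\\') {
            if (end - p < 2) return NULL;
            p += 2; /* Skip the escaped byte without decoding it. */
            continue;
        }
        if (*p == '"') return p + 1;
        p++;
    }
    return NULL;
}

/* Skip one value without decoding it. Containers are skipped by tracking
 * nesting depth only; scalars run up to the next delimiter. */
static const char *skip_value(const char *p, const char *end) {
    p = skip_ws(p, end);
    if (p >= end) return NULL;
    if (*p == '"') return skip_string(p, end);
    if (*p == '{' || *p == '[') {
        int depth = 0;
        while (p < end) {
            if (*p == '"') {
                p = skip_string(p, end);
                if (p == NULL) return NULL;
                continue;
            }
            if (*p == '{' || *p == '[') {
                depth++;
            } else if (*p == '}' || *p == ']') {
                if (--depth == 0) return p + 1;
            }
            p++;
        }
        return NULL; /* Container never closed. */
    }
    /* Number, true, false or null. */
    const char *start = p;
    while (p < end && *p != ',' && *p != '}' && *p != ']' &&
           !isspace((unsigned char)*p)) p++;
    return p > start ? p : NULL;
}

/* Look up a top-level field. On success returns a pointer to the first
 * byte of the value inside 'json' and stores its length in '*vlen'.
 * Returns NULL if the field is missing or the object looks malformed.
 * No memory is allocated. */
static const char *json_find_field(const char *json, size_t len,
                                   const char *field, size_t *vlen) {
    const char *p = json, *end = json + len;
    size_t flen = strlen(field);

    p = skip_ws(p, end);
    if (p >= end || *p != '{') return NULL;
    p++;
    while (1) {
        p = skip_ws(p, end);
        if (p >= end || *p != '"') return NULL; /* '}' or garbage. */
        const char *key = p + 1;
        const char *after = skip_string(p, end);
        if (after == NULL) return NULL;
        size_t klen = (size_t)(after - 1 - key);

        p = skip_ws(after, end);
        if (p >= end || *p != ':') return NULL;
        p = skip_ws(p + 1, end);
        const char *vend = skip_value(p, end);
        if (vend == NULL) return NULL;

        if (klen == flen && memcmp(key, field, flen) == 0) {
            *vlen = (size_t)(vend - p);
            return p;
        }
        p = skip_ws(vend, end);
        if (p >= end || *p != ',') return NULL;
        p++;
    }
}
```

In this sketch, values of fields that do not match are never decoded: they are only skipped, and the scan stops as soon as the requested key is found.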

# Speedup achieved

In a dataset with JSON objects with 30 fields, 1 million elements, the
following query shows a 3.5x speedup:

    vsim vectors:million ele ele943903 FILTER ".field29 > 1000 and .field15 < 50"

Please note that we get a **3.5x speedup** in the VSIM command itself.
This means that the actual JSON parsing speedup is significantly greater
than that. However, in Redis land, under my past kingdom of many years
ago, the rule was that an improvement should produce speedups that are
*user facing*. This PR definitely qualifies.

What is interesting is that even with a JSON containing a single element
the speedup is about 70%, so we are faster even in the worst case.

# Further info

Note that the new skipping parser may happily process JSON objects that
are not perfectly valid, as long as they look valid from the POV of
balancing [] and {} and so forth. This should not be an issue. In any
case, invalid JSON produces random results (the element is skipped
entirely even if it would pass the filter).
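
As an illustration of why a skipping parser accepts some invalid JSON, consider this minimal sketch of a balance-only skipper. The function name `skip_balanced` is hypothetical and this is not the code in this PR: it tracks only nesting depth and string boundaries, so anything between the delimiters passes unchecked.

```c
#include <stddef.h>

/* Returns the number of bytes spanned by the container that starts at
 * s[0], or 0 if it is never closed. Content between delimiters is not
 * validated, and the type of the closing delimiter is not checked. */
static size_t skip_balanced(const char *s, size_t len) {
    int depth = 0;
    int in_string = 0;
    for (size_t i = 0; i < len; i++) {
        char c = s[i];
        if (in_string) {
            if (c == '\\') i++;              /* Skip the escaped byte. */
            else if (c == '"') in_string = 0;
        } else if (c == '"') {
            in_string = 1;
        } else if (c == '{' || c == '[') {
            depth++;
        } else if (c == '}' || c == ']') {
            if (--depth == 0) return i + 1;
            if (depth < 0) return 0;         /* Closed before opening. */
        }
    }
    return 0;
}
```

With this sketch, `[1 2 3]` (missing commas) is skipped exactly like `[1, 2, 3]`, while `{"k": [1, 2}` is rejected because the nesting never returns to zero.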

Please feel free to ask me anything about the new implementation before
merging.
2025-05-05 09:52:42 +03:00
..
commands Revert "Update history for ban-list propagation (#13749)" (#13827) 2025-02-24 17:40:25 +08:00
modules Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
.gitignore
acl.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
adlist.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
adlist.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
ae.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
ae.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
ae_epoll.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
ae_evport.c Fix cluster bus extensions backwards compatibility (#10206) 2022-01-30 19:43:37 +02:00
ae_kqueue.c Fix the timing of read and write events under kqueue (#9416) 2021-09-02 11:07:51 +03:00
ae_select.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
anet.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
anet.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
aof.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
asciilogo.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
atomicvar.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
bio.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
bio.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
bitops.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
blocked.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
call_reply.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
call_reply.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
childinfo.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
cli_commands.c Reimplement cli hints based on command arg docs (#10515) 2023-03-30 19:03:56 +03:00
cli_commands.h Reimplement cli hints based on command arg docs (#10515) 2023-03-30 19:03:56 +03:00
cli_common.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
cli_common.h Adds connection timeout option to redis-cli (#10609) 2024-01-30 13:43:39 +02:00
cluster.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
cluster.h Delete redundant declaration of handleDebugClusterCommand() (#13974) 2025-04-24 10:50:35 +08:00
cluster_legacy.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
cluster_legacy.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
commands.c Reimplement cli hints based on command arg docs (#10515) 2023-03-30 19:03:56 +03:00
commands.def Revert "Update history for ban-list propagation (#13749)" (#13827) 2025-02-24 17:40:25 +08:00
commands.h Replaced comment with excessive warning. 2023-07-16 17:04:15 -05:00
config.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
config.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
connection.c Async IO Threads (#13695) 2024-12-23 14:16:40 +08:00
connection.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
connhelpers.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
crc16.c Change license from BSD-3 to dual RSALv2+SSPLv1 (#13157) 2024-03-20 22:38:24 +00:00
crc16_slottable.h Change crc16 slot table to be fixed size character array instead of pointer to strings (#13112) 2024-03-08 15:50:36 -08:00
crc64.c CRC64 perf improvements (#13638) 2024-11-12 09:21:22 +02:00
crc64.h Add --large-memory flag for REDIS_TEST to enable tests that consume more than 100mb (#9784) 2021-11-16 08:55:10 +02:00
crccombine.c CRC64 perf improvements (#13638) 2024-11-12 09:21:22 +02:00
crccombine.h CRC64 perf improvements (#13638) 2024-11-12 09:21:22 +02:00
crcspeed.c CRC64 perf improvements (#13638) 2024-11-12 09:21:22 +02:00
crcspeed.h CRC64 perf improvements (#13638) 2024-11-12 09:21:22 +02:00
db.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
debug.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
debugmacro.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
defrag.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
dict.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
dict.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
ebuckets.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
ebuckets.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
endianconv.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
endianconv.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
eval.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
eventnotifier.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
eventnotifier.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
evict.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
expire.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
fmacros.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
fmtargs.h Rewrite huge printf calls to smaller ones for readability (#12257) 2023-09-28 09:21:23 +03:00
function_lua.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
functions.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
functions.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
geo.c changed addReplyHumanLongDouble to addReplyDouble in georadiusGeneric and geoposCommand (#13494) 2024-09-03 20:54:20 +08:00
geo.h RDMF (Redis/Disque merge friendlyness) refactoring WIP 1. 2015-07-26 15:17:18 +02:00
geohash.c Change license from BSD-3 to dual RSALv2+SSPLv1 (#13157) 2024-03-20 22:38:24 +00:00
geohash.h Change license from BSD-3 to dual RSALv2+SSPLv1 (#13157) 2024-03-20 22:38:24 +00:00
geohash_helper.c Change license from BSD-3 to dual RSALv2+SSPLv1 (#13157) 2024-03-20 22:38:24 +00:00
geohash_helper.h Change license from BSD-3 to dual RSALv2+SSPLv1 (#13157) 2024-03-20 22:38:24 +00:00
hyperloglog.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
intset.c Change license from BSD-3 to dual RSALv2+SSPLv1 (#13157) 2024-03-20 22:38:24 +00:00
intset.h Change license from BSD-3 to dual RSALv2+SSPLv1 (#13157) 2024-03-20 22:38:24 +00:00
iothread.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
kvstore.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
kvstore.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
latency.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
latency.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
lazyfree.c Add KEYSIZES section to INFO (#13592) 2024-10-29 13:07:26 +02:00
listpack.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
listpack.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
listpack_malloc.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
localtime.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
logreqres.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
lolwut.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
lolwut.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
lolwut5.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
lolwut6.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
lzf.h Change lzf to handle values larger than UINT32_MAX (#9776) 2021-11-16 13:12:25 +02:00
lzf_c.c Change lzf to handle values larger than UINT32_MAX (#9776) 2021-11-16 13:12:25 +02:00
lzf_d.c Change lzf to handle values larger than UINT32_MAX (#9776) 2021-11-16 13:12:25 +02:00
lzfP.h Change lzf to handle values larger than UINT32_MAX (#9776) 2021-11-16 13:12:25 +02:00
Makefile [Vector sets] fast JSON filter (#13959) 2025-05-05 09:52:42 +03:00
memtest.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
mkreleasehdr.sh fix the wrong path in mkreleasehdr.sh (#12993) 2024-01-26 15:01:54 -08:00
module.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
monotonic.c Better standardize around assertions (#12539) 2023-10-02 18:58:44 -07:00
monotonic.h Remove prototypes with empty declarations (#12020) 2023-05-02 17:31:32 -07:00
mstr.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
mstr.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
mt19937-64.c Fix random element selection for large hash tables. (#8133) 2020-12-23 15:52:07 +02:00
mt19937-64.h Fix random element selection for large hash tables. (#8133) 2020-12-23 15:52:07 +02:00
multi.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
networking.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
notify.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
object.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
pqsort.c Change license from BSD-3 to dual RSALv2+SSPLv1 (#13157) 2024-03-20 22:38:24 +00:00
pqsort.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
pubsub.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
quicklist.c Fix typos in multiple Redis source files (#13716) 2025-01-07 15:35:47 +08:00
quicklist.h Determine the large limit of the quicklist node based on fill (#12659) 2024-02-22 10:02:38 +02:00
rand.c Change license from BSD-3 to dual RSALv2+SSPLv1 (#13157) 2024-03-20 22:38:24 +00:00
rand.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
rax.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
rax.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
rax_malloc.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
rdb.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
rdb.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
redis-benchmark.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
redis-check-aof.c Change license from BSD-3 to dual RSALv2+SSPLv1 (#13157) 2024-03-20 22:38:24 +00:00
redis-check-rdb.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
redis-cli.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
redis-trib.rb Redis-trib deprecated: it no longer works and it 2018-07-13 10:51:58 +02:00
redisassert.c Fixed variable parameter formatting issues in serverPanic function (#13504) 2024-09-03 15:51:46 +08:00
redisassert.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
redismodule.h Make RM_DefragRedisModuleDict API support incremental defragmentation for dict leaf (#13840) 2025-03-04 17:19:41 +08:00
release.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
replication.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
resp_parser.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
resp_parser.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
rio.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
rio.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
script.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
script.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
script_lua.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
script_lua.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
sds.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
sds.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
sdsalloc.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
sentinel.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
server.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
server.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
setcpuaffinity.c cpu affinity: DragonFlyBSD support (#7956) 2020-10-25 14:14:05 +02:00
setproctitle.c Change license from BSD-3 to dual RSALv2+SSPLv1 (#13157) 2024-03-20 22:38:24 +00:00
sha1.c Ignore -Wstringop-overread warning for SHA1Transform() on GCC 12 (#11538) 2022-11-24 15:27:16 +02:00
sha1.h Fix some compile warnings and errors when building with gcc-12 or clang (#12035) 2023-04-18 09:53:51 +03:00
sha256.c Add sanitizer support and clean up sanitizer findings (#9601) 2021-11-11 13:51:33 +02:00
sha256.h fix explanation of sha256 (#9220) 2021-07-10 10:04:54 -05:00
siphash.c Change license from BSD-3 to dual RSALv2+SSPLv1 (#13157) 2024-03-20 22:38:24 +00:00
slowlog.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
slowlog.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
socket.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
solarisfixes.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
sort.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
sparkline.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
sparkline.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
stream.h XREADGROUP from PEL should not affect server.dirty (#13251) 2024-05-06 16:55:42 +08:00
strl.c Avoid using unsafe C functions (#10932) 2022-07-18 10:56:26 +03:00
syncio.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
syscheck.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
syscheck.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
t_hash.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
t_list.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
t_set.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
t_stream.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
t_string.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
t_zset.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
testhelp.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
threads_mngr.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
threads_mngr.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
timeout.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
tls.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
tracking.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
unix.c Async IO Threads (#13695) 2024-12-23 14:16:40 +08:00
util.c Fix string2d usage in case of hexadecimal strings parsing and overflow (#13845) 2025-03-19 20:08:45 +08:00
util.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
valgrind.sup Sanitize dump payload: fuzz tester and fixes for segfaults and leaks it exposed 2020-12-06 14:54:34 +02:00
version.h Add Module API for version and compatibility checks (#7865) 2020-10-11 17:21:58 +03:00
ziplist.c Fix incorrect parameter type reports (#13744) 2025-01-14 15:51:05 +08:00
ziplist.h Change license from BSD-3 to dual RSALv2+SSPLv1 (#13157) 2024-03-20 22:38:24 +00:00
zipmap.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
zipmap.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
zmalloc.c Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00
zmalloc.h Adding AGPLv3 as a license option to Redis! (#13997) 2025-05-01 14:04:22 +01:00