Commit Graph

421 Commits

Author SHA1 Message Date
Pieter Cailliau 0b34396924
Change license from BSD-3 to dual RSALv2+SSPLv1 (#13157)
[Read more about the license change
here](https://redis.com/blog/redis-adopts-dual-source-available-licensing/)
Live long and prosper 🖖
2024-03-20 22:38:24 +00:00
YaacovHazan e9c795e777
Fix loading rdb opcode RDB_OPCODE_RESIZEDB (#13050)
Following the changes introduced by 8cd62f82c, dbExpandExpires used
db_size instead of expires_size.

Co-authored-by: YaacovHazan <yaacov.hazan@redislabs.com>
2024-02-12 21:55:37 +02:00
YaacovHazan 7ca0b84af6
Fix loading rdb opcode RDB_OPCODE_SLOT_INFO (#13049)
Following the changes introduced by 8cd62f82c, kvstoreDictExpand for
the expires kvstore used slot_size instead of expires_slot_size.

Co-authored-by: YaacovHazan <yaacov.hazan@redislabs.com>
2024-02-12 21:46:06 +02:00
Binbin 813327b231
Fix SORT STORE quicklist with the right options (#13042)
We forgot to call quicklistSetOptions after createQuicklistObject, so
in the SORT STORE scenario we would create a quicklist with the default
fill and compress options.

This PR adds fill and depth parameters to createQuicklistObject, so the
options are applied when the quicklist is created rather than having to
be set afterwards.

This closes #12871.

release notes:
> Fix lists created by SORT STORE to respect list compression and
packing configs.
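
A minimal sketch of the resulting call site, assuming a signature along the lines the PR describes (the exact parameters in the Redis source may differ):

```c
/* Sketch (assumed signature): createQuicklistObject() now receives the
 * fill and compress-depth options up front, so SORT ... STORE creates
 * the destination list with the configured options rather than the
 * defaults. */
robj *dstobj = createQuicklistObject(server.list_max_listpack_size,
                                     server.list_compress_depth);
```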
2024-02-08 14:36:11 +02:00
guybe7 8cd62f82ca
Refactor the per-slot dict-array db.c into a new kvstore data structure (#12822)
# Description
Gather most of the scattered `redisDb`-related code from the per-slot
dict PR (#11695) and turn it into a new data structure, `kvstore`, i.e.
a class that represents an array of dictionaries.

# Motivation
The main motivation is code cleanliness; the idea of using an array of
dictionaries is very well-suited to becoming a self-contained data
structure.
This allowed cleaning up some ugly code, among others: loops that run
twice, on the main dict and the expires dict, and duplicate code for
allocating and releasing this data structure.

# Notes
1. This PR reverts the part of https://github.com/redis/redis/pull/12848
where the `rehashing` list is global (handling rehashing `dict`s is
under the responsibility of `kvstore`, and should not be managed by the
server)
2. This PR also replaces the type of `server.pubsubshard_channels` from
`dict**` to `kvstore` (original PR:
https://github.com/redis/redis/pull/12804). After that was done,
server.pubsub_channels was also chosen to be a `kvstore` (with only one
`dict`, which seems odd) just to make the code cleaner by making it the
same type as `server.pubsubshard_channels`, see
`pubsubtype.serverPubSubChannels`
3. the keys and expires kvstores are currently configured to allocate
the individual dicts only when the first key is added (unlike before,
when they were allocated in advance), but they won't release them when
the last key is deleted.

Worth mentioning that due to this change, the reply of DEBUG
HTSTATS changes in case no keys were ever added to the db.

before:
```
127.0.0.1:6379> DEBUG htstats 9
[Dictionary HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
[Expires HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
```

after:
```
127.0.0.1:6379> DEBUG htstats 9
[Dictionary HT]
[Expires HT]
```
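
Conceptually, the new structure is an array of dicts plus shared bookkeeping; a simplified sketch (illustrative only, not the actual definition in the Redis source):

```c
/* Illustrative sketch of the kvstore idea: an array of dictionaries
 * plus the bookkeeping that was previously scattered across db.c. */
typedef struct kvstore {
    dict **dicts;                  /* one dict per slot (16384 in cluster mode) */
    int num_dicts;
    unsigned long long key_count;  /* sum of all dict sizes, for O(1) DBSIZE */
    list *rehashing;               /* dicts currently rehashing, owned by kvstore */
} kvstore;
```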
2024-02-05 17:21:35 +02:00
Yanqi Lv 62153b3b2f
Refine the purpose of rdb saving with accurate flags (#12925)
In Redis, an RDB file is mainly produced in three scenarios:

- backup, such as `bgsave` and `save` command
- full sync in replication
- aof rewrite if `aof-use-rdb-preamble` is yes

We also have some RDB flags to identify the purpose of rdb saving.
```C
/* flags on the purpose of rdb save or load */
#define RDBFLAGS_NONE 0                 /* No special RDB loading. */
#define RDBFLAGS_AOF_PREAMBLE (1<<0)    /* Load/save the RDB as AOF preamble. */
#define RDBFLAGS_REPLICATION (1<<1)     /* Load/save for SYNC. */
```

But currently, these flags don't exactly match the actual purposes of
rdb saving. One example is in `rdbSaveRioWithEOFMark`, which calls
`startSaving` with `RDBFLAGS_REPLICATION` but `rdbSaveRio` with
`RDBFLAGS_NONE`.
```C
int rdbSaveRioWithEOFMark(int req, rio *rdb, int *error, rdbSaveInfo *rsi) {
    char eofmark[RDB_EOF_MARK_SIZE];

    startSaving(RDBFLAGS_REPLICATION);
    getRandomHexChars(eofmark,RDB_EOF_MARK_SIZE);
    if (error) *error = 0;
    if (rioWrite(rdb,"$EOF:",5) == 0) goto werr;
    if (rioWrite(rdb,eofmark,RDB_EOF_MARK_SIZE) == 0) goto werr;
    if (rioWrite(rdb,"\r\n",2) == 0) goto werr;
    if (rdbSaveRio(req,rdb,error,RDBFLAGS_NONE,rsi) == C_ERR) goto werr;
    if (rioWrite(rdb,eofmark,RDB_EOF_MARK_SIZE) == 0) goto werr;
    stopSaving(1);
    return C_OK;

werr: /* Write error. */
    /* Set 'error' only if not already set by rdbSaveRio() call. */
    if (error && *error == 0) *error = errno;
    stopSaving(0);
    return C_ERR;
}
```

In this PR, I refine the purpose of rdb saving with accurate flags.
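
A sketch of the kind of adjustment this implies for the snippet above (illustrative; the PR may touch other call sites as well):

```c
/* Keep the flags consistent, so the save path agrees with startSaving()
 * that it is producing a replication payload (illustrative). */
startSaving(RDBFLAGS_REPLICATION);
/* ... */
if (rdbSaveRio(req,rdb,error,RDBFLAGS_REPLICATION,rsi) == C_ERR) goto werr;
```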
2024-02-01 13:41:02 +02:00
Binbin 14e4a9835a
Fix minor fd leak in rdbSaveToSlavesSockets (#12919)
We should close server.rdb_child_exit_pipe when redisFork fails,
otherwise the pipe fd will be leaked.

Just a cleanup.
2024-01-08 17:36:34 +02:00
Guillaume Koenig 967fb3c6e8
Extend rax usage by allowing any long long value (#12837)
The raxFind implementation uses a special pointer value (the address of
a static string) as the "not found" value. It works as long as actual
pointers are used. However, we've seen usages where long long,
non-pointer values have been used. This creates a risk that one of the
long long values is precisely the address of the special "not found"
value. This commit changes raxFind to return 1 or 0 to indicate
membership, and to take a new void **value out-parameter to optionally
return the associated value.

By extension, this also allows the RedisModule_DictSet/Replace operations
to safely insert integers instead of just pointers.
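
A usage sketch under the new contract (the raxFind signature is as described above; the surrounding names are illustrative):

```c
/* raxFind() now reports membership via its return value and hands the
 * associated value back through an out-parameter, so any payload, even
 * one equal to the old "not found" sentinel's address, is safe. */
void *value;
if (raxFind(rt, (unsigned char *)"mykey", 5, &value)) {
    long long llval = (long long)value;  /* non-pointer payloads are now safe */
    printf("found: %lld\n", llval);
} else {
    printf("not found\n");
}
```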
2023-12-14 14:50:18 -08:00
Binbin a3ae2ed37b
Remove dead code around should_expand_db (#12767)
When dbExpand is called from rdb.c with try_expand set to 0, it will
either panic on OOM or be non-fatal (it should not fail RDB loading).

At the same time, the log text has been slightly adjusted to make it
more uniform.
2023-12-10 10:40:15 +02:00
bentotten 826b39e016
Align server.lastsave and server.rdb_save_time_last by removing multiple calls to time(NULL) (#12823)
This makes sure the various times (server.lastsave and server.rdb_save_time_last) are aligned by using the result of the same time call.
2023-12-07 17:03:51 -08:00
zhaozhao.zz b730404c2f
Fix multiple DBs not doing dbExpand when loading RDB (#12840)
Currently, during RDB loading, once a `dbExpand` is performed, the
`should_expand_db` flag is set to 0. When there are multiple DBs, this
leaves the remaining DBs unable to do `dbExpand`.

To fix this issue, we need to set `should_expand_db` back to 1 whenever
we encounter `RDB_OPCODE_RESIZEDB`. This ensures that each DB can
perform `dbExpand` correctly.

Additionally, the initial value of `should_expand_db` should also be set
to 0 to prevent invalid `dbExpand` in older versions of RDB where
`RDB_OPCODE_RESIZEDB` is not present.

problem introduced in #11695
2023-12-06 10:59:56 +02:00
Moshe Kaplan c9aa586b6b
rdb.c: Avoid potential file handle leak (Coverity 404720) (#12795)
`open()` can return any non-negative integer on success, including zero.
This change modifies the check from open()'s return value to also
include checking for a return value of zero (e.g., if stdin were closed
and then `open()` was called).

Fixes Coverity 404720

Can't happen in Redis. just a cleanup.
2023-11-23 10:14:17 +02:00
Binbin fe36306340
Fix DB iterator not resetting pauserehash causing dict being unable to rehash (#12757)
When using the DB iterator, it uses dictInitSafeIterator to init an old safe
dict iterator. When dbIteratorNext is used, it jumps to the next slot db
dict once we are done with a dict. During this process, there are no calls to
dictResumeRehashing, which causes the dict's pauserehash to always be > 0.

In the end, dictRehashMilliseconds returns immediately, leaving the
slot dict in a state where its rehash can never be completed.

In the "expire scan should skip dictionaries with lot's of empty buckets" test,
adding a `keys *` can reproduce the problem stably. `keys *` calls dbIteratorNext
and triggers a traversal of all slot dicts.

Added dbReleaseIterator and dbIteratorInitNextSafeIterator methods to call dictResetIterator.
Issue was introduced in #11695.
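
A sketch of the shape of the fix (the function names appear in the commit message above; the struct layout and body are assumptions):

```c
/* Resetting the embedded safe iterator is what triggers
 * dictResumeRehashing(), letting pauserehash drop back to 0. */
void dbReleaseIterator(dbIterator *dbit) {
    dictResetIterator(&dbit->di);  /* resume rehashing on the last visited dict */
    zfree(dbit);
}
```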
2023-11-14 14:28:46 +02:00
Vitaly 0270abda82
Replace cluster metadata with slot specific dictionaries (#11695)
This is an implementation of https://github.com/redis/redis/issues/10589 that eliminates 16 bytes per entry in cluster mode, which are currently used to create a linked list between entries in the same slot.  The main idea is splitting the main dictionary into 16k smaller dictionaries (one per slot), so we can perform all slot-specific operations, such as iteration, without any additional info in the `dictEntry`. For Redis cluster, the expectation is that there will be a larger number of keys, so the fixed overhead of 16k dictionaries will be negligible. The expire dictionary is also split up so that each slot is logically decoupled, so that in subsequent revisions we will be able to atomically flush a slot of data.

## Important changes
* Incremental rehashing - one big change here is that it's not one, but rather up to 16k dictionaries that can be rehashing at the same time. In order to keep track of them, we introduce a separate queue for dictionaries that are rehashing. Also, instead of rehashing a single dictionary, the cron job will now try to rehash as many as it can in 1ms.
* getRandomKey - now needs to not only select a random key from a random bucket, but also to select a random dictionary. Fairness is a major concern here, as it's possible that keys can be unevenly distributed across the slots. In order to address this, we introduced a binary index tree. With that data structure we are able to efficiently find a random slot using binary search in O(log^2(slot count)) time.
* Iteration efficiency - when iterating a dictionary with a lot of empty slots, we want to skip them efficiently. We can do this using the same binary index that is used for random key selection; this index allows us to find the slot of a specific key index. For example, if there are 10 keys in slot 0, we can quickly find the slot that contains the 11th key using binary search on top of the binary index tree.
* scan API - in order to perform a scan across the entire DB, the cursor now needs to save not only the position within the dictionary but also the slot id. In this change we append the slot id to the LSBs of the cursor so it can be passed around between client and server (see the cursor sketch after this list). This has an interesting side effect: you can now start scanning a specific slot by simply providing the slot id as the cursor value. The plan is to not document this as defined behavior, however. It's also worth noting that the SCAN API is now technically incompatible with previous versions, although practically we don't believe it's an issue.
* Checksum calculation optimizations - During command execution, we know that all of the keys are from the same slot (outside of a few notable exceptions such as cross-slot scripts and modules). We don't want to compute the checksum multiple times, hence we rely on the slot id cached in the client during command execution. All operations that access random keys should either pass in the known slot or recompute the slot.
* Slot info in RDB - in order to resize individual dictionaries correctly while loading an RDB, it's not enough to know the total number of keys (of course we could approximate the number of keys per slot, but it won't be precise). To address this issue, we've added additional metadata into the RDB that contains the number of keys in each slot, which can be used as a hint during loading.
* DB size - besides the `DBSIZE` API, we need to know the size of the DB in many places; in order to avoid scanning all dictionaries and summing up their sizes in a loop, we've introduced a new field into `redisDb` that keeps track of the `key_count`. This way we can keep the DBSIZE operation O(1). The same is kept for expires, for O(1) computation as well.
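
A sketch of the cursor packing mentioned in the scan API bullet (illustrative constants; the real code derives the bit count from the slot count):

```c
/* The slot id lives in the low bits of the SCAN cursor and the per-dict
 * scan position in the high bits, so the cursor round-trips through the
 * client carrying both (illustrative helpers). */
#define SLOT_BITS 14  /* 2^14 = 16384 slots */

static inline unsigned long long buildCursor(unsigned long long pos, int slot) {
    return (pos << SLOT_BITS) | (unsigned long long)slot;
}

static inline int cursorSlot(unsigned long long cursor) {
    return (int)(cursor & ((1ULL << SLOT_BITS) - 1));
}
```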

## Performance
This change improves SET performance in cluster mode by ~5%; most of the gains come from not having to maintain linked lists for the keys in a slot. Non-cluster mode has the same performance. For workloads that rely on evictions, the performance is similar because of the extra overhead of finding keys to evict.

RDB loading performance is slightly reduced, as the slot of each key needs to be computed during the load.

## Interface changes
* Removed `overhead.hashtable.slot-to-keys` from `MEMORY STATS`
* Scan API will now require 64 bits to store the cursor, even on 32 bit systems, as the slot information will be stored.
* New RDB version to support the new op code for SLOT information. 

---------

Co-authored-by: Vitaly Arbuzov <arvit@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Roshan Khatri <rvkhatri@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2023-10-14 23:58:26 -07:00
Wen Hui 965dc90b72
change return type to be consistent (#12479)
Currently the rdbSaveMillisecondTime and rdbSaveDoubleValue APIs have a
return type of int, but they return the value directly from the rdbWriteRaw
function, which has a return type of ssize_t. As this may overflow an int,
the return type was changed to ssize_t.
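
A sketch of one affected helper after the change, based on the pattern described above (details in the actual source may differ):

```c
/* Propagate rdbWriteRaw()'s ssize_t result instead of truncating it to int. */
ssize_t rdbSaveMillisecondTime(rio *rdb, long long t) {
    int64_t t64 = (int64_t)t;
    memrev64ifbe(&t64); /* Store in little endian. */
    return rdbWriteRaw(rdb, &t64, 8);
}
```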
2023-08-16 10:38:59 +03:00
Meir Shpilraien (Spielrein) 2ee1bbb53b
Ensure that the function load timeout is disabled during loading from RDB/AOF and on replicas. (#12451)
When loading a function from either RDB/AOF or a replica, it is essential not to
fail on timeout errors. The loading time may vary due to various factors, such as
hardware specifications or the system's workload during the loading process.
Once a function has been successfully loaded, it should be allowed to load from
persistence or on replicas without encountering a timeout failure.

To maintain a clear separation between the engine and Redis internals, the
implementation refrains from directly checking the state of Redis within the
engine itself. Instead, the engine receives the desired timeout as part of the
library creation and duly respects this timeout value. If Redis wishes to disable
any timeout, it can simply send a value of 0.
2023-08-02 11:43:31 +03:00
judeng 93708c7f6a
use embedded string object and more efficient ll2string for long long value convert to string (#12250)
A value of type long long is always less than 21 bytes when converted to a
string, so it always meets the conditions for using an embedded string object,
which always yields memory reduction and a performance gain (fewer calls
to the heap allocator).
Additionally, for the conversion of the long long type to sds, we also use a faster
algorithm (the one in util.c instead of the one that used to be in sds.c).

For the DECR command on 32-bit Redis, we get about a 5.7% performance
improvement. There will also be some performance gains for commands
that heavily use sdscatfmt to convert numbers, such as INFO.
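
A sketch of the reasoning (the helper names exist in the Redis source; treat the exact call sequence as illustrative):

```c
/* A long long needs at most 20 digits plus a sign and a NUL terminator,
 * so its string form always fits the 44-byte embstr limit and can use
 * the single-allocation embedded string object. */
char buf[21];                                  /* LONG_STR_SIZE in util.h */
int len = ll2string(buf, sizeof(buf), value);  /* the faster util.c conversion */
robj *o = createEmbeddedStringObject(buf, len);
```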

Co-authored-by: Oran Agra <oran@redislabs.com>
2023-06-20 15:14:44 +03:00
Oran Agra c2f1815bcb
Avoid trying to trim string loaded from RDB file. (#12241)
This is a followup fix for #11817
2023-05-30 10:43:25 +03:00
Joe Hu 644d94558a
Fix RDB check regression caused by PR 12022 (#12051)
The nightly tests showed that the recent PR #12022 caused random failures
in aof.tcl on checking RDB preamble inside an AOF file.

Root cause:
When checking the RDB preamble in an AOF file, what's passed into redis_check_rdb is
aof_filename, not aof_filepath. The newly introduced isFifo function does not check the
return status of the stat call and hence uses the uninitialized stat_p object.

Fix:
1. Fix isFifo by checking stat call's return code.
2. Pass aof_filepath instead of aof_filename to redis_check_rdb.
3. Move the FIFO check to rdb.c, since the limitation is the re-opening of the file, and not
  anything specific to redis-check-rdb.
2023-04-17 21:05:36 +03:00
Ozan Tezcan e55568edb5
Add RM_RdbLoad and RM_RdbSave module API functions (#11852)
Add `RM_RdbLoad()` and `RM_RdbSave()` to load/save RDB files from the module API. 

In our use case, we have our clustering implementation as a module. As part of this
implementation, the module needs to trigger RDB save operation at specific points.
Also, this module delivers RDB files to other nodes (not using Redis' replication).
When a node receives an RDB file, it should be able to load the RDB. Currently,
there is no module API to save/load RDB files. 


This PR adds four new APIs:
```c
RedisModuleRdbStream *RM_RdbStreamCreateFromFile(const char *filename);
void RM_RdbStreamFree(RedisModuleRdbStream *stream);

int RM_RdbLoad(RedisModuleCtx *ctx, RedisModuleRdbStream *stream, int flags);
int RM_RdbSave(RedisModuleCtx *ctx, RedisModuleRdbStream *stream, int flags);
```

The first step is to create a `RedisModuleRdbStream` object. This PR provides a function to
create a RedisModuleRdbStream from a filename (so you can load/save an RDB by filename).
In the future, this API can be extended if needed:
e.g., `RM_RdbStreamCreateFromFd()` or `RM_RdbStreamCreateFromSocket()` to save/load an
RDB from an `fd` or a `socket`.


Usage:
```c
/* Save RDB */
RedisModuleRdbStream *stream = RedisModule_RdbStreamCreateFromFile("example.rdb");
RedisModule_RdbSave(ctx, stream, 0);
RedisModule_RdbStreamFree(stream);

/* Load RDB */
stream = RedisModule_RdbStreamCreateFromFile("example.rdb");
RedisModule_RdbLoad(ctx, stream, 0);
RedisModule_RdbStreamFree(stream);
```
2023-04-09 12:07:32 +03:00
Binbin 521e54f551
Demoting some of the non-warning messages to notice (#10715)
We have cases where we print information (might be important but by
no means an error indicator) with the LL_WARNING level.
Demoting these to LL_NOTICE:
- oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
- User requested shutdown...

This is also true for cases where we encounter a rare but normal situation.
Demoting these to LL_NOTICE. Examples:
- AOF was enabled but there is already another background operation. An AOF background was scheduled to start when possible.
- Connection with master lost.


Based on yoav-steinberg's https://github.com/redis/redis/pull/10650#issuecomment-1112280554
and yossigo's https://github.com/redis/redis/pull/10650#pullrequestreview-967677676
2023-02-19 16:33:19 +02:00
Tian 7dae142a2e
Reclaim page cache of RDB file (#11248)
# Background
The RDB file is usually generated and used once and seldom used again, but its content resides in the page cache until the OS evicts it. A potential problem is that once free memory is exhausted, the OS has to reclaim some memory from the page cache or swap anonymous pages out, which may result in jitter for the Redis service.

Consider a concrete scenario: a high-capacity machine hosts many redis instances, and we're upgrading them together. The page cache on the host machine grows as RDBs are generated. Once free memory drops below the low watermark (which is more likely to happen on older Linux kernels like 3.10; before [watermark_scale_factor](https://lore.kernel.org/lkml/1455813719-2395-1-git-send-email-hannes@cmpxchg.org/) was introduced, the `low watermark` is linear in the `min watermark`, and there isn't much buffer space for `kswapd` to wake up and reclaim memory), a `direct reclaim` happens, which means the process stalls waiting for memory allocation.

# What the PR does
The PR introduces a capability to reclaim the cache when the RDB is operated on. Generally there are two cases: reading and writing the RDB. For reads it's a little messy to do the reclaim incrementally, so the reclaim is done in one go, in the background, after the load is finished, to avoid blocking the worker thread. For writes, incremental reclaim amortizes the work, so there's no need to put it into the background, and the peak cache watermark can be reduced this way.

Two cases are addressed specially, replication and restart; for both of these the cache is leveraged to speed up processing, so the reclaim is postponed to the right time. To do this, a flag is added to `rdbSave` and `rdbLoad` to control whether the cache needs to be kept, with the default value false.

# Something deserve noting
1. Though `posix_fadvise` is part of the POSIX standard, only a few platforms support it, e.g. Linux and FreeBSD 10.0.
2. On Linux, `posix_fadvise` only takes effect on written-back pages, so a `sync` (or `fsync`, `fdatasync`) is needed to flush the dirty pages before `posix_fadvise` if we reclaim the write cache.
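
A minimal sketch of the write-side reclaim described in point 2 (illustrative helper; error handling elided):

```c
#include <fcntl.h>
#include <unistd.h>

/* Flush dirty pages first, then ask the kernel to drop them from the
 * page cache; posix_fadvise() only affects pages already written back. */
static void reclaimFileCache(int fd, off_t offset, off_t len) {
    if (fdatasync(fd) == 0)
        (void)posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
}
```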

# About test
A unit test is added to verify the effect of `posix_fadvise`.
In the integration tests, the overall cache increase is checked, as well as the cache backed by the RDB, via a specific TCL test executed in an isolated GitHub Actions job.
2023-02-12 09:23:29 +02:00
Binbin 20854cb610
Fix zuiFind crash / RM_ScanKey hang on SET object listpack encoding (#11581)
In #11290, we added listpack encoding for the SET object.
But we forgot to support it in zuiFind, which causes ZINTER, ZINTERSTORE,
ZINTERCARD, ZDIFF, and ZDIFFSTORE to crash.
We also forgot to support it in RM_ScanKey, which causes it to hang.

This PR adds SET listpack support in zuiFind and in RM_ScanKey,
and adds tests for the related commands to cover this case.

Other changes:
- There is no reason for zuiFind to go into the internals of the SET.
  It can simply use setTypeIsMember and not care about encoding.
- Remove the `#include "intset.h"` from server.h reduce the chance of
  accidental intset API use.
- Move setTypeAddAux, setTypeRemoveAux and setTypeIsMemberAux
  interfaces to the header.
- In scanGenericCommand, use setTypeInitIterator and setTypeNext
  to handle OBJ_SET scan.
- In RM_ScanKey, improve hash scan mode to use lpGetValue like zset,
  so they can share code and get better performance.

The zuiFind part fixes #11578

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2022-12-09 17:08:01 +02:00
Viktor Söderqvist 8a315fc285
When converting a set to dict, presize for one more element to be added (#11559)
In most cases when a listpack or intset is converted to a dict, the conversion
is triggered when adding an element. The extra element is added after conversion
to dict (in all cases except when the conversion is triggered by
set-max-intset-entries being reached).

If set-max-listpack-entries is set to a power of two, let's say 128, when
adding the 129th element, the 128 element listpack is first converted to a dict
with a hashtable presized for 128 elements. After converting to dict, the 129th
element is added to the dict which immediately triggers incremental rehashing
to size 256.

This commit instead presizes the dict to one more element, with the assumption
that conversion to dict is followed by adding another element, so the dict
doesn't immediately need rehashing.
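
A sketch of the sizing logic (the dict calls are the real API; the surrounding context is illustrative):

```c
/* When conversion is triggered by an insert, size the new dict for the
 * current elements plus the one about to be added, so the very next
 * dictAdd doesn't immediately kick off incremental rehashing. */
dict *d = dictCreate(&setDictType);
dictExpand(d, lpLength(lp) + 1);  /* +1 for the element that triggered conversion */
```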

Co-authored-by: sundb <sundbcn@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-12-06 11:25:51 +02:00
guybe7 72e90695ec
Stream consumers: Re-purpose seen-time, add active-time (#11099)
1. "Fixed" the current code so that seen-time/idle actually refers to interaction
  attempts (as documented; breaking change)
2. Added active-time/inactive to refer to successful interaction (what
  seen-time/idle used to be)

At first, I tried to avoid changing the behavior of seen-time/idle but then realized
that, in this case, the odds are that people read the docs and implemented their
code based on the docs (which didn't match the behavior).
For the most part, that would work fine, except that issue #9996 was found.

I was working under the assumption that people relied on the docs, and for
the most part, it could have worked well enough. so instead of fixing the docs,
as I would usually do, I fixed the code to match the docs in this particular case.

Note that, in case the consumer has never read any entries, the values
for both "active-time" (XINFO FULL) and "inactive" (XINFO CONSUMERS) will
be -1, meaning here that the consumer was never active.

Note that seen/active time is only affected by XREADGROUP / X[AUTO]CLAIM, not
by XPENDING, XINFO, and other "read-only" stream CG commands (always has been,
even before this PR)

Other changes:
* Another behavioral change (arguably a bugfix) is that XREADGROUP and X[AUTO]CLAIM
  create the consumer regardless of whether it was able to perform some reading/claiming
* RDB format change to save the `active_time`, and set it to the same value as `seen_time` in old rdb files.
2022-11-30 14:21:31 +02:00
Binbin 51887e61b8
sanitize dump payload: fix crash with empty set with listpack encoding (#11519)
The following example will create an empty set (listpack encoding):
```
> RESTORE key 0
"\x14\x25\x25\x00\x00\x00\x00\x00\x02\x01\x82\x5F\x37\x03\x06\x01\x82\x5F\x35\x03\x82\x5F\x33\x03\x00\x01\x82\x5F\x31\x03\x82\x5F\x39\x03\x04\xA9\x08\x01\xFF\x0B\x00\xA3\x26\x49\xB4\x86\xB0\x0F\x41"
OK
> SCARD key
(integer) 0
> SRANDMEMBER key
Error: Server closed the connection
```

In the spirit of #9297, skip empty set when loading RDB_TYPE_SET_LISTPACK.
Introduced in #11290
2022-11-20 12:12:15 +02:00
sundb 2168ccc661
Add listpack encoding for list (#11303)
Improve memory efficiency of list keys

## Description of the feature
The new listpack encoding uses the old `list-max-listpack-size` config
to perform the conversion. We can think of it as a node inside a
quicklist, but without the 80 bytes of overhead (internal fragmentation
included) of the quicklist and quicklistNode structs.
For example, a list key with 5 items of 10 chars each now takes 128 bytes
instead of the 208 it used to take.

## Conversion rules
* Convert listpack to quicklist
  When the listpack length or size reaches the `list-max-listpack-size` limit,
  it will be converted to a quicklist.
* Convert quicklist to listpack
  When a quicklist has only one node, and its length or size is reduced to half
  of the `list-max-listpack-size` limit, it will be converted to a listpack.
  This is done to avoid frequent conversions when we add or remove at the bounding size or length.
    
## Interface changes
1. add a list entry param to listTypeSetIteratorDirection
    When the list encoding is listpack, `listTypeIterator->lpi` points to the next entry of the current entry,
    so when changing the direction, we need to use the current node (listTypeEntry->p) to
    update `listTypeIterator->lpi` to the next node in the reverse direction.

## Benchmark
### Listpack VS Quicklist with one node
* LPUSH - roughly 0.3% improvement
* LRANGE - roughly 13% improvement

### Both are quicklist
* LRANGE - roughly 3% improvement
* LRANGE without pipeline - roughly 3% improvement

As we can see from the benchmark results:
1. When the list is quicklist-encoded, LRANGE improves performance by <5%.
2. When the list is listpack-encoded, LRANGE improves performance by ~13%;
   the main enhancement is brought by `addListListpackRangeReply()`.

## Memory usage
1M lists (key:0~key:1000000) with 5 items of 10 chars ("hellohello") each
show memory usage down by 35.49%, from 214MB to 138MB.

## Note
1. Add a conversion callback to support doing some work before conversion.
    Since the quicklist iterator decompresses the current node when it is released, we can
    no longer decompress the quicklist after we convert the list.
2022-11-16 20:29:46 +02:00
Viktor Söderqvist 4e472a1a7f
Listpack encoding for sets (#11290)
Small sets with not only integer elements are listpack encoded, by default
up to 128 elements, max 64 bytes per element, new config `set-max-listpack-entries`
and `set-max-listpack-value`. This saves memory for small sets compared to using a hashtable.

Sets with only integers, even very small sets, are still intset encoded (up to 1G
limit, etc.). Larger sets are hashtable encoded.

This PR increments the RDB version, and has an effect on OBJECT ENCODING

Possible conversions when elements are added:

    intset -> listpack
    listpack -> hashtable
    intset -> hashtable

Note: No conversion happens when elements are deleted. If all elements are
deleted and then added again, the set is deleted and recreated, thus implicitly
converted to a smaller encoding.
2022-11-09 19:50:07 +02:00
Oran Agra ccaef5c923
diskless master, avoid bgsave child hung when fork parent crashes (#11463)
During a diskless sync, if the master main process crashes, the child would
have hung in `write`. This fix closes the read fd on the child side, so that if the
parent crashes, the child will get a write error and exit.

This change also fixes disk-based replication, BGSAVE and AOFRW.
In that case the child wouldn't have hung; it would have just kept
running until done, which may be pointless.

There is a certain degree of risk here: in case there's a BGSAVE child that could
maybe succeed and the parent dies for some reason, the old code would have let
the child keep running, maybe succeed, and avoid data loss.
On the other hand, if the parent is restarted, it would have loaded an old rdb file
(or none), and then the child could reach the end and rename the rdb file (data
conflicting with what the parent has), or also have a race with another BGSAVE
child that the new parent started.

Note that I removed a comment saying a write error will be ignored in the child
and handled by the parent (this comment was very old and I don't think it's relevant).
2022-11-09 10:02:18 +02:00
Meir Shpilraien (Spielrein) b43f254813
Avoid saving module aux on RDB if no aux data was saved by the module. (#11374)
### Background

The issue is that when saving an RDB with module AUX data, the module AUX metadata
(moduleid, when, ...) is saved to the RDB even though the module did not save any actual data.
This prevents loading the RDB in the absence of the module (although there is no actual data in
the RDB that requires the module to be loaded).

### Solution

The solution suggested in this PR is that module AUX data will be saved to the RDB only if the
module actually saved something during the `aux_save` function.

To support backward compatibility, we introduce an `aux_save2` callback that acts the same as
`aux_save`, with the tiny change of avoiding saving the aux field if no data was actually saved by
the module. Modules can use the new API to make sure that if they have no data to save,
then it will be possible to load the created RDB even without the module.

### Concerns

A module may register for the aux load and save hooks just in order to be notified when
saving or loading starts or completes (there are better ways to do that, but it's still possible
that someone used it).

However, if a module didn't save a single field in the save callback, it means it's not allowed
to read in the read callback, since it has no way to distinguish between empty and non-empty
payloads. Furthermore, it means that if the module did that, it must never change it, since it'll
break compatibility with its old RDB files, so this is really not a valid use case.

Since some modules (ones that currently save one field indicating an empty payload) need
to know whether saving an empty payload is valid, and whether Redis is going to ignore an
empty payload or store it, we opted to add a new API (rather than change the behavior of an
existing API and expect modules to check the Redis version).

### Technical Details

To avoid saving AUX data to the RDB, we change the code to first save the AUX metadata
(moduleid, when, ...) into a temporary buffer. The buffer is then flushed to the rio the first
time the module makes a write operation inside the `aux_save` function. If the module saves
nothing (and `aux_save2` was used), the entire temporary buffer is simply dropped and no
data about this AUX field is saved to the RDB. This makes it possible to load the RDB even in
the absence of the module.

Test was added to verify the fix.
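
A sketch of how a module opts in (the `aux_save2` field is what this PR adds; the `MyAux*` callbacks are hypothetical):

```c
/* With aux_save2, the AUX metadata (moduleid, when, ...) is buffered and
 * only flushed to the rio if the callback actually writes data, so an
 * "empty" save leaves the RDB loadable without the module. */
RedisModuleTypeMethods tm = {
    .version = REDISMODULE_TYPE_METHOD_VERSION,
    .aux_load = MyAuxLoad,             /* hypothetical callback */
    .aux_save2 = MyAuxSave,            /* may write nothing at all */
    .aux_save_triggers = REDISMODULE_AUX_BEFORE_RDB,
};
```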
2022-10-18 19:45:46 +03:00
filipe oliveira 29380ff77d
optimizing d2string() and addReplyDouble() with grisu2: double to string conversion based on Florian Loitsch's Grisu-algorithm (#10587)
All commands / use cases that heavily rely on double-to-string conversion
(i.e., taking a double-precision floating-point number like 1.5 and returning a string like "1.5")
could benefit from a performance boost by swapping snprintf(buf,len,"%.17g",value) for the
equivalent [fpconv_dtoa](https://github.com/night-shift/fpconv) or any other algorithm that ensures
100% coverage of the conversion.

This is a well-studied topic, and projects like MongoDB, RedPanda, and PyTorch leverage libraries
(fmtlib) that use an optimized double-to-string conversion underneath.


The positive impact can be substantial. This PR uses the grisu2 approach ( grisu explained on
https://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf section 5 ). 

test suite changes:
Despite being compatible, in some cases it produces a different result from printf, and some tests
had to be adjusted.
One case is that `%.17g` (which means %e or %f, whichever is shorter) chose to use `5000000000`
instead of 5e+9, which sounds like a bug?
In other cases, we changed TCL to compare numbers instead of strings to ignore minor rounding
issues (`expr 0.8 == 0.79999999999999999`)
2022-10-15 12:17:41 +03:00
Oran Agra ac1cc5a6e1
Trim rdb loading code for pre-release formats (#11058)
The initial module format was introduced in 4.0 RC1 and changed in RC2.
The initial function format was introduced in 7.0 RC1 and changed in RC3.
2022-08-15 21:41:44 +03:00
Binbin 4505eb1821
errno cleanup around rdbLoad (#11042)
This is an addition to #11039, which cleans up rdbLoad*-related errno handling. Remove the
errno print from the outer message (it may be invalid, since errno may have been overwritten).

Our aim should be that the code that detects the error, and knows which system call
triggered it, is the one to print errno, and not the code way up above (in some cases the
error is a result of a logical error and not a system one).

Remove the code that updates errno in rdbLoadRioWithLoadingCtx, the signature check,
and the rdb version check; in these cases we do print the error message, and the caller
does not have specific logic for handling EINVAL.

Small fix around rdb-preamble AOF: a truncated RDB is considered a failure,
not handled the same as a truncated AOF file.
2022-08-04 10:47:37 +03:00
Moti Cohen 1aa6c4ab92
Adding parentheses and do-while(0) to macros (#11080)
Fixing a few macros that don't follow the most basic safety conventions:
wrap any usage of a passed variable in parentheses, and if the macro
consists of more than one statement, wrap it in do-while(0) (or
parentheses).
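
A generic example of the convention being enforced (not a specific Redis macro):

```c
/* Parenthesize every use of a parameter, and wrap a multi-statement
 * body in do { ... } while(0) so the macro behaves like a single
 * statement after an unbraced if/else. */
#define SWAP_AND_COUNT(a, b, counter) do { \
    int tmp_ = (a);                        \
    (a) = (b);                             \
    (b) = tmp_;                            \
    (counter)++;                           \
} while (0)
```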
2022-08-03 19:38:08 +03:00
Valentino Geron 2029976dc3
Optimization: moduleLoadString should try to create embedded string if not plain (#11050)
Before this change, if the module had an embedded string and then used RedisModule_SaveString
and RedisModule_LoadString, the result would be a raw string instead of an embedded string.

Now the `RDB_LOAD_ENC` flag to `moduleLoadString` only affects integer encoding, but not
embedded strings (which still hold an sds in the robj ptr, so they're actually still raw strings for
anyone who reads them).

Co-authored-by: Valentino Geron <valentino@redis.com>
2022-07-31 17:29:59 +03:00
Binbin 00097bf4aa
Change the return value of rdbLoad function to enums (#11039)
The reason we do this is that in #11036 we added an error
log message when failing to open an RDB file for reading.
In loadDataFromDisk we call rdbLoad and also check errno;
now the logging corrupts errno (reported in the alpine daily).

It is not safe to rely on errno as we do today, so we change
the return value of the rdbLoad function to enums, like we have
when loading an AOF.
2022-07-26 15:13:13 +03:00
YaacovHazan 33bd8fb981
Add error log message when failing to open RDB file for reading (#11036)
When failing to open the rdb file, there was no specific error printed (unlike a
corrupt file), so it was not clear what failed and why.
2022-07-25 10:09:58 +03:00
ranshid eacca729a5
Avoid using unsafe C functions (#10932)
replace use of:
sprintf --> snprintf
strcpy/strncpy  --> redis_strlcpy
strcat/strncat  --> redis_strlcat

**why are we making this change?**
Much of the code uses unsafe variants or deprecated buffer-handling
functions.
While most cases are probably not presenting any issue on the known paths,
programming errors and unterminated strings might lead to potential
buffer overflows which are not covered by tests.

**As part of this PR we change**
1. added implementations for redis_strlcpy and redis_strlcat based on the strl implementation: https://linux.die.net/man/3/strl (a sketch follows this list)
2. changed all occurrences of sprintf to snprintf
3. changed occurrences of strcpy/strncpy to redis_strlcpy
4. changed occurrences of strcat/strncat to redis_strlcat
5. changed the behavior of ll2string/ull2string/ld2string so that they always place a null
  terminator ('\0') at the first index of the output buffer. This was done in order to make
  the use of these functions safer in cases where the user does not check the output
  returned by them (for example in rdbRemoveTempFile)
6. added a compiler directive that issues a deprecation error if a use of
  sprintf/strcpy/strcat is found during compilation, resulting in an error at compile time.
  However, keep in mind that since the deprecation attribute is not supported by all compilers,
  this is only expected to fail during the push workflows.


**NOTE:** this is only an initial milestone. We might also consider
using the *_s implementations provided by the C11 extensions (however,
these are not yet widely supported). I would also suggest starting to
look at static code analyzers to track unsafe use cases.
For example, the LLVM clang checker supports security.insecureAPI.DeprecatedOrUnsafeBufferHandling,
which can help locate unsafe function usage.
https://clang.llvm.org/docs/analyzer/checkers.html#security-insecureapi-deprecatedorunsafebufferhandling-c
The main reason not to onboard it at this stage is that the alternative
accepted by clang is to use the C11 extensions, which are not always
supported by the stdlib.
2022-07-18 10:56:26 +03:00
Binbin 8203461120
Fix some outdated comments and some typos (#10946)
* Fix some outdated comments and some typos
2022-07-06 20:31:59 -07:00
Tian 99a425d0f3
Fsync directory while persisting AOF manifest, RDB file, and config file (#10737)
The current process to persist files is to `write` the data, `fsync`, and `rename` the file,
but an underlying problem is that the rename may be lost in a sudden crash, like a
power outage, when the directory hasn't been persisted.

The article [Ensuring data reaches disk](https://lwn.net/Articles/457667/) mentions
that a safe way to update a file is:

1. create a new temp file (on the same file system!)
2. write data to the temp file
3. fsync() the temp file
4. rename the temp file to the appropriate name
5. fsync() the containing directory

This commit handles CONFIG REWRITE, AOF manifest, and RDB file (both for persistence,
and the one the replica gets from the master).
It doesn't (yet) handle ACL SAVE and Cluster configs, since these don't yet follow this pattern.
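
A minimal sketch of steps 3-5 from the list above (error paths simplified; the real code reports failures rather than silently returning):

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* fsync the temp file, rename it into place, then fsync the containing
 * directory so the rename itself survives a crash. */
static int safeRename(const char *tmppath, const char *path, const char *dir) {
    int fd = open(tmppath, O_RDONLY);
    if (fd == -1 || fsync(fd) == -1) return -1;        /* step 3 */
    close(fd);
    if (rename(tmppath, path) == -1) return -1;        /* step 4 */
    int dirfd = open(dir, O_RDONLY);
    if (dirfd == -1 || fsync(dirfd) == -1) return -1;  /* step 5 */
    close(dirfd);
    return 0;
}
```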
2022-06-20 19:17:23 +03:00
Binbin ae9764ea0a
Increment the stat_rdb_saves counter in SAVE command (#10827)
Currently, we only increment stat_rdb_saves in rdbSaveBackground;
we should also increment it in the SAVE command.

We concluded there's no need to increment when:
1. saving a base file for an AOF
2. when saving an empty rdb file to delete an old one
3. when saving to sockets (not creating a persistence / snapshot file)

The stat counter was introduced in #10178

* fix a wrong comment in startSaving
2022-06-07 17:38:31 +03:00
DarrenJiang13 bb1de082ea
Adds isolated netstats for replication. (#10062)
The values of `server.stat_net_output_bytes/server.stat_net_input_bytes`
are actually the sum of the replication flow and the users' data flow.
This may cause confusion like:
"Why does my server get such large output_bytes while I am doing nothing?".

After discussions and revisions, now here is the change about what this
PR brings (final version before merge):
- 2 server variables to count the network bytes during replication,
     including fullsync and propagate bytes.
     - `server.stat_net_repl_output_bytes`/`server.stat_net_repl_input_bytes`
- 3 info fields to print the input and output of repl bytes and instantaneous
     value of total repl bytes.
     - `total_net_repl_input_bytes` / `total_net_repl_output_bytes`
     - `instantaneous_repl_total_kbps`
- 1 new API `rioCheckType()` to check the type of rio, so we can use it
     to distinguish between diskless and disk-based replication
- 2 new counting items to keep network statistics consistent between master
     and slave
    - the rdb portion during diskless replication, in `rdbLoadProgressCallback()`
    - the first line of the full sync payload, in `readSyncBulkPayload()`

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-05-31 08:07:33 +03:00
Oran Agra 0c4733c8d7
Optimize integer zset scores in listpack (converting to string and back) (#10486)
When the score doesn't have a fractional part and can be stored as an integer,
we use the integer capabilities of listpack to store it, rather than converting it to a string.
This already existed before this PR (lpInsert does that conversion implicitly).

But to do that, we would first convert the score from double to string (calling `d2string`),
then pass the string to `lpAppend`, which identified it as an integer and converted it back to an int.
Now, instead of converting it to a string, we store it using `lpAppendInteger`.

Unrelated:
---
* Fix the double2ll range check (the negative and positive ranges, and also the comparison
  operands, were slightly off; but also, the range could be made much larger, see comment).
* Unify the double-to-string conversion code in rdb.c with the one in util.c.
* Small optimization in lpStringToInt64: don't attempt to convert strings that are obviously too long.

Benchmark:
---
Up to 20% improvement in certain tight loops doing zzlInsert with large integers
(if the listpack is pre-allocated to avoid realloc, and insertion is sorted from largest to smallest).
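
A sketch of the new path (the function names exist in the Redis source; the snippet itself is illustrative):

```c
/* If the double has no fractional part and fits in a long long, append
 * it to the listpack as an integer directly, skipping d2string. */
long long lscore;
if (double2ll(score, &lscore)) {
    zl = lpAppendInteger(zl, lscore);                  /* integer fast path */
} else {
    char buf[128];
    int len = d2string(buf, sizeof(buf), score);
    zl = lpAppend(zl, (unsigned char *)buf, len);      /* string fallback */
}
```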
2022-04-17 17:16:46 +03:00
Oran Agra 95050f2683
solve corrupt dump fuzzer crash in streams (#10579)
we had a panic in streamLastValidID when the stream metadata
said it's not empty, but the rax is empty.
2022-04-14 08:29:35 +03:00
Meir Shpilraien (Spielrein) ae020e3d56
Functions: Move library meta data to be part of the library payload. (#10500)
## Move library meta data to be part of the library payload.

Following the discussion on https://github.com/redis/redis/issues/10429 and the intention to add (in the future) library versioning support, we believe that the entire library metadata (like name and engine) should be part of the library payload and not provided by the `FUNCTION LOAD` command. The reasoning behind this is that the programmer who developed the library should be the one who sets those values (name, engine, and in the future also version). **It is not the responsibility of the admin who loads the library into the database.**

The PR moves all the library metadata (engine and function name) to be part of the library payload. The metadata needs to be provided on the first line of the payload using the shebang format (`#!<engine> name=<name>`), example:

```lua
#!lua name=test
redis.register_function('foo', function() return 1 end)
```

The above script will run on the Lua engine and will create a library called `test`.

## API Changes (compare to 7.0 rc2)

* The `FUNCTION LOAD` command was changed: it now simply gets the library payload and extracts the engine and name from the payload. In addition, the command now returns the function name, which can later be used in `FUNCTION DELETE` and `FUNCTION LIST`.
* The description field was completely removed from `FUNCTION LOAD` and `FUNCTION LIST`.


## Breaking Changes (compare to 7.0 rc2)

* Library description was removed (we can re-add it in the future either as part of the shebang line or an additional line).
* Loading an AOF file that was generated by either 7.0 rc1 or 7.0 rc2 will fail because the old command syntax is invalid.

## Notes

* Loading an RDB file that was generated by rc1 / rc2 **is** supported; Redis will automatically add the shebang to the libraries' payloads (we can probably delete that code after 7.0.3 or so, since there's no need to keep supporting upgrades from an RC build).
2022-04-05 10:27:24 +03:00
Itamar Haber c81c7f51c3
Add stream consumer group lag tracking and reporting (#9127)
Adds the ability to track the lag of a consumer group (CG), that is, the number
of entries yet-to-be-delivered from the stream.

The proposed constant-time solution is in the spirit of "best-effort."

Partially addresses #8737.

## Description of approach

We add a new "entries_added" property to the stream. This starts at 0 for a new
stream and is incremented by 1 with every `XADD`.  It is essentially an all-time
counter of the entries added to the stream.

Given the stream's length and this counter value, we can trivially find the logical
"entries_added" counter of the first ID if and only if the stream is contiguous.
A fragmented stream contains one or more tombstones generated by `XDEL`s.
The new "xdel_max_id" stream property tracks the latest tombstone.

The CG also tracks its last delivered ID as an "entries_read" counter and
increments it independently when delivering new messages, unless this
read counter is invalid (-1 means an invalid offset). When the CG's counter is
available, the reported lag is the difference between the added and read counters.

Lastly, this also adds a "first_id" field to the stream structure in order to make
looking it up cheaper in most cases.

## Limitations

There are two cases in which the mechanism isn't able to track the lag.
In these cases, `XINFO` replies with `null` in the "lag" field.

The first case is when a CG is created with an arbitrary last delivered ID,
that isn't "0-0", nor the first or the last entries of the stream. In this case,
it is impossible to obtain a valid read counter (short of an O(N) operation).
The second case is when there are one or more tombstones fragmenting
the stream's entries range.

In both cases, given enough time, and assuming that the consumers are
active (reading and acking) and advancing, the CG should be able to
catch up with the tip of the stream and report zero lag.
Once that's achieved, lag tracking would resume as normal (until the
next tombstone is set).

## API changes

* `XGROUP CREATE` added with the optional named argument `[ENTRIESREAD entries-read]`
  for explicitly specifying the new CG's counter.
* `XGROUP SETID` added with an optional positional argument `[ENTRIESREAD entries-read]`
  for specifying the CG's counter.
* `XINFO` reports the maximal tombstone ID, the recorded first entry ID, and total
  number of entries added to the stream.
* `XINFO` reports the current lag and logical read counter of CGs.
* `XSETID` is an internal command that's used in replication/aof. It has been added with
  the optional positional arguments `[ENTRIESADDED entries-added] [MAXDELETEDID max-deleted-entry-id]`
  for propagating the CG's offset and maximal tombstone ID of the stream.

## The generic unsolved problem

The current stream implementation doesn't provide an efficient way to obtain the
approximate/exact size of a range of entries. While it could've been nice to have
that ability (#5813) in general, let alone specifically in the context of CGs, the risk
and complexities involved in such implementation are in all likelihood prohibitive.

## A refactoring note

The `streamGetEdgeID` has been refactored to accommodate both the existing seek
of any entry as well as seeking non-deleted entries (the addition of the `skip_tombstones`
argument). Furthermore, this refactoring also migrated the seek logic to use the
`streamIterator` (rather than `raxIterator`) that was, in turn, extended with the
`skip_tombstones` Boolean struct field to control the emission of these.

Co-authored-by: Guy Benoish <guy.benoish@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-23 22:34:58 +02:00
yoav-steinberg 56fa48ffc1
aof rewrite and rdb save counters in info (#10178)
Add aof_rewrites and rdb_snapshots counters to info.
This is useful to figure out if a rewrite or snapshot happened since the last check.
This was part of the (ongoing) effort to provide a safe backup solution for multipart-aof backups.
2022-02-17 14:32:48 +02:00
chenyang8094 a2f2b6f5b1
Modify AOF preamble related logs, and change the RDB aux field (#10283)
In multi-part AOF, we no longer have the concept of an `RDB-preamble`, so the related logs should be removed.
However, in order to print compatible logs when loading old-style AOFs, we also have to keep the relevant code.
Additionally, when saving an RDB, change the RDB aux field from "aof-preamble" to "aof-base".
2022-02-11 18:47:03 +02:00
Meir Shpilraien (Spielrein) 9c60292250
Added functions support to redis-check-rdb (#10154)
The PR adds the missing verification for functions to redis-check-rdb.
The verification only verifies the rdb structure and does not try to load the function code or
perform more advanced checks (like compilation of the function code).
2022-01-20 11:10:33 +02:00
chenyang8094 e9bff7978a
Always create base AOF file when redis start from empty. (#10102)
Force-create a BASE file (using a foreground `rewriteAppendOnlyFile`) when redis starts from an
empty data set and `appendonly` is yes.

The reasoning is that normally, after redis has been running for some time and the AOF has gone
through a few rewrites, there's always a base rdb file. The scenario where the base file is missing is
kinda rare (it happens only at empty startup), so this change normalizes it.
But more importantly, there are (or could be) some complex modules that are started with some
configuration; when they create persistence they write that configuration to RDB AUX fields, so
that they can always know with which configuration the persistence file they're loading was
created (which could be critical). There is (was) one scenario in which they could load their
persisted data with that configuration missing, and this change fixes it.

Add a new module event: REDISMODULE_SUBEVENT_PERSISTENCE_SYNC_AOF_START, similar to
REDISMODULE_SUBEVENT_PERSISTENCE_AOF_START which is async.

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-13 08:49:26 +02:00