redis/src/server.h

3570 lines
166 KiB
C
Raw Normal View History

/*
* Copyright (c) 2009-2012, Salvatore Sanfilippo <antirez at gmail dot com>
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of Redis nor the names of its contributors may be used
* to endorse or promote products derived from this software without
* specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef __REDIS_H
#define __REDIS_H
#include "fmacros.h"
#include "config.h"
#include "solarisfixes.h"
#include "rio.h"
Implement redisAtomic to replace _Atomic C11 builtin (#7707) Redis 6.0 introduces I/O threads, it is so cool and efficient, we use C11 _Atomic to establish inter-thread synchronization without mutex. But the compiler that must supports C11 _Atomic can compile redis code, that brings a lot of inconvenience since some common platforms can't support by default such as CentOS7, so we want to implement redis atomic type to make it more portable. We have implemented our atomic variable for redis that only has 'relaxed' operations in src/atomicvar.h, so we implement some operations with 'sequentially-consistent', just like the default behavior of C11 _Atomic that can establish inter-thread synchronization. And we replace all uses of C11 _Atomic with redis atomic variable. Our implementation of redis atomic variable uses C11 _Atomic, __atomic or __sync macros if available, it supports most common platforms, and we will detect automatically which feature we use. In Makefile we use a dummy file to detect if the compiler supports C11 _Atomic. Now for gcc, we can compile redis code theoretically if your gcc version is not less than 4.1.2(starts to support __sync_xxx operations). Otherwise, we remove use mutex fallback to implement redis atomic variable for performance and test. You will get compiling errors if your compiler doesn't support all features of above. For cover redis atomic variable tests, we add other CI jobs that build redis on CentOS6 and CentOS7 and workflow daily jobs that run the tests on them. For them, we just install gcc by default in order to cover different compiler versions, gcc is 4.4.7 by default installation on CentOS6 and 4.8.5 on CentOS7. We restore the feature that we can test redis with Helgrind to find data race errors. But you need install Valgrind in the default path configuration firstly before running your tests, since we use macros in helgrind.h to tell Helgrind inter-thread happens-before relationship explicitly for avoiding false positives. Please open an issue on github if you find data race errors relate to this commit. Unrelated: - Fix redefinition of typedef 'RedisModuleUserChangedFunc' For some old version compilers, they will report errors or warnings, if we re-define function type.
2020-09-17 15:01:45 +02:00
#include "atomicvar.h"
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <limits.h>
#include <unistd.h>
#include <errno.h>
#include <inttypes.h>
2010-07-05 20:14:48 +02:00
#include <pthread.h>
2010-12-09 17:10:21 +01:00
#include <syslog.h>
2011-03-29 17:51:15 +02:00
#include <netinet/in.h>
2019-11-11 14:06:08 +01:00
#include <sys/socket.h>
#include <lua.h>
#include <signal.h>
Added INFO LATENCYSTATS section: latency by percentile distribution/latency by cumulative distribution of latencies (#9462) # Short description The Redis extended latency stats track per command latencies and enables: - exporting the per-command percentile distribution via the `INFO LATENCYSTATS` command. **( percentile distribution is not mergeable between cluster nodes ).** - exporting the per-command cumulative latency distributions via the `LATENCY HISTOGRAM` command. Using the cumulative distribution of latencies we can merge several stats from different cluster nodes to calculate aggregate metrics . By default, the extended latency monitoring is enabled since the overhead of keeping track of the command latency is very small. If you don't want to track extended latency metrics, you can easily disable it at runtime using the command: - `CONFIG SET latency-tracking no` By default, the exported latency percentiles are the p50, p99, and p999. You can alter them at runtime using the command: - `CONFIG SET latency-tracking-info-percentiles "0.0 50.0 100.0"` ## Some details: - The total size per histogram should sit around 40 KiB. We only allocate those 40KiB when a command was called for the first time. - With regards to the WRITE overhead As seen below, there is no measurable overhead on the achievable ops/sec or full latency spectrum on the client. Including also the measured redis-benchmark for unstable vs this branch. - We track from 1 nanosecond to 1 second ( everything above 1 second is considered +Inf ) ## `INFO LATENCYSTATS` exposition format - Format: `latency_percentiles_usec_<CMDNAME>:p0=XX,p50....` ## `LATENCY HISTOGRAM [command ...]` exposition format Return a cumulative distribution of latencies in the format of a histogram for the specified command names. The histogram is composed of a map of time buckets: - Each representing a latency range, between 1 nanosecond and roughly 1 second. - Each bucket covers twice the previous bucket's range. - Empty buckets are not printed. - Everything above 1 sec is considered +Inf. - At max there will be log2(1000000000)=30 buckets We reply a map for each command in the format: `<command name> : { `calls`: <total command calls> , `histogram` : { <bucket 1> : latency , < bucket 2> : latency, ... } }` Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-05 13:01:05 +01:00
#include "hdr_histogram.h"
#ifdef HAVE_LIBSYSTEMD
#include <systemd/sd-daemon.h>
#endif
#ifndef static_assert
#define static_assert(expr, lit) extern char __static_assert_failure[(expr) ? 1:-1]
#endif
typedef long long mstime_t; /* millisecond time type. */
typedef long long ustime_t; /* microsecond time type. */
#include "ae.h" /* Event driven programming library */
#include "sds.h" /* Dynamic safe strings */
#include "dict.h" /* Hash tables */
#include "adlist.h" /* Linked lists */
#include "zmalloc.h" /* total memory usage aware version of malloc/free */
#include "anet.h" /* Networking the easy way */
#include "ziplist.h" /* Compact list data structure */
#include "intset.h" /* Compact integer set structure */
#include "version.h" /* Version macro */
#include "util.h" /* Misc functions useful in many places */
#include "latency.h" /* Latency monitor API */
2016-01-29 12:08:10 +01:00
#include "sparkline.h" /* ASCII graphs API */
#include "quicklist.h" /* Lists are encoded as linked lists of
N-elements flat arrays */
#include "rax.h" /* Radix tree */
#include "connection.h" /* Connection abstraction */
#define REDISMODULE_CORE 1
#include "redismodule.h" /* Redis modules API defines. */
/* Following includes allow test functions to be called from Redis main() */
#include "zipmap.h"
#include "sha1.h"
#include "endianconv.h"
#include "crc64.h"
A better approach for COMMAND INFO for movablekeys commands (#8324) Fix #7297 The problem: Today, there is no way for a client library or app to know the key name indexes for commands such as ZUNIONSTORE/EVAL and others with "numkeys", since COMMAND INFO returns no useful info for them. For cluster-aware redis clients, this requires to 'patch' the client library code specifically for each of these commands or to resolve each execution of these commands with COMMAND GETKEYS. The solution: Introducing key specs other than the legacy "range" (first,last,step) The 8th element of the command info array, if exists, holds an array of key specs. The array may be empty, which indicates the command doesn't take any key arguments or may contain one or more key-specs, each one may leads to the discovery of 0 or more key arguments. A client library that doesn't support this key-spec feature will keep using the first,last,step and movablekeys flag which will obviously remain unchanged. A client that supports this key-specs feature needs only to look at the key-specs array. If it finds an unrecognized spec, it must resort to using COMMAND GETKEYS if it wishes to get all key name arguments, but if all it needs is one key in order to know which cluster node to use, then maybe another spec (if the command has several) can supply that, and there's no need to use GETKEYS. Each spec is an array of arguments, first one is the spec name, the second is an array of flags, and the third is an array containing details about the spec (specific meaning for each spec type) The initial flags we support are "read" and "write" indicating if the keys that this key-spec finds are used for read or for write. clients should ignore any unfamiliar flags. In order to easily find the positions of keys in a given array of args we introduce keys specs. There are two logical steps of key specs: 1. `start_search`: Given an array of args, indicate where we should start searching for keys 2. `find_keys`: Given the output of start_search and an array of args, indicate all possible indices of keys. ### start_search step specs - `index`: specify an argument index explicitly - `index`: 0 based index (1 means the first command argument) - `keyword`: specify a string to match in `argv`. We should start searching for keys just after the keyword appears. - `keyword`: the string to search for - `start_search`: an index from which to start the keyword search (can be negative, which means to search from the end) Examples: - `SET` has start_search of type `index` with value `1` - `XREAD` has start_search of type `keyword` with value `[“STREAMS”,1]` - `MIGRATE` has start_search of type `keyword` with value `[“KEYS”,-2]` ### find_keys step specs - `range`: specify `[count, step, limit]`. - `lastkey`: index of the last key. relative to the index returned from begin_search. -1 indicating till the last argument, -2 one before the last - `step`: how many args should we skip after finding a key, in order to find the next one - `limit`: if count is -1, we use limit to stop the search by a factor. 0 and 1 mean no limit. 2 means ½ of the remaining args, 3 means ⅓, and so on. - “keynum”: specify `[keynum_index, first_key_index, step]`. - `keynum_index`: is relative to the return of the `start_search` spec. - `first_key_index`: is relative to `keynum_index`. - `step`: how many args should we skip after finding a key, in order to find the next one Examples: - `SET` has `range` of `[0,1,0]` - `MSET` has `range` of `[-1,2,0]` - `XREAD` has `range` of `[-1,1,2]` - `ZUNION` has `start_search` of type `index` with value `1` and `find_keys` of type `keynum` with value `[0,1,1]` - `AI.DAGRUN` has `start_search` of type `keyword` with value `[“LOAD“,1]` and `find_keys` of type `keynum` with value `[0,1,1]` (see https://oss.redislabs.com/redisai/master/commands/#aidagrun) Note: this solution is not perfect as the module writers can come up with anything, but at least we will be able to find the key args of the vast majority of commands. If one of the above specs can’t describe the key positions, the module writer can always fall back to the `getkeys-api` option. Some keys cannot be found easily (`KEYS` in `MIGRATE`: Imagine the argument for `AUTH` is the string “KEYS” - we will start searching in the wrong index). The guarantee is that the specs may be incomplete (`incomplete` will be specified in the spec to denote that) but we never report false information (assuming the command syntax is correct). For `MIGRATE` we start searching from the end - `startfrom=-1` - and if one of the keys is actually called "keys" we will report only a subset of all keys - hence the `incomplete` flag. Some `incomplete` specs can be completely empty (i.e. UNKNOWN begin_search) which should tell the client that COMMAND GETKEYS (or any other way to get the keys) must be used (Example: For `SORT` there is no way to describe the STORE keyword spec, as the word "store" can appear anywhere in the command). We will expose these key specs in the `COMMAND` command so that clients can learn, on startup, where the keys are for all commands instead of holding hardcoded tables or use `COMMAND GETKEYS` in runtime. Comments: 1. Redis doesn't internally use the new specs, they are only used for COMMAND output. 2. In order to support the current COMMAND INFO format (reply array indices 4, 5, 6) we created a synthetic range, called legacy_range, that, if possible, is built according to the new specs. 3. Redis currently uses only getkeys_proc or the legacy_range to get the keys indices (in COMMAND GETKEYS for example). "incomplete" specs: the command we have issues with are MIGRATE, STRALGO, and SORT for MIGRATE, because the token KEYS, if exists, must be the last token, we can search in reverse. it one of the keys is actually the string "keys" will return just a subset of the keys (hence, it's "incomplete") for SORT and STRALGO we can use this heuristic (the keys can be anywhere in the command) and therefore we added a key spec that is both "incomplete" and of "unknown type" if a client encounters an "incomplete" spec it means that it must find a different way (either COMMAND GETKEYS or have its own parser) to retrieve the keys. please note that all commands, apart from the three mentioned above, have "complete" key specs
2021-09-15 10:10:29 +02:00
/* min/max */
#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))
/* Error codes */
#define C_OK 0
#define C_ERR -1
/* Static server configuration */
#define CONFIG_DEFAULT_HZ 10 /* Time interrupt calls/sec. */
2015-07-27 09:41:48 +02:00
#define CONFIG_MIN_HZ 1
#define CONFIG_MAX_HZ 500
#define MAX_CLIENTS_PER_CLOCK_TICK 200 /* HZ is adapted based on that. */
2015-07-27 09:41:48 +02:00
#define CRON_DBS_PER_CALL 16
#define NET_MAX_WRITES_PER_EVENT (1024*64)
#define PROTO_SHARED_SELECT_CMDS 10
#define OBJ_SHARED_INTEGERS 10000
#define OBJ_SHARED_BULKHDR_LEN 32
#define OBJ_SHARED_HDR_STRLEN(_len_) (((_len_) < 10) ? 4 : 5) /* see shared.mbulkhdr etc. */
#define LOG_MAX_LEN 1024 /* Default maximum length of syslog messages.*/
2015-07-27 09:41:48 +02:00
#define AOF_REWRITE_ITEMS_PER_CMD 64
#define AOF_ANNOTATION_LINE_MAX_LEN 1024
2015-07-27 09:41:48 +02:00
#define CONFIG_AUTHPASS_MAX_LEN 512
#define CONFIG_RUN_ID_SIZE 40
#define RDB_EOF_MARK_SIZE 40
#define CONFIG_REPL_BACKLOG_MIN_SIZE (1024*16) /* 16k */
#define CONFIG_BGSAVE_RETRY_DELAY 5 /* Wait a few secs before trying again. */
#define CONFIG_DEFAULT_PID_FILE "/var/run/redis.pid"
#define CONFIG_DEFAULT_BINDADDR_COUNT 2
#define CONFIG_DEFAULT_BINDADDR { "*", "-::*" }
#define NET_HOST_STR_LEN 256 /* Longest valid hostname */
2015-07-27 09:41:48 +02:00
#define NET_IP_STR_LEN 46 /* INET6_ADDRSTRLEN is 46, but we need to be sure */
#define NET_ADDR_STR_LEN (NET_IP_STR_LEN+32) /* Must be enough for ip:port */
#define NET_HOST_PORT_STR_LEN (NET_HOST_STR_LEN+32) /* Must be enough for hostname:port */
2015-07-27 09:41:48 +02:00
#define CONFIG_BINDADDR_MAX 16
#define CONFIG_MIN_RESERVED_FDS 32
#define CONFIG_DEFAULT_PROC_TITLE_TEMPLATE "{title} {listen-addr} {server-mode}"
Client eviction (#8687) ### Description A mechanism for disconnecting clients when the sum of all connected clients is above a configured limit. This prevents eviction or OOM caused by accumulated used memory between all clients. It's a complimentary mechanism to the `client-output-buffer-limit` mechanism which takes into account not only a single client and not only output buffers but rather all memory used by all clients. #### Design The general design is as following: * We track memory usage of each client, taking into account all memory used by the client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date after reading from the socket, after processing commands and after writing to the socket. * Based on the used memory we sort all clients into buckets. Each bucket contains all clients using up up to x2 memory of the clients in the bucket below it. For example up to 1m clients, up to 2m clients, up to 4m clients, ... * Before processing a command and before sleep we check if we're over the configured limit. If we are we start disconnecting clients from larger buckets downwards until we're under the limit. #### Config `maxmemory-clients` max memory all clients are allowed to consume, above this threshold we disconnect clients. This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%` would mean 10% of `maxmemory`). #### Important code changes * During the development I encountered yet more situations where our io-threads access global vars. And needed to fix them. I also had to handle keeps the clients sorted into the memory buckets (which are global) while their memory usage changes in the io-thread. To achieve this I decided to simplify how we check if we're in an io-thread and make it much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking if the client is in an io-thread (it wasn't used for anything else) and just used the global `io_threads_op` variable the same way to check during writes. * I optimized the cleanup of the client from the `clients_pending_read` list on client freeing. We now store a pointer in the `client` struct to this list so we don't need to search in it (`pending_read_list_node`). * Added `evicted_clients` stat to `INFO` command. * Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the client eviction mechanism. Added corrosponding 'e' flag in the client info string. * Added `multi-mem` field in the client info string to show how much memory is used up by buffered multi commands. * Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and channels (partially), tracking prefixes (partially). * CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so clients will be disconnected between processing different clients and not only before sleep. This new function can be used in the future for work we want to do outside the command processing loop but don't want to wait for all clients to be processed before we get to it. Specifically I wanted to handle output-buffer-limit related closing before we process client eviction in case the two race with each other. * Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction buckets. * Each client now holds a pointer to the client eviction memory usage bucket it belongs to and listNode to itself in that bucket for quick removal. * Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value indicating no io-threading is currently being executed. * In order to track memory used by each clients in real-time we can't rely on updating these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()` (used to be `clientsCronTrackClientsMemUsage()`) after command processing, after writing data to pubsub clients, after writing the output buffer and after reading from the socket (and maybe other places too). The function is written to be fast. * Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before processing a command (before performing oom-checks and key-eviction). * All clients memory usage buckets are grouped as follows: * All clients using less than 64k. * 64K..128K * 128K..256K * ... * 2G..4G * All clients using 4g and up. * Added client-eviction.tcl with a bunch of tests for the new mechanism. * Extended maxmemory.tcl to test the interaction between maxmemory and maxmemory-clients settings. * Added an option to flag a numeric configuration variable as a "percent", this means that if we encounter a '%' after the number in the config file (or config set command) we consider it as valid. Such a number is store internally as a negative value. This way an integer value can be interpreted as either a percent (negative) or absolute value (positive). This is useful for example if some numeric configuration can optionally be set to a percentage of something else. Co-authored-by: Oran Agra <oran@redislabs.com>
2021-09-23 13:02:16 +02:00
/* Bucket sizes for client eviction pools. Each bucket stores clients with
* memory usage of up to twice the size of the bucket below it. */
#define CLIENT_MEM_USAGE_BUCKET_MIN_LOG 15 /* Bucket sizes start at up to 32KB (2^15) */
#define CLIENT_MEM_USAGE_BUCKET_MAX_LOG 33 /* Bucket for largest clients: sizes above 4GB (2^32) */
#define CLIENT_MEM_USAGE_BUCKETS (1+CLIENT_MEM_USAGE_BUCKET_MAX_LOG-CLIENT_MEM_USAGE_BUCKET_MIN_LOG)
2013-08-06 12:55:49 +02:00
#define ACTIVE_EXPIRE_CYCLE_SLOW 0
#define ACTIVE_EXPIRE_CYCLE_FAST 1
/* Children process will exit with this status code to signal that the
* process terminated without an error: this is useful in order to kill
* a saving child (RDB or AOF one), without triggering in the parent the
* write protection that is normally turned on on write errors.
* Usually children that are terminated with SIGUSR1 will exit with this
* special code. */
#define SERVER_CHILD_NOERROR_RETVAL 255
/* Reading copy-on-write info is sometimes expensive and may slow down child
* processes that report it continuously. We measure the cost of obtaining it
* and hold back additional reading based on this factor. */
#define CHILD_COW_DUTY_CYCLE 100
/* Instantaneous metrics tracking. */
2015-07-27 09:41:48 +02:00
#define STATS_METRIC_SAMPLES 16 /* Number of samples per metric. */
#define STATS_METRIC_COMMAND 0 /* Number of commands executed. */
#define STATS_METRIC_NET_INPUT 1 /* Bytes read to network .*/
#define STATS_METRIC_NET_OUTPUT 2 /* Bytes written to network. */
#define STATS_METRIC_COUNT 3
/* Protocol and I/O related defines */
2015-07-27 09:41:48 +02:00
#define PROTO_IOBUF_LEN (1024*16) /* Generic I/O buffer size */
#define PROTO_REPLY_CHUNK_BYTES (16*1024) /* 16k output buffer */
#define PROTO_INLINE_MAX_SIZE (1024*64) /* Max size of inline reads */
#define PROTO_MBULK_BIG_ARG (1024*32)
#define PROTO_RESIZE_THRESHOLD (1024*32) /* Threshold for determining whether to resize query buffer */
introduce dynamic client reply buffer size - save memory on idle clients (#9822) Current implementation simple idle client which serves no traffic still use ~17Kb of memory. this is mainly due to a fixed size reply buffer currently set to 16kb. We have encountered some cases in which the server operates in a low memory environments. In such cases a user who wishes to create large connection pools to support potential burst period, will exhaust a large amount of memory to maintain connected Idle clients. Some users may choose to "sacrifice" performance in order to save memory. This commit introduce a dynamic mechanism to shrink and expend the client reply buffer based on periodic observed peak. the algorithm works as follows: 1. each time a client reply buffer has been fully written, the last recorded peak is updated: new peak = MAX( last peak, current written size) 2. during clients cron we check for each client if the last observed peak was: a. matching the current buffer size - in which case we expend (resize) the buffer size by 100% b. less than half the buffer size - in which case we shrink the buffer size by 50% 3. In any case we will **not** resize the buffer in case: a. the current buffer peak is less then the current buffer usable size and higher than 1/2 the current buffer usable size b. the value of (current buffer usable size/2) is less than 1Kib c. the value of (current buffer usable size*2) is larger than 16Kib 4. the peak value is reset to the current buffer position once every **5** seconds. we maintain a new field in the client structure (buf_peak_last_reset_time) which is used to keep track of how long it passed since the last buffer peak reset. ### **Interface changes:** **CIENT LIST** - now contains 2 new extra fields: rbs= < the current size in bytes of the client reply buffer > rbp=< the current value in bytes of the last observed buffer peak position > **INFO STATS** - now contains 2 new statistics: reply_buffer_shrinks = < total number of buffer shrinks performed > reply_buffer_expends = < total number of buffer expends performed > Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Yoav Steinberg <yoav@redislabs.com>
2022-02-22 10:19:38 +01:00
#define PROTO_REPLY_MIN_BYTES (1024) /* the lower limit on reply buffer size */
#define REDIS_AUTOSYNC_BYTES (1024*1024*4) /* Sync file every 4MB. */
2015-07-27 09:41:48 +02:00
introduce dynamic client reply buffer size - save memory on idle clients (#9822) Current implementation simple idle client which serves no traffic still use ~17Kb of memory. this is mainly due to a fixed size reply buffer currently set to 16kb. We have encountered some cases in which the server operates in a low memory environments. In such cases a user who wishes to create large connection pools to support potential burst period, will exhaust a large amount of memory to maintain connected Idle clients. Some users may choose to "sacrifice" performance in order to save memory. This commit introduce a dynamic mechanism to shrink and expend the client reply buffer based on periodic observed peak. the algorithm works as follows: 1. each time a client reply buffer has been fully written, the last recorded peak is updated: new peak = MAX( last peak, current written size) 2. during clients cron we check for each client if the last observed peak was: a. matching the current buffer size - in which case we expend (resize) the buffer size by 100% b. less than half the buffer size - in which case we shrink the buffer size by 50% 3. In any case we will **not** resize the buffer in case: a. the current buffer peak is less then the current buffer usable size and higher than 1/2 the current buffer usable size b. the value of (current buffer usable size/2) is less than 1Kib c. the value of (current buffer usable size*2) is larger than 16Kib 4. the peak value is reset to the current buffer position once every **5** seconds. we maintain a new field in the client structure (buf_peak_last_reset_time) which is used to keep track of how long it passed since the last buffer peak reset. ### **Interface changes:** **CIENT LIST** - now contains 2 new extra fields: rbs= < the current size in bytes of the client reply buffer > rbp=< the current value in bytes of the last observed buffer peak position > **INFO STATS** - now contains 2 new statistics: reply_buffer_shrinks = < total number of buffer shrinks performed > reply_buffer_expends = < total number of buffer expends performed > Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Yoav Steinberg <yoav@redislabs.com>
2022-02-22 10:19:38 +01:00
#define REPLY_BUFFER_DEFAULT_PEAK_RESET_TIME 5000 /* 5 seconds */
2015-07-27 09:41:48 +02:00
/* When configuring the server eventloop, we setup it so that the total number
* of file descriptors we can handle are server.maxclients + RESERVED_FDS +
* a few more to stay safe. Since RESERVED_FDS defaults to 32, we add 96
* in order to make sure of not over provisioning more than 128 fds. */
#define CONFIG_FDSET_INCR (CONFIG_MIN_RESERVED_FDS+96)
2011-10-31 11:13:28 +01:00
/* OOM Score Adjustment classes. */
#define CONFIG_OOM_MASTER 0
#define CONFIG_OOM_REPLICA 1
#define CONFIG_OOM_BGCHILD 2
#define CONFIG_OOM_COUNT 3
extern int configOOMScoreAdjValuesDefaults[CONFIG_OOM_COUNT];
/* Hash table parameters */
2015-07-27 09:41:48 +02:00
#define HASHTABLE_MIN_FILL 10 /* Minimal hash table fill 10% */
Limit the main db and expires dictionaries to expand (#7954) As we know, redis may reject user's requests or evict some keys if used memory is over maxmemory. Dictionaries expanding may make things worse, some big dictionaries, such as main db and expires dict, may eat huge memory at once for allocating a new big hash table and be far more than maxmemory after expanding. There are related issues: #4213 #4583 More details, when expand dict in redis, we will allocate a new big ht[1] that generally is double of ht[0], The size of ht[1] will be very big if ht[0] already is big. For db dict, if we have more than 64 million keys, we need to cost 1GB for ht[1] when dict expands. If the sum of used memory and new hash table of dict needed exceeds maxmemory, we shouldn't allow the dict to expand. Because, if we enable keys eviction, we still couldn't add much more keys after eviction and rehashing, what's worse, redis will keep less keys when redis only remains a little memory for storing new hash table instead of users' data. Moreover users can't write data in redis if disable keys eviction. What this commit changed ? Add a new member function expandAllowed for dict type, it provide a way for caller to allow expand or not. We expose two parameters for this function: more memory needed for expanding and dict current load factor, users can implement a function to make a decision by them. For main db dict and expires dict type, these dictionaries may be very big and cost huge memory for expanding, so we implement a judgement function: we can stop dict to expand provisionally if used memory will be over maxmemory after dict expands, but to guarantee the performance of redis, we still allow dict to expand if dict load factor exceeds the safe load factor. Add test cases to verify we don't allow main db to expand when left memory is not enough, so that avoid keys eviction. Other changes: For new hash table size when expand. Before this commit, the size is that double used of dict and later _dictNextPower. Actually we aim to control a dict load factor between 0.5 and 1.0. Now we replace *2 with +1, since the first check is that used >= size, the outcome of before will usually be the same as _dictNextPower(used+1). The only case where it'll differ is when dict_can_resize is false during fork, so that later the _dictNextPower(used*2) will cause the dict to jump to *4 (i.e. _dictNextPower(1025*2) will return 4096). Fix rehash test cases due to changing algorithm of new hash table size when expand.
2020-12-06 10:53:04 +01:00
#define HASHTABLE_MAX_LOAD_FACTOR 1.618 /* Maximum hash table load factor. */
Auto-generate the command table from JSON files (#9656) Delete the hardcoded command table and replace it with an auto-generated table, based on a JSON file that describes the commands (each command must have a JSON file). These JSON files are the SSOT of everything there is to know about Redis commands, and it is reflected fully in COMMAND INFO. These JSON files are used to generate commands.c (using a python script), which is then committed to the repo and compiled. The purpose is: * Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic. * drop the dependency between Redis-user and the commands.json in redis-doc. * delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be done in a separate PR) * redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release artifacts should be a large JSON, containing all the information about all of the commands, which will be generated from COMMAND's reply) * the byproduct of this is: * module commands will be able to provide that info and possibly be more of a first-class citizens * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info. ### Interface changes #### COMMAND INFO's reply change (and arg-less COMMAND) Before this commit the reply at index 7 contained the key-specs list and reply at index 8 contained the sub-commands list (Both unreleased). Now, reply at index 7 is a map of: - summary - short command description - since - debut version - group - command group - complexity - complexity string - doc-flags - flags used for documentation (e.g. "deprecated") - deprecated-since - if deprecated, from which version? - replaced-by - if deprecated, which command replaced it? - history - a list of (version, what-changed) tuples - hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876 - arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments) - key-specs - an array of keys specs (already in unstable, just changed location) - subcommands - a list of sub-commands (already in unstable, just changed location) - reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845) more details on these can be found in https://github.com/redis/redis-doc/pull/1697 only the first three fields are mandatory #### API changes (unreleased API obviously) now they take RedisModuleCommand opaque pointer instead of looking up the command by name - RM_CreateSubcommand - RM_AddCommandKeySpec - RM_SetCommandKeySpecBeginSearchIndex - RM_SetCommandKeySpecBeginSearchKeyword - RM_SetCommandKeySpecFindKeysRange - RM_SetCommandKeySpecFindKeysKeynum Currently, we did not add module API to provide additional information about their commands because we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944. ### Somehow related changes 1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command will be documented with M|KM|FT|MI and can take both lowercase and uppercase ### Unrelated changes 1. Bugfix: no_madaory_keys was absent in COMMAND's reply 2. expose CMD_MODULE as "module" via COMMAND 3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags) Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 20:23:15 +01:00
/* Command flags. Please check the definition of struct redisCommand in this file
* for more information about the meaning of every flag. */
Auto-generate the command table from JSON files (#9656) Delete the hardcoded command table and replace it with an auto-generated table, based on a JSON file that describes the commands (each command must have a JSON file). These JSON files are the SSOT of everything there is to know about Redis commands, and it is reflected fully in COMMAND INFO. These JSON files are used to generate commands.c (using a python script), which is then committed to the repo and compiled. The purpose is: * Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic. * drop the dependency between Redis-user and the commands.json in redis-doc. * delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be done in a separate PR) * redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release artifacts should be a large JSON, containing all the information about all of the commands, which will be generated from COMMAND's reply) * the byproduct of this is: * module commands will be able to provide that info and possibly be more of a first-class citizens * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info. ### Interface changes #### COMMAND INFO's reply change (and arg-less COMMAND) Before this commit the reply at index 7 contained the key-specs list and reply at index 8 contained the sub-commands list (Both unreleased). Now, reply at index 7 is a map of: - summary - short command description - since - debut version - group - command group - complexity - complexity string - doc-flags - flags used for documentation (e.g. "deprecated") - deprecated-since - if deprecated, from which version? - replaced-by - if deprecated, which command replaced it? - history - a list of (version, what-changed) tuples - hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876 - arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments) - key-specs - an array of keys specs (already in unstable, just changed location) - subcommands - a list of sub-commands (already in unstable, just changed location) - reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845) more details on these can be found in https://github.com/redis/redis-doc/pull/1697 only the first three fields are mandatory #### API changes (unreleased API obviously) now they take RedisModuleCommand opaque pointer instead of looking up the command by name - RM_CreateSubcommand - RM_AddCommandKeySpec - RM_SetCommandKeySpecBeginSearchIndex - RM_SetCommandKeySpecBeginSearchKeyword - RM_SetCommandKeySpecFindKeysRange - RM_SetCommandKeySpecFindKeysKeynum Currently, we did not add module API to provide additional information about their commands because we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944. ### Somehow related changes 1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command will be documented with M|KM|FT|MI and can take both lowercase and uppercase ### Unrelated changes 1. Bugfix: no_madaory_keys was absent in COMMAND's reply 2. expose CMD_MODULE as "module" via COMMAND 3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags) Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 20:23:15 +01:00
#define CMD_WRITE (1ULL<<0)
#define CMD_READONLY (1ULL<<1)
#define CMD_DENYOOM (1ULL<<2)
2019-01-23 16:47:29 +01:00
#define CMD_MODULE (1ULL<<3) /* Command exported by module. */
Auto-generate the command table from JSON files (#9656) Delete the hardcoded command table and replace it with an auto-generated table, based on a JSON file that describes the commands (each command must have a JSON file). These JSON files are the SSOT of everything there is to know about Redis commands, and it is reflected fully in COMMAND INFO. These JSON files are used to generate commands.c (using a python script), which is then committed to the repo and compiled. The purpose is: * Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic. * drop the dependency between Redis-user and the commands.json in redis-doc. * delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be done in a separate PR) * redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release artifacts should be a large JSON, containing all the information about all of the commands, which will be generated from COMMAND's reply) * the byproduct of this is: * module commands will be able to provide that info and possibly be more of a first-class citizens * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info. ### Interface changes #### COMMAND INFO's reply change (and arg-less COMMAND) Before this commit the reply at index 7 contained the key-specs list and reply at index 8 contained the sub-commands list (Both unreleased). Now, reply at index 7 is a map of: - summary - short command description - since - debut version - group - command group - complexity - complexity string - doc-flags - flags used for documentation (e.g. "deprecated") - deprecated-since - if deprecated, from which version? - replaced-by - if deprecated, which command replaced it? - history - a list of (version, what-changed) tuples - hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876 - arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments) - key-specs - an array of keys specs (already in unstable, just changed location) - subcommands - a list of sub-commands (already in unstable, just changed location) - reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845) more details on these can be found in https://github.com/redis/redis-doc/pull/1697 only the first three fields are mandatory #### API changes (unreleased API obviously) now they take RedisModuleCommand opaque pointer instead of looking up the command by name - RM_CreateSubcommand - RM_AddCommandKeySpec - RM_SetCommandKeySpecBeginSearchIndex - RM_SetCommandKeySpecBeginSearchKeyword - RM_SetCommandKeySpecFindKeysRange - RM_SetCommandKeySpecFindKeysKeynum Currently, we did not add module API to provide additional information about their commands because we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944. ### Somehow related changes 1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command will be documented with M|KM|FT|MI and can take both lowercase and uppercase ### Unrelated changes 1. Bugfix: no_madaory_keys was absent in COMMAND's reply 2. expose CMD_MODULE as "module" via COMMAND 3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags) Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 20:23:15 +01:00
#define CMD_ADMIN (1ULL<<4)
#define CMD_PUBSUB (1ULL<<5)
#define CMD_NOSCRIPT (1ULL<<6)
Add command tips to COMMAND DOCS (#10104) Adding command tips (see https://redis.io/topics/command-tips) to commands. Breaking changes: 1. Removed the "random" and "sort_for_script" flags. They are now command tips. (this isn't affecting redis behavior since #9812, but could affect some client applications that's relying on COMMAND command flags) Summary of changes: 1. add BLOCKING flag (new flag) for all commands that could block. The ACL category with the same name is now implicit. 2. move RANDOM flag to a `nondeterministic_output` tip 3. move SORT_FOR_SCRIPT flag to `nondeterministic_output_order` tip 3. add REQUEST_POLICY and RESPONSE_POLICY where appropriate as documented in the tips 4. deprecate (ignore) the `random` flag for RM_CreateCommand Other notes: 1. Proxies need to send `RANDOMKEY` to all shards and then select one key randomly. The other option is to pick a random shard and transfer `RANDOMKEY `to it, but that scheme fails if this specific shard is empty 2. Remove CMD_RANDOM from `XACK` (i.e. XACK does not have RANDOM_OUTPUT) It was added in 9e4fb96ca12476b1c7468b143efca86b478bfb4a, I guess by mistake. Also from `(P)EXPIRETIME` (new command, was flagged "random" by mistake). 3. Add `nondeterministic_output` to `OBJECT ENCODING` (for the same reason `XTRIM` has it: the reply may differ depending on the internal representation in memory) 4. RANDOM on `HGETALL` was wrong (there due to a limitation of the old script sorting logic), now it's `nondeterministic_output_order` 5. Unrelated: Hide CMD_PROTECTED from COMMAND
2022-01-20 10:32:11 +01:00
#define CMD_BLOCKING (1ULL<<8) /* Has potential to block. */
Auto-generate the command table from JSON files (#9656) Delete the hardcoded command table and replace it with an auto-generated table, based on a JSON file that describes the commands (each command must have a JSON file). These JSON files are the SSOT of everything there is to know about Redis commands, and it is reflected fully in COMMAND INFO. These JSON files are used to generate commands.c (using a python script), which is then committed to the repo and compiled. The purpose is: * Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic. * drop the dependency between Redis-user and the commands.json in redis-doc. * delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be done in a separate PR) * redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release artifacts should be a large JSON, containing all the information about all of the commands, which will be generated from COMMAND's reply) * the byproduct of this is: * module commands will be able to provide that info and possibly be more of a first-class citizens * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info. ### Interface changes #### COMMAND INFO's reply change (and arg-less COMMAND) Before this commit the reply at index 7 contained the key-specs list and reply at index 8 contained the sub-commands list (Both unreleased). Now, reply at index 7 is a map of: - summary - short command description - since - debut version - group - command group - complexity - complexity string - doc-flags - flags used for documentation (e.g. "deprecated") - deprecated-since - if deprecated, from which version? - replaced-by - if deprecated, which command replaced it? - history - a list of (version, what-changed) tuples - hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876 - arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments) - key-specs - an array of keys specs (already in unstable, just changed location) - subcommands - a list of sub-commands (already in unstable, just changed location) - reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845) more details on these can be found in https://github.com/redis/redis-doc/pull/1697 only the first three fields are mandatory #### API changes (unreleased API obviously) now they take RedisModuleCommand opaque pointer instead of looking up the command by name - RM_CreateSubcommand - RM_AddCommandKeySpec - RM_SetCommandKeySpecBeginSearchIndex - RM_SetCommandKeySpecBeginSearchKeyword - RM_SetCommandKeySpecFindKeysRange - RM_SetCommandKeySpecFindKeysKeynum Currently, we did not add module API to provide additional information about their commands because we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944. ### Somehow related changes 1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command will be documented with M|KM|FT|MI and can take both lowercase and uppercase ### Unrelated changes 1. Bugfix: no_madaory_keys was absent in COMMAND's reply 2. expose CMD_MODULE as "module" via COMMAND 3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags) Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 20:23:15 +01:00
#define CMD_LOADING (1ULL<<9)
#define CMD_STALE (1ULL<<10)
#define CMD_SKIP_MONITOR (1ULL<<11)
#define CMD_SKIP_SLOWLOG (1ULL<<12)
#define CMD_ASKING (1ULL<<13)
#define CMD_FAST (1ULL<<14)
#define CMD_NO_AUTH (1ULL<<15)
#define CMD_MAY_REPLICATE (1ULL<<16)
#define CMD_SENTINEL (1ULL<<17)
#define CMD_ONLY_SENTINEL (1ULL<<18)
#define CMD_NO_MANDATORY_KEYS (1ULL<<19)
#define CMD_PROTECTED (1ULL<<20)
#define CMD_MODULE_GETKEYS (1ULL<<21) /* Use the modules getkeys interface. */
#define CMD_MODULE_NO_CLUSTER (1ULL<<22) /* Deny on Redis Cluster. */
Allow most CONFIG SET during loading, block some commands in async-loading (#9878) ## background Till now CONFIG SET was blocked during loading. (In the not so distant past, GET was disallowed too) We recently (not released yet) added an async-loading mode, see #9323, and during that time it'll serve CONFIG SET and any other command. And now we realized (#9770) that some configs, and commands are dangerous during async-loading. ## changes * Allow most CONFIG SET during loading (both on async-loading and normal loading) * Allow CONFIG REWRITE and CONFIG RESETSTAT during loading * Block a few config during loading (`appendonly`, `repl-diskless-load`, and `dir`) * Block a few commands during loading (list below) ## the blocked commands: * SAVE - obviously we don't wanna start a foregreound save during loading 8-) * BGSAVE - we don't mind to schedule one, but we don't wanna fork now * BGREWRITEAOF - we don't mind to schedule one, but we don't wanna fork now * MODULE - we obviously don't wanna unload a module during replication / rdb loading (MODULE HELP and MODULE LIST are not blocked) * SYNC / PSYNC - we're in the middle of RDB loading from master, must not allow sync requests now. * REPLICAOF / SLAVEOF - we're in the middle of replicating, maybe it makes sense to let the user abort it, but he couldn't do that so far, i don't wanna take any risk of bugs due to odd state. * CLUSTER - only allow [HELP, SLOTS, NODES, INFO, MYID, LINKS, KEYSLOT, COUNTKEYSINSLOT, GETKEYSINSLOT, RESET, REPLICAS, COUNT_FAILURE_REPORTS], for others, preserve the status quo ## other fixes * processEventsWhileBlocked had an issue when being nested, this could happen with a busy script during async loading (new), but also in a busy script during AOF loading (old). this lead to a crash in the scenario described in #6988
2021-12-22 13:11:16 +01:00
#define CMD_NO_ASYNC_LOADING (1ULL<<23)
#define CMD_NO_MULTI (1ULL<<24)
Move doc metadata from COMMAND to COMMAND DOCS (#10056) Syntax: `COMMAND DOCS [<command name> ...]` Background: Apparently old version of hiredis (and thus also redis-cli) can't support more than 7 levels of multi-bulk nesting. The solution is to move all the doc related metadata from COMMAND to a new COMMAND DOCS sub-command. The new DOCS sub-command returns a map of commands (not an array like in COMMAND), And the same goes for the `subcommands` field inside it (also contains a map) Besides that, the remaining new fields of COMMAND (hints, key-specs, and sub-commands), are placed in the outer array rather than a nested map. this was done mainly for consistency with the old format. Other changes: --- * Allow COMMAND INFO with no arguments, which returns all commands, so that we can some day deprecated the plain COMMAND (no args) * Reduce the amount of deferred replies from both COMMAND and COMMAND DOCS, especially in the inner loops, since these create many small reply objects, which lead to many small write syscalls and many small TCP packets. To make this easier, when populating the command table, we count the history, args, and hints so we later know their size in advance. Additionally, the movablekeys flag was moved into the flags register. * Update generate-commands-json.py to take the data from both command, it now executes redis-cli directly, instead of taking input from stdin. * Sub-commands in both COMMAND (and COMMAND INFO), and also COMMAND DOCS, show their full name. i.e. CONFIG * GET will be shown as `config|get` rather than just `get`. This will be visible both when asking for `COMMAND INFO config` and COMMAND INFO config|get`, but is especially important for the later. i.e. imagine someone doing `COMMAND INFO slowlog|get config|get` not being able to distinguish between the two items in the array response.
2022-01-11 16:16:16 +01:00
#define CMD_MOVABLE_KEYS (1ULL<<25) /* populated by populateCommandMovableKeys */
#define CMD_ALLOW_BUSY ((1ULL<<26))
#define CMD_MODULE_GETCHANNELS (1ULL<<27) /* Use the modules getchannels interface. */
2019-01-23 16:47:29 +01:00
/* Command flags that describe ACLs categories. */
Auto-generate the command table from JSON files (#9656) Delete the hardcoded command table and replace it with an auto-generated table, based on a JSON file that describes the commands (each command must have a JSON file). These JSON files are the SSOT of everything there is to know about Redis commands, and it is reflected fully in COMMAND INFO. These JSON files are used to generate commands.c (using a python script), which is then committed to the repo and compiled. The purpose is: * Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic. * drop the dependency between Redis-user and the commands.json in redis-doc. * delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be done in a separate PR) * redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release artifacts should be a large JSON, containing all the information about all of the commands, which will be generated from COMMAND's reply) * the byproduct of this is: * module commands will be able to provide that info and possibly be more of a first-class citizens * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info. ### Interface changes #### COMMAND INFO's reply change (and arg-less COMMAND) Before this commit the reply at index 7 contained the key-specs list and reply at index 8 contained the sub-commands list (Both unreleased). Now, reply at index 7 is a map of: - summary - short command description - since - debut version - group - command group - complexity - complexity string - doc-flags - flags used for documentation (e.g. "deprecated") - deprecated-since - if deprecated, from which version? - replaced-by - if deprecated, which command replaced it? - history - a list of (version, what-changed) tuples - hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876 - arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments) - key-specs - an array of keys specs (already in unstable, just changed location) - subcommands - a list of sub-commands (already in unstable, just changed location) - reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845) more details on these can be found in https://github.com/redis/redis-doc/pull/1697 only the first three fields are mandatory #### API changes (unreleased API obviously) now they take RedisModuleCommand opaque pointer instead of looking up the command by name - RM_CreateSubcommand - RM_AddCommandKeySpec - RM_SetCommandKeySpecBeginSearchIndex - RM_SetCommandKeySpecBeginSearchKeyword - RM_SetCommandKeySpecFindKeysRange - RM_SetCommandKeySpecFindKeysKeynum Currently, we did not add module API to provide additional information about their commands because we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944. ### Somehow related changes 1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command will be documented with M|KM|FT|MI and can take both lowercase and uppercase ### Unrelated changes 1. Bugfix: no_madaory_keys was absent in COMMAND's reply 2. expose CMD_MODULE as "module" via COMMAND 3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags) Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 20:23:15 +01:00
#define ACL_CATEGORY_KEYSPACE (1ULL<<0)
#define ACL_CATEGORY_READ (1ULL<<1)
#define ACL_CATEGORY_WRITE (1ULL<<2)
#define ACL_CATEGORY_SET (1ULL<<3)
#define ACL_CATEGORY_SORTEDSET (1ULL<<4)
#define ACL_CATEGORY_LIST (1ULL<<5)
#define ACL_CATEGORY_HASH (1ULL<<6)
#define ACL_CATEGORY_STRING (1ULL<<7)
#define ACL_CATEGORY_BITMAP (1ULL<<8)
#define ACL_CATEGORY_HYPERLOGLOG (1ULL<<9)
#define ACL_CATEGORY_GEO (1ULL<<10)
#define ACL_CATEGORY_STREAM (1ULL<<11)
#define ACL_CATEGORY_PUBSUB (1ULL<<12)
#define ACL_CATEGORY_ADMIN (1ULL<<13)
#define ACL_CATEGORY_FAST (1ULL<<14)
#define ACL_CATEGORY_SLOW (1ULL<<15)
#define ACL_CATEGORY_BLOCKING (1ULL<<16)
#define ACL_CATEGORY_DANGEROUS (1ULL<<17)
#define ACL_CATEGORY_CONNECTION (1ULL<<18)
#define ACL_CATEGORY_TRANSACTION (1ULL<<19)
#define ACL_CATEGORY_SCRIPTING (1ULL<<20)
New detailed key-spec flags (RO, RW, OW, RM, ACCESS, UPDATE, INSERT, DELETE) (#10122) The new ACL key based permissions in #9974 require the key-specs (#8324) to have more explicit flags rather than just READ and WRITE. See discussion in #10040 This PR defines two groups of flags: One about how redis internally handles the key (mutually-exclusive). The other is about the logical operation done from the user's point of view (3 mutually exclusive write flags, and one read flag, all optional). In both groups, if we can't explicitly flag something as explicit read-only, delete-only, or insert-only, we flag it as `RW` or `UPDATE`. here's the definition from the code: ``` /* Key-spec flags * * -------------- */ /* The following refer what the command actually does with the value or metadata * of the key, and not necessarily the user data or how it affects it. * Each key-spec may must have exaclty one of these. Any operation that's not * distinctly deletion, overwrite or read-only would be marked as RW. */ #define CMD_KEY_RO (1ULL<<0) /* Read-Only - Reads the value of the key, but * doesn't necessarily returns it. */ #define CMD_KEY_RW (1ULL<<1) /* Read-Write - Modifies the data stored in the * value of the key or its metadata. */ #define CMD_KEY_OW (1ULL<<2) /* Overwrite - Overwrites the data stored in * the value of the key. */ #define CMD_KEY_RM (1ULL<<3) /* Deletes the key. */ /* The follwing refer to user data inside the value of the key, not the metadata * like LRU, type, cardinality. It refers to the logical operation on the user's * data (actual input strings / TTL), being used / returned / copied / changed, * It doesn't refer to modification or returning of metadata (like type, count, * presence of data). Any write that's not INSERT or DELETE, would be an UPADTE. * Each key-spec may have one of the writes with or without access, or none: */ #define CMD_KEY_ACCESS (1ULL<<4) /* Returns, copies or uses the user data from * the value of the key. */ #define CMD_KEY_UPDATE (1ULL<<5) /* Updates data to the value, new value may * depend on the old value. */ #define CMD_KEY_INSERT (1ULL<<6) /* Adds data to the value with no chance of, * modification or deletion of existing data. */ #define CMD_KEY_DELETE (1ULL<<7) /* Explicitly deletes some content * from the value of the key. */ ``` Unrelated changes: - generate-command-code.py is only compatible with python3 (modified the shabang) - generate-command-code.py print file on json parsing error - rename `shard_channel` key-spec flag to just `channel`. - add INCOMPLETE flag in input spec of SORT and SORT_RO
2022-01-18 15:00:00 +01:00
/* Key-spec flags *
* -------------- */
/* The following refer what the command actually does with the value or metadata
* of the key, and not necessarily the user data or how it affects it.
* Each key-spec may must have exactly one of these. Any operation that's not
* distinctly deletion, overwrite or read-only would be marked as RW. */
#define CMD_KEY_RO (1ULL<<0) /* Read-Only - Reads the value of the key, but
* doesn't necessarily returns it. */
#define CMD_KEY_RW (1ULL<<1) /* Read-Write - Modifies the data stored in the
* value of the key or its metadata. */
#define CMD_KEY_OW (1ULL<<2) /* Overwrite - Overwrites the data stored in
* the value of the key. */
#define CMD_KEY_RM (1ULL<<3) /* Deletes the key. */
/* The following refer to user data inside the value of the key, not the metadata
* like LRU, type, cardinality. It refers to the logical operation on the user's
* data (actual input strings / TTL), being used / returned / copied / changed,
* It doesn't refer to modification or returning of metadata (like type, count,
* presence of data). Any write that's not INSERT or DELETE, would be an UPDATE.
* Each key-spec may have one of the writes with or without access, or none: */
#define CMD_KEY_ACCESS (1ULL<<4) /* Returns, copies or uses the user data from
* the value of the key. */
#define CMD_KEY_UPDATE (1ULL<<5) /* Updates data to the value, new value may
* depend on the old value. */
#define CMD_KEY_INSERT (1ULL<<6) /* Adds data to the value with no chance of
* modification or deletion of existing data. */
#define CMD_KEY_DELETE (1ULL<<7) /* Explicitly deletes some content
* from the value of the key. */
/* Other flags: */
#define CMD_KEY_NOT_KEY (1ULL<<8) /* A 'fake' key that should be routed
* like a key in cluster mode but is
* excluded from other key checks. */
New detailed key-spec flags (RO, RW, OW, RM, ACCESS, UPDATE, INSERT, DELETE) (#10122) The new ACL key based permissions in #9974 require the key-specs (#8324) to have more explicit flags rather than just READ and WRITE. See discussion in #10040 This PR defines two groups of flags: One about how redis internally handles the key (mutually-exclusive). The other is about the logical operation done from the user's point of view (3 mutually exclusive write flags, and one read flag, all optional). In both groups, if we can't explicitly flag something as explicit read-only, delete-only, or insert-only, we flag it as `RW` or `UPDATE`. here's the definition from the code: ``` /* Key-spec flags * * -------------- */ /* The following refer what the command actually does with the value or metadata * of the key, and not necessarily the user data or how it affects it. * Each key-spec may must have exaclty one of these. Any operation that's not * distinctly deletion, overwrite or read-only would be marked as RW. */ #define CMD_KEY_RO (1ULL<<0) /* Read-Only - Reads the value of the key, but * doesn't necessarily returns it. */ #define CMD_KEY_RW (1ULL<<1) /* Read-Write - Modifies the data stored in the * value of the key or its metadata. */ #define CMD_KEY_OW (1ULL<<2) /* Overwrite - Overwrites the data stored in * the value of the key. */ #define CMD_KEY_RM (1ULL<<3) /* Deletes the key. */ /* The follwing refer to user data inside the value of the key, not the metadata * like LRU, type, cardinality. It refers to the logical operation on the user's * data (actual input strings / TTL), being used / returned / copied / changed, * It doesn't refer to modification or returning of metadata (like type, count, * presence of data). Any write that's not INSERT or DELETE, would be an UPADTE. * Each key-spec may have one of the writes with or without access, or none: */ #define CMD_KEY_ACCESS (1ULL<<4) /* Returns, copies or uses the user data from * the value of the key. */ #define CMD_KEY_UPDATE (1ULL<<5) /* Updates data to the value, new value may * depend on the old value. */ #define CMD_KEY_INSERT (1ULL<<6) /* Adds data to the value with no chance of, * modification or deletion of existing data. */ #define CMD_KEY_DELETE (1ULL<<7) /* Explicitly deletes some content * from the value of the key. */ ``` Unrelated changes: - generate-command-code.py is only compatible with python3 (modified the shabang) - generate-command-code.py print file on json parsing error - rename `shard_channel` key-spec flag to just `channel`. - add INCOMPLETE flag in input spec of SORT and SORT_RO
2022-01-18 15:00:00 +01:00
#define CMD_KEY_INCOMPLETE (1ULL<<9) /* Means that the keyspec might not point
* out to all keys it should cover */
#define CMD_KEY_VARIABLE_FLAGS (1ULL<<10) /* Means that some keys might have
* different flags depending on arguments */
Treat subcommands as commands (#9504) ## Intro The purpose is to allow having different flags/ACL categories for subcommands (Example: CONFIG GET is ok-loading but CONFIG SET isn't) We create a small command table for every command that has subcommands and each subcommand has its own flags, etc. (same as a "regular" command) This commit also unites the Redis and the Sentinel command tables ## Affected commands CONFIG Used to have "admin ok-loading ok-stale no-script" Changes: 1. Dropped "ok-loading" in all except GET (this doesn't change behavior since there were checks in the code doing that) XINFO Used to have "read-only random" Changes: 1. Dropped "random" in all except CONSUMERS XGROUP Used to have "write use-memory" Changes: 1. Dropped "use-memory" in all except CREATE and CREATECONSUMER COMMAND No changes. MEMORY Used to have "random read-only" Changes: 1. Dropped "random" in PURGE and USAGE ACL Used to have "admin no-script ok-loading ok-stale" Changes: 1. Dropped "admin" in WHOAMI, GENPASS, and CAT LATENCY No changes. MODULE No changes. SLOWLOG Used to have "admin random ok-loading ok-stale" Changes: 1. Dropped "random" in RESET OBJECT Used to have "read-only random" Changes: 1. Dropped "random" in ENCODING and REFCOUNT SCRIPT Used to have "may-replicate no-script" Changes: 1. Dropped "may-replicate" in all except FLUSH and LOAD CLIENT Used to have "admin no-script random ok-loading ok-stale" Changes: 1. Dropped "random" in all except INFO and LIST 2. Dropped "admin" in ID, TRACKING, CACHING, GETREDIR, INFO, SETNAME, GETNAME, and REPLY STRALGO No changes. PUBSUB No changes. CLUSTER Changes: 1. Dropped "admin in countkeysinslots, getkeysinslot, info, nodes, keyslot, myid, and slots SENTINEL No changes. (note that DEBUG also fits, but we decided not to convert it since it's for debugging and anyway undocumented) ## New sub-command This commit adds another element to the per-command output of COMMAND, describing the list of subcommands, if any (in the same structure as "regular" commands) Also, it adds a new subcommand: ``` COMMAND LIST [FILTERBY (MODULE <module-name>|ACLCAT <cat>|PATTERN <pattern>)] ``` which returns a set of all commands (unless filters), but excluding subcommands. ## Module API A new module API, RM_CreateSubcommand, was added, in order to allow module writer to define subcommands ## ACL changes: 1. Now, that each subcommand is actually a command, each has its own ACL id. 2. The old mechanism of allowed_subcommands is redundant (blocking/allowing a subcommand is the same as blocking/allowing a regular command), but we had to keep it, to support the widespread usage of allowed_subcommands to block commands with certain args, that aren't subcommands (e.g. "-select +select|0"). 3. I have renamed allowed_subcommands to allowed_firstargs to emphasize the difference. 4. Because subcommands are commands in ACL too, you can now use "-" to block subcommands (e.g. "+client -client|kill"), which wasn't possible in the past. 5. It is also possible to use the allowed_firstargs mechanism with subcommand. For example: `+config -config|set +config|set|loglevel` will block all CONFIG SET except for setting the log level. 6. All of the ACL changes above required some amount of refactoring. ## Misc 1. There are two approaches: Either each subcommand has its own function or all subcommands use the same function, determining what to do according to argv[0]. For now, I took the former approaches only with CONFIG and COMMAND, while other commands use the latter approach (for smaller blamelog diff). 2. Deleted memoryGetKeys: It is no longer needed because MEMORY USAGE now uses the "range" key spec. 4. Bugfix: GETNAME was missing from CLIENT's help message. 5. Sentinel and Redis now use the same table, with the same function pointer. Some commands have a different implementation in Sentinel, so we redirect them (these are ROLE, PUBLISH, and INFO). 6. Command stats now show the stats per subcommand (e.g. instead of stats just for "config" you will have stats for "config|set", "config|get", etc.) 7. It is now possible to use COMMAND directly on subcommands: COMMAND INFO CONFIG|GET (The pipeline syntax was inspired from ACL, and can be used in functions lookupCommandBySds and lookupCommandByCString) 8. STRALGO is now a container command (has "help") ## Breaking changes: 1. Command stats now show the stats per subcommand (see (5) above)
2021-10-20 10:52:57 +02:00
/* Key flags for when access type is unknown */
#define CMD_KEY_FULL_ACCESS (CMD_KEY_RW | CMD_KEY_ACCESS | CMD_KEY_UPDATE)
/* Channel flags share the same flag space as the key flags */
#define CMD_CHANNEL_PATTERN (1ULL<<11) /* The argument is a channel pattern */
#define CMD_CHANNEL_SUBSCRIBE (1ULL<<12) /* The command subscribes to channels */
#define CMD_CHANNEL_UNSUBSCRIBE (1ULL<<13) /* The command unsubscribes to channels */
#define CMD_CHANNEL_PUBLISH (1ULL<<14) /* The command publishes to channels. */
/* AOF states */
2015-07-27 09:41:48 +02:00
#define AOF_OFF 0 /* AOF is off */
#define AOF_ON 1 /* AOF is on */
#define AOF_WAIT_REWRITE 2 /* AOF waits rewrite to start appending */
/* AOF return values for loadAppendOnlyFile() */
#define AOF_OK 0
#define AOF_NOT_EXIST 1
#define AOF_EMPTY 2
#define AOF_OPEN_ERR 3
#define AOF_FAILED 4
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788) Implement Multi-Part AOF mechanism to avoid overheads during AOFRW. Introducing a folder with multiple AOF files tracked by a manifest file. The main issues with the the original AOFRW mechanism are: * buffering of commands that are processed during rewrite (consuming a lot of RAM) * freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it. * double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files) The main modifications of this PR: 1. Remove the AOF rewrite buffer and related code. 2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type, it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the incremental commands since the last AOFRW. 3. Use a AOF manifest file to record and manage these AOF files mentioned above. 4. The original configuration of `appendfilename` will be the base part of the new file name, for example: `appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof` 5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename` 6. Remove the `aof_rewrite_buffer_length` field in info. 7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs. It also gives users the opportunity to preserve the history AOFs. just for testing use now. 8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now), we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately. 9. Support upgrade (load) data from old version redis. 10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and manifest file will be placed in this directory. 11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if `aof-load-truncated` is enabled. Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-03 18:14:13 +01:00
#define AOF_TRUNCATED 5
Auto-generate the command table from JSON files (#9656) Delete the hardcoded command table and replace it with an auto-generated table, based on a JSON file that describes the commands (each command must have a JSON file). These JSON files are the SSOT of everything there is to know about Redis commands, and it is reflected fully in COMMAND INFO. These JSON files are used to generate commands.c (using a python script), which is then committed to the repo and compiled. The purpose is: * Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic. * drop the dependency between Redis-user and the commands.json in redis-doc. * delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be done in a separate PR) * redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release artifacts should be a large JSON, containing all the information about all of the commands, which will be generated from COMMAND's reply) * the byproduct of this is: * module commands will be able to provide that info and possibly be more of a first-class citizens * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info. ### Interface changes #### COMMAND INFO's reply change (and arg-less COMMAND) Before this commit the reply at index 7 contained the key-specs list and reply at index 8 contained the sub-commands list (Both unreleased). Now, reply at index 7 is a map of: - summary - short command description - since - debut version - group - command group - complexity - complexity string - doc-flags - flags used for documentation (e.g. "deprecated") - deprecated-since - if deprecated, from which version? - replaced-by - if deprecated, which command replaced it? - history - a list of (version, what-changed) tuples - hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876 - arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments) - key-specs - an array of keys specs (already in unstable, just changed location) - subcommands - a list of sub-commands (already in unstable, just changed location) - reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845) more details on these can be found in https://github.com/redis/redis-doc/pull/1697 only the first three fields are mandatory #### API changes (unreleased API obviously) now they take RedisModuleCommand opaque pointer instead of looking up the command by name - RM_CreateSubcommand - RM_AddCommandKeySpec - RM_SetCommandKeySpecBeginSearchIndex - RM_SetCommandKeySpecBeginSearchKeyword - RM_SetCommandKeySpecFindKeysRange - RM_SetCommandKeySpecFindKeysKeynum Currently, we did not add module API to provide additional information about their commands because we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944. ### Somehow related changes 1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command will be documented with M|KM|FT|MI and can take both lowercase and uppercase ### Unrelated changes 1. Bugfix: no_madaory_keys was absent in COMMAND's reply 2. expose CMD_MODULE as "module" via COMMAND 3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags) Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 20:23:15 +01:00
/* Command doc flags */
#define CMD_DOC_NONE 0
#define CMD_DOC_DEPRECATED (1<<0) /* Command is deprecated */
#define CMD_DOC_SYSCMD (1<<1) /* System (internal) command */
/* Client flags */
2020-10-18 07:11:18 +02:00
#define CLIENT_SLAVE (1<<0) /* This client is a replica */
#define CLIENT_MASTER (1<<1) /* This client is a master */
2015-07-27 09:41:48 +02:00
#define CLIENT_MONITOR (1<<2) /* This client is a slave monitor, see MONITOR */
#define CLIENT_MULTI (1<<3) /* This client is in a MULTI context */
#define CLIENT_BLOCKED (1<<4) /* The client is waiting in a blocking operation */
#define CLIENT_DIRTY_CAS (1<<5) /* Watched keys modified. EXEC will fail. */
#define CLIENT_CLOSE_AFTER_REPLY (1<<6) /* Close after writing entire reply. */
#define CLIENT_UNBLOCKED (1<<7) /* This client was unblocked and is stored in
server.unblocked_clients */
#define CLIENT_SCRIPT (1<<8) /* This is a non connected client used by Lua */
2015-07-27 09:41:48 +02:00
#define CLIENT_ASKING (1<<9) /* Client issued the ASKING command */
#define CLIENT_CLOSE_ASAP (1<<10)/* Close this client ASAP */
#define CLIENT_UNIX_SOCKET (1<<11) /* Client connected via Unix domain socket */
#define CLIENT_DIRTY_EXEC (1<<12) /* EXEC will fail for errors while queueing */
#define CLIENT_MASTER_FORCE_REPLY (1<<13) /* Queue replies even if is master */
#define CLIENT_FORCE_AOF (1<<14) /* Force AOF propagation of current cmd. */
#define CLIENT_FORCE_REPL (1<<15) /* Force replication of current cmd. */
#define CLIENT_PRE_PSYNC (1<<16) /* Instance don't understand PSYNC. */
#define CLIENT_READONLY (1<<17) /* Cluster client is in read-only state. */
#define CLIENT_PUBSUB (1<<18) /* Client is in Pub/Sub mode. */
#define CLIENT_PREVENT_AOF_PROP (1<<19) /* Don't propagate to AOF. */
#define CLIENT_PREVENT_REPL_PROP (1<<20) /* Don't propagate to slaves. */
#define CLIENT_PREVENT_PROP (CLIENT_PREVENT_AOF_PROP|CLIENT_PREVENT_REPL_PROP)
#define CLIENT_PENDING_WRITE (1<<21) /* Client has output to send but a write
handler is yet not installed. */
#define CLIENT_REPLY_OFF (1<<22) /* Don't send replies to client. */
#define CLIENT_REPLY_SKIP_NEXT (1<<23) /* Set CLIENT_REPLY_SKIP for next cmd */
#define CLIENT_REPLY_SKIP (1<<24) /* Don't send just this reply. */
#define CLIENT_LUA_DEBUG (1<<25) /* Run EVAL in debug mode. */
2015-11-06 16:19:59 +01:00
#define CLIENT_LUA_DEBUG_SYNC (1<<26) /* EVAL debugging without fork() */
2016-03-06 13:44:24 +01:00
#define CLIENT_MODULE (1<<27) /* Non connected client used by some module. */
#define CLIENT_PROTECTED (1<<28) /* Client should not be freed for now. */
Client eviction (#8687) ### Description A mechanism for disconnecting clients when the sum of all connected clients is above a configured limit. This prevents eviction or OOM caused by accumulated used memory between all clients. It's a complimentary mechanism to the `client-output-buffer-limit` mechanism which takes into account not only a single client and not only output buffers but rather all memory used by all clients. #### Design The general design is as following: * We track memory usage of each client, taking into account all memory used by the client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date after reading from the socket, after processing commands and after writing to the socket. * Based on the used memory we sort all clients into buckets. Each bucket contains all clients using up up to x2 memory of the clients in the bucket below it. For example up to 1m clients, up to 2m clients, up to 4m clients, ... * Before processing a command and before sleep we check if we're over the configured limit. If we are we start disconnecting clients from larger buckets downwards until we're under the limit. #### Config `maxmemory-clients` max memory all clients are allowed to consume, above this threshold we disconnect clients. This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%` would mean 10% of `maxmemory`). #### Important code changes * During the development I encountered yet more situations where our io-threads access global vars. And needed to fix them. I also had to handle keeps the clients sorted into the memory buckets (which are global) while their memory usage changes in the io-thread. To achieve this I decided to simplify how we check if we're in an io-thread and make it much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking if the client is in an io-thread (it wasn't used for anything else) and just used the global `io_threads_op` variable the same way to check during writes. * I optimized the cleanup of the client from the `clients_pending_read` list on client freeing. We now store a pointer in the `client` struct to this list so we don't need to search in it (`pending_read_list_node`). * Added `evicted_clients` stat to `INFO` command. * Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the client eviction mechanism. Added corrosponding 'e' flag in the client info string. * Added `multi-mem` field in the client info string to show how much memory is used up by buffered multi commands. * Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and channels (partially), tracking prefixes (partially). * CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so clients will be disconnected between processing different clients and not only before sleep. This new function can be used in the future for work we want to do outside the command processing loop but don't want to wait for all clients to be processed before we get to it. Specifically I wanted to handle output-buffer-limit related closing before we process client eviction in case the two race with each other. * Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction buckets. * Each client now holds a pointer to the client eviction memory usage bucket it belongs to and listNode to itself in that bucket for quick removal. * Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value indicating no io-threading is currently being executed. * In order to track memory used by each clients in real-time we can't rely on updating these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()` (used to be `clientsCronTrackClientsMemUsage()`) after command processing, after writing data to pubsub clients, after writing the output buffer and after reading from the socket (and maybe other places too). The function is written to be fast. * Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before processing a command (before performing oom-checks and key-eviction). * All clients memory usage buckets are grouped as follows: * All clients using less than 64k. * 64K..128K * 128K..256K * ... * 2G..4G * All clients using 4g and up. * Added client-eviction.tcl with a bunch of tests for the new mechanism. * Extended maxmemory.tcl to test the interaction between maxmemory and maxmemory-clients settings. * Added an option to flag a numeric configuration variable as a "percent", this means that if we encounter a '%' after the number in the config file (or config set command) we consider it as valid. Such a number is store internally as a negative value. This way an integer value can be interpreted as either a percent (negative) or absolute value (positive). This is useful for example if some numeric configuration can optionally be set to a percentage of something else. Co-authored-by: Oran Agra <oran@redislabs.com>
2021-09-23 13:02:16 +02:00
/* #define CLIENT_... (1<<29) currently unused, feel free to use in the future */
#define CLIENT_PENDING_COMMAND (1<<30) /* Indicates the client has a fully
* parsed command ready for execution. */
#define CLIENT_TRACKING (1ULL<<31) /* Client enabled keys tracking in order to
perform client side caching. */
#define CLIENT_TRACKING_BROKEN_REDIR (1ULL<<32) /* Target client is invalid. */
#define CLIENT_TRACKING_BCAST (1ULL<<33) /* Tracking in BCAST mode. */
2020-02-21 16:39:42 +01:00
#define CLIENT_TRACKING_OPTIN (1ULL<<34) /* Tracking in opt-in mode. */
#define CLIENT_TRACKING_OPTOUT (1ULL<<35) /* Tracking in opt-out mode. */
#define CLIENT_TRACKING_CACHING (1ULL<<36) /* CACHING yes/no was given,
depending on optin/optout mode. */
#define CLIENT_TRACKING_NOLOOP (1ULL<<37) /* Don't send invalidation messages
about writes performed by myself.*/
#define CLIENT_IN_TO_TABLE (1ULL<<38) /* This client is in the timeout table. */
#define CLIENT_PROTOCOL_ERROR (1ULL<<39) /* Protocol error chatting with it. */
Don't write replies if close the client ASAP (#7202) Before this commit, we would have continued to add replies to the reply buffer even if client output buffer limit is reached, so the used memory would keep increasing over the configured limit. What's more, we shouldn’t write any reply to the client if it is set 'CLIENT_CLOSE_ASAP' flag because that doesn't conform to its definition and we will close all clients flagged with 'CLIENT_CLOSE_ASAP' in ‘beforeSleep’. Because of code execution order, before this, we may firstly write to part of the replies to the socket before disconnecting it, but in fact, we may can’t send the full replies to clients since OS socket buffer is limited. But this unexpected behavior makes some commands work well, for instance ACL DELUSER, if the client deletes the current user, we need to send reply to client and close the connection, but before, we close the client firstly and write the reply to reply buffer. secondly, we shouldn't do this despite the fact it works well in most cases. We add a flag 'CLIENT_CLOSE_AFTER_COMMAND' to mark clients, this flag means we will close the client after executing commands and send all entire replies, so that we can write replies to reply buffer during executing commands, send replies to clients, and close them later. We also fix some implicit problems. If client output buffer limit is enforced in 'multi/exec', all commands will be executed completely in redis and clients will not read any reply instead of partial replies. Even more, if the client executes 'ACL deluser' the using user in 'multi/exec', it will not read the replies after 'ACL deluser' just like before executing 'client kill' itself in 'multi/exec'. We added some tests for output buffer limit breach during multi-exec and using a pipeline of many small commands rather than one with big response. Co-authored-by: Oran Agra <oran@redislabs.com>
2020-09-24 15:01:41 +02:00
#define CLIENT_CLOSE_AFTER_COMMAND (1ULL<<40) /* Close after executing commands
* and writing entire reply. */
Unified MULTI, LUA, and RM_Call with respect to blocking commands (#8025) Blocking command should not be used with MULTI, LUA, and RM_Call. This is because, the caller, who executes the command in this context, expects a reply. Today, LUA and MULTI have a special (and different) treatment to blocking commands: LUA - Most commands are marked with no-script flag which are checked when executing and command from LUA, commands that are not marked (like XREAD) verify that their blocking mode is not used inside LUA (by checking the CLIENT_LUA client flag). MULTI - Command that is going to block, first verify that the client is not inside multi (by checking the CLIENT_MULTI client flag). If the client is inside multi, they return a result which is a match to the empty key with no timeout (for example blpop inside MULTI will act as lpop) For modules that perform RM_Call with blocking command, the returned results type is REDISMODULE_REPLY_UNKNOWN and the caller can not really know what happened. Disadvantages of the current state are: No unified approach, LUA, MULTI, and RM_Call, each has a different treatment Module can not safely execute blocking command (and get reply or error). Though It is true that modules are not like LUA or MULTI and should be smarter not to execute blocking commands on RM_Call, sometimes you want to execute a command base on client input (for example if you create a module that provides a new scripting language like javascript or python). While modules (on modules command) can check for REDISMODULE_CTX_FLAGS_LUA or REDISMODULE_CTX_FLAGS_MULTI to know not to block the client, there is no way to check if the command came from another module using RM_Call. So there is no way for a module to know not to block another module RM_Call execution. This commit adds a way to unify the treatment for blocking clients by introducing a new CLIENT_DENY_BLOCKING client flag. On LUA, MULTI, and RM_Call the new flag turned on to signify that the client should not be blocked. A blocking command verifies that the flag is turned off before blocking. If a blocking command sees that the CLIENT_DENY_BLOCKING flag is on, it's not blocking and return results which are matches to empty key with no timeout (as MULTI does today). The new flag is checked on the following commands: List blocking commands: BLPOP, BRPOP, BRPOPLPUSH, BLMOVE, Zset blocking commands: BZPOPMIN, BZPOPMAX Stream blocking commands: XREAD, XREADGROUP SUBSCRIBE, PSUBSCRIBE, MONITOR In addition, the new flag is turned on inside the AOF client, we do not want to block the AOF client to prevent deadlocks and commands ordering issues (and there is also an existing assert in the code that verifies it). To keep backward compatibility on LUA, all the no-script flags on existing commands were kept untouched. In addition, a LUA special treatment on XREAD and XREADGROUP was kept. To keep backward compatibility on MULTI (which today allows SUBSCRIBE, and PSUBSCRIBE). We added a special treatment on those commands to allow executing them on MULTI. The only backward compatibility issue that this PR introduces is that now MONITOR is not allowed inside MULTI. Tests were added to verify blocking commands are not blocking the client on LUA, MULTI, or RM_Call. Tests were added to verify the module can check for CLIENT_DENY_BLOCKING flag. Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Itamar Haber <itamar@redislabs.com>
2020-11-17 17:58:55 +01:00
#define CLIENT_DENY_BLOCKING (1ULL<<41) /* Indicate that the client should not be blocked.
currently, turned on inside MULTI, Lua, RM_Call,
and AOF client */
#define CLIENT_REPL_RDBONLY (1ULL<<42) /* This client is a replica that only wants
RDB without replication buffer. */
Client eviction (#8687) ### Description A mechanism for disconnecting clients when the sum of all connected clients is above a configured limit. This prevents eviction or OOM caused by accumulated used memory between all clients. It's a complimentary mechanism to the `client-output-buffer-limit` mechanism which takes into account not only a single client and not only output buffers but rather all memory used by all clients. #### Design The general design is as following: * We track memory usage of each client, taking into account all memory used by the client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date after reading from the socket, after processing commands and after writing to the socket. * Based on the used memory we sort all clients into buckets. Each bucket contains all clients using up up to x2 memory of the clients in the bucket below it. For example up to 1m clients, up to 2m clients, up to 4m clients, ... * Before processing a command and before sleep we check if we're over the configured limit. If we are we start disconnecting clients from larger buckets downwards until we're under the limit. #### Config `maxmemory-clients` max memory all clients are allowed to consume, above this threshold we disconnect clients. This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%` would mean 10% of `maxmemory`). #### Important code changes * During the development I encountered yet more situations where our io-threads access global vars. And needed to fix them. I also had to handle keeps the clients sorted into the memory buckets (which are global) while their memory usage changes in the io-thread. To achieve this I decided to simplify how we check if we're in an io-thread and make it much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking if the client is in an io-thread (it wasn't used for anything else) and just used the global `io_threads_op` variable the same way to check during writes. * I optimized the cleanup of the client from the `clients_pending_read` list on client freeing. We now store a pointer in the `client` struct to this list so we don't need to search in it (`pending_read_list_node`). * Added `evicted_clients` stat to `INFO` command. * Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the client eviction mechanism. Added corrosponding 'e' flag in the client info string. * Added `multi-mem` field in the client info string to show how much memory is used up by buffered multi commands. * Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and channels (partially), tracking prefixes (partially). * CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so clients will be disconnected between processing different clients and not only before sleep. This new function can be used in the future for work we want to do outside the command processing loop but don't want to wait for all clients to be processed before we get to it. Specifically I wanted to handle output-buffer-limit related closing before we process client eviction in case the two race with each other. * Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction buckets. * Each client now holds a pointer to the client eviction memory usage bucket it belongs to and listNode to itself in that bucket for quick removal. * Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value indicating no io-threading is currently being executed. * In order to track memory used by each clients in real-time we can't rely on updating these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()` (used to be `clientsCronTrackClientsMemUsage()`) after command processing, after writing data to pubsub clients, after writing the output buffer and after reading from the socket (and maybe other places too). The function is written to be fast. * Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before processing a command (before performing oom-checks and key-eviction). * All clients memory usage buckets are grouped as follows: * All clients using less than 64k. * 64K..128K * 128K..256K * ... * 2G..4G * All clients using 4g and up. * Added client-eviction.tcl with a bunch of tests for the new mechanism. * Extended maxmemory.tcl to test the interaction between maxmemory and maxmemory-clients settings. * Added an option to flag a numeric configuration variable as a "percent", this means that if we encounter a '%' after the number in the config file (or config set command) we consider it as valid. Such a number is store internally as a negative value. This way an integer value can be interpreted as either a percent (negative) or absolute value (positive). This is useful for example if some numeric configuration can optionally be set to a percentage of something else. Co-authored-by: Oran Agra <oran@redislabs.com>
2021-09-23 13:02:16 +02:00
#define CLIENT_NO_EVICT (1ULL<<43) /* This client is protected against client
memory eviction. */
/* Client block type (btype field in client structure)
2015-07-27 09:41:48 +02:00
* if CLIENT_BLOCKED flag is set. */
#define BLOCKED_NONE 0 /* Not blocked, no CLIENT_BLOCKED flag set. */
#define BLOCKED_LIST 1 /* BLPOP & co. */
#define BLOCKED_WAIT 2 /* WAIT for synchronous replication. */
#define BLOCKED_MODULE 3 /* Blocked by a loadable module. */
#define BLOCKED_STREAM 4 /* XREAD. */
#define BLOCKED_ZSET 5 /* BZPOP et al. */
#define BLOCKED_POSTPONE 6 /* Blocked by processCommand, re-try processing later. */
Wait for replicas when shutting down (#9872) To avoid data loss, this commit adds a grace period for lagging replicas to catch up the replication offset. Done: * Wait for replicas when shutdown is triggered by SIGTERM and SIGINT. * Wait for replicas when shutdown is triggered by the SHUTDOWN command. A new blocked client type BLOCKED_SHUTDOWN is introduced, allowing multiple clients to call SHUTDOWN in parallel. Note that they don't expect a response unless an error happens and shutdown is aborted. * Log warning for each replica lagging behind when finishing shutdown. * CLIENT_PAUSE_WRITE while waiting for replicas. * Configurable grace period 'shutdown-timeout' in seconds (default 10). * New flags for the SHUTDOWN command: - NOW disables the grace period for lagging replicas. - FORCE ignores errors writing the RDB or AOF files which would normally prevent a shutdown. - ABORT cancels ongoing shutdown. Can't be combined with other flags. * New field in the output of the INFO command: 'shutdown_in_milliseconds'. The value is the remaining maximum time to wait for lagging replicas before finishing the shutdown. This field is present in the Server section **only** during shutdown. Not directly related: * When shutting down, if there is an AOF saving child, it is killed **even** if AOF is disabled. This can happen if BGREWRITEAOF is used when AOF is off. * Client pause now has end time and type (WRITE or ALL) per purpose. The different pause purposes are *CLIENT PAUSE command*, *failover* and *shutdown*. If clients are unpaused for one purpose, it doesn't affect client pause for other purposes. For example, the CLIENT UNPAUSE command doesn't affect client pause initiated by the failover or shutdown procedures. A completed failover or a failed shutdown doesn't unpause clients paused by the CLIENT PAUSE command. Notes: * DEBUG RESTART doesn't wait for replicas. * We already have a warning logged when a replica disconnects. This means that if any replica connection is lost during the shutdown, it is either logged as disconnected or as lagging at the time of exit. Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-02 08:50:15 +01:00
#define BLOCKED_SHUTDOWN 7 /* SHUTDOWN. */
#define BLOCKED_NUM 8 /* Number of blocked states. */
/* Client request types */
2015-07-27 09:41:48 +02:00
#define PROTO_REQ_INLINE 1
#define PROTO_REQ_MULTIBULK 2
/* Client classes for client limits, currently used only for
* the max-client-output-buffer limit implementation. */
2015-07-27 09:41:48 +02:00
#define CLIENT_TYPE_NORMAL 0 /* Normal req-reply clients + MONITORs */
#define CLIENT_TYPE_SLAVE 1 /* Slaves. */
#define CLIENT_TYPE_PUBSUB 2 /* Clients subscribed to PubSub channels. */
2015-07-28 16:58:04 +02:00
#define CLIENT_TYPE_MASTER 3 /* Master. */
#define CLIENT_TYPE_COUNT 4 /* Total number of client types. */
2015-07-28 16:58:04 +02:00
#define CLIENT_TYPE_OBUF_COUNT 3 /* Number of clients to expose to output
buffer configuration. Just the first
three: normal, slave, pubsub. */
2015-07-27 09:41:48 +02:00
/* Slave replication state. Used in server.repl_state for slaves to remember
* what to do next. */
typedef enum {
REPL_STATE_NONE = 0, /* No active replication */
REPL_STATE_CONNECT, /* Must connect to master */
REPL_STATE_CONNECTING, /* Connecting to master */
/* --- Handshake states, must be ordered --- */
REPL_STATE_RECEIVE_PING_REPLY, /* Wait for PING reply */
REPL_STATE_SEND_HANDSHAKE, /* Send handshake sequence to master */
REPL_STATE_RECEIVE_AUTH_REPLY, /* Wait for AUTH reply */
REPL_STATE_RECEIVE_PORT_REPLY, /* Wait for REPLCONF reply */
REPL_STATE_RECEIVE_IP_REPLY, /* Wait for REPLCONF reply */
REPL_STATE_RECEIVE_CAPA_REPLY, /* Wait for REPLCONF reply */
REPL_STATE_SEND_PSYNC, /* Send PSYNC */
REPL_STATE_RECEIVE_PSYNC_REPLY, /* Wait for PSYNC reply */
/* --- End of handshake states --- */
REPL_STATE_TRANSFER, /* Receiving .rdb from master */
REPL_STATE_CONNECTED, /* Connected to master */
} repl_state;
2015-07-27 09:41:48 +02:00
/* The state of an in progress coordinated failover */
typedef enum {
NO_FAILOVER = 0, /* No failover in progress */
FAILOVER_WAIT_FOR_SYNC, /* Waiting for target replica to catch up */
FAILOVER_IN_PROGRESS /* Waiting for target replica to accept
* PSYNC FAILOVER request. */
} failover_state;
2015-07-27 09:41:48 +02:00
/* State of slaves from the POV of the master. Used in client->replstate.
* In SEND_BULK and ONLINE state the slave receives new updates
2015-07-27 09:41:48 +02:00
* in its output queue. In the WAIT_BGSAVE states instead the server is waiting
* to start the next background saving in order to send updates to it. */
2015-07-27 09:41:48 +02:00
#define SLAVE_STATE_WAIT_BGSAVE_START 6 /* We need to produce a new RDB file. */
#define SLAVE_STATE_WAIT_BGSAVE_END 7 /* Waiting RDB file creation to finish. */
#define SLAVE_STATE_SEND_BULK 8 /* Sending RDB file to slave. */
#define SLAVE_STATE_ONLINE 9 /* RDB file transmitted, sending just updates. */
/* Slave capabilities. */
#define SLAVE_CAPA_NONE 0
PSYNC2: different improvements to Redis replication. The gist of the changes is that now, partial resynchronizations between slaves and masters (without the need of a full resync with RDB transfer and so forth), work in a number of cases when it was impossible in the past. For instance: 1. When a slave is promoted to mastrer, the slaves of the old master can partially resynchronize with the new master. 2. Chained slalves (slaves of slaves) can be moved to replicate to other slaves or the master itsef, without requiring a full resync. 3. The master itself, after being turned into a slave, is able to partially resynchronize with the new master, when it joins replication again. In order to obtain this, the following main changes were operated: * Slaves also take a replication backlog, not just masters. * Same stream replication for all the slaves and sub slaves. The replication stream is identical from the top level master to its slaves and is also the same from the slaves to their sub-slaves and so forth. This means that if a slave is later promoted to master, it has the same replication backlong, and can partially resynchronize with its slaves (that were previously slaves of the old master). * A given replication history is no longer identified by the `runid` of a Redis node. There is instead a `replication ID` which changes every time the instance has a new history no longer coherent with the past one. So, for example, slaves publish the same replication history of their master, however when they are turned into masters, they publish a new replication ID, but still remember the old ID, so that they are able to partially resynchronize with slaves of the old master (up to a given offset). * The replication protocol was slightly modified so that a new extended +CONTINUE reply from the master is able to inform the slave of a replication ID change. * REPLCONF CAPA is used in order to notify masters that a slave is able to understand the new +CONTINUE reply. * The RDB file was extended with an auxiliary field that is able to select a given DB after loading in the slave, so that the slave can continue receiving the replication stream from the point it was disconnected without requiring the master to insert "SELECT" statements. This is useful in order to guarantee the "same stream" property, because the slave must be able to accumulate an identical backlog. * Slave pings to sub-slaves are now sent in a special form, when the top-level master is disconnected, in order to don't interfer with the replication stream. We just use out of band "\n" bytes as in other parts of the Redis protocol. An old design document is available here: https://gist.github.com/antirez/ae068f95c0d084891305 However the implementation is not identical to the description because during the work to implement it, different changes were needed in order to make things working well.
2016-11-09 11:31:06 +01:00
#define SLAVE_CAPA_EOF (1<<0) /* Can parse the RDB EOF streaming format. */
#define SLAVE_CAPA_PSYNC2 (1<<1) /* Supports PSYNC2 protocol. */
/* Slave requirements */
#define SLAVE_REQ_NONE 0
#define SLAVE_REQ_RDB_EXCLUDE_DATA (1 << 0) /* Exclude data from RDB */
#define SLAVE_REQ_RDB_EXCLUDE_FUNCTIONS (1 << 1) /* Exclude functions from RDB */
/* Mask of all bits in the slave requirements bitfield that represent non-standard (filtered) RDB requirements */
#define SLAVE_REQ_RDB_MASK (SLAVE_REQ_RDB_EXCLUDE_DATA | SLAVE_REQ_RDB_EXCLUDE_FUNCTIONS)
/* Synchronous read timeout - slave side */
2015-07-27 09:41:48 +02:00
#define CONFIG_REPL_SYNCIO_TIMEOUT 5
Replication backlog and replicas use one global shared replication buffer (#9166) ## Background For redis master, one replica uses one copy of replication buffer, that is a big waste of memory, more replicas more waste, and allocate/free memory for every reply list also cost much. If we set client-output-buffer-limit small and write traffic is heavy, master may disconnect with replicas and can't finish synchronization with replica. If we set client-output-buffer-limit big, master may be OOM when there are many replicas that separately keep much memory. Because replication buffers of different replica client are the same, one simple idea is that all replicas only use one replication buffer, that will effectively save memory. Since replication backlog content is the same as replicas' output buffer, now we can discard replication backlog memory and use global shared replication buffer to implement replication backlog mechanism. ## Implementation I create one global "replication buffer" which contains content of replication stream. The structure of "replication buffer" is similar to the reply list that exists in every client. But the node of list is `replBufBlock`, which has `id, repl_offset, refcount` fields. ```c /* Replication buffer blocks is the list of replBufBlock. * * +--------------+ +--------------+ +--------------+ * | refcount = 1 | ... | refcount = 0 | ... | refcount = 2 | * +--------------+ +--------------+ +--------------+ * | / \ * | / \ * | / \ * Repl Backlog Replia_A Replia_B * * Each replica or replication backlog increments only the refcount of the * 'ref_repl_buf_node' which it points to. So when replica walks to the next * node, it should first increase the next node's refcount, and when we trim * the replication buffer nodes, we remove node always from the head node which * refcount is 0. If the refcount of the head node is not 0, we must stop * trimming and never iterate the next node. */ /* Similar with 'clientReplyBlock', it is used for shared buffers between * all replica clients and replication backlog. */ typedef struct replBufBlock { int refcount; /* Number of replicas or repl backlog using. */ long long id; /* The unique incremental number. */ long long repl_offset; /* Start replication offset of the block. */ size_t size, used; char buf[]; } replBufBlock; ``` So now when we feed replication stream into replication backlog and all replicas, we only need to feed stream into replication buffer `feedReplicationBuffer`. In this function, we set some fields of replication backlog and replicas to references of the global replication buffer blocks. And we also need to check replicas' output buffer limit to free if exceeding `client-output-buffer-limit`, and trim replication backlog if exceeding `repl-backlog-size`. When sending reply to replicas, we also need to iterate replication buffer blocks and send its content, when totally sending one block for replica, we decrease current node count and increase the next current node count, and then free the block which reference is 0 from the head of replication buffer blocks. Since now we use linked list to manage replication backlog, it may cost much time for iterating all linked list nodes to find corresponding replication buffer node. So we create a rax tree to store some nodes for index, but to avoid rax tree occupying too much memory, i record one per 64 nodes for index. Currently, to make partial resynchronization as possible as much, we always let replication backlog as the last reference of replication buffer blocks, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks, and this method doesn't increase memory usage since they share replication buffer. To avoid freezing server for freeing unreferenced replication buffer blocks when we need to trim backlog for exceeding backlog size setting, we trim backlog incrementally (free 64 blocks per call now), and make it faster in `beforeSleep` (free 640 blocks). ### Other changes - `mem_total_replication_buffers`: we add this field in INFO command, it means the total memory of replication buffers used. - `mem_clients_slaves`: now even replica is slow to replicate, and its output buffer memory is not 0, but it still may be 0, since replication backlog and replicas share one global replication buffer, only if replication buffer memory is more than the repl backlog setting size, we consider the excess as replicas' memory. Otherwise, we think replication buffer memory is the consumption of repl backlog. - Key eviction Since all replicas and replication backlog share global replication buffer, we think only the part of exceeding backlog size the extra separate consumption of replicas. Because we trim backlog incrementally in the background, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks disconnect. To avoid massive eviction loop, we don't count the delayed freed replication backlog into used memory even if there are no replicas, i.e. we also regard this memory as replicas's memory. - `client-output-buffer-limit` check for replica clients It doesn't make sense to set the replica clients output buffer limit lower than the repl-backlog-size config (partial sync will succeed and then replica will get disconnected). Such a configuration is ignored (the size of repl-backlog-size will be used). This doesn't have memory consumption implications since the replica client will share the backlog buffers memory. - Drop replication backlog after loading data if needed We always create replication backlog if server is a master, we need it because we put DELs in it when loading expired keys in RDB, but if RDB doesn't have replication info or there is no rdb, it is not possible to support partial resynchronization, to avoid extra memory of replication backlog, we drop it. - Multi IO threads Since all replicas and replication backlog use global replication buffer, if I/O threads are enabled, to guarantee data accessing thread safe, we must let main thread handle sending the output buffer to all replicas. But before, other IO threads could handle sending output buffer of all replicas. ## Other optimizations This solution resolve some other problem: - When replicas disconnect with master since of out of output buffer limit, releasing the output buffer of replicas may freeze server if we set big `client-output-buffer-limit` for replicas, but now, it doesn't cause freezing. - This implementation may mitigate reply list copy cost time(also freezes server) when one replication has huge reply buffer and another replica can copy buffer for full synchronization. now, we just copy reference info, it is very light. - If we set replication backlog size big, it also may cost much time to copy replication backlog into replica's output buffer. But this commit eliminates this problem. - Resizing replication backlog size doesn't empty current replication backlog content.
2021-10-25 08:24:31 +02:00
/* The default number of replication backlog blocks to trim per call. */
#define REPL_BACKLOG_TRIM_BLOCKS_PER_CALL 64
/* In order to quickly find the requested offset for PSYNC requests,
* we index some nodes in the replication buffer linked list into a rax. */
#define REPL_BACKLOG_INDEX_PER_BLOCKS 64
/* List related stuff */
2015-07-27 09:41:48 +02:00
#define LIST_HEAD 0
#define LIST_TAIL 1
#define ZSET_MIN 0
#define ZSET_MAX 1
/* Sort operations */
2015-07-27 09:41:48 +02:00
#define SORT_OP_GET 0
/* Log levels */
2015-07-27 09:41:48 +02:00
#define LL_DEBUG 0
#define LL_VERBOSE 1
#define LL_NOTICE 2
#define LL_WARNING 3
#define LL_RAW (1<<10) /* Modifier to log without timestamp */
/* Supervision options */
2015-07-27 09:41:48 +02:00
#define SUPERVISED_NONE 0
#define SUPERVISED_AUTODETECT 1
#define SUPERVISED_SYSTEMD 2
#define SUPERVISED_UPSTART 3
/* Anti-warning macro... */
2015-07-27 09:41:48 +02:00
#define UNUSED(V) ((void) V)
#define ZSKIPLIST_MAXLEVEL 32 /* Should be enough for 2^64 elements */
#define ZSKIPLIST_P 0.25 /* Skiplist P = 1/4 */
/* Append only defines */
#define AOF_FSYNC_NO 0
#define AOF_FSYNC_ALWAYS 1
#define AOF_FSYNC_EVERYSEC 2
/* Replication diskless load defines */
#define REPL_DISKLESS_LOAD_DISABLED 0
#define REPL_DISKLESS_LOAD_WHEN_DB_EMPTY 1
#define REPL_DISKLESS_LOAD_SWAPDB 2
/* TLS Client Authentication */
#define TLS_CLIENT_AUTH_NO 0
#define TLS_CLIENT_AUTH_YES 1
#define TLS_CLIENT_AUTH_OPTIONAL 2
/* Sanitize dump payload */
#define SANITIZE_DUMP_NO 0
#define SANITIZE_DUMP_YES 1
#define SANITIZE_DUMP_CLIENTS 2
/* Enable protected config/command */
#define PROTECTED_ACTION_ALLOWED_NO 0
#define PROTECTED_ACTION_ALLOWED_YES 1
#define PROTECTED_ACTION_ALLOWED_LOCAL 2
/* Sets operations codes */
2015-07-27 09:41:48 +02:00
#define SET_OP_UNION 0
#define SET_OP_DIFF 1
#define SET_OP_INTER 2
/* oom-score-adj defines */
#define OOM_SCORE_ADJ_NO 0
#define OOM_SCORE_RELATIVE 1
#define OOM_SCORE_ADJ_ABSOLUTE 2
/* Redis maxmemory strategies. Instead of using just incremental number
* for this defines, we use a set of flags so that testing for certain
* properties common to multiple policies is faster. */
#define MAXMEMORY_FLAG_LRU (1<<0)
#define MAXMEMORY_FLAG_LFU (1<<1)
#define MAXMEMORY_FLAG_ALLKEYS (1<<2)
#define MAXMEMORY_FLAG_NO_SHARED_INTEGERS \
(MAXMEMORY_FLAG_LRU|MAXMEMORY_FLAG_LFU)
#define MAXMEMORY_VOLATILE_LRU ((0<<8)|MAXMEMORY_FLAG_LRU)
#define MAXMEMORY_VOLATILE_LFU ((1<<8)|MAXMEMORY_FLAG_LFU)
#define MAXMEMORY_VOLATILE_TTL (2<<8)
#define MAXMEMORY_VOLATILE_RANDOM (3<<8)
#define MAXMEMORY_ALLKEYS_LRU ((4<<8)|MAXMEMORY_FLAG_LRU|MAXMEMORY_FLAG_ALLKEYS)
#define MAXMEMORY_ALLKEYS_LFU ((5<<8)|MAXMEMORY_FLAG_LFU|MAXMEMORY_FLAG_ALLKEYS)
#define MAXMEMORY_ALLKEYS_RANDOM ((6<<8)|MAXMEMORY_FLAG_ALLKEYS)
#define MAXMEMORY_NO_EVICTION (7<<8)
/* Units */
#define UNIT_SECONDS 0
#define UNIT_MILLISECONDS 1
/* SHUTDOWN flags */
#define SHUTDOWN_NOFLAGS 0 /* No flags. */
#define SHUTDOWN_SAVE 1 /* Force SAVE on SHUTDOWN even if no save
points are configured. */
#define SHUTDOWN_NOSAVE 2 /* Don't SAVE on SHUTDOWN. */
Wait for replicas when shutting down (#9872) To avoid data loss, this commit adds a grace period for lagging replicas to catch up the replication offset. Done: * Wait for replicas when shutdown is triggered by SIGTERM and SIGINT. * Wait for replicas when shutdown is triggered by the SHUTDOWN command. A new blocked client type BLOCKED_SHUTDOWN is introduced, allowing multiple clients to call SHUTDOWN in parallel. Note that they don't expect a response unless an error happens and shutdown is aborted. * Log warning for each replica lagging behind when finishing shutdown. * CLIENT_PAUSE_WRITE while waiting for replicas. * Configurable grace period 'shutdown-timeout' in seconds (default 10). * New flags for the SHUTDOWN command: - NOW disables the grace period for lagging replicas. - FORCE ignores errors writing the RDB or AOF files which would normally prevent a shutdown. - ABORT cancels ongoing shutdown. Can't be combined with other flags. * New field in the output of the INFO command: 'shutdown_in_milliseconds'. The value is the remaining maximum time to wait for lagging replicas before finishing the shutdown. This field is present in the Server section **only** during shutdown. Not directly related: * When shutting down, if there is an AOF saving child, it is killed **even** if AOF is disabled. This can happen if BGREWRITEAOF is used when AOF is off. * Client pause now has end time and type (WRITE or ALL) per purpose. The different pause purposes are *CLIENT PAUSE command*, *failover* and *shutdown*. If clients are unpaused for one purpose, it doesn't affect client pause for other purposes. For example, the CLIENT UNPAUSE command doesn't affect client pause initiated by the failover or shutdown procedures. A completed failover or a failed shutdown doesn't unpause clients paused by the CLIENT PAUSE command. Notes: * DEBUG RESTART doesn't wait for replicas. * We already have a warning logged when a replica disconnects. This means that if any replica connection is lost during the shutdown, it is either logged as disconnected or as lagging at the time of exit. Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-02 08:50:15 +01:00
#define SHUTDOWN_NOW 4 /* Don't wait for replicas to catch up. */
#define SHUTDOWN_FORCE 8 /* Don't let errors prevent shutdown. */
/* Command call flags, see call() function */
2015-07-27 09:41:48 +02:00
#define CMD_CALL_NONE 0
#define CMD_CALL_SLOWLOG (1<<0)
#define CMD_CALL_STATS (1<<1)
#define CMD_CALL_PROPAGATE_AOF (1<<2)
#define CMD_CALL_PROPAGATE_REPL (1<<3)
#define CMD_CALL_PROPAGATE (CMD_CALL_PROPAGATE_AOF|CMD_CALL_PROPAGATE_REPL)
2015-07-27 09:41:48 +02:00
#define CMD_CALL_FULL (CMD_CALL_SLOWLOG | CMD_CALL_STATS | CMD_CALL_PROPAGATE)
Sort out mess around propagation and MULTI/EXEC (#9890) The mess: Some parts use alsoPropagate for late propagation, others using an immediate one (propagate()), causing edge cases, ugly/hacky code, and the tendency for bugs The basic idea is that all commands are propagated via alsoPropagate (i.e. added to a list) and the top-most call() is responsible for going over that list and actually propagating them (and wrapping them in MULTI/EXEC if there's more than one command). This is done in the new function, propagatePendingCommands. Callers to propagatePendingCommands: 1. top-most call() (we want all nested call()s to add to the also_propagate array and just the top-most one to propagate them) - via `afterCommand` 2. handleClientsBlockedOnKeys: it is out of call() context and it may propagate stuff - via `afterCommand`. 3. handleClientsBlockedOnKeys edge case: if the looked-up key is already expired, we will propagate the expire but will not unblock any client so `afterCommand` isn't called. in that case, we have to propagate the deletion explicitly. 4. cron stuff: active-expire and eviction may also propagate stuff 5. modules: the module API allows to propagate stuff from just about anywhere (timers, keyspace notifications, threads). I could have tried to catch all the out-of-call-context places but it seemed easier to handle it in one place: when we free the context. in the spirit of what was done in call(), only the top-most freeing of a module context may cause propagation. 6. modules: when using a thread-safe ctx it's not clear when/if the ctx will be freed. we do know that the module must lock the GIL before calling RM_Replicate/RM_Call so we propagate the pending commands when releasing the GIL. A "known limitation", which were actually a bug, was fixed because of this commit (see propagate.tcl): When using a mix of RM_Call with `!` and RM_Replicate, the command would propagate out-of-order: first all the commands from RM_Call, and then the ones from RM_Replicate Another thing worth mentioning is that if, in the past, a client would issue a MULTI/EXEC with just one write command the server would blindly propagate the MULTI/EXEC too, even though it's redundant. not anymore. This commit renames propagate() to propagateNow() in order to cause conflicts in pending PRs. propagatePendingCommands is the only caller of propagateNow, which is now a static, internal helper function. Optimizations: 1. alsoPropagate will not add stuff to also_propagate if there's no AOF and replicas 2. alsoPropagate reallocs also_propagagte exponentially, to save calls to memmove Bugfixes: 1. CONFIG SET can create evictions, sending notifications which can cause to dirty++ with modules. we need to prevent it from propagating to AOF/replicas 2. We need to set current_client in RM_Call. buggy scenario: - CONFIG SET maxmemory, eviction notifications, module hook calls RM_Call - assertion in lookupKey crashes, because current_client has CONFIG SET, which isn't CMD_WRITE 3. minor: in eviction, call propagateDeletion after notification, like active-expire and all commands (we always send a notification before propagating the command)
2021-12-22 23:03:48 +01:00
#define CMD_CALL_FROM_MODULE (1<<4) /* From RM_Call */
Sort out mess around propagation and MULTI/EXEC (#9890) The mess: Some parts use alsoPropagate for late propagation, others using an immediate one (propagate()), causing edge cases, ugly/hacky code, and the tendency for bugs The basic idea is that all commands are propagated via alsoPropagate (i.e. added to a list) and the top-most call() is responsible for going over that list and actually propagating them (and wrapping them in MULTI/EXEC if there's more than one command). This is done in the new function, propagatePendingCommands. Callers to propagatePendingCommands: 1. top-most call() (we want all nested call()s to add to the also_propagate array and just the top-most one to propagate them) - via `afterCommand` 2. handleClientsBlockedOnKeys: it is out of call() context and it may propagate stuff - via `afterCommand`. 3. handleClientsBlockedOnKeys edge case: if the looked-up key is already expired, we will propagate the expire but will not unblock any client so `afterCommand` isn't called. in that case, we have to propagate the deletion explicitly. 4. cron stuff: active-expire and eviction may also propagate stuff 5. modules: the module API allows to propagate stuff from just about anywhere (timers, keyspace notifications, threads). I could have tried to catch all the out-of-call-context places but it seemed easier to handle it in one place: when we free the context. in the spirit of what was done in call(), only the top-most freeing of a module context may cause propagation. 6. modules: when using a thread-safe ctx it's not clear when/if the ctx will be freed. we do know that the module must lock the GIL before calling RM_Replicate/RM_Call so we propagate the pending commands when releasing the GIL. A "known limitation", which were actually a bug, was fixed because of this commit (see propagate.tcl): When using a mix of RM_Call with `!` and RM_Replicate, the command would propagate out-of-order: first all the commands from RM_Call, and then the ones from RM_Replicate Another thing worth mentioning is that if, in the past, a client would issue a MULTI/EXEC with just one write command the server would blindly propagate the MULTI/EXEC too, even though it's redundant. not anymore. This commit renames propagate() to propagateNow() in order to cause conflicts in pending PRs. propagatePendingCommands is the only caller of propagateNow, which is now a static, internal helper function. Optimizations: 1. alsoPropagate will not add stuff to also_propagate if there's no AOF and replicas 2. alsoPropagate reallocs also_propagagte exponentially, to save calls to memmove Bugfixes: 1. CONFIG SET can create evictions, sending notifications which can cause to dirty++ with modules. we need to prevent it from propagating to AOF/replicas 2. We need to set current_client in RM_Call. buggy scenario: - CONFIG SET maxmemory, eviction notifications, module hook calls RM_Call - assertion in lookupKey crashes, because current_client has CONFIG SET, which isn't CMD_WRITE 3. minor: in eviction, call propagateDeletion after notification, like active-expire and all commands (we always send a notification before propagating the command)
2021-12-22 23:03:48 +01:00
/* Command propagation flags, see propagateNow() function */
2015-07-27 09:41:48 +02:00
#define PROPAGATE_NONE 0
#define PROPAGATE_AOF 1
#define PROPAGATE_REPL 2
2012-02-28 16:17:00 +01:00
/* Client pause types, larger types are more restrictive
* pause types than smaller pause types. */
typedef enum {
CLIENT_PAUSE_OFF = 0, /* Pause no commands */
CLIENT_PAUSE_WRITE, /* Pause write commands */
CLIENT_PAUSE_ALL /* Pause all commands */
} pause_type;
Wait for replicas when shutting down (#9872) To avoid data loss, this commit adds a grace period for lagging replicas to catch up the replication offset. Done: * Wait for replicas when shutdown is triggered by SIGTERM and SIGINT. * Wait for replicas when shutdown is triggered by the SHUTDOWN command. A new blocked client type BLOCKED_SHUTDOWN is introduced, allowing multiple clients to call SHUTDOWN in parallel. Note that they don't expect a response unless an error happens and shutdown is aborted. * Log warning for each replica lagging behind when finishing shutdown. * CLIENT_PAUSE_WRITE while waiting for replicas. * Configurable grace period 'shutdown-timeout' in seconds (default 10). * New flags for the SHUTDOWN command: - NOW disables the grace period for lagging replicas. - FORCE ignores errors writing the RDB or AOF files which would normally prevent a shutdown. - ABORT cancels ongoing shutdown. Can't be combined with other flags. * New field in the output of the INFO command: 'shutdown_in_milliseconds'. The value is the remaining maximum time to wait for lagging replicas before finishing the shutdown. This field is present in the Server section **only** during shutdown. Not directly related: * When shutting down, if there is an AOF saving child, it is killed **even** if AOF is disabled. This can happen if BGREWRITEAOF is used when AOF is off. * Client pause now has end time and type (WRITE or ALL) per purpose. The different pause purposes are *CLIENT PAUSE command*, *failover* and *shutdown*. If clients are unpaused for one purpose, it doesn't affect client pause for other purposes. For example, the CLIENT UNPAUSE command doesn't affect client pause initiated by the failover or shutdown procedures. A completed failover or a failed shutdown doesn't unpause clients paused by the CLIENT PAUSE command. Notes: * DEBUG RESTART doesn't wait for replicas. * We already have a warning logged when a replica disconnects. This means that if any replica connection is lost during the shutdown, it is either logged as disconnected or as lagging at the time of exit. Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-02 08:50:15 +01:00
/* Client pause purposes. Each purpose has its own end time and pause type. */
typedef enum {
PAUSE_BY_CLIENT_COMMAND = 0,
PAUSE_DURING_SHUTDOWN,
PAUSE_DURING_FAILOVER,
NUM_PAUSE_PURPOSES /* This value is the number of purposes above. */
} pause_purpose;
typedef struct {
pause_type type;
mstime_t end;
} pause_event;
/* Ways that a clusters endpoint can be described */
typedef enum {
CLUSTER_ENDPOINT_TYPE_IP = 0, /* Show IP address */
CLUSTER_ENDPOINT_TYPE_HOSTNAME, /* Show hostname */
CLUSTER_ENDPOINT_TYPE_UNKNOWN_ENDPOINT /* Show NULL or empty */
} cluster_endpoint_type;
/* RDB active child save type. */
2015-07-27 09:41:48 +02:00
#define RDB_CHILD_TYPE_NONE 0
#define RDB_CHILD_TYPE_DISK 1 /* RDB is written to disk. */
#define RDB_CHILD_TYPE_SOCKET 2 /* RDB is written to slave socket. */
/* Keyspace changes notification classes. Every class is associated with a
* character for configuration purposes. */
2015-07-27 09:41:48 +02:00
#define NOTIFY_KEYSPACE (1<<0) /* K */
#define NOTIFY_KEYEVENT (1<<1) /* E */
#define NOTIFY_GENERIC (1<<2) /* g */
#define NOTIFY_STRING (1<<3) /* $ */
#define NOTIFY_LIST (1<<4) /* l */
#define NOTIFY_SET (1<<5) /* s */
#define NOTIFY_HASH (1<<6) /* h */
#define NOTIFY_ZSET (1<<7) /* z */
#define NOTIFY_EXPIRED (1<<8) /* x */
#define NOTIFY_EVICTED (1<<9) /* e */
#define NOTIFY_STREAM (1<<10) /* t */
#define NOTIFY_KEY_MISS (1<<11) /* m (Note: This one is excluded from NOTIFY_ALL on purpose) */
#define NOTIFY_LOADED (1<<12) /* module only key space notification, indicate a key loaded from rdb */
#define NOTIFY_MODULE (1<<13) /* d, module key space notification */
#define NOTIFY_NEW (1<<14) /* n, new key notification */
#define NOTIFY_ALL (NOTIFY_GENERIC | NOTIFY_STRING | NOTIFY_LIST | NOTIFY_SET | NOTIFY_HASH | NOTIFY_ZSET | NOTIFY_EXPIRED | NOTIFY_EVICTED | NOTIFY_STREAM | NOTIFY_MODULE) /* A flag */
/* Using the following macro you can run code inside serverCron() with the
* specified period, specified in milliseconds.
* The actual resolution depends on server.hz. */
#define run_with_period(_ms_) if ((_ms_ <= 1000/server.hz) || !(server.cronloops%((_ms_)/(1000/server.hz))))
/* We can print the stacktrace, so our assert is defined this way: */
#define serverAssertWithInfo(_c,_o,_e) ((_e)?(void)0 : (_serverAssertWithInfo(_c,_o,#_e,__FILE__,__LINE__),redis_unreachable()))
#define serverAssert(_e) ((_e)?(void)0 : (_serverAssert(#_e,__FILE__,__LINE__),redis_unreachable()))
#define serverPanic(...) _serverPanic(__FILE__,__LINE__,__VA_ARGS__),redis_unreachable()
Added INFO LATENCYSTATS section: latency by percentile distribution/latency by cumulative distribution of latencies (#9462) # Short description The Redis extended latency stats track per command latencies and enables: - exporting the per-command percentile distribution via the `INFO LATENCYSTATS` command. **( percentile distribution is not mergeable between cluster nodes ).** - exporting the per-command cumulative latency distributions via the `LATENCY HISTOGRAM` command. Using the cumulative distribution of latencies we can merge several stats from different cluster nodes to calculate aggregate metrics . By default, the extended latency monitoring is enabled since the overhead of keeping track of the command latency is very small. If you don't want to track extended latency metrics, you can easily disable it at runtime using the command: - `CONFIG SET latency-tracking no` By default, the exported latency percentiles are the p50, p99, and p999. You can alter them at runtime using the command: - `CONFIG SET latency-tracking-info-percentiles "0.0 50.0 100.0"` ## Some details: - The total size per histogram should sit around 40 KiB. We only allocate those 40KiB when a command was called for the first time. - With regards to the WRITE overhead As seen below, there is no measurable overhead on the achievable ops/sec or full latency spectrum on the client. Including also the measured redis-benchmark for unstable vs this branch. - We track from 1 nanosecond to 1 second ( everything above 1 second is considered +Inf ) ## `INFO LATENCYSTATS` exposition format - Format: `latency_percentiles_usec_<CMDNAME>:p0=XX,p50....` ## `LATENCY HISTOGRAM [command ...]` exposition format Return a cumulative distribution of latencies in the format of a histogram for the specified command names. The histogram is composed of a map of time buckets: - Each representing a latency range, between 1 nanosecond and roughly 1 second. - Each bucket covers twice the previous bucket's range. - Empty buckets are not printed. - Everything above 1 sec is considered +Inf. - At max there will be log2(1000000000)=30 buckets We reply a map for each command in the format: `<command name> : { `calls`: <total command calls> , `histogram` : { <bucket 1> : latency , < bucket 2> : latency, ... } }` Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-05 13:01:05 +01:00
/* latency histogram per command init settings */
#define LATENCY_HISTOGRAM_MIN_VALUE 1L /* >= 1 nanosec */
#define LATENCY_HISTOGRAM_MAX_VALUE 1000000000L /* <= 1 secs */
#define LATENCY_HISTOGRAM_PRECISION 2 /* Maintain a value precision of 2 significant digits across LATENCY_HISTOGRAM_MIN_VALUE and LATENCY_HISTOGRAM_MAX_VALUE range.
* Value quantization within the range will thus be no larger than 1/100th (or 1%) of any value.
* The total size per histogram should sit around 40 KiB Bytes. */
/* Busy module flags, see busy_module_yield_flags */
#define BUSY_MODULE_YIELD_NONE (0)
#define BUSY_MODULE_YIELD_EVENTS (1<<0)
#define BUSY_MODULE_YIELD_CLIENTS (1<<1)
/*-----------------------------------------------------------------------------
* Data types
*----------------------------------------------------------------------------*/
/* A redis object, that is a type able to hold a string / list / set */
/* The actual Redis Object */
#define OBJ_STRING 0 /* String object. */
#define OBJ_LIST 1 /* List object. */
#define OBJ_SET 2 /* Set object. */
#define OBJ_ZSET 3 /* Sorted set object. */
#define OBJ_HASH 4 /* Hash object. */
/* The "module" object type is a special one that signals that the object
* is one directly managed by a Redis module. In this case the value points
* to a moduleValue struct, which contains the object value (which is only
* handled by the module itself) and the RedisModuleType struct which lists
* function pointers in order to serialize, deserialize, AOF-rewrite and
* free the object.
*
* Inside the RDB file, module types are encoded as OBJ_MODULE followed
* by a 64 bit module type ID, which has a 54 bits module-specific signature
* in order to dispatch the loading to the right module, plus a 10 bits
* encoding version. */
#define OBJ_MODULE 5 /* Module object. */
#define OBJ_STREAM 6 /* Stream object. */
/* Extract encver / signature from a module type ID. */
#define REDISMODULE_TYPE_ENCVER_BITS 10
#define REDISMODULE_TYPE_ENCVER_MASK ((1<<REDISMODULE_TYPE_ENCVER_BITS)-1)
#define REDISMODULE_TYPE_ENCVER(id) (id & REDISMODULE_TYPE_ENCVER_MASK)
#define REDISMODULE_TYPE_SIGN(id) ((id & ~((uint64_t)REDISMODULE_TYPE_ENCVER_MASK)) >>REDISMODULE_TYPE_ENCVER_BITS)
/* Bit flags for moduleTypeAuxSaveFunc */
#define REDISMODULE_AUX_BEFORE_RDB (1<<0)
#define REDISMODULE_AUX_AFTER_RDB (1<<1)
struct RedisModule;
struct RedisModuleIO;
struct RedisModuleDigest;
struct RedisModuleCtx;
struct moduleLoadQueueEntry;
struct redisObject;
struct RedisModuleDefragCtx;
struct RedisModuleInfoCtx;
struct RedisModuleKeyOptCtx;
/* Each module type implementation should export a set of methods in order
* to serialize and deserialize the value in the RDB file, rewrite the AOF
* log, create the digest for "DEBUG DIGEST", and free the value when a key
* is deleted. */
typedef void *(*moduleTypeLoadFunc)(struct RedisModuleIO *io, int encver);
typedef void (*moduleTypeSaveFunc)(struct RedisModuleIO *io, void *value);
typedef int (*moduleTypeAuxLoadFunc)(struct RedisModuleIO *rdb, int encver, int when);
typedef void (*moduleTypeAuxSaveFunc)(struct RedisModuleIO *rdb, int when);
typedef void (*moduleTypeRewriteFunc)(struct RedisModuleIO *io, struct redisObject *key, void *value);
typedef void (*moduleTypeDigestFunc)(struct RedisModuleDigest *digest, void *value);
typedef size_t (*moduleTypeMemUsageFunc)(const void *value);
typedef void (*moduleTypeFreeFunc)(void *value);
typedef size_t (*moduleTypeFreeEffortFunc)(struct redisObject *key, const void *value);
typedef void (*moduleTypeUnlinkFunc)(struct redisObject *key, void *value);
typedef void *(*moduleTypeCopyFunc)(struct redisObject *fromkey, struct redisObject *tokey, const void *value);
typedef int (*moduleTypeDefragFunc)(struct RedisModuleDefragCtx *ctx, struct redisObject *key, void **value);
typedef void (*RedisModuleInfoFunc)(struct RedisModuleInfoCtx *ctx, int for_crash_report);
typedef void (*RedisModuleDefragFunc)(struct RedisModuleDefragCtx *ctx);
typedef size_t (*moduleTypeMemUsageFunc2)(struct RedisModuleKeyOptCtx *ctx, const void *value, size_t sample_size);
typedef void (*moduleTypeFreeFunc2)(struct RedisModuleKeyOptCtx *ctx, void *value);
typedef size_t (*moduleTypeFreeEffortFunc2)(struct RedisModuleKeyOptCtx *ctx, const void *value);
typedef void (*moduleTypeUnlinkFunc2)(struct RedisModuleKeyOptCtx *ctx, void *value);
typedef void *(*moduleTypeCopyFunc2)(struct RedisModuleKeyOptCtx *ctx, const void *value);
Implement redisAtomic to replace _Atomic C11 builtin (#7707) Redis 6.0 introduces I/O threads, it is so cool and efficient, we use C11 _Atomic to establish inter-thread synchronization without mutex. But the compiler that must supports C11 _Atomic can compile redis code, that brings a lot of inconvenience since some common platforms can't support by default such as CentOS7, so we want to implement redis atomic type to make it more portable. We have implemented our atomic variable for redis that only has 'relaxed' operations in src/atomicvar.h, so we implement some operations with 'sequentially-consistent', just like the default behavior of C11 _Atomic that can establish inter-thread synchronization. And we replace all uses of C11 _Atomic with redis atomic variable. Our implementation of redis atomic variable uses C11 _Atomic, __atomic or __sync macros if available, it supports most common platforms, and we will detect automatically which feature we use. In Makefile we use a dummy file to detect if the compiler supports C11 _Atomic. Now for gcc, we can compile redis code theoretically if your gcc version is not less than 4.1.2(starts to support __sync_xxx operations). Otherwise, we remove use mutex fallback to implement redis atomic variable for performance and test. You will get compiling errors if your compiler doesn't support all features of above. For cover redis atomic variable tests, we add other CI jobs that build redis on CentOS6 and CentOS7 and workflow daily jobs that run the tests on them. For them, we just install gcc by default in order to cover different compiler versions, gcc is 4.4.7 by default installation on CentOS6 and 4.8.5 on CentOS7. We restore the feature that we can test redis with Helgrind to find data race errors. But you need install Valgrind in the default path configuration firstly before running your tests, since we use macros in helgrind.h to tell Helgrind inter-thread happens-before relationship explicitly for avoiding false positives. Please open an issue on github if you find data race errors relate to this commit. Unrelated: - Fix redefinition of typedef 'RedisModuleUserChangedFunc' For some old version compilers, they will report errors or warnings, if we re-define function type.
2020-09-17 15:01:45 +02:00
/* This callback type is called by moduleNotifyUserChanged() every time
* a user authenticated via the module API is associated with a different
* user or gets disconnected. This needs to be exposed since you can't cast
* a function pointer to (void *). */
typedef void (*RedisModuleUserChangedFunc) (uint64_t client_id, void *privdata);
/* The module type, which is referenced in each value of a given type, defines
* the methods and links to the module exporting the type. */
typedef struct RedisModuleType {
uint64_t id; /* Higher 54 bits of type ID + 10 lower bits of encoding ver. */
struct RedisModule *module;
moduleTypeLoadFunc rdb_load;
moduleTypeSaveFunc rdb_save;
moduleTypeRewriteFunc aof_rewrite;
moduleTypeMemUsageFunc mem_usage;
moduleTypeDigestFunc digest;
moduleTypeFreeFunc free;
moduleTypeFreeEffortFunc free_effort;
moduleTypeUnlinkFunc unlink;
moduleTypeCopyFunc copy;
moduleTypeDefragFunc defrag;
moduleTypeAuxLoadFunc aux_load;
moduleTypeAuxSaveFunc aux_save;
moduleTypeMemUsageFunc2 mem_usage2;
moduleTypeFreeEffortFunc2 free_effort2;
moduleTypeUnlinkFunc2 unlink2;
moduleTypeCopyFunc2 copy2;
int aux_save_triggers;
char name[10]; /* 9 bytes name + null term. Charset: A-Z a-z 0-9 _- */
} moduleType;
/* In Redis objects 'robj' structures of type OBJ_MODULE, the value pointer
* is set to the following structure, referencing the moduleType structure
* in order to work with the value, and at the same time providing a raw
* pointer to the value, as created by the module commands operating with
* the module type.
*
* So for example in order to free such a value, it is possible to use
* the following code:
*
* if (robj->type == OBJ_MODULE) {
* moduleValue *mt = robj->ptr;
* mt->type->free(mt->value);
* zfree(mt); // We need to release this in-the-middle struct as well.
* }
*/
typedef struct moduleValue {
moduleType *type;
void *value;
} moduleValue;
/* This structure represents a module inside the system. */
struct RedisModule {
void *handle; /* Module dlopen() handle. */
char *name; /* Module name. */
int ver; /* Module version. We use just progressive integers. */
int apiver; /* Module API version as requested during initialization.*/
list *types; /* Module data types. */
list *usedby; /* List of modules using APIs from this one. */
list *using; /* List of modules we use some APIs of. */
list *filters; /* List of filters the module has registered. */
Module Configurations (#10285) This feature adds the ability to add four different types (Bool, Numeric, String, Enum) of configurations to a module to be accessed via the redis config file, and the CONFIG command. **Configuration Names**: We impose a restriction that a module configuration always starts with the module name and contains a '.' followed by the config name. If a module passes "config1" as the name to a register function, it will be registered as MODULENAME.config1. **Configuration Persistence**: Module Configurations exist only as long as a module is loaded. If a module is unloaded, the configurations are removed. There is now also a minimal core API for removal of standardConfig objects from configs by name. **Get and Set Callbacks**: Storage of config values is owned by the module that registers them, and provides callbacks for Redis to access and manipulate the values. This is exposed through a GET and SET callback. The get callback returns a typed value of the config to redis. The callback takes the name of the configuration, and also a privdata pointer. Note that these only take the CONFIGNAME portion of the config, not the entire MODULENAME.CONFIGNAME. ``` typedef RedisModuleString * (*RedisModuleConfigGetStringFunc)(const char *name, void *privdata); typedef long long (*RedisModuleConfigGetNumericFunc)(const char *name, void *privdata); typedef int (*RedisModuleConfigGetBoolFunc)(const char *name, void *privdata); typedef int (*RedisModuleConfigGetEnumFunc)(const char *name, void *privdata); ``` Configs must also must specify a set callback, i.e. what to do on a CONFIG SET XYZ 123 or when loading configurations from cli/.conf file matching these typedefs. *name* is again just the CONFIGNAME portion, *val* is the parsed value from the core, *privdata* is the registration time privdata pointer, and *err* is for providing errors to a client. ``` typedef int (*RedisModuleConfigSetStringFunc)(const char *name, RedisModuleString *val, void *privdata, RedisModuleString **err); typedef int (*RedisModuleConfigSetNumericFunc)(const char *name, long long val, void *privdata, RedisModuleString **err); typedef int (*RedisModuleConfigSetBoolFunc)(const char *name, int val, void *privdata, RedisModuleString **err); typedef int (*RedisModuleConfigSetEnumFunc)(const char *name, int val, void *privdata, RedisModuleString **err); ``` Modules can also specify an optional apply callback that will be called after value(s) have been set via CONFIG SET: ``` typedef int (*RedisModuleConfigApplyFunc)(RedisModuleCtx *ctx, void *privdata, RedisModuleString **err); ``` **Flags:** We expose 7 new flags to the module, which are used as part of the config registration. ``` #define REDISMODULE_CONFIG_MODIFIABLE 0 /* This is the default for a module config. */ #define REDISMODULE_CONFIG_IMMUTABLE (1ULL<<0) /* Can this value only be set at startup? */ #define REDISMODULE_CONFIG_SENSITIVE (1ULL<<1) /* Does this value contain sensitive information */ #define REDISMODULE_CONFIG_HIDDEN (1ULL<<4) /* This config is hidden in `config get <pattern>` (used for tests/debugging) */ #define REDISMODULE_CONFIG_PROTECTED (1ULL<<5) /* Becomes immutable if enable-protected-configs is enabled. */ #define REDISMODULE_CONFIG_DENY_LOADING (1ULL<<6) /* This config is forbidden during loading. */ /* Numeric Specific Configs */ #define REDISMODULE_CONFIG_MEMORY (1ULL<<7) /* Indicates if this value can be set as a memory value */ ``` **Module Registration APIs**: ``` int (*RedisModule_RegisterBoolConfig)(RedisModuleCtx *ctx, char *name, int default_val, unsigned int flags, RedisModuleConfigGetBoolFunc getfn, RedisModuleConfigSetBoolFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_RegisterNumericConfig)(RedisModuleCtx *ctx, const char *name, long long default_val, unsigned int flags, long long min, long long max, RedisModuleConfigGetNumericFunc getfn, RedisModuleConfigSetNumericFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_RegisterStringConfig)(RedisModuleCtx *ctx, const char *name, const char *default_val, unsigned int flags, RedisModuleConfigGetStringFunc getfn, RedisModuleConfigSetStringFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_RegisterEnumConfig)(RedisModuleCtx *ctx, const char *name, int default_val, unsigned int flags, const char **enum_values, const int *int_values, int num_enum_vals, RedisModuleConfigGetEnumFunc getfn, RedisModuleConfigSetEnumFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_LoadConfigs)(RedisModuleCtx *ctx); ``` The module name will be auto appended along with a "." to the front of the name of the config. **What RM_Register[...]Config does**: A RedisModule struct now keeps a list of ModuleConfig objects which look like: ``` typedef struct ModuleConfig { sds name; /* Name of config without the module name appended to the front */ void *privdata; /* Optional data passed into the module config callbacks */ union get_fn { /* The get callback specificed by the module */ RedisModuleConfigGetStringFunc get_string; RedisModuleConfigGetNumericFunc get_numeric; RedisModuleConfigGetBoolFunc get_bool; RedisModuleConfigGetEnumFunc get_enum; } get_fn; union set_fn { /* The set callback specified by the module */ RedisModuleConfigSetStringFunc set_string; RedisModuleConfigSetNumericFunc set_numeric; RedisModuleConfigSetBoolFunc set_bool; RedisModuleConfigSetEnumFunc set_enum; } set_fn; RedisModuleConfigApplyFunc apply_fn; RedisModule *module; } ModuleConfig; ``` It also registers a standardConfig in the configs array, with a pointer to the ModuleConfig object associated with it. **What happens on a CONFIG GET/SET MODULENAME.MODULECONFIG:** For CONFIG SET, we do the same parsing as is done in config.c and pass that as the argument to the module set callback. For CONFIG GET, we call the module get callback and return that value to config.c to return to a client. **CONFIG REWRITE**: Starting up a server with module configurations in a .conf file but no module load directive will fail. The flip side is also true, specifying a module load and a bunch of module configurations will load those configurations in using the module defined set callbacks on a RM_LoadConfigs call. Configs being rewritten works the same way as it does for standard configs, as the module has the ability to specify a default value. If a module is unloaded with configurations specified in the .conf file those configurations will be commented out from the .conf file on the next config rewrite. **RM_LoadConfigs:** `RedisModule_LoadConfigs(RedisModuleCtx *ctx);` This last API is used to make configs available within the onLoad() after they have been registered. The expected usage is that a module will register all of its configs, then call LoadConfigs to trigger all of the set callbacks, and then can error out if any of them were malformed. LoadConfigs will attempt to set all configs registered to either a .conf file argument/loadex argument or their default value if an argument is not specified. **LoadConfigs is a required function if configs are registered. ** Also note that LoadConfigs **does not** call the apply callbacks, but a module can do that directly after the LoadConfigs call. **New Command: MODULE LOADEX [CONFIG NAME VALUE] [ARGS ...]:** This command provides the ability to provide startup context information to a module. LOADEX stands for "load extended" similar to GETEX. Note that provided config names need the full MODULENAME.MODULECONFIG name. Any additional arguments a module might want are intended to be specified after ARGS. Everything after ARGS is passed to onLoad as RedisModuleString **argv. Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Madelyn Olson <matolson@amazon.com> Co-authored-by: sundb <sundbcn@gmail.com> Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com> Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
2022-03-30 14:47:06 +02:00
list *module_configs; /* List of configurations the module has registered */
int configs_initialized; /* Have the module configurations been initialized? */
int in_call; /* RM_Call() nesting level */
int in_hook; /* Hooks callback nesting level for this module (0 or 1). */
int options; /* Module options and capabilities. */
int blocked_clients; /* Count of RedisModuleBlockedClient in this module. */
RedisModuleInfoFunc info_cb; /* Callback for module to add INFO fields. */
RedisModuleDefragFunc defrag_cb; /* Callback for global data defrag. */
struct moduleLoadQueueEntry *loadmod; /* Module load arguments for config rewrite. */
};
typedef struct RedisModule RedisModule;
/* This is a wrapper for the 'rio' streams used inside rdb.c in Redis, so that
* the user does not have to take the total count of the written bytes nor
* to care about error conditions. */
typedef struct RedisModuleIO {
size_t bytes; /* Bytes read / written so far. */
rio *rio; /* Rio stream. */
moduleType *type; /* Module type doing the operation. */
int error; /* True if error condition happened. */
RDB modules values serialization format version 2. The original RDB serialization format was not parsable without the module loaded, becuase the structure was managed only by the module itself. Moreover RDB is a streaming protocol in the sense that it is both produce di an append-only fashion, and is also sometimes directly sent to the socket (in the case of diskless replication). The fact that modules values cannot be parsed without the relevant module loaded is a problem in many ways: RDB checking tools must have loaded modules even for doing things not involving the value at all, like splitting an RDB into N RDBs by key or alike, or just checking the RDB for sanity. In theory module values could be just a blob of data with a prefixed length in order for us to be able to skip it. However prefixing the values with a length would mean one of the following: 1. To be able to write some data at a previous offset. This breaks stremaing. 2. To bufferize values before outputting them. This breaks performances. 3. To have some chunked RDB output format. This breaks simplicity. Moreover, the above solution, still makes module values a totally opaque matter, with the fowllowing problems: 1. The RDB check tool can just skip the value without being able to at least check the general structure. For datasets composed mostly of modules values this means to just check the outer level of the RDB not actually doing any checko on most of the data itself. 2. It is not possible to do any recovering or processing of data for which a module no longer exists in the future, or is unknown. So this commit implements a different solution. The modules RDB serialization API is composed if well defined calls to store integers, floats, doubles or strings. After this commit, the parts generated by the module API have a one-byte prefix for each of the above emitted parts, and there is a final EOF byte as well. So even if we don't know exactly how to interpret a module value, we can always parse it at an high level, check the overall structure, understand the types used to store the information, and easily skip the whole value. The change is backward compatible: older RDB files can be still loaded since the new encoding has a new RDB type: MODULE_2 (of value 7). The commit also implements the ability to check RDB files for sanity taking advantage of the new feature.
2017-06-27 13:09:33 +02:00
int ver; /* Module serialization version: 1 (old),
* 2 (current version with opcodes annotation). */
2016-10-06 17:10:47 +02:00
struct RedisModuleCtx *ctx; /* Optional context, see RM_GetContextFromIO()*/
2016-11-30 20:47:02 +01:00
struct redisObject *key; /* Optional name of key processed */
int dbid; /* The dbid of the key being processed, -1 when unknown. */
} RedisModuleIO;
RDB modules values serialization format version 2. The original RDB serialization format was not parsable without the module loaded, becuase the structure was managed only by the module itself. Moreover RDB is a streaming protocol in the sense that it is both produce di an append-only fashion, and is also sometimes directly sent to the socket (in the case of diskless replication). The fact that modules values cannot be parsed without the relevant module loaded is a problem in many ways: RDB checking tools must have loaded modules even for doing things not involving the value at all, like splitting an RDB into N RDBs by key or alike, or just checking the RDB for sanity. In theory module values could be just a blob of data with a prefixed length in order for us to be able to skip it. However prefixing the values with a length would mean one of the following: 1. To be able to write some data at a previous offset. This breaks stremaing. 2. To bufferize values before outputting them. This breaks performances. 3. To have some chunked RDB output format. This breaks simplicity. Moreover, the above solution, still makes module values a totally opaque matter, with the fowllowing problems: 1. The RDB check tool can just skip the value without being able to at least check the general structure. For datasets composed mostly of modules values this means to just check the outer level of the RDB not actually doing any checko on most of the data itself. 2. It is not possible to do any recovering or processing of data for which a module no longer exists in the future, or is unknown. So this commit implements a different solution. The modules RDB serialization API is composed if well defined calls to store integers, floats, doubles or strings. After this commit, the parts generated by the module API have a one-byte prefix for each of the above emitted parts, and there is a final EOF byte as well. So even if we don't know exactly how to interpret a module value, we can always parse it at an high level, check the overall structure, understand the types used to store the information, and easily skip the whole value. The change is backward compatible: older RDB files can be still loaded since the new encoding has a new RDB type: MODULE_2 (of value 7). The commit also implements the ability to check RDB files for sanity taking advantage of the new feature.
2017-06-27 13:09:33 +02:00
/* Macro to initialize an IO context. Note that the 'ver' field is populated
* inside rdb.c according to the version of the value to load. */
#define moduleInitIOContext(iovar,mtype,rioptr,keyptr,db) do { \
iovar.rio = rioptr; \
iovar.type = mtype; \
iovar.bytes = 0; \
iovar.error = 0; \
RDB modules values serialization format version 2. The original RDB serialization format was not parsable without the module loaded, becuase the structure was managed only by the module itself. Moreover RDB is a streaming protocol in the sense that it is both produce di an append-only fashion, and is also sometimes directly sent to the socket (in the case of diskless replication). The fact that modules values cannot be parsed without the relevant module loaded is a problem in many ways: RDB checking tools must have loaded modules even for doing things not involving the value at all, like splitting an RDB into N RDBs by key or alike, or just checking the RDB for sanity. In theory module values could be just a blob of data with a prefixed length in order for us to be able to skip it. However prefixing the values with a length would mean one of the following: 1. To be able to write some data at a previous offset. This breaks stremaing. 2. To bufferize values before outputting them. This breaks performances. 3. To have some chunked RDB output format. This breaks simplicity. Moreover, the above solution, still makes module values a totally opaque matter, with the fowllowing problems: 1. The RDB check tool can just skip the value without being able to at least check the general structure. For datasets composed mostly of modules values this means to just check the outer level of the RDB not actually doing any checko on most of the data itself. 2. It is not possible to do any recovering or processing of data for which a module no longer exists in the future, or is unknown. So this commit implements a different solution. The modules RDB serialization API is composed if well defined calls to store integers, floats, doubles or strings. After this commit, the parts generated by the module API have a one-byte prefix for each of the above emitted parts, and there is a final EOF byte as well. So even if we don't know exactly how to interpret a module value, we can always parse it at an high level, check the overall structure, understand the types used to store the information, and easily skip the whole value. The change is backward compatible: older RDB files can be still loaded since the new encoding has a new RDB type: MODULE_2 (of value 7). The commit also implements the ability to check RDB files for sanity taking advantage of the new feature.
2017-06-27 13:09:33 +02:00
iovar.ver = 0; \
2016-11-30 20:47:02 +01:00
iovar.key = keyptr; \
iovar.dbid = db; \
iovar.ctx = NULL; \
} while(0)
2017-07-06 10:29:19 +02:00
/* This is a structure used to export DEBUG DIGEST capabilities to Redis
* modules. We want to capture both the ordered and unordered elements of
* a data structure, so that a digest can be created in a way that correctly
* reflects the values. See the DEBUG DIGEST command implementation for more
* background. */
typedef struct RedisModuleDigest {
unsigned char o[20]; /* Ordered elements. */
unsigned char x[20]; /* Xored elements. */
struct redisObject *key; /* Optional name of key processed */
int dbid; /* The dbid of the key being processed */
2017-07-06 10:29:19 +02:00
} RedisModuleDigest;
/* Just start with a digest composed of all zero bytes. */
#define moduleInitDigestContext(mdvar) do { \
memset(mdvar.o,0,sizeof(mdvar.o)); \
memset(mdvar.x,0,sizeof(mdvar.x)); \
} while(0)
2017-07-06 10:29:19 +02:00
/* Objects encoding. Some kind of objects like Strings and Hashes can be
* internally represented in multiple ways. The 'encoding' field of the object
* is set to one of this fields for this object. */
#define OBJ_ENCODING_RAW 0 /* Raw representation */
#define OBJ_ENCODING_INT 1 /* Encoded as integer */
#define OBJ_ENCODING_HT 2 /* Encoded as hash table */
#define OBJ_ENCODING_ZIPMAP 3 /* Encoded as zipmap */
#define OBJ_ENCODING_LINKEDLIST 4 /* No longer used: old list encoding. */
#define OBJ_ENCODING_ZIPLIST 5 /* Encoded as ziplist */
#define OBJ_ENCODING_INTSET 6 /* Encoded as intset */
#define OBJ_ENCODING_SKIPLIST 7 /* Encoded as skiplist */
#define OBJ_ENCODING_EMBSTR 8 /* Embedded sds string encoding */
2021-11-24 12:34:13 +01:00
#define OBJ_ENCODING_QUICKLIST 9 /* Encoded as linked list of listpacks */
#define OBJ_ENCODING_STREAM 10 /* Encoded as a radix tree of listpacks */
#define OBJ_ENCODING_LISTPACK 11 /* Encoded as a listpack */
2015-07-27 09:41:48 +02:00
#define LRU_BITS 24
#define LRU_CLOCK_MAX ((1<<LRU_BITS)-1) /* Max value of obj->lru */
#define LRU_CLOCK_RESOLUTION 1000 /* LRU clock resolution in ms */
#define OBJ_SHARED_REFCOUNT INT_MAX /* Global object never destroyed. */
#define OBJ_STATIC_REFCOUNT (INT_MAX-1) /* Object allocated in the stack. */
#define OBJ_FIRST_SPECIAL_REFCOUNT OBJ_STATIC_REFCOUNT
typedef struct redisObject {
unsigned type:4;
unsigned encoding:4;
unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
* LFU data (least significant 8 bits frequency
* and most significant 16 bits access time). */
int refcount;
void *ptr;
} robj;
/* The a string name for an object's type as listed above
* Native types are checked against the OBJ_STRING, OBJ_LIST, OBJ_* defines,
* and Module types have their registered name returned. */
char *getObjectTypeName(robj*);
2013-01-16 18:00:20 +01:00
/* Macro used to initialize a Redis object allocated on the stack.
* Note that this macro is taken near the structure definition to make sure
* we'll update it when the structure is changed, to avoid bugs like
* bug #85 introduced exactly in this way. */
#define initStaticStringObject(_var,_ptr) do { \
_var.refcount = OBJ_STATIC_REFCOUNT; \
_var.type = OBJ_STRING; \
_var.encoding = OBJ_ENCODING_RAW; \
_var.ptr = _ptr; \
2016-04-25 15:49:57 +02:00
} while(0)
struct evictionPoolEntry; /* Defined in evict.c */
/* This structure is used in order to represent the output buffer of a client,
* which is actually a linked list of blocks like that, that is: client->reply. */
typedef struct clientReplyBlock {
size_t size, used;
char buf[];
} clientReplyBlock;
Replication backlog and replicas use one global shared replication buffer (#9166) ## Background For redis master, one replica uses one copy of replication buffer, that is a big waste of memory, more replicas more waste, and allocate/free memory for every reply list also cost much. If we set client-output-buffer-limit small and write traffic is heavy, master may disconnect with replicas and can't finish synchronization with replica. If we set client-output-buffer-limit big, master may be OOM when there are many replicas that separately keep much memory. Because replication buffers of different replica client are the same, one simple idea is that all replicas only use one replication buffer, that will effectively save memory. Since replication backlog content is the same as replicas' output buffer, now we can discard replication backlog memory and use global shared replication buffer to implement replication backlog mechanism. ## Implementation I create one global "replication buffer" which contains content of replication stream. The structure of "replication buffer" is similar to the reply list that exists in every client. But the node of list is `replBufBlock`, which has `id, repl_offset, refcount` fields. ```c /* Replication buffer blocks is the list of replBufBlock. * * +--------------+ +--------------+ +--------------+ * | refcount = 1 | ... | refcount = 0 | ... | refcount = 2 | * +--------------+ +--------------+ +--------------+ * | / \ * | / \ * | / \ * Repl Backlog Replia_A Replia_B * * Each replica or replication backlog increments only the refcount of the * 'ref_repl_buf_node' which it points to. So when replica walks to the next * node, it should first increase the next node's refcount, and when we trim * the replication buffer nodes, we remove node always from the head node which * refcount is 0. If the refcount of the head node is not 0, we must stop * trimming and never iterate the next node. */ /* Similar with 'clientReplyBlock', it is used for shared buffers between * all replica clients and replication backlog. */ typedef struct replBufBlock { int refcount; /* Number of replicas or repl backlog using. */ long long id; /* The unique incremental number. */ long long repl_offset; /* Start replication offset of the block. */ size_t size, used; char buf[]; } replBufBlock; ``` So now when we feed replication stream into replication backlog and all replicas, we only need to feed stream into replication buffer `feedReplicationBuffer`. In this function, we set some fields of replication backlog and replicas to references of the global replication buffer blocks. And we also need to check replicas' output buffer limit to free if exceeding `client-output-buffer-limit`, and trim replication backlog if exceeding `repl-backlog-size`. When sending reply to replicas, we also need to iterate replication buffer blocks and send its content, when totally sending one block for replica, we decrease current node count and increase the next current node count, and then free the block which reference is 0 from the head of replication buffer blocks. Since now we use linked list to manage replication backlog, it may cost much time for iterating all linked list nodes to find corresponding replication buffer node. So we create a rax tree to store some nodes for index, but to avoid rax tree occupying too much memory, i record one per 64 nodes for index. Currently, to make partial resynchronization as possible as much, we always let replication backlog as the last reference of replication buffer blocks, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks, and this method doesn't increase memory usage since they share replication buffer. To avoid freezing server for freeing unreferenced replication buffer blocks when we need to trim backlog for exceeding backlog size setting, we trim backlog incrementally (free 64 blocks per call now), and make it faster in `beforeSleep` (free 640 blocks). ### Other changes - `mem_total_replication_buffers`: we add this field in INFO command, it means the total memory of replication buffers used. - `mem_clients_slaves`: now even replica is slow to replicate, and its output buffer memory is not 0, but it still may be 0, since replication backlog and replicas share one global replication buffer, only if replication buffer memory is more than the repl backlog setting size, we consider the excess as replicas' memory. Otherwise, we think replication buffer memory is the consumption of repl backlog. - Key eviction Since all replicas and replication backlog share global replication buffer, we think only the part of exceeding backlog size the extra separate consumption of replicas. Because we trim backlog incrementally in the background, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks disconnect. To avoid massive eviction loop, we don't count the delayed freed replication backlog into used memory even if there are no replicas, i.e. we also regard this memory as replicas's memory. - `client-output-buffer-limit` check for replica clients It doesn't make sense to set the replica clients output buffer limit lower than the repl-backlog-size config (partial sync will succeed and then replica will get disconnected). Such a configuration is ignored (the size of repl-backlog-size will be used). This doesn't have memory consumption implications since the replica client will share the backlog buffers memory. - Drop replication backlog after loading data if needed We always create replication backlog if server is a master, we need it because we put DELs in it when loading expired keys in RDB, but if RDB doesn't have replication info or there is no rdb, it is not possible to support partial resynchronization, to avoid extra memory of replication backlog, we drop it. - Multi IO threads Since all replicas and replication backlog use global replication buffer, if I/O threads are enabled, to guarantee data accessing thread safe, we must let main thread handle sending the output buffer to all replicas. But before, other IO threads could handle sending output buffer of all replicas. ## Other optimizations This solution resolve some other problem: - When replicas disconnect with master since of out of output buffer limit, releasing the output buffer of replicas may freeze server if we set big `client-output-buffer-limit` for replicas, but now, it doesn't cause freezing. - This implementation may mitigate reply list copy cost time(also freezes server) when one replication has huge reply buffer and another replica can copy buffer for full synchronization. now, we just copy reference info, it is very light. - If we set replication backlog size big, it also may cost much time to copy replication backlog into replica's output buffer. But this commit eliminates this problem. - Resizing replication backlog size doesn't empty current replication backlog content.
2021-10-25 08:24:31 +02:00
/* Replication buffer blocks is the list of replBufBlock.
*
* +--------------+ +--------------+ +--------------+
* | refcount = 1 | ... | refcount = 0 | ... | refcount = 2 |
* +--------------+ +--------------+ +--------------+
* | / \
* | / \
* | / \
* Repl Backlog Replia_A Replia_B
*
* Each replica or replication backlog increments only the refcount of the
* 'ref_repl_buf_node' which it points to. So when replica walks to the next
* node, it should first increase the next node's refcount, and when we trim
* the replication buffer nodes, we remove node always from the head node which
* refcount is 0. If the refcount of the head node is not 0, we must stop
* trimming and never iterate the next node. */
/* Similar with 'clientReplyBlock', it is used for shared buffers between
* all replica clients and replication backlog. */
typedef struct replBufBlock {
int refcount; /* Number of replicas or repl backlog using. */
long long id; /* The unique incremental number. */
long long repl_offset; /* Start replication offset of the block. */
size_t size, used;
char buf[];
} replBufBlock;
Replica keep serving data during repl-diskless-load=swapdb for better availability (#9323) For diskless replication in swapdb mode, considering we already spend replica memory having a backup of current db to restore in case of failure, we can have the following benefits by instead swapping database only in case we succeeded in transferring db from master: - Avoid `LOADING` response during failed and successful synchronization for cases where the replica is already up and running with data. - Faster total time of diskless replication, because now we're moving from Transfer + Flush + Load time to Transfer + Load only. Flushing the tempDb is done asynchronously after swapping. - This could be implemented also for disk replication with similar benefits if consumers are willing to spend the extra memory usage. General notes: - The concept of `backupDb` becomes `tempDb` for clarity. - Async loading mode will only kick in if the replica is syncing from a master that has the same repl-id the one it had before. i.e. the data it's getting belongs to a different time of the same timeline. - New property in INFO: `async_loading` to differentiate from the blocking loading - Slot to Key mapping is now a field of `redisDb` as it's more natural to access it from both server.db and the tempDb that is passed around. - Because this is affecting replicas only, we assume that if they are not readonly and write commands during replication, they are lost after SYNC same way as before, but we're still denying CONFIG SET here anyways to avoid complications. Considerations for review: - We have many cases where server.loading flag is used and even though I tried my best, there may be cases where async_loading should be checked as well and cases where it shouldn't (would require very good understanding of whole code) - Several places that had different behavior depending on the loading flag where actually meant to just handle commands coming from the AOF client differently than ones coming from real clients, changed to check CLIENT_ID_AOF instead. **Additional for Release Notes** - Bugfix - server.dirty was not incremented for any kind of diskless replication, as effect it wouldn't contribute on triggering next database SAVE - New flag for RM_GetContextFlags module API: REDISMODULE_CTX_FLAGS_ASYNC_LOADING - Deprecated RedisModuleEvent_ReplBackup. Starting from Redis 7.0, we don't fire this event. Instead, we have the new RedisModuleEvent_ReplAsyncLoad holding 3 sub-events: STARTED, ABORTED and COMPLETED. - New module flag REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD for RedisModule_SetModuleOptions to allow modules to declare they support the diskless replication with async loading (when absent, we fall back to disk-based loading). Co-authored-by: Eduardo Semprebon <edus@saxobank.com> Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-04 09:46:50 +01:00
/* Opaque type for the Slot to Key API. */
typedef struct clusterSlotToKeyMapping clusterSlotToKeyMapping;
/* Redis database representation. There are multiple databases identified
* by integers from 0 (the default database) up to the max configured
* database. The database number is the 'id' field in the structure. */
typedef struct redisDb {
dict *dict; /* The keyspace for this DB */
dict *expires; /* Timeout of keys with a timeout set */
dict *blocking_keys; /* Keys with clients waiting for data (BLPOP)*/
A reimplementation of blocking operation internals. Redis provides support for blocking operations such as BLPOP or BRPOP. This operations are identical to normal LPOP and RPOP operations as long as there are elements in the target list, but if the list is empty they block waiting for new data to arrive to the list. All the clients blocked waiting for th same list are served in a FIFO way, so the first that blocked is the first to be served when there is more data pushed by another client into the list. The previous implementation of blocking operations was conceived to serve clients in the context of push operations. For for instance: 1) There is a client "A" blocked on list "foo". 2) The client "B" performs `LPUSH foo somevalue`. 3) The client "A" is served in the context of the "B" LPUSH, synchronously. Processing things in a synchronous way was useful as if "A" pushes a value that is served by "B", from the point of view of the database is a NOP (no operation) thing, that is, nothing is replicated, nothing is written in the AOF file, and so forth. However later we implemented two things: 1) Variadic LPUSH that could add multiple values to a list in the context of a single call. 2) BRPOPLPUSH that was a version of BRPOP that also provided a "PUSH" side effect when receiving data. This forced us to make the synchronous implementation more complex. If client "B" is waiting for data, and "A" pushes three elemnents in a single call, we needed to propagate an LPUSH with a missing argument in the AOF and replication link. We also needed to make sure to replicate the LPUSH side of BRPOPLPUSH, but only if in turn did not happened to serve another blocking client into another list ;) This were complex but with a few of mutually recursive functions everything worked as expected... until one day we introduced scripting in Redis. Scripting + synchronous blocking operations = Issue #614. Basically you can't "rewrite" a script to have just a partial effect on the replicas and AOF file if the script happened to serve a few blocked clients. The solution to all this problems, implemented by this commit, is to change the way we serve blocked clients. Instead of serving the blocked clients synchronously, in the context of the command performing the PUSH operation, it is now an asynchronous and iterative process: 1) If a key that has clients blocked waiting for data is the subject of a list push operation, We simply mark keys as "ready" and put it into a queue. 2) Every command pushing stuff on lists, as a variadic LPUSH, a script, or whatever it is, is replicated verbatim without any rewriting. 3) Every time a Redis command, a MULTI/EXEC block, or a script, completed its execution, we run the list of keys ready to serve blocked clients (as more data arrived), and process this list serving the blocked clients. 4) As a result of "3" maybe more keys are ready again for other clients (as a result of BRPOPLPUSH we may have push operations), so we iterate back to step "3" if it's needed. The new code has a much simpler semantics, and a simpler to understand implementation, with the disadvantage of not being able to "optmize out" a PUSH+BPOP as a No OP. This commit will be tested with care before the final merge, more tests will be added likely.
2012-09-04 10:37:49 +02:00
dict *ready_keys; /* Blocked keys that received a PUSH */
dict *watched_keys; /* WATCHED keys for MULTI/EXEC CAS */
int id; /* Database ID */
long long avg_ttl; /* Average TTL, just for stats */
unsigned long expires_cursor; /* Cursor of the active expire cycle. */
list *defrag_later; /* List of key names to attempt to defrag one by one, gradually. */
Replica keep serving data during repl-diskless-load=swapdb for better availability (#9323) For diskless replication in swapdb mode, considering we already spend replica memory having a backup of current db to restore in case of failure, we can have the following benefits by instead swapping database only in case we succeeded in transferring db from master: - Avoid `LOADING` response during failed and successful synchronization for cases where the replica is already up and running with data. - Faster total time of diskless replication, because now we're moving from Transfer + Flush + Load time to Transfer + Load only. Flushing the tempDb is done asynchronously after swapping. - This could be implemented also for disk replication with similar benefits if consumers are willing to spend the extra memory usage. General notes: - The concept of `backupDb` becomes `tempDb` for clarity. - Async loading mode will only kick in if the replica is syncing from a master that has the same repl-id the one it had before. i.e. the data it's getting belongs to a different time of the same timeline. - New property in INFO: `async_loading` to differentiate from the blocking loading - Slot to Key mapping is now a field of `redisDb` as it's more natural to access it from both server.db and the tempDb that is passed around. - Because this is affecting replicas only, we assume that if they are not readonly and write commands during replication, they are lost after SYNC same way as before, but we're still denying CONFIG SET here anyways to avoid complications. Considerations for review: - We have many cases where server.loading flag is used and even though I tried my best, there may be cases where async_loading should be checked as well and cases where it shouldn't (would require very good understanding of whole code) - Several places that had different behavior depending on the loading flag where actually meant to just handle commands coming from the AOF client differently than ones coming from real clients, changed to check CLIENT_ID_AOF instead. **Additional for Release Notes** - Bugfix - server.dirty was not incremented for any kind of diskless replication, as effect it wouldn't contribute on triggering next database SAVE - New flag for RM_GetContextFlags module API: REDISMODULE_CTX_FLAGS_ASYNC_LOADING - Deprecated RedisModuleEvent_ReplBackup. Starting from Redis 7.0, we don't fire this event. Instead, we have the new RedisModuleEvent_ReplAsyncLoad holding 3 sub-events: STARTED, ABORTED and COMPLETED. - New module flag REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD for RedisModule_SetModuleOptions to allow modules to declare they support the diskless replication with async loading (when absent, we fall back to disk-based loading). Co-authored-by: Eduardo Semprebon <edus@saxobank.com> Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-04 09:46:50 +01:00
clusterSlotToKeyMapping *slots_to_keys; /* Array of slots to keys. Only used in cluster mode (db 0). */
} redisDb;
/* forward declaration for functions ctx */
Redis Function Libraries (#10004) # Redis Function Libraries This PR implements Redis Functions Libraries as describe on: https://github.com/redis/redis/issues/9906. Libraries purpose is to provide a better code sharing between functions by allowing to create multiple functions in a single command. Functions that were created together can safely share code between each other without worrying about compatibility issues and versioning. Creating a new library is done using 'FUNCTION LOAD' command (full API is described below) This PR introduces a new struct called libraryInfo, libraryInfo holds information about a library: * name - name of the library * engine - engine used to create the library * code - library code * description - library description * functions - the functions exposed by the library When Redis gets the `FUNCTION LOAD` command it creates a new empty libraryInfo. Redis passes the `CODE` to the relevant engine alongside the empty libraryInfo. As a result, the engine will create one or more functions by calling 'libraryCreateFunction'. The new funcion will be added to the newly created libraryInfo. So far Everything is happening locally on the libraryInfo so it is easy to abort the operation (in case of an error) by simply freeing the libraryInfo. After the library info is fully constructed we start the joining phase by which we will join the new library to the other libraries currently exist on Redis. The joining phase make sure there is no function collision and add the library to the librariesCtx (renamed from functionCtx). LibrariesCtx is used all around the code in the exact same way as functionCtx was used (with respect to RDB loading, replicatio, ...). The only difference is that apart from function dictionary (maps function name to functionInfo object), the librariesCtx contains also a libraries dictionary that maps library name to libraryInfo object. ## New API ### FUNCTION LOAD `FUNCTION LOAD <ENGINE> <LIBRARY NAME> [REPLACE] [DESCRIPTION <DESCRIPTION>] <CODE>` Create a new library with the given parameters: * ENGINE - REPLACE Engine name to use to create the library. * LIBRARY NAME - The new library name. * REPLACE - If the library already exists, replace it. * DESCRIPTION - Library description. * CODE - Library code. Return "OK" on success, or error on the following cases: * Library name already taken and REPLACE was not used * Name collision with another existing library (even if replace was uses) * Library registration failed by the engine (usually compilation error) ## Changed API ### FUNCTION LIST `FUNCTION LIST [LIBRARYNAME <LIBRARY NAME PATTERN>] [WITHCODE]` Command was modified to also allow getting libraries code (so `FUNCTION INFO` command is no longer needed and removed). In addition the command gets an option argument, `LIBRARYNAME` allows you to only get libraries that match the given `LIBRARYNAME` pattern. By default, it returns all libraries. ### INFO MEMORY Added number of libraries to `INFO MEMORY` ### Commands flags `DENYOOM` flag was set on `FUNCTION LOAD` and `FUNCTION RESTORE`. We consider those commands as commands that add new data to the dateset (functions are data) and so we want to disallows to run those commands on OOM. ## Removed API * FUNCTION CREATE - Decided on https://github.com/redis/redis/issues/9906 * FUNCTION INFO - Decided on https://github.com/redis/redis/issues/9899 ## Lua engine changes When the Lua engine gets the code given on `FUNCTION LOAD` command, it immediately runs it, we call this run the loading run. Loading run is not a usual script run, it is not possible to invoke any Redis command from within the load run. Instead there is a new API provided by `library` object. The new API's: * `redis.log` - behave the same as `redis.log` * `redis.register_function` - register a new function to the library The loading run purpose is to register functions using the new `redis.register_function` API. Any attempt to use any other API will result in an error. In addition, the load run is has a time limit of 500ms, error is raise on timeout and the entire operation is aborted. ### `redis.register_function` `redis.register_function(<function_name>, <callback>, [<description>])` This new API allows users to register a new function that will be linked to the newly created library. This API can only be called during the load run (see definition above). Any attempt to use it outside of the load run will result in an error. The parameters pass to the API are: * function_name - Function name (must be a Lua string) * callback - Lua function object that will be called when the function is invokes using fcall/fcall_ro * description - Function description, optional (must be a Lua string). ### Example The following example creates a library called `lib` with 2 functions, `f1` and `f1`, returns 1 and 2 respectively: ``` local function f1(keys, args)     return 1 end local function f2(keys, args)     return 2 end redis.register_function('f1', f1) redis.register_function('f2', f2) ``` Notice: Unlike `eval`, functions inside a library get the KEYS and ARGV as arguments to the functions and not as global. ### Technical Details On the load run we only want the user to be able to call a white list on API's. This way, in the future, if new API's will be added, the new API's will not be available to the load run unless specifically added to this white list. We put the while list on the `library` object and make sure the `library` object is only available to the load run by using [lua_setfenv](https://www.lua.org/manual/5.1/manual.html#lua_setfenv) API. This API allows us to set the `globals` of a function (and all the function it creates). Before starting the load run we create a new fresh Lua table (call it `g`) that only contains the `library` API (we make sure to set global protection on this table just like the general global protection already exists today), then we use [lua_setfenv](https://www.lua.org/manual/5.1/manual.html#lua_setfenv) to set `g` as the global table of the load run. After the load run finished we update `g` metatable and set `__index` and `__newindex` functions to be `_G` (Lua default globals), we also pop out the `library` object as we do not need it anymore. This way, any function that was created on the load run (and will be invoke using `fcall`) will see the default globals as it expected to see them and will not have the `library` API anymore. An important outcome of this new approach is that now we can achieve a distinct global table for each library (it is not yet like that but it is very easy to achieve it now). In the future we can decide to remove global protection because global on different libraries will not collide or we can chose to give different API to different libraries base on some configuration or input. Notice that this technique was meant to prevent errors and was not meant to prevent malicious user from exploit it. For example, the load run can still save the `library` object on some local variable and then using in `fcall` context. To prevent such a malicious use, the C code also make sure it is running in the right context and if not raise an error.
2022-01-06 12:39:38 +01:00
typedef struct functionsLibCtx functionsLibCtx;
/* Holding object that need to be populated during
* rdb loading. On loading end it is possible to decide
* whether not to set those objects on their rightful place.
* For example: dbarray need to be set as main database on
* successful loading and dropped on failure. */
typedef struct rdbLoadingCtx {
redisDb* dbarray;
Redis Function Libraries (#10004) # Redis Function Libraries This PR implements Redis Functions Libraries as describe on: https://github.com/redis/redis/issues/9906. Libraries purpose is to provide a better code sharing between functions by allowing to create multiple functions in a single command. Functions that were created together can safely share code between each other without worrying about compatibility issues and versioning. Creating a new library is done using 'FUNCTION LOAD' command (full API is described below) This PR introduces a new struct called libraryInfo, libraryInfo holds information about a library: * name - name of the library * engine - engine used to create the library * code - library code * description - library description * functions - the functions exposed by the library When Redis gets the `FUNCTION LOAD` command it creates a new empty libraryInfo. Redis passes the `CODE` to the relevant engine alongside the empty libraryInfo. As a result, the engine will create one or more functions by calling 'libraryCreateFunction'. The new funcion will be added to the newly created libraryInfo. So far Everything is happening locally on the libraryInfo so it is easy to abort the operation (in case of an error) by simply freeing the libraryInfo. After the library info is fully constructed we start the joining phase by which we will join the new library to the other libraries currently exist on Redis. The joining phase make sure there is no function collision and add the library to the librariesCtx (renamed from functionCtx). LibrariesCtx is used all around the code in the exact same way as functionCtx was used (with respect to RDB loading, replicatio, ...). The only difference is that apart from function dictionary (maps function name to functionInfo object), the librariesCtx contains also a libraries dictionary that maps library name to libraryInfo object. ## New API ### FUNCTION LOAD `FUNCTION LOAD <ENGINE> <LIBRARY NAME> [REPLACE] [DESCRIPTION <DESCRIPTION>] <CODE>` Create a new library with the given parameters: * ENGINE - REPLACE Engine name to use to create the library. * LIBRARY NAME - The new library name. * REPLACE - If the library already exists, replace it. * DESCRIPTION - Library description. * CODE - Library code. Return "OK" on success, or error on the following cases: * Library name already taken and REPLACE was not used * Name collision with another existing library (even if replace was uses) * Library registration failed by the engine (usually compilation error) ## Changed API ### FUNCTION LIST `FUNCTION LIST [LIBRARYNAME <LIBRARY NAME PATTERN>] [WITHCODE]` Command was modified to also allow getting libraries code (so `FUNCTION INFO` command is no longer needed and removed). In addition the command gets an option argument, `LIBRARYNAME` allows you to only get libraries that match the given `LIBRARYNAME` pattern. By default, it returns all libraries. ### INFO MEMORY Added number of libraries to `INFO MEMORY` ### Commands flags `DENYOOM` flag was set on `FUNCTION LOAD` and `FUNCTION RESTORE`. We consider those commands as commands that add new data to the dateset (functions are data) and so we want to disallows to run those commands on OOM. ## Removed API * FUNCTION CREATE - Decided on https://github.com/redis/redis/issues/9906 * FUNCTION INFO - Decided on https://github.com/redis/redis/issues/9899 ## Lua engine changes When the Lua engine gets the code given on `FUNCTION LOAD` command, it immediately runs it, we call this run the loading run. Loading run is not a usual script run, it is not possible to invoke any Redis command from within the load run. Instead there is a new API provided by `library` object. The new API's: * `redis.log` - behave the same as `redis.log` * `redis.register_function` - register a new function to the library The loading run purpose is to register functions using the new `redis.register_function` API. Any attempt to use any other API will result in an error. In addition, the load run is has a time limit of 500ms, error is raise on timeout and the entire operation is aborted. ### `redis.register_function` `redis.register_function(<function_name>, <callback>, [<description>])` This new API allows users to register a new function that will be linked to the newly created library. This API can only be called during the load run (see definition above). Any attempt to use it outside of the load run will result in an error. The parameters pass to the API are: * function_name - Function name (must be a Lua string) * callback - Lua function object that will be called when the function is invokes using fcall/fcall_ro * description - Function description, optional (must be a Lua string). ### Example The following example creates a library called `lib` with 2 functions, `f1` and `f1`, returns 1 and 2 respectively: ``` local function f1(keys, args)     return 1 end local function f2(keys, args)     return 2 end redis.register_function('f1', f1) redis.register_function('f2', f2) ``` Notice: Unlike `eval`, functions inside a library get the KEYS and ARGV as arguments to the functions and not as global. ### Technical Details On the load run we only want the user to be able to call a white list on API's. This way, in the future, if new API's will be added, the new API's will not be available to the load run unless specifically added to this white list. We put the while list on the `library` object and make sure the `library` object is only available to the load run by using [lua_setfenv](https://www.lua.org/manual/5.1/manual.html#lua_setfenv) API. This API allows us to set the `globals` of a function (and all the function it creates). Before starting the load run we create a new fresh Lua table (call it `g`) that only contains the `library` API (we make sure to set global protection on this table just like the general global protection already exists today), then we use [lua_setfenv](https://www.lua.org/manual/5.1/manual.html#lua_setfenv) to set `g` as the global table of the load run. After the load run finished we update `g` metatable and set `__index` and `__newindex` functions to be `_G` (Lua default globals), we also pop out the `library` object as we do not need it anymore. This way, any function that was created on the load run (and will be invoke using `fcall`) will see the default globals as it expected to see them and will not have the `library` API anymore. An important outcome of this new approach is that now we can achieve a distinct global table for each library (it is not yet like that but it is very easy to achieve it now). In the future we can decide to remove global protection because global on different libraries will not collide or we can chose to give different API to different libraries base on some configuration or input. Notice that this technique was meant to prevent errors and was not meant to prevent malicious user from exploit it. For example, the load run can still save the `library` object on some local variable and then using in `fcall` context. To prevent such a malicious use, the C code also make sure it is running in the right context and if not raise an error.
2022-01-06 12:39:38 +01:00
functionsLibCtx* functions_lib_ctx;
}rdbLoadingCtx;
/* Client MULTI/EXEC state */
typedef struct multiCmd {
robj **argv;
int argv_len;
int argc;
struct redisCommand *cmd;
} multiCmd;
typedef struct multiState {
multiCmd *commands; /* Array of MULTI commands */
int count; /* Total number of MULTI commands */
int cmd_flags; /* The accumulated command flags OR-ed together.
So if at least a command has a given flag, it
will be set in this field. */
int cmd_inv_flags; /* Same as cmd_flags, OR-ing the ~flags. so that it
is possible to know if all the commands have a
certain flag. */
Client eviction (#8687) ### Description A mechanism for disconnecting clients when the sum of all connected clients is above a configured limit. This prevents eviction or OOM caused by accumulated used memory between all clients. It's a complimentary mechanism to the `client-output-buffer-limit` mechanism which takes into account not only a single client and not only output buffers but rather all memory used by all clients. #### Design The general design is as following: * We track memory usage of each client, taking into account all memory used by the client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date after reading from the socket, after processing commands and after writing to the socket. * Based on the used memory we sort all clients into buckets. Each bucket contains all clients using up up to x2 memory of the clients in the bucket below it. For example up to 1m clients, up to 2m clients, up to 4m clients, ... * Before processing a command and before sleep we check if we're over the configured limit. If we are we start disconnecting clients from larger buckets downwards until we're under the limit. #### Config `maxmemory-clients` max memory all clients are allowed to consume, above this threshold we disconnect clients. This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%` would mean 10% of `maxmemory`). #### Important code changes * During the development I encountered yet more situations where our io-threads access global vars. And needed to fix them. I also had to handle keeps the clients sorted into the memory buckets (which are global) while their memory usage changes in the io-thread. To achieve this I decided to simplify how we check if we're in an io-thread and make it much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking if the client is in an io-thread (it wasn't used for anything else) and just used the global `io_threads_op` variable the same way to check during writes. * I optimized the cleanup of the client from the `clients_pending_read` list on client freeing. We now store a pointer in the `client` struct to this list so we don't need to search in it (`pending_read_list_node`). * Added `evicted_clients` stat to `INFO` command. * Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the client eviction mechanism. Added corrosponding 'e' flag in the client info string. * Added `multi-mem` field in the client info string to show how much memory is used up by buffered multi commands. * Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and channels (partially), tracking prefixes (partially). * CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so clients will be disconnected between processing different clients and not only before sleep. This new function can be used in the future for work we want to do outside the command processing loop but don't want to wait for all clients to be processed before we get to it. Specifically I wanted to handle output-buffer-limit related closing before we process client eviction in case the two race with each other. * Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction buckets. * Each client now holds a pointer to the client eviction memory usage bucket it belongs to and listNode to itself in that bucket for quick removal. * Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value indicating no io-threading is currently being executed. * In order to track memory used by each clients in real-time we can't rely on updating these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()` (used to be `clientsCronTrackClientsMemUsage()`) after command processing, after writing data to pubsub clients, after writing the output buffer and after reading from the socket (and maybe other places too). The function is written to be fast. * Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before processing a command (before performing oom-checks and key-eviction). * All clients memory usage buckets are grouped as follows: * All clients using less than 64k. * 64K..128K * 128K..256K * ... * 2G..4G * All clients using 4g and up. * Added client-eviction.tcl with a bunch of tests for the new mechanism. * Extended maxmemory.tcl to test the interaction between maxmemory and maxmemory-clients settings. * Added an option to flag a numeric configuration variable as a "percent", this means that if we encounter a '%' after the number in the config file (or config set command) we consider it as valid. Such a number is store internally as a negative value. This way an integer value can be interpreted as either a percent (negative) or absolute value (positive). This is useful for example if some numeric configuration can optionally be set to a percentage of something else. Co-authored-by: Oran Agra <oran@redislabs.com>
2021-09-23 13:02:16 +02:00
size_t argv_len_sums; /* mem used by all commands arguments */
} multiState;
/* This structure holds the blocking operation state for a client.
* The fields used depend on client->btype. */
2010-11-08 23:38:01 +01:00
typedef struct blockingState {
/* Generic fields. */
long count; /* Elements to pop if count was specified (BLMPOP/BZMPOP), -1 otherwise. */
mstime_t timeout; /* Blocking operation timeout. If UNIX current time
* is > timeout then the operation timed out. */
/* BLOCKED_LIST, BLOCKED_ZSET and BLOCKED_STREAM */
dict *keys; /* The keys we are waiting to terminate a blocking
* operation such as BLPOP or XREAD. Or NULL. */
2010-11-08 23:38:01 +01:00
robj *target; /* The key that should receive the element,
* for BLMOVE. */
struct blockPos {
int wherefrom; /* Where to pop from */
int whereto; /* Where to push to */
} blockpos; /* The positions in the src/dst lists/zsets
* where we want to pop/push an element
* for BLPOP, BRPOP, BLMOVE and BZMPOP. */
/* BLOCK_STREAM */
size_t xread_count; /* XREAD COUNT option. */
robj *xread_group; /* XREADGROUP group name. */
robj *xread_consumer; /* XREADGROUP consumer name. */
int xread_group_noack;
2015-07-27 09:41:48 +02:00
/* BLOCKED_WAIT */
int numreplicas; /* Number of replicas we are waiting for ACK. */
long long reploffset; /* Replication offset to reach. */
/* BLOCKED_MODULE */
void *module_blocked_handle; /* RedisModuleBlockedClient structure.
which is opaque for the Redis core, only
handled in module.c. */
2010-11-08 23:38:01 +01:00
} blockingState;
A reimplementation of blocking operation internals. Redis provides support for blocking operations such as BLPOP or BRPOP. This operations are identical to normal LPOP and RPOP operations as long as there are elements in the target list, but if the list is empty they block waiting for new data to arrive to the list. All the clients blocked waiting for th same list are served in a FIFO way, so the first that blocked is the first to be served when there is more data pushed by another client into the list. The previous implementation of blocking operations was conceived to serve clients in the context of push operations. For for instance: 1) There is a client "A" blocked on list "foo". 2) The client "B" performs `LPUSH foo somevalue`. 3) The client "A" is served in the context of the "B" LPUSH, synchronously. Processing things in a synchronous way was useful as if "A" pushes a value that is served by "B", from the point of view of the database is a NOP (no operation) thing, that is, nothing is replicated, nothing is written in the AOF file, and so forth. However later we implemented two things: 1) Variadic LPUSH that could add multiple values to a list in the context of a single call. 2) BRPOPLPUSH that was a version of BRPOP that also provided a "PUSH" side effect when receiving data. This forced us to make the synchronous implementation more complex. If client "B" is waiting for data, and "A" pushes three elemnents in a single call, we needed to propagate an LPUSH with a missing argument in the AOF and replication link. We also needed to make sure to replicate the LPUSH side of BRPOPLPUSH, but only if in turn did not happened to serve another blocking client into another list ;) This were complex but with a few of mutually recursive functions everything worked as expected... until one day we introduced scripting in Redis. Scripting + synchronous blocking operations = Issue #614. Basically you can't "rewrite" a script to have just a partial effect on the replicas and AOF file if the script happened to serve a few blocked clients. The solution to all this problems, implemented by this commit, is to change the way we serve blocked clients. Instead of serving the blocked clients synchronously, in the context of the command performing the PUSH operation, it is now an asynchronous and iterative process: 1) If a key that has clients blocked waiting for data is the subject of a list push operation, We simply mark keys as "ready" and put it into a queue. 2) Every command pushing stuff on lists, as a variadic LPUSH, a script, or whatever it is, is replicated verbatim without any rewriting. 3) Every time a Redis command, a MULTI/EXEC block, or a script, completed its execution, we run the list of keys ready to serve blocked clients (as more data arrived), and process this list serving the blocked clients. 4) As a result of "3" maybe more keys are ready again for other clients (as a result of BRPOPLPUSH we may have push operations), so we iterate back to step "3" if it's needed. The new code has a much simpler semantics, and a simpler to understand implementation, with the disadvantage of not being able to "optmize out" a PUSH+BPOP as a No OP. This commit will be tested with care before the final merge, more tests will be added likely.
2012-09-04 10:37:49 +02:00
/* The following structure represents a node in the server.ready_keys list,
* where we accumulate all the keys that had clients blocked with a blocking
* operation such as B[LR]POP, but received new data in the context of the
* last executed command.
*
* After the execution of every command or script, we run this list to check
* if as a result we should serve data to clients blocked, unblocking them.
* Note that server.ready_keys will not have duplicates as there dictionary
* also called ready_keys in every structure representing a Redis database,
* where we make sure to remember if a given key was already added in the
* server.ready_keys list. */
typedef struct readyList {
redisDb *db;
robj *key;
} readyList;
/* This structure represents a Redis user. This is useful for ACLs, the
* user is associated to the connection after the connection is authenticated.
* If there is no associated user, the connection uses the default user. */
#define USER_COMMAND_BITS_COUNT 1024 /* The total number of command bits
in the user structure. The last valid
command ID we can set in the user
is USER_COMMAND_BITS_COUNT-1. */
#define USER_FLAG_ENABLED (1<<0) /* The user is active. */
#define USER_FLAG_DISABLED (1<<1) /* The user is disabled. */
#define USER_FLAG_NOPASS (1<<2) /* The user requires no password, any
provided password will work. For the
default user, this also means that
no AUTH is needed, and every
connection is immediately
authenticated. */
#define USER_FLAG_SANITIZE_PAYLOAD (1<<3) /* The user require a deep RESTORE
* payload sanitization. */
#define USER_FLAG_SANITIZE_PAYLOAD_SKIP (1<<4) /* The user should skip the
* deep sanitization of RESTORE
* payload. */
#define SELECTOR_FLAG_ROOT (1<<0) /* This is the root user permission
* selector. */
#define SELECTOR_FLAG_ALLKEYS (1<<1) /* The user can mention any key. */
#define SELECTOR_FLAG_ALLCOMMANDS (1<<2) /* The user can run all commands. */
#define SELECTOR_FLAG_ALLCHANNELS (1<<3) /* The user can mention any Pub/Sub
channel. */
typedef struct {
sds name; /* The username as an SDS string. */
uint32_t flags; /* See USER_FLAG_* */
list *passwords; /* A list of SDS valid passwords for this user. */
list *selectors; /* A list of selectors this user validates commands
against. This list will always contain at least
one selector for backwards compatibility. */
} user;
2013-01-16 18:00:20 +01:00
/* With multiplexing we need to take per-client state.
2013-11-20 07:14:27 +01:00
* Clients are taken in a linked list. */
#define CLIENT_ID_AOF (UINT64_MAX) /* Reserved ID for the AOF client. If you
need more reserved IDs use UINT64_MAX-1,
-2, ... and so forth. */
Replication backlog and replicas use one global shared replication buffer (#9166) ## Background For redis master, one replica uses one copy of replication buffer, that is a big waste of memory, more replicas more waste, and allocate/free memory for every reply list also cost much. If we set client-output-buffer-limit small and write traffic is heavy, master may disconnect with replicas and can't finish synchronization with replica. If we set client-output-buffer-limit big, master may be OOM when there are many replicas that separately keep much memory. Because replication buffers of different replica client are the same, one simple idea is that all replicas only use one replication buffer, that will effectively save memory. Since replication backlog content is the same as replicas' output buffer, now we can discard replication backlog memory and use global shared replication buffer to implement replication backlog mechanism. ## Implementation I create one global "replication buffer" which contains content of replication stream. The structure of "replication buffer" is similar to the reply list that exists in every client. But the node of list is `replBufBlock`, which has `id, repl_offset, refcount` fields. ```c /* Replication buffer blocks is the list of replBufBlock. * * +--------------+ +--------------+ +--------------+ * | refcount = 1 | ... | refcount = 0 | ... | refcount = 2 | * +--------------+ +--------------+ +--------------+ * | / \ * | / \ * | / \ * Repl Backlog Replia_A Replia_B * * Each replica or replication backlog increments only the refcount of the * 'ref_repl_buf_node' which it points to. So when replica walks to the next * node, it should first increase the next node's refcount, and when we trim * the replication buffer nodes, we remove node always from the head node which * refcount is 0. If the refcount of the head node is not 0, we must stop * trimming and never iterate the next node. */ /* Similar with 'clientReplyBlock', it is used for shared buffers between * all replica clients and replication backlog. */ typedef struct replBufBlock { int refcount; /* Number of replicas or repl backlog using. */ long long id; /* The unique incremental number. */ long long repl_offset; /* Start replication offset of the block. */ size_t size, used; char buf[]; } replBufBlock; ``` So now when we feed replication stream into replication backlog and all replicas, we only need to feed stream into replication buffer `feedReplicationBuffer`. In this function, we set some fields of replication backlog and replicas to references of the global replication buffer blocks. And we also need to check replicas' output buffer limit to free if exceeding `client-output-buffer-limit`, and trim replication backlog if exceeding `repl-backlog-size`. When sending reply to replicas, we also need to iterate replication buffer blocks and send its content, when totally sending one block for replica, we decrease current node count and increase the next current node count, and then free the block which reference is 0 from the head of replication buffer blocks. Since now we use linked list to manage replication backlog, it may cost much time for iterating all linked list nodes to find corresponding replication buffer node. So we create a rax tree to store some nodes for index, but to avoid rax tree occupying too much memory, i record one per 64 nodes for index. Currently, to make partial resynchronization as possible as much, we always let replication backlog as the last reference of replication buffer blocks, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks, and this method doesn't increase memory usage since they share replication buffer. To avoid freezing server for freeing unreferenced replication buffer blocks when we need to trim backlog for exceeding backlog size setting, we trim backlog incrementally (free 64 blocks per call now), and make it faster in `beforeSleep` (free 640 blocks). ### Other changes - `mem_total_replication_buffers`: we add this field in INFO command, it means the total memory of replication buffers used. - `mem_clients_slaves`: now even replica is slow to replicate, and its output buffer memory is not 0, but it still may be 0, since replication backlog and replicas share one global replication buffer, only if replication buffer memory is more than the repl backlog setting size, we consider the excess as replicas' memory. Otherwise, we think replication buffer memory is the consumption of repl backlog. - Key eviction Since all replicas and replication backlog share global replication buffer, we think only the part of exceeding backlog size the extra separate consumption of replicas. Because we trim backlog incrementally in the background, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks disconnect. To avoid massive eviction loop, we don't count the delayed freed replication backlog into used memory even if there are no replicas, i.e. we also regard this memory as replicas's memory. - `client-output-buffer-limit` check for replica clients It doesn't make sense to set the replica clients output buffer limit lower than the repl-backlog-size config (partial sync will succeed and then replica will get disconnected). Such a configuration is ignored (the size of repl-backlog-size will be used). This doesn't have memory consumption implications since the replica client will share the backlog buffers memory. - Drop replication backlog after loading data if needed We always create replication backlog if server is a master, we need it because we put DELs in it when loading expired keys in RDB, but if RDB doesn't have replication info or there is no rdb, it is not possible to support partial resynchronization, to avoid extra memory of replication backlog, we drop it. - Multi IO threads Since all replicas and replication backlog use global replication buffer, if I/O threads are enabled, to guarantee data accessing thread safe, we must let main thread handle sending the output buffer to all replicas. But before, other IO threads could handle sending output buffer of all replicas. ## Other optimizations This solution resolve some other problem: - When replicas disconnect with master since of out of output buffer limit, releasing the output buffer of replicas may freeze server if we set big `client-output-buffer-limit` for replicas, but now, it doesn't cause freezing. - This implementation may mitigate reply list copy cost time(also freezes server) when one replication has huge reply buffer and another replica can copy buffer for full synchronization. now, we just copy reference info, it is very light. - If we set replication backlog size big, it also may cost much time to copy replication backlog into replica's output buffer. But this commit eliminates this problem. - Resizing replication backlog size doesn't empty current replication backlog content.
2021-10-25 08:24:31 +02:00
/* Replication backlog is not separate memory, it just is one consumer of
* the global replication buffer. This structure records the reference of
* replication buffers. Since the replication buffer block list may be very long,
* it would cost much time to search replication offset on partial resync, so
* we use one rax tree to index some blocks every REPL_BACKLOG_INDEX_PER_BLOCKS
* to make searching offset from replication buffer blocks list faster. */
typedef struct replBacklog {
listNode *ref_repl_buf_node; /* Referenced node of replication buffer blocks,
* see the definition of replBufBlock. */
size_t unindexed_count; /* The count from last creating index block. */
rax *blocks_index; /* The index of recorded blocks of replication
Replication backlog and replicas use one global shared replication buffer (#9166) ## Background For redis master, one replica uses one copy of replication buffer, that is a big waste of memory, more replicas more waste, and allocate/free memory for every reply list also cost much. If we set client-output-buffer-limit small and write traffic is heavy, master may disconnect with replicas and can't finish synchronization with replica. If we set client-output-buffer-limit big, master may be OOM when there are many replicas that separately keep much memory. Because replication buffers of different replica client are the same, one simple idea is that all replicas only use one replication buffer, that will effectively save memory. Since replication backlog content is the same as replicas' output buffer, now we can discard replication backlog memory and use global shared replication buffer to implement replication backlog mechanism. ## Implementation I create one global "replication buffer" which contains content of replication stream. The structure of "replication buffer" is similar to the reply list that exists in every client. But the node of list is `replBufBlock`, which has `id, repl_offset, refcount` fields. ```c /* Replication buffer blocks is the list of replBufBlock. * * +--------------+ +--------------+ +--------------+ * | refcount = 1 | ... | refcount = 0 | ... | refcount = 2 | * +--------------+ +--------------+ +--------------+ * | / \ * | / \ * | / \ * Repl Backlog Replia_A Replia_B * * Each replica or replication backlog increments only the refcount of the * 'ref_repl_buf_node' which it points to. So when replica walks to the next * node, it should first increase the next node's refcount, and when we trim * the replication buffer nodes, we remove node always from the head node which * refcount is 0. If the refcount of the head node is not 0, we must stop * trimming and never iterate the next node. */ /* Similar with 'clientReplyBlock', it is used for shared buffers between * all replica clients and replication backlog. */ typedef struct replBufBlock { int refcount; /* Number of replicas or repl backlog using. */ long long id; /* The unique incremental number. */ long long repl_offset; /* Start replication offset of the block. */ size_t size, used; char buf[]; } replBufBlock; ``` So now when we feed replication stream into replication backlog and all replicas, we only need to feed stream into replication buffer `feedReplicationBuffer`. In this function, we set some fields of replication backlog and replicas to references of the global replication buffer blocks. And we also need to check replicas' output buffer limit to free if exceeding `client-output-buffer-limit`, and trim replication backlog if exceeding `repl-backlog-size`. When sending reply to replicas, we also need to iterate replication buffer blocks and send its content, when totally sending one block for replica, we decrease current node count and increase the next current node count, and then free the block which reference is 0 from the head of replication buffer blocks. Since now we use linked list to manage replication backlog, it may cost much time for iterating all linked list nodes to find corresponding replication buffer node. So we create a rax tree to store some nodes for index, but to avoid rax tree occupying too much memory, i record one per 64 nodes for index. Currently, to make partial resynchronization as possible as much, we always let replication backlog as the last reference of replication buffer blocks, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks, and this method doesn't increase memory usage since they share replication buffer. To avoid freezing server for freeing unreferenced replication buffer blocks when we need to trim backlog for exceeding backlog size setting, we trim backlog incrementally (free 64 blocks per call now), and make it faster in `beforeSleep` (free 640 blocks). ### Other changes - `mem_total_replication_buffers`: we add this field in INFO command, it means the total memory of replication buffers used. - `mem_clients_slaves`: now even replica is slow to replicate, and its output buffer memory is not 0, but it still may be 0, since replication backlog and replicas share one global replication buffer, only if replication buffer memory is more than the repl backlog setting size, we consider the excess as replicas' memory. Otherwise, we think replication buffer memory is the consumption of repl backlog. - Key eviction Since all replicas and replication backlog share global replication buffer, we think only the part of exceeding backlog size the extra separate consumption of replicas. Because we trim backlog incrementally in the background, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks disconnect. To avoid massive eviction loop, we don't count the delayed freed replication backlog into used memory even if there are no replicas, i.e. we also regard this memory as replicas's memory. - `client-output-buffer-limit` check for replica clients It doesn't make sense to set the replica clients output buffer limit lower than the repl-backlog-size config (partial sync will succeed and then replica will get disconnected). Such a configuration is ignored (the size of repl-backlog-size will be used). This doesn't have memory consumption implications since the replica client will share the backlog buffers memory. - Drop replication backlog after loading data if needed We always create replication backlog if server is a master, we need it because we put DELs in it when loading expired keys in RDB, but if RDB doesn't have replication info or there is no rdb, it is not possible to support partial resynchronization, to avoid extra memory of replication backlog, we drop it. - Multi IO threads Since all replicas and replication backlog use global replication buffer, if I/O threads are enabled, to guarantee data accessing thread safe, we must let main thread handle sending the output buffer to all replicas. But before, other IO threads could handle sending output buffer of all replicas. ## Other optimizations This solution resolve some other problem: - When replicas disconnect with master since of out of output buffer limit, releasing the output buffer of replicas may freeze server if we set big `client-output-buffer-limit` for replicas, but now, it doesn't cause freezing. - This implementation may mitigate reply list copy cost time(also freezes server) when one replication has huge reply buffer and another replica can copy buffer for full synchronization. now, we just copy reference info, it is very light. - If we set replication backlog size big, it also may cost much time to copy replication backlog into replica's output buffer. But this commit eliminates this problem. - Resizing replication backlog size doesn't empty current replication backlog content.
2021-10-25 08:24:31 +02:00
* buffer for quickly searching replication
* offset on partial resynchronization. */
long long histlen; /* Backlog actual data length */
long long offset; /* Replication "master offset" of first
* byte in the replication backlog buffer.*/
} replBacklog;
Client eviction (#8687) ### Description A mechanism for disconnecting clients when the sum of all connected clients is above a configured limit. This prevents eviction or OOM caused by accumulated used memory between all clients. It's a complimentary mechanism to the `client-output-buffer-limit` mechanism which takes into account not only a single client and not only output buffers but rather all memory used by all clients. #### Design The general design is as following: * We track memory usage of each client, taking into account all memory used by the client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date after reading from the socket, after processing commands and after writing to the socket. * Based on the used memory we sort all clients into buckets. Each bucket contains all clients using up up to x2 memory of the clients in the bucket below it. For example up to 1m clients, up to 2m clients, up to 4m clients, ... * Before processing a command and before sleep we check if we're over the configured limit. If we are we start disconnecting clients from larger buckets downwards until we're under the limit. #### Config `maxmemory-clients` max memory all clients are allowed to consume, above this threshold we disconnect clients. This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%` would mean 10% of `maxmemory`). #### Important code changes * During the development I encountered yet more situations where our io-threads access global vars. And needed to fix them. I also had to handle keeps the clients sorted into the memory buckets (which are global) while their memory usage changes in the io-thread. To achieve this I decided to simplify how we check if we're in an io-thread and make it much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking if the client is in an io-thread (it wasn't used for anything else) and just used the global `io_threads_op` variable the same way to check during writes. * I optimized the cleanup of the client from the `clients_pending_read` list on client freeing. We now store a pointer in the `client` struct to this list so we don't need to search in it (`pending_read_list_node`). * Added `evicted_clients` stat to `INFO` command. * Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the client eviction mechanism. Added corrosponding 'e' flag in the client info string. * Added `multi-mem` field in the client info string to show how much memory is used up by buffered multi commands. * Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and channels (partially), tracking prefixes (partially). * CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so clients will be disconnected between processing different clients and not only before sleep. This new function can be used in the future for work we want to do outside the command processing loop but don't want to wait for all clients to be processed before we get to it. Specifically I wanted to handle output-buffer-limit related closing before we process client eviction in case the two race with each other. * Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction buckets. * Each client now holds a pointer to the client eviction memory usage bucket it belongs to and listNode to itself in that bucket for quick removal. * Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value indicating no io-threading is currently being executed. * In order to track memory used by each clients in real-time we can't rely on updating these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()` (used to be `clientsCronTrackClientsMemUsage()`) after command processing, after writing data to pubsub clients, after writing the output buffer and after reading from the socket (and maybe other places too). The function is written to be fast. * Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before processing a command (before performing oom-checks and key-eviction). * All clients memory usage buckets are grouped as follows: * All clients using less than 64k. * 64K..128K * 128K..256K * ... * 2G..4G * All clients using 4g and up. * Added client-eviction.tcl with a bunch of tests for the new mechanism. * Extended maxmemory.tcl to test the interaction between maxmemory and maxmemory-clients settings. * Added an option to flag a numeric configuration variable as a "percent", this means that if we encounter a '%' after the number in the config file (or config set command) we consider it as valid. Such a number is store internally as a negative value. This way an integer value can be interpreted as either a percent (negative) or absolute value (positive). This is useful for example if some numeric configuration can optionally be set to a percentage of something else. Co-authored-by: Oran Agra <oran@redislabs.com>
2021-09-23 13:02:16 +02:00
typedef struct {
list *clients;
size_t mem_usage_sum;
} clientMemUsageBucket;
typedef struct client {
uint64_t id; /* Client incremental unique ID. */
connection *conn;
int resp; /* RESP protocol version. Can be 2 or 3. */
2015-08-06 09:41:11 +02:00
redisDb *db; /* Pointer to currently SELECTed DB. */
robj *name; /* As set by CLIENT SETNAME. */
sds querybuf; /* Buffer we use to accumulate client queries. */
size_t qb_pos; /* The position we have read in querybuf. */
2015-08-06 09:41:11 +02:00
size_t querybuf_peak; /* Recent (100ms or more) peak of querybuf size. */
int argc; /* Num of arguments of current command. */
robj **argv; /* Arguments of current command. */
int argv_len; /* Size of argv array (may be more than argc) */
int original_argc; /* Num of arguments of original command if arguments were rewritten. */
robj **original_argv; /* Arguments of original command if arguments were rewritten. */
size_t argv_len_sum; /* Sum of lengths of objects in argv list. */
2015-08-06 09:41:11 +02:00
struct redisCommand *cmd, *lastcmd; /* Last command executed. */
Sort out the mess around Lua error messages and error stats (#10329) This PR fix 2 issues on Lua scripting: * Server error reply statistics (some errors were counted twice). * Error code and error strings returning from scripts (error code was missing / misplaced). ## Statistics a Lua script user is considered part of the user application, a sophisticated transaction, so we want to count an error even if handled silently by the script, but when it is propagated outwards from the script we don't wanna count it twice. on the other hand, if the script decides to throw an error on its own (using `redis.error_reply`), we wanna count that too. Besides, we do count the `calls` in command statistics for the commands the script calls, we we should certainly also count `failed_calls`. So when a simple `eval "return redis.call('set','x','y')" 0` fails, it should count the failed call to both SET and EVAL, but the `errorstats` and `total_error_replies` should be counted only once. The PR changes the error object that is raised on errors. Instead of raising a simple Lua string, Redis will raise a Lua table in the following format: ``` { err='<error message (including error code)>', source='<User source file name>', line='<line where the error happned>', ignore_error_stats_update=true/false, } ``` The `luaPushError` function was modified to construct the new error table as describe above. The `luaRaiseError` was renamed to `luaError` and is now simply called `lua_error` to raise the table on the top of the Lua stack as the error object. The reason is that since its functionality is changed, in case some Redis branch / fork uses it, it's better to have a compilation error than a bug. The `source` and `line` fields are enriched by the error handler (if possible) and the `ignore_error_stats_update` is optional and if its not present then the default value is `false`. If `ignore_error_stats_update` is true, the error will not be counted on the error stats. When parsing Redis call reply, each error is translated to a Lua table on the format describe above and the `ignore_error_stats_update` field is set to `true` so we will not count errors twice (we counted this error when we invoke the command). The changes in this PR might have been considered as a breaking change for users that used Lua `pcall` function. Before, the error was a string and now its a table. To keep backward comparability the PR override the `pcall` implementation and extract the error message from the error table and return it. Example of the error stats update: ``` 127.0.0.1:6379> lpush l 1 (integer) 2 127.0.0.1:6379> eval "return redis.call('get', 'l')" 0 (error) WRONGTYPE Operation against a key holding the wrong kind of value. script: e471b73f1ef44774987ab00bdf51f21fd9f7974a, on @user_script:1. 127.0.0.1:6379> info Errorstats # Errorstats errorstat_WRONGTYPE:count=1 127.0.0.1:6379> info commandstats # Commandstats cmdstat_eval:calls=1,usec=341,usec_per_call=341.00,rejected_calls=0,failed_calls=1 cmdstat_info:calls=1,usec=35,usec_per_call=35.00,rejected_calls=0,failed_calls=0 cmdstat_lpush:calls=1,usec=14,usec_per_call=14.00,rejected_calls=0,failed_calls=0 cmdstat_get:calls=1,usec=10,usec_per_call=10.00,rejected_calls=0,failed_calls=1 ``` ## error message We can now construct the error message (sent as a reply to the user) from the error table, so this solves issues where the error message was malformed and the error code appeared in the middle of the error message: ```diff 127.0.0.1:6379> eval "return redis.call('set','x','y')" 0 -(error) ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: OOM command not allowed when used memory > 'maxmemory'. +(error) OOM command not allowed when used memory > 'maxmemory' @user_script:1. Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479) ``` ```diff 127.0.0.1:6379> eval "redis.call('get', 'l')" 0 -(error) ERR Error running script (call to f_8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1): @user_script:1: WRONGTYPE Operation against a key holding the wrong kind of value +(error) WRONGTYPE Operation against a key holding the wrong kind of value script: 8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1, on @user_script:1. ``` Notica that `redis.pcall` was not change: ``` 127.0.0.1:6379> eval "return redis.pcall('get', 'l')" 0 (error) WRONGTYPE Operation against a key holding the wrong kind of value ``` ## other notes Notice that Some commands (like GEOADD) changes the cmd variable on the client stats so we can not count on it to update the command stats. In order to be able to update those stats correctly we needed to promote `realcmd` variable to be located on the client struct. Tests was added and modified to verify the changes. Related PR's: #10279, #10218, #10278, #10309 Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-27 12:40:57 +01:00
struct redisCommand *realcmd; /* The original command that was executed by the client,
Used to update error stats in case the c->cmd was modified
during the command invocation (like on GEOADD for example). */
2019-01-14 13:21:21 +01:00
user *user; /* User associated with this connection. If the
user is set to NULL the connection can do
anything (admin). */
2015-08-06 09:41:11 +02:00
int reqtype; /* Request protocol type: PROTO_REQ_* */
int multibulklen; /* Number of multi bulk arguments left to read. */
long bulklen; /* Length of bulk argument in multi bulk request. */
list *reply; /* List of reply objects to send to the client. */
unsigned long long reply_bytes; /* Tot bytes of objects in reply list. */
list *deferred_reply_errors; /* Used for module thread safe contexts. */
size_t sentlen; /* Amount of bytes already sent in the current
buffer or object being sent. */
2015-08-06 09:41:11 +02:00
time_t ctime; /* Client creation time. */
Enabled background and reply time tracking on blocked on keys/blocked on background work clients (#7491) This commit enables tracking time of the background tasks and on replies, opening the door for properly tracking commands that rely on blocking / background work via the slowlog, latency history, and commandstats. Some notes: - The time spent blocked waiting for key changes, or blocked on synchronous replication is not accounted for. - **This commit does not affect latency tracking of commands that are non-blocking or do not have background work.** ( meaning that it all stays the same with exception to `BZPOPMIN`,`BZPOPMAX`,`BRPOP`,`BLPOP`, etc... and module's commands that rely on background threads ). - Specifically for latency history command we've added a new event class named `command-unblocking` that will enable latency monitoring on commands that spawn background threads to do the work. - For blocking commands we're now considering the total time of a command as the time spent on call() + the time spent on replying when unblocked. - For Modules commands that rely on background threads we're now considering the total time of a command as the time spent on call (main thread) + the time spent on the background thread ( if marked within `RedisModule_MeasureTimeStart()` and `RedisModule_MeasureTimeEnd()` ) + the time spent on replying (main thread) To test for this feature we've added a `unit/moduleapi/blockonbackground` test that relies on a module that blocks the client and sleeps on the background for a given time. - check blocked command that uses RedisModule_MeasureTimeStart() is tracking background time - check blocked command that uses RedisModule_MeasureTimeStart() is tracking background time even in timeout - check blocked command with multiple calls RedisModule_MeasureTimeStart() is tracking the total background time - check blocked command without calling RedisModule_MeasureTimeStart() is not reporting background time
2021-01-29 14:38:30 +01:00
long duration; /* Current command duration. Used for measuring latency of blocking/non-blocking cmds */
int slot; /* The slot the client is executing against. Set to -1 if no slot is being used */
2015-08-06 09:41:11 +02:00
time_t lastinteraction; /* Time of the last interaction, used for timeout */
time_t obuf_soft_limit_reached_time;
uint64_t flags; /* Client flags: CLIENT_* macros. */
int authenticated; /* Needed when the default user requires auth. */
2015-08-06 09:41:11 +02:00
int replstate; /* Replication state if this is a slave. */
Set repl-diskless-sync to yes by default, add repl-diskless-sync-max-replicas (#10092) 1. enable diskless replication by default 2. add a new config named repl-diskless-sync-max-replicas that enables replication to start before the full repl-diskless-sync-delay was reached. 3. put replica online sooner on the master (see below) 4. test suite uses repl-diskless-sync-delay of 0 to be faster 5. a few tests that use multiple replica on a pre-populated master, are now using the new repl-diskless-sync-max-replicas 6. fix possible timing issues in a few cluster tests (see below) put replica online sooner on the master ---------------------------------------------------- there were two tests that failed because they needed for the master to realize that the replica is online, but the test code was actually only waiting for the replica to realize it's online, and in diskless it could have been before the master realized it. changes include two things: 1. the tests wait on the right thing 2. issues in the master, putting the replica online in two steps. the master used to put the replica as online in 2 steps. the first step was to mark it as online, and the second step was to enable the write event (only after getting ACK), but in fact the first step didn't contains some of the tasks to put it online (like updating good slave count, and sending the module event). this meant that if a test was waiting to see that the replica is online form the point of view of the master, and then confirm that the module got an event, or that the master has enough good replicas, it could fail due to timing issues. so now the full effect of putting the replica online, happens at once, and only the part about enabling the writes is delayed till the ACK. fix cluster tests -------------------- I added some code to wait for the replica to sync and avoid race conditions. later realized the sentinel and cluster tests where using the original 5 seconds delay, so changed it to 0. this means the other changes are probably not needed, but i suppose they're still better (avoid race conditions)
2022-01-17 13:11:11 +01:00
int repl_start_cmd_stream_on_ack; /* Install slave write handler on first ACK. */
2015-08-06 09:41:11 +02:00
int repldbfd; /* Replication DB file descriptor. */
off_t repldboff; /* Replication DB file offset. */
off_t repldbsize; /* Replication DB file size. */
sds replpreamble; /* Replication DB preamble. */
Fix PSYNC2 incomplete command bug as described in #3899. This bug was discovered by @kevinmcgehee and constituted a major hidden bug in the PSYNC2 implementation, caused by the propagation from the master of incomplete commands to slaves. The bug had several results: 1. Borrowing from Kevin text in the issue: "Given that slaves blindly copy over their master's input into their own replication backlog over successive read syscalls, it's possible that with large commands or small TCP buffers, partial commands are present in this buffer. If the master were to fail before successfully propagating the entire command to a slave, the slaves will never execute the partial command (since the client is invalidated) but will copy it to replication backlog which may relay those invalid bytes to its slaves on PSYNC2, corrupting the backlog and possibly other valid commands that follow the failover. Simple command boundaries aren't sufficient to capture this, either, because in the case of a MULTI/EXEC block, if the master successfully propagates a subset of the commands but not the EXEC, then the transaction in the backlog becomes corrupt and could corrupt other slaves that consume this data." 2. As identified by @yangsiran later, there is another effect of the bug. For the same mechanism of the first problem, a slave having another slave, could receive a full resynchronization request with an already half-applied command in the backlog. Once the RDB is ready, it will be sent to the slave, and the replication will continue sending to the sub-slave the other half of the command, which is not valid. The fix, designed by @yangsiran and @antirez, and implemented by @antirez, uses a secondary buffer in order to feed the sub-masters and update the replication backlog and offsets, only when a given part of the query buffer is actually *applied* to the state of the instance, that is, when the command gets processed and the command is not pending in the Redis transaction buffer because of CLIENT_MULTI state. Given that now the backlog and offsets representation are in agreement with the actual processed commands, both issue 1 and 2 should no longer be possible. Thanks to @kevinmcgehee, @yangsiran and @oranagra for their work in identifying and designing a fix for this problem.
2017-04-19 10:25:45 +02:00
long long read_reploff; /* Read replication offset if this is a master. */
long long reploff; /* Applied replication offset if this is a master. */
long long repl_applied; /* Applied replication data count in querybuf, if this is a replica. */
2015-08-06 09:41:11 +02:00
long long repl_ack_off; /* Replication ack offset, if this is a slave. */
long long repl_ack_time;/* Replication ack time, if this is a slave. */
long long repl_last_partial_write; /* The last time the server did a partial write from the RDB child pipe to this replica */
long long psync_initial_offset; /* FULLRESYNC reply offset other slaves
copying this slave output buffer
should use. */
PSYNC2: different improvements to Redis replication. The gist of the changes is that now, partial resynchronizations between slaves and masters (without the need of a full resync with RDB transfer and so forth), work in a number of cases when it was impossible in the past. For instance: 1. When a slave is promoted to mastrer, the slaves of the old master can partially resynchronize with the new master. 2. Chained slalves (slaves of slaves) can be moved to replicate to other slaves or the master itsef, without requiring a full resync. 3. The master itself, after being turned into a slave, is able to partially resynchronize with the new master, when it joins replication again. In order to obtain this, the following main changes were operated: * Slaves also take a replication backlog, not just masters. * Same stream replication for all the slaves and sub slaves. The replication stream is identical from the top level master to its slaves and is also the same from the slaves to their sub-slaves and so forth. This means that if a slave is later promoted to master, it has the same replication backlong, and can partially resynchronize with its slaves (that were previously slaves of the old master). * A given replication history is no longer identified by the `runid` of a Redis node. There is instead a `replication ID` which changes every time the instance has a new history no longer coherent with the past one. So, for example, slaves publish the same replication history of their master, however when they are turned into masters, they publish a new replication ID, but still remember the old ID, so that they are able to partially resynchronize with slaves of the old master (up to a given offset). * The replication protocol was slightly modified so that a new extended +CONTINUE reply from the master is able to inform the slave of a replication ID change. * REPLCONF CAPA is used in order to notify masters that a slave is able to understand the new +CONTINUE reply. * The RDB file was extended with an auxiliary field that is able to select a given DB after loading in the slave, so that the slave can continue receiving the replication stream from the point it was disconnected without requiring the master to insert "SELECT" statements. This is useful in order to guarantee the "same stream" property, because the slave must be able to accumulate an identical backlog. * Slave pings to sub-slaves are now sent in a special form, when the top-level master is disconnected, in order to don't interfer with the replication stream. We just use out of band "\n" bytes as in other parts of the Redis protocol. An old design document is available here: https://gist.github.com/antirez/ae068f95c0d084891305 However the implementation is not identical to the description because during the work to implement it, different changes were needed in order to make things working well.
2016-11-09 11:31:06 +01:00
char replid[CONFIG_RUN_ID_SIZE+1]; /* Master replication ID (if master). */
Squash merging 125 typo/grammar/comment/doc PRs (#7773) List of squashed commits or PRs =============================== commit 66801ea Author: hwware <wen.hui.ware@gmail.com> Date: Mon Jan 13 00:54:31 2020 -0500 typo fix in acl.c commit 46f55db Author: Itamar Haber <itamar@redislabs.com> Date: Sun Sep 6 18:24:11 2020 +0300 Updates a couple of comments Specifically: * RM_AutoMemory completed instead of pointing to docs * Updated link to custom type doc commit 61a2aa0 Author: xindoo <xindoo@qq.com> Date: Tue Sep 1 19:24:59 2020 +0800 Correct errors in code comments commit a5871d1 Author: yz1509 <pro-756@qq.com> Date: Tue Sep 1 18:36:06 2020 +0800 fix typos in module.c commit 41eede7 Author: bookug <bookug@qq.com> Date: Sat Aug 15 01:11:33 2020 +0800 docs: fix typos in comments commit c303c84 Author: lazy-snail <ws.niu@outlook.com> Date: Fri Aug 7 11:15:44 2020 +0800 fix spelling in redis.conf commit 1eb76bf Author: zhujian <zhujianxyz@gmail.com> Date: Thu Aug 6 15:22:10 2020 +0800 add a missing 'n' in comment commit 1530ec2 Author: Daniel Dai <764122422@qq.com> Date: Mon Jul 27 00:46:35 2020 -0400 fix spelling in tracking.c commit e517b31 Author: Hunter-Chen <huntcool001@gmail.com> Date: Fri Jul 17 22:33:32 2020 +0800 Update redis.conf Co-authored-by: Itamar Haber <itamar@redislabs.com> commit c300eff Author: Hunter-Chen <huntcool001@gmail.com> Date: Fri Jul 17 22:33:23 2020 +0800 Update redis.conf Co-authored-by: Itamar Haber <itamar@redislabs.com> commit 4c058a8 Author: 陈浩鹏 <chenhaopeng@heytea.com> Date: Thu Jun 25 19:00:56 2020 +0800 Grammar fix and clarification commit 5fcaa81 Author: bodong.ybd <bodong.ybd@alibaba-inc.com> Date: Fri Jun 19 10:09:00 2020 +0800 Fix typos commit 4caca9a Author: Pruthvi P <pruthvi@ixigo.com> Date: Fri May 22 00:33:22 2020 +0530 Fix typo eviciton => eviction commit b2a25f6 Author: Brad Dunbar <dunbarb2@gmail.com> Date: Sun May 17 12:39:59 2020 -0400 Fix a typo. commit 12842ae Author: hwware <wen.hui.ware@gmail.com> Date: Sun May 3 17:16:59 2020 -0400 fix spelling in redis conf commit ddba07c Author: Chris Lamb <chris@chris-lamb.co.uk> Date: Sat May 2 23:25:34 2020 +0100 Correct a "conflicts" spelling error. commit 8fc7bf2 Author: Nao YONASHIRO <yonashiro@r.recruit.co.jp> Date: Thu Apr 30 10:25:27 2020 +0900 docs: fix EXPIRE_FAST_CYCLE_DURATION to ACTIVE_EXPIRE_CYCLE_FAST_DURATION commit 9b2b67a Author: Brad Dunbar <dunbarb2@gmail.com> Date: Fri Apr 24 11:46:22 2020 -0400 Fix a typo. commit 0746f10 Author: devilinrust <63737265+devilinrust@users.noreply.github.com> Date: Thu Apr 16 00:17:53 2020 +0200 Fix typos in server.c commit 92b588d Author: benjessop12 <56115861+benjessop12@users.noreply.github.com> Date: Mon Apr 13 13:43:55 2020 +0100 Fix spelling mistake in lazyfree.c commit 1da37aa Merge: 2d4ba28 af347a8 Author: hwware <wen.hui.ware@gmail.com> Date: Thu Mar 5 22:41:31 2020 -0500 Merge remote-tracking branch 'upstream/unstable' into expiretypofix commit 2d4ba28 Author: hwware <wen.hui.ware@gmail.com> Date: Mon Mar 2 00:09:40 2020 -0500 fix typo in expire.c commit 1a746f7 Author: SennoYuki <minakami1yuki@gmail.com> Date: Thu Feb 27 16:54:32 2020 +0800 fix typo commit 8599b1a Author: dongheejeong <donghee950403@gmail.com> Date: Sun Feb 16 20:31:43 2020 +0000 Fix typo in server.c commit f38d4e8 Author: hwware <wen.hui.ware@gmail.com> Date: Sun Feb 2 22:58:38 2020 -0500 fix typo in evict.c commit fe143fc Author: Leo Murillo <leonardo.murillo@gmail.com> Date: Sun Feb 2 01:57:22 2020 -0600 Fix a few typos in redis.conf commit 1ab4d21 Author: viraja1 <anchan.viraj@gmail.com> Date: Fri Dec 27 17:15:58 2019 +0530 Fix typo in Latency API docstring commit ca1f70e Author: gosth <danxuedexing@qq.com> Date: Wed Dec 18 15:18:02 2019 +0800 fix typo in sort.c commit a57c06b Author: ZYunH <zyunhjob@163.com> Date: Mon Dec 16 22:28:46 2019 +0800 fix-zset-typo commit b8c92b5 Author: git-hulk <hulk.website@gmail.com> Date: Mon Dec 16 15:51:42 2019 +0800 FIX: typo in cluster.c, onformation->information commit 9dd981c Author: wujm2007 <jim.wujm@gmail.com> Date: Mon Dec 16 09:37:52 2019 +0800 Fix typo commit e132d7a Author: Sebastien Williams-Wynn <s.williamswynn.mail@gmail.com> Date: Fri Nov 15 00:14:07 2019 +0000 Minor typo change commit 47f44d5 Author: happynote3966 <01ssrmikururudevice01@gmail.com> Date: Mon Nov 11 22:08:48 2019 +0900 fix comment typo in redis-cli.c commit b8bdb0d Author: fulei <fulei@kuaishou.com> Date: Wed Oct 16 18:00:17 2019 +0800 Fix a spelling mistake of comments in defragDictBucketCallback commit 0def46a Author: fulei <fulei@kuaishou.com> Date: Wed Oct 16 13:09:27 2019 +0800 fix some spelling mistakes of comments in defrag.c commit f3596fd Author: Phil Rajchgot <tophil@outlook.com> Date: Sun Oct 13 02:02:32 2019 -0400 Typo and grammar fixes Redis and its documentation are great -- just wanted to submit a few corrections in the spirit of Hacktoberfest. Thanks for all your work on this project. I use it all the time and it works beautifully. commit 2b928cd Author: KangZhiDong <worldkzd@gmail.com> Date: Sun Sep 1 07:03:11 2019 +0800 fix typos commit 33aea14 Author: Axlgrep <axlgrep@gmail.com> Date: Tue Aug 27 11:02:18 2019 +0800 Fixed eviction spelling issues commit e282a80 Author: Simen Flatby <simen@oms.no> Date: Tue Aug 20 15:25:51 2019 +0200 Update comments to reflect prop name In the comments the prop is referenced as replica-validity-factor, but it is really named cluster-replica-validity-factor. commit 74d1f9a Author: Jim Green <jimgreen2013@qq.com> Date: Tue Aug 20 20:00:31 2019 +0800 fix comment error, the code is ok commit eea1407 Author: Liao Tonglang <liaotonglang@gmail.com> Date: Fri May 31 10:16:18 2019 +0800 typo fix fix cna't to can't commit 0da553c Author: KAWACHI Takashi <tkawachi@gmail.com> Date: Wed Jul 17 00:38:16 2019 +0900 Fix typo commit 7fc8fb6 Author: Michael Prokop <mika@grml.org> Date: Tue May 28 17:58:42 2019 +0200 Typo fixes s/familar/familiar/ s/compatiblity/compatibility/ s/ ot / to / s/itsef/itself/ commit 5f46c9d Author: zhumoing <34539422+zhumoing@users.noreply.github.com> Date: Tue May 21 21:16:50 2019 +0800 typo-fixes typo-fixes commit 321dfe1 Author: wxisme <850885154@qq.com> Date: Sat Mar 16 15:10:55 2019 +0800 typo fix commit b4fb131 Merge: 267e0e6 3df1eb8 Author: Nikitas Bastas <nikitasbst@gmail.com> Date: Fri Feb 8 22:55:45 2019 +0200 Merge branch 'unstable' of antirez/redis into unstable commit 267e0e6 Author: Nikitas Bastas <nikitasbst@gmail.com> Date: Wed Jan 30 21:26:04 2019 +0200 Minor typo fix commit 30544e7 Author: inshal96 <39904558+inshal96@users.noreply.github.com> Date: Fri Jan 4 16:54:50 2019 +0500 remove an extra 'a' in the comments commit 337969d Author: BrotherGao <yangdongheng11@gmail.com> Date: Sat Dec 29 12:37:29 2018 +0800 fix typo in redis.conf commit 9f4b121 Merge: 423a030 e504583 Author: BrotherGao <yangdongheng@xiaomi.com> Date: Sat Dec 29 11:41:12 2018 +0800 Merge branch 'unstable' of antirez/redis into unstable commit 423a030 Merge: 42b02b7 46a51cd Author: 杨东衡 <yangdongheng@xiaomi.com> Date: Tue Dec 4 23:56:11 2018 +0800 Merge branch 'unstable' of antirez/redis into unstable commit 42b02b7 Merge: 68c0e6e b8febe6 Author: Dongheng Yang <yangdongheng11@gmail.com> Date: Sun Oct 28 15:54:23 2018 +0800 Merge pull request #1 from antirez/unstable update local data commit 714b589 Author: Christian <crifei93@gmail.com> Date: Fri Dec 28 01:17:26 2018 +0100 fix typo "resulution" commit e23259d Author: garenchan <1412950785@qq.com> Date: Wed Dec 26 09:58:35 2018 +0800 fix typo: segfauls -> segfault commit a9359f8 Author: xjp <jianping_xie@aliyun.com> Date: Tue Dec 18 17:31:44 2018 +0800 Fixed REDISMODULE_H spell bug commit a12c3e4 Author: jdiaz <jrd.palacios@gmail.com> Date: Sat Dec 15 23:39:52 2018 -0600 Fixes hyperloglog hash function comment block description commit 770eb11 Author: 林上耀 <1210tom@163.com> Date: Sun Nov 25 17:16:10 2018 +0800 fix typo commit fd97fbb Author: Chris Lamb <chris@chris-lamb.co.uk> Date: Fri Nov 23 17:14:01 2018 +0100 Correct "unsupported" typo. commit a85522d Author: Jungnam Lee <jungnam.lee@oracle.com> Date: Thu Nov 8 23:01:29 2018 +0900 fix typo in test comments commit ade8007 Author: Arun Kumar <palerdot@users.noreply.github.com> Date: Tue Oct 23 16:56:35 2018 +0530 Fixed grammatical typo Fixed typo for word 'dictionary' commit 869ee39 Author: Hamid Alaei <hamid.a85@gmail.com> Date: Sun Aug 12 16:40:02 2018 +0430 fix documentations: (ThreadSafeContextStart/Stop -> ThreadSafeContextLock/Unlock), minor typo commit f89d158 Author: Mayank Jain <mayankjain255@gmail.com> Date: Tue Jul 31 23:01:21 2018 +0530 Updated README.md with some spelling corrections. Made correction in spelling of some misspelled words. commit 892198e Author: dsomeshwar <someshwar.dhayalan@gmail.com> Date: Sat Jul 21 23:23:04 2018 +0530 typo fix commit 8a4d780 Author: Itamar Haber <itamar@redislabs.com> Date: Mon Apr 30 02:06:52 2018 +0300 Fixes some typos commit e3acef6 Author: Noah Rosamilia <ivoahivoah@gmail.com> Date: Sat Mar 3 23:41:21 2018 -0500 Fix typo in /deps/README.md commit 04442fb Author: WuYunlong <xzsyeb@126.com> Date: Sat Mar 3 10:32:42 2018 +0800 Fix typo in readSyncBulkPayload() comment. commit 9f36880 Author: WuYunlong <xzsyeb@126.com> Date: Sat Mar 3 10:20:37 2018 +0800 replication.c comment: run_id -> replid. commit f866b4a Author: Francesco 'makevoid' Canessa <makevoid@gmail.com> Date: Thu Feb 22 22:01:56 2018 +0000 fix comment typo in server.c commit 0ebc69b Author: 줍 <jubee0124@gmail.com> Date: Mon Feb 12 16:38:48 2018 +0900 Fix typo in redis.conf Fix `five behaviors` to `eight behaviors` in [this sentence ](antirez/redis@unstable/redis.conf#L564) commit b50a620 Author: martinbroadhurst <martinbroadhurst@users.noreply.github.com> Date: Thu Dec 28 12:07:30 2017 +0000 Fix typo in valgrind.sup commit 7d8f349 Author: Peter Boughton <peter@sorcerersisle.com> Date: Mon Nov 27 19:52:19 2017 +0000 Update CONTRIBUTING; refer doc updates to redis-doc repo. commit 02dec7e Author: Klauswk <klauswk1@hotmail.com> Date: Tue Oct 24 16:18:38 2017 -0200 Fix typo in comment commit e1efbc8 Author: chenshi <baiwfg2@gmail.com> Date: Tue Oct 3 18:26:30 2017 +0800 Correct two spelling errors of comments commit 93327d8 Author: spacewander <spacewanderlzx@gmail.com> Date: Wed Sep 13 16:47:24 2017 +0800 Update the comment for OBJ_ENCODING_EMBSTR_SIZE_LIMIT's value The value of OBJ_ENCODING_EMBSTR_SIZE_LIMIT is 44 now instead of 39. commit 63d361f Author: spacewander <spacewanderlzx@gmail.com> Date: Tue Sep 12 15:06:42 2017 +0800 Fix <prevlen> related doc in ziplist.c According to the definition of ZIP_BIG_PREVLEN and other related code, the guard of single byte <prevlen> should be 254 instead of 255. commit ebe228d Author: hanael80 <hanael80@gmail.com> Date: Tue Aug 15 09:09:40 2017 +0900 Fix typo commit 6b696e6 Author: Matt Robenolt <matt@ydekproductions.com> Date: Mon Aug 14 14:50:47 2017 -0700 Fix typo in LATENCY DOCTOR output commit a2ec6ae Author: caosiyang <caosiyang@qiyi.com> Date: Tue Aug 15 14:15:16 2017 +0800 Fix a typo: form => from commit 3ab7699 Author: caosiyang <caosiyang@qiyi.com> Date: Thu Aug 10 18:40:33 2017 +0800 Fix a typo: replicationFeedSlavesFromMaster() => replicationFeedSlavesFromMasterStream() commit 72d43ef Author: caosiyang <caosiyang@qiyi.com> Date: Tue Aug 8 15:57:25 2017 +0800 fix a typo: servewr => server commit 707c958 Author: Bo Cai <charpty@gmail.com> Date: Wed Jul 26 21:49:42 2017 +0800 redis-cli.c typo: conut -> count. Signed-off-by: Bo Cai <charpty@gmail.com> commit b9385b2 Author: JackDrogon <jack.xsuperman@gmail.com> Date: Fri Jun 30 14:22:31 2017 +0800 Fix some spell problems commit 20d9230 Author: akosel <aaronjkosel@gmail.com> Date: Sun Jun 4 19:35:13 2017 -0500 Fix typo commit b167bfc Author: Krzysiek Witkowicz <krzysiekwitkowicz@gmail.com> Date: Mon May 22 21:32:27 2017 +0100 Fix #4008 small typo in comment commit 2b78ac8 Author: Jake Clarkson <jacobwclarkson@gmail.com> Date: Wed Apr 26 15:49:50 2017 +0100 Correct typo in tests/unit/hyperloglog.tcl commit b0f1cdb Author: Qi Luo <qiluo-msft@users.noreply.github.com> Date: Wed Apr 19 14:25:18 2017 -0700 Fix typo commit a90b0f9 Author: charsyam <charsyam@naver.com> Date: Thu Mar 16 18:19:53 2017 +0900 fix typos fix typos fix typos commit 8430a79 Author: Richard Hart <richardhart92@gmail.com> Date: Mon Mar 13 22:17:41 2017 -0400 Fixed log message typo in listenToPort. commit 481a1c2 Author: Vinod Kumar <kumar003vinod@gmail.com> Date: Sun Jan 15 23:04:51 2017 +0530 src/db.c: Correct "save" -> "safe" typo commit 586b4d3 Author: wangshaonan <wshn13@gmail.com> Date: Wed Dec 21 20:28:27 2016 +0800 Fix typo they->the in helloworld.c commit c1c4b5e Author: Jenner <hypxm@qq.com> Date: Mon Dec 19 16:39:46 2016 +0800 typo error commit 1ee1a3f Author: tielei <43289893@qq.com> Date: Mon Jul 18 13:52:25 2016 +0800 fix some comments commit 11a41fb Author: Otto Kekäläinen <otto@seravo.fi> Date: Sun Jul 3 10:23:55 2016 +0100 Fix spelling in documentation and comments commit 5fb5d82 Author: francischan <f1ancis621@gmail.com> Date: Tue Jun 28 00:19:33 2016 +0800 Fix outdated comments about redis.c file. It should now refer to server.c file. commit 6b254bc Author: lmatt-bit <lmatt123n@gmail.com> Date: Thu Apr 21 21:45:58 2016 +0800 Refine the comment of dictRehashMilliseconds func SLAVECONF->REPLCONF in comment - by andyli029 commit ee9869f Author: clark.kang <charsyam@naver.com> Date: Tue Mar 22 11:09:51 2016 +0900 fix typos commit f7b3b11 Author: Harisankar H <harisankarh@gmail.com> Date: Wed Mar 9 11:49:42 2016 +0530 Typo correction: "faield" --> "failed" Typo correction: "faield" --> "failed" commit 3fd40fc Author: Itamar Haber <itamar@redislabs.com> Date: Thu Feb 25 10:31:51 2016 +0200 Fixes a typo in comments commit 621c160 Author: Prayag Verma <prayag.verma@gmail.com> Date: Mon Feb 1 12:36:20 2016 +0530 Fix typo in Readme.md Spelling mistakes - `eviciton` > `eviction` `familar` > `familiar` commit d7d07d6 Author: WonCheol Lee <toctoc21c@gmail.com> Date: Wed Dec 30 15:11:34 2015 +0900 Typo fixed commit a4dade7 Author: Felix Bünemann <buenemann@louis.info> Date: Mon Dec 28 11:02:55 2015 +0100 [ci skip] Improve supervised upstart config docs This mentions that "expect stop" is required for supervised upstart to work correctly. See http://upstart.ubuntu.com/cookbook/#expect-stop for an explanation. commit d9caba9 Author: daurnimator <quae@daurnimator.com> Date: Mon Dec 21 18:30:03 2015 +1100 README: Remove trailing whitespace commit 72d42e5 Author: daurnimator <quae@daurnimator.com> Date: Mon Dec 21 18:29:32 2015 +1100 README: Fix typo. th => the commit dd6e957 Author: daurnimator <quae@daurnimator.com> Date: Mon Dec 21 18:29:20 2015 +1100 README: Fix typo. familar => familiar commit 3a12b23 Author: daurnimator <quae@daurnimator.com> Date: Mon Dec 21 18:28:54 2015 +1100 README: Fix typo. eviciton => eviction commit 2d1d03b Author: daurnimator <quae@daurnimator.com> Date: Mon Dec 21 18:21:45 2015 +1100 README: Fix typo. sever => server commit 3973b06 Author: Itamar Haber <itamar@garantiadata.com> Date: Sat Dec 19 17:01:20 2015 +0200 Typo fix commit 4f2e460 Author: Steve Gao <fu@2token.com> Date: Fri Dec 4 10:22:05 2015 +0800 Update README - fix typos commit b21667c Author: binyan <binbin.yan@nokia.com> Date: Wed Dec 2 22:48:37 2015 +0800 delete redundancy color judge in sdscatcolor commit 88894c7 Author: binyan <binbin.yan@nokia.com> Date: Wed Dec 2 22:14:42 2015 +0800 the example output shoule be HelloWorld commit 2763470 Author: binyan <binbin.yan@nokia.com> Date: Wed Dec 2 17:41:39 2015 +0800 modify error word keyevente Signed-off-by: binyan <binbin.yan@nokia.com> commit 0847b3d Author: Bruno Martins <bscmartins@gmail.com> Date: Wed Nov 4 11:37:01 2015 +0000 typo commit bbb9e9e Author: dawedawe <dawedawe@gmx.de> Date: Fri Mar 27 00:46:41 2015 +0100 typo: zimap -> zipmap commit 5ed297e Author: Axel Advento <badwolf.bloodseeker.rev@gmail.com> Date: Tue Mar 3 15:58:29 2015 +0800 Fix 'salve' typos to 'slave' commit edec9d6 Author: LudwikJaniuk <ludvig.janiuk@gmail.com> Date: Wed Jun 12 14:12:47 2019 +0200 Update README.md Co-Authored-By: Qix <Qix-@users.noreply.github.com> commit 692a7af Author: LudwikJaniuk <ludvig.janiuk@gmail.com> Date: Tue May 28 14:32:04 2019 +0200 grammar commit d962b0a Author: Nick Frost <nickfrostatx@gmail.com> Date: Wed Jul 20 15:17:12 2016 -0700 Minor grammar fix commit 24fff01aaccaf5956973ada8c50ceb1462e211c6 (typos) Author: Chad Miller <chadm@squareup.com> Date: Tue Sep 8 13:46:11 2020 -0400 Fix faulty comment about operation of unlink() commit 3cd5c1f3326c52aa552ada7ec797c6bb16452355 Author: Kevin <kevin.xgr@gmail.com> Date: Wed Nov 20 00:13:50 2019 +0800 Fix typo in server.c. From a83af59 Mon Sep 17 00:00:00 2001 From: wuwo <wuwo@wacai.com> Date: Fri, 17 Mar 2017 20:37:45 +0800 Subject: [PATCH] falure to failure From c961896 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=E5=B7=A6=E6=87=B6?= <veficos@gmail.com> Date: Sat, 27 May 2017 15:33:04 +0800 Subject: [PATCH] fix typo From e600ef2 Mon Sep 17 00:00:00 2001 From: "rui.zou" <rui.zou@yunify.com> Date: Sat, 30 Sep 2017 12:38:15 +0800 Subject: [PATCH] fix a typo From c7d07fa Mon Sep 17 00:00:00 2001 From: Alexandre Perrin <alex@kaworu.ch> Date: Thu, 16 Aug 2018 10:35:31 +0200 Subject: [PATCH] deps README.md typo From b25cb67 Mon Sep 17 00:00:00 2001 From: Guy Korland <gkorland@gmail.com> Date: Wed, 26 Sep 2018 10:55:37 +0300 Subject: [PATCH 1/2] fix typos in header From ad28ca6 Mon Sep 17 00:00:00 2001 From: Guy Korland <gkorland@gmail.com> Date: Wed, 26 Sep 2018 11:02:36 +0300 Subject: [PATCH 2/2] fix typos commit 34924cdedd8552466fc22c1168d49236cb7ee915 Author: Adrian Lynch <adi_ady_ade@hotmail.com> Date: Sat Apr 4 21:59:15 2015 +0100 Typos fixed commit fd2a1e7 Author: Jan <jsteemann@users.noreply.github.com> Date: Sat Oct 27 19:13:01 2018 +0200 Fix typos Fix typos commit e14e47c1a234b53b0e103c5f6a1c61481cbcbb02 Author: Andy Lester <andy@petdance.com> Date: Fri Aug 2 22:30:07 2019 -0500 Fix multiple misspellings of "following" commit 79b948ce2dac6b453fe80995abbcaac04c213d5a Author: Andy Lester <andy@petdance.com> Date: Fri Aug 2 22:24:28 2019 -0500 Fix misspelling of create-cluster commit 1fffde52666dc99ab35efbd31071a4c008cb5a71 Author: Andy Lester <andy@petdance.com> Date: Wed Jul 31 17:57:56 2019 -0500 Fix typos commit 204c9ba9651e9e05fd73936b452b9a30be456cfe Author: Xiaobo Zhu <xiaobo.zhu@shopee.com> Date: Tue Aug 13 22:19:25 2019 +0800 fix typos Squashed commit of the following: commit 1d9aaf8 Author: danmedani <danmedani@gmail.com> Date: Sun Aug 2 11:40:26 2015 -0700 README typo fix. Squashed commit of the following: commit 32bfa7c Author: Erik Dubbelboer <erik@dubbelboer.com> Date: Mon Jul 6 21:15:08 2015 +0200 Fixed grammer Squashed commit of the following: commit b24f69c Author: Sisir Koppaka <sisir.koppaka@gmail.com> Date: Mon Mar 2 22:38:45 2015 -0500 utils/hashtable/rehashing.c: Fix typos Squashed commit of the following: commit 4e04082 Author: Erik Dubbelboer <erik@dubbelboer.com> Date: Mon Mar 23 08:22:21 2015 +0000 Small config file documentation improvements Squashed commit of the following: commit acb8773 Author: ctd1500 <ctd1500@gmail.com> Date: Fri May 8 01:52:48 2015 -0700 Typo and grammar fixes in readme commit 2eb75b6 Author: ctd1500 <ctd1500@gmail.com> Date: Fri May 8 01:36:18 2015 -0700 fixed redis.conf comment Squashed commit of the following: commit a8249a2 Author: Masahiko Sawada <sawada.mshk@gmail.com> Date: Fri Dec 11 11:39:52 2015 +0530 Revise correction of typos. Squashed commit of the following: commit 3c02028 Author: zhaojun11 <zhaojun11@jd.com> Date: Wed Jan 17 19:05:28 2018 +0800 Fix typos include two code typos in cluster.c and latency.c Squashed commit of the following: commit 9dba47c Author: q191201771 <191201771@qq.com> Date: Sat Jan 4 11:31:04 2020 +0800 fix function listCreate comment in adlist.c Update src/server.c commit 2c7c2cb536e78dd211b1ac6f7bda00f0f54faaeb Author: charpty <charpty@gmail.com> Date: Tue May 1 23:16:59 2018 +0800 server.c typo: modules system dictionary type comment Signed-off-by: charpty <charpty@gmail.com> commit a8395323fb63cb59cb3591cb0f0c8edb7c29a680 Author: Itamar Haber <itamar@redislabs.com> Date: Sun May 6 00:25:18 2018 +0300 Updates test_helper.tcl's help with undocumented options Specifically: * Host * Port * Client commit bde6f9ced15755cd6407b4af7d601b030f36d60b Author: wxisme <850885154@qq.com> Date: Wed Aug 8 15:19:19 2018 +0800 fix comments in deps files commit 3172474ba991532ab799ee1873439f3402412331 Author: wxisme <850885154@qq.com> Date: Wed Aug 8 14:33:49 2018 +0800 fix some comments commit 01b6f2b6858b5cf2ce4ad5092d2c746e755f53f0 Author: Thor Juhasz <thor@juhasz.pro> Date: Sun Nov 18 14:37:41 2018 +0100 Minor fixes to comments Found some parts a little unclear on a first read, which prompted me to have a better look at the file and fix some minor things I noticed. Fixing minor typos and grammar. There are no changes to configuration options. These changes are only meant to help the user better understand the explanations to the various configuration options
2020-09-10 12:43:38 +02:00
int slave_listening_port; /* As configured with: REPLCONF listening-port */
char *slave_addr; /* Optionally given by REPLCONF ip-address */
int slave_capa; /* Slave capabilities: SLAVE_CAPA_* bitwise OR. */
int slave_req; /* Slave requirements: SLAVE_REQ_* */
multiState mstate; /* MULTI/EXEC state */
2015-07-27 09:41:48 +02:00
int btype; /* Type of blocking op if CLIENT_BLOCKED. */
blockingState bpop; /* blocking state */
long long woff; /* Last write global replication offset. */
list *watched_keys; /* Keys WATCHED for MULTI/EXEC CAS */
dict *pubsub_channels; /* channels a client is interested in (SUBSCRIBE) */
list *pubsub_patterns; /* patterns a client is interested in (SUBSCRIBE) */
dict *pubsubshard_channels; /* shard level channels a client is interested in (SSUBSCRIBE) */
sds peerid; /* Cached peer ID. */
sds sockname; /* Cached connection target address. */
listNode *client_list_node; /* list node in client list */
listNode *postponed_list_node; /* list node within the postponed list */
Client eviction (#8687) ### Description A mechanism for disconnecting clients when the sum of all connected clients is above a configured limit. This prevents eviction or OOM caused by accumulated used memory between all clients. It's a complimentary mechanism to the `client-output-buffer-limit` mechanism which takes into account not only a single client and not only output buffers but rather all memory used by all clients. #### Design The general design is as following: * We track memory usage of each client, taking into account all memory used by the client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date after reading from the socket, after processing commands and after writing to the socket. * Based on the used memory we sort all clients into buckets. Each bucket contains all clients using up up to x2 memory of the clients in the bucket below it. For example up to 1m clients, up to 2m clients, up to 4m clients, ... * Before processing a command and before sleep we check if we're over the configured limit. If we are we start disconnecting clients from larger buckets downwards until we're under the limit. #### Config `maxmemory-clients` max memory all clients are allowed to consume, above this threshold we disconnect clients. This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%` would mean 10% of `maxmemory`). #### Important code changes * During the development I encountered yet more situations where our io-threads access global vars. And needed to fix them. I also had to handle keeps the clients sorted into the memory buckets (which are global) while their memory usage changes in the io-thread. To achieve this I decided to simplify how we check if we're in an io-thread and make it much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking if the client is in an io-thread (it wasn't used for anything else) and just used the global `io_threads_op` variable the same way to check during writes. * I optimized the cleanup of the client from the `clients_pending_read` list on client freeing. We now store a pointer in the `client` struct to this list so we don't need to search in it (`pending_read_list_node`). * Added `evicted_clients` stat to `INFO` command. * Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the client eviction mechanism. Added corrosponding 'e' flag in the client info string. * Added `multi-mem` field in the client info string to show how much memory is used up by buffered multi commands. * Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and channels (partially), tracking prefixes (partially). * CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so clients will be disconnected between processing different clients and not only before sleep. This new function can be used in the future for work we want to do outside the command processing loop but don't want to wait for all clients to be processed before we get to it. Specifically I wanted to handle output-buffer-limit related closing before we process client eviction in case the two race with each other. * Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction buckets. * Each client now holds a pointer to the client eviction memory usage bucket it belongs to and listNode to itself in that bucket for quick removal. * Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value indicating no io-threading is currently being executed. * In order to track memory used by each clients in real-time we can't rely on updating these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()` (used to be `clientsCronTrackClientsMemUsage()`) after command processing, after writing data to pubsub clients, after writing the output buffer and after reading from the socket (and maybe other places too). The function is written to be fast. * Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before processing a command (before performing oom-checks and key-eviction). * All clients memory usage buckets are grouped as follows: * All clients using less than 64k. * 64K..128K * 128K..256K * ... * 2G..4G * All clients using 4g and up. * Added client-eviction.tcl with a bunch of tests for the new mechanism. * Extended maxmemory.tcl to test the interaction between maxmemory and maxmemory-clients settings. * Added an option to flag a numeric configuration variable as a "percent", this means that if we encounter a '%' after the number in the config file (or config set command) we consider it as valid. Such a number is store internally as a negative value. This way an integer value can be interpreted as either a percent (negative) or absolute value (positive). This is useful for example if some numeric configuration can optionally be set to a percentage of something else. Co-authored-by: Oran Agra <oran@redislabs.com>
2021-09-23 13:02:16 +02:00
listNode *pending_read_list_node; /* list node in clients pending read list */
RedisModuleUserChangedFunc auth_callback; /* Module callback to execute
* when the authenticated user
* changes. */
void *auth_callback_privdata; /* Private data that is passed when the auth
* changed callback is executed. Opaque for
* Redis Core. */
void *auth_module; /* The module that owns the callback, which is used
* to disconnect the client if the module is
* unloaded for cleanup. Opaque for Redis Core.*/
/* If this client is in tracking mode and this field is non zero,
* invalidation messages for keys fetched by this client will be send to
* the specified client ID. */
uint64_t client_tracking_redirection;
rax *client_tracking_prefixes; /* A dictionary of prefixes we are already
subscribed to in BCAST mode, in the
context of client side caching. */
Client eviction (#8687) ### Description A mechanism for disconnecting clients when the sum of all connected clients is above a configured limit. This prevents eviction or OOM caused by accumulated used memory between all clients. It's a complimentary mechanism to the `client-output-buffer-limit` mechanism which takes into account not only a single client and not only output buffers but rather all memory used by all clients. #### Design The general design is as following: * We track memory usage of each client, taking into account all memory used by the client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date after reading from the socket, after processing commands and after writing to the socket. * Based on the used memory we sort all clients into buckets. Each bucket contains all clients using up up to x2 memory of the clients in the bucket below it. For example up to 1m clients, up to 2m clients, up to 4m clients, ... * Before processing a command and before sleep we check if we're over the configured limit. If we are we start disconnecting clients from larger buckets downwards until we're under the limit. #### Config `maxmemory-clients` max memory all clients are allowed to consume, above this threshold we disconnect clients. This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%` would mean 10% of `maxmemory`). #### Important code changes * During the development I encountered yet more situations where our io-threads access global vars. And needed to fix them. I also had to handle keeps the clients sorted into the memory buckets (which are global) while their memory usage changes in the io-thread. To achieve this I decided to simplify how we check if we're in an io-thread and make it much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking if the client is in an io-thread (it wasn't used for anything else) and just used the global `io_threads_op` variable the same way to check during writes. * I optimized the cleanup of the client from the `clients_pending_read` list on client freeing. We now store a pointer in the `client` struct to this list so we don't need to search in it (`pending_read_list_node`). * Added `evicted_clients` stat to `INFO` command. * Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the client eviction mechanism. Added corrosponding 'e' flag in the client info string. * Added `multi-mem` field in the client info string to show how much memory is used up by buffered multi commands. * Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and channels (partially), tracking prefixes (partially). * CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so clients will be disconnected between processing different clients and not only before sleep. This new function can be used in the future for work we want to do outside the command processing loop but don't want to wait for all clients to be processed before we get to it. Specifically I wanted to handle output-buffer-limit related closing before we process client eviction in case the two race with each other. * Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction buckets. * Each client now holds a pointer to the client eviction memory usage bucket it belongs to and listNode to itself in that bucket for quick removal. * Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value indicating no io-threading is currently being executed. * In order to track memory used by each clients in real-time we can't rely on updating these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()` (used to be `clientsCronTrackClientsMemUsage()`) after command processing, after writing data to pubsub clients, after writing the output buffer and after reading from the socket (and maybe other places too). The function is written to be fast. * Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before processing a command (before performing oom-checks and key-eviction). * All clients memory usage buckets are grouped as follows: * All clients using less than 64k. * 64K..128K * 128K..256K * ... * 2G..4G * All clients using 4g and up. * Added client-eviction.tcl with a bunch of tests for the new mechanism. * Extended maxmemory.tcl to test the interaction between maxmemory and maxmemory-clients settings. * Added an option to flag a numeric configuration variable as a "percent", this means that if we encounter a '%' after the number in the config file (or config set command) we consider it as valid. Such a number is store internally as a negative value. This way an integer value can be interpreted as either a percent (negative) or absolute value (positive). This is useful for example if some numeric configuration can optionally be set to a percentage of something else. Co-authored-by: Oran Agra <oran@redislabs.com>
2021-09-23 13:02:16 +02:00
/* In updateClientMemUsage() we track the memory usage of
* each client and add it to the sum of all the clients of a given type,
* however we need to remember what was the old contribution of each
* client, and in which category the client was, in order to remove it
* before adding it the new value. */
Client eviction (#8687) ### Description A mechanism for disconnecting clients when the sum of all connected clients is above a configured limit. This prevents eviction or OOM caused by accumulated used memory between all clients. It's a complimentary mechanism to the `client-output-buffer-limit` mechanism which takes into account not only a single client and not only output buffers but rather all memory used by all clients. #### Design The general design is as following: * We track memory usage of each client, taking into account all memory used by the client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date after reading from the socket, after processing commands and after writing to the socket. * Based on the used memory we sort all clients into buckets. Each bucket contains all clients using up up to x2 memory of the clients in the bucket below it. For example up to 1m clients, up to 2m clients, up to 4m clients, ... * Before processing a command and before sleep we check if we're over the configured limit. If we are we start disconnecting clients from larger buckets downwards until we're under the limit. #### Config `maxmemory-clients` max memory all clients are allowed to consume, above this threshold we disconnect clients. This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%` would mean 10% of `maxmemory`). #### Important code changes * During the development I encountered yet more situations where our io-threads access global vars. And needed to fix them. I also had to handle keeps the clients sorted into the memory buckets (which are global) while their memory usage changes in the io-thread. To achieve this I decided to simplify how we check if we're in an io-thread and make it much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking if the client is in an io-thread (it wasn't used for anything else) and just used the global `io_threads_op` variable the same way to check during writes. * I optimized the cleanup of the client from the `clients_pending_read` list on client freeing. We now store a pointer in the `client` struct to this list so we don't need to search in it (`pending_read_list_node`). * Added `evicted_clients` stat to `INFO` command. * Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the client eviction mechanism. Added corrosponding 'e' flag in the client info string. * Added `multi-mem` field in the client info string to show how much memory is used up by buffered multi commands. * Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and channels (partially), tracking prefixes (partially). * CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so clients will be disconnected between processing different clients and not only before sleep. This new function can be used in the future for work we want to do outside the command processing loop but don't want to wait for all clients to be processed before we get to it. Specifically I wanted to handle output-buffer-limit related closing before we process client eviction in case the two race with each other. * Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction buckets. * Each client now holds a pointer to the client eviction memory usage bucket it belongs to and listNode to itself in that bucket for quick removal. * Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value indicating no io-threading is currently being executed. * In order to track memory used by each clients in real-time we can't rely on updating these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()` (used to be `clientsCronTrackClientsMemUsage()`) after command processing, after writing data to pubsub clients, after writing the output buffer and after reading from the socket (and maybe other places too). The function is written to be fast. * Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before processing a command (before performing oom-checks and key-eviction). * All clients memory usage buckets are grouped as follows: * All clients using less than 64k. * 64K..128K * 128K..256K * ... * 2G..4G * All clients using 4g and up. * Added client-eviction.tcl with a bunch of tests for the new mechanism. * Extended maxmemory.tcl to test the interaction between maxmemory and maxmemory-clients settings. * Added an option to flag a numeric configuration variable as a "percent", this means that if we encounter a '%' after the number in the config file (or config set command) we consider it as valid. Such a number is store internally as a negative value. This way an integer value can be interpreted as either a percent (negative) or absolute value (positive). This is useful for example if some numeric configuration can optionally be set to a percentage of something else. Co-authored-by: Oran Agra <oran@redislabs.com>
2021-09-23 13:02:16 +02:00
size_t last_memory_usage;
int last_memory_type;
listNode *mem_usage_bucket_node;
clientMemUsageBucket *mem_usage_bucket;
Replication backlog and replicas use one global shared replication buffer (#9166) ## Background For redis master, one replica uses one copy of replication buffer, that is a big waste of memory, more replicas more waste, and allocate/free memory for every reply list also cost much. If we set client-output-buffer-limit small and write traffic is heavy, master may disconnect with replicas and can't finish synchronization with replica. If we set client-output-buffer-limit big, master may be OOM when there are many replicas that separately keep much memory. Because replication buffers of different replica client are the same, one simple idea is that all replicas only use one replication buffer, that will effectively save memory. Since replication backlog content is the same as replicas' output buffer, now we can discard replication backlog memory and use global shared replication buffer to implement replication backlog mechanism. ## Implementation I create one global "replication buffer" which contains content of replication stream. The structure of "replication buffer" is similar to the reply list that exists in every client. But the node of list is `replBufBlock`, which has `id, repl_offset, refcount` fields. ```c /* Replication buffer blocks is the list of replBufBlock. * * +--------------+ +--------------+ +--------------+ * | refcount = 1 | ... | refcount = 0 | ... | refcount = 2 | * +--------------+ +--------------+ +--------------+ * | / \ * | / \ * | / \ * Repl Backlog Replia_A Replia_B * * Each replica or replication backlog increments only the refcount of the * 'ref_repl_buf_node' which it points to. So when replica walks to the next * node, it should first increase the next node's refcount, and when we trim * the replication buffer nodes, we remove node always from the head node which * refcount is 0. If the refcount of the head node is not 0, we must stop * trimming and never iterate the next node. */ /* Similar with 'clientReplyBlock', it is used for shared buffers between * all replica clients and replication backlog. */ typedef struct replBufBlock { int refcount; /* Number of replicas or repl backlog using. */ long long id; /* The unique incremental number. */ long long repl_offset; /* Start replication offset of the block. */ size_t size, used; char buf[]; } replBufBlock; ``` So now when we feed replication stream into replication backlog and all replicas, we only need to feed stream into replication buffer `feedReplicationBuffer`. In this function, we set some fields of replication backlog and replicas to references of the global replication buffer blocks. And we also need to check replicas' output buffer limit to free if exceeding `client-output-buffer-limit`, and trim replication backlog if exceeding `repl-backlog-size`. When sending reply to replicas, we also need to iterate replication buffer blocks and send its content, when totally sending one block for replica, we decrease current node count and increase the next current node count, and then free the block which reference is 0 from the head of replication buffer blocks. Since now we use linked list to manage replication backlog, it may cost much time for iterating all linked list nodes to find corresponding replication buffer node. So we create a rax tree to store some nodes for index, but to avoid rax tree occupying too much memory, i record one per 64 nodes for index. Currently, to make partial resynchronization as possible as much, we always let replication backlog as the last reference of replication buffer blocks, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks, and this method doesn't increase memory usage since they share replication buffer. To avoid freezing server for freeing unreferenced replication buffer blocks when we need to trim backlog for exceeding backlog size setting, we trim backlog incrementally (free 64 blocks per call now), and make it faster in `beforeSleep` (free 640 blocks). ### Other changes - `mem_total_replication_buffers`: we add this field in INFO command, it means the total memory of replication buffers used. - `mem_clients_slaves`: now even replica is slow to replicate, and its output buffer memory is not 0, but it still may be 0, since replication backlog and replicas share one global replication buffer, only if replication buffer memory is more than the repl backlog setting size, we consider the excess as replicas' memory. Otherwise, we think replication buffer memory is the consumption of repl backlog. - Key eviction Since all replicas and replication backlog share global replication buffer, we think only the part of exceeding backlog size the extra separate consumption of replicas. Because we trim backlog incrementally in the background, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks disconnect. To avoid massive eviction loop, we don't count the delayed freed replication backlog into used memory even if there are no replicas, i.e. we also regard this memory as replicas's memory. - `client-output-buffer-limit` check for replica clients It doesn't make sense to set the replica clients output buffer limit lower than the repl-backlog-size config (partial sync will succeed and then replica will get disconnected). Such a configuration is ignored (the size of repl-backlog-size will be used). This doesn't have memory consumption implications since the replica client will share the backlog buffers memory. - Drop replication backlog after loading data if needed We always create replication backlog if server is a master, we need it because we put DELs in it when loading expired keys in RDB, but if RDB doesn't have replication info or there is no rdb, it is not possible to support partial resynchronization, to avoid extra memory of replication backlog, we drop it. - Multi IO threads Since all replicas and replication backlog use global replication buffer, if I/O threads are enabled, to guarantee data accessing thread safe, we must let main thread handle sending the output buffer to all replicas. But before, other IO threads could handle sending output buffer of all replicas. ## Other optimizations This solution resolve some other problem: - When replicas disconnect with master since of out of output buffer limit, releasing the output buffer of replicas may freeze server if we set big `client-output-buffer-limit` for replicas, but now, it doesn't cause freezing. - This implementation may mitigate reply list copy cost time(also freezes server) when one replication has huge reply buffer and another replica can copy buffer for full synchronization. now, we just copy reference info, it is very light. - If we set replication backlog size big, it also may cost much time to copy replication backlog into replica's output buffer. But this commit eliminates this problem. - Resizing replication backlog size doesn't empty current replication backlog content.
2021-10-25 08:24:31 +02:00
listNode *ref_repl_buf_node; /* Referenced node of replication buffer blocks,
* see the definition of replBufBlock. */
size_t ref_block_pos; /* Access position of referenced buffer block,
* i.e. the next offset to send. */
/* Response buffer */
introduce dynamic client reply buffer size - save memory on idle clients (#9822) Current implementation simple idle client which serves no traffic still use ~17Kb of memory. this is mainly due to a fixed size reply buffer currently set to 16kb. We have encountered some cases in which the server operates in a low memory environments. In such cases a user who wishes to create large connection pools to support potential burst period, will exhaust a large amount of memory to maintain connected Idle clients. Some users may choose to "sacrifice" performance in order to save memory. This commit introduce a dynamic mechanism to shrink and expend the client reply buffer based on periodic observed peak. the algorithm works as follows: 1. each time a client reply buffer has been fully written, the last recorded peak is updated: new peak = MAX( last peak, current written size) 2. during clients cron we check for each client if the last observed peak was: a. matching the current buffer size - in which case we expend (resize) the buffer size by 100% b. less than half the buffer size - in which case we shrink the buffer size by 50% 3. In any case we will **not** resize the buffer in case: a. the current buffer peak is less then the current buffer usable size and higher than 1/2 the current buffer usable size b. the value of (current buffer usable size/2) is less than 1Kib c. the value of (current buffer usable size*2) is larger than 16Kib 4. the peak value is reset to the current buffer position once every **5** seconds. we maintain a new field in the client structure (buf_peak_last_reset_time) which is used to keep track of how long it passed since the last buffer peak reset. ### **Interface changes:** **CIENT LIST** - now contains 2 new extra fields: rbs= < the current size in bytes of the client reply buffer > rbp=< the current value in bytes of the last observed buffer peak position > **INFO STATS** - now contains 2 new statistics: reply_buffer_shrinks = < total number of buffer shrinks performed > reply_buffer_expends = < total number of buffer expends performed > Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Yoav Steinberg <yoav@redislabs.com>
2022-02-22 10:19:38 +01:00
size_t buf_peak; /* Peak used size of buffer in last 5 sec interval. */
mstime_t buf_peak_last_reset_time; /* keeps the last time the buffer peak value was reset */
int bufpos;
size_t buf_usable_size; /* Usable size of buffer. */
introduce dynamic client reply buffer size - save memory on idle clients (#9822) Current implementation simple idle client which serves no traffic still use ~17Kb of memory. this is mainly due to a fixed size reply buffer currently set to 16kb. We have encountered some cases in which the server operates in a low memory environments. In such cases a user who wishes to create large connection pools to support potential burst period, will exhaust a large amount of memory to maintain connected Idle clients. Some users may choose to "sacrifice" performance in order to save memory. This commit introduce a dynamic mechanism to shrink and expend the client reply buffer based on periodic observed peak. the algorithm works as follows: 1. each time a client reply buffer has been fully written, the last recorded peak is updated: new peak = MAX( last peak, current written size) 2. during clients cron we check for each client if the last observed peak was: a. matching the current buffer size - in which case we expend (resize) the buffer size by 100% b. less than half the buffer size - in which case we shrink the buffer size by 50% 3. In any case we will **not** resize the buffer in case: a. the current buffer peak is less then the current buffer usable size and higher than 1/2 the current buffer usable size b. the value of (current buffer usable size/2) is less than 1Kib c. the value of (current buffer usable size*2) is larger than 16Kib 4. the peak value is reset to the current buffer position once every **5** seconds. we maintain a new field in the client structure (buf_peak_last_reset_time) which is used to keep track of how long it passed since the last buffer peak reset. ### **Interface changes:** **CIENT LIST** - now contains 2 new extra fields: rbs= < the current size in bytes of the client reply buffer > rbp=< the current value in bytes of the last observed buffer peak position > **INFO STATS** - now contains 2 new statistics: reply_buffer_shrinks = < total number of buffer shrinks performed > reply_buffer_expends = < total number of buffer expends performed > Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Yoav Steinberg <yoav@redislabs.com>
2022-02-22 10:19:38 +01:00
char *buf;
} client;
struct saveparam {
time_t seconds;
int changes;
};
struct moduleLoadQueueEntry {
sds path;
int argc;
robj **argv;
};
struct sentinelLoadQueueEntry {
int argc;
sds *argv;
int linenum;
sds line;
};
struct sentinelConfig {
list *pre_monitor_cfg;
list *monitor_cfg;
list *post_monitor_cfg;
};
struct sharedObjectsStruct {
robj *crlf, *ok, *err, *emptybulk, *czero, *cone, *pong, *space,
*queued, *null[4], *nullarray[4], *emptymap[4], *emptyset[4],
*emptyarray, *wrongtypeerr, *nokeyerr, *syntaxerr, *sameobjecterr,
*outofrangeerr, *noscripterr, *loadingerr,
*slowevalerr, *slowscripterr, *slowmoduleerr, *bgsaveerr,
*masterdownerr, *roslaveerr, *execaborterr, *noautherr, *noreplicaserr,
*busykeyerr, *oomerr, *plus, *messagebulk, *pmessagebulk, *subscribebulk,
*unsubscribebulk, *psubscribebulk, *punsubscribebulk, *del, *unlink,
*rpop, *lpop, *lpush, *rpoplpush, *lmove, *blmove, *zpopmin, *zpopmax,
*emptyscan, *multi, *exec, *left, *right, *hset, *srem, *xgroup, *xclaim,
*script, *replconf, *eval, *persist, *set, *pexpireat, *pexpire,
Add stream consumer group lag tracking and reporting (#9127) Adds the ability to track the lag of a consumer group (CG), that is, the number of entries yet-to-be-delivered from the stream. The proposed constant-time solution is in the spirit of "best-effort." Partially addresses #8737. ## Description of approach We add a new "entries_added" property to the stream. This starts at 0 for a new stream and is incremented by 1 with every `XADD`. It is essentially an all-time counter of the entries added to the stream. Given the stream's length and this counter value, we can trivially find the logical "entries_added" counter of the first ID if and only if the stream is contiguous. A fragmented stream contains one or more tombstones generated by `XDEL`s. The new "xdel_max_id" stream property tracks the latest tombstone. The CG also tracks its last delivered ID's as an "entries_read" counter and increments it independently when delivering new messages, unless the this read counter is invalid (-1 means invalid offset). When the CG's counter is available, the reported lag is the difference between added and read counters. Lastly, this also adds a "first_id" field to the stream structure in order to make looking it up cheaper in most cases. ## Limitations There are two cases in which the mechanism isn't able to track the lag. In these cases, `XINFO` replies with `null` in the "lag" field. The first case is when a CG is created with an arbitrary last delivered ID, that isn't "0-0", nor the first or the last entries of the stream. In this case, it is impossible to obtain a valid read counter (short of an O(N) operation). The second case is when there are one or more tombstones fragmenting the stream's entries range. In both cases, given enough time and assuming that the consumers are active (reading and lacking) and advancing, the CG should be able to catch up with the tip of the stream and report zero lag. Once that's achieved, lag tracking would resume as normal (until the next tombstone is set). ## API changes * `XGROUP CREATE` added with the optional named argument `[ENTRIESREAD entries-read]` for explicitly specifying the new CG's counter. * `XGROUP SETID` added with an optional positional argument `[ENTRIESREAD entries-read]` for specifying the CG's counter. * `XINFO` reports the maximal tombstone ID, the recorded first entry ID, and total number of entries added to the stream. * `XINFO` reports the current lag and logical read counter of CGs. * `XSETID` is an internal command that's used in replication/aof. It has been added with the optional positional arguments `[ENTRIESADDED entries-added] [MAXDELETEDID max-deleted-entry-id]` for propagating the CG's offset and maximal tombstone ID of the stream. ## The generic unsolved problem The current stream implementation doesn't provide an efficient way to obtain the approximate/exact size of a range of entries. While it could've been nice to have that ability (#5813) in general, let alone specifically in the context of CGs, the risk and complexities involved in such implementation are in all likelihood prohibitive. ## A refactoring note The `streamGetEdgeID` has been refactored to accommodate both the existing seek of any entry as well as seeking non-deleted entries (the addition of the `skip_tombstones` argument). Furthermore, this refactoring also migrated the seek logic to use the `streamIterator` (rather than `raxIterator`) that was, in turn, extended with the `skip_tombstones` Boolean struct field to control the emission of these. Co-authored-by: Guy Benoish <guy.benoish@redislabs.com> Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-23 21:34:58 +01:00
*time, *pxat, *absttl, *retrycount, *force, *justid, *entriesread,
*lastid, *ping, *setid, *keepttl, *load, *createconsumer,
*getack, *special_asterick, *special_equals, *default_username, *redacted,
*ssubscribebulk,*sunsubscribebulk,
2015-07-27 09:41:48 +02:00
*select[PROTO_SHARED_SELECT_CMDS],
*integers[OBJ_SHARED_INTEGERS],
*mbulkhdr[OBJ_SHARED_BULKHDR_LEN], /* "*<value>\r\n" */
*bulkhdr[OBJ_SHARED_BULKHDR_LEN], /* "$<value>\r\n" */
*maphdr[OBJ_SHARED_BULKHDR_LEN], /* "%<value>\r\n" */
*sethdr[OBJ_SHARED_BULKHDR_LEN]; /* "~<value>\r\n" */
sds minstring, maxstring;
};
/* ZSETs use a specialized version of Skiplists */
typedef struct zskiplistNode {
sds ele;
double score;
struct zskiplistNode *backward;
struct zskiplistLevel {
struct zskiplistNode *forward;
unsigned long span;
} level[];
} zskiplistNode;
typedef struct zskiplist {
struct zskiplistNode *header, *tail;
unsigned long length;
int level;
} zskiplist;
typedef struct zset {
dict *dict;
zskiplist *zsl;
} zset;
typedef struct clientBufferLimitsConfig {
unsigned long long hard_limit_bytes;
unsigned long long soft_limit_bytes;
time_t soft_limit_seconds;
} clientBufferLimitsConfig;
2015-07-28 16:58:04 +02:00
extern clientBufferLimitsConfig clientBufferLimitsDefaults[CLIENT_TYPE_OBUF_COUNT];
/* The redisOp structure defines a Redis Operation, that is an instance of
* a command with an argument vector, database ID, propagation target
2015-07-27 09:41:48 +02:00
* (PROPAGATE_*), and command pointer.
*
* Currently only used to additionally propagate more commands to AOF/Replication
* after the propagation of the executed command. */
typedef struct redisOp {
robj **argv;
int argc, dbid, target;
} redisOp;
/* Defines an array of Redis operations. There is an API to add to this
Squash merging 125 typo/grammar/comment/doc PRs (#7773) List of squashed commits or PRs =============================== commit 66801ea Author: hwware <wen.hui.ware@gmail.com> Date: Mon Jan 13 00:54:31 2020 -0500 typo fix in acl.c commit 46f55db Author: Itamar Haber <itamar@redislabs.com> Date: Sun Sep 6 18:24:11 2020 +0300 Updates a couple of comments Specifically: * RM_AutoMemory completed instead of pointing to docs * Updated link to custom type doc commit 61a2aa0 Author: xindoo <xindoo@qq.com> Date: Tue Sep 1 19:24:59 2020 +0800 Correct errors in code comments commit a5871d1 Author: yz1509 <pro-756@qq.com> Date: Tue Sep 1 18:36:06 2020 +0800 fix typos in module.c commit 41eede7 Author: bookug <bookug@qq.com> Date: Sat Aug 15 01:11:33 2020 +0800 docs: fix typos in comments commit c303c84 Author: lazy-snail <ws.niu@outlook.com> Date: Fri Aug 7 11:15:44 2020 +0800 fix spelling in redis.conf commit 1eb76bf Author: zhujian <zhujianxyz@gmail.com> Date: Thu Aug 6 15:22:10 2020 +0800 add a missing 'n' in comment commit 1530ec2 Author: Daniel Dai <764122422@qq.com> Date: Mon Jul 27 00:46:35 2020 -0400 fix spelling in tracking.c commit e517b31 Author: Hunter-Chen <huntcool001@gmail.com> Date: Fri Jul 17 22:33:32 2020 +0800 Update redis.conf Co-authored-by: Itamar Haber <itamar@redislabs.com> commit c300eff Author: Hunter-Chen <huntcool001@gmail.com> Date: Fri Jul 17 22:33:23 2020 +0800 Update redis.conf Co-authored-by: Itamar Haber <itamar@redislabs.com> commit 4c058a8 Author: 陈浩鹏 <chenhaopeng@heytea.com> Date: Thu Jun 25 19:00:56 2020 +0800 Grammar fix and clarification commit 5fcaa81 Author: bodong.ybd <bodong.ybd@alibaba-inc.com> Date: Fri Jun 19 10:09:00 2020 +0800 Fix typos commit 4caca9a Author: Pruthvi P <pruthvi@ixigo.com> Date: Fri May 22 00:33:22 2020 +0530 Fix typo eviciton => eviction commit b2a25f6 Author: Brad Dunbar <dunbarb2@gmail.com> Date: Sun May 17 12:39:59 2020 -0400 Fix a typo. commit 12842ae Author: hwware <wen.hui.ware@gmail.com> Date: Sun May 3 17:16:59 2020 -0400 fix spelling in redis conf commit ddba07c Author: Chris Lamb <chris@chris-lamb.co.uk> Date: Sat May 2 23:25:34 2020 +0100 Correct a "conflicts" spelling error. commit 8fc7bf2 Author: Nao YONASHIRO <yonashiro@r.recruit.co.jp> Date: Thu Apr 30 10:25:27 2020 +0900 docs: fix EXPIRE_FAST_CYCLE_DURATION to ACTIVE_EXPIRE_CYCLE_FAST_DURATION commit 9b2b67a Author: Brad Dunbar <dunbarb2@gmail.com> Date: Fri Apr 24 11:46:22 2020 -0400 Fix a typo. commit 0746f10 Author: devilinrust <63737265+devilinrust@users.noreply.github.com> Date: Thu Apr 16 00:17:53 2020 +0200 Fix typos in server.c commit 92b588d Author: benjessop12 <56115861+benjessop12@users.noreply.github.com> Date: Mon Apr 13 13:43:55 2020 +0100 Fix spelling mistake in lazyfree.c commit 1da37aa Merge: 2d4ba28 af347a8 Author: hwware <wen.hui.ware@gmail.com> Date: Thu Mar 5 22:41:31 2020 -0500 Merge remote-tracking branch 'upstream/unstable' into expiretypofix commit 2d4ba28 Author: hwware <wen.hui.ware@gmail.com> Date: Mon Mar 2 00:09:40 2020 -0500 fix typo in expire.c commit 1a746f7 Author: SennoYuki <minakami1yuki@gmail.com> Date: Thu Feb 27 16:54:32 2020 +0800 fix typo commit 8599b1a Author: dongheejeong <donghee950403@gmail.com> Date: Sun Feb 16 20:31:43 2020 +0000 Fix typo in server.c commit f38d4e8 Author: hwware <wen.hui.ware@gmail.com> Date: Sun Feb 2 22:58:38 2020 -0500 fix typo in evict.c commit fe143fc Author: Leo Murillo <leonardo.murillo@gmail.com> Date: Sun Feb 2 01:57:22 2020 -0600 Fix a few typos in redis.conf commit 1ab4d21 Author: viraja1 <anchan.viraj@gmail.com> Date: Fri Dec 27 17:15:58 2019 +0530 Fix typo in Latency API docstring commit ca1f70e Author: gosth <danxuedexing@qq.com> Date: Wed Dec 18 15:18:02 2019 +0800 fix typo in sort.c commit a57c06b Author: ZYunH <zyunhjob@163.com> Date: Mon Dec 16 22:28:46 2019 +0800 fix-zset-typo commit b8c92b5 Author: git-hulk <hulk.website@gmail.com> Date: Mon Dec 16 15:51:42 2019 +0800 FIX: typo in cluster.c, onformation->information commit 9dd981c Author: wujm2007 <jim.wujm@gmail.com> Date: Mon Dec 16 09:37:52 2019 +0800 Fix typo commit e132d7a Author: Sebastien Williams-Wynn <s.williamswynn.mail@gmail.com> Date: Fri Nov 15 00:14:07 2019 +0000 Minor typo change commit 47f44d5 Author: happynote3966 <01ssrmikururudevice01@gmail.com> Date: Mon Nov 11 22:08:48 2019 +0900 fix comment typo in redis-cli.c commit b8bdb0d Author: fulei <fulei@kuaishou.com> Date: Wed Oct 16 18:00:17 2019 +0800 Fix a spelling mistake of comments in defragDictBucketCallback commit 0def46a Author: fulei <fulei@kuaishou.com> Date: Wed Oct 16 13:09:27 2019 +0800 fix some spelling mistakes of comments in defrag.c commit f3596fd Author: Phil Rajchgot <tophil@outlook.com> Date: Sun Oct 13 02:02:32 2019 -0400 Typo and grammar fixes Redis and its documentation are great -- just wanted to submit a few corrections in the spirit of Hacktoberfest. Thanks for all your work on this project. I use it all the time and it works beautifully. commit 2b928cd Author: KangZhiDong <worldkzd@gmail.com> Date: Sun Sep 1 07:03:11 2019 +0800 fix typos commit 33aea14 Author: Axlgrep <axlgrep@gmail.com> Date: Tue Aug 27 11:02:18 2019 +0800 Fixed eviction spelling issues commit e282a80 Author: Simen Flatby <simen@oms.no> Date: Tue Aug 20 15:25:51 2019 +0200 Update comments to reflect prop name In the comments the prop is referenced as replica-validity-factor, but it is really named cluster-replica-validity-factor. commit 74d1f9a Author: Jim Green <jimgreen2013@qq.com> Date: Tue Aug 20 20:00:31 2019 +0800 fix comment error, the code is ok commit eea1407 Author: Liao Tonglang <liaotonglang@gmail.com> Date: Fri May 31 10:16:18 2019 +0800 typo fix fix cna't to can't commit 0da553c Author: KAWACHI Takashi <tkawachi@gmail.com> Date: Wed Jul 17 00:38:16 2019 +0900 Fix typo commit 7fc8fb6 Author: Michael Prokop <mika@grml.org> Date: Tue May 28 17:58:42 2019 +0200 Typo fixes s/familar/familiar/ s/compatiblity/compatibility/ s/ ot / to / s/itsef/itself/ commit 5f46c9d Author: zhumoing <34539422+zhumoing@users.noreply.github.com> Date: Tue May 21 21:16:50 2019 +0800 typo-fixes typo-fixes commit 321dfe1 Author: wxisme <850885154@qq.com> Date: Sat Mar 16 15:10:55 2019 +0800 typo fix commit b4fb131 Merge: 267e0e6 3df1eb8 Author: Nikitas Bastas <nikitasbst@gmail.com> Date: Fri Feb 8 22:55:45 2019 +0200 Merge branch 'unstable' of antirez/redis into unstable commit 267e0e6 Author: Nikitas Bastas <nikitasbst@gmail.com> Date: Wed Jan 30 21:26:04 2019 +0200 Minor typo fix commit 30544e7 Author: inshal96 <39904558+inshal96@users.noreply.github.com> Date: Fri Jan 4 16:54:50 2019 +0500 remove an extra 'a' in the comments commit 337969d Author: BrotherGao <yangdongheng11@gmail.com> Date: Sat Dec 29 12:37:29 2018 +0800 fix typo in redis.conf commit 9f4b121 Merge: 423a030 e504583 Author: BrotherGao <yangdongheng@xiaomi.com> Date: Sat Dec 29 11:41:12 2018 +0800 Merge branch 'unstable' of antirez/redis into unstable commit 423a030 Merge: 42b02b7 46a51cd Author: 杨东衡 <yangdongheng@xiaomi.com> Date: Tue Dec 4 23:56:11 2018 +0800 Merge branch 'unstable' of antirez/redis into unstable commit 42b02b7 Merge: 68c0e6e b8febe6 Author: Dongheng Yang <yangdongheng11@gmail.com> Date: Sun Oct 28 15:54:23 2018 +0800 Merge pull request #1 from antirez/unstable update local data commit 714b589 Author: Christian <crifei93@gmail.com> Date: Fri Dec 28 01:17:26 2018 +0100 fix typo "resulution" commit e23259d Author: garenchan <1412950785@qq.com> Date: Wed Dec 26 09:58:35 2018 +0800 fix typo: segfauls -> segfault commit a9359f8 Author: xjp <jianping_xie@aliyun.com> Date: Tue Dec 18 17:31:44 2018 +0800 Fixed REDISMODULE_H spell bug commit a12c3e4 Author: jdiaz <jrd.palacios@gmail.com> Date: Sat Dec 15 23:39:52 2018 -0600 Fixes hyperloglog hash function comment block description commit 770eb11 Author: 林上耀 <1210tom@163.com> Date: Sun Nov 25 17:16:10 2018 +0800 fix typo commit fd97fbb Author: Chris Lamb <chris@chris-lamb.co.uk> Date: Fri Nov 23 17:14:01 2018 +0100 Correct "unsupported" typo. commit a85522d Author: Jungnam Lee <jungnam.lee@oracle.com> Date: Thu Nov 8 23:01:29 2018 +0900 fix typo in test comments commit ade8007 Author: Arun Kumar <palerdot@users.noreply.github.com> Date: Tue Oct 23 16:56:35 2018 +0530 Fixed grammatical typo Fixed typo for word 'dictionary' commit 869ee39 Author: Hamid Alaei <hamid.a85@gmail.com> Date: Sun Aug 12 16:40:02 2018 +0430 fix documentations: (ThreadSafeContextStart/Stop -> ThreadSafeContextLock/Unlock), minor typo commit f89d158 Author: Mayank Jain <mayankjain255@gmail.com> Date: Tue Jul 31 23:01:21 2018 +0530 Updated README.md with some spelling corrections. Made correction in spelling of some misspelled words. commit 892198e Author: dsomeshwar <someshwar.dhayalan@gmail.com> Date: Sat Jul 21 23:23:04 2018 +0530 typo fix commit 8a4d780 Author: Itamar Haber <itamar@redislabs.com> Date: Mon Apr 30 02:06:52 2018 +0300 Fixes some typos commit e3acef6 Author: Noah Rosamilia <ivoahivoah@gmail.com> Date: Sat Mar 3 23:41:21 2018 -0500 Fix typo in /deps/README.md commit 04442fb Author: WuYunlong <xzsyeb@126.com> Date: Sat Mar 3 10:32:42 2018 +0800 Fix typo in readSyncBulkPayload() comment. commit 9f36880 Author: WuYunlong <xzsyeb@126.com> Date: Sat Mar 3 10:20:37 2018 +0800 replication.c comment: run_id -> replid. commit f866b4a Author: Francesco 'makevoid' Canessa <makevoid@gmail.com> Date: Thu Feb 22 22:01:56 2018 +0000 fix comment typo in server.c commit 0ebc69b Author: 줍 <jubee0124@gmail.com> Date: Mon Feb 12 16:38:48 2018 +0900 Fix typo in redis.conf Fix `five behaviors` to `eight behaviors` in [this sentence ](antirez/redis@unstable/redis.conf#L564) commit b50a620 Author: martinbroadhurst <martinbroadhurst@users.noreply.github.com> Date: Thu Dec 28 12:07:30 2017 +0000 Fix typo in valgrind.sup commit 7d8f349 Author: Peter Boughton <peter@sorcerersisle.com> Date: Mon Nov 27 19:52:19 2017 +0000 Update CONTRIBUTING; refer doc updates to redis-doc repo. commit 02dec7e Author: Klauswk <klauswk1@hotmail.com> Date: Tue Oct 24 16:18:38 2017 -0200 Fix typo in comment commit e1efbc8 Author: chenshi <baiwfg2@gmail.com> Date: Tue Oct 3 18:26:30 2017 +0800 Correct two spelling errors of comments commit 93327d8 Author: spacewander <spacewanderlzx@gmail.com> Date: Wed Sep 13 16:47:24 2017 +0800 Update the comment for OBJ_ENCODING_EMBSTR_SIZE_LIMIT's value The value of OBJ_ENCODING_EMBSTR_SIZE_LIMIT is 44 now instead of 39. commit 63d361f Author: spacewander <spacewanderlzx@gmail.com> Date: Tue Sep 12 15:06:42 2017 +0800 Fix <prevlen> related doc in ziplist.c According to the definition of ZIP_BIG_PREVLEN and other related code, the guard of single byte <prevlen> should be 254 instead of 255. commit ebe228d Author: hanael80 <hanael80@gmail.com> Date: Tue Aug 15 09:09:40 2017 +0900 Fix typo commit 6b696e6 Author: Matt Robenolt <matt@ydekproductions.com> Date: Mon Aug 14 14:50:47 2017 -0700 Fix typo in LATENCY DOCTOR output commit a2ec6ae Author: caosiyang <caosiyang@qiyi.com> Date: Tue Aug 15 14:15:16 2017 +0800 Fix a typo: form => from commit 3ab7699 Author: caosiyang <caosiyang@qiyi.com> Date: Thu Aug 10 18:40:33 2017 +0800 Fix a typo: replicationFeedSlavesFromMaster() => replicationFeedSlavesFromMasterStream() commit 72d43ef Author: caosiyang <caosiyang@qiyi.com> Date: Tue Aug 8 15:57:25 2017 +0800 fix a typo: servewr => server commit 707c958 Author: Bo Cai <charpty@gmail.com> Date: Wed Jul 26 21:49:42 2017 +0800 redis-cli.c typo: conut -> count. Signed-off-by: Bo Cai <charpty@gmail.com> commit b9385b2 Author: JackDrogon <jack.xsuperman@gmail.com> Date: Fri Jun 30 14:22:31 2017 +0800 Fix some spell problems commit 20d9230 Author: akosel <aaronjkosel@gmail.com> Date: Sun Jun 4 19:35:13 2017 -0500 Fix typo commit b167bfc Author: Krzysiek Witkowicz <krzysiekwitkowicz@gmail.com> Date: Mon May 22 21:32:27 2017 +0100 Fix #4008 small typo in comment commit 2b78ac8 Author: Jake Clarkson <jacobwclarkson@gmail.com> Date: Wed Apr 26 15:49:50 2017 +0100 Correct typo in tests/unit/hyperloglog.tcl commit b0f1cdb Author: Qi Luo <qiluo-msft@users.noreply.github.com> Date: Wed Apr 19 14:25:18 2017 -0700 Fix typo commit a90b0f9 Author: charsyam <charsyam@naver.com> Date: Thu Mar 16 18:19:53 2017 +0900 fix typos fix typos fix typos commit 8430a79 Author: Richard Hart <richardhart92@gmail.com> Date: Mon Mar 13 22:17:41 2017 -0400 Fixed log message typo in listenToPort. commit 481a1c2 Author: Vinod Kumar <kumar003vinod@gmail.com> Date: Sun Jan 15 23:04:51 2017 +0530 src/db.c: Correct "save" -> "safe" typo commit 586b4d3 Author: wangshaonan <wshn13@gmail.com> Date: Wed Dec 21 20:28:27 2016 +0800 Fix typo they->the in helloworld.c commit c1c4b5e Author: Jenner <hypxm@qq.com> Date: Mon Dec 19 16:39:46 2016 +0800 typo error commit 1ee1a3f Author: tielei <43289893@qq.com> Date: Mon Jul 18 13:52:25 2016 +0800 fix some comments commit 11a41fb Author: Otto Kekäläinen <otto@seravo.fi> Date: Sun Jul 3 10:23:55 2016 +0100 Fix spelling in documentation and comments commit 5fb5d82 Author: francischan <f1ancis621@gmail.com> Date: Tue Jun 28 00:19:33 2016 +0800 Fix outdated comments about redis.c file. It should now refer to server.c file. commit 6b254bc Author: lmatt-bit <lmatt123n@gmail.com> Date: Thu Apr 21 21:45:58 2016 +0800 Refine the comment of dictRehashMilliseconds func SLAVECONF->REPLCONF in comment - by andyli029 commit ee9869f Author: clark.kang <charsyam@naver.com> Date: Tue Mar 22 11:09:51 2016 +0900 fix typos commit f7b3b11 Author: Harisankar H <harisankarh@gmail.com> Date: Wed Mar 9 11:49:42 2016 +0530 Typo correction: "faield" --> "failed" Typo correction: "faield" --> "failed" commit 3fd40fc Author: Itamar Haber <itamar@redislabs.com> Date: Thu Feb 25 10:31:51 2016 +0200 Fixes a typo in comments commit 621c160 Author: Prayag Verma <prayag.verma@gmail.com> Date: Mon Feb 1 12:36:20 2016 +0530 Fix typo in Readme.md Spelling mistakes - `eviciton` > `eviction` `familar` > `familiar` commit d7d07d6 Author: WonCheol Lee <toctoc21c@gmail.com> Date: Wed Dec 30 15:11:34 2015 +0900 Typo fixed commit a4dade7 Author: Felix Bünemann <buenemann@louis.info> Date: Mon Dec 28 11:02:55 2015 +0100 [ci skip] Improve supervised upstart config docs This mentions that "expect stop" is required for supervised upstart to work correctly. See http://upstart.ubuntu.com/cookbook/#expect-stop for an explanation. commit d9caba9 Author: daurnimator <quae@daurnimator.com> Date: Mon Dec 21 18:30:03 2015 +1100 README: Remove trailing whitespace commit 72d42e5 Author: daurnimator <quae@daurnimator.com> Date: Mon Dec 21 18:29:32 2015 +1100 README: Fix typo. th => the commit dd6e957 Author: daurnimator <quae@daurnimator.com> Date: Mon Dec 21 18:29:20 2015 +1100 README: Fix typo. familar => familiar commit 3a12b23 Author: daurnimator <quae@daurnimator.com> Date: Mon Dec 21 18:28:54 2015 +1100 README: Fix typo. eviciton => eviction commit 2d1d03b Author: daurnimator <quae@daurnimator.com> Date: Mon Dec 21 18:21:45 2015 +1100 README: Fix typo. sever => server commit 3973b06 Author: Itamar Haber <itamar@garantiadata.com> Date: Sat Dec 19 17:01:20 2015 +0200 Typo fix commit 4f2e460 Author: Steve Gao <fu@2token.com> Date: Fri Dec 4 10:22:05 2015 +0800 Update README - fix typos commit b21667c Author: binyan <binbin.yan@nokia.com> Date: Wed Dec 2 22:48:37 2015 +0800 delete redundancy color judge in sdscatcolor commit 88894c7 Author: binyan <binbin.yan@nokia.com> Date: Wed Dec 2 22:14:42 2015 +0800 the example output shoule be HelloWorld commit 2763470 Author: binyan <binbin.yan@nokia.com> Date: Wed Dec 2 17:41:39 2015 +0800 modify error word keyevente Signed-off-by: binyan <binbin.yan@nokia.com> commit 0847b3d Author: Bruno Martins <bscmartins@gmail.com> Date: Wed Nov 4 11:37:01 2015 +0000 typo commit bbb9e9e Author: dawedawe <dawedawe@gmx.de> Date: Fri Mar 27 00:46:41 2015 +0100 typo: zimap -> zipmap commit 5ed297e Author: Axel Advento <badwolf.bloodseeker.rev@gmail.com> Date: Tue Mar 3 15:58:29 2015 +0800 Fix 'salve' typos to 'slave' commit edec9d6 Author: LudwikJaniuk <ludvig.janiuk@gmail.com> Date: Wed Jun 12 14:12:47 2019 +0200 Update README.md Co-Authored-By: Qix <Qix-@users.noreply.github.com> commit 692a7af Author: LudwikJaniuk <ludvig.janiuk@gmail.com> Date: Tue May 28 14:32:04 2019 +0200 grammar commit d962b0a Author: Nick Frost <nickfrostatx@gmail.com> Date: Wed Jul 20 15:17:12 2016 -0700 Minor grammar fix commit 24fff01aaccaf5956973ada8c50ceb1462e211c6 (typos) Author: Chad Miller <chadm@squareup.com> Date: Tue Sep 8 13:46:11 2020 -0400 Fix faulty comment about operation of unlink() commit 3cd5c1f3326c52aa552ada7ec797c6bb16452355 Author: Kevin <kevin.xgr@gmail.com> Date: Wed Nov 20 00:13:50 2019 +0800 Fix typo in server.c. From a83af59 Mon Sep 17 00:00:00 2001 From: wuwo <wuwo@wacai.com> Date: Fri, 17 Mar 2017 20:37:45 +0800 Subject: [PATCH] falure to failure From c961896 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=E5=B7=A6=E6=87=B6?= <veficos@gmail.com> Date: Sat, 27 May 2017 15:33:04 +0800 Subject: [PATCH] fix typo From e600ef2 Mon Sep 17 00:00:00 2001 From: "rui.zou" <rui.zou@yunify.com> Date: Sat, 30 Sep 2017 12:38:15 +0800 Subject: [PATCH] fix a typo From c7d07fa Mon Sep 17 00:00:00 2001 From: Alexandre Perrin <alex@kaworu.ch> Date: Thu, 16 Aug 2018 10:35:31 +0200 Subject: [PATCH] deps README.md typo From b25cb67 Mon Sep 17 00:00:00 2001 From: Guy Korland <gkorland@gmail.com> Date: Wed, 26 Sep 2018 10:55:37 +0300 Subject: [PATCH 1/2] fix typos in header From ad28ca6 Mon Sep 17 00:00:00 2001 From: Guy Korland <gkorland@gmail.com> Date: Wed, 26 Sep 2018 11:02:36 +0300 Subject: [PATCH 2/2] fix typos commit 34924cdedd8552466fc22c1168d49236cb7ee915 Author: Adrian Lynch <adi_ady_ade@hotmail.com> Date: Sat Apr 4 21:59:15 2015 +0100 Typos fixed commit fd2a1e7 Author: Jan <jsteemann@users.noreply.github.com> Date: Sat Oct 27 19:13:01 2018 +0200 Fix typos Fix typos commit e14e47c1a234b53b0e103c5f6a1c61481cbcbb02 Author: Andy Lester <andy@petdance.com> Date: Fri Aug 2 22:30:07 2019 -0500 Fix multiple misspellings of "following" commit 79b948ce2dac6b453fe80995abbcaac04c213d5a Author: Andy Lester <andy@petdance.com> Date: Fri Aug 2 22:24:28 2019 -0500 Fix misspelling of create-cluster commit 1fffde52666dc99ab35efbd31071a4c008cb5a71 Author: Andy Lester <andy@petdance.com> Date: Wed Jul 31 17:57:56 2019 -0500 Fix typos commit 204c9ba9651e9e05fd73936b452b9a30be456cfe Author: Xiaobo Zhu <xiaobo.zhu@shopee.com> Date: Tue Aug 13 22:19:25 2019 +0800 fix typos Squashed commit of the following: commit 1d9aaf8 Author: danmedani <danmedani@gmail.com> Date: Sun Aug 2 11:40:26 2015 -0700 README typo fix. Squashed commit of the following: commit 32bfa7c Author: Erik Dubbelboer <erik@dubbelboer.com> Date: Mon Jul 6 21:15:08 2015 +0200 Fixed grammer Squashed commit of the following: commit b24f69c Author: Sisir Koppaka <sisir.koppaka@gmail.com> Date: Mon Mar 2 22:38:45 2015 -0500 utils/hashtable/rehashing.c: Fix typos Squashed commit of the following: commit 4e04082 Author: Erik Dubbelboer <erik@dubbelboer.com> Date: Mon Mar 23 08:22:21 2015 +0000 Small config file documentation improvements Squashed commit of the following: commit acb8773 Author: ctd1500 <ctd1500@gmail.com> Date: Fri May 8 01:52:48 2015 -0700 Typo and grammar fixes in readme commit 2eb75b6 Author: ctd1500 <ctd1500@gmail.com> Date: Fri May 8 01:36:18 2015 -0700 fixed redis.conf comment Squashed commit of the following: commit a8249a2 Author: Masahiko Sawada <sawada.mshk@gmail.com> Date: Fri Dec 11 11:39:52 2015 +0530 Revise correction of typos. Squashed commit of the following: commit 3c02028 Author: zhaojun11 <zhaojun11@jd.com> Date: Wed Jan 17 19:05:28 2018 +0800 Fix typos include two code typos in cluster.c and latency.c Squashed commit of the following: commit 9dba47c Author: q191201771 <191201771@qq.com> Date: Sat Jan 4 11:31:04 2020 +0800 fix function listCreate comment in adlist.c Update src/server.c commit 2c7c2cb536e78dd211b1ac6f7bda00f0f54faaeb Author: charpty <charpty@gmail.com> Date: Tue May 1 23:16:59 2018 +0800 server.c typo: modules system dictionary type comment Signed-off-by: charpty <charpty@gmail.com> commit a8395323fb63cb59cb3591cb0f0c8edb7c29a680 Author: Itamar Haber <itamar@redislabs.com> Date: Sun May 6 00:25:18 2018 +0300 Updates test_helper.tcl's help with undocumented options Specifically: * Host * Port * Client commit bde6f9ced15755cd6407b4af7d601b030f36d60b Author: wxisme <850885154@qq.com> Date: Wed Aug 8 15:19:19 2018 +0800 fix comments in deps files commit 3172474ba991532ab799ee1873439f3402412331 Author: wxisme <850885154@qq.com> Date: Wed Aug 8 14:33:49 2018 +0800 fix some comments commit 01b6f2b6858b5cf2ce4ad5092d2c746e755f53f0 Author: Thor Juhasz <thor@juhasz.pro> Date: Sun Nov 18 14:37:41 2018 +0100 Minor fixes to comments Found some parts a little unclear on a first read, which prompted me to have a better look at the file and fix some minor things I noticed. Fixing minor typos and grammar. There are no changes to configuration options. These changes are only meant to help the user better understand the explanations to the various configuration options
2020-09-10 12:43:38 +02:00
* structure in an easy way.
*
* redisOpArrayInit();
* redisOpArrayAppend();
* redisOpArrayFree();
*/
typedef struct redisOpArray {
redisOp *ops;
int numops;
Sort out mess around propagation and MULTI/EXEC (#9890) The mess: Some parts use alsoPropagate for late propagation, others using an immediate one (propagate()), causing edge cases, ugly/hacky code, and the tendency for bugs The basic idea is that all commands are propagated via alsoPropagate (i.e. added to a list) and the top-most call() is responsible for going over that list and actually propagating them (and wrapping them in MULTI/EXEC if there's more than one command). This is done in the new function, propagatePendingCommands. Callers to propagatePendingCommands: 1. top-most call() (we want all nested call()s to add to the also_propagate array and just the top-most one to propagate them) - via `afterCommand` 2. handleClientsBlockedOnKeys: it is out of call() context and it may propagate stuff - via `afterCommand`. 3. handleClientsBlockedOnKeys edge case: if the looked-up key is already expired, we will propagate the expire but will not unblock any client so `afterCommand` isn't called. in that case, we have to propagate the deletion explicitly. 4. cron stuff: active-expire and eviction may also propagate stuff 5. modules: the module API allows to propagate stuff from just about anywhere (timers, keyspace notifications, threads). I could have tried to catch all the out-of-call-context places but it seemed easier to handle it in one place: when we free the context. in the spirit of what was done in call(), only the top-most freeing of a module context may cause propagation. 6. modules: when using a thread-safe ctx it's not clear when/if the ctx will be freed. we do know that the module must lock the GIL before calling RM_Replicate/RM_Call so we propagate the pending commands when releasing the GIL. A "known limitation", which were actually a bug, was fixed because of this commit (see propagate.tcl): When using a mix of RM_Call with `!` and RM_Replicate, the command would propagate out-of-order: first all the commands from RM_Call, and then the ones from RM_Replicate Another thing worth mentioning is that if, in the past, a client would issue a MULTI/EXEC with just one write command the server would blindly propagate the MULTI/EXEC too, even though it's redundant. not anymore. This commit renames propagate() to propagateNow() in order to cause conflicts in pending PRs. propagatePendingCommands is the only caller of propagateNow, which is now a static, internal helper function. Optimizations: 1. alsoPropagate will not add stuff to also_propagate if there's no AOF and replicas 2. alsoPropagate reallocs also_propagagte exponentially, to save calls to memmove Bugfixes: 1. CONFIG SET can create evictions, sending notifications which can cause to dirty++ with modules. we need to prevent it from propagating to AOF/replicas 2. We need to set current_client in RM_Call. buggy scenario: - CONFIG SET maxmemory, eviction notifications, module hook calls RM_Call - assertion in lookupKey crashes, because current_client has CONFIG SET, which isn't CMD_WRITE 3. minor: in eviction, call propagateDeletion after notification, like active-expire and all commands (we always send a notification before propagating the command)
2021-12-22 23:03:48 +01:00
int capacity;
} redisOpArray;
/* This structure is returned by the getMemoryOverheadData() function in
* order to return memory overhead information. */
struct redisMemOverhead {
size_t peak_allocated;
size_t total_allocated;
size_t startup_allocated;
size_t repl_backlog;
size_t clients_slaves;
size_t clients_normal;
size_t cluster_links;
size_t aof_buffer;
size_t lua_caches;
size_t functions_caches;
size_t overhead_total;
size_t dataset;
2016-09-16 16:36:53 +02:00
size_t total_keys;
size_t bytes_per_key;
float dataset_perc;
float peak_perc;
float total_frag;
ssize_t total_frag_bytes;
float allocator_frag;
ssize_t allocator_frag_bytes;
float allocator_rss;
ssize_t allocator_rss_bytes;
float rss_extra;
size_t rss_extra_bytes;
size_t num_dbs;
struct {
size_t dbid;
size_t overhead_ht_main;
size_t overhead_ht_expires;
size_t overhead_ht_slot_to_keys;
} *db;
};
/* Replication error behavior determines the replica behavior
* when it receives an error over the replication stream. In
* either case the error is logged. */
typedef enum {
PROPAGATION_ERR_BEHAVIOR_IGNORE = 0,
PROPAGATION_ERR_BEHAVIOR_PANIC,
PROPAGATION_ERR_BEHAVIOR_PANIC_ON_REPLICAS
} replicationErrorBehavior;
PSYNC2: different improvements to Redis replication. The gist of the changes is that now, partial resynchronizations between slaves and masters (without the need of a full resync with RDB transfer and so forth), work in a number of cases when it was impossible in the past. For instance: 1. When a slave is promoted to mastrer, the slaves of the old master can partially resynchronize with the new master. 2. Chained slalves (slaves of slaves) can be moved to replicate to other slaves or the master itsef, without requiring a full resync. 3. The master itself, after being turned into a slave, is able to partially resynchronize with the new master, when it joins replication again. In order to obtain this, the following main changes were operated: * Slaves also take a replication backlog, not just masters. * Same stream replication for all the slaves and sub slaves. The replication stream is identical from the top level master to its slaves and is also the same from the slaves to their sub-slaves and so forth. This means that if a slave is later promoted to master, it has the same replication backlong, and can partially resynchronize with its slaves (that were previously slaves of the old master). * A given replication history is no longer identified by the `runid` of a Redis node. There is instead a `replication ID` which changes every time the instance has a new history no longer coherent with the past one. So, for example, slaves publish the same replication history of their master, however when they are turned into masters, they publish a new replication ID, but still remember the old ID, so that they are able to partially resynchronize with slaves of the old master (up to a given offset). * The replication protocol was slightly modified so that a new extended +CONTINUE reply from the master is able to inform the slave of a replication ID change. * REPLCONF CAPA is used in order to notify masters that a slave is able to understand the new +CONTINUE reply. * The RDB file was extended with an auxiliary field that is able to select a given DB after loading in the slave, so that the slave can continue receiving the replication stream from the point it was disconnected without requiring the master to insert "SELECT" statements. This is useful in order to guarantee the "same stream" property, because the slave must be able to accumulate an identical backlog. * Slave pings to sub-slaves are now sent in a special form, when the top-level master is disconnected, in order to don't interfer with the replication stream. We just use out of band "\n" bytes as in other parts of the Redis protocol. An old design document is available here: https://gist.github.com/antirez/ae068f95c0d084891305 However the implementation is not identical to the description because during the work to implement it, different changes were needed in order to make things working well.
2016-11-09 11:31:06 +01:00
/* This structure can be optionally passed to RDB save/load functions in
* order to implement additional functionalities, by storing and loading
* metadata to the RDB file.
*
* For example, to use select a DB at load time, useful in
PSYNC2: different improvements to Redis replication. The gist of the changes is that now, partial resynchronizations between slaves and masters (without the need of a full resync with RDB transfer and so forth), work in a number of cases when it was impossible in the past. For instance: 1. When a slave is promoted to mastrer, the slaves of the old master can partially resynchronize with the new master. 2. Chained slalves (slaves of slaves) can be moved to replicate to other slaves or the master itsef, without requiring a full resync. 3. The master itself, after being turned into a slave, is able to partially resynchronize with the new master, when it joins replication again. In order to obtain this, the following main changes were operated: * Slaves also take a replication backlog, not just masters. * Same stream replication for all the slaves and sub slaves. The replication stream is identical from the top level master to its slaves and is also the same from the slaves to their sub-slaves and so forth. This means that if a slave is later promoted to master, it has the same replication backlong, and can partially resynchronize with its slaves (that were previously slaves of the old master). * A given replication history is no longer identified by the `runid` of a Redis node. There is instead a `replication ID` which changes every time the instance has a new history no longer coherent with the past one. So, for example, slaves publish the same replication history of their master, however when they are turned into masters, they publish a new replication ID, but still remember the old ID, so that they are able to partially resynchronize with slaves of the old master (up to a given offset). * The replication protocol was slightly modified so that a new extended +CONTINUE reply from the master is able to inform the slave of a replication ID change. * REPLCONF CAPA is used in order to notify masters that a slave is able to understand the new +CONTINUE reply. * The RDB file was extended with an auxiliary field that is able to select a given DB after loading in the slave, so that the slave can continue receiving the replication stream from the point it was disconnected without requiring the master to insert "SELECT" statements. This is useful in order to guarantee the "same stream" property, because the slave must be able to accumulate an identical backlog. * Slave pings to sub-slaves are now sent in a special form, when the top-level master is disconnected, in order to don't interfer with the replication stream. We just use out of band "\n" bytes as in other parts of the Redis protocol. An old design document is available here: https://gist.github.com/antirez/ae068f95c0d084891305 However the implementation is not identical to the description because during the work to implement it, different changes were needed in order to make things working well.
2016-11-09 11:31:06 +01:00
* replication in order to make sure that chained slaves (slaves of slaves)
* select the correct DB and are able to accept the stream coming from the
* top-level master. */
typedef struct rdbSaveInfo {
/* Used saving and loading. */
PSYNC2: different improvements to Redis replication. The gist of the changes is that now, partial resynchronizations between slaves and masters (without the need of a full resync with RDB transfer and so forth), work in a number of cases when it was impossible in the past. For instance: 1. When a slave is promoted to mastrer, the slaves of the old master can partially resynchronize with the new master. 2. Chained slalves (slaves of slaves) can be moved to replicate to other slaves or the master itsef, without requiring a full resync. 3. The master itself, after being turned into a slave, is able to partially resynchronize with the new master, when it joins replication again. In order to obtain this, the following main changes were operated: * Slaves also take a replication backlog, not just masters. * Same stream replication for all the slaves and sub slaves. The replication stream is identical from the top level master to its slaves and is also the same from the slaves to their sub-slaves and so forth. This means that if a slave is later promoted to master, it has the same replication backlong, and can partially resynchronize with its slaves (that were previously slaves of the old master). * A given replication history is no longer identified by the `runid` of a Redis node. There is instead a `replication ID` which changes every time the instance has a new history no longer coherent with the past one. So, for example, slaves publish the same replication history of their master, however when they are turned into masters, they publish a new replication ID, but still remember the old ID, so that they are able to partially resynchronize with slaves of the old master (up to a given offset). * The replication protocol was slightly modified so that a new extended +CONTINUE reply from the master is able to inform the slave of a replication ID change. * REPLCONF CAPA is used in order to notify masters that a slave is able to understand the new +CONTINUE reply. * The RDB file was extended with an auxiliary field that is able to select a given DB after loading in the slave, so that the slave can continue receiving the replication stream from the point it was disconnected without requiring the master to insert "SELECT" statements. This is useful in order to guarantee the "same stream" property, because the slave must be able to accumulate an identical backlog. * Slave pings to sub-slaves are now sent in a special form, when the top-level master is disconnected, in order to don't interfer with the replication stream. We just use out of band "\n" bytes as in other parts of the Redis protocol. An old design document is available here: https://gist.github.com/antirez/ae068f95c0d084891305 However the implementation is not identical to the description because during the work to implement it, different changes were needed in order to make things working well.
2016-11-09 11:31:06 +01:00
int repl_stream_db; /* DB to select in server.master client. */
/* Used only loading. */
int repl_id_is_set; /* True if repl_id field is set. */
char repl_id[CONFIG_RUN_ID_SIZE+1]; /* Replication ID. */
long long repl_offset; /* Replication offset. */
PSYNC2: different improvements to Redis replication. The gist of the changes is that now, partial resynchronizations between slaves and masters (without the need of a full resync with RDB transfer and so forth), work in a number of cases when it was impossible in the past. For instance: 1. When a slave is promoted to mastrer, the slaves of the old master can partially resynchronize with the new master. 2. Chained slalves (slaves of slaves) can be moved to replicate to other slaves or the master itsef, without requiring a full resync. 3. The master itself, after being turned into a slave, is able to partially resynchronize with the new master, when it joins replication again. In order to obtain this, the following main changes were operated: * Slaves also take a replication backlog, not just masters. * Same stream replication for all the slaves and sub slaves. The replication stream is identical from the top level master to its slaves and is also the same from the slaves to their sub-slaves and so forth. This means that if a slave is later promoted to master, it has the same replication backlong, and can partially resynchronize with its slaves (that were previously slaves of the old master). * A given replication history is no longer identified by the `runid` of a Redis node. There is instead a `replication ID` which changes every time the instance has a new history no longer coherent with the past one. So, for example, slaves publish the same replication history of their master, however when they are turned into masters, they publish a new replication ID, but still remember the old ID, so that they are able to partially resynchronize with slaves of the old master (up to a given offset). * The replication protocol was slightly modified so that a new extended +CONTINUE reply from the master is able to inform the slave of a replication ID change. * REPLCONF CAPA is used in order to notify masters that a slave is able to understand the new +CONTINUE reply. * The RDB file was extended with an auxiliary field that is able to select a given DB after loading in the slave, so that the slave can continue receiving the replication stream from the point it was disconnected without requiring the master to insert "SELECT" statements. This is useful in order to guarantee the "same stream" property, because the slave must be able to accumulate an identical backlog. * Slave pings to sub-slaves are now sent in a special form, when the top-level master is disconnected, in order to don't interfer with the replication stream. We just use out of band "\n" bytes as in other parts of the Redis protocol. An old design document is available here: https://gist.github.com/antirez/ae068f95c0d084891305 However the implementation is not identical to the description because during the work to implement it, different changes were needed in order to make things working well.
2016-11-09 11:31:06 +01:00
} rdbSaveInfo;
#define RDB_SAVE_INFO_INIT {-1,0,"0000000000000000000000000000000000000000",-1}
PSYNC2: different improvements to Redis replication. The gist of the changes is that now, partial resynchronizations between slaves and masters (without the need of a full resync with RDB transfer and so forth), work in a number of cases when it was impossible in the past. For instance: 1. When a slave is promoted to mastrer, the slaves of the old master can partially resynchronize with the new master. 2. Chained slalves (slaves of slaves) can be moved to replicate to other slaves or the master itsef, without requiring a full resync. 3. The master itself, after being turned into a slave, is able to partially resynchronize with the new master, when it joins replication again. In order to obtain this, the following main changes were operated: * Slaves also take a replication backlog, not just masters. * Same stream replication for all the slaves and sub slaves. The replication stream is identical from the top level master to its slaves and is also the same from the slaves to their sub-slaves and so forth. This means that if a slave is later promoted to master, it has the same replication backlong, and can partially resynchronize with its slaves (that were previously slaves of the old master). * A given replication history is no longer identified by the `runid` of a Redis node. There is instead a `replication ID` which changes every time the instance has a new history no longer coherent with the past one. So, for example, slaves publish the same replication history of their master, however when they are turned into masters, they publish a new replication ID, but still remember the old ID, so that they are able to partially resynchronize with slaves of the old master (up to a given offset). * The replication protocol was slightly modified so that a new extended +CONTINUE reply from the master is able to inform the slave of a replication ID change. * REPLCONF CAPA is used in order to notify masters that a slave is able to understand the new +CONTINUE reply. * The RDB file was extended with an auxiliary field that is able to select a given DB after loading in the slave, so that the slave can continue receiving the replication stream from the point it was disconnected without requiring the master to insert "SELECT" statements. This is useful in order to guarantee the "same stream" property, because the slave must be able to accumulate an identical backlog. * Slave pings to sub-slaves are now sent in a special form, when the top-level master is disconnected, in order to don't interfer with the replication stream. We just use out of band "\n" bytes as in other parts of the Redis protocol. An old design document is available here: https://gist.github.com/antirez/ae068f95c0d084891305 However the implementation is not identical to the description because during the work to implement it, different changes were needed in order to make things working well.
2016-11-09 11:31:06 +01:00
struct malloc_stats {
size_t zmalloc_used;
size_t process_rss;
size_t allocator_allocated;
size_t allocator_active;
size_t allocator_resident;
};
typedef struct socketFds {
int fd[CONFIG_BINDADDR_MAX];
int count;
} socketFds;
/*-----------------------------------------------------------------------------
* TLS Context Configuration
*----------------------------------------------------------------------------*/
typedef struct redisTLSContextConfig {
char *cert_file; /* Server side and optionally client side cert file name */
char *key_file; /* Private key filename for cert_file */
char *key_file_pass; /* Optional password for key_file */
char *client_cert_file; /* Certificate to use as a client; if none, use cert_file */
char *client_key_file; /* Private key filename for client_cert_file */
char *client_key_file_pass; /* Optional password for client_key_file */
char *dh_params_file;
char *ca_cert_file;
char *ca_cert_dir;
char *protocols;
char *ciphers;
char *ciphersuites;
int prefer_server_ciphers;
int session_caching;
int session_cache_size;
int session_cache_timeout;
} redisTLSContextConfig;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788) Implement Multi-Part AOF mechanism to avoid overheads during AOFRW. Introducing a folder with multiple AOF files tracked by a manifest file. The main issues with the the original AOFRW mechanism are: * buffering of commands that are processed during rewrite (consuming a lot of RAM) * freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it. * double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files) The main modifications of this PR: 1. Remove the AOF rewrite buffer and related code. 2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type, it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the incremental commands since the last AOFRW. 3. Use a AOF manifest file to record and manage these AOF files mentioned above. 4. The original configuration of `appendfilename` will be the base part of the new file name, for example: `appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof` 5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename` 6. Remove the `aof_rewrite_buffer_length` field in info. 7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs. It also gives users the opportunity to preserve the history AOFs. just for testing use now. 8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now), we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately. 9. Support upgrade (load) data from old version redis. 10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and manifest file will be placed in this directory. 11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if `aof-load-truncated` is enabled. Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-03 18:14:13 +01:00
/*-----------------------------------------------------------------------------
* AOF manifest definition
*----------------------------------------------------------------------------*/
typedef enum {
AOF_FILE_TYPE_BASE = 'b', /* BASE file */
AOF_FILE_TYPE_HIST = 'h', /* HISTORY file */
AOF_FILE_TYPE_INCR = 'i', /* INCR file */
} aof_file_type;
typedef struct {
sds file_name; /* file name */
long long file_seq; /* file sequence */
aof_file_type file_type; /* file type */
} aofInfo;
typedef struct {
aofInfo *base_aof_info; /* BASE file information. NULL if there is no BASE file. */
list *incr_aof_list; /* INCR AOFs list. We may have multiple INCR AOF when rewrite fails. */
list *history_aof_list; /* HISTORY AOF list. When the AOFRW success, The aofInfo contained in
`base_aof_info` and `incr_aof_list` will be moved to this list. We
will delete these AOF files when AOFRW finish. */
long long curr_base_file_seq; /* The sequence number used by the current BASE file. */
long long curr_incr_file_seq; /* The sequence number used by the current INCR file. */
int dirty; /* 1 Indicates that the aofManifest in the memory is inconsistent with
disk, we need to persist it immediately. */
} aofManifest;
2011-03-29 17:51:15 +02:00
/*-----------------------------------------------------------------------------
* Global server state
*----------------------------------------------------------------------------*/
2014-08-07 12:14:21 +02:00
/* AIX defines hz to __hz, we don't use this define and in order to allow
* Redis build on AIX we need to undef it. */
#ifdef _AIX
#undef hz
#endif
#define CHILD_TYPE_NONE 0
#define CHILD_TYPE_RDB 1
#define CHILD_TYPE_AOF 2
#define CHILD_TYPE_LDB 3
#define CHILD_TYPE_MODULE 4
typedef enum childInfoType {
CHILD_INFO_TYPE_CURRENT_INFO,
CHILD_INFO_TYPE_AOF_COW_SIZE,
CHILD_INFO_TYPE_RDB_COW_SIZE,
CHILD_INFO_TYPE_MODULE_COW_SIZE
} childInfoType;
struct redisServer {
/* General */
pid_t pid; /* Main process pid. */
pthread_t main_thread_id; /* Main thread id */
char *configfile; /* Absolute config file path, or NULL */
char *executable; /* Absolute executable file path. */
char **exec_argv; /* Executable argv vector (copy). */
int dynamic_hz; /* Change hz value depending on # of clients. */
int config_hz; /* Configured HZ value. May be different than
the actual 'hz' field value if dynamic-hz
is enabled. */
mode_t umask; /* The umask value of the process on startup */
int hz; /* serverCron() calls frequency in hertz */
int in_fork_child; /* indication that this is a fork child */
redisDb *db;
dict *commands; /* Command table */
dict *orig_commands; /* Command table before command renaming. */
aeEventLoop *el;
rax *errors; /* Errors table */
Implement redisAtomic to replace _Atomic C11 builtin (#7707) Redis 6.0 introduces I/O threads, it is so cool and efficient, we use C11 _Atomic to establish inter-thread synchronization without mutex. But the compiler that must supports C11 _Atomic can compile redis code, that brings a lot of inconvenience since some common platforms can't support by default such as CentOS7, so we want to implement redis atomic type to make it more portable. We have implemented our atomic variable for redis that only has 'relaxed' operations in src/atomicvar.h, so we implement some operations with 'sequentially-consistent', just like the default behavior of C11 _Atomic that can establish inter-thread synchronization. And we replace all uses of C11 _Atomic with redis atomic variable. Our implementation of redis atomic variable uses C11 _Atomic, __atomic or __sync macros if available, it supports most common platforms, and we will detect automatically which feature we use. In Makefile we use a dummy file to detect if the compiler supports C11 _Atomic. Now for gcc, we can compile redis code theoretically if your gcc version is not less than 4.1.2(starts to support __sync_xxx operations). Otherwise, we remove use mutex fallback to implement redis atomic variable for performance and test. You will get compiling errors if your compiler doesn't support all features of above. For cover redis atomic variable tests, we add other CI jobs that build redis on CentOS6 and CentOS7 and workflow daily jobs that run the tests on them. For them, we just install gcc by default in order to cover different compiler versions, gcc is 4.4.7 by default installation on CentOS6 and 4.8.5 on CentOS7. We restore the feature that we can test redis with Helgrind to find data race errors. But you need install Valgrind in the default path configuration firstly before running your tests, since we use macros in helgrind.h to tell Helgrind inter-thread happens-before relationship explicitly for avoiding false positives. Please open an issue on github if you find data race errors relate to this commit. Unrelated: - Fix redefinition of typedef 'RedisModuleUserChangedFunc' For some old version compilers, they will report errors or warnings, if we re-define function type.
2020-09-17 15:01:45 +02:00
redisAtomic unsigned int lruclock; /* Clock for LRU eviction */
Wait for replicas when shutting down (#9872) To avoid data loss, this commit adds a grace period for lagging replicas to catch up the replication offset. Done: * Wait for replicas when shutdown is triggered by SIGTERM and SIGINT. * Wait for replicas when shutdown is triggered by the SHUTDOWN command. A new blocked client type BLOCKED_SHUTDOWN is introduced, allowing multiple clients to call SHUTDOWN in parallel. Note that they don't expect a response unless an error happens and shutdown is aborted. * Log warning for each replica lagging behind when finishing shutdown. * CLIENT_PAUSE_WRITE while waiting for replicas. * Configurable grace period 'shutdown-timeout' in seconds (default 10). * New flags for the SHUTDOWN command: - NOW disables the grace period for lagging replicas. - FORCE ignores errors writing the RDB or AOF files which would normally prevent a shutdown. - ABORT cancels ongoing shutdown. Can't be combined with other flags. * New field in the output of the INFO command: 'shutdown_in_milliseconds'. The value is the remaining maximum time to wait for lagging replicas before finishing the shutdown. This field is present in the Server section **only** during shutdown. Not directly related: * When shutting down, if there is an AOF saving child, it is killed **even** if AOF is disabled. This can happen if BGREWRITEAOF is used when AOF is off. * Client pause now has end time and type (WRITE or ALL) per purpose. The different pause purposes are *CLIENT PAUSE command*, *failover* and *shutdown*. If clients are unpaused for one purpose, it doesn't affect client pause for other purposes. For example, the CLIENT UNPAUSE command doesn't affect client pause initiated by the failover or shutdown procedures. A completed failover or a failed shutdown doesn't unpause clients paused by the CLIENT PAUSE command. Notes: * DEBUG RESTART doesn't wait for replicas. * We already have a warning logged when a replica disconnects. This means that if any replica connection is lost during the shutdown, it is either logged as disconnected or as lagging at the time of exit. Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-02 08:50:15 +01:00
volatile sig_atomic_t shutdown_asap; /* Shutdown ordered by signal handler. */
mstime_t shutdown_mstime; /* Timestamp to limit graceful shutdown. */
int last_sig_received; /* Indicates the last SIGNAL received, if any (e.g., SIGINT or SIGTERM). */
Wait for replicas when shutting down (#9872) To avoid data loss, this commit adds a grace period for lagging replicas to catch up the replication offset. Done: * Wait for replicas when shutdown is triggered by SIGTERM and SIGINT. * Wait for replicas when shutdown is triggered by the SHUTDOWN command. A new blocked client type BLOCKED_SHUTDOWN is introduced, allowing multiple clients to call SHUTDOWN in parallel. Note that they don't expect a response unless an error happens and shutdown is aborted. * Log warning for each replica lagging behind when finishing shutdown. * CLIENT_PAUSE_WRITE while waiting for replicas. * Configurable grace period 'shutdown-timeout' in seconds (default 10). * New flags for the SHUTDOWN command: - NOW disables the grace period for lagging replicas. - FORCE ignores errors writing the RDB or AOF files which would normally prevent a shutdown. - ABORT cancels ongoing shutdown. Can't be combined with other flags. * New field in the output of the INFO command: 'shutdown_in_milliseconds'. The value is the remaining maximum time to wait for lagging replicas before finishing the shutdown. This field is present in the Server section **only** during shutdown. Not directly related: * When shutting down, if there is an AOF saving child, it is killed **even** if AOF is disabled. This can happen if BGREWRITEAOF is used when AOF is off. * Client pause now has end time and type (WRITE or ALL) per purpose. The different pause purposes are *CLIENT PAUSE command*, *failover* and *shutdown*. If clients are unpaused for one purpose, it doesn't affect client pause for other purposes. For example, the CLIENT UNPAUSE command doesn't affect client pause initiated by the failover or shutdown procedures. A completed failover or a failed shutdown doesn't unpause clients paused by the CLIENT PAUSE command. Notes: * DEBUG RESTART doesn't wait for replicas. * We already have a warning logged when a replica disconnects. This means that if any replica connection is lost during the shutdown, it is either logged as disconnected or as lagging at the time of exit. Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-02 08:50:15 +01:00
int shutdown_flags; /* Flags passed to prepareForShutdown(). */
int activerehashing; /* Incremental rehash in serverCron() */
2016-12-30 02:37:52 +01:00
int active_defrag_running; /* Active defragmentation running (holds current scan aggressiveness) */
char *pidfile; /* PID file path */
int arch_bits; /* 32 or 64 depending on sizeof(long) */
int cronloops; /* Number of times the cron function run */
2015-07-27 09:41:48 +02:00
char runid[CONFIG_RUN_ID_SIZE+1]; /* ID always different at every exec. */
int sentinel_mode; /* True if this instance is a Sentinel. */
size_t initial_memory_usage; /* Bytes used after initialization. */
int always_show_logo; /* Show logo even for non-stdout logging. */
Sort out mess around propagation and MULTI/EXEC (#9890) The mess: Some parts use alsoPropagate for late propagation, others using an immediate one (propagate()), causing edge cases, ugly/hacky code, and the tendency for bugs The basic idea is that all commands are propagated via alsoPropagate (i.e. added to a list) and the top-most call() is responsible for going over that list and actually propagating them (and wrapping them in MULTI/EXEC if there's more than one command). This is done in the new function, propagatePendingCommands. Callers to propagatePendingCommands: 1. top-most call() (we want all nested call()s to add to the also_propagate array and just the top-most one to propagate them) - via `afterCommand` 2. handleClientsBlockedOnKeys: it is out of call() context and it may propagate stuff - via `afterCommand`. 3. handleClientsBlockedOnKeys edge case: if the looked-up key is already expired, we will propagate the expire but will not unblock any client so `afterCommand` isn't called. in that case, we have to propagate the deletion explicitly. 4. cron stuff: active-expire and eviction may also propagate stuff 5. modules: the module API allows to propagate stuff from just about anywhere (timers, keyspace notifications, threads). I could have tried to catch all the out-of-call-context places but it seemed easier to handle it in one place: when we free the context. in the spirit of what was done in call(), only the top-most freeing of a module context may cause propagation. 6. modules: when using a thread-safe ctx it's not clear when/if the ctx will be freed. we do know that the module must lock the GIL before calling RM_Replicate/RM_Call so we propagate the pending commands when releasing the GIL. A "known limitation", which were actually a bug, was fixed because of this commit (see propagate.tcl): When using a mix of RM_Call with `!` and RM_Replicate, the command would propagate out-of-order: first all the commands from RM_Call, and then the ones from RM_Replicate Another thing worth mentioning is that if, in the past, a client would issue a MULTI/EXEC with just one write command the server would blindly propagate the MULTI/EXEC too, even though it's redundant. not anymore. This commit renames propagate() to propagateNow() in order to cause conflicts in pending PRs. propagatePendingCommands is the only caller of propagateNow, which is now a static, internal helper function. Optimizations: 1. alsoPropagate will not add stuff to also_propagate if there's no AOF and replicas 2. alsoPropagate reallocs also_propagagte exponentially, to save calls to memmove Bugfixes: 1. CONFIG SET can create evictions, sending notifications which can cause to dirty++ with modules. we need to prevent it from propagating to AOF/replicas 2. We need to set current_client in RM_Call. buggy scenario: - CONFIG SET maxmemory, eviction notifications, module hook calls RM_Call - assertion in lookupKey crashes, because current_client has CONFIG SET, which isn't CMD_WRITE 3. minor: in eviction, call propagateDeletion after notification, like active-expire and all commands (we always send a notification before propagating the command)
2021-12-22 23:03:48 +01:00
int in_script; /* Are we inside EVAL? */
int in_exec; /* Are we inside EXEC? */
int busy_module_yield_flags; /* Are we inside a busy module? (triggered by RM_Yield). see BUSY_MODULE_YIELD_ flags. */
const char *busy_module_yield_reply; /* When non-null, we are inside RM_Yield. */
Sort out mess around propagation and MULTI/EXEC (#9890) The mess: Some parts use alsoPropagate for late propagation, others using an immediate one (propagate()), causing edge cases, ugly/hacky code, and the tendency for bugs The basic idea is that all commands are propagated via alsoPropagate (i.e. added to a list) and the top-most call() is responsible for going over that list and actually propagating them (and wrapping them in MULTI/EXEC if there's more than one command). This is done in the new function, propagatePendingCommands. Callers to propagatePendingCommands: 1. top-most call() (we want all nested call()s to add to the also_propagate array and just the top-most one to propagate them) - via `afterCommand` 2. handleClientsBlockedOnKeys: it is out of call() context and it may propagate stuff - via `afterCommand`. 3. handleClientsBlockedOnKeys edge case: if the looked-up key is already expired, we will propagate the expire but will not unblock any client so `afterCommand` isn't called. in that case, we have to propagate the deletion explicitly. 4. cron stuff: active-expire and eviction may also propagate stuff 5. modules: the module API allows to propagate stuff from just about anywhere (timers, keyspace notifications, threads). I could have tried to catch all the out-of-call-context places but it seemed easier to handle it in one place: when we free the context. in the spirit of what was done in call(), only the top-most freeing of a module context may cause propagation. 6. modules: when using a thread-safe ctx it's not clear when/if the ctx will be freed. we do know that the module must lock the GIL before calling RM_Replicate/RM_Call so we propagate the pending commands when releasing the GIL. A "known limitation", which were actually a bug, was fixed because of this commit (see propagate.tcl): When using a mix of RM_Call with `!` and RM_Replicate, the command would propagate out-of-order: first all the commands from RM_Call, and then the ones from RM_Replicate Another thing worth mentioning is that if, in the past, a client would issue a MULTI/EXEC with just one write command the server would blindly propagate the MULTI/EXEC too, even though it's redundant. not anymore. This commit renames propagate() to propagateNow() in order to cause conflicts in pending PRs. propagatePendingCommands is the only caller of propagateNow, which is now a static, internal helper function. Optimizations: 1. alsoPropagate will not add stuff to also_propagate if there's no AOF and replicas 2. alsoPropagate reallocs also_propagagte exponentially, to save calls to memmove Bugfixes: 1. CONFIG SET can create evictions, sending notifications which can cause to dirty++ with modules. we need to prevent it from propagating to AOF/replicas 2. We need to set current_client in RM_Call. buggy scenario: - CONFIG SET maxmemory, eviction notifications, module hook calls RM_Call - assertion in lookupKey crashes, because current_client has CONFIG SET, which isn't CMD_WRITE 3. minor: in eviction, call propagateDeletion after notification, like active-expire and all commands (we always send a notification before propagating the command)
2021-12-22 23:03:48 +01:00
int core_propagates; /* Is the core (in oppose to the module subsystem) is in charge of calling propagatePendingCommands? */
int propagate_no_multi; /* True if propagatePendingCommands should avoid wrapping command in MULTI/EXEC */
int module_ctx_nesting; /* moduleCreateContext() nesting level */
char *ignore_warnings; /* Config: warnings that should be ignored. */
int client_pause_in_transaction; /* Was a client pause executed during this Exec? */
Use madvise(MADV_DONTNEED) to release memory to reduce COW (#8974) ## Backgroud As we know, after `fork`, one process will copy pages when writing data to these pages(CoW), and another process still keep old pages, they totally cost more memory. For redis, we suffered that redis consumed much memory when the fork child is serializing key/values, even that maybe cause OOM. But actually we find, in redis fork child process, the child process don't need to keep some memory and parent process may write or update that, for example, child process will never access the key-value that is serialized but users may update it in parent process. So we think it may reduce COW if the child process release memory that it is not needed. ## Implementation For releasing key value in child process, we may think we call `decrRefCount` to free memory, but i find the fork child process still use much memory when we don't write any data to redis, and it costs much more time that slows down bgsave. Maybe because memory allocator doesn't really release memory to OS, and it may modify some inner data for this free operation, especially when we free small objects. Moreover, CoW is based on pages, so it is a easy way that we only free the memory bulk that is not less than kernel page size. madvise(MADV_DONTNEED) can quickly release specified region pages to OS bypassing memory allocator, and allocator still consider that this memory still is used and don't change its inner data. There are some buffers we can release in the fork child process: - **Serialized key-values** the fork child process never access serialized key-values, so we try to free them. Because we only can release big bulk memory, and it is time consumed to iterate all items/members/fields/entries of complex data type. So we decide to iterate them and try to release them only when their average size of item/member/field/entry is more than page size of OS. - **Replication backlog** Because replication backlog is a cycle buffer, it will be changed quickly if redis has heavy write traffic, but in fork child process, we don't need to access that. - **Client buffers** If clients have requests during having the fork child process, clients' buffer also be changed frequently. The memory includes client query buffer, output buffer, and client struct used memory. To get child process peak private dirty memory, we need to count peak memory instead of last used memory, because the child process may continue to release memory (since COW used to only grow till now, the last was equivalent to the peak). Also we're adding a new `current_cow_peak` info variable (to complement the existing `current_cow_size`) Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-04 22:01:46 +02:00
int thp_enabled; /* If true, THP is enabled. */
size_t page_size; /* The page size of OS. */
2016-03-06 13:44:24 +01:00
/* Modules */
dict *moduleapi; /* Exported core APIs dictionary for modules. */
dict *sharedapi; /* Like moduleapi but containing the APIs that
modules share with each other. */
Module Configurations (#10285) This feature adds the ability to add four different types (Bool, Numeric, String, Enum) of configurations to a module to be accessed via the redis config file, and the CONFIG command. **Configuration Names**: We impose a restriction that a module configuration always starts with the module name and contains a '.' followed by the config name. If a module passes "config1" as the name to a register function, it will be registered as MODULENAME.config1. **Configuration Persistence**: Module Configurations exist only as long as a module is loaded. If a module is unloaded, the configurations are removed. There is now also a minimal core API for removal of standardConfig objects from configs by name. **Get and Set Callbacks**: Storage of config values is owned by the module that registers them, and provides callbacks for Redis to access and manipulate the values. This is exposed through a GET and SET callback. The get callback returns a typed value of the config to redis. The callback takes the name of the configuration, and also a privdata pointer. Note that these only take the CONFIGNAME portion of the config, not the entire MODULENAME.CONFIGNAME. ``` typedef RedisModuleString * (*RedisModuleConfigGetStringFunc)(const char *name, void *privdata); typedef long long (*RedisModuleConfigGetNumericFunc)(const char *name, void *privdata); typedef int (*RedisModuleConfigGetBoolFunc)(const char *name, void *privdata); typedef int (*RedisModuleConfigGetEnumFunc)(const char *name, void *privdata); ``` Configs must also must specify a set callback, i.e. what to do on a CONFIG SET XYZ 123 or when loading configurations from cli/.conf file matching these typedefs. *name* is again just the CONFIGNAME portion, *val* is the parsed value from the core, *privdata* is the registration time privdata pointer, and *err* is for providing errors to a client. ``` typedef int (*RedisModuleConfigSetStringFunc)(const char *name, RedisModuleString *val, void *privdata, RedisModuleString **err); typedef int (*RedisModuleConfigSetNumericFunc)(const char *name, long long val, void *privdata, RedisModuleString **err); typedef int (*RedisModuleConfigSetBoolFunc)(const char *name, int val, void *privdata, RedisModuleString **err); typedef int (*RedisModuleConfigSetEnumFunc)(const char *name, int val, void *privdata, RedisModuleString **err); ``` Modules can also specify an optional apply callback that will be called after value(s) have been set via CONFIG SET: ``` typedef int (*RedisModuleConfigApplyFunc)(RedisModuleCtx *ctx, void *privdata, RedisModuleString **err); ``` **Flags:** We expose 7 new flags to the module, which are used as part of the config registration. ``` #define REDISMODULE_CONFIG_MODIFIABLE 0 /* This is the default for a module config. */ #define REDISMODULE_CONFIG_IMMUTABLE (1ULL<<0) /* Can this value only be set at startup? */ #define REDISMODULE_CONFIG_SENSITIVE (1ULL<<1) /* Does this value contain sensitive information */ #define REDISMODULE_CONFIG_HIDDEN (1ULL<<4) /* This config is hidden in `config get <pattern>` (used for tests/debugging) */ #define REDISMODULE_CONFIG_PROTECTED (1ULL<<5) /* Becomes immutable if enable-protected-configs is enabled. */ #define REDISMODULE_CONFIG_DENY_LOADING (1ULL<<6) /* This config is forbidden during loading. */ /* Numeric Specific Configs */ #define REDISMODULE_CONFIG_MEMORY (1ULL<<7) /* Indicates if this value can be set as a memory value */ ``` **Module Registration APIs**: ``` int (*RedisModule_RegisterBoolConfig)(RedisModuleCtx *ctx, char *name, int default_val, unsigned int flags, RedisModuleConfigGetBoolFunc getfn, RedisModuleConfigSetBoolFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_RegisterNumericConfig)(RedisModuleCtx *ctx, const char *name, long long default_val, unsigned int flags, long long min, long long max, RedisModuleConfigGetNumericFunc getfn, RedisModuleConfigSetNumericFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_RegisterStringConfig)(RedisModuleCtx *ctx, const char *name, const char *default_val, unsigned int flags, RedisModuleConfigGetStringFunc getfn, RedisModuleConfigSetStringFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_RegisterEnumConfig)(RedisModuleCtx *ctx, const char *name, int default_val, unsigned int flags, const char **enum_values, const int *int_values, int num_enum_vals, RedisModuleConfigGetEnumFunc getfn, RedisModuleConfigSetEnumFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_LoadConfigs)(RedisModuleCtx *ctx); ``` The module name will be auto appended along with a "." to the front of the name of the config. **What RM_Register[...]Config does**: A RedisModule struct now keeps a list of ModuleConfig objects which look like: ``` typedef struct ModuleConfig { sds name; /* Name of config without the module name appended to the front */ void *privdata; /* Optional data passed into the module config callbacks */ union get_fn { /* The get callback specificed by the module */ RedisModuleConfigGetStringFunc get_string; RedisModuleConfigGetNumericFunc get_numeric; RedisModuleConfigGetBoolFunc get_bool; RedisModuleConfigGetEnumFunc get_enum; } get_fn; union set_fn { /* The set callback specified by the module */ RedisModuleConfigSetStringFunc set_string; RedisModuleConfigSetNumericFunc set_numeric; RedisModuleConfigSetBoolFunc set_bool; RedisModuleConfigSetEnumFunc set_enum; } set_fn; RedisModuleConfigApplyFunc apply_fn; RedisModule *module; } ModuleConfig; ``` It also registers a standardConfig in the configs array, with a pointer to the ModuleConfig object associated with it. **What happens on a CONFIG GET/SET MODULENAME.MODULECONFIG:** For CONFIG SET, we do the same parsing as is done in config.c and pass that as the argument to the module set callback. For CONFIG GET, we call the module get callback and return that value to config.c to return to a client. **CONFIG REWRITE**: Starting up a server with module configurations in a .conf file but no module load directive will fail. The flip side is also true, specifying a module load and a bunch of module configurations will load those configurations in using the module defined set callbacks on a RM_LoadConfigs call. Configs being rewritten works the same way as it does for standard configs, as the module has the ability to specify a default value. If a module is unloaded with configurations specified in the .conf file those configurations will be commented out from the .conf file on the next config rewrite. **RM_LoadConfigs:** `RedisModule_LoadConfigs(RedisModuleCtx *ctx);` This last API is used to make configs available within the onLoad() after they have been registered. The expected usage is that a module will register all of its configs, then call LoadConfigs to trigger all of the set callbacks, and then can error out if any of them were malformed. LoadConfigs will attempt to set all configs registered to either a .conf file argument/loadex argument or their default value if an argument is not specified. **LoadConfigs is a required function if configs are registered. ** Also note that LoadConfigs **does not** call the apply callbacks, but a module can do that directly after the LoadConfigs call. **New Command: MODULE LOADEX [CONFIG NAME VALUE] [ARGS ...]:** This command provides the ability to provide startup context information to a module. LOADEX stands for "load extended" similar to GETEX. Note that provided config names need the full MODULENAME.MODULECONFIG name. Any additional arguments a module might want are intended to be specified after ARGS. Everything after ARGS is passed to onLoad as RedisModuleString **argv. Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Madelyn Olson <matolson@amazon.com> Co-authored-by: sundb <sundbcn@gmail.com> Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com> Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
2022-03-30 14:47:06 +02:00
dict *module_configs_queue; /* Dict that stores module configurations from .conf file until after modules are loaded during startup or arguments to loadex. */
2016-03-06 13:44:24 +01:00
list *loadmodule_queue; /* List of modules to load at startup. */
Add event loop support to the module API (#10001) Modules can now register sockets/pipe to the Redis main thread event loop and do network operations asynchronously. Previously, modules had to maintain an event loop and another thread for asynchronous network operations. Also, if a module is calling API functions after doing some network operations, it had to synchronize its event loop thread's access with Redis main thread by locking the GIL, causing contention on the lock. After this commit, no synchronization is needed as module can operate in Redis main thread context. So, this commit may improve the performance for some use cases. Added three functions to the module API: * RedisModule_EventLoopAdd(int fd, int mask, RedisModuleEventLoopFunc func, void *user_data) * RedisModule_EventLoopDel(int fd, int mask) * RedisModule_EventLoopAddOneShot(RedisModuleEventLoopOneShotFunc func, void *user_data) - This function can be called from other threads to trigger callback on Redis main thread. Callback will be triggered only once. If Redis main thread is sleeping, this call will wake up the Redis main thread. Event loop callbacks are called by Redis main thread after locking the GIL. Inside callbacks, modules can operate as if they are holding the GIL. Added REDISMODULE_EVENT_EVENTLOOP event with two subevents: * REDISMODULE_SUBEVENT_EVENTLOOP_BEFORE_SLEEP * REDISMODULE_SUBEVENT_EVENTLOOP_AFTER_SLEEP These events are for modules that want to participate in the before and after sleep action. e.g It might be useful to implement batching : Read data from the network, write all to a file in one go on BEFORE_SLEEP event.
2022-01-18 12:10:07 +01:00
int module_pipe[2]; /* Pipe used to awake the event loop by module threads. */
pid_t child_pid; /* PID of current child */
int child_type; /* Type of current child */
/* Networking */
int port; /* TCP listening port */
int tls_port; /* TLS listening port */
int tcp_backlog; /* TCP listen() backlog */
2015-07-27 09:41:48 +02:00
char *bindaddr[CONFIG_BINDADDR_MAX]; /* Addresses we should bind to */
2013-07-04 18:50:15 +02:00
int bindaddr_count; /* Number of addresses in server.bindaddr[] */
char *bind_source_addr; /* Source address to bind on for outgoing connections */
char *unixsocket; /* UNIX socket path */
unsigned int unixsocketperm; /* UNIX socket permission (see mode_t) */
socketFds ipfd; /* TCP socket file descriptors */
socketFds tlsfd; /* TLS socket file descriptors */
int sofd; /* Unix socket file descriptor */
uint32_t socket_mark_id; /* ID for listen socket marking */
socketFds cfd; /* Cluster bus listening socket */
list *clients; /* List of active clients */
list *clients_to_close; /* Clients to close asynchronously */
list *clients_pending_write; /* There is to write or install handler. */
2019-03-30 11:26:58 +01:00
list *clients_pending_read; /* Client has pending read socket buffers. */
list *slaves, *monitors; /* List of slaves and MONITORs */
client *current_client; /* Current client executing the command. */
Client eviction (#8687) ### Description A mechanism for disconnecting clients when the sum of all connected clients is above a configured limit. This prevents eviction or OOM caused by accumulated used memory between all clients. It's a complimentary mechanism to the `client-output-buffer-limit` mechanism which takes into account not only a single client and not only output buffers but rather all memory used by all clients. #### Design The general design is as following: * We track memory usage of each client, taking into account all memory used by the client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date after reading from the socket, after processing commands and after writing to the socket. * Based on the used memory we sort all clients into buckets. Each bucket contains all clients using up up to x2 memory of the clients in the bucket below it. For example up to 1m clients, up to 2m clients, up to 4m clients, ... * Before processing a command and before sleep we check if we're over the configured limit. If we are we start disconnecting clients from larger buckets downwards until we're under the limit. #### Config `maxmemory-clients` max memory all clients are allowed to consume, above this threshold we disconnect clients. This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%` would mean 10% of `maxmemory`). #### Important code changes * During the development I encountered yet more situations where our io-threads access global vars. And needed to fix them. I also had to handle keeps the clients sorted into the memory buckets (which are global) while their memory usage changes in the io-thread. To achieve this I decided to simplify how we check if we're in an io-thread and make it much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking if the client is in an io-thread (it wasn't used for anything else) and just used the global `io_threads_op` variable the same way to check during writes. * I optimized the cleanup of the client from the `clients_pending_read` list on client freeing. We now store a pointer in the `client` struct to this list so we don't need to search in it (`pending_read_list_node`). * Added `evicted_clients` stat to `INFO` command. * Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the client eviction mechanism. Added corrosponding 'e' flag in the client info string. * Added `multi-mem` field in the client info string to show how much memory is used up by buffered multi commands. * Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and channels (partially), tracking prefixes (partially). * CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so clients will be disconnected between processing different clients and not only before sleep. This new function can be used in the future for work we want to do outside the command processing loop but don't want to wait for all clients to be processed before we get to it. Specifically I wanted to handle output-buffer-limit related closing before we process client eviction in case the two race with each other. * Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction buckets. * Each client now holds a pointer to the client eviction memory usage bucket it belongs to and listNode to itself in that bucket for quick removal. * Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value indicating no io-threading is currently being executed. * In order to track memory used by each clients in real-time we can't rely on updating these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()` (used to be `clientsCronTrackClientsMemUsage()`) after command processing, after writing data to pubsub clients, after writing the output buffer and after reading from the socket (and maybe other places too). The function is written to be fast. * Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before processing a command (before performing oom-checks and key-eviction). * All clients memory usage buckets are grouped as follows: * All clients using less than 64k. * 64K..128K * 128K..256K * ... * 2G..4G * All clients using 4g and up. * Added client-eviction.tcl with a bunch of tests for the new mechanism. * Extended maxmemory.tcl to test the interaction between maxmemory and maxmemory-clients settings. * Added an option to flag a numeric configuration variable as a "percent", this means that if we encounter a '%' after the number in the config file (or config set command) we consider it as valid. Such a number is store internally as a negative value. This way an integer value can be interpreted as either a percent (negative) or absolute value (positive). This is useful for example if some numeric configuration can optionally be set to a percentage of something else. Co-authored-by: Oran Agra <oran@redislabs.com>
2021-09-23 13:02:16 +02:00
/* Stuff for client mem eviction */
clientMemUsageBucket client_mem_usage_buckets[CLIENT_MEM_USAGE_BUCKETS];
rax *clients_timeout_table; /* Radix tree for blocked clients timeouts. */
long fixed_time_expire; /* If > 0, expire keys against server.mstime. */
int in_nested_call; /* If > 0, in a nested call of a call */
rax *clients_index; /* Active clients dictionary by client ID. */
pause_type client_pause_type; /* True if clients are currently paused */
list *postponed_clients; /* List of postponed clients */
mstime_t client_pause_end_time; /* Time when we undo clients_paused */
Wait for replicas when shutting down (#9872) To avoid data loss, this commit adds a grace period for lagging replicas to catch up the replication offset. Done: * Wait for replicas when shutdown is triggered by SIGTERM and SIGINT. * Wait for replicas when shutdown is triggered by the SHUTDOWN command. A new blocked client type BLOCKED_SHUTDOWN is introduced, allowing multiple clients to call SHUTDOWN in parallel. Note that they don't expect a response unless an error happens and shutdown is aborted. * Log warning for each replica lagging behind when finishing shutdown. * CLIENT_PAUSE_WRITE while waiting for replicas. * Configurable grace period 'shutdown-timeout' in seconds (default 10). * New flags for the SHUTDOWN command: - NOW disables the grace period for lagging replicas. - FORCE ignores errors writing the RDB or AOF files which would normally prevent a shutdown. - ABORT cancels ongoing shutdown. Can't be combined with other flags. * New field in the output of the INFO command: 'shutdown_in_milliseconds'. The value is the remaining maximum time to wait for lagging replicas before finishing the shutdown. This field is present in the Server section **only** during shutdown. Not directly related: * When shutting down, if there is an AOF saving child, it is killed **even** if AOF is disabled. This can happen if BGREWRITEAOF is used when AOF is off. * Client pause now has end time and type (WRITE or ALL) per purpose. The different pause purposes are *CLIENT PAUSE command*, *failover* and *shutdown*. If clients are unpaused for one purpose, it doesn't affect client pause for other purposes. For example, the CLIENT UNPAUSE command doesn't affect client pause initiated by the failover or shutdown procedures. A completed failover or a failed shutdown doesn't unpause clients paused by the CLIENT PAUSE command. Notes: * DEBUG RESTART doesn't wait for replicas. * We already have a warning logged when a replica disconnects. This means that if any replica connection is lost during the shutdown, it is either logged as disconnected or as lagging at the time of exit. Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-02 08:50:15 +01:00
pause_event *client_pause_per_purpose[NUM_PAUSE_PURPOSES];
char neterr[ANET_ERR_LEN]; /* Error buffer for anet.c */
dict *migrate_cached_sockets;/* MIGRATE cached sockets */
Implement redisAtomic to replace _Atomic C11 builtin (#7707) Redis 6.0 introduces I/O threads, it is so cool and efficient, we use C11 _Atomic to establish inter-thread synchronization without mutex. But the compiler that must supports C11 _Atomic can compile redis code, that brings a lot of inconvenience since some common platforms can't support by default such as CentOS7, so we want to implement redis atomic type to make it more portable. We have implemented our atomic variable for redis that only has 'relaxed' operations in src/atomicvar.h, so we implement some operations with 'sequentially-consistent', just like the default behavior of C11 _Atomic that can establish inter-thread synchronization. And we replace all uses of C11 _Atomic with redis atomic variable. Our implementation of redis atomic variable uses C11 _Atomic, __atomic or __sync macros if available, it supports most common platforms, and we will detect automatically which feature we use. In Makefile we use a dummy file to detect if the compiler supports C11 _Atomic. Now for gcc, we can compile redis code theoretically if your gcc version is not less than 4.1.2(starts to support __sync_xxx operations). Otherwise, we remove use mutex fallback to implement redis atomic variable for performance and test. You will get compiling errors if your compiler doesn't support all features of above. For cover redis atomic variable tests, we add other CI jobs that build redis on CentOS6 and CentOS7 and workflow daily jobs that run the tests on them. For them, we just install gcc by default in order to cover different compiler versions, gcc is 4.4.7 by default installation on CentOS6 and 4.8.5 on CentOS7. We restore the feature that we can test redis with Helgrind to find data race errors. But you need install Valgrind in the default path configuration firstly before running your tests, since we use macros in helgrind.h to tell Helgrind inter-thread happens-before relationship explicitly for avoiding false positives. Please open an issue on github if you find data race errors relate to this commit. Unrelated: - Fix redefinition of typedef 'RedisModuleUserChangedFunc' For some old version compilers, they will report errors or warnings, if we re-define function type.
2020-09-17 15:01:45 +02:00
redisAtomic uint64_t next_client_id; /* Next client unique ID. Incremental. */
int protected_mode; /* Don't accept external connections. */
int io_threads_num; /* Number of IO threads to use. */
int io_threads_do_reads; /* Read and parse from IO threads? */
int io_threads_active; /* Is IO threads currently active? */
long long events_processed_while_blocked; /* processEventsWhileBlocked() */
int enable_protected_configs; /* Enable the modification of protected configs, see PROTECTED_ACTION_ALLOWED_* */
int enable_debug_cmd; /* Enable DEBUG commands, see PROTECTED_ACTION_ALLOWED_* */
int enable_module_cmd; /* Enable MODULE commands, see PROTECTED_ACTION_ALLOWED_* */
/* RDB / AOF loading information */
volatile sig_atomic_t loading; /* We are loading data from disk if true */
Replica keep serving data during repl-diskless-load=swapdb for better availability (#9323) For diskless replication in swapdb mode, considering we already spend replica memory having a backup of current db to restore in case of failure, we can have the following benefits by instead swapping database only in case we succeeded in transferring db from master: - Avoid `LOADING` response during failed and successful synchronization for cases where the replica is already up and running with data. - Faster total time of diskless replication, because now we're moving from Transfer + Flush + Load time to Transfer + Load only. Flushing the tempDb is done asynchronously after swapping. - This could be implemented also for disk replication with similar benefits if consumers are willing to spend the extra memory usage. General notes: - The concept of `backupDb` becomes `tempDb` for clarity. - Async loading mode will only kick in if the replica is syncing from a master that has the same repl-id the one it had before. i.e. the data it's getting belongs to a different time of the same timeline. - New property in INFO: `async_loading` to differentiate from the blocking loading - Slot to Key mapping is now a field of `redisDb` as it's more natural to access it from both server.db and the tempDb that is passed around. - Because this is affecting replicas only, we assume that if they are not readonly and write commands during replication, they are lost after SYNC same way as before, but we're still denying CONFIG SET here anyways to avoid complications. Considerations for review: - We have many cases where server.loading flag is used and even though I tried my best, there may be cases where async_loading should be checked as well and cases where it shouldn't (would require very good understanding of whole code) - Several places that had different behavior depending on the loading flag where actually meant to just handle commands coming from the AOF client differently than ones coming from real clients, changed to check CLIENT_ID_AOF instead. **Additional for Release Notes** - Bugfix - server.dirty was not incremented for any kind of diskless replication, as effect it wouldn't contribute on triggering next database SAVE - New flag for RM_GetContextFlags module API: REDISMODULE_CTX_FLAGS_ASYNC_LOADING - Deprecated RedisModuleEvent_ReplBackup. Starting from Redis 7.0, we don't fire this event. Instead, we have the new RedisModuleEvent_ReplAsyncLoad holding 3 sub-events: STARTED, ABORTED and COMPLETED. - New module flag REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD for RedisModule_SetModuleOptions to allow modules to declare they support the diskless replication with async loading (when absent, we fall back to disk-based loading). Co-authored-by: Eduardo Semprebon <edus@saxobank.com> Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-04 09:46:50 +01:00
volatile sig_atomic_t async_loading; /* We are loading data without blocking the db being served */
off_t loading_total_bytes;
off_t loading_rdb_used_mem;
off_t loading_loaded_bytes;
time_t loading_start_time;
off_t loading_process_events_interval_bytes;
/* Fields used only for stats */
time_t stat_starttime; /* Server start time */
long long stat_numcommands; /* Number of processed commands */
long long stat_numconnections; /* Number of connections received */
long long stat_expiredkeys; /* Number of expired keys */
Track number of logically expired keys still in memory. This commit adds two new fields in the INFO output, stats section: expired_stale_perc:0.34 expired_time_cap_reached_count:58 The first field is an estimate of the number of keys that are yet in memory but are already logically expired. They reason why those keys are yet not reclaimed is because the active expire cycle can't spend more time on the process of reclaiming the keys, and at the same time nobody is accessing such keys. However as the active expire cycle runs, while it will eventually have to return to the caller, because of time limit or because there are less than 25% of keys logically expired in each given database, it collects the stats in order to populate this INFO field. Note that expired_stale_perc is a running average, where the current sample accounts for 5% and the history for 95%, so you'll see it changing smoothly over time. The other field, expired_time_cap_reached_count, counts the number of times the expire cycle had to stop, even if still it was finding a sizeable number of keys yet to expire, because of the time limit. This allows people handling operations to understand if the Redis server, during mass-expiration events, is able to collect keys fast enough usually. It is normal for this field to increment during mass expires, but normally it should very rarely increment. When instead it constantly increments, it means that the current workloads is using a very important percentage of CPU time to expire keys. This feature was created thanks to the hints of Rashmi Ramesh and Bart Robinson from Twitter. In private email exchanges, they noted how it was important to improve the observability of this parameter in the Redis server. Actually in big deployments, the amount of keys that are yet to expire in each server, even if they are logically expired, may account for a very big amount of wasted memory.
2018-02-19 11:12:49 +01:00
double stat_expired_stale_perc; /* Percentage of keys probably expired */
long long stat_expired_time_cap_reached_count; /* Early expire cycle stops.*/
long long stat_expire_cycle_time_used; /* Cumulative microseconds used. */
long long stat_evictedkeys; /* Number of evicted keys (maxmemory) */
Client eviction (#8687) ### Description A mechanism for disconnecting clients when the sum of all connected clients is above a configured limit. This prevents eviction or OOM caused by accumulated used memory between all clients. It's a complimentary mechanism to the `client-output-buffer-limit` mechanism which takes into account not only a single client and not only output buffers but rather all memory used by all clients. #### Design The general design is as following: * We track memory usage of each client, taking into account all memory used by the client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date after reading from the socket, after processing commands and after writing to the socket. * Based on the used memory we sort all clients into buckets. Each bucket contains all clients using up up to x2 memory of the clients in the bucket below it. For example up to 1m clients, up to 2m clients, up to 4m clients, ... * Before processing a command and before sleep we check if we're over the configured limit. If we are we start disconnecting clients from larger buckets downwards until we're under the limit. #### Config `maxmemory-clients` max memory all clients are allowed to consume, above this threshold we disconnect clients. This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%` would mean 10% of `maxmemory`). #### Important code changes * During the development I encountered yet more situations where our io-threads access global vars. And needed to fix them. I also had to handle keeps the clients sorted into the memory buckets (which are global) while their memory usage changes in the io-thread. To achieve this I decided to simplify how we check if we're in an io-thread and make it much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking if the client is in an io-thread (it wasn't used for anything else) and just used the global `io_threads_op` variable the same way to check during writes. * I optimized the cleanup of the client from the `clients_pending_read` list on client freeing. We now store a pointer in the `client` struct to this list so we don't need to search in it (`pending_read_list_node`). * Added `evicted_clients` stat to `INFO` command. * Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the client eviction mechanism. Added corrosponding 'e' flag in the client info string. * Added `multi-mem` field in the client info string to show how much memory is used up by buffered multi commands. * Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and channels (partially), tracking prefixes (partially). * CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so clients will be disconnected between processing different clients and not only before sleep. This new function can be used in the future for work we want to do outside the command processing loop but don't want to wait for all clients to be processed before we get to it. Specifically I wanted to handle output-buffer-limit related closing before we process client eviction in case the two race with each other. * Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction buckets. * Each client now holds a pointer to the client eviction memory usage bucket it belongs to and listNode to itself in that bucket for quick removal. * Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value indicating no io-threading is currently being executed. * In order to track memory used by each clients in real-time we can't rely on updating these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()` (used to be `clientsCronTrackClientsMemUsage()`) after command processing, after writing data to pubsub clients, after writing the output buffer and after reading from the socket (and maybe other places too). The function is written to be fast. * Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before processing a command (before performing oom-checks and key-eviction). * All clients memory usage buckets are grouped as follows: * All clients using less than 64k. * 64K..128K * 128K..256K * ... * 2G..4G * All clients using 4g and up. * Added client-eviction.tcl with a bunch of tests for the new mechanism. * Extended maxmemory.tcl to test the interaction between maxmemory and maxmemory-clients settings. * Added an option to flag a numeric configuration variable as a "percent", this means that if we encounter a '%' after the number in the config file (or config set command) we consider it as valid. Such a number is store internally as a negative value. This way an integer value can be interpreted as either a percent (negative) or absolute value (positive). This is useful for example if some numeric configuration can optionally be set to a percentage of something else. Co-authored-by: Oran Agra <oran@redislabs.com>
2021-09-23 13:02:16 +02:00
long long stat_evictedclients; /* Number of evicted clients */
long long stat_total_eviction_exceeded_time; /* Total time over the memory limit, unit us */
monotime stat_last_eviction_exceeded_time; /* Timestamp of current eviction start, unit us */
long long stat_keyspace_hits; /* Number of successful lookups of keys */
long long stat_keyspace_misses; /* Number of failed lookups of keys */
2016-12-30 02:37:52 +01:00
long long stat_active_defrag_hits; /* number of allocations moved */
long long stat_active_defrag_misses; /* number of allocations scanned but not moved */
long long stat_active_defrag_key_hits; /* number of keys with moved allocations */
long long stat_active_defrag_key_misses;/* number of keys scanned and not moved */
long long stat_active_defrag_scanned; /* number of dictEntries scanned */
long long stat_total_active_defrag_time; /* Total time memory fragmentation over the limit, unit us */
monotime stat_last_active_defrag_time; /* Timestamp of current active defrag start */
size_t stat_peak_memory; /* Max used memory record */
long long stat_aof_rewrites; /* number of aof file rewrites performed */
long long stat_rdb_saves; /* number of rdb saves performed */
2013-01-16 18:00:20 +01:00
long long stat_fork_time; /* Time needed to perform latest fork() */
double stat_fork_rate; /* Fork rate in GB/sec. */
2020-12-13 09:01:18 +01:00
long long stat_total_forks; /* Total count of fork. */
long long stat_rejected_conn; /* Clients rejected because of maxclients */
long long stat_sync_full; /* Number of full resyncs with slaves. */
long long stat_sync_partial_ok; /* Number of accepted PSYNC requests. */
long long stat_sync_partial_err;/* Number of unaccepted PSYNC requests. */
list *slowlog; /* SLOWLOG list of commands */
long long slowlog_entry_id; /* SLOWLOG current entry ID */
long long slowlog_log_slower_than; /* SLOWLOG time limit (to get logged) */
unsigned long slowlog_max_len; /* SLOWLOG max number of items logged */
struct malloc_stats cron_malloc_stats; /* sampled in serverCron(). */
Implement redisAtomic to replace _Atomic C11 builtin (#7707) Redis 6.0 introduces I/O threads, it is so cool and efficient, we use C11 _Atomic to establish inter-thread synchronization without mutex. But the compiler that must supports C11 _Atomic can compile redis code, that brings a lot of inconvenience since some common platforms can't support by default such as CentOS7, so we want to implement redis atomic type to make it more portable. We have implemented our atomic variable for redis that only has 'relaxed' operations in src/atomicvar.h, so we implement some operations with 'sequentially-consistent', just like the default behavior of C11 _Atomic that can establish inter-thread synchronization. And we replace all uses of C11 _Atomic with redis atomic variable. Our implementation of redis atomic variable uses C11 _Atomic, __atomic or __sync macros if available, it supports most common platforms, and we will detect automatically which feature we use. In Makefile we use a dummy file to detect if the compiler supports C11 _Atomic. Now for gcc, we can compile redis code theoretically if your gcc version is not less than 4.1.2(starts to support __sync_xxx operations). Otherwise, we remove use mutex fallback to implement redis atomic variable for performance and test. You will get compiling errors if your compiler doesn't support all features of above. For cover redis atomic variable tests, we add other CI jobs that build redis on CentOS6 and CentOS7 and workflow daily jobs that run the tests on them. For them, we just install gcc by default in order to cover different compiler versions, gcc is 4.4.7 by default installation on CentOS6 and 4.8.5 on CentOS7. We restore the feature that we can test redis with Helgrind to find data race errors. But you need install Valgrind in the default path configuration firstly before running your tests, since we use macros in helgrind.h to tell Helgrind inter-thread happens-before relationship explicitly for avoiding false positives. Please open an issue on github if you find data race errors relate to this commit. Unrelated: - Fix redefinition of typedef 'RedisModuleUserChangedFunc' For some old version compilers, they will report errors or warnings, if we re-define function type.
2020-09-17 15:01:45 +02:00
redisAtomic long long stat_net_input_bytes; /* Bytes read from network. */
redisAtomic long long stat_net_output_bytes; /* Bytes written to network. */
Use madvise(MADV_DONTNEED) to release memory to reduce COW (#8974) ## Backgroud As we know, after `fork`, one process will copy pages when writing data to these pages(CoW), and another process still keep old pages, they totally cost more memory. For redis, we suffered that redis consumed much memory when the fork child is serializing key/values, even that maybe cause OOM. But actually we find, in redis fork child process, the child process don't need to keep some memory and parent process may write or update that, for example, child process will never access the key-value that is serialized but users may update it in parent process. So we think it may reduce COW if the child process release memory that it is not needed. ## Implementation For releasing key value in child process, we may think we call `decrRefCount` to free memory, but i find the fork child process still use much memory when we don't write any data to redis, and it costs much more time that slows down bgsave. Maybe because memory allocator doesn't really release memory to OS, and it may modify some inner data for this free operation, especially when we free small objects. Moreover, CoW is based on pages, so it is a easy way that we only free the memory bulk that is not less than kernel page size. madvise(MADV_DONTNEED) can quickly release specified region pages to OS bypassing memory allocator, and allocator still consider that this memory still is used and don't change its inner data. There are some buffers we can release in the fork child process: - **Serialized key-values** the fork child process never access serialized key-values, so we try to free them. Because we only can release big bulk memory, and it is time consumed to iterate all items/members/fields/entries of complex data type. So we decide to iterate them and try to release them only when their average size of item/member/field/entry is more than page size of OS. - **Replication backlog** Because replication backlog is a cycle buffer, it will be changed quickly if redis has heavy write traffic, but in fork child process, we don't need to access that. - **Client buffers** If clients have requests during having the fork child process, clients' buffer also be changed frequently. The memory includes client query buffer, output buffer, and client struct used memory. To get child process peak private dirty memory, we need to count peak memory instead of last used memory, because the child process may continue to release memory (since COW used to only grow till now, the last was equivalent to the peak). Also we're adding a new `current_cow_peak` info variable (to complement the existing `current_cow_size`) Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-04 22:01:46 +02:00
size_t stat_current_cow_peak; /* Peak size of copy on write bytes. */
size_t stat_current_cow_bytes; /* Copy on write bytes while child is active. */
monotime stat_current_cow_updated; /* Last update time of stat_current_cow_bytes */
size_t stat_current_save_keys_processed; /* Processed keys while child is active. */
size_t stat_current_save_keys_total; /* Number of keys when child started. */
size_t stat_rdb_cow_bytes; /* Copy on write bytes during RDB saving. */
size_t stat_aof_cow_bytes; /* Copy on write bytes during AOF rewrite. */
size_t stat_module_cow_bytes; /* Copy on write bytes during module fork. */
double stat_module_progress; /* Module save progress. */
size_t stat_clients_type_memory[CLIENT_TYPE_COUNT];/* Mem usage by type */
size_t stat_cluster_links_memory; /* Mem usage by cluster links */
long long stat_unexpected_error_replies; /* Number of unexpected (aof-loading, replica to master, etc.) error replies */
long long stat_total_error_replies; /* Total number of issued error replies ( command + rejected errors ) */
long long stat_dump_payload_sanitizations; /* Number deep dump payloads integrity validations. */
long long stat_io_reads_processed; /* Number of read events processed by IO / Main threads */
long long stat_io_writes_processed; /* Number of write events processed by IO / Main threads */
Implement redisAtomic to replace _Atomic C11 builtin (#7707) Redis 6.0 introduces I/O threads, it is so cool and efficient, we use C11 _Atomic to establish inter-thread synchronization without mutex. But the compiler that must supports C11 _Atomic can compile redis code, that brings a lot of inconvenience since some common platforms can't support by default such as CentOS7, so we want to implement redis atomic type to make it more portable. We have implemented our atomic variable for redis that only has 'relaxed' operations in src/atomicvar.h, so we implement some operations with 'sequentially-consistent', just like the default behavior of C11 _Atomic that can establish inter-thread synchronization. And we replace all uses of C11 _Atomic with redis atomic variable. Our implementation of redis atomic variable uses C11 _Atomic, __atomic or __sync macros if available, it supports most common platforms, and we will detect automatically which feature we use. In Makefile we use a dummy file to detect if the compiler supports C11 _Atomic. Now for gcc, we can compile redis code theoretically if your gcc version is not less than 4.1.2(starts to support __sync_xxx operations). Otherwise, we remove use mutex fallback to implement redis atomic variable for performance and test. You will get compiling errors if your compiler doesn't support all features of above. For cover redis atomic variable tests, we add other CI jobs that build redis on CentOS6 and CentOS7 and workflow daily jobs that run the tests on them. For them, we just install gcc by default in order to cover different compiler versions, gcc is 4.4.7 by default installation on CentOS6 and 4.8.5 on CentOS7. We restore the feature that we can test redis with Helgrind to find data race errors. But you need install Valgrind in the default path configuration firstly before running your tests, since we use macros in helgrind.h to tell Helgrind inter-thread happens-before relationship explicitly for avoiding false positives. Please open an issue on github if you find data race errors relate to this commit. Unrelated: - Fix redefinition of typedef 'RedisModuleUserChangedFunc' For some old version compilers, they will report errors or warnings, if we re-define function type.
2020-09-17 15:01:45 +02:00
redisAtomic long long stat_total_reads_processed; /* Total number of read events processed */
redisAtomic long long stat_total_writes_processed; /* Total number of write events processed */
/* The following two are used to track instantaneous metrics, like
* number of operations per second, network traffic. */
struct {
long long last_sample_time; /* Timestamp of last sample in ms */
long long last_sample_count;/* Count in last sample */
2015-07-27 09:41:48 +02:00
long long samples[STATS_METRIC_SAMPLES];
int idx;
2015-07-27 09:41:48 +02:00
} inst_metric[STATS_METRIC_COUNT];
introduce dynamic client reply buffer size - save memory on idle clients (#9822) Current implementation simple idle client which serves no traffic still use ~17Kb of memory. this is mainly due to a fixed size reply buffer currently set to 16kb. We have encountered some cases in which the server operates in a low memory environments. In such cases a user who wishes to create large connection pools to support potential burst period, will exhaust a large amount of memory to maintain connected Idle clients. Some users may choose to "sacrifice" performance in order to save memory. This commit introduce a dynamic mechanism to shrink and expend the client reply buffer based on periodic observed peak. the algorithm works as follows: 1. each time a client reply buffer has been fully written, the last recorded peak is updated: new peak = MAX( last peak, current written size) 2. during clients cron we check for each client if the last observed peak was: a. matching the current buffer size - in which case we expend (resize) the buffer size by 100% b. less than half the buffer size - in which case we shrink the buffer size by 50% 3. In any case we will **not** resize the buffer in case: a. the current buffer peak is less then the current buffer usable size and higher than 1/2 the current buffer usable size b. the value of (current buffer usable size/2) is less than 1Kib c. the value of (current buffer usable size*2) is larger than 16Kib 4. the peak value is reset to the current buffer position once every **5** seconds. we maintain a new field in the client structure (buf_peak_last_reset_time) which is used to keep track of how long it passed since the last buffer peak reset. ### **Interface changes:** **CIENT LIST** - now contains 2 new extra fields: rbs= < the current size in bytes of the client reply buffer > rbp=< the current value in bytes of the last observed buffer peak position > **INFO STATS** - now contains 2 new statistics: reply_buffer_shrinks = < total number of buffer shrinks performed > reply_buffer_expends = < total number of buffer expends performed > Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Yoav Steinberg <yoav@redislabs.com>
2022-02-22 10:19:38 +01:00
long long stat_reply_buffer_shrinks; /* Total number of output buffer shrinks */
long long stat_reply_buffer_expands; /* Total number of output buffer expands */
/* Configuration */
int verbosity; /* Loglevel in redis.conf */
int maxidletime; /* Client timeout in seconds */
int tcpkeepalive; /* Set SO_KEEPALIVE if non-zero. */
int active_expire_enabled; /* Can be disabled for testing purposes. */
int active_expire_effort; /* From 1 (default) to 10, active effort. */
2016-12-30 02:37:52 +01:00
int active_defrag_enabled;
int sanitize_dump_payload; /* Enables deep sanitization for ziplist and listpack in RDB and RESTORE. */
int skip_checksum_validation; /* Disable checksum validation for RDB and RESTORE payload. */
int jemalloc_bg_thread; /* Enable jemalloc background thread */
2016-12-30 02:37:52 +01:00
size_t active_defrag_ignore_bytes; /* minimum amount of fragmentation waste to start active defrag */
int active_defrag_threshold_lower; /* minimum percentage of fragmentation to start active defrag */
int active_defrag_threshold_upper; /* maximum percentage of fragmentation at which we use maximum effort */
int active_defrag_cycle_min; /* minimal effort for defrag in CPU percentage */
int active_defrag_cycle_max; /* maximal effort for defrag in CPU percentage */
unsigned long active_defrag_max_scan_fields; /* maximum number of fields of set/hash/zset/list to process from within the main dict scan */
Implement redisAtomic to replace _Atomic C11 builtin (#7707) Redis 6.0 introduces I/O threads, it is so cool and efficient, we use C11 _Atomic to establish inter-thread synchronization without mutex. But the compiler that must supports C11 _Atomic can compile redis code, that brings a lot of inconvenience since some common platforms can't support by default such as CentOS7, so we want to implement redis atomic type to make it more portable. We have implemented our atomic variable for redis that only has 'relaxed' operations in src/atomicvar.h, so we implement some operations with 'sequentially-consistent', just like the default behavior of C11 _Atomic that can establish inter-thread synchronization. And we replace all uses of C11 _Atomic with redis atomic variable. Our implementation of redis atomic variable uses C11 _Atomic, __atomic or __sync macros if available, it supports most common platforms, and we will detect automatically which feature we use. In Makefile we use a dummy file to detect if the compiler supports C11 _Atomic. Now for gcc, we can compile redis code theoretically if your gcc version is not less than 4.1.2(starts to support __sync_xxx operations). Otherwise, we remove use mutex fallback to implement redis atomic variable for performance and test. You will get compiling errors if your compiler doesn't support all features of above. For cover redis atomic variable tests, we add other CI jobs that build redis on CentOS6 and CentOS7 and workflow daily jobs that run the tests on them. For them, we just install gcc by default in order to cover different compiler versions, gcc is 4.4.7 by default installation on CentOS6 and 4.8.5 on CentOS7. We restore the feature that we can test redis with Helgrind to find data race errors. But you need install Valgrind in the default path configuration firstly before running your tests, since we use macros in helgrind.h to tell Helgrind inter-thread happens-before relationship explicitly for avoiding false positives. Please open an issue on github if you find data race errors relate to this commit. Unrelated: - Fix redefinition of typedef 'RedisModuleUserChangedFunc' For some old version compilers, they will report errors or warnings, if we re-define function type.
2020-09-17 15:01:45 +02:00
size_t client_max_querybuf_len; /* Limit for client query buffer length */
int dbnum; /* Total number of configured DBs */
int supervised; /* 1 if supervised, 0 otherwise. */
2015-07-27 09:41:48 +02:00
int supervised_mode; /* See SUPERVISED_* */
int daemonize; /* True if running as a daemon */
Add 'set-proc-title' config so that this mechanism can be disabled (#3623) if option `set-proc-title' is no, then do nothing for proc title. The reason has been explained long ago, see following: We update redis to 2.8.8, then found there are some side effect when redis always change the process title. We run several slave instance on one computer, and all these salves listen on unix socket only, then ps will show: 1 S redis 18036 1 0 80 0 - 56130 ep_pol 14:02 ? 00:00:31 /usr/sbin/redis-server *:0 1 S redis 23949 1 0 80 0 - 11074 ep_pol 15:41 ? 00:00:00 /usr/sbin/redis-server *:0 for redis 2.6 the output of ps is like following: 1 S redis 18036 1 0 80 0 - 56130 ep_pol 14:02 ? 00:00:31 /usr/sbin/redis-server /etc/redis/a.conf 1 S redis 23949 1 0 80 0 - 11074 ep_pol 15:41 ? 00:00:00 /usr/sbin/redis-server /etc/redis/b.conf Later is more informational in our case. The situation is worse when we manage the config and process running state by salt. Salt check the process by running "ps | grep SIG" (for Gentoo System) to check the running state, where SIG is the string to search for when looking for the service process with ps. Previously, we define sig as "/usr/sbin/redis-server /etc/redis/a.conf". Since the ps output is identical for our case, so we have no way to check the state of specified redis instance. So, for our case, we prefer the old behavior, i.e, do not change the process title for the main redis process. Or add an option such as "set-proc-title [yes|no]" to control this behavior. Co-authored-by: Yossi Gottlieb <yossigo@gmail.com> Co-authored-by: Oran Agra <oran@redislabs.com>
2021-01-28 10:12:39 +01:00
int set_proc_title; /* True if change proc title */
char *proc_title_template; /* Process title template format */
2015-07-28 16:58:04 +02:00
clientBufferLimitsConfig client_obuf_limits[CLIENT_TYPE_OBUF_COUNT];
int pause_cron; /* Don't run cron tasks (debug) */
Added INFO LATENCYSTATS section: latency by percentile distribution/latency by cumulative distribution of latencies (#9462) # Short description The Redis extended latency stats track per command latencies and enables: - exporting the per-command percentile distribution via the `INFO LATENCYSTATS` command. **( percentile distribution is not mergeable between cluster nodes ).** - exporting the per-command cumulative latency distributions via the `LATENCY HISTOGRAM` command. Using the cumulative distribution of latencies we can merge several stats from different cluster nodes to calculate aggregate metrics . By default, the extended latency monitoring is enabled since the overhead of keeping track of the command latency is very small. If you don't want to track extended latency metrics, you can easily disable it at runtime using the command: - `CONFIG SET latency-tracking no` By default, the exported latency percentiles are the p50, p99, and p999. You can alter them at runtime using the command: - `CONFIG SET latency-tracking-info-percentiles "0.0 50.0 100.0"` ## Some details: - The total size per histogram should sit around 40 KiB. We only allocate those 40KiB when a command was called for the first time. - With regards to the WRITE overhead As seen below, there is no measurable overhead on the achievable ops/sec or full latency spectrum on the client. Including also the measured redis-benchmark for unstable vs this branch. - We track from 1 nanosecond to 1 second ( everything above 1 second is considered +Inf ) ## `INFO LATENCYSTATS` exposition format - Format: `latency_percentiles_usec_<CMDNAME>:p0=XX,p50....` ## `LATENCY HISTOGRAM [command ...]` exposition format Return a cumulative distribution of latencies in the format of a histogram for the specified command names. The histogram is composed of a map of time buckets: - Each representing a latency range, between 1 nanosecond and roughly 1 second. - Each bucket covers twice the previous bucket's range. - Empty buckets are not printed. - Everything above 1 sec is considered +Inf. - At max there will be log2(1000000000)=30 buckets We reply a map for each command in the format: `<command name> : { `calls`: <total command calls> , `histogram` : { <bucket 1> : latency , < bucket 2> : latency, ... } }` Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-05 13:01:05 +01:00
int latency_tracking_enabled; /* 1 if extended latency tracking is enabled, 0 otherwise. */
double *latency_tracking_info_percentiles; /* Extended latency tracking info output percentile list configuration. */
int latency_tracking_info_percentiles_len;
/* AOF persistence */
int aof_enabled; /* AOF configuration */
2015-07-27 09:41:48 +02:00
int aof_state; /* AOF_(ON|OFF|WAIT_REWRITE) */
int aof_fsync; /* Kind of fsync() policy */
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788) Implement Multi-Part AOF mechanism to avoid overheads during AOFRW. Introducing a folder with multiple AOF files tracked by a manifest file. The main issues with the the original AOFRW mechanism are: * buffering of commands that are processed during rewrite (consuming a lot of RAM) * freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it. * double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files) The main modifications of this PR: 1. Remove the AOF rewrite buffer and related code. 2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type, it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the incremental commands since the last AOFRW. 3. Use a AOF manifest file to record and manage these AOF files mentioned above. 4. The original configuration of `appendfilename` will be the base part of the new file name, for example: `appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof` 5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename` 6. Remove the `aof_rewrite_buffer_length` field in info. 7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs. It also gives users the opportunity to preserve the history AOFs. just for testing use now. 8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now), we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately. 9. Support upgrade (load) data from old version redis. 10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and manifest file will be placed in this directory. 11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if `aof-load-truncated` is enabled. Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-03 18:14:13 +01:00
char *aof_filename; /* Basename of the AOF file and manifest file */
char *aof_dirname; /* Name of the AOF directory */
int aof_no_fsync_on_rewrite; /* Don't fsync if a rewrite is in prog. */
int aof_rewrite_perc; /* Rewrite AOF if % growth is > M and... */
off_t aof_rewrite_min_size; /* the AOF file is at least N bytes. */
off_t aof_rewrite_base_size; /* AOF size on latest startup or rewrite. */
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788) Implement Multi-Part AOF mechanism to avoid overheads during AOFRW. Introducing a folder with multiple AOF files tracked by a manifest file. The main issues with the the original AOFRW mechanism are: * buffering of commands that are processed during rewrite (consuming a lot of RAM) * freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it. * double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files) The main modifications of this PR: 1. Remove the AOF rewrite buffer and related code. 2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type, it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the incremental commands since the last AOFRW. 3. Use a AOF manifest file to record and manage these AOF files mentioned above. 4. The original configuration of `appendfilename` will be the base part of the new file name, for example: `appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof` 5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename` 6. Remove the `aof_rewrite_buffer_length` field in info. 7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs. It also gives users the opportunity to preserve the history AOFs. just for testing use now. 8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now), we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately. 9. Support upgrade (load) data from old version redis. 10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and manifest file will be placed in this directory. 11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if `aof-load-truncated` is enabled. Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-03 18:14:13 +01:00
off_t aof_current_size; /* AOF current size (Including BASE + INCRs). */
off_t aof_last_incr_size; /* The size of the latest incr AOF. */
off_t aof_fsync_offset; /* AOF offset which is already synced to disk. */
int aof_flush_sleep; /* Micros to sleep before flush. (used by tests) */
int aof_rewrite_scheduled; /* Rewrite once BGSAVE terminates. */
2011-12-21 12:17:02 +01:00
sds aof_buf; /* AOF buffer, written before entering the event loop */
int aof_fd; /* File descriptor of currently selected AOF file */
int aof_selected_db; /* Currently selected DB in AOF */
time_t aof_flush_postponed_start; /* UNIX time of postponed AOF flush */
2011-12-21 12:17:02 +01:00
time_t aof_last_fsync; /* UNIX time of last fsync() */
time_t aof_rewrite_time_last; /* Time used by last AOF rewrite run. */
time_t aof_rewrite_time_start; /* Current AOF rewrite start time. */
time_t aof_cur_timestamp; /* Current record timestamp in AOF */
int aof_timestamp_enabled; /* Enable record timestamp in AOF */
int aof_lastbgrewrite_status; /* C_OK or C_ERR */
unsigned long aof_delayed_fsync; /* delayed AOF fsync() counter */
int aof_rewrite_incremental_fsync;/* fsync incrementally while aof rewriting? */
int rdb_save_incremental_fsync; /* fsync incrementally while rdb saving? */
int aof_last_write_status; /* C_OK or C_ERR */
int aof_last_write_errno; /* Valid if aof write/fsync status is ERR */
2014-09-05 11:48:35 +02:00
int aof_load_truncated; /* Don't stop on unexpected AOF EOF. */
int aof_use_rdb_preamble; /* Specify base AOF to use RDB encoding on AOF rewrites. */
redisAtomic int aof_bio_fsync_status; /* Status of AOF fsync in bio job. */
redisAtomic int aof_bio_fsync_errno; /* Errno of AOF fsync in bio job. */
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788) Implement Multi-Part AOF mechanism to avoid overheads during AOFRW. Introducing a folder with multiple AOF files tracked by a manifest file. The main issues with the the original AOFRW mechanism are: * buffering of commands that are processed during rewrite (consuming a lot of RAM) * freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it. * double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files) The main modifications of this PR: 1. Remove the AOF rewrite buffer and related code. 2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type, it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the incremental commands since the last AOFRW. 3. Use a AOF manifest file to record and manage these AOF files mentioned above. 4. The original configuration of `appendfilename` will be the base part of the new file name, for example: `appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof` 5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename` 6. Remove the `aof_rewrite_buffer_length` field in info. 7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs. It also gives users the opportunity to preserve the history AOFs. just for testing use now. 8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now), we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately. 9. Support upgrade (load) data from old version redis. 10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and manifest file will be placed in this directory. 11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if `aof-load-truncated` is enabled. Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-03 18:14:13 +01:00
aofManifest *aof_manifest; /* Used to track AOFs. */
int aof_disable_auto_gc; /* If disable automatically deleting HISTORY type AOFs?
default no. (for testings). */
/* RDB persistence */
long long dirty; /* Changes to DB from the last save */
long long dirty_before_bgsave; /* Used to restore dirty on failed BGSAVE */
long long rdb_last_load_keys_expired; /* number of expired keys when loading RDB */
long long rdb_last_load_keys_loaded; /* number of loaded keys when loading RDB */
struct saveparam *saveparams; /* Save points array for RDB */
int saveparamslen; /* Number of saving points */
2011-12-21 12:22:13 +01:00
char *rdb_filename; /* Name of RDB file */
int rdb_compression; /* Use compression in RDB? */
int rdb_checksum; /* Use RDB checksum? */
int rdb_del_sync_files; /* Remove RDB files used only for SYNC if
the instance does not use persistence. */
2013-01-16 18:00:20 +01:00
time_t lastsave; /* Unix time of last successful save */
time_t lastbgsave_try; /* Unix time of last attempted bgsave */
time_t rdb_save_time_last; /* Time used by last RDB save run. */
time_t rdb_save_time_start; /* Current RDB save start time. */
int rdb_bgsave_scheduled; /* BGSAVE when possible if true. */
int rdb_child_type; /* Type of save by active child. */
int lastbgsave_status; /* C_OK or C_ERR */
int stop_writes_on_bgsave_err; /* Don't allow writes if can't BGSAVE */
if diskless repl child is killed, make sure to reap the pid (#7742) Starting redis 6.0 and the changes we made to the diskless master to be suitable for TLS, I made the master avoid reaping (wait3) the pid of the child until we know all replicas are done reading their rdb. I did that in order to avoid a state where the rdb_child_pid is -1 but we don't yet want to start another fork (still busy serving that data to replicas). It turns out that the solution used so far was problematic in case the fork child was being killed (e.g. by the kernel OOM killer), in that case there's a chance that we currently disabled the read event on the rdb pipe, since we're waiting for a replica to become writable again. and in that scenario the master would have never realized the child exited, and the replica will remain hung too. Note that there's no mechanism to detect a hung replica while it's in rdb transfer state. The solution here is to add another pipe which is used by the parent to tell the child it is safe to exit. this mean that when the child exits, for whatever reason, it is safe to reap it. Besides that, i'm re-introducing an adjustment to REPLCONF ACK which was part of #6271 (Accelerate diskless master connections) but was dropped when that PR was rebased after the TLS fork/pipe changes (5a47794). Now that RdbPipeCleanup no longer calls checkChildrenDone, and the ACK has chance to detect that the child exited, it should be the one to call it so that we don't have to wait for cron (server.hz) to do that.
2020-09-06 15:43:57 +02:00
int rdb_pipe_read; /* RDB pipe used to transfer the rdb data */
/* to the parent process in diskless repl. */
int rdb_child_exit_pipe; /* Used by the diskless parent allow child exit. */
connection **rdb_pipe_conns; /* Connections which are currently the */
int rdb_pipe_numconns; /* target of diskless rdb fork child. */
int rdb_pipe_numconns_writing; /* Number of rdb conns with pending writes. */
char *rdb_pipe_buff; /* In diskless replication, this buffer holds data */
2022-03-09 12:55:17 +01:00
int rdb_pipe_bufflen; /* that was read from the rdb pipe. */
int rdb_key_save_delay; /* Delay in microseconds between keys while
* writing the RDB. (for testings). negative
* value means fractions of microseconds (on average). */
int key_load_delay; /* Delay in microseconds between keys while
* loading aof or rdb. (for testings). negative
* value means fractions of microseconds (on average). */
/* Pipe and data structures for child -> parent info sharing. */
int child_info_pipe[2]; /* Pipe used to write the child_info_data. */
int child_info_nread; /* Num of bytes of the last read from pipe */
/* Propagation of commands in AOF / replication */
redisOpArray also_propagate; /* Additional command to propagate. */
Fix some issues with modules and MULTI/EXEC (#8617) Bug 1: When a module ctx is freed moduleHandlePropagationAfterCommandCallback is called and handles propagation. We want to prevent it from propagating commands that were not replicated by the same context. Example: 1. module1.foo does: RM_Replicate(cmd1); RM_Call(cmd2); RM_Replicate(cmd3) 2. RM_Replicate(cmd1) propagates MULTI and adds cmd1 to also_propagagte 3. RM_Call(cmd2) create a new ctx, calls call() and destroys the ctx. 4. moduleHandlePropagationAfterCommandCallback is called, calling alsoPropagates EXEC (Note: EXEC is still not written to socket), setting server.in_trnsaction = 0 5. RM_Replicate(cmd3) is called, propagagting yet another MULTI (now we have nested MULTI calls, which is no good) and then cmd3 We must prevent RM_Call(cmd2) from resetting server.in_transaction. REDISMODULE_CTX_MULTI_EMITTED was revived for that purpose. Bug 2: Fix issues with nested RM_Call where some have '!' and some don't. Example: 1. module1.foo does RM_Call of module2.bar without replication (i.e. no '!') 2. module2.bar internally calls RM_Call of INCR with '!' 3. at the end of module1.foo we call RM_ReplicateVerbatim We want the replica/AOF to see only module1.foo and not the INCR from module2.bar Introduced a global replication_allowed flag inside RM_Call to determine whether we need to replicate or not (even if '!' was specified) Other changes: Split beforePropagateMultiOrExec to beforePropagateMulti afterPropagateExec just for better readability
2021-03-10 17:02:17 +01:00
int replication_allowed; /* Are we allowed to replicate? */
/* Logging */
char *logfile; /* Path of log file */
int syslog_enabled; /* Is syslog enabled? */
char *syslog_ident; /* Syslog ident */
int syslog_facility; /* Syslog facility */
int crashlog_enabled; /* Enable signal handler for crashlog.
* disable for clean core dumps. */
int memcheck_enabled; /* Enable memory check on crash. */
int use_exit_on_panic; /* Use exit() on panic and assert rather than
* abort(). useful for Valgrind. */
Wait for replicas when shutting down (#9872) To avoid data loss, this commit adds a grace period for lagging replicas to catch up the replication offset. Done: * Wait for replicas when shutdown is triggered by SIGTERM and SIGINT. * Wait for replicas when shutdown is triggered by the SHUTDOWN command. A new blocked client type BLOCKED_SHUTDOWN is introduced, allowing multiple clients to call SHUTDOWN in parallel. Note that they don't expect a response unless an error happens and shutdown is aborted. * Log warning for each replica lagging behind when finishing shutdown. * CLIENT_PAUSE_WRITE while waiting for replicas. * Configurable grace period 'shutdown-timeout' in seconds (default 10). * New flags for the SHUTDOWN command: - NOW disables the grace period for lagging replicas. - FORCE ignores errors writing the RDB or AOF files which would normally prevent a shutdown. - ABORT cancels ongoing shutdown. Can't be combined with other flags. * New field in the output of the INFO command: 'shutdown_in_milliseconds'. The value is the remaining maximum time to wait for lagging replicas before finishing the shutdown. This field is present in the Server section **only** during shutdown. Not directly related: * When shutting down, if there is an AOF saving child, it is killed **even** if AOF is disabled. This can happen if BGREWRITEAOF is used when AOF is off. * Client pause now has end time and type (WRITE or ALL) per purpose. The different pause purposes are *CLIENT PAUSE command*, *failover* and *shutdown*. If clients are unpaused for one purpose, it doesn't affect client pause for other purposes. For example, the CLIENT UNPAUSE command doesn't affect client pause initiated by the failover or shutdown procedures. A completed failover or a failed shutdown doesn't unpause clients paused by the CLIENT PAUSE command. Notes: * DEBUG RESTART doesn't wait for replicas. * We already have a warning logged when a replica disconnects. This means that if any replica connection is lost during the shutdown, it is either logged as disconnected or as lagging at the time of exit. Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-02 08:50:15 +01:00
/* Shutdown */
int shutdown_timeout; /* Graceful shutdown time limit in seconds. */
int shutdown_on_sigint; /* Shutdown flags configured for SIGINT. */
int shutdown_on_sigterm; /* Shutdown flags configured for SIGTERM. */
Wait for replicas when shutting down (#9872) To avoid data loss, this commit adds a grace period for lagging replicas to catch up the replication offset. Done: * Wait for replicas when shutdown is triggered by SIGTERM and SIGINT. * Wait for replicas when shutdown is triggered by the SHUTDOWN command. A new blocked client type BLOCKED_SHUTDOWN is introduced, allowing multiple clients to call SHUTDOWN in parallel. Note that they don't expect a response unless an error happens and shutdown is aborted. * Log warning for each replica lagging behind when finishing shutdown. * CLIENT_PAUSE_WRITE while waiting for replicas. * Configurable grace period 'shutdown-timeout' in seconds (default 10). * New flags for the SHUTDOWN command: - NOW disables the grace period for lagging replicas. - FORCE ignores errors writing the RDB or AOF files which would normally prevent a shutdown. - ABORT cancels ongoing shutdown. Can't be combined with other flags. * New field in the output of the INFO command: 'shutdown_in_milliseconds'. The value is the remaining maximum time to wait for lagging replicas before finishing the shutdown. This field is present in the Server section **only** during shutdown. Not directly related: * When shutting down, if there is an AOF saving child, it is killed **even** if AOF is disabled. This can happen if BGREWRITEAOF is used when AOF is off. * Client pause now has end time and type (WRITE or ALL) per purpose. The different pause purposes are *CLIENT PAUSE command*, *failover* and *shutdown*. If clients are unpaused for one purpose, it doesn't affect client pause for other purposes. For example, the CLIENT UNPAUSE command doesn't affect client pause initiated by the failover or shutdown procedures. A completed failover or a failed shutdown doesn't unpause clients paused by the CLIENT PAUSE command. Notes: * DEBUG RESTART doesn't wait for replicas. * We already have a warning logged when a replica disconnects. This means that if any replica connection is lost during the shutdown, it is either logged as disconnected or as lagging at the time of exit. Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-02 08:50:15 +01:00
/* Replication (master) */
PSYNC2: different improvements to Redis replication. The gist of the changes is that now, partial resynchronizations between slaves and masters (without the need of a full resync with RDB transfer and so forth), work in a number of cases when it was impossible in the past. For instance: 1. When a slave is promoted to mastrer, the slaves of the old master can partially resynchronize with the new master. 2. Chained slalves (slaves of slaves) can be moved to replicate to other slaves or the master itsef, without requiring a full resync. 3. The master itself, after being turned into a slave, is able to partially resynchronize with the new master, when it joins replication again. In order to obtain this, the following main changes were operated: * Slaves also take a replication backlog, not just masters. * Same stream replication for all the slaves and sub slaves. The replication stream is identical from the top level master to its slaves and is also the same from the slaves to their sub-slaves and so forth. This means that if a slave is later promoted to master, it has the same replication backlong, and can partially resynchronize with its slaves (that were previously slaves of the old master). * A given replication history is no longer identified by the `runid` of a Redis node. There is instead a `replication ID` which changes every time the instance has a new history no longer coherent with the past one. So, for example, slaves publish the same replication history of their master, however when they are turned into masters, they publish a new replication ID, but still remember the old ID, so that they are able to partially resynchronize with slaves of the old master (up to a given offset). * The replication protocol was slightly modified so that a new extended +CONTINUE reply from the master is able to inform the slave of a replication ID change. * REPLCONF CAPA is used in order to notify masters that a slave is able to understand the new +CONTINUE reply. * The RDB file was extended with an auxiliary field that is able to select a given DB after loading in the slave, so that the slave can continue receiving the replication stream from the point it was disconnected without requiring the master to insert "SELECT" statements. This is useful in order to guarantee the "same stream" property, because the slave must be able to accumulate an identical backlog. * Slave pings to sub-slaves are now sent in a special form, when the top-level master is disconnected, in order to don't interfer with the replication stream. We just use out of band "\n" bytes as in other parts of the Redis protocol. An old design document is available here: https://gist.github.com/antirez/ae068f95c0d084891305 However the implementation is not identical to the description because during the work to implement it, different changes were needed in order to make things working well.
2016-11-09 11:31:06 +01:00
char replid[CONFIG_RUN_ID_SIZE+1]; /* My current replication ID. */
char replid2[CONFIG_RUN_ID_SIZE+1]; /* replid inherited from master*/
long long master_repl_offset; /* My current replication offset */
long long second_replid_offset; /* Accept offsets up to this for replid2. */
int slaveseldb; /* Last SELECTed DB in replication output */
int repl_ping_slave_period; /* Master pings the slave every N seconds */
Replication backlog and replicas use one global shared replication buffer (#9166) ## Background For redis master, one replica uses one copy of replication buffer, that is a big waste of memory, more replicas more waste, and allocate/free memory for every reply list also cost much. If we set client-output-buffer-limit small and write traffic is heavy, master may disconnect with replicas and can't finish synchronization with replica. If we set client-output-buffer-limit big, master may be OOM when there are many replicas that separately keep much memory. Because replication buffers of different replica client are the same, one simple idea is that all replicas only use one replication buffer, that will effectively save memory. Since replication backlog content is the same as replicas' output buffer, now we can discard replication backlog memory and use global shared replication buffer to implement replication backlog mechanism. ## Implementation I create one global "replication buffer" which contains content of replication stream. The structure of "replication buffer" is similar to the reply list that exists in every client. But the node of list is `replBufBlock`, which has `id, repl_offset, refcount` fields. ```c /* Replication buffer blocks is the list of replBufBlock. * * +--------------+ +--------------+ +--------------+ * | refcount = 1 | ... | refcount = 0 | ... | refcount = 2 | * +--------------+ +--------------+ +--------------+ * | / \ * | / \ * | / \ * Repl Backlog Replia_A Replia_B * * Each replica or replication backlog increments only the refcount of the * 'ref_repl_buf_node' which it points to. So when replica walks to the next * node, it should first increase the next node's refcount, and when we trim * the replication buffer nodes, we remove node always from the head node which * refcount is 0. If the refcount of the head node is not 0, we must stop * trimming and never iterate the next node. */ /* Similar with 'clientReplyBlock', it is used for shared buffers between * all replica clients and replication backlog. */ typedef struct replBufBlock { int refcount; /* Number of replicas or repl backlog using. */ long long id; /* The unique incremental number. */ long long repl_offset; /* Start replication offset of the block. */ size_t size, used; char buf[]; } replBufBlock; ``` So now when we feed replication stream into replication backlog and all replicas, we only need to feed stream into replication buffer `feedReplicationBuffer`. In this function, we set some fields of replication backlog and replicas to references of the global replication buffer blocks. And we also need to check replicas' output buffer limit to free if exceeding `client-output-buffer-limit`, and trim replication backlog if exceeding `repl-backlog-size`. When sending reply to replicas, we also need to iterate replication buffer blocks and send its content, when totally sending one block for replica, we decrease current node count and increase the next current node count, and then free the block which reference is 0 from the head of replication buffer blocks. Since now we use linked list to manage replication backlog, it may cost much time for iterating all linked list nodes to find corresponding replication buffer node. So we create a rax tree to store some nodes for index, but to avoid rax tree occupying too much memory, i record one per 64 nodes for index. Currently, to make partial resynchronization as possible as much, we always let replication backlog as the last reference of replication buffer blocks, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks, and this method doesn't increase memory usage since they share replication buffer. To avoid freezing server for freeing unreferenced replication buffer blocks when we need to trim backlog for exceeding backlog size setting, we trim backlog incrementally (free 64 blocks per call now), and make it faster in `beforeSleep` (free 640 blocks). ### Other changes - `mem_total_replication_buffers`: we add this field in INFO command, it means the total memory of replication buffers used. - `mem_clients_slaves`: now even replica is slow to replicate, and its output buffer memory is not 0, but it still may be 0, since replication backlog and replicas share one global replication buffer, only if replication buffer memory is more than the repl backlog setting size, we consider the excess as replicas' memory. Otherwise, we think replication buffer memory is the consumption of repl backlog. - Key eviction Since all replicas and replication backlog share global replication buffer, we think only the part of exceeding backlog size the extra separate consumption of replicas. Because we trim backlog incrementally in the background, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks disconnect. To avoid massive eviction loop, we don't count the delayed freed replication backlog into used memory even if there are no replicas, i.e. we also regard this memory as replicas's memory. - `client-output-buffer-limit` check for replica clients It doesn't make sense to set the replica clients output buffer limit lower than the repl-backlog-size config (partial sync will succeed and then replica will get disconnected). Such a configuration is ignored (the size of repl-backlog-size will be used). This doesn't have memory consumption implications since the replica client will share the backlog buffers memory. - Drop replication backlog after loading data if needed We always create replication backlog if server is a master, we need it because we put DELs in it when loading expired keys in RDB, but if RDB doesn't have replication info or there is no rdb, it is not possible to support partial resynchronization, to avoid extra memory of replication backlog, we drop it. - Multi IO threads Since all replicas and replication backlog use global replication buffer, if I/O threads are enabled, to guarantee data accessing thread safe, we must let main thread handle sending the output buffer to all replicas. But before, other IO threads could handle sending output buffer of all replicas. ## Other optimizations This solution resolve some other problem: - When replicas disconnect with master since of out of output buffer limit, releasing the output buffer of replicas may freeze server if we set big `client-output-buffer-limit` for replicas, but now, it doesn't cause freezing. - This implementation may mitigate reply list copy cost time(also freezes server) when one replication has huge reply buffer and another replica can copy buffer for full synchronization. now, we just copy reference info, it is very light. - If we set replication backlog size big, it also may cost much time to copy replication backlog into replica's output buffer. But this commit eliminates this problem. - Resizing replication backlog size doesn't empty current replication backlog content.
2021-10-25 08:24:31 +02:00
replBacklog *repl_backlog; /* Replication backlog for partial syncs */
long long repl_backlog_size; /* Backlog circular buffer size */
time_t repl_backlog_time_limit; /* Time without slaves after the backlog
gets released. */
time_t repl_no_slaves_since; /* We have no slaves since that time.
Only valid if server.slaves len is 0. */
int repl_min_slaves_to_write; /* Min number of slaves to write. */
int repl_min_slaves_max_lag; /* Max lag of <count> slaves to write. */
int repl_good_slaves_count; /* Number of slaves with lag <= max_lag. */
int repl_diskless_sync; /* Master send RDB to slaves sockets directly. */
int repl_diskless_load; /* Slave parse RDB directly from the socket.
* see REPL_DISKLESS_LOAD_* enum */
int repl_diskless_sync_delay; /* Delay to start a diskless repl BGSAVE. */
Set repl-diskless-sync to yes by default, add repl-diskless-sync-max-replicas (#10092) 1. enable diskless replication by default 2. add a new config named repl-diskless-sync-max-replicas that enables replication to start before the full repl-diskless-sync-delay was reached. 3. put replica online sooner on the master (see below) 4. test suite uses repl-diskless-sync-delay of 0 to be faster 5. a few tests that use multiple replica on a pre-populated master, are now using the new repl-diskless-sync-max-replicas 6. fix possible timing issues in a few cluster tests (see below) put replica online sooner on the master ---------------------------------------------------- there were two tests that failed because they needed for the master to realize that the replica is online, but the test code was actually only waiting for the replica to realize it's online, and in diskless it could have been before the master realized it. changes include two things: 1. the tests wait on the right thing 2. issues in the master, putting the replica online in two steps. the master used to put the replica as online in 2 steps. the first step was to mark it as online, and the second step was to enable the write event (only after getting ACK), but in fact the first step didn't contains some of the tasks to put it online (like updating good slave count, and sending the module event). this meant that if a test was waiting to see that the replica is online form the point of view of the master, and then confirm that the module got an event, or that the master has enough good replicas, it could fail due to timing issues. so now the full effect of putting the replica online, happens at once, and only the part about enabling the writes is delayed till the ACK. fix cluster tests -------------------- I added some code to wait for the replica to sync and avoid race conditions. later realized the sentinel and cluster tests where using the original 5 seconds delay, so changed it to 0. this means the other changes are probably not needed, but i suppose they're still better (avoid race conditions)
2022-01-17 13:11:11 +01:00
int repl_diskless_sync_max_replicas;/* Max replicas for diskless repl BGSAVE
* delay (start sooner if they all connect). */
Replication backlog and replicas use one global shared replication buffer (#9166) ## Background For redis master, one replica uses one copy of replication buffer, that is a big waste of memory, more replicas more waste, and allocate/free memory for every reply list also cost much. If we set client-output-buffer-limit small and write traffic is heavy, master may disconnect with replicas and can't finish synchronization with replica. If we set client-output-buffer-limit big, master may be OOM when there are many replicas that separately keep much memory. Because replication buffers of different replica client are the same, one simple idea is that all replicas only use one replication buffer, that will effectively save memory. Since replication backlog content is the same as replicas' output buffer, now we can discard replication backlog memory and use global shared replication buffer to implement replication backlog mechanism. ## Implementation I create one global "replication buffer" which contains content of replication stream. The structure of "replication buffer" is similar to the reply list that exists in every client. But the node of list is `replBufBlock`, which has `id, repl_offset, refcount` fields. ```c /* Replication buffer blocks is the list of replBufBlock. * * +--------------+ +--------------+ +--------------+ * | refcount = 1 | ... | refcount = 0 | ... | refcount = 2 | * +--------------+ +--------------+ +--------------+ * | / \ * | / \ * | / \ * Repl Backlog Replia_A Replia_B * * Each replica or replication backlog increments only the refcount of the * 'ref_repl_buf_node' which it points to. So when replica walks to the next * node, it should first increase the next node's refcount, and when we trim * the replication buffer nodes, we remove node always from the head node which * refcount is 0. If the refcount of the head node is not 0, we must stop * trimming and never iterate the next node. */ /* Similar with 'clientReplyBlock', it is used for shared buffers between * all replica clients and replication backlog. */ typedef struct replBufBlock { int refcount; /* Number of replicas or repl backlog using. */ long long id; /* The unique incremental number. */ long long repl_offset; /* Start replication offset of the block. */ size_t size, used; char buf[]; } replBufBlock; ``` So now when we feed replication stream into replication backlog and all replicas, we only need to feed stream into replication buffer `feedReplicationBuffer`. In this function, we set some fields of replication backlog and replicas to references of the global replication buffer blocks. And we also need to check replicas' output buffer limit to free if exceeding `client-output-buffer-limit`, and trim replication backlog if exceeding `repl-backlog-size`. When sending reply to replicas, we also need to iterate replication buffer blocks and send its content, when totally sending one block for replica, we decrease current node count and increase the next current node count, and then free the block which reference is 0 from the head of replication buffer blocks. Since now we use linked list to manage replication backlog, it may cost much time for iterating all linked list nodes to find corresponding replication buffer node. So we create a rax tree to store some nodes for index, but to avoid rax tree occupying too much memory, i record one per 64 nodes for index. Currently, to make partial resynchronization as possible as much, we always let replication backlog as the last reference of replication buffer blocks, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks, and this method doesn't increase memory usage since they share replication buffer. To avoid freezing server for freeing unreferenced replication buffer blocks when we need to trim backlog for exceeding backlog size setting, we trim backlog incrementally (free 64 blocks per call now), and make it faster in `beforeSleep` (free 640 blocks). ### Other changes - `mem_total_replication_buffers`: we add this field in INFO command, it means the total memory of replication buffers used. - `mem_clients_slaves`: now even replica is slow to replicate, and its output buffer memory is not 0, but it still may be 0, since replication backlog and replicas share one global replication buffer, only if replication buffer memory is more than the repl backlog setting size, we consider the excess as replicas' memory. Otherwise, we think replication buffer memory is the consumption of repl backlog. - Key eviction Since all replicas and replication backlog share global replication buffer, we think only the part of exceeding backlog size the extra separate consumption of replicas. Because we trim backlog incrementally in the background, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks disconnect. To avoid massive eviction loop, we don't count the delayed freed replication backlog into used memory even if there are no replicas, i.e. we also regard this memory as replicas's memory. - `client-output-buffer-limit` check for replica clients It doesn't make sense to set the replica clients output buffer limit lower than the repl-backlog-size config (partial sync will succeed and then replica will get disconnected). Such a configuration is ignored (the size of repl-backlog-size will be used). This doesn't have memory consumption implications since the replica client will share the backlog buffers memory. - Drop replication backlog after loading data if needed We always create replication backlog if server is a master, we need it because we put DELs in it when loading expired keys in RDB, but if RDB doesn't have replication info or there is no rdb, it is not possible to support partial resynchronization, to avoid extra memory of replication backlog, we drop it. - Multi IO threads Since all replicas and replication backlog use global replication buffer, if I/O threads are enabled, to guarantee data accessing thread safe, we must let main thread handle sending the output buffer to all replicas. But before, other IO threads could handle sending output buffer of all replicas. ## Other optimizations This solution resolve some other problem: - When replicas disconnect with master since of out of output buffer limit, releasing the output buffer of replicas may freeze server if we set big `client-output-buffer-limit` for replicas, but now, it doesn't cause freezing. - This implementation may mitigate reply list copy cost time(also freezes server) when one replication has huge reply buffer and another replica can copy buffer for full synchronization. now, we just copy reference info, it is very light. - If we set replication backlog size big, it also may cost much time to copy replication backlog into replica's output buffer. But this commit eliminates this problem. - Resizing replication backlog size doesn't empty current replication backlog content.
2021-10-25 08:24:31 +02:00
size_t repl_buffer_mem; /* The memory of replication buffer. */
list *repl_buffer_blocks; /* Replication buffers blocks list
* (serving replica clients and repl backlog) */
/* Replication (slave) */
char *masteruser; /* AUTH with this user and masterauth with master */
sds masterauth; /* AUTH with this password with master */
char *masterhost; /* Hostname of master */
int masterport; /* Port of master */
int repl_timeout; /* Timeout after N seconds of master idle */
client *master; /* Client that is master for this slave */
client *cached_master; /* Cached master to be reused for PSYNC. */
int repl_syncio_timeout; /* Timeout for synchronous I/O calls */
2011-12-21 12:23:18 +01:00
int repl_state; /* Replication status if the instance is a slave */
off_t repl_transfer_size; /* Size of RDB to read from master during sync. */
off_t repl_transfer_read; /* Amount of RDB read from master during sync. */
off_t repl_transfer_last_fsync_off; /* Offset when we fsync-ed last time. */
connection *repl_transfer_s; /* Slave -> Master SYNC connection */
int repl_transfer_fd; /* Slave -> Master SYNC temp file descriptor */
char *repl_transfer_tmpfile; /* Slave-> master SYNC temp file name */
time_t repl_transfer_lastio; /* Unix time of the latest read, for timeout */
int repl_serve_stale_data; /* Serve stale data when link is down? */
int repl_slave_ro; /* Slave is read only? */
int repl_slave_ignore_maxmemory; /* If true slaves do not evict. */
time_t repl_down_since; /* Unix time at which link with master went down */
int repl_disable_tcp_nodelay; /* Disable TCP_NODELAY after SYNC? */
int slave_priority; /* Reported in INFO and used by Sentinel. */
int replica_announced; /* If true, replica is announced by Sentinel */
int slave_announce_port; /* Give the master this listening port. */
char *slave_announce_ip; /* Give the master this ip address. */
int propagation_error_behavior; /* Configures the behavior of the replica
* when it receives an error on the replication stream */
int repl_ignore_disk_write_error; /* Configures whether replicas panic when unable to
* persist writes to AOF. */
PSYNC2: different improvements to Redis replication. The gist of the changes is that now, partial resynchronizations between slaves and masters (without the need of a full resync with RDB transfer and so forth), work in a number of cases when it was impossible in the past. For instance: 1. When a slave is promoted to mastrer, the slaves of the old master can partially resynchronize with the new master. 2. Chained slalves (slaves of slaves) can be moved to replicate to other slaves or the master itsef, without requiring a full resync. 3. The master itself, after being turned into a slave, is able to partially resynchronize with the new master, when it joins replication again. In order to obtain this, the following main changes were operated: * Slaves also take a replication backlog, not just masters. * Same stream replication for all the slaves and sub slaves. The replication stream is identical from the top level master to its slaves and is also the same from the slaves to their sub-slaves and so forth. This means that if a slave is later promoted to master, it has the same replication backlong, and can partially resynchronize with its slaves (that were previously slaves of the old master). * A given replication history is no longer identified by the `runid` of a Redis node. There is instead a `replication ID` which changes every time the instance has a new history no longer coherent with the past one. So, for example, slaves publish the same replication history of their master, however when they are turned into masters, they publish a new replication ID, but still remember the old ID, so that they are able to partially resynchronize with slaves of the old master (up to a given offset). * The replication protocol was slightly modified so that a new extended +CONTINUE reply from the master is able to inform the slave of a replication ID change. * REPLCONF CAPA is used in order to notify masters that a slave is able to understand the new +CONTINUE reply. * The RDB file was extended with an auxiliary field that is able to select a given DB after loading in the slave, so that the slave can continue receiving the replication stream from the point it was disconnected without requiring the master to insert "SELECT" statements. This is useful in order to guarantee the "same stream" property, because the slave must be able to accumulate an identical backlog. * Slave pings to sub-slaves are now sent in a special form, when the top-level master is disconnected, in order to don't interfer with the replication stream. We just use out of band "\n" bytes as in other parts of the Redis protocol. An old design document is available here: https://gist.github.com/antirez/ae068f95c0d084891305 However the implementation is not identical to the description because during the work to implement it, different changes were needed in order to make things working well.
2016-11-09 11:31:06 +01:00
/* The following two fields is where we store master PSYNC replid/offset
* while the PSYNC is in progress. At the end we'll copy the fields into
* the server->master client structure. */
char master_replid[CONFIG_RUN_ID_SIZE+1]; /* Master PSYNC runid. */
long long master_initial_offset; /* Master PSYNC offset. */
int repl_slave_lazy_flush; /* Lazy FLUSHALL before loading DB? */
/* Synchronous replication. */
list *clients_waiting_acks; /* Clients waiting in WAIT command. */
int get_ack_from_slaves; /* If true we send REPLCONF GETACK. */
/* Limits */
unsigned int maxclients; /* Max number of simultaneous clients */
unsigned long long maxmemory; /* Max number of memory bytes to use */
Client eviction (#8687) ### Description A mechanism for disconnecting clients when the sum of all connected clients is above a configured limit. This prevents eviction or OOM caused by accumulated used memory between all clients. It's a complimentary mechanism to the `client-output-buffer-limit` mechanism which takes into account not only a single client and not only output buffers but rather all memory used by all clients. #### Design The general design is as following: * We track memory usage of each client, taking into account all memory used by the client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date after reading from the socket, after processing commands and after writing to the socket. * Based on the used memory we sort all clients into buckets. Each bucket contains all clients using up up to x2 memory of the clients in the bucket below it. For example up to 1m clients, up to 2m clients, up to 4m clients, ... * Before processing a command and before sleep we check if we're over the configured limit. If we are we start disconnecting clients from larger buckets downwards until we're under the limit. #### Config `maxmemory-clients` max memory all clients are allowed to consume, above this threshold we disconnect clients. This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%` would mean 10% of `maxmemory`). #### Important code changes * During the development I encountered yet more situations where our io-threads access global vars. And needed to fix them. I also had to handle keeps the clients sorted into the memory buckets (which are global) while their memory usage changes in the io-thread. To achieve this I decided to simplify how we check if we're in an io-thread and make it much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking if the client is in an io-thread (it wasn't used for anything else) and just used the global `io_threads_op` variable the same way to check during writes. * I optimized the cleanup of the client from the `clients_pending_read` list on client freeing. We now store a pointer in the `client` struct to this list so we don't need to search in it (`pending_read_list_node`). * Added `evicted_clients` stat to `INFO` command. * Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the client eviction mechanism. Added corrosponding 'e' flag in the client info string. * Added `multi-mem` field in the client info string to show how much memory is used up by buffered multi commands. * Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and channels (partially), tracking prefixes (partially). * CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so clients will be disconnected between processing different clients and not only before sleep. This new function can be used in the future for work we want to do outside the command processing loop but don't want to wait for all clients to be processed before we get to it. Specifically I wanted to handle output-buffer-limit related closing before we process client eviction in case the two race with each other. * Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction buckets. * Each client now holds a pointer to the client eviction memory usage bucket it belongs to and listNode to itself in that bucket for quick removal. * Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value indicating no io-threading is currently being executed. * In order to track memory used by each clients in real-time we can't rely on updating these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()` (used to be `clientsCronTrackClientsMemUsage()`) after command processing, after writing data to pubsub clients, after writing the output buffer and after reading from the socket (and maybe other places too). The function is written to be fast. * Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before processing a command (before performing oom-checks and key-eviction). * All clients memory usage buckets are grouped as follows: * All clients using less than 64k. * 64K..128K * 128K..256K * ... * 2G..4G * All clients using 4g and up. * Added client-eviction.tcl with a bunch of tests for the new mechanism. * Extended maxmemory.tcl to test the interaction between maxmemory and maxmemory-clients settings. * Added an option to flag a numeric configuration variable as a "percent", this means that if we encounter a '%' after the number in the config file (or config set command) we consider it as valid. Such a number is store internally as a negative value. This way an integer value can be interpreted as either a percent (negative) or absolute value (positive). This is useful for example if some numeric configuration can optionally be set to a percentage of something else. Co-authored-by: Oran Agra <oran@redislabs.com>
2021-09-23 13:02:16 +02:00
ssize_t maxmemory_clients; /* Memory limit for total client buffers */
2013-01-16 18:00:20 +01:00
int maxmemory_policy; /* Policy for key eviction */
Squash merging 125 typo/grammar/comment/doc PRs (#7773) List of squashed commits or PRs =============================== commit 66801ea Author: hwware <wen.hui.ware@gmail.com> Date: Mon Jan 13 00:54:31 2020 -0500 typo fix in acl.c commit 46f55db Author: Itamar Haber <itamar@redislabs.com> Date: Sun Sep 6 18:24:11 2020 +0300 Updates a couple of comments Specifically: * RM_AutoMemory completed instead of pointing to docs * Updated link to custom type doc commit 61a2aa0 Author: xindoo <xindoo@qq.com> Date: Tue Sep 1 19:24:59 2020 +0800 Correct errors in code comments commit a5871d1 Author: yz1509 <pro-756@qq.com> Date: Tue Sep 1 18:36:06 2020 +0800 fix typos in module.c commit 41eede7 Author: bookug <bookug@qq.com> Date: Sat Aug 15 01:11:33 2020 +0800 docs: fix typos in comments commit c303c84 Author: lazy-snail <ws.niu@outlook.com> Date: Fri Aug 7 11:15:44 2020 +0800 fix spelling in redis.conf commit 1eb76bf Author: zhujian <zhujianxyz@gmail.com> Date: Thu Aug 6 15:22:10 2020 +0800 add a missing 'n' in comment commit 1530ec2 Author: Daniel Dai <764122422@qq.com> Date: Mon Jul 27 00:46:35 2020 -0400 fix spelling in tracking.c commit e517b31 Author: Hunter-Chen <huntcool001@gmail.com> Date: Fri Jul 17 22:33:32 2020 +0800 Update redis.conf Co-authored-by: Itamar Haber <itamar@redislabs.com> commit c300eff Author: Hunter-Chen <huntcool001@gmail.com> Date: Fri Jul 17 22:33:23 2020 +0800 Update redis.conf Co-authored-by: Itamar Haber <itamar@redislabs.com> commit 4c058a8 Author: 陈浩鹏 <chenhaopeng@heytea.com> Date: Thu Jun 25 19:00:56 2020 +0800 Grammar fix and clarification commit 5fcaa81 Author: bodong.ybd <bodong.ybd@alibaba-inc.com> Date: Fri Jun 19 10:09:00 2020 +0800 Fix typos commit 4caca9a Author: Pruthvi P <pruthvi@ixigo.com> Date: Fri May 22 00:33:22 2020 +0530 Fix typo eviciton => eviction commit b2a25f6 Author: Brad Dunbar <dunbarb2@gmail.com> Date: Sun May 17 12:39:59 2020 -0400 Fix a typo. commit 12842ae Author: hwware <wen.hui.ware@gmail.com> Date: Sun May 3 17:16:59 2020 -0400 fix spelling in redis conf commit ddba07c Author: Chris Lamb <chris@chris-lamb.co.uk> Date: Sat May 2 23:25:34 2020 +0100 Correct a "conflicts" spelling error. commit 8fc7bf2 Author: Nao YONASHIRO <yonashiro@r.recruit.co.jp> Date: Thu Apr 30 10:25:27 2020 +0900 docs: fix EXPIRE_FAST_CYCLE_DURATION to ACTIVE_EXPIRE_CYCLE_FAST_DURATION commit 9b2b67a Author: Brad Dunbar <dunbarb2@gmail.com> Date: Fri Apr 24 11:46:22 2020 -0400 Fix a typo. commit 0746f10 Author: devilinrust <63737265+devilinrust@users.noreply.github.com> Date: Thu Apr 16 00:17:53 2020 +0200 Fix typos in server.c commit 92b588d Author: benjessop12 <56115861+benjessop12@users.noreply.github.com> Date: Mon Apr 13 13:43:55 2020 +0100 Fix spelling mistake in lazyfree.c commit 1da37aa Merge: 2d4ba28 af347a8 Author: hwware <wen.hui.ware@gmail.com> Date: Thu Mar 5 22:41:31 2020 -0500 Merge remote-tracking branch 'upstream/unstable' into expiretypofix commit 2d4ba28 Author: hwware <wen.hui.ware@gmail.com> Date: Mon Mar 2 00:09:40 2020 -0500 fix typo in expire.c commit 1a746f7 Author: SennoYuki <minakami1yuki@gmail.com> Date: Thu Feb 27 16:54:32 2020 +0800 fix typo commit 8599b1a Author: dongheejeong <donghee950403@gmail.com> Date: Sun Feb 16 20:31:43 2020 +0000 Fix typo in server.c commit f38d4e8 Author: hwware <wen.hui.ware@gmail.com> Date: Sun Feb 2 22:58:38 2020 -0500 fix typo in evict.c commit fe143fc Author: Leo Murillo <leonardo.murillo@gmail.com> Date: Sun Feb 2 01:57:22 2020 -0600 Fix a few typos in redis.conf commit 1ab4d21 Author: viraja1 <anchan.viraj@gmail.com> Date: Fri Dec 27 17:15:58 2019 +0530 Fix typo in Latency API docstring commit ca1f70e Author: gosth <danxuedexing@qq.com> Date: Wed Dec 18 15:18:02 2019 +0800 fix typo in sort.c commit a57c06b Author: ZYunH <zyunhjob@163.com> Date: Mon Dec 16 22:28:46 2019 +0800 fix-zset-typo commit b8c92b5 Author: git-hulk <hulk.website@gmail.com> Date: Mon Dec 16 15:51:42 2019 +0800 FIX: typo in cluster.c, onformation->information commit 9dd981c Author: wujm2007 <jim.wujm@gmail.com> Date: Mon Dec 16 09:37:52 2019 +0800 Fix typo commit e132d7a Author: Sebastien Williams-Wynn <s.williamswynn.mail@gmail.com> Date: Fri Nov 15 00:14:07 2019 +0000 Minor typo change commit 47f44d5 Author: happynote3966 <01ssrmikururudevice01@gmail.com> Date: Mon Nov 11 22:08:48 2019 +0900 fix comment typo in redis-cli.c commit b8bdb0d Author: fulei <fulei@kuaishou.com> Date: Wed Oct 16 18:00:17 2019 +0800 Fix a spelling mistake of comments in defragDictBucketCallback commit 0def46a Author: fulei <fulei@kuaishou.com> Date: Wed Oct 16 13:09:27 2019 +0800 fix some spelling mistakes of comments in defrag.c commit f3596fd Author: Phil Rajchgot <tophil@outlook.com> Date: Sun Oct 13 02:02:32 2019 -0400 Typo and grammar fixes Redis and its documentation are great -- just wanted to submit a few corrections in the spirit of Hacktoberfest. Thanks for all your work on this project. I use it all the time and it works beautifully. commit 2b928cd Author: KangZhiDong <worldkzd@gmail.com> Date: Sun Sep 1 07:03:11 2019 +0800 fix typos commit 33aea14 Author: Axlgrep <axlgrep@gmail.com> Date: Tue Aug 27 11:02:18 2019 +0800 Fixed eviction spelling issues commit e282a80 Author: Simen Flatby <simen@oms.no> Date: Tue Aug 20 15:25:51 2019 +0200 Update comments to reflect prop name In the comments the prop is referenced as replica-validity-factor, but it is really named cluster-replica-validity-factor. commit 74d1f9a Author: Jim Green <jimgreen2013@qq.com> Date: Tue Aug 20 20:00:31 2019 +0800 fix comment error, the code is ok commit eea1407 Author: Liao Tonglang <liaotonglang@gmail.com> Date: Fri May 31 10:16:18 2019 +0800 typo fix fix cna't to can't commit 0da553c Author: KAWACHI Takashi <tkawachi@gmail.com> Date: Wed Jul 17 00:38:16 2019 +0900 Fix typo commit 7fc8fb6 Author: Michael Prokop <mika@grml.org> Date: Tue May 28 17:58:42 2019 +0200 Typo fixes s/familar/familiar/ s/compatiblity/compatibility/ s/ ot / to / s/itsef/itself/ commit 5f46c9d Author: zhumoing <34539422+zhumoing@users.noreply.github.com> Date: Tue May 21 21:16:50 2019 +0800 typo-fixes typo-fixes commit 321dfe1 Author: wxisme <850885154@qq.com> Date: Sat Mar 16 15:10:55 2019 +0800 typo fix commit b4fb131 Merge: 267e0e6 3df1eb8 Author: Nikitas Bastas <nikitasbst@gmail.com> Date: Fri Feb 8 22:55:45 2019 +0200 Merge branch 'unstable' of antirez/redis into unstable commit 267e0e6 Author: Nikitas Bastas <nikitasbst@gmail.com> Date: Wed Jan 30 21:26:04 2019 +0200 Minor typo fix commit 30544e7 Author: inshal96 <39904558+inshal96@users.noreply.github.com> Date: Fri Jan 4 16:54:50 2019 +0500 remove an extra 'a' in the comments commit 337969d Author: BrotherGao <yangdongheng11@gmail.com> Date: Sat Dec 29 12:37:29 2018 +0800 fix typo in redis.conf commit 9f4b121 Merge: 423a030 e504583 Author: BrotherGao <yangdongheng@xiaomi.com> Date: Sat Dec 29 11:41:12 2018 +0800 Merge branch 'unstable' of antirez/redis into unstable commit 423a030 Merge: 42b02b7 46a51cd Author: 杨东衡 <yangdongheng@xiaomi.com> Date: Tue Dec 4 23:56:11 2018 +0800 Merge branch 'unstable' of antirez/redis into unstable commit 42b02b7 Merge: 68c0e6e b8febe6 Author: Dongheng Yang <yangdongheng11@gmail.com> Date: Sun Oct 28 15:54:23 2018 +0800 Merge pull request #1 from antirez/unstable update local data commit 714b589 Author: Christian <crifei93@gmail.com> Date: Fri Dec 28 01:17:26 2018 +0100 fix typo "resulution" commit e23259d Author: garenchan <1412950785@qq.com> Date: Wed Dec 26 09:58:35 2018 +0800 fix typo: segfauls -> segfault commit a9359f8 Author: xjp <jianping_xie@aliyun.com> Date: Tue Dec 18 17:31:44 2018 +0800 Fixed REDISMODULE_H spell bug commit a12c3e4 Author: jdiaz <jrd.palacios@gmail.com> Date: Sat Dec 15 23:39:52 2018 -0600 Fixes hyperloglog hash function comment block description commit 770eb11 Author: 林上耀 <1210tom@163.com> Date: Sun Nov 25 17:16:10 2018 +0800 fix typo commit fd97fbb Author: Chris Lamb <chris@chris-lamb.co.uk> Date: Fri Nov 23 17:14:01 2018 +0100 Correct "unsupported" typo. commit a85522d Author: Jungnam Lee <jungnam.lee@oracle.com> Date: Thu Nov 8 23:01:29 2018 +0900 fix typo in test comments commit ade8007 Author: Arun Kumar <palerdot@users.noreply.github.com> Date: Tue Oct 23 16:56:35 2018 +0530 Fixed grammatical typo Fixed typo for word 'dictionary' commit 869ee39 Author: Hamid Alaei <hamid.a85@gmail.com> Date: Sun Aug 12 16:40:02 2018 +0430 fix documentations: (ThreadSafeContextStart/Stop -> ThreadSafeContextLock/Unlock), minor typo commit f89d158 Author: Mayank Jain <mayankjain255@gmail.com> Date: Tue Jul 31 23:01:21 2018 +0530 Updated README.md with some spelling corrections. Made correction in spelling of some misspelled words. commit 892198e Author: dsomeshwar <someshwar.dhayalan@gmail.com> Date: Sat Jul 21 23:23:04 2018 +0530 typo fix commit 8a4d780 Author: Itamar Haber <itamar@redislabs.com> Date: Mon Apr 30 02:06:52 2018 +0300 Fixes some typos commit e3acef6 Author: Noah Rosamilia <ivoahivoah@gmail.com> Date: Sat Mar 3 23:41:21 2018 -0500 Fix typo in /deps/README.md commit 04442fb Author: WuYunlong <xzsyeb@126.com> Date: Sat Mar 3 10:32:42 2018 +0800 Fix typo in readSyncBulkPayload() comment. commit 9f36880 Author: WuYunlong <xzsyeb@126.com> Date: Sat Mar 3 10:20:37 2018 +0800 replication.c comment: run_id -> replid. commit f866b4a Author: Francesco 'makevoid' Canessa <makevoid@gmail.com> Date: Thu Feb 22 22:01:56 2018 +0000 fix comment typo in server.c commit 0ebc69b Author: 줍 <jubee0124@gmail.com> Date: Mon Feb 12 16:38:48 2018 +0900 Fix typo in redis.conf Fix `five behaviors` to `eight behaviors` in [this sentence ](antirez/redis@unstable/redis.conf#L564) commit b50a620 Author: martinbroadhurst <martinbroadhurst@users.noreply.github.com> Date: Thu Dec 28 12:07:30 2017 +0000 Fix typo in valgrind.sup commit 7d8f349 Author: Peter Boughton <peter@sorcerersisle.com> Date: Mon Nov 27 19:52:19 2017 +0000 Update CONTRIBUTING; refer doc updates to redis-doc repo. commit 02dec7e Author: Klauswk <klauswk1@hotmail.com> Date: Tue Oct 24 16:18:38 2017 -0200 Fix typo in comment commit e1efbc8 Author: chenshi <baiwfg2@gmail.com> Date: Tue Oct 3 18:26:30 2017 +0800 Correct two spelling errors of comments commit 93327d8 Author: spacewander <spacewanderlzx@gmail.com> Date: Wed Sep 13 16:47:24 2017 +0800 Update the comment for OBJ_ENCODING_EMBSTR_SIZE_LIMIT's value The value of OBJ_ENCODING_EMBSTR_SIZE_LIMIT is 44 now instead of 39. commit 63d361f Author: spacewander <spacewanderlzx@gmail.com> Date: Tue Sep 12 15:06:42 2017 +0800 Fix <prevlen> related doc in ziplist.c According to the definition of ZIP_BIG_PREVLEN and other related code, the guard of single byte <prevlen> should be 254 instead of 255. commit ebe228d Author: hanael80 <hanael80@gmail.com> Date: Tue Aug 15 09:09:40 2017 +0900 Fix typo commit 6b696e6 Author: Matt Robenolt <matt@ydekproductions.com> Date: Mon Aug 14 14:50:47 2017 -0700 Fix typo in LATENCY DOCTOR output commit a2ec6ae Author: caosiyang <caosiyang@qiyi.com> Date: Tue Aug 15 14:15:16 2017 +0800 Fix a typo: form => from commit 3ab7699 Author: caosiyang <caosiyang@qiyi.com> Date: Thu Aug 10 18:40:33 2017 +0800 Fix a typo: replicationFeedSlavesFromMaster() => replicationFeedSlavesFromMasterStream() commit 72d43ef Author: caosiyang <caosiyang@qiyi.com> Date: Tue Aug 8 15:57:25 2017 +0800 fix a typo: servewr => server commit 707c958 Author: Bo Cai <charpty@gmail.com> Date: Wed Jul 26 21:49:42 2017 +0800 redis-cli.c typo: conut -> count. Signed-off-by: Bo Cai <charpty@gmail.com> commit b9385b2 Author: JackDrogon <jack.xsuperman@gmail.com> Date: Fri Jun 30 14:22:31 2017 +0800 Fix some spell problems commit 20d9230 Author: akosel <aaronjkosel@gmail.com> Date: Sun Jun 4 19:35:13 2017 -0500 Fix typo commit b167bfc Author: Krzysiek Witkowicz <krzysiekwitkowicz@gmail.com> Date: Mon May 22 21:32:27 2017 +0100 Fix #4008 small typo in comment commit 2b78ac8 Author: Jake Clarkson <jacobwclarkson@gmail.com> Date: Wed Apr 26 15:49:50 2017 +0100 Correct typo in tests/unit/hyperloglog.tcl commit b0f1cdb Author: Qi Luo <qiluo-msft@users.noreply.github.com> Date: Wed Apr 19 14:25:18 2017 -0700 Fix typo commit a90b0f9 Author: charsyam <charsyam@naver.com> Date: Thu Mar 16 18:19:53 2017 +0900 fix typos fix typos fix typos commit 8430a79 Author: Richard Hart <richardhart92@gmail.com> Date: Mon Mar 13 22:17:41 2017 -0400 Fixed log message typo in listenToPort. commit 481a1c2 Author: Vinod Kumar <kumar003vinod@gmail.com> Date: Sun Jan 15 23:04:51 2017 +0530 src/db.c: Correct "save" -> "safe" typo commit 586b4d3 Author: wangshaonan <wshn13@gmail.com> Date: Wed Dec 21 20:28:27 2016 +0800 Fix typo they->the in helloworld.c commit c1c4b5e Author: Jenner <hypxm@qq.com> Date: Mon Dec 19 16:39:46 2016 +0800 typo error commit 1ee1a3f Author: tielei <43289893@qq.com> Date: Mon Jul 18 13:52:25 2016 +0800 fix some comments commit 11a41fb Author: Otto Kekäläinen <otto@seravo.fi> Date: Sun Jul 3 10:23:55 2016 +0100 Fix spelling in documentation and comments commit 5fb5d82 Author: francischan <f1ancis621@gmail.com> Date: Tue Jun 28 00:19:33 2016 +0800 Fix outdated comments about redis.c file. It should now refer to server.c file. commit 6b254bc Author: lmatt-bit <lmatt123n@gmail.com> Date: Thu Apr 21 21:45:58 2016 +0800 Refine the comment of dictRehashMilliseconds func SLAVECONF->REPLCONF in comment - by andyli029 commit ee9869f Author: clark.kang <charsyam@naver.com> Date: Tue Mar 22 11:09:51 2016 +0900 fix typos commit f7b3b11 Author: Harisankar H <harisankarh@gmail.com> Date: Wed Mar 9 11:49:42 2016 +0530 Typo correction: "faield" --> "failed" Typo correction: "faield" --> "failed" commit 3fd40fc Author: Itamar Haber <itamar@redislabs.com> Date: Thu Feb 25 10:31:51 2016 +0200 Fixes a typo in comments commit 621c160 Author: Prayag Verma <prayag.verma@gmail.com> Date: Mon Feb 1 12:36:20 2016 +0530 Fix typo in Readme.md Spelling mistakes - `eviciton` > `eviction` `familar` > `familiar` commit d7d07d6 Author: WonCheol Lee <toctoc21c@gmail.com> Date: Wed Dec 30 15:11:34 2015 +0900 Typo fixed commit a4dade7 Author: Felix Bünemann <buenemann@louis.info> Date: Mon Dec 28 11:02:55 2015 +0100 [ci skip] Improve supervised upstart config docs This mentions that "expect stop" is required for supervised upstart to work correctly. See http://upstart.ubuntu.com/cookbook/#expect-stop for an explanation. commit d9caba9 Author: daurnimator <quae@daurnimator.com> Date: Mon Dec 21 18:30:03 2015 +1100 README: Remove trailing whitespace commit 72d42e5 Author: daurnimator <quae@daurnimator.com> Date: Mon Dec 21 18:29:32 2015 +1100 README: Fix typo. th => the commit dd6e957 Author: daurnimator <quae@daurnimator.com> Date: Mon Dec 21 18:29:20 2015 +1100 README: Fix typo. familar => familiar commit 3a12b23 Author: daurnimator <quae@daurnimator.com> Date: Mon Dec 21 18:28:54 2015 +1100 README: Fix typo. eviciton => eviction commit 2d1d03b Author: daurnimator <quae@daurnimator.com> Date: Mon Dec 21 18:21:45 2015 +1100 README: Fix typo. sever => server commit 3973b06 Author: Itamar Haber <itamar@garantiadata.com> Date: Sat Dec 19 17:01:20 2015 +0200 Typo fix commit 4f2e460 Author: Steve Gao <fu@2token.com> Date: Fri Dec 4 10:22:05 2015 +0800 Update README - fix typos commit b21667c Author: binyan <binbin.yan@nokia.com> Date: Wed Dec 2 22:48:37 2015 +0800 delete redundancy color judge in sdscatcolor commit 88894c7 Author: binyan <binbin.yan@nokia.com> Date: Wed Dec 2 22:14:42 2015 +0800 the example output shoule be HelloWorld commit 2763470 Author: binyan <binbin.yan@nokia.com> Date: Wed Dec 2 17:41:39 2015 +0800 modify error word keyevente Signed-off-by: binyan <binbin.yan@nokia.com> commit 0847b3d Author: Bruno Martins <bscmartins@gmail.com> Date: Wed Nov 4 11:37:01 2015 +0000 typo commit bbb9e9e Author: dawedawe <dawedawe@gmx.de> Date: Fri Mar 27 00:46:41 2015 +0100 typo: zimap -> zipmap commit 5ed297e Author: Axel Advento <badwolf.bloodseeker.rev@gmail.com> Date: Tue Mar 3 15:58:29 2015 +0800 Fix 'salve' typos to 'slave' commit edec9d6 Author: LudwikJaniuk <ludvig.janiuk@gmail.com> Date: Wed Jun 12 14:12:47 2019 +0200 Update README.md Co-Authored-By: Qix <Qix-@users.noreply.github.com> commit 692a7af Author: LudwikJaniuk <ludvig.janiuk@gmail.com> Date: Tue May 28 14:32:04 2019 +0200 grammar commit d962b0a Author: Nick Frost <nickfrostatx@gmail.com> Date: Wed Jul 20 15:17:12 2016 -0700 Minor grammar fix commit 24fff01aaccaf5956973ada8c50ceb1462e211c6 (typos) Author: Chad Miller <chadm@squareup.com> Date: Tue Sep 8 13:46:11 2020 -0400 Fix faulty comment about operation of unlink() commit 3cd5c1f3326c52aa552ada7ec797c6bb16452355 Author: Kevin <kevin.xgr@gmail.com> Date: Wed Nov 20 00:13:50 2019 +0800 Fix typo in server.c. From a83af59 Mon Sep 17 00:00:00 2001 From: wuwo <wuwo@wacai.com> Date: Fri, 17 Mar 2017 20:37:45 +0800 Subject: [PATCH] falure to failure From c961896 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=E5=B7=A6=E6=87=B6?= <veficos@gmail.com> Date: Sat, 27 May 2017 15:33:04 +0800 Subject: [PATCH] fix typo From e600ef2 Mon Sep 17 00:00:00 2001 From: "rui.zou" <rui.zou@yunify.com> Date: Sat, 30 Sep 2017 12:38:15 +0800 Subject: [PATCH] fix a typo From c7d07fa Mon Sep 17 00:00:00 2001 From: Alexandre Perrin <alex@kaworu.ch> Date: Thu, 16 Aug 2018 10:35:31 +0200 Subject: [PATCH] deps README.md typo From b25cb67 Mon Sep 17 00:00:00 2001 From: Guy Korland <gkorland@gmail.com> Date: Wed, 26 Sep 2018 10:55:37 +0300 Subject: [PATCH 1/2] fix typos in header From ad28ca6 Mon Sep 17 00:00:00 2001 From: Guy Korland <gkorland@gmail.com> Date: Wed, 26 Sep 2018 11:02:36 +0300 Subject: [PATCH 2/2] fix typos commit 34924cdedd8552466fc22c1168d49236cb7ee915 Author: Adrian Lynch <adi_ady_ade@hotmail.com> Date: Sat Apr 4 21:59:15 2015 +0100 Typos fixed commit fd2a1e7 Author: Jan <jsteemann@users.noreply.github.com> Date: Sat Oct 27 19:13:01 2018 +0200 Fix typos Fix typos commit e14e47c1a234b53b0e103c5f6a1c61481cbcbb02 Author: Andy Lester <andy@petdance.com> Date: Fri Aug 2 22:30:07 2019 -0500 Fix multiple misspellings of "following" commit 79b948ce2dac6b453fe80995abbcaac04c213d5a Author: Andy Lester <andy@petdance.com> Date: Fri Aug 2 22:24:28 2019 -0500 Fix misspelling of create-cluster commit 1fffde52666dc99ab35efbd31071a4c008cb5a71 Author: Andy Lester <andy@petdance.com> Date: Wed Jul 31 17:57:56 2019 -0500 Fix typos commit 204c9ba9651e9e05fd73936b452b9a30be456cfe Author: Xiaobo Zhu <xiaobo.zhu@shopee.com> Date: Tue Aug 13 22:19:25 2019 +0800 fix typos Squashed commit of the following: commit 1d9aaf8 Author: danmedani <danmedani@gmail.com> Date: Sun Aug 2 11:40:26 2015 -0700 README typo fix. Squashed commit of the following: commit 32bfa7c Author: Erik Dubbelboer <erik@dubbelboer.com> Date: Mon Jul 6 21:15:08 2015 +0200 Fixed grammer Squashed commit of the following: commit b24f69c Author: Sisir Koppaka <sisir.koppaka@gmail.com> Date: Mon Mar 2 22:38:45 2015 -0500 utils/hashtable/rehashing.c: Fix typos Squashed commit of the following: commit 4e04082 Author: Erik Dubbelboer <erik@dubbelboer.com> Date: Mon Mar 23 08:22:21 2015 +0000 Small config file documentation improvements Squashed commit of the following: commit acb8773 Author: ctd1500 <ctd1500@gmail.com> Date: Fri May 8 01:52:48 2015 -0700 Typo and grammar fixes in readme commit 2eb75b6 Author: ctd1500 <ctd1500@gmail.com> Date: Fri May 8 01:36:18 2015 -0700 fixed redis.conf comment Squashed commit of the following: commit a8249a2 Author: Masahiko Sawada <sawada.mshk@gmail.com> Date: Fri Dec 11 11:39:52 2015 +0530 Revise correction of typos. Squashed commit of the following: commit 3c02028 Author: zhaojun11 <zhaojun11@jd.com> Date: Wed Jan 17 19:05:28 2018 +0800 Fix typos include two code typos in cluster.c and latency.c Squashed commit of the following: commit 9dba47c Author: q191201771 <191201771@qq.com> Date: Sat Jan 4 11:31:04 2020 +0800 fix function listCreate comment in adlist.c Update src/server.c commit 2c7c2cb536e78dd211b1ac6f7bda00f0f54faaeb Author: charpty <charpty@gmail.com> Date: Tue May 1 23:16:59 2018 +0800 server.c typo: modules system dictionary type comment Signed-off-by: charpty <charpty@gmail.com> commit a8395323fb63cb59cb3591cb0f0c8edb7c29a680 Author: Itamar Haber <itamar@redislabs.com> Date: Sun May 6 00:25:18 2018 +0300 Updates test_helper.tcl's help with undocumented options Specifically: * Host * Port * Client commit bde6f9ced15755cd6407b4af7d601b030f36d60b Author: wxisme <850885154@qq.com> Date: Wed Aug 8 15:19:19 2018 +0800 fix comments in deps files commit 3172474ba991532ab799ee1873439f3402412331 Author: wxisme <850885154@qq.com> Date: Wed Aug 8 14:33:49 2018 +0800 fix some comments commit 01b6f2b6858b5cf2ce4ad5092d2c746e755f53f0 Author: Thor Juhasz <thor@juhasz.pro> Date: Sun Nov 18 14:37:41 2018 +0100 Minor fixes to comments Found some parts a little unclear on a first read, which prompted me to have a better look at the file and fix some minor things I noticed. Fixing minor typos and grammar. There are no changes to configuration options. These changes are only meant to help the user better understand the explanations to the various configuration options
2020-09-10 12:43:38 +02:00
int maxmemory_samples; /* Precision of random sampling */
int maxmemory_eviction_tenacity;/* Aggressiveness of eviction processing */
2017-10-13 05:09:48 +02:00
int lfu_log_factor; /* LFU logarithmic counter factor. */
int lfu_decay_time; /* LFU counter decay factor. */
long long proto_max_bulk_len; /* Protocol bulk length maximum size. */
int oom_score_adj_values[CONFIG_OOM_COUNT]; /* Linux oom_score_adj configuration */
int oom_score_adj; /* If true, oom_score_adj is managed */
int disable_thp; /* If true, disable THP by syscall */
/* Blocked clients */
unsigned int blocked_clients; /* # of clients executing a blocking cmd.*/
unsigned int blocked_clients_by_type[BLOCKED_NUM];
list *unblocked_clients; /* list of clients to unblock before next loop */
A reimplementation of blocking operation internals. Redis provides support for blocking operations such as BLPOP or BRPOP. This operations are identical to normal LPOP and RPOP operations as long as there are elements in the target list, but if the list is empty they block waiting for new data to arrive to the list. All the clients blocked waiting for th same list are served in a FIFO way, so the first that blocked is the first to be served when there is more data pushed by another client into the list. The previous implementation of blocking operations was conceived to serve clients in the context of push operations. For for instance: 1) There is a client "A" blocked on list "foo". 2) The client "B" performs `LPUSH foo somevalue`. 3) The client "A" is served in the context of the "B" LPUSH, synchronously. Processing things in a synchronous way was useful as if "A" pushes a value that is served by "B", from the point of view of the database is a NOP (no operation) thing, that is, nothing is replicated, nothing is written in the AOF file, and so forth. However later we implemented two things: 1) Variadic LPUSH that could add multiple values to a list in the context of a single call. 2) BRPOPLPUSH that was a version of BRPOP that also provided a "PUSH" side effect when receiving data. This forced us to make the synchronous implementation more complex. If client "B" is waiting for data, and "A" pushes three elemnents in a single call, we needed to propagate an LPUSH with a missing argument in the AOF and replication link. We also needed to make sure to replicate the LPUSH side of BRPOPLPUSH, but only if in turn did not happened to serve another blocking client into another list ;) This were complex but with a few of mutually recursive functions everything worked as expected... until one day we introduced scripting in Redis. Scripting + synchronous blocking operations = Issue #614. Basically you can't "rewrite" a script to have just a partial effect on the replicas and AOF file if the script happened to serve a few blocked clients. The solution to all this problems, implemented by this commit, is to change the way we serve blocked clients. Instead of serving the blocked clients synchronously, in the context of the command performing the PUSH operation, it is now an asynchronous and iterative process: 1) If a key that has clients blocked waiting for data is the subject of a list push operation, We simply mark keys as "ready" and put it into a queue. 2) Every command pushing stuff on lists, as a variadic LPUSH, a script, or whatever it is, is replicated verbatim without any rewriting. 3) Every time a Redis command, a MULTI/EXEC block, or a script, completed its execution, we run the list of keys ready to serve blocked clients (as more data arrived), and process this list serving the blocked clients. 4) As a result of "3" maybe more keys are ready again for other clients (as a result of BRPOPLPUSH we may have push operations), so we iterate back to step "3" if it's needed. The new code has a much simpler semantics, and a simpler to understand implementation, with the disadvantage of not being able to "optmize out" a PUSH+BPOP as a No OP. This commit will be tested with care before the final merge, more tests will be added likely.
2012-09-04 10:37:49 +02:00
list *ready_keys; /* List of readyList structures for BLPOP & co */
/* Client side caching. */
unsigned int tracking_clients; /* # of clients with tracking enabled.*/
size_t tracking_table_max_keys; /* Max number of keys in tracking table. */
list *tracking_pending_keys; /* tracking invalidation keys pending to flush */
/* Sort parameters - qsort_r() is only available under BSD so we
* have to take this state global, in order to pass it to sortCompare() */
int sort_desc;
int sort_alpha;
int sort_bypattern;
int sort_store;
/* Zip structure config, see redis.conf for more information */
size_t hash_max_listpack_entries;
size_t hash_max_listpack_value;
size_t set_max_intset_entries;
Replace all usage of ziplist with listpack for t_zset (#9366) Part two of implementing #8702 (zset), after #8887. ## Description of the feature Replaced all uses of ziplist with listpack in t_zset, and optimized some of the code to optimize performance. ## Rdb format changes New `RDB_TYPE_ZSET_LISTPACK` rdb type. ## Rdb loading improvements: 1) Pre-expansion of dict for validation of duplicate data for listpack and ziplist. 2) Simplifying the release of empty key objects when RDB loading. 3) Unify ziplist and listpack data verify methods for zset and hash, and move code to rdb.c. ## Interface changes 1) New `zset-max-listpack-entries` config is an alias for `zset-max-ziplist-entries` (same with `zset-max-listpack-value`). 2) OBJECT ENCODING will return listpack instead of ziplist. ## Listpack improvements: 1) Add `lpDeleteRange` and `lpDeleteRangeWithEntry` functions to delete a range of entries from listpack. 2) Improve the performance of `lpCompare`, converting from string to integer is faster than converting from integer to string. 3) Replace `snprintf` with `ll2string` to improve performance in converting numbers to strings in `lpGet()`. ## Zset improvements: 1) Improve the performance of `zzlFind` method, use `lpFind` instead of `lpCompare` in a loop. 2) Use `lpDeleteRangeWithEntry` instead of `lpDelete` twice to delete a element of zset. ## Tests 1) Add some unittests for `lpDeleteRange` and `lpDeleteRangeWithEntry` function. 2) Add zset RDB loading test. 3) Add benchmark test for `lpCompare` and `ziplsitCompare`. 4) Add empty listpack zset corrupt dump test.
2021-09-09 17:18:53 +02:00
size_t zset_max_listpack_entries;
size_t zset_max_listpack_value;
size_t hll_sparse_max_bytes;
size_t stream_node_max_bytes;
long long stream_node_max_entries;
/* List parameters */
2021-11-24 12:34:13 +01:00
int list_max_listpack_size;
int list_compress_depth;
/* time cache */
Implement redisAtomic to replace _Atomic C11 builtin (#7707) Redis 6.0 introduces I/O threads, it is so cool and efficient, we use C11 _Atomic to establish inter-thread synchronization without mutex. But the compiler that must supports C11 _Atomic can compile redis code, that brings a lot of inconvenience since some common platforms can't support by default such as CentOS7, so we want to implement redis atomic type to make it more portable. We have implemented our atomic variable for redis that only has 'relaxed' operations in src/atomicvar.h, so we implement some operations with 'sequentially-consistent', just like the default behavior of C11 _Atomic that can establish inter-thread synchronization. And we replace all uses of C11 _Atomic with redis atomic variable. Our implementation of redis atomic variable uses C11 _Atomic, __atomic or __sync macros if available, it supports most common platforms, and we will detect automatically which feature we use. In Makefile we use a dummy file to detect if the compiler supports C11 _Atomic. Now for gcc, we can compile redis code theoretically if your gcc version is not less than 4.1.2(starts to support __sync_xxx operations). Otherwise, we remove use mutex fallback to implement redis atomic variable for performance and test. You will get compiling errors if your compiler doesn't support all features of above. For cover redis atomic variable tests, we add other CI jobs that build redis on CentOS6 and CentOS7 and workflow daily jobs that run the tests on them. For them, we just install gcc by default in order to cover different compiler versions, gcc is 4.4.7 by default installation on CentOS6 and 4.8.5 on CentOS7. We restore the feature that we can test redis with Helgrind to find data race errors. But you need install Valgrind in the default path configuration firstly before running your tests, since we use macros in helgrind.h to tell Helgrind inter-thread happens-before relationship explicitly for avoiding false positives. Please open an issue on github if you find data race errors relate to this commit. Unrelated: - Fix redefinition of typedef 'RedisModuleUserChangedFunc' For some old version compilers, they will report errors or warnings, if we re-define function type.
2020-09-17 15:01:45 +02:00
redisAtomic time_t unixtime; /* Unix time sampled every cron cycle. */
2019-03-30 11:26:58 +01:00
time_t timezone; /* Cached timezone. As set by tzset(). */
int daylight_active; /* Currently in daylight saving time. */
mstime_t mstime; /* 'unixtime' in milliseconds. */
ustime_t ustime; /* 'unixtime' in microseconds. */
size_t blocking_op_nesting; /* Nesting level of blocking operation, used to reset blocked_last_cron. */
long long blocked_last_cron; /* Indicate the mstime of the last time we did cron jobs from a blocking operation */
/* Pubsub */
dict *pubsub_channels; /* Map channels to list of subscribed clients */
dict *pubsub_patterns; /* A dict of pubsub_patterns */
int notify_keyspace_events; /* Events to propagate via Pub/Sub. This is an
2015-07-27 09:41:48 +02:00
xor of NOTIFY_... flags. */
dict *pubsubshard_channels; /* Map channels to list of subscribed clients */
/* Cluster */
int cluster_enabled; /* Is cluster enabled? */
int cluster_port; /* Set the cluster port for a node. */
mstime_t cluster_node_timeout; /* Cluster node timeout. */
char *cluster_configfile; /* Cluster auto-generated config file name. */
struct clusterState *cluster; /* State of the cluster */
int cluster_migration_barrier; /* Cluster replicas migration barrier. */
int cluster_allow_replica_migration; /* Automatic replica migrations to orphaned masters and from empty masters */
int cluster_slave_validity_factor; /* Slave max data age for failover. */
int cluster_require_full_coverage; /* If true, put the cluster down if
2015-10-29 13:50:04 +01:00
there is at least an uncovered slot.*/
int cluster_slave_no_failover; /* Prevent slave from starting a failover
if the master is in failure state. */
char *cluster_announce_ip; /* IP address to announce on cluster bus. */
char *cluster_announce_hostname; /* hostname to announce on cluster bus. */
int cluster_preferred_endpoint_type; /* Use the announced hostname when available. */
int cluster_announce_port; /* base port to announce on cluster bus. */
int cluster_announce_tls_port; /* TLS port to announce on cluster bus. */
int cluster_announce_bus_port; /* bus port to announce on cluster bus. */
int cluster_module_flags; /* Set of flags that Redis modules are able
to set in order to suppress certain
native Redis Cluster features. Check the
REDISMODULE_CLUSTER_FLAG_*. */
int cluster_allow_reads_when_down; /* Are reads allowed when the cluster
is down? */
int cluster_config_file_lock_fd; /* cluster config fd, will be flock */
unsigned long long cluster_link_sendbuf_limit_bytes; /* Memory usage limit on individual link send buffers*/
int cluster_drop_packet_filter; /* Debug config that allows tactically
* dropping packets of a specific type */
/* Scripting */
client *script_caller; /* The client running script right now, or NULL */
mstime_t busy_reply_threshold; /* Script / module timeout in milliseconds */
int script_oom; /* OOM detected when script start */
int script_disable_deny_script; /* Allow running commands marked "no-script" inside a script. */
/* Lazy free */
int lazyfree_lazy_eviction;
int lazyfree_lazy_expire;
int lazyfree_lazy_server_del;
int lazyfree_lazy_user_del;
int lazyfree_lazy_user_flush;
/* Latency monitor */
long long latency_monitor_threshold;
dict *latency_events;
/* ACLs */
Adds pub/sub channel patterns to ACL (#7993) Fixes #7923. This PR appropriates the special `&` symbol (because `@` and `*` are taken), followed by a literal value or pattern for describing the Pub/Sub patterns that an ACL user can interact with. It is similar to the existing key patterns mechanism in function (additive) and implementation (copy-pasta). It also adds the allchannels and resetchannels ACL keywords, naturally. The default user is given allchannels permissions, whereas new users get whatever is defined by the acl-pubsub-default configuration directive. For backward compatibility in 6.2, the default of this directive is allchannels but this is likely to be changed to resetchannels in the next major version for stronger default security settings. Unless allchannels is set for the user, channel access permissions are checked as follows : * Calls to both PUBLISH and SUBSCRIBE will fail unless a pattern matching the argumentative channel name(s) exists for the user. * Calls to PSUBSCRIBE will fail unless the pattern(s) provided as an argument literally exist(s) in the user's list. Such failures are logged to the ACL log. Runtime changes to channel permissions for a user with existing subscribing clients cause said clients to disconnect unless the new permissions permit the connections to continue. Note, however, that PSUBSCRIBErs' patterns are matched literally, so given the change bar:* -> b*, pattern subscribers to bar:* will be disconnected. Notes/questions: * UNSUBSCRIBE, PUNSUBSCRIBE and PUBSUB remain unprotected due to lack of reasons for touching them.
2020-12-01 13:21:39 +01:00
char *acl_filename; /* ACL Users file. NULL if not configured. */
unsigned long acllog_max_len; /* Maximum length of the ACL LOG list. */
Adds pub/sub channel patterns to ACL (#7993) Fixes #7923. This PR appropriates the special `&` symbol (because `@` and `*` are taken), followed by a literal value or pattern for describing the Pub/Sub patterns that an ACL user can interact with. It is similar to the existing key patterns mechanism in function (additive) and implementation (copy-pasta). It also adds the allchannels and resetchannels ACL keywords, naturally. The default user is given allchannels permissions, whereas new users get whatever is defined by the acl-pubsub-default configuration directive. For backward compatibility in 6.2, the default of this directive is allchannels but this is likely to be changed to resetchannels in the next major version for stronger default security settings. Unless allchannels is set for the user, channel access permissions are checked as follows : * Calls to both PUBLISH and SUBSCRIBE will fail unless a pattern matching the argumentative channel name(s) exists for the user. * Calls to PSUBSCRIBE will fail unless the pattern(s) provided as an argument literally exist(s) in the user's list. Such failures are logged to the ACL log. Runtime changes to channel permissions for a user with existing subscribing clients cause said clients to disconnect unless the new permissions permit the connections to continue. Note, however, that PSUBSCRIBErs' patterns are matched literally, so given the change bar:* -> b*, pattern subscribers to bar:* will be disconnected. Notes/questions: * UNSUBSCRIBE, PUNSUBSCRIBE and PUBSUB remain unprotected due to lack of reasons for touching them.
2020-12-01 13:21:39 +01:00
sds requirepass; /* Remember the cleartext password set with
the old "requirepass" directive for
backward compatibility with Redis <= 5. */
2021-04-05 22:13:20 +02:00
int acl_pubsub_default; /* Default ACL pub/sub channels flag */
2013-01-16 18:00:20 +01:00
/* Assert & bug reporting */
2012-03-27 11:47:51 +02:00
int watchdog_period; /* Software watchdog period in ms. 0 = off */
/* System hardware info */
size_t system_memory_size; /* Total memory in system as reported by OS */
/* TLS Configuration */
int tls_cluster;
int tls_replication;
int tls_auth_clients;
redisTLSContextConfig tls_ctx_config;
/* cpu affinity */
char *server_cpulist; /* cpu affinity list of redis server main/io thread. */
char *bio_cpulist; /* cpu affinity list of bio thread. */
char *aof_rewrite_cpulist; /* cpu affinity list of aof rewrite process. */
char *bgsave_cpulist; /* cpu affinity list of bgsave process. */
/* Sentinel config */
struct sentinelConfig *sentinel_config; /* sentinel config to load at startup time. */
/* Coordinate failover info */
mstime_t failover_end_time; /* Deadline for failover command. */
int force_failover; /* If true then failover will be forced at the
* deadline, otherwise failover is aborted. */
char *target_replica_host; /* Failover target host. If null during a
* failover then any replica can be used. */
int target_replica_port; /* Failover target port */
int failover_state; /* Failover state */
int cluster_allow_pubsubshard_when_down; /* Is pubsubshard allowed when the cluster
is down, doesn't affect pubsub global. */
introduce dynamic client reply buffer size - save memory on idle clients (#9822) Current implementation simple idle client which serves no traffic still use ~17Kb of memory. this is mainly due to a fixed size reply buffer currently set to 16kb. We have encountered some cases in which the server operates in a low memory environments. In such cases a user who wishes to create large connection pools to support potential burst period, will exhaust a large amount of memory to maintain connected Idle clients. Some users may choose to "sacrifice" performance in order to save memory. This commit introduce a dynamic mechanism to shrink and expend the client reply buffer based on periodic observed peak. the algorithm works as follows: 1. each time a client reply buffer has been fully written, the last recorded peak is updated: new peak = MAX( last peak, current written size) 2. during clients cron we check for each client if the last observed peak was: a. matching the current buffer size - in which case we expend (resize) the buffer size by 100% b. less than half the buffer size - in which case we shrink the buffer size by 50% 3. In any case we will **not** resize the buffer in case: a. the current buffer peak is less then the current buffer usable size and higher than 1/2 the current buffer usable size b. the value of (current buffer usable size/2) is less than 1Kib c. the value of (current buffer usable size*2) is larger than 16Kib 4. the peak value is reset to the current buffer position once every **5** seconds. we maintain a new field in the client structure (buf_peak_last_reset_time) which is used to keep track of how long it passed since the last buffer peak reset. ### **Interface changes:** **CIENT LIST** - now contains 2 new extra fields: rbs= < the current size in bytes of the client reply buffer > rbp=< the current value in bytes of the last observed buffer peak position > **INFO STATS** - now contains 2 new statistics: reply_buffer_shrinks = < total number of buffer shrinks performed > reply_buffer_expends = < total number of buffer expends performed > Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Yoav Steinberg <yoav@redislabs.com>
2022-02-22 10:19:38 +01:00
long reply_buffer_peak_reset_time; /* The amount of time (in milliseconds) to wait between reply buffer peak resets */
int reply_buffer_resizing_enabled; /* Is reply buffer resizing enabled (1 by default) */
};
#define MAX_KEYS_BUFFER 256
typedef struct {
int pos; /* The position of the key within the client array */
int flags; /* The flags associated with the key access, see
CMD_KEY_* for more information */
} keyReference;
/* A result structure for the various getkeys function calls. It lists the
* keys as indices to the provided argv. This functionality is also re-used
* for returning channel information.
*/
typedef struct {
keyReference keysbuf[MAX_KEYS_BUFFER]; /* Pre-allocated buffer, to save heap allocations */
keyReference *keys; /* Key indices array, points to keysbuf or heap */
int numkeys; /* Number of key indices return */
int size; /* Available array size */
} getKeysResult;
#define GETKEYS_RESULT_INIT { {{0}}, NULL, 0, MAX_KEYS_BUFFER }
A better approach for COMMAND INFO for movablekeys commands (#8324) Fix #7297 The problem: Today, there is no way for a client library or app to know the key name indexes for commands such as ZUNIONSTORE/EVAL and others with "numkeys", since COMMAND INFO returns no useful info for them. For cluster-aware redis clients, this requires to 'patch' the client library code specifically for each of these commands or to resolve each execution of these commands with COMMAND GETKEYS. The solution: Introducing key specs other than the legacy "range" (first,last,step) The 8th element of the command info array, if exists, holds an array of key specs. The array may be empty, which indicates the command doesn't take any key arguments or may contain one or more key-specs, each one may leads to the discovery of 0 or more key arguments. A client library that doesn't support this key-spec feature will keep using the first,last,step and movablekeys flag which will obviously remain unchanged. A client that supports this key-specs feature needs only to look at the key-specs array. If it finds an unrecognized spec, it must resort to using COMMAND GETKEYS if it wishes to get all key name arguments, but if all it needs is one key in order to know which cluster node to use, then maybe another spec (if the command has several) can supply that, and there's no need to use GETKEYS. Each spec is an array of arguments, first one is the spec name, the second is an array of flags, and the third is an array containing details about the spec (specific meaning for each spec type) The initial flags we support are "read" and "write" indicating if the keys that this key-spec finds are used for read or for write. clients should ignore any unfamiliar flags. In order to easily find the positions of keys in a given array of args we introduce keys specs. There are two logical steps of key specs: 1. `start_search`: Given an array of args, indicate where we should start searching for keys 2. `find_keys`: Given the output of start_search and an array of args, indicate all possible indices of keys. ### start_search step specs - `index`: specify an argument index explicitly - `index`: 0 based index (1 means the first command argument) - `keyword`: specify a string to match in `argv`. We should start searching for keys just after the keyword appears. - `keyword`: the string to search for - `start_search`: an index from which to start the keyword search (can be negative, which means to search from the end) Examples: - `SET` has start_search of type `index` with value `1` - `XREAD` has start_search of type `keyword` with value `[“STREAMS”,1]` - `MIGRATE` has start_search of type `keyword` with value `[“KEYS”,-2]` ### find_keys step specs - `range`: specify `[count, step, limit]`. - `lastkey`: index of the last key. relative to the index returned from begin_search. -1 indicating till the last argument, -2 one before the last - `step`: how many args should we skip after finding a key, in order to find the next one - `limit`: if count is -1, we use limit to stop the search by a factor. 0 and 1 mean no limit. 2 means ½ of the remaining args, 3 means ⅓, and so on. - “keynum”: specify `[keynum_index, first_key_index, step]`. - `keynum_index`: is relative to the return of the `start_search` spec. - `first_key_index`: is relative to `keynum_index`. - `step`: how many args should we skip after finding a key, in order to find the next one Examples: - `SET` has `range` of `[0,1,0]` - `MSET` has `range` of `[-1,2,0]` - `XREAD` has `range` of `[-1,1,2]` - `ZUNION` has `start_search` of type `index` with value `1` and `find_keys` of type `keynum` with value `[0,1,1]` - `AI.DAGRUN` has `start_search` of type `keyword` with value `[“LOAD“,1]` and `find_keys` of type `keynum` with value `[0,1,1]` (see https://oss.redislabs.com/redisai/master/commands/#aidagrun) Note: this solution is not perfect as the module writers can come up with anything, but at least we will be able to find the key args of the vast majority of commands. If one of the above specs can’t describe the key positions, the module writer can always fall back to the `getkeys-api` option. Some keys cannot be found easily (`KEYS` in `MIGRATE`: Imagine the argument for `AUTH` is the string “KEYS” - we will start searching in the wrong index). The guarantee is that the specs may be incomplete (`incomplete` will be specified in the spec to denote that) but we never report false information (assuming the command syntax is correct). For `MIGRATE` we start searching from the end - `startfrom=-1` - and if one of the keys is actually called "keys" we will report only a subset of all keys - hence the `incomplete` flag. Some `incomplete` specs can be completely empty (i.e. UNKNOWN begin_search) which should tell the client that COMMAND GETKEYS (or any other way to get the keys) must be used (Example: For `SORT` there is no way to describe the STORE keyword spec, as the word "store" can appear anywhere in the command). We will expose these key specs in the `COMMAND` command so that clients can learn, on startup, where the keys are for all commands instead of holding hardcoded tables or use `COMMAND GETKEYS` in runtime. Comments: 1. Redis doesn't internally use the new specs, they are only used for COMMAND output. 2. In order to support the current COMMAND INFO format (reply array indices 4, 5, 6) we created a synthetic range, called legacy_range, that, if possible, is built according to the new specs. 3. Redis currently uses only getkeys_proc or the legacy_range to get the keys indices (in COMMAND GETKEYS for example). "incomplete" specs: the command we have issues with are MIGRATE, STRALGO, and SORT for MIGRATE, because the token KEYS, if exists, must be the last token, we can search in reverse. it one of the keys is actually the string "keys" will return just a subset of the keys (hence, it's "incomplete") for SORT and STRALGO we can use this heuristic (the keys can be anywhere in the command) and therefore we added a key spec that is both "incomplete" and of "unknown type" if a client encounters an "incomplete" spec it means that it must find a different way (either COMMAND GETKEYS or have its own parser) to retrieve the keys. please note that all commands, apart from the three mentioned above, have "complete" key specs
2021-09-15 10:10:29 +02:00
/* Key specs definitions.
*
* Brief: This is a scheme that tries to describe the location
* of key arguments better than the old [first,last,step] scheme
* which is limited and doesn't fit many commands.
*
* There are two steps:
* 1. begin_search (BS): in which index should we start searching for keys?
A better approach for COMMAND INFO for movablekeys commands (#8324) Fix #7297 The problem: Today, there is no way for a client library or app to know the key name indexes for commands such as ZUNIONSTORE/EVAL and others with "numkeys", since COMMAND INFO returns no useful info for them. For cluster-aware redis clients, this requires to 'patch' the client library code specifically for each of these commands or to resolve each execution of these commands with COMMAND GETKEYS. The solution: Introducing key specs other than the legacy "range" (first,last,step) The 8th element of the command info array, if exists, holds an array of key specs. The array may be empty, which indicates the command doesn't take any key arguments or may contain one or more key-specs, each one may leads to the discovery of 0 or more key arguments. A client library that doesn't support this key-spec feature will keep using the first,last,step and movablekeys flag which will obviously remain unchanged. A client that supports this key-specs feature needs only to look at the key-specs array. If it finds an unrecognized spec, it must resort to using COMMAND GETKEYS if it wishes to get all key name arguments, but if all it needs is one key in order to know which cluster node to use, then maybe another spec (if the command has several) can supply that, and there's no need to use GETKEYS. Each spec is an array of arguments, first one is the spec name, the second is an array of flags, and the third is an array containing details about the spec (specific meaning for each spec type) The initial flags we support are "read" and "write" indicating if the keys that this key-spec finds are used for read or for write. clients should ignore any unfamiliar flags. In order to easily find the positions of keys in a given array of args we introduce keys specs. There are two logical steps of key specs: 1. `start_search`: Given an array of args, indicate where we should start searching for keys 2. `find_keys`: Given the output of start_search and an array of args, indicate all possible indices of keys. ### start_search step specs - `index`: specify an argument index explicitly - `index`: 0 based index (1 means the first command argument) - `keyword`: specify a string to match in `argv`. We should start searching for keys just after the keyword appears. - `keyword`: the string to search for - `start_search`: an index from which to start the keyword search (can be negative, which means to search from the end) Examples: - `SET` has start_search of type `index` with value `1` - `XREAD` has start_search of type `keyword` with value `[“STREAMS”,1]` - `MIGRATE` has start_search of type `keyword` with value `[“KEYS”,-2]` ### find_keys step specs - `range`: specify `[count, step, limit]`. - `lastkey`: index of the last key. relative to the index returned from begin_search. -1 indicating till the last argument, -2 one before the last - `step`: how many args should we skip after finding a key, in order to find the next one - `limit`: if count is -1, we use limit to stop the search by a factor. 0 and 1 mean no limit. 2 means ½ of the remaining args, 3 means ⅓, and so on. - “keynum”: specify `[keynum_index, first_key_index, step]`. - `keynum_index`: is relative to the return of the `start_search` spec. - `first_key_index`: is relative to `keynum_index`. - `step`: how many args should we skip after finding a key, in order to find the next one Examples: - `SET` has `range` of `[0,1,0]` - `MSET` has `range` of `[-1,2,0]` - `XREAD` has `range` of `[-1,1,2]` - `ZUNION` has `start_search` of type `index` with value `1` and `find_keys` of type `keynum` with value `[0,1,1]` - `AI.DAGRUN` has `start_search` of type `keyword` with value `[“LOAD“,1]` and `find_keys` of type `keynum` with value `[0,1,1]` (see https://oss.redislabs.com/redisai/master/commands/#aidagrun) Note: this solution is not perfect as the module writers can come up with anything, but at least we will be able to find the key args of the vast majority of commands. If one of the above specs can’t describe the key positions, the module writer can always fall back to the `getkeys-api` option. Some keys cannot be found easily (`KEYS` in `MIGRATE`: Imagine the argument for `AUTH` is the string “KEYS” - we will start searching in the wrong index). The guarantee is that the specs may be incomplete (`incomplete` will be specified in the spec to denote that) but we never report false information (assuming the command syntax is correct). For `MIGRATE` we start searching from the end - `startfrom=-1` - and if one of the keys is actually called "keys" we will report only a subset of all keys - hence the `incomplete` flag. Some `incomplete` specs can be completely empty (i.e. UNKNOWN begin_search) which should tell the client that COMMAND GETKEYS (or any other way to get the keys) must be used (Example: For `SORT` there is no way to describe the STORE keyword spec, as the word "store" can appear anywhere in the command). We will expose these key specs in the `COMMAND` command so that clients can learn, on startup, where the keys are for all commands instead of holding hardcoded tables or use `COMMAND GETKEYS` in runtime. Comments: 1. Redis doesn't internally use the new specs, they are only used for COMMAND output. 2. In order to support the current COMMAND INFO format (reply array indices 4, 5, 6) we created a synthetic range, called legacy_range, that, if possible, is built according to the new specs. 3. Redis currently uses only getkeys_proc or the legacy_range to get the keys indices (in COMMAND GETKEYS for example). "incomplete" specs: the command we have issues with are MIGRATE, STRALGO, and SORT for MIGRATE, because the token KEYS, if exists, must be the last token, we can search in reverse. it one of the keys is actually the string "keys" will return just a subset of the keys (hence, it's "incomplete") for SORT and STRALGO we can use this heuristic (the keys can be anywhere in the command) and therefore we added a key spec that is both "incomplete" and of "unknown type" if a client encounters an "incomplete" spec it means that it must find a different way (either COMMAND GETKEYS or have its own parser) to retrieve the keys. please note that all commands, apart from the three mentioned above, have "complete" key specs
2021-09-15 10:10:29 +02:00
* 2. find_keys (FK): relative to the output of BS, how can we will which args are keys?
*
* There are two types of BS:
* 1. index: key args start at a constant index
* 2. keyword: key args start just after a specific keyword
*
* There are two kinds of FK:
* 1. range: keys end at a specific index (or relative to the last argument)
* 2. keynum: there's an arg that contains the number of key args somewhere before the keys themselves
*/
Auto-generate the command table from JSON files (#9656) Delete the hardcoded command table and replace it with an auto-generated table, based on a JSON file that describes the commands (each command must have a JSON file). These JSON files are the SSOT of everything there is to know about Redis commands, and it is reflected fully in COMMAND INFO. These JSON files are used to generate commands.c (using a python script), which is then committed to the repo and compiled. The purpose is: * Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic. * drop the dependency between Redis-user and the commands.json in redis-doc. * delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be done in a separate PR) * redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release artifacts should be a large JSON, containing all the information about all of the commands, which will be generated from COMMAND's reply) * the byproduct of this is: * module commands will be able to provide that info and possibly be more of a first-class citizens * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info. ### Interface changes #### COMMAND INFO's reply change (and arg-less COMMAND) Before this commit the reply at index 7 contained the key-specs list and reply at index 8 contained the sub-commands list (Both unreleased). Now, reply at index 7 is a map of: - summary - short command description - since - debut version - group - command group - complexity - complexity string - doc-flags - flags used for documentation (e.g. "deprecated") - deprecated-since - if deprecated, from which version? - replaced-by - if deprecated, which command replaced it? - history - a list of (version, what-changed) tuples - hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876 - arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments) - key-specs - an array of keys specs (already in unstable, just changed location) - subcommands - a list of sub-commands (already in unstable, just changed location) - reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845) more details on these can be found in https://github.com/redis/redis-doc/pull/1697 only the first three fields are mandatory #### API changes (unreleased API obviously) now they take RedisModuleCommand opaque pointer instead of looking up the command by name - RM_CreateSubcommand - RM_AddCommandKeySpec - RM_SetCommandKeySpecBeginSearchIndex - RM_SetCommandKeySpecBeginSearchKeyword - RM_SetCommandKeySpecFindKeysRange - RM_SetCommandKeySpecFindKeysKeynum Currently, we did not add module API to provide additional information about their commands because we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944. ### Somehow related changes 1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command will be documented with M|KM|FT|MI and can take both lowercase and uppercase ### Unrelated changes 1. Bugfix: no_madaory_keys was absent in COMMAND's reply 2. expose CMD_MODULE as "module" via COMMAND 3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags) Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 20:23:15 +01:00
/* Must be synced with generate-command-code.py */
A better approach for COMMAND INFO for movablekeys commands (#8324) Fix #7297 The problem: Today, there is no way for a client library or app to know the key name indexes for commands such as ZUNIONSTORE/EVAL and others with "numkeys", since COMMAND INFO returns no useful info for them. For cluster-aware redis clients, this requires to 'patch' the client library code specifically for each of these commands or to resolve each execution of these commands with COMMAND GETKEYS. The solution: Introducing key specs other than the legacy "range" (first,last,step) The 8th element of the command info array, if exists, holds an array of key specs. The array may be empty, which indicates the command doesn't take any key arguments or may contain one or more key-specs, each one may leads to the discovery of 0 or more key arguments. A client library that doesn't support this key-spec feature will keep using the first,last,step and movablekeys flag which will obviously remain unchanged. A client that supports this key-specs feature needs only to look at the key-specs array. If it finds an unrecognized spec, it must resort to using COMMAND GETKEYS if it wishes to get all key name arguments, but if all it needs is one key in order to know which cluster node to use, then maybe another spec (if the command has several) can supply that, and there's no need to use GETKEYS. Each spec is an array of arguments, first one is the spec name, the second is an array of flags, and the third is an array containing details about the spec (specific meaning for each spec type) The initial flags we support are "read" and "write" indicating if the keys that this key-spec finds are used for read or for write. clients should ignore any unfamiliar flags. In order to easily find the positions of keys in a given array of args we introduce keys specs. There are two logical steps of key specs: 1. `start_search`: Given an array of args, indicate where we should start searching for keys 2. `find_keys`: Given the output of start_search and an array of args, indicate all possible indices of keys. ### start_search step specs - `index`: specify an argument index explicitly - `index`: 0 based index (1 means the first command argument) - `keyword`: specify a string to match in `argv`. We should start searching for keys just after the keyword appears. - `keyword`: the string to search for - `start_search`: an index from which to start the keyword search (can be negative, which means to search from the end) Examples: - `SET` has start_search of type `index` with value `1` - `XREAD` has start_search of type `keyword` with value `[“STREAMS”,1]` - `MIGRATE` has start_search of type `keyword` with value `[“KEYS”,-2]` ### find_keys step specs - `range`: specify `[count, step, limit]`. - `lastkey`: index of the last key. relative to the index returned from begin_search. -1 indicating till the last argument, -2 one before the last - `step`: how many args should we skip after finding a key, in order to find the next one - `limit`: if count is -1, we use limit to stop the search by a factor. 0 and 1 mean no limit. 2 means ½ of the remaining args, 3 means ⅓, and so on. - “keynum”: specify `[keynum_index, first_key_index, step]`. - `keynum_index`: is relative to the return of the `start_search` spec. - `first_key_index`: is relative to `keynum_index`. - `step`: how many args should we skip after finding a key, in order to find the next one Examples: - `SET` has `range` of `[0,1,0]` - `MSET` has `range` of `[-1,2,0]` - `XREAD` has `range` of `[-1,1,2]` - `ZUNION` has `start_search` of type `index` with value `1` and `find_keys` of type `keynum` with value `[0,1,1]` - `AI.DAGRUN` has `start_search` of type `keyword` with value `[“LOAD“,1]` and `find_keys` of type `keynum` with value `[0,1,1]` (see https://oss.redislabs.com/redisai/master/commands/#aidagrun) Note: this solution is not perfect as the module writers can come up with anything, but at least we will be able to find the key args of the vast majority of commands. If one of the above specs can’t describe the key positions, the module writer can always fall back to the `getkeys-api` option. Some keys cannot be found easily (`KEYS` in `MIGRATE`: Imagine the argument for `AUTH` is the string “KEYS” - we will start searching in the wrong index). The guarantee is that the specs may be incomplete (`incomplete` will be specified in the spec to denote that) but we never report false information (assuming the command syntax is correct). For `MIGRATE` we start searching from the end - `startfrom=-1` - and if one of the keys is actually called "keys" we will report only a subset of all keys - hence the `incomplete` flag. Some `incomplete` specs can be completely empty (i.e. UNKNOWN begin_search) which should tell the client that COMMAND GETKEYS (or any other way to get the keys) must be used (Example: For `SORT` there is no way to describe the STORE keyword spec, as the word "store" can appear anywhere in the command). We will expose these key specs in the `COMMAND` command so that clients can learn, on startup, where the keys are for all commands instead of holding hardcoded tables or use `COMMAND GETKEYS` in runtime. Comments: 1. Redis doesn't internally use the new specs, they are only used for COMMAND output. 2. In order to support the current COMMAND INFO format (reply array indices 4, 5, 6) we created a synthetic range, called legacy_range, that, if possible, is built according to the new specs. 3. Redis currently uses only getkeys_proc or the legacy_range to get the keys indices (in COMMAND GETKEYS for example). "incomplete" specs: the command we have issues with are MIGRATE, STRALGO, and SORT for MIGRATE, because the token KEYS, if exists, must be the last token, we can search in reverse. it one of the keys is actually the string "keys" will return just a subset of the keys (hence, it's "incomplete") for SORT and STRALGO we can use this heuristic (the keys can be anywhere in the command) and therefore we added a key spec that is both "incomplete" and of "unknown type" if a client encounters an "incomplete" spec it means that it must find a different way (either COMMAND GETKEYS or have its own parser) to retrieve the keys. please note that all commands, apart from the three mentioned above, have "complete" key specs
2021-09-15 10:10:29 +02:00
typedef enum {
KSPEC_BS_INVALID = 0, /* Must be 0 */
KSPEC_BS_UNKNOWN,
KSPEC_BS_INDEX,
KSPEC_BS_KEYWORD
} kspec_bs_type;
Auto-generate the command table from JSON files (#9656) Delete the hardcoded command table and replace it with an auto-generated table, based on a JSON file that describes the commands (each command must have a JSON file). These JSON files are the SSOT of everything there is to know about Redis commands, and it is reflected fully in COMMAND INFO. These JSON files are used to generate commands.c (using a python script), which is then committed to the repo and compiled. The purpose is: * Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic. * drop the dependency between Redis-user and the commands.json in redis-doc. * delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be done in a separate PR) * redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release artifacts should be a large JSON, containing all the information about all of the commands, which will be generated from COMMAND's reply) * the byproduct of this is: * module commands will be able to provide that info and possibly be more of a first-class citizens * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info. ### Interface changes #### COMMAND INFO's reply change (and arg-less COMMAND) Before this commit the reply at index 7 contained the key-specs list and reply at index 8 contained the sub-commands list (Both unreleased). Now, reply at index 7 is a map of: - summary - short command description - since - debut version - group - command group - complexity - complexity string - doc-flags - flags used for documentation (e.g. "deprecated") - deprecated-since - if deprecated, from which version? - replaced-by - if deprecated, which command replaced it? - history - a list of (version, what-changed) tuples - hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876 - arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments) - key-specs - an array of keys specs (already in unstable, just changed location) - subcommands - a list of sub-commands (already in unstable, just changed location) - reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845) more details on these can be found in https://github.com/redis/redis-doc/pull/1697 only the first three fields are mandatory #### API changes (unreleased API obviously) now they take RedisModuleCommand opaque pointer instead of looking up the command by name - RM_CreateSubcommand - RM_AddCommandKeySpec - RM_SetCommandKeySpecBeginSearchIndex - RM_SetCommandKeySpecBeginSearchKeyword - RM_SetCommandKeySpecFindKeysRange - RM_SetCommandKeySpecFindKeysKeynum Currently, we did not add module API to provide additional information about their commands because we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944. ### Somehow related changes 1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command will be documented with M|KM|FT|MI and can take both lowercase and uppercase ### Unrelated changes 1. Bugfix: no_madaory_keys was absent in COMMAND's reply 2. expose CMD_MODULE as "module" via COMMAND 3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags) Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 20:23:15 +01:00
/* Must be synced with generate-command-code.py */
A better approach for COMMAND INFO for movablekeys commands (#8324) Fix #7297 The problem: Today, there is no way for a client library or app to know the key name indexes for commands such as ZUNIONSTORE/EVAL and others with "numkeys", since COMMAND INFO returns no useful info for them. For cluster-aware redis clients, this requires to 'patch' the client library code specifically for each of these commands or to resolve each execution of these commands with COMMAND GETKEYS. The solution: Introducing key specs other than the legacy "range" (first,last,step) The 8th element of the command info array, if exists, holds an array of key specs. The array may be empty, which indicates the command doesn't take any key arguments or may contain one or more key-specs, each one may leads to the discovery of 0 or more key arguments. A client library that doesn't support this key-spec feature will keep using the first,last,step and movablekeys flag which will obviously remain unchanged. A client that supports this key-specs feature needs only to look at the key-specs array. If it finds an unrecognized spec, it must resort to using COMMAND GETKEYS if it wishes to get all key name arguments, but if all it needs is one key in order to know which cluster node to use, then maybe another spec (if the command has several) can supply that, and there's no need to use GETKEYS. Each spec is an array of arguments, first one is the spec name, the second is an array of flags, and the third is an array containing details about the spec (specific meaning for each spec type) The initial flags we support are "read" and "write" indicating if the keys that this key-spec finds are used for read or for write. clients should ignore any unfamiliar flags. In order to easily find the positions of keys in a given array of args we introduce keys specs. There are two logical steps of key specs: 1. `start_search`: Given an array of args, indicate where we should start searching for keys 2. `find_keys`: Given the output of start_search and an array of args, indicate all possible indices of keys. ### start_search step specs - `index`: specify an argument index explicitly - `index`: 0 based index (1 means the first command argument) - `keyword`: specify a string to match in `argv`. We should start searching for keys just after the keyword appears. - `keyword`: the string to search for - `start_search`: an index from which to start the keyword search (can be negative, which means to search from the end) Examples: - `SET` has start_search of type `index` with value `1` - `XREAD` has start_search of type `keyword` with value `[“STREAMS”,1]` - `MIGRATE` has start_search of type `keyword` with value `[“KEYS”,-2]` ### find_keys step specs - `range`: specify `[count, step, limit]`. - `lastkey`: index of the last key. relative to the index returned from begin_search. -1 indicating till the last argument, -2 one before the last - `step`: how many args should we skip after finding a key, in order to find the next one - `limit`: if count is -1, we use limit to stop the search by a factor. 0 and 1 mean no limit. 2 means ½ of the remaining args, 3 means ⅓, and so on. - “keynum”: specify `[keynum_index, first_key_index, step]`. - `keynum_index`: is relative to the return of the `start_search` spec. - `first_key_index`: is relative to `keynum_index`. - `step`: how many args should we skip after finding a key, in order to find the next one Examples: - `SET` has `range` of `[0,1,0]` - `MSET` has `range` of `[-1,2,0]` - `XREAD` has `range` of `[-1,1,2]` - `ZUNION` has `start_search` of type `index` with value `1` and `find_keys` of type `keynum` with value `[0,1,1]` - `AI.DAGRUN` has `start_search` of type `keyword` with value `[“LOAD“,1]` and `find_keys` of type `keynum` with value `[0,1,1]` (see https://oss.redislabs.com/redisai/master/commands/#aidagrun) Note: this solution is not perfect as the module writers can come up with anything, but at least we will be able to find the key args of the vast majority of commands. If one of the above specs can’t describe the key positions, the module writer can always fall back to the `getkeys-api` option. Some keys cannot be found easily (`KEYS` in `MIGRATE`: Imagine the argument for `AUTH` is the string “KEYS” - we will start searching in the wrong index). The guarantee is that the specs may be incomplete (`incomplete` will be specified in the spec to denote that) but we never report false information (assuming the command syntax is correct). For `MIGRATE` we start searching from the end - `startfrom=-1` - and if one of the keys is actually called "keys" we will report only a subset of all keys - hence the `incomplete` flag. Some `incomplete` specs can be completely empty (i.e. UNKNOWN begin_search) which should tell the client that COMMAND GETKEYS (or any other way to get the keys) must be used (Example: For `SORT` there is no way to describe the STORE keyword spec, as the word "store" can appear anywhere in the command). We will expose these key specs in the `COMMAND` command so that clients can learn, on startup, where the keys are for all commands instead of holding hardcoded tables or use `COMMAND GETKEYS` in runtime. Comments: 1. Redis doesn't internally use the new specs, they are only used for COMMAND output. 2. In order to support the current COMMAND INFO format (reply array indices 4, 5, 6) we created a synthetic range, called legacy_range, that, if possible, is built according to the new specs. 3. Redis currently uses only getkeys_proc or the legacy_range to get the keys indices (in COMMAND GETKEYS for example). "incomplete" specs: the command we have issues with are MIGRATE, STRALGO, and SORT for MIGRATE, because the token KEYS, if exists, must be the last token, we can search in reverse. it one of the keys is actually the string "keys" will return just a subset of the keys (hence, it's "incomplete") for SORT and STRALGO we can use this heuristic (the keys can be anywhere in the command) and therefore we added a key spec that is both "incomplete" and of "unknown type" if a client encounters an "incomplete" spec it means that it must find a different way (either COMMAND GETKEYS or have its own parser) to retrieve the keys. please note that all commands, apart from the three mentioned above, have "complete" key specs
2021-09-15 10:10:29 +02:00
typedef enum {
KSPEC_FK_INVALID = 0, /* Must be 0 */
KSPEC_FK_UNKNOWN,
KSPEC_FK_RANGE,
KSPEC_FK_KEYNUM
} kspec_fk_type;
typedef struct {
/* Declarative data */
const char *notes;
Auto-generate the command table from JSON files (#9656) Delete the hardcoded command table and replace it with an auto-generated table, based on a JSON file that describes the commands (each command must have a JSON file). These JSON files are the SSOT of everything there is to know about Redis commands, and it is reflected fully in COMMAND INFO. These JSON files are used to generate commands.c (using a python script), which is then committed to the repo and compiled. The purpose is: * Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic. * drop the dependency between Redis-user and the commands.json in redis-doc. * delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be done in a separate PR) * redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release artifacts should be a large JSON, containing all the information about all of the commands, which will be generated from COMMAND's reply) * the byproduct of this is: * module commands will be able to provide that info and possibly be more of a first-class citizens * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info. ### Interface changes #### COMMAND INFO's reply change (and arg-less COMMAND) Before this commit the reply at index 7 contained the key-specs list and reply at index 8 contained the sub-commands list (Both unreleased). Now, reply at index 7 is a map of: - summary - short command description - since - debut version - group - command group - complexity - complexity string - doc-flags - flags used for documentation (e.g. "deprecated") - deprecated-since - if deprecated, from which version? - replaced-by - if deprecated, which command replaced it? - history - a list of (version, what-changed) tuples - hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876 - arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments) - key-specs - an array of keys specs (already in unstable, just changed location) - subcommands - a list of sub-commands (already in unstable, just changed location) - reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845) more details on these can be found in https://github.com/redis/redis-doc/pull/1697 only the first three fields are mandatory #### API changes (unreleased API obviously) now they take RedisModuleCommand opaque pointer instead of looking up the command by name - RM_CreateSubcommand - RM_AddCommandKeySpec - RM_SetCommandKeySpecBeginSearchIndex - RM_SetCommandKeySpecBeginSearchKeyword - RM_SetCommandKeySpecFindKeysRange - RM_SetCommandKeySpecFindKeysKeynum Currently, we did not add module API to provide additional information about their commands because we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944. ### Somehow related changes 1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command will be documented with M|KM|FT|MI and can take both lowercase and uppercase ### Unrelated changes 1. Bugfix: no_madaory_keys was absent in COMMAND's reply 2. expose CMD_MODULE as "module" via COMMAND 3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags) Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 20:23:15 +01:00
uint64_t flags;
A better approach for COMMAND INFO for movablekeys commands (#8324) Fix #7297 The problem: Today, there is no way for a client library or app to know the key name indexes for commands such as ZUNIONSTORE/EVAL and others with "numkeys", since COMMAND INFO returns no useful info for them. For cluster-aware redis clients, this requires to 'patch' the client library code specifically for each of these commands or to resolve each execution of these commands with COMMAND GETKEYS. The solution: Introducing key specs other than the legacy "range" (first,last,step) The 8th element of the command info array, if exists, holds an array of key specs. The array may be empty, which indicates the command doesn't take any key arguments or may contain one or more key-specs, each one may leads to the discovery of 0 or more key arguments. A client library that doesn't support this key-spec feature will keep using the first,last,step and movablekeys flag which will obviously remain unchanged. A client that supports this key-specs feature needs only to look at the key-specs array. If it finds an unrecognized spec, it must resort to using COMMAND GETKEYS if it wishes to get all key name arguments, but if all it needs is one key in order to know which cluster node to use, then maybe another spec (if the command has several) can supply that, and there's no need to use GETKEYS. Each spec is an array of arguments, first one is the spec name, the second is an array of flags, and the third is an array containing details about the spec (specific meaning for each spec type) The initial flags we support are "read" and "write" indicating if the keys that this key-spec finds are used for read or for write. clients should ignore any unfamiliar flags. In order to easily find the positions of keys in a given array of args we introduce keys specs. There are two logical steps of key specs: 1. `start_search`: Given an array of args, indicate where we should start searching for keys 2. `find_keys`: Given the output of start_search and an array of args, indicate all possible indices of keys. ### start_search step specs - `index`: specify an argument index explicitly - `index`: 0 based index (1 means the first command argument) - `keyword`: specify a string to match in `argv`. We should start searching for keys just after the keyword appears. - `keyword`: the string to search for - `start_search`: an index from which to start the keyword search (can be negative, which means to search from the end) Examples: - `SET` has start_search of type `index` with value `1` - `XREAD` has start_search of type `keyword` with value `[“STREAMS”,1]` - `MIGRATE` has start_search of type `keyword` with value `[“KEYS”,-2]` ### find_keys step specs - `range`: specify `[count, step, limit]`. - `lastkey`: index of the last key. relative to the index returned from begin_search. -1 indicating till the last argument, -2 one before the last - `step`: how many args should we skip after finding a key, in order to find the next one - `limit`: if count is -1, we use limit to stop the search by a factor. 0 and 1 mean no limit. 2 means ½ of the remaining args, 3 means ⅓, and so on. - “keynum”: specify `[keynum_index, first_key_index, step]`. - `keynum_index`: is relative to the return of the `start_search` spec. - `first_key_index`: is relative to `keynum_index`. - `step`: how many args should we skip after finding a key, in order to find the next one Examples: - `SET` has `range` of `[0,1,0]` - `MSET` has `range` of `[-1,2,0]` - `XREAD` has `range` of `[-1,1,2]` - `ZUNION` has `start_search` of type `index` with value `1` and `find_keys` of type `keynum` with value `[0,1,1]` - `AI.DAGRUN` has `start_search` of type `keyword` with value `[“LOAD“,1]` and `find_keys` of type `keynum` with value `[0,1,1]` (see https://oss.redislabs.com/redisai/master/commands/#aidagrun) Note: this solution is not perfect as the module writers can come up with anything, but at least we will be able to find the key args of the vast majority of commands. If one of the above specs can’t describe the key positions, the module writer can always fall back to the `getkeys-api` option. Some keys cannot be found easily (`KEYS` in `MIGRATE`: Imagine the argument for `AUTH` is the string “KEYS” - we will start searching in the wrong index). The guarantee is that the specs may be incomplete (`incomplete` will be specified in the spec to denote that) but we never report false information (assuming the command syntax is correct). For `MIGRATE` we start searching from the end - `startfrom=-1` - and if one of the keys is actually called "keys" we will report only a subset of all keys - hence the `incomplete` flag. Some `incomplete` specs can be completely empty (i.e. UNKNOWN begin_search) which should tell the client that COMMAND GETKEYS (or any other way to get the keys) must be used (Example: For `SORT` there is no way to describe the STORE keyword spec, as the word "store" can appear anywhere in the command). We will expose these key specs in the `COMMAND` command so that clients can learn, on startup, where the keys are for all commands instead of holding hardcoded tables or use `COMMAND GETKEYS` in runtime. Comments: 1. Redis doesn't internally use the new specs, they are only used for COMMAND output. 2. In order to support the current COMMAND INFO format (reply array indices 4, 5, 6) we created a synthetic range, called legacy_range, that, if possible, is built according to the new specs. 3. Redis currently uses only getkeys_proc or the legacy_range to get the keys indices (in COMMAND GETKEYS for example). "incomplete" specs: the command we have issues with are MIGRATE, STRALGO, and SORT for MIGRATE, because the token KEYS, if exists, must be the last token, we can search in reverse. it one of the keys is actually the string "keys" will return just a subset of the keys (hence, it's "incomplete") for SORT and STRALGO we can use this heuristic (the keys can be anywhere in the command) and therefore we added a key spec that is both "incomplete" and of "unknown type" if a client encounters an "incomplete" spec it means that it must find a different way (either COMMAND GETKEYS or have its own parser) to retrieve the keys. please note that all commands, apart from the three mentioned above, have "complete" key specs
2021-09-15 10:10:29 +02:00
kspec_bs_type begin_search_type;
union {
struct {
/* The index from which we start the search for keys */
int pos;
} index;
struct {
/* The keyword that indicates the beginning of key args */
const char *keyword;
/* An index in argv from which to start searching.
* Can be negative, which means start search from the end, in reverse
* (Example: -2 means to start in reverse from the penultimate arg) */
A better approach for COMMAND INFO for movablekeys commands (#8324) Fix #7297 The problem: Today, there is no way for a client library or app to know the key name indexes for commands such as ZUNIONSTORE/EVAL and others with "numkeys", since COMMAND INFO returns no useful info for them. For cluster-aware redis clients, this requires to 'patch' the client library code specifically for each of these commands or to resolve each execution of these commands with COMMAND GETKEYS. The solution: Introducing key specs other than the legacy "range" (first,last,step) The 8th element of the command info array, if exists, holds an array of key specs. The array may be empty, which indicates the command doesn't take any key arguments or may contain one or more key-specs, each one may leads to the discovery of 0 or more key arguments. A client library that doesn't support this key-spec feature will keep using the first,last,step and movablekeys flag which will obviously remain unchanged. A client that supports this key-specs feature needs only to look at the key-specs array. If it finds an unrecognized spec, it must resort to using COMMAND GETKEYS if it wishes to get all key name arguments, but if all it needs is one key in order to know which cluster node to use, then maybe another spec (if the command has several) can supply that, and there's no need to use GETKEYS. Each spec is an array of arguments, first one is the spec name, the second is an array of flags, and the third is an array containing details about the spec (specific meaning for each spec type) The initial flags we support are "read" and "write" indicating if the keys that this key-spec finds are used for read or for write. clients should ignore any unfamiliar flags. In order to easily find the positions of keys in a given array of args we introduce keys specs. There are two logical steps of key specs: 1. `start_search`: Given an array of args, indicate where we should start searching for keys 2. `find_keys`: Given the output of start_search and an array of args, indicate all possible indices of keys. ### start_search step specs - `index`: specify an argument index explicitly - `index`: 0 based index (1 means the first command argument) - `keyword`: specify a string to match in `argv`. We should start searching for keys just after the keyword appears. - `keyword`: the string to search for - `start_search`: an index from which to start the keyword search (can be negative, which means to search from the end) Examples: - `SET` has start_search of type `index` with value `1` - `XREAD` has start_search of type `keyword` with value `[“STREAMS”,1]` - `MIGRATE` has start_search of type `keyword` with value `[“KEYS”,-2]` ### find_keys step specs - `range`: specify `[count, step, limit]`. - `lastkey`: index of the last key. relative to the index returned from begin_search. -1 indicating till the last argument, -2 one before the last - `step`: how many args should we skip after finding a key, in order to find the next one - `limit`: if count is -1, we use limit to stop the search by a factor. 0 and 1 mean no limit. 2 means ½ of the remaining args, 3 means ⅓, and so on. - “keynum”: specify `[keynum_index, first_key_index, step]`. - `keynum_index`: is relative to the return of the `start_search` spec. - `first_key_index`: is relative to `keynum_index`. - `step`: how many args should we skip after finding a key, in order to find the next one Examples: - `SET` has `range` of `[0,1,0]` - `MSET` has `range` of `[-1,2,0]` - `XREAD` has `range` of `[-1,1,2]` - `ZUNION` has `start_search` of type `index` with value `1` and `find_keys` of type `keynum` with value `[0,1,1]` - `AI.DAGRUN` has `start_search` of type `keyword` with value `[“LOAD“,1]` and `find_keys` of type `keynum` with value `[0,1,1]` (see https://oss.redislabs.com/redisai/master/commands/#aidagrun) Note: this solution is not perfect as the module writers can come up with anything, but at least we will be able to find the key args of the vast majority of commands. If one of the above specs can’t describe the key positions, the module writer can always fall back to the `getkeys-api` option. Some keys cannot be found easily (`KEYS` in `MIGRATE`: Imagine the argument for `AUTH` is the string “KEYS” - we will start searching in the wrong index). The guarantee is that the specs may be incomplete (`incomplete` will be specified in the spec to denote that) but we never report false information (assuming the command syntax is correct). For `MIGRATE` we start searching from the end - `startfrom=-1` - and if one of the keys is actually called "keys" we will report only a subset of all keys - hence the `incomplete` flag. Some `incomplete` specs can be completely empty (i.e. UNKNOWN begin_search) which should tell the client that COMMAND GETKEYS (or any other way to get the keys) must be used (Example: For `SORT` there is no way to describe the STORE keyword spec, as the word "store" can appear anywhere in the command). We will expose these key specs in the `COMMAND` command so that clients can learn, on startup, where the keys are for all commands instead of holding hardcoded tables or use `COMMAND GETKEYS` in runtime. Comments: 1. Redis doesn't internally use the new specs, they are only used for COMMAND output. 2. In order to support the current COMMAND INFO format (reply array indices 4, 5, 6) we created a synthetic range, called legacy_range, that, if possible, is built according to the new specs. 3. Redis currently uses only getkeys_proc or the legacy_range to get the keys indices (in COMMAND GETKEYS for example). "incomplete" specs: the command we have issues with are MIGRATE, STRALGO, and SORT for MIGRATE, because the token KEYS, if exists, must be the last token, we can search in reverse. it one of the keys is actually the string "keys" will return just a subset of the keys (hence, it's "incomplete") for SORT and STRALGO we can use this heuristic (the keys can be anywhere in the command) and therefore we added a key spec that is both "incomplete" and of "unknown type" if a client encounters an "incomplete" spec it means that it must find a different way (either COMMAND GETKEYS or have its own parser) to retrieve the keys. please note that all commands, apart from the three mentioned above, have "complete" key specs
2021-09-15 10:10:29 +02:00
int startfrom;
} keyword;
} bs;
kspec_fk_type find_keys_type;
union {
/* NOTE: Indices in this struct are relative to the result of the begin_search step!
* These are: range.lastkey, keynum.keynumidx, keynum.firstkey */
struct {
/* Index of the last key.
* Can be negative, in which case it's not relative. -1 indicating till the last argument,
* -2 one before the last and so on. */
int lastkey;
/* How many args should we skip after finding a key, in order to find the next one. */
int keystep;
/* If lastkey is -1, we use limit to stop the search by a factor. 0 and 1 mean no limit.
* 2 means 1/2 of the remaining args, 3 means 1/3, and so on. */
int limit;
} range;
struct {
/* Index of the argument containing the number of keys to come */
int keynumidx;
/* Index of the fist key (Usually it's just after keynumidx, in
* which case it should be set to keynumidx+1). */
int firstkey;
/* How many args should we skip after finding a key, in order to find the next one. */
int keystep;
} keynum;
} fk;
} keySpec;
/* Number of static key specs */
#define STATIC_KEY_SPECS_NUM 4
Auto-generate the command table from JSON files (#9656) Delete the hardcoded command table and replace it with an auto-generated table, based on a JSON file that describes the commands (each command must have a JSON file). These JSON files are the SSOT of everything there is to know about Redis commands, and it is reflected fully in COMMAND INFO. These JSON files are used to generate commands.c (using a python script), which is then committed to the repo and compiled. The purpose is: * Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic. * drop the dependency between Redis-user and the commands.json in redis-doc. * delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be done in a separate PR) * redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release artifacts should be a large JSON, containing all the information about all of the commands, which will be generated from COMMAND's reply) * the byproduct of this is: * module commands will be able to provide that info and possibly be more of a first-class citizens * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info. ### Interface changes #### COMMAND INFO's reply change (and arg-less COMMAND) Before this commit the reply at index 7 contained the key-specs list and reply at index 8 contained the sub-commands list (Both unreleased). Now, reply at index 7 is a map of: - summary - short command description - since - debut version - group - command group - complexity - complexity string - doc-flags - flags used for documentation (e.g. "deprecated") - deprecated-since - if deprecated, from which version? - replaced-by - if deprecated, which command replaced it? - history - a list of (version, what-changed) tuples - hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876 - arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments) - key-specs - an array of keys specs (already in unstable, just changed location) - subcommands - a list of sub-commands (already in unstable, just changed location) - reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845) more details on these can be found in https://github.com/redis/redis-doc/pull/1697 only the first three fields are mandatory #### API changes (unreleased API obviously) now they take RedisModuleCommand opaque pointer instead of looking up the command by name - RM_CreateSubcommand - RM_AddCommandKeySpec - RM_SetCommandKeySpecBeginSearchIndex - RM_SetCommandKeySpecBeginSearchKeyword - RM_SetCommandKeySpecFindKeysRange - RM_SetCommandKeySpecFindKeysKeynum Currently, we did not add module API to provide additional information about their commands because we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944. ### Somehow related changes 1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command will be documented with M|KM|FT|MI and can take both lowercase and uppercase ### Unrelated changes 1. Bugfix: no_madaory_keys was absent in COMMAND's reply 2. expose CMD_MODULE as "module" via COMMAND 3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags) Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 20:23:15 +01:00
/* Must be synced with ARG_TYPE_STR and generate-command-code.py */
typedef enum {
ARG_TYPE_STRING,
ARG_TYPE_INTEGER,
ARG_TYPE_DOUBLE,
ARG_TYPE_KEY, /* A string, but represents a keyname */
Auto-generate the command table from JSON files (#9656) Delete the hardcoded command table and replace it with an auto-generated table, based on a JSON file that describes the commands (each command must have a JSON file). These JSON files are the SSOT of everything there is to know about Redis commands, and it is reflected fully in COMMAND INFO. These JSON files are used to generate commands.c (using a python script), which is then committed to the repo and compiled. The purpose is: * Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic. * drop the dependency between Redis-user and the commands.json in redis-doc. * delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be done in a separate PR) * redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release artifacts should be a large JSON, containing all the information about all of the commands, which will be generated from COMMAND's reply) * the byproduct of this is: * module commands will be able to provide that info and possibly be more of a first-class citizens * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info. ### Interface changes #### COMMAND INFO's reply change (and arg-less COMMAND) Before this commit the reply at index 7 contained the key-specs list and reply at index 8 contained the sub-commands list (Both unreleased). Now, reply at index 7 is a map of: - summary - short command description - since - debut version - group - command group - complexity - complexity string - doc-flags - flags used for documentation (e.g. "deprecated") - deprecated-since - if deprecated, from which version? - replaced-by - if deprecated, which command replaced it? - history - a list of (version, what-changed) tuples - hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876 - arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments) - key-specs - an array of keys specs (already in unstable, just changed location) - subcommands - a list of sub-commands (already in unstable, just changed location) - reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845) more details on these can be found in https://github.com/redis/redis-doc/pull/1697 only the first three fields are mandatory #### API changes (unreleased API obviously) now they take RedisModuleCommand opaque pointer instead of looking up the command by name - RM_CreateSubcommand - RM_AddCommandKeySpec - RM_SetCommandKeySpecBeginSearchIndex - RM_SetCommandKeySpecBeginSearchKeyword - RM_SetCommandKeySpecFindKeysRange - RM_SetCommandKeySpecFindKeysKeynum Currently, we did not add module API to provide additional information about their commands because we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944. ### Somehow related changes 1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command will be documented with M|KM|FT|MI and can take both lowercase and uppercase ### Unrelated changes 1. Bugfix: no_madaory_keys was absent in COMMAND's reply 2. expose CMD_MODULE as "module" via COMMAND 3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags) Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 20:23:15 +01:00
ARG_TYPE_PATTERN,
ARG_TYPE_UNIX_TIME,
ARG_TYPE_PURE_TOKEN,
ARG_TYPE_ONEOF, /* Has subargs */
ARG_TYPE_BLOCK /* Has subargs */
} redisCommandArgType;
#define CMD_ARG_NONE (0)
#define CMD_ARG_OPTIONAL (1<<0)
#define CMD_ARG_MULTIPLE (1<<1)
#define CMD_ARG_MULTIPLE_TOKEN (1<<2)
typedef struct redisCommandArg {
const char *name;
redisCommandArgType type;
int key_spec_index;
const char *token;
const char *summary;
const char *since;
int flags;
const char *deprecated_since;
Auto-generate the command table from JSON files (#9656) Delete the hardcoded command table and replace it with an auto-generated table, based on a JSON file that describes the commands (each command must have a JSON file). These JSON files are the SSOT of everything there is to know about Redis commands, and it is reflected fully in COMMAND INFO. These JSON files are used to generate commands.c (using a python script), which is then committed to the repo and compiled. The purpose is: * Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic. * drop the dependency between Redis-user and the commands.json in redis-doc. * delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be done in a separate PR) * redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release artifacts should be a large JSON, containing all the information about all of the commands, which will be generated from COMMAND's reply) * the byproduct of this is: * module commands will be able to provide that info and possibly be more of a first-class citizens * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info. ### Interface changes #### COMMAND INFO's reply change (and arg-less COMMAND) Before this commit the reply at index 7 contained the key-specs list and reply at index 8 contained the sub-commands list (Both unreleased). Now, reply at index 7 is a map of: - summary - short command description - since - debut version - group - command group - complexity - complexity string - doc-flags - flags used for documentation (e.g. "deprecated") - deprecated-since - if deprecated, from which version? - replaced-by - if deprecated, which command replaced it? - history - a list of (version, what-changed) tuples - hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876 - arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments) - key-specs - an array of keys specs (already in unstable, just changed location) - subcommands - a list of sub-commands (already in unstable, just changed location) - reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845) more details on these can be found in https://github.com/redis/redis-doc/pull/1697 only the first three fields are mandatory #### API changes (unreleased API obviously) now they take RedisModuleCommand opaque pointer instead of looking up the command by name - RM_CreateSubcommand - RM_AddCommandKeySpec - RM_SetCommandKeySpecBeginSearchIndex - RM_SetCommandKeySpecBeginSearchKeyword - RM_SetCommandKeySpecFindKeysRange - RM_SetCommandKeySpecFindKeysKeynum Currently, we did not add module API to provide additional information about their commands because we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944. ### Somehow related changes 1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command will be documented with M|KM|FT|MI and can take both lowercase and uppercase ### Unrelated changes 1. Bugfix: no_madaory_keys was absent in COMMAND's reply 2. expose CMD_MODULE as "module" via COMMAND 3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags) Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 20:23:15 +01:00
struct redisCommandArg *subargs;
Move doc metadata from COMMAND to COMMAND DOCS (#10056) Syntax: `COMMAND DOCS [<command name> ...]` Background: Apparently old version of hiredis (and thus also redis-cli) can't support more than 7 levels of multi-bulk nesting. The solution is to move all the doc related metadata from COMMAND to a new COMMAND DOCS sub-command. The new DOCS sub-command returns a map of commands (not an array like in COMMAND), And the same goes for the `subcommands` field inside it (also contains a map) Besides that, the remaining new fields of COMMAND (hints, key-specs, and sub-commands), are placed in the outer array rather than a nested map. this was done mainly for consistency with the old format. Other changes: --- * Allow COMMAND INFO with no arguments, which returns all commands, so that we can some day deprecated the plain COMMAND (no args) * Reduce the amount of deferred replies from both COMMAND and COMMAND DOCS, especially in the inner loops, since these create many small reply objects, which lead to many small write syscalls and many small TCP packets. To make this easier, when populating the command table, we count the history, args, and hints so we later know their size in advance. Additionally, the movablekeys flag was moved into the flags register. * Update generate-commands-json.py to take the data from both command, it now executes redis-cli directly, instead of taking input from stdin. * Sub-commands in both COMMAND (and COMMAND INFO), and also COMMAND DOCS, show their full name. i.e. CONFIG * GET will be shown as `config|get` rather than just `get`. This will be visible both when asking for `COMMAND INFO config` and COMMAND INFO config|get`, but is especially important for the later. i.e. imagine someone doing `COMMAND INFO slowlog|get config|get` not being able to distinguish between the two items in the array response.
2022-01-11 16:16:16 +01:00
/* runtime populated data */
int num_args;
Auto-generate the command table from JSON files (#9656) Delete the hardcoded command table and replace it with an auto-generated table, based on a JSON file that describes the commands (each command must have a JSON file). These JSON files are the SSOT of everything there is to know about Redis commands, and it is reflected fully in COMMAND INFO. These JSON files are used to generate commands.c (using a python script), which is then committed to the repo and compiled. The purpose is: * Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic. * drop the dependency between Redis-user and the commands.json in redis-doc. * delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be done in a separate PR) * redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release artifacts should be a large JSON, containing all the information about all of the commands, which will be generated from COMMAND's reply) * the byproduct of this is: * module commands will be able to provide that info and possibly be more of a first-class citizens * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info. ### Interface changes #### COMMAND INFO's reply change (and arg-less COMMAND) Before this commit the reply at index 7 contained the key-specs list and reply at index 8 contained the sub-commands list (Both unreleased). Now, reply at index 7 is a map of: - summary - short command description - since - debut version - group - command group - complexity - complexity string - doc-flags - flags used for documentation (e.g. "deprecated") - deprecated-since - if deprecated, from which version? - replaced-by - if deprecated, which command replaced it? - history - a list of (version, what-changed) tuples - hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876 - arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments) - key-specs - an array of keys specs (already in unstable, just changed location) - subcommands - a list of sub-commands (already in unstable, just changed location) - reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845) more details on these can be found in https://github.com/redis/redis-doc/pull/1697 only the first three fields are mandatory #### API changes (unreleased API obviously) now they take RedisModuleCommand opaque pointer instead of looking up the command by name - RM_CreateSubcommand - RM_AddCommandKeySpec - RM_SetCommandKeySpecBeginSearchIndex - RM_SetCommandKeySpecBeginSearchKeyword - RM_SetCommandKeySpecFindKeysRange - RM_SetCommandKeySpecFindKeysKeynum Currently, we did not add module API to provide additional information about their commands because we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944. ### Somehow related changes 1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command will be documented with M|KM|FT|MI and can take both lowercase and uppercase ### Unrelated changes 1. Bugfix: no_madaory_keys was absent in COMMAND's reply 2. expose CMD_MODULE as "module" via COMMAND 3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags) Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 20:23:15 +01:00
} redisCommandArg;
/* Must be synced with RESP2_TYPE_STR and generate-command-code.py */
typedef enum {
RESP2_SIMPLE_STRING,
RESP2_ERROR,
RESP2_INTEGER,
RESP2_BULK_STRING,
RESP2_NULL_BULK_STRING,
RESP2_ARRAY,
RESP2_NULL_ARRAY,
} redisCommandRESP2Type;
/* Must be synced with RESP3_TYPE_STR and generate-command-code.py */
typedef enum {
RESP3_SIMPLE_STRING,
RESP3_ERROR,
RESP3_INTEGER,
RESP3_DOUBLE,
RESP3_BULK_STRING,
RESP3_ARRAY,
RESP3_MAP,
RESP3_SET,
RESP3_BOOL,
RESP3_NULL,
} redisCommandRESP3Type;
typedef struct {
const char *since;
const char *changes;
} commandHistory;
/* Must be synced with COMMAND_GROUP_STR and generate-command-code.py */
typedef enum {
COMMAND_GROUP_GENERIC,
COMMAND_GROUP_STRING,
COMMAND_GROUP_LIST,
COMMAND_GROUP_SET,
COMMAND_GROUP_SORTED_SET,
COMMAND_GROUP_HASH,
COMMAND_GROUP_PUBSUB,
COMMAND_GROUP_TRANSACTIONS,
COMMAND_GROUP_CONNECTION,
COMMAND_GROUP_SERVER,
COMMAND_GROUP_SCRIPTING,
COMMAND_GROUP_HYPERLOGLOG,
COMMAND_GROUP_CLUSTER,
COMMAND_GROUP_SENTINEL,
COMMAND_GROUP_GEO,
COMMAND_GROUP_STREAM,
COMMAND_GROUP_BITMAP,
COMMAND_GROUP_MODULE,
} redisCommandGroup;
typedef void redisCommandProc(client *c);
typedef int redisGetKeysProc(struct redisCommand *cmd, robj **argv, int argc, getKeysResult *result);
Auto-generate the command table from JSON files (#9656) Delete the hardcoded command table and replace it with an auto-generated table, based on a JSON file that describes the commands (each command must have a JSON file). These JSON files are the SSOT of everything there is to know about Redis commands, and it is reflected fully in COMMAND INFO. These JSON files are used to generate commands.c (using a python script), which is then committed to the repo and compiled. The purpose is: * Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic. * drop the dependency between Redis-user and the commands.json in redis-doc. * delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be done in a separate PR) * redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release artifacts should be a large JSON, containing all the information about all of the commands, which will be generated from COMMAND's reply) * the byproduct of this is: * module commands will be able to provide that info and possibly be more of a first-class citizens * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info. ### Interface changes #### COMMAND INFO's reply change (and arg-less COMMAND) Before this commit the reply at index 7 contained the key-specs list and reply at index 8 contained the sub-commands list (Both unreleased). Now, reply at index 7 is a map of: - summary - short command description - since - debut version - group - command group - complexity - complexity string - doc-flags - flags used for documentation (e.g. "deprecated") - deprecated-since - if deprecated, from which version? - replaced-by - if deprecated, which command replaced it? - history - a list of (version, what-changed) tuples - hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876 - arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments) - key-specs - an array of keys specs (already in unstable, just changed location) - subcommands - a list of sub-commands (already in unstable, just changed location) - reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845) more details on these can be found in https://github.com/redis/redis-doc/pull/1697 only the first three fields are mandatory #### API changes (unreleased API obviously) now they take RedisModuleCommand opaque pointer instead of looking up the command by name - RM_CreateSubcommand - RM_AddCommandKeySpec - RM_SetCommandKeySpecBeginSearchIndex - RM_SetCommandKeySpecBeginSearchKeyword - RM_SetCommandKeySpecFindKeysRange - RM_SetCommandKeySpecFindKeysKeynum Currently, we did not add module API to provide additional information about their commands because we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944. ### Somehow related changes 1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command will be documented with M|KM|FT|MI and can take both lowercase and uppercase ### Unrelated changes 1. Bugfix: no_madaory_keys was absent in COMMAND's reply 2. expose CMD_MODULE as "module" via COMMAND 3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags) Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 20:23:15 +01:00
/* Redis command structure.
*
* Note that the command table is in commands.c and it is auto-generated.
*
* This is the meaning of the flags:
*
* CMD_WRITE: Write command (may modify the key space).
*
* CMD_READONLY: Commands just reading from keys without changing the content.
* Note that commands that don't read from the keyspace such as
* TIME, SELECT, INFO, administrative commands, and connection
* or transaction related commands (multi, exec, discard, ...)
* are not flagged as read-only commands, since they affect the
* server or the connection in other ways.
*
* CMD_DENYOOM: May increase memory usage once called. Don't allow if out
* of memory.
*
* CMD_ADMIN: Administrative command, like SAVE or SHUTDOWN.
*
* CMD_PUBSUB: Pub/Sub related command.
*
* CMD_NOSCRIPT: Command not allowed in scripts.
*
Add command tips to COMMAND DOCS (#10104) Adding command tips (see https://redis.io/topics/command-tips) to commands. Breaking changes: 1. Removed the "random" and "sort_for_script" flags. They are now command tips. (this isn't affecting redis behavior since #9812, but could affect some client applications that's relying on COMMAND command flags) Summary of changes: 1. add BLOCKING flag (new flag) for all commands that could block. The ACL category with the same name is now implicit. 2. move RANDOM flag to a `nondeterministic_output` tip 3. move SORT_FOR_SCRIPT flag to `nondeterministic_output_order` tip 3. add REQUEST_POLICY and RESPONSE_POLICY where appropriate as documented in the tips 4. deprecate (ignore) the `random` flag for RM_CreateCommand Other notes: 1. Proxies need to send `RANDOMKEY` to all shards and then select one key randomly. The other option is to pick a random shard and transfer `RANDOMKEY `to it, but that scheme fails if this specific shard is empty 2. Remove CMD_RANDOM from `XACK` (i.e. XACK does not have RANDOM_OUTPUT) It was added in 9e4fb96ca12476b1c7468b143efca86b478bfb4a, I guess by mistake. Also from `(P)EXPIRETIME` (new command, was flagged "random" by mistake). 3. Add `nondeterministic_output` to `OBJECT ENCODING` (for the same reason `XTRIM` has it: the reply may differ depending on the internal representation in memory) 4. RANDOM on `HGETALL` was wrong (there due to a limitation of the old script sorting logic), now it's `nondeterministic_output_order` 5. Unrelated: Hide CMD_PROTECTED from COMMAND
2022-01-20 10:32:11 +01:00
* CMD_BLOCKING: The command has the potential to block the client.
Auto-generate the command table from JSON files (#9656) Delete the hardcoded command table and replace it with an auto-generated table, based on a JSON file that describes the commands (each command must have a JSON file). These JSON files are the SSOT of everything there is to know about Redis commands, and it is reflected fully in COMMAND INFO. These JSON files are used to generate commands.c (using a python script), which is then committed to the repo and compiled. The purpose is: * Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic. * drop the dependency between Redis-user and the commands.json in redis-doc. * delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be done in a separate PR) * redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release artifacts should be a large JSON, containing all the information about all of the commands, which will be generated from COMMAND's reply) * the byproduct of this is: * module commands will be able to provide that info and possibly be more of a first-class citizens * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info. ### Interface changes #### COMMAND INFO's reply change (and arg-less COMMAND) Before this commit the reply at index 7 contained the key-specs list and reply at index 8 contained the sub-commands list (Both unreleased). Now, reply at index 7 is a map of: - summary - short command description - since - debut version - group - command group - complexity - complexity string - doc-flags - flags used for documentation (e.g. "deprecated") - deprecated-since - if deprecated, from which version? - replaced-by - if deprecated, which command replaced it? - history - a list of (version, what-changed) tuples - hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876 - arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments) - key-specs - an array of keys specs (already in unstable, just changed location) - subcommands - a list of sub-commands (already in unstable, just changed location) - reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845) more details on these can be found in https://github.com/redis/redis-doc/pull/1697 only the first three fields are mandatory #### API changes (unreleased API obviously) now they take RedisModuleCommand opaque pointer instead of looking up the command by name - RM_CreateSubcommand - RM_AddCommandKeySpec - RM_SetCommandKeySpecBeginSearchIndex - RM_SetCommandKeySpecBeginSearchKeyword - RM_SetCommandKeySpecFindKeysRange - RM_SetCommandKeySpecFindKeysKeynum Currently, we did not add module API to provide additional information about their commands because we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944. ### Somehow related changes 1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command will be documented with M|KM|FT|MI and can take both lowercase and uppercase ### Unrelated changes 1. Bugfix: no_madaory_keys was absent in COMMAND's reply 2. expose CMD_MODULE as "module" via COMMAND 3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags) Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 20:23:15 +01:00
*
* CMD_LOADING: Allow the command while loading the database.
*
Allow most CONFIG SET during loading, block some commands in async-loading (#9878) ## background Till now CONFIG SET was blocked during loading. (In the not so distant past, GET was disallowed too) We recently (not released yet) added an async-loading mode, see #9323, and during that time it'll serve CONFIG SET and any other command. And now we realized (#9770) that some configs, and commands are dangerous during async-loading. ## changes * Allow most CONFIG SET during loading (both on async-loading and normal loading) * Allow CONFIG REWRITE and CONFIG RESETSTAT during loading * Block a few config during loading (`appendonly`, `repl-diskless-load`, and `dir`) * Block a few commands during loading (list below) ## the blocked commands: * SAVE - obviously we don't wanna start a foregreound save during loading 8-) * BGSAVE - we don't mind to schedule one, but we don't wanna fork now * BGREWRITEAOF - we don't mind to schedule one, but we don't wanna fork now * MODULE - we obviously don't wanna unload a module during replication / rdb loading (MODULE HELP and MODULE LIST are not blocked) * SYNC / PSYNC - we're in the middle of RDB loading from master, must not allow sync requests now. * REPLICAOF / SLAVEOF - we're in the middle of replicating, maybe it makes sense to let the user abort it, but he couldn't do that so far, i don't wanna take any risk of bugs due to odd state. * CLUSTER - only allow [HELP, SLOTS, NODES, INFO, MYID, LINKS, KEYSLOT, COUNTKEYSINSLOT, GETKEYSINSLOT, RESET, REPLICAS, COUNT_FAILURE_REPORTS], for others, preserve the status quo ## other fixes * processEventsWhileBlocked had an issue when being nested, this could happen with a busy script during async loading (new), but also in a busy script during AOF loading (old). this lead to a crash in the scenario described in #6988
2021-12-22 13:11:16 +01:00
* CMD_NO_ASYNC_LOADING: Deny during async loading (when a replica uses diskless
* sync swapdb, and allows access to the old dataset)
Allow most CONFIG SET during loading, block some commands in async-loading (#9878) ## background Till now CONFIG SET was blocked during loading. (In the not so distant past, GET was disallowed too) We recently (not released yet) added an async-loading mode, see #9323, and during that time it'll serve CONFIG SET and any other command. And now we realized (#9770) that some configs, and commands are dangerous during async-loading. ## changes * Allow most CONFIG SET during loading (both on async-loading and normal loading) * Allow CONFIG REWRITE and CONFIG RESETSTAT during loading * Block a few config during loading (`appendonly`, `repl-diskless-load`, and `dir`) * Block a few commands during loading (list below) ## the blocked commands: * SAVE - obviously we don't wanna start a foregreound save during loading 8-) * BGSAVE - we don't mind to schedule one, but we don't wanna fork now * BGREWRITEAOF - we don't mind to schedule one, but we don't wanna fork now * MODULE - we obviously don't wanna unload a module during replication / rdb loading (MODULE HELP and MODULE LIST are not blocked) * SYNC / PSYNC - we're in the middle of RDB loading from master, must not allow sync requests now. * REPLICAOF / SLAVEOF - we're in the middle of replicating, maybe it makes sense to let the user abort it, but he couldn't do that so far, i don't wanna take any risk of bugs due to odd state. * CLUSTER - only allow [HELP, SLOTS, NODES, INFO, MYID, LINKS, KEYSLOT, COUNTKEYSINSLOT, GETKEYSINSLOT, RESET, REPLICAS, COUNT_FAILURE_REPORTS], for others, preserve the status quo ## other fixes * processEventsWhileBlocked had an issue when being nested, this could happen with a busy script during async loading (new), but also in a busy script during AOF loading (old). this lead to a crash in the scenario described in #6988
2021-12-22 13:11:16 +01:00
*
Auto-generate the command table from JSON files (#9656) Delete the hardcoded command table and replace it with an auto-generated table, based on a JSON file that describes the commands (each command must have a JSON file). These JSON files are the SSOT of everything there is to know about Redis commands, and it is reflected fully in COMMAND INFO. These JSON files are used to generate commands.c (using a python script), which is then committed to the repo and compiled. The purpose is: * Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic. * drop the dependency between Redis-user and the commands.json in redis-doc. * delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be done in a separate PR) * redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release artifacts should be a large JSON, containing all the information about all of the commands, which will be generated from COMMAND's reply) * the byproduct of this is: * module commands will be able to provide that info and possibly be more of a first-class citizens * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info. ### Interface changes #### COMMAND INFO's reply change (and arg-less COMMAND) Before this commit the reply at index 7 contained the key-specs list and reply at index 8 contained the sub-commands list (Both unreleased). Now, reply at index 7 is a map of: - summary - short command description - since - debut version - group - command group - complexity - complexity string - doc-flags - flags used for documentation (e.g. "deprecated") - deprecated-since - if deprecated, from which version? - replaced-by - if deprecated, which command replaced it? - history - a list of (version, what-changed) tuples - hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876 - arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments) - key-specs - an array of keys specs (already in unstable, just changed location) - subcommands - a list of sub-commands (already in unstable, just changed location) - reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845) more details on these can be found in https://github.com/redis/redis-doc/pull/1697 only the first three fields are mandatory #### API changes (unreleased API obviously) now they take RedisModuleCommand opaque pointer instead of looking up the command by name - RM_CreateSubcommand - RM_AddCommandKeySpec - RM_SetCommandKeySpecBeginSearchIndex - RM_SetCommandKeySpecBeginSearchKeyword - RM_SetCommandKeySpecFindKeysRange - RM_SetCommandKeySpecFindKeysKeynum Currently, we did not add module API to provide additional information about their commands because we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944. ### Somehow related changes 1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command will be documented with M|KM|FT|MI and can take both lowercase and uppercase ### Unrelated changes 1. Bugfix: no_madaory_keys was absent in COMMAND's reply 2. expose CMD_MODULE as "module" via COMMAND 3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags) Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 20:23:15 +01:00
* CMD_STALE: Allow the command while a slave has stale data but is not
* allowed to serve this data. Normally no command is accepted
* in this condition but just a few.
*
* CMD_SKIP_MONITOR: Do not automatically propagate the command on MONITOR.
*
* CMD_SKIP_SLOWLOG: Do not automatically propagate the command to the slowlog.
*
* CMD_ASKING: Perform an implicit ASKING for this command, so the
* command will be accepted in cluster mode if the slot is marked
* as 'importing'.
*
* CMD_FAST: Fast command: O(1) or O(log(N)) command that should never
* delay its execution as long as the kernel scheduler is giving
* us time. Note that commands that may trigger a DEL as a side
* effect (like SET) are not fast commands.
*
* CMD_NO_AUTH: Command doesn't require authentication
*
* CMD_MAY_REPLICATE: Command may produce replication traffic, but should be
* allowed under circumstances where write commands are disallowed.
* Examples include PUBLISH, which replicates pubsub messages,and
* EVAL, which may execute write commands, which are replicated,
* or may just execute read commands. A command can not be marked
* both CMD_WRITE and CMD_MAY_REPLICATE
*
* CMD_SENTINEL: This command is present in sentinel mode too.
*
* CMD_SENTINEL_ONLY: This command is present only when in sentinel mode.
*
* CMD_NO_MANDATORY_KEYS: This key arguments for this command are optional.
*
* CMD_NO_MULTI: The command is nt allowed inside a transaction
*
Auto-generate the command table from JSON files (#9656) Delete the hardcoded command table and replace it with an auto-generated table, based on a JSON file that describes the commands (each command must have a JSON file). These JSON files are the SSOT of everything there is to know about Redis commands, and it is reflected fully in COMMAND INFO. These JSON files are used to generate commands.c (using a python script), which is then committed to the repo and compiled. The purpose is: * Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic. * drop the dependency between Redis-user and the commands.json in redis-doc. * delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be done in a separate PR) * redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release artifacts should be a large JSON, containing all the information about all of the commands, which will be generated from COMMAND's reply) * the byproduct of this is: * module commands will be able to provide that info and possibly be more of a first-class citizens * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info. ### Interface changes #### COMMAND INFO's reply change (and arg-less COMMAND) Before this commit the reply at index 7 contained the key-specs list and reply at index 8 contained the sub-commands list (Both unreleased). Now, reply at index 7 is a map of: - summary - short command description - since - debut version - group - command group - complexity - complexity string - doc-flags - flags used for documentation (e.g. "deprecated") - deprecated-since - if deprecated, from which version? - replaced-by - if deprecated, which command replaced it? - history - a list of (version, what-changed) tuples - hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876 - arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments) - key-specs - an array of keys specs (already in unstable, just changed location) - subcommands - a list of sub-commands (already in unstable, just changed location) - reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845) more details on these can be found in https://github.com/redis/redis-doc/pull/1697 only the first three fields are mandatory #### API changes (unreleased API obviously) now they take RedisModuleCommand opaque pointer instead of looking up the command by name - RM_CreateSubcommand - RM_AddCommandKeySpec - RM_SetCommandKeySpecBeginSearchIndex - RM_SetCommandKeySpecBeginSearchKeyword - RM_SetCommandKeySpecFindKeysRange - RM_SetCommandKeySpecFindKeysKeynum Currently, we did not add module API to provide additional information about their commands because we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944. ### Somehow related changes 1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command will be documented with M|KM|FT|MI and can take both lowercase and uppercase ### Unrelated changes 1. Bugfix: no_madaory_keys was absent in COMMAND's reply 2. expose CMD_MODULE as "module" via COMMAND 3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags) Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 20:23:15 +01:00
* The following additional flags are only used in order to put commands
* in a specific ACL category. Commands can have multiple ACL categories.
* See redis.conf for the exact meaning of each.
*
* @keyspace, @read, @write, @set, @sortedset, @list, @hash, @string, @bitmap,
* @hyperloglog, @stream, @admin, @fast, @slow, @pubsub, @blocking, @dangerous,
* @connection, @transaction, @scripting, @geo.
*
* Note that:
*
* 1) The read-only flag implies the @read ACL category.
* 2) The write flag implies the @write ACL category.
* 3) The fast flag implies the @fast ACL category.
* 4) The admin flag implies the @admin and @dangerous ACL category.
* 5) The pub-sub flag implies the @pubsub ACL category.
* 6) The lack of fast flag implies the @slow ACL category.
* 7) The non obvious "keyspace" category includes the commands
* that interact with keys without having anything to do with
* specific data structures, such as: DEL, RENAME, MOVE, SELECT,
* TYPE, EXPIRE*, PEXPIRE*, TTL, PTTL, ...
*/
struct redisCommand {
A better approach for COMMAND INFO for movablekeys commands (#8324) Fix #7297 The problem: Today, there is no way for a client library or app to know the key name indexes for commands such as ZUNIONSTORE/EVAL and others with "numkeys", since COMMAND INFO returns no useful info for them. For cluster-aware redis clients, this requires to 'patch' the client library code specifically for each of these commands or to resolve each execution of these commands with COMMAND GETKEYS. The solution: Introducing key specs other than the legacy "range" (first,last,step) The 8th element of the command info array, if exists, holds an array of key specs. The array may be empty, which indicates the command doesn't take any key arguments or may contain one or more key-specs, each one may leads to the discovery of 0 or more key arguments. A client library that doesn't support this key-spec feature will keep using the first,last,step and movablekeys flag which will obviously remain unchanged. A client that supports this key-specs feature needs only to look at the key-specs array. If it finds an unrecognized spec, it must resort to using COMMAND GETKEYS if it wishes to get all key name arguments, but if all it needs is one key in order to know which cluster node to use, then maybe another spec (if the command has several) can supply that, and there's no need to use GETKEYS. Each spec is an array of arguments, first one is the spec name, the second is an array of flags, and the third is an array containing details about the spec (specific meaning for each spec type) The initial flags we support are "read" and "write" indicating if the keys that this key-spec finds are used for read or for write. clients should ignore any unfamiliar flags. In order to easily find the positions of keys in a given array of args we introduce keys specs. There are two logical steps of key specs: 1. `start_search`: Given an array of args, indicate where we should start searching for keys 2. `find_keys`: Given the output of start_search and an array of args, indicate all possible indices of keys. ### start_search step specs - `index`: specify an argument index explicitly - `index`: 0 based index (1 means the first command argument) - `keyword`: specify a string to match in `argv`. We should start searching for keys just after the keyword appears. - `keyword`: the string to search for - `start_search`: an index from which to start the keyword search (can be negative, which means to search from the end) Examples: - `SET` has start_search of type `index` with value `1` - `XREAD` has start_search of type `keyword` with value `[“STREAMS”,1]` - `MIGRATE` has start_search of type `keyword` with value `[“KEYS”,-2]` ### find_keys step specs - `range`: specify `[count, step, limit]`. - `lastkey`: index of the last key. relative to the index returned from begin_search. -1 indicating till the last argument, -2 one before the last - `step`: how many args should we skip after finding a key, in order to find the next one - `limit`: if count is -1, we use limit to stop the search by a factor. 0 and 1 mean no limit. 2 means ½ of the remaining args, 3 means ⅓, and so on. - “keynum”: specify `[keynum_index, first_key_index, step]`. - `keynum_index`: is relative to the return of the `start_search` spec. - `first_key_index`: is relative to `keynum_index`. - `step`: how many args should we skip after finding a key, in order to find the next one Examples: - `SET` has `range` of `[0,1,0]` - `MSET` has `range` of `[-1,2,0]` - `XREAD` has `range` of `[-1,1,2]` - `ZUNION` has `start_search` of type `index` with value `1` and `find_keys` of type `keynum` with value `[0,1,1]` - `AI.DAGRUN` has `start_search` of type `keyword` with value `[“LOAD“,1]` and `find_keys` of type `keynum` with value `[0,1,1]` (see https://oss.redislabs.com/redisai/master/commands/#aidagrun) Note: this solution is not perfect as the module writers can come up with anything, but at least we will be able to find the key args of the vast majority of commands. If one of the above specs can’t describe the key positions, the module writer can always fall back to the `getkeys-api` option. Some keys cannot be found easily (`KEYS` in `MIGRATE`: Imagine the argument for `AUTH` is the string “KEYS” - we will start searching in the wrong index). The guarantee is that the specs may be incomplete (`incomplete` will be specified in the spec to denote that) but we never report false information (assuming the command syntax is correct). For `MIGRATE` we start searching from the end - `startfrom=-1` - and if one of the keys is actually called "keys" we will report only a subset of all keys - hence the `incomplete` flag. Some `incomplete` specs can be completely empty (i.e. UNKNOWN begin_search) which should tell the client that COMMAND GETKEYS (or any other way to get the keys) must be used (Example: For `SORT` there is no way to describe the STORE keyword spec, as the word "store" can appear anywhere in the command). We will expose these key specs in the `COMMAND` command so that clients can learn, on startup, where the keys are for all commands instead of holding hardcoded tables or use `COMMAND GETKEYS` in runtime. Comments: 1. Redis doesn't internally use the new specs, they are only used for COMMAND output. 2. In order to support the current COMMAND INFO format (reply array indices 4, 5, 6) we created a synthetic range, called legacy_range, that, if possible, is built according to the new specs. 3. Redis currently uses only getkeys_proc or the legacy_range to get the keys indices (in COMMAND GETKEYS for example). "incomplete" specs: the command we have issues with are MIGRATE, STRALGO, and SORT for MIGRATE, because the token KEYS, if exists, must be the last token, we can search in reverse. it one of the keys is actually the string "keys" will return just a subset of the keys (hence, it's "incomplete") for SORT and STRALGO we can use this heuristic (the keys can be anywhere in the command) and therefore we added a key spec that is both "incomplete" and of "unknown type" if a client encounters an "incomplete" spec it means that it must find a different way (either COMMAND GETKEYS or have its own parser) to retrieve the keys. please note that all commands, apart from the three mentioned above, have "complete" key specs
2021-09-15 10:10:29 +02:00
/* Declarative data */
const char *declared_name; /* A string representing the command declared_name.
* It is a const char * for native commands and SDS for module commands. */
Auto-generate the command table from JSON files (#9656) Delete the hardcoded command table and replace it with an auto-generated table, based on a JSON file that describes the commands (each command must have a JSON file). These JSON files are the SSOT of everything there is to know about Redis commands, and it is reflected fully in COMMAND INFO. These JSON files are used to generate commands.c (using a python script), which is then committed to the repo and compiled. The purpose is: * Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic. * drop the dependency between Redis-user and the commands.json in redis-doc. * delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be done in a separate PR) * redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release artifacts should be a large JSON, containing all the information about all of the commands, which will be generated from COMMAND's reply) * the byproduct of this is: * module commands will be able to provide that info and possibly be more of a first-class citizens * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info. ### Interface changes #### COMMAND INFO's reply change (and arg-less COMMAND) Before this commit the reply at index 7 contained the key-specs list and reply at index 8 contained the sub-commands list (Both unreleased). Now, reply at index 7 is a map of: - summary - short command description - since - debut version - group - command group - complexity - complexity string - doc-flags - flags used for documentation (e.g. "deprecated") - deprecated-since - if deprecated, from which version? - replaced-by - if deprecated, which command replaced it? - history - a list of (version, what-changed) tuples - hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876 - arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments) - key-specs - an array of keys specs (already in unstable, just changed location) - subcommands - a list of sub-commands (already in unstable, just changed location) - reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845) more details on these can be found in https://github.com/redis/redis-doc/pull/1697 only the first three fields are mandatory #### API changes (unreleased API obviously) now they take RedisModuleCommand opaque pointer instead of looking up the command by name - RM_CreateSubcommand - RM_AddCommandKeySpec - RM_SetCommandKeySpecBeginSearchIndex - RM_SetCommandKeySpecBeginSearchKeyword - RM_SetCommandKeySpecFindKeysRange - RM_SetCommandKeySpecFindKeysKeynum Currently, we did not add module API to provide additional information about their commands because we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944. ### Somehow related changes 1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command will be documented with M|KM|FT|MI and can take both lowercase and uppercase ### Unrelated changes 1. Bugfix: no_madaory_keys was absent in COMMAND's reply 2. expose CMD_MODULE as "module" via COMMAND 3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags) Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 20:23:15 +01:00
const char *summary; /* Summary of the command (optional). */
const char *complexity; /* Complexity description (optional). */
const char *since; /* Debut version of the command (optional). */
int doc_flags; /* Flags for documentation (see CMD_DOC_*). */
const char *replaced_by; /* In case the command is deprecated, this is the successor command. */
const char *deprecated_since; /* In case the command is deprecated, when did it happen? */
redisCommandGroup group; /* Command group */
commandHistory *history; /* History of the command */
Add command tips to COMMAND DOCS (#10104) Adding command tips (see https://redis.io/topics/command-tips) to commands. Breaking changes: 1. Removed the "random" and "sort_for_script" flags. They are now command tips. (this isn't affecting redis behavior since #9812, but could affect some client applications that's relying on COMMAND command flags) Summary of changes: 1. add BLOCKING flag (new flag) for all commands that could block. The ACL category with the same name is now implicit. 2. move RANDOM flag to a `nondeterministic_output` tip 3. move SORT_FOR_SCRIPT flag to `nondeterministic_output_order` tip 3. add REQUEST_POLICY and RESPONSE_POLICY where appropriate as documented in the tips 4. deprecate (ignore) the `random` flag for RM_CreateCommand Other notes: 1. Proxies need to send `RANDOMKEY` to all shards and then select one key randomly. The other option is to pick a random shard and transfer `RANDOMKEY `to it, but that scheme fails if this specific shard is empty 2. Remove CMD_RANDOM from `XACK` (i.e. XACK does not have RANDOM_OUTPUT) It was added in 9e4fb96ca12476b1c7468b143efca86b478bfb4a, I guess by mistake. Also from `(P)EXPIRETIME` (new command, was flagged "random" by mistake). 3. Add `nondeterministic_output` to `OBJECT ENCODING` (for the same reason `XTRIM` has it: the reply may differ depending on the internal representation in memory) 4. RANDOM on `HGETALL` was wrong (there due to a limitation of the old script sorting logic), now it's `nondeterministic_output_order` 5. Unrelated: Hide CMD_PROTECTED from COMMAND
2022-01-20 10:32:11 +01:00
const char **tips; /* An array of strings that are meant to be tips for clients/proxies regarding this command */
Auto-generate the command table from JSON files (#9656) Delete the hardcoded command table and replace it with an auto-generated table, based on a JSON file that describes the commands (each command must have a JSON file). These JSON files are the SSOT of everything there is to know about Redis commands, and it is reflected fully in COMMAND INFO. These JSON files are used to generate commands.c (using a python script), which is then committed to the repo and compiled. The purpose is: * Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic. * drop the dependency between Redis-user and the commands.json in redis-doc. * delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be done in a separate PR) * redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release artifacts should be a large JSON, containing all the information about all of the commands, which will be generated from COMMAND's reply) * the byproduct of this is: * module commands will be able to provide that info and possibly be more of a first-class citizens * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info. ### Interface changes #### COMMAND INFO's reply change (and arg-less COMMAND) Before this commit the reply at index 7 contained the key-specs list and reply at index 8 contained the sub-commands list (Both unreleased). Now, reply at index 7 is a map of: - summary - short command description - since - debut version - group - command group - complexity - complexity string - doc-flags - flags used for documentation (e.g. "deprecated") - deprecated-since - if deprecated, from which version? - replaced-by - if deprecated, which command replaced it? - history - a list of (version, what-changed) tuples - hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876 - arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments) - key-specs - an array of keys specs (already in unstable, just changed location) - subcommands - a list of sub-commands (already in unstable, just changed location) - reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845) more details on these can be found in https://github.com/redis/redis-doc/pull/1697 only the first three fields are mandatory #### API changes (unreleased API obviously) now they take RedisModuleCommand opaque pointer instead of looking up the command by name - RM_CreateSubcommand - RM_AddCommandKeySpec - RM_SetCommandKeySpecBeginSearchIndex - RM_SetCommandKeySpecBeginSearchKeyword - RM_SetCommandKeySpecFindKeysRange - RM_SetCommandKeySpecFindKeysKeynum Currently, we did not add module API to provide additional information about their commands because we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944. ### Somehow related changes 1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command will be documented with M|KM|FT|MI and can take both lowercase and uppercase ### Unrelated changes 1. Bugfix: no_madaory_keys was absent in COMMAND's reply 2. expose CMD_MODULE as "module" via COMMAND 3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags) Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 20:23:15 +01:00
redisCommandProc *proc; /* Command implementation */
int arity; /* Number of arguments, it is possible to use -N to say >= N */
uint64_t flags; /* Command flags, see CMD_*. */
uint64_t acl_categories; /* ACl categories, see ACL_CATEGORY_*. */
keySpec key_specs_static[STATIC_KEY_SPECS_NUM]; /* Key specs. See keySpec */
/* Use a function to determine keys arguments in a command line.
Treat subcommands as commands (#9504) ## Intro The purpose is to allow having different flags/ACL categories for subcommands (Example: CONFIG GET is ok-loading but CONFIG SET isn't) We create a small command table for every command that has subcommands and each subcommand has its own flags, etc. (same as a "regular" command) This commit also unites the Redis and the Sentinel command tables ## Affected commands CONFIG Used to have "admin ok-loading ok-stale no-script" Changes: 1. Dropped "ok-loading" in all except GET (this doesn't change behavior since there were checks in the code doing that) XINFO Used to have "read-only random" Changes: 1. Dropped "random" in all except CONSUMERS XGROUP Used to have "write use-memory" Changes: 1. Dropped "use-memory" in all except CREATE and CREATECONSUMER COMMAND No changes. MEMORY Used to have "random read-only" Changes: 1. Dropped "random" in PURGE and USAGE ACL Used to have "admin no-script ok-loading ok-stale" Changes: 1. Dropped "admin" in WHOAMI, GENPASS, and CAT LATENCY No changes. MODULE No changes. SLOWLOG Used to have "admin random ok-loading ok-stale" Changes: 1. Dropped "random" in RESET OBJECT Used to have "read-only random" Changes: 1. Dropped "random" in ENCODING and REFCOUNT SCRIPT Used to have "may-replicate no-script" Changes: 1. Dropped "may-replicate" in all except FLUSH and LOAD CLIENT Used to have "admin no-script random ok-loading ok-stale" Changes: 1. Dropped "random" in all except INFO and LIST 2. Dropped "admin" in ID, TRACKING, CACHING, GETREDIR, INFO, SETNAME, GETNAME, and REPLY STRALGO No changes. PUBSUB No changes. CLUSTER Changes: 1. Dropped "admin in countkeysinslots, getkeysinslot, info, nodes, keyslot, myid, and slots SENTINEL No changes. (note that DEBUG also fits, but we decided not to convert it since it's for debugging and anyway undocumented) ## New sub-command This commit adds another element to the per-command output of COMMAND, describing the list of subcommands, if any (in the same structure as "regular" commands) Also, it adds a new subcommand: ``` COMMAND LIST [FILTERBY (MODULE <module-name>|ACLCAT <cat>|PATTERN <pattern>)] ``` which returns a set of all commands (unless filters), but excluding subcommands. ## Module API A new module API, RM_CreateSubcommand, was added, in order to allow module writer to define subcommands ## ACL changes: 1. Now, that each subcommand is actually a command, each has its own ACL id. 2. The old mechanism of allowed_subcommands is redundant (blocking/allowing a subcommand is the same as blocking/allowing a regular command), but we had to keep it, to support the widespread usage of allowed_subcommands to block commands with certain args, that aren't subcommands (e.g. "-select +select|0"). 3. I have renamed allowed_subcommands to allowed_firstargs to emphasize the difference. 4. Because subcommands are commands in ACL too, you can now use "-" to block subcommands (e.g. "+client -client|kill"), which wasn't possible in the past. 5. It is also possible to use the allowed_firstargs mechanism with subcommand. For example: `+config -config|set +config|set|loglevel` will block all CONFIG SET except for setting the log level. 6. All of the ACL changes above required some amount of refactoring. ## Misc 1. There are two approaches: Either each subcommand has its own function or all subcommands use the same function, determining what to do according to argv[0]. For now, I took the former approaches only with CONFIG and COMMAND, while other commands use the latter approach (for smaller blamelog diff). 2. Deleted memoryGetKeys: It is no longer needed because MEMORY USAGE now uses the "range" key spec. 4. Bugfix: GETNAME was missing from CLIENT's help message. 5. Sentinel and Redis now use the same table, with the same function pointer. Some commands have a different implementation in Sentinel, so we redirect them (these are ROLE, PUBLISH, and INFO). 6. Command stats now show the stats per subcommand (e.g. instead of stats just for "config" you will have stats for "config|set", "config|get", etc.) 7. It is now possible to use COMMAND directly on subcommands: COMMAND INFO CONFIG|GET (The pipeline syntax was inspired from ACL, and can be used in functions lookupCommandBySds and lookupCommandByCString) 8. STRALGO is now a container command (has "help") ## Breaking changes: 1. Command stats now show the stats per subcommand (see (5) above)
2021-10-20 10:52:57 +02:00
* Used for Redis Cluster redirect (may be NULL) */
redisGetKeysProc *getkeys_proc;
Treat subcommands as commands (#9504) ## Intro The purpose is to allow having different flags/ACL categories for subcommands (Example: CONFIG GET is ok-loading but CONFIG SET isn't) We create a small command table for every command that has subcommands and each subcommand has its own flags, etc. (same as a "regular" command) This commit also unites the Redis and the Sentinel command tables ## Affected commands CONFIG Used to have "admin ok-loading ok-stale no-script" Changes: 1. Dropped "ok-loading" in all except GET (this doesn't change behavior since there were checks in the code doing that) XINFO Used to have "read-only random" Changes: 1. Dropped "random" in all except CONSUMERS XGROUP Used to have "write use-memory" Changes: 1. Dropped "use-memory" in all except CREATE and CREATECONSUMER COMMAND No changes. MEMORY Used to have "random read-only" Changes: 1. Dropped "random" in PURGE and USAGE ACL Used to have "admin no-script ok-loading ok-stale" Changes: 1. Dropped "admin" in WHOAMI, GENPASS, and CAT LATENCY No changes. MODULE No changes. SLOWLOG Used to have "admin random ok-loading ok-stale" Changes: 1. Dropped "random" in RESET OBJECT Used to have "read-only random" Changes: 1. Dropped "random" in ENCODING and REFCOUNT SCRIPT Used to have "may-replicate no-script" Changes: 1. Dropped "may-replicate" in all except FLUSH and LOAD CLIENT Used to have "admin no-script random ok-loading ok-stale" Changes: 1. Dropped "random" in all except INFO and LIST 2. Dropped "admin" in ID, TRACKING, CACHING, GETREDIR, INFO, SETNAME, GETNAME, and REPLY STRALGO No changes. PUBSUB No changes. CLUSTER Changes: 1. Dropped "admin in countkeysinslots, getkeysinslot, info, nodes, keyslot, myid, and slots SENTINEL No changes. (note that DEBUG also fits, but we decided not to convert it since it's for debugging and anyway undocumented) ## New sub-command This commit adds another element to the per-command output of COMMAND, describing the list of subcommands, if any (in the same structure as "regular" commands) Also, it adds a new subcommand: ``` COMMAND LIST [FILTERBY (MODULE <module-name>|ACLCAT <cat>|PATTERN <pattern>)] ``` which returns a set of all commands (unless filters), but excluding subcommands. ## Module API A new module API, RM_CreateSubcommand, was added, in order to allow module writer to define subcommands ## ACL changes: 1. Now, that each subcommand is actually a command, each has its own ACL id. 2. The old mechanism of allowed_subcommands is redundant (blocking/allowing a subcommand is the same as blocking/allowing a regular command), but we had to keep it, to support the widespread usage of allowed_subcommands to block commands with certain args, that aren't subcommands (e.g. "-select +select|0"). 3. I have renamed allowed_subcommands to allowed_firstargs to emphasize the difference. 4. Because subcommands are commands in ACL too, you can now use "-" to block subcommands (e.g. "+client -client|kill"), which wasn't possible in the past. 5. It is also possible to use the allowed_firstargs mechanism with subcommand. For example: `+config -config|set +config|set|loglevel` will block all CONFIG SET except for setting the log level. 6. All of the ACL changes above required some amount of refactoring. ## Misc 1. There are two approaches: Either each subcommand has its own function or all subcommands use the same function, determining what to do according to argv[0]. For now, I took the former approaches only with CONFIG and COMMAND, while other commands use the latter approach (for smaller blamelog diff). 2. Deleted memoryGetKeys: It is no longer needed because MEMORY USAGE now uses the "range" key spec. 4. Bugfix: GETNAME was missing from CLIENT's help message. 5. Sentinel and Redis now use the same table, with the same function pointer. Some commands have a different implementation in Sentinel, so we redirect them (these are ROLE, PUBLISH, and INFO). 6. Command stats now show the stats per subcommand (e.g. instead of stats just for "config" you will have stats for "config|set", "config|get", etc.) 7. It is now possible to use COMMAND directly on subcommands: COMMAND INFO CONFIG|GET (The pipeline syntax was inspired from ACL, and can be used in functions lookupCommandBySds and lookupCommandByCString) 8. STRALGO is now a container command (has "help") ## Breaking changes: 1. Command stats now show the stats per subcommand (see (5) above)
2021-10-20 10:52:57 +02:00
/* Array of subcommands (may be NULL) */
struct redisCommand *subcommands;
Auto-generate the command table from JSON files (#9656) Delete the hardcoded command table and replace it with an auto-generated table, based on a JSON file that describes the commands (each command must have a JSON file). These JSON files are the SSOT of everything there is to know about Redis commands, and it is reflected fully in COMMAND INFO. These JSON files are used to generate commands.c (using a python script), which is then committed to the repo and compiled. The purpose is: * Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic. * drop the dependency between Redis-user and the commands.json in redis-doc. * delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be done in a separate PR) * redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release artifacts should be a large JSON, containing all the information about all of the commands, which will be generated from COMMAND's reply) * the byproduct of this is: * module commands will be able to provide that info and possibly be more of a first-class citizens * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info. ### Interface changes #### COMMAND INFO's reply change (and arg-less COMMAND) Before this commit the reply at index 7 contained the key-specs list and reply at index 8 contained the sub-commands list (Both unreleased). Now, reply at index 7 is a map of: - summary - short command description - since - debut version - group - command group - complexity - complexity string - doc-flags - flags used for documentation (e.g. "deprecated") - deprecated-since - if deprecated, from which version? - replaced-by - if deprecated, which command replaced it? - history - a list of (version, what-changed) tuples - hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876 - arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments) - key-specs - an array of keys specs (already in unstable, just changed location) - subcommands - a list of sub-commands (already in unstable, just changed location) - reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845) more details on these can be found in https://github.com/redis/redis-doc/pull/1697 only the first three fields are mandatory #### API changes (unreleased API obviously) now they take RedisModuleCommand opaque pointer instead of looking up the command by name - RM_CreateSubcommand - RM_AddCommandKeySpec - RM_SetCommandKeySpecBeginSearchIndex - RM_SetCommandKeySpecBeginSearchKeyword - RM_SetCommandKeySpecFindKeysRange - RM_SetCommandKeySpecFindKeysKeynum Currently, we did not add module API to provide additional information about their commands because we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944. ### Somehow related changes 1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command will be documented with M|KM|FT|MI and can take both lowercase and uppercase ### Unrelated changes 1. Bugfix: no_madaory_keys was absent in COMMAND's reply 2. expose CMD_MODULE as "module" via COMMAND 3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags) Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 20:23:15 +01:00
/* Array of arguments (may be NULL) */
struct redisCommandArg *args;
A better approach for COMMAND INFO for movablekeys commands (#8324) Fix #7297 The problem: Today, there is no way for a client library or app to know the key name indexes for commands such as ZUNIONSTORE/EVAL and others with "numkeys", since COMMAND INFO returns no useful info for them. For cluster-aware redis clients, this requires to 'patch' the client library code specifically for each of these commands or to resolve each execution of these commands with COMMAND GETKEYS. The solution: Introducing key specs other than the legacy "range" (first,last,step) The 8th element of the command info array, if exists, holds an array of key specs. The array may be empty, which indicates the command doesn't take any key arguments or may contain one or more key-specs, each one may leads to the discovery of 0 or more key arguments. A client library that doesn't support this key-spec feature will keep using the first,last,step and movablekeys flag which will obviously remain unchanged. A client that supports this key-specs feature needs only to look at the key-specs array. If it finds an unrecognized spec, it must resort to using COMMAND GETKEYS if it wishes to get all key name arguments, but if all it needs is one key in order to know which cluster node to use, then maybe another spec (if the command has several) can supply that, and there's no need to use GETKEYS. Each spec is an array of arguments, first one is the spec name, the second is an array of flags, and the third is an array containing details about the spec (specific meaning for each spec type) The initial flags we support are "read" and "write" indicating if the keys that this key-spec finds are used for read or for write. clients should ignore any unfamiliar flags. In order to easily find the positions of keys in a given array of args we introduce keys specs. There are two logical steps of key specs: 1. `start_search`: Given an array of args, indicate where we should start searching for keys 2. `find_keys`: Given the output of start_search and an array of args, indicate all possible indices of keys. ### start_search step specs - `index`: specify an argument index explicitly - `index`: 0 based index (1 means the first command argument) - `keyword`: specify a string to match in `argv`. We should start searching for keys just after the keyword appears. - `keyword`: the string to search for - `start_search`: an index from which to start the keyword search (can be negative, which means to search from the end) Examples: - `SET` has start_search of type `index` with value `1` - `XREAD` has start_search of type `keyword` with value `[“STREAMS”,1]` - `MIGRATE` has start_search of type `keyword` with value `[“KEYS”,-2]` ### find_keys step specs - `range`: specify `[count, step, limit]`. - `lastkey`: index of the last key. relative to the index returned from begin_search. -1 indicating till the last argument, -2 one before the last - `step`: how many args should we skip after finding a key, in order to find the next one - `limit`: if count is -1, we use limit to stop the search by a factor. 0 and 1 mean no limit. 2 means ½ of the remaining args, 3 means ⅓, and so on. - “keynum”: specify `[keynum_index, first_key_index, step]`. - `keynum_index`: is relative to the return of the `start_search` spec. - `first_key_index`: is relative to `keynum_index`. - `step`: how many args should we skip after finding a key, in order to find the next one Examples: - `SET` has `range` of `[0,1,0]` - `MSET` has `range` of `[-1,2,0]` - `XREAD` has `range` of `[-1,1,2]` - `ZUNION` has `start_search` of type `index` with value `1` and `find_keys` of type `keynum` with value `[0,1,1]` - `AI.DAGRUN` has `start_search` of type `keyword` with value `[“LOAD“,1]` and `find_keys` of type `keynum` with value `[0,1,1]` (see https://oss.redislabs.com/redisai/master/commands/#aidagrun) Note: this solution is not perfect as the module writers can come up with anything, but at least we will be able to find the key args of the vast majority of commands. If one of the above specs can’t describe the key positions, the module writer can always fall back to the `getkeys-api` option. Some keys cannot be found easily (`KEYS` in `MIGRATE`: Imagine the argument for `AUTH` is the string “KEYS” - we will start searching in the wrong index). The guarantee is that the specs may be incomplete (`incomplete` will be specified in the spec to denote that) but we never report false information (assuming the command syntax is correct). For `MIGRATE` we start searching from the end - `startfrom=-1` - and if one of the keys is actually called "keys" we will report only a subset of all keys - hence the `incomplete` flag. Some `incomplete` specs can be completely empty (i.e. UNKNOWN begin_search) which should tell the client that COMMAND GETKEYS (or any other way to get the keys) must be used (Example: For `SORT` there is no way to describe the STORE keyword spec, as the word "store" can appear anywhere in the command). We will expose these key specs in the `COMMAND` command so that clients can learn, on startup, where the keys are for all commands instead of holding hardcoded tables or use `COMMAND GETKEYS` in runtime. Comments: 1. Redis doesn't internally use the new specs, they are only used for COMMAND output. 2. In order to support the current COMMAND INFO format (reply array indices 4, 5, 6) we created a synthetic range, called legacy_range, that, if possible, is built according to the new specs. 3. Redis currently uses only getkeys_proc or the legacy_range to get the keys indices (in COMMAND GETKEYS for example). "incomplete" specs: the command we have issues with are MIGRATE, STRALGO, and SORT for MIGRATE, because the token KEYS, if exists, must be the last token, we can search in reverse. it one of the keys is actually the string "keys" will return just a subset of the keys (hence, it's "incomplete") for SORT and STRALGO we can use this heuristic (the keys can be anywhere in the command) and therefore we added a key spec that is both "incomplete" and of "unknown type" if a client encounters an "incomplete" spec it means that it must find a different way (either COMMAND GETKEYS or have its own parser) to retrieve the keys. please note that all commands, apart from the three mentioned above, have "complete" key specs
2021-09-15 10:10:29 +02:00
Move doc metadata from COMMAND to COMMAND DOCS (#10056) Syntax: `COMMAND DOCS [<command name> ...]` Background: Apparently old version of hiredis (and thus also redis-cli) can't support more than 7 levels of multi-bulk nesting. The solution is to move all the doc related metadata from COMMAND to a new COMMAND DOCS sub-command. The new DOCS sub-command returns a map of commands (not an array like in COMMAND), And the same goes for the `subcommands` field inside it (also contains a map) Besides that, the remaining new fields of COMMAND (hints, key-specs, and sub-commands), are placed in the outer array rather than a nested map. this was done mainly for consistency with the old format. Other changes: --- * Allow COMMAND INFO with no arguments, which returns all commands, so that we can some day deprecated the plain COMMAND (no args) * Reduce the amount of deferred replies from both COMMAND and COMMAND DOCS, especially in the inner loops, since these create many small reply objects, which lead to many small write syscalls and many small TCP packets. To make this easier, when populating the command table, we count the history, args, and hints so we later know their size in advance. Additionally, the movablekeys flag was moved into the flags register. * Update generate-commands-json.py to take the data from both command, it now executes redis-cli directly, instead of taking input from stdin. * Sub-commands in both COMMAND (and COMMAND INFO), and also COMMAND DOCS, show their full name. i.e. CONFIG * GET will be shown as `config|get` rather than just `get`. This will be visible both when asking for `COMMAND INFO config` and COMMAND INFO config|get`, but is especially important for the later. i.e. imagine someone doing `COMMAND INFO slowlog|get config|get` not being able to distinguish between the two items in the array response.
2022-01-11 16:16:16 +01:00
/* Runtime populated data */
/* What keys should be loaded in background when calling this command? */
long long microseconds, calls, rejected_calls, failed_calls;
int id; /* Command ID. This is a progressive ID starting from 0 that
is assigned at runtime, and is used in order to check
ACLs. A connection is able to execute a given command if
the user associated to the connection has this command
bit set in the bitmap of allowed commands. */
sds fullname; /* A SDS string representing the command fullname. */
Added INFO LATENCYSTATS section: latency by percentile distribution/latency by cumulative distribution of latencies (#9462) # Short description The Redis extended latency stats track per command latencies and enables: - exporting the per-command percentile distribution via the `INFO LATENCYSTATS` command. **( percentile distribution is not mergeable between cluster nodes ).** - exporting the per-command cumulative latency distributions via the `LATENCY HISTOGRAM` command. Using the cumulative distribution of latencies we can merge several stats from different cluster nodes to calculate aggregate metrics . By default, the extended latency monitoring is enabled since the overhead of keeping track of the command latency is very small. If you don't want to track extended latency metrics, you can easily disable it at runtime using the command: - `CONFIG SET latency-tracking no` By default, the exported latency percentiles are the p50, p99, and p999. You can alter them at runtime using the command: - `CONFIG SET latency-tracking-info-percentiles "0.0 50.0 100.0"` ## Some details: - The total size per histogram should sit around 40 KiB. We only allocate those 40KiB when a command was called for the first time. - With regards to the WRITE overhead As seen below, there is no measurable overhead on the achievable ops/sec or full latency spectrum on the client. Including also the measured redis-benchmark for unstable vs this branch. - We track from 1 nanosecond to 1 second ( everything above 1 second is considered +Inf ) ## `INFO LATENCYSTATS` exposition format - Format: `latency_percentiles_usec_<CMDNAME>:p0=XX,p50....` ## `LATENCY HISTOGRAM [command ...]` exposition format Return a cumulative distribution of latencies in the format of a histogram for the specified command names. The histogram is composed of a map of time buckets: - Each representing a latency range, between 1 nanosecond and roughly 1 second. - Each bucket covers twice the previous bucket's range. - Empty buckets are not printed. - Everything above 1 sec is considered +Inf. - At max there will be log2(1000000000)=30 buckets We reply a map for each command in the format: `<command name> : { `calls`: <total command calls> , `histogram` : { <bucket 1> : latency , < bucket 2> : latency, ... } }` Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-05 13:01:05 +01:00
struct hdr_histogram* latency_histogram; /*points to the command latency command histogram (unit of time nanosecond) */
A better approach for COMMAND INFO for movablekeys commands (#8324) Fix #7297 The problem: Today, there is no way for a client library or app to know the key name indexes for commands such as ZUNIONSTORE/EVAL and others with "numkeys", since COMMAND INFO returns no useful info for them. For cluster-aware redis clients, this requires to 'patch' the client library code specifically for each of these commands or to resolve each execution of these commands with COMMAND GETKEYS. The solution: Introducing key specs other than the legacy "range" (first,last,step) The 8th element of the command info array, if exists, holds an array of key specs. The array may be empty, which indicates the command doesn't take any key arguments or may contain one or more key-specs, each one may leads to the discovery of 0 or more key arguments. A client library that doesn't support this key-spec feature will keep using the first,last,step and movablekeys flag which will obviously remain unchanged. A client that supports this key-specs feature needs only to look at the key-specs array. If it finds an unrecognized spec, it must resort to using COMMAND GETKEYS if it wishes to get all key name arguments, but if all it needs is one key in order to know which cluster node to use, then maybe another spec (if the command has several) can supply that, and there's no need to use GETKEYS. Each spec is an array of arguments, first one is the spec name, the second is an array of flags, and the third is an array containing details about the spec (specific meaning for each spec type) The initial flags we support are "read" and "write" indicating if the keys that this key-spec finds are used for read or for write. clients should ignore any unfamiliar flags. In order to easily find the positions of keys in a given array of args we introduce keys specs. There are two logical steps of key specs: 1. `start_search`: Given an array of args, indicate where we should start searching for keys 2. `find_keys`: Given the output of start_search and an array of args, indicate all possible indices of keys. ### start_search step specs - `index`: specify an argument index explicitly - `index`: 0 based index (1 means the first command argument) - `keyword`: specify a string to match in `argv`. We should start searching for keys just after the keyword appears. - `keyword`: the string to search for - `start_search`: an index from which to start the keyword search (can be negative, which means to search from the end) Examples: - `SET` has start_search of type `index` with value `1` - `XREAD` has start_search of type `keyword` with value `[“STREAMS”,1]` - `MIGRATE` has start_search of type `keyword` with value `[“KEYS”,-2]` ### find_keys step specs - `range`: specify `[count, step, limit]`. - `lastkey`: index of the last key. relative to the index returned from begin_search. -1 indicating till the last argument, -2 one before the last - `step`: how many args should we skip after finding a key, in order to find the next one - `limit`: if count is -1, we use limit to stop the search by a factor. 0 and 1 mean no limit. 2 means ½ of the remaining args, 3 means ⅓, and so on. - “keynum”: specify `[keynum_index, first_key_index, step]`. - `keynum_index`: is relative to the return of the `start_search` spec. - `first_key_index`: is relative to `keynum_index`. - `step`: how many args should we skip after finding a key, in order to find the next one Examples: - `SET` has `range` of `[0,1,0]` - `MSET` has `range` of `[-1,2,0]` - `XREAD` has `range` of `[-1,1,2]` - `ZUNION` has `start_search` of type `index` with value `1` and `find_keys` of type `keynum` with value `[0,1,1]` - `AI.DAGRUN` has `start_search` of type `keyword` with value `[“LOAD“,1]` and `find_keys` of type `keynum` with value `[0,1,1]` (see https://oss.redislabs.com/redisai/master/commands/#aidagrun) Note: this solution is not perfect as the module writers can come up with anything, but at least we will be able to find the key args of the vast majority of commands. If one of the above specs can’t describe the key positions, the module writer can always fall back to the `getkeys-api` option. Some keys cannot be found easily (`KEYS` in `MIGRATE`: Imagine the argument for `AUTH` is the string “KEYS” - we will start searching in the wrong index). The guarantee is that the specs may be incomplete (`incomplete` will be specified in the spec to denote that) but we never report false information (assuming the command syntax is correct). For `MIGRATE` we start searching from the end - `startfrom=-1` - and if one of the keys is actually called "keys" we will report only a subset of all keys - hence the `incomplete` flag. Some `incomplete` specs can be completely empty (i.e. UNKNOWN begin_search) which should tell the client that COMMAND GETKEYS (or any other way to get the keys) must be used (Example: For `SORT` there is no way to describe the STORE keyword spec, as the word "store" can appear anywhere in the command). We will expose these key specs in the `COMMAND` command so that clients can learn, on startup, where the keys are for all commands instead of holding hardcoded tables or use `COMMAND GETKEYS` in runtime. Comments: 1. Redis doesn't internally use the new specs, they are only used for COMMAND output. 2. In order to support the current COMMAND INFO format (reply array indices 4, 5, 6) we created a synthetic range, called legacy_range, that, if possible, is built according to the new specs. 3. Redis currently uses only getkeys_proc or the legacy_range to get the keys indices (in COMMAND GETKEYS for example). "incomplete" specs: the command we have issues with are MIGRATE, STRALGO, and SORT for MIGRATE, because the token KEYS, if exists, must be the last token, we can search in reverse. it one of the keys is actually the string "keys" will return just a subset of the keys (hence, it's "incomplete") for SORT and STRALGO we can use this heuristic (the keys can be anywhere in the command) and therefore we added a key spec that is both "incomplete" and of "unknown type" if a client encounters an "incomplete" spec it means that it must find a different way (either COMMAND GETKEYS or have its own parser) to retrieve the keys. please note that all commands, apart from the three mentioned above, have "complete" key specs
2021-09-15 10:10:29 +02:00
keySpec *key_specs;
keySpec legacy_range_key_spec; /* The legacy (first,last,step) key spec is
* still maintained (if applicable) so that
* we can still support the reply format of
* COMMAND INFO and COMMAND GETKEYS */
Move doc metadata from COMMAND to COMMAND DOCS (#10056) Syntax: `COMMAND DOCS [<command name> ...]` Background: Apparently old version of hiredis (and thus also redis-cli) can't support more than 7 levels of multi-bulk nesting. The solution is to move all the doc related metadata from COMMAND to a new COMMAND DOCS sub-command. The new DOCS sub-command returns a map of commands (not an array like in COMMAND), And the same goes for the `subcommands` field inside it (also contains a map) Besides that, the remaining new fields of COMMAND (hints, key-specs, and sub-commands), are placed in the outer array rather than a nested map. this was done mainly for consistency with the old format. Other changes: --- * Allow COMMAND INFO with no arguments, which returns all commands, so that we can some day deprecated the plain COMMAND (no args) * Reduce the amount of deferred replies from both COMMAND and COMMAND DOCS, especially in the inner loops, since these create many small reply objects, which lead to many small write syscalls and many small TCP packets. To make this easier, when populating the command table, we count the history, args, and hints so we later know their size in advance. Additionally, the movablekeys flag was moved into the flags register. * Update generate-commands-json.py to take the data from both command, it now executes redis-cli directly, instead of taking input from stdin. * Sub-commands in both COMMAND (and COMMAND INFO), and also COMMAND DOCS, show their full name. i.e. CONFIG * GET will be shown as `config|get` rather than just `get`. This will be visible both when asking for `COMMAND INFO config` and COMMAND INFO config|get`, but is especially important for the later. i.e. imagine someone doing `COMMAND INFO slowlog|get config|get` not being able to distinguish between the two items in the array response.
2022-01-11 16:16:16 +01:00
int num_args;
int num_history;
Add command tips to COMMAND DOCS (#10104) Adding command tips (see https://redis.io/topics/command-tips) to commands. Breaking changes: 1. Removed the "random" and "sort_for_script" flags. They are now command tips. (this isn't affecting redis behavior since #9812, but could affect some client applications that's relying on COMMAND command flags) Summary of changes: 1. add BLOCKING flag (new flag) for all commands that could block. The ACL category with the same name is now implicit. 2. move RANDOM flag to a `nondeterministic_output` tip 3. move SORT_FOR_SCRIPT flag to `nondeterministic_output_order` tip 3. add REQUEST_POLICY and RESPONSE_POLICY where appropriate as documented in the tips 4. deprecate (ignore) the `random` flag for RM_CreateCommand Other notes: 1. Proxies need to send `RANDOMKEY` to all shards and then select one key randomly. The other option is to pick a random shard and transfer `RANDOMKEY `to it, but that scheme fails if this specific shard is empty 2. Remove CMD_RANDOM from `XACK` (i.e. XACK does not have RANDOM_OUTPUT) It was added in 9e4fb96ca12476b1c7468b143efca86b478bfb4a, I guess by mistake. Also from `(P)EXPIRETIME` (new command, was flagged "random" by mistake). 3. Add `nondeterministic_output` to `OBJECT ENCODING` (for the same reason `XTRIM` has it: the reply may differ depending on the internal representation in memory) 4. RANDOM on `HGETALL` was wrong (there due to a limitation of the old script sorting logic), now it's `nondeterministic_output_order` 5. Unrelated: Hide CMD_PROTECTED from COMMAND
2022-01-20 10:32:11 +01:00
int num_tips;
A better approach for COMMAND INFO for movablekeys commands (#8324) Fix #7297 The problem: Today, there is no way for a client library or app to know the key name indexes for commands such as ZUNIONSTORE/EVAL and others with "numkeys", since COMMAND INFO returns no useful info for them. For cluster-aware redis clients, this requires to 'patch' the client library code specifically for each of these commands or to resolve each execution of these commands with COMMAND GETKEYS. The solution: Introducing key specs other than the legacy "range" (first,last,step) The 8th element of the command info array, if exists, holds an array of key specs. The array may be empty, which indicates the command doesn't take any key arguments or may contain one or more key-specs, each one may leads to the discovery of 0 or more key arguments. A client library that doesn't support this key-spec feature will keep using the first,last,step and movablekeys flag which will obviously remain unchanged. A client that supports this key-specs feature needs only to look at the key-specs array. If it finds an unrecognized spec, it must resort to using COMMAND GETKEYS if it wishes to get all key name arguments, but if all it needs is one key in order to know which cluster node to use, then maybe another spec (if the command has several) can supply that, and there's no need to use GETKEYS. Each spec is an array of arguments, first one is the spec name, the second is an array of flags, and the third is an array containing details about the spec (specific meaning for each spec type) The initial flags we support are "read" and "write" indicating if the keys that this key-spec finds are used for read or for write. clients should ignore any unfamiliar flags. In order to easily find the positions of keys in a given array of args we introduce keys specs. There are two logical steps of key specs: 1. `start_search`: Given an array of args, indicate where we should start searching for keys 2. `find_keys`: Given the output of start_search and an array of args, indicate all possible indices of keys. ### start_search step specs - `index`: specify an argument index explicitly - `index`: 0 based index (1 means the first command argument) - `keyword`: specify a string to match in `argv`. We should start searching for keys just after the keyword appears. - `keyword`: the string to search for - `start_search`: an index from which to start the keyword search (can be negative, which means to search from the end) Examples: - `SET` has start_search of type `index` with value `1` - `XREAD` has start_search of type `keyword` with value `[“STREAMS”,1]` - `MIGRATE` has start_search of type `keyword` with value `[“KEYS”,-2]` ### find_keys step specs - `range`: specify `[count, step, limit]`. - `lastkey`: index of the last key. relative to the index returned from begin_search. -1 indicating till the last argument, -2 one before the last - `step`: how many args should we skip after finding a key, in order to find the next one - `limit`: if count is -1, we use limit to stop the search by a factor. 0 and 1 mean no limit. 2 means ½ of the remaining args, 3 means ⅓, and so on. - “keynum”: specify `[keynum_index, first_key_index, step]`. - `keynum_index`: is relative to the return of the `start_search` spec. - `first_key_index`: is relative to `keynum_index`. - `step`: how many args should we skip after finding a key, in order to find the next one Examples: - `SET` has `range` of `[0,1,0]` - `MSET` has `range` of `[-1,2,0]` - `XREAD` has `range` of `[-1,1,2]` - `ZUNION` has `start_search` of type `index` with value `1` and `find_keys` of type `keynum` with value `[0,1,1]` - `AI.DAGRUN` has `start_search` of type `keyword` with value `[“LOAD“,1]` and `find_keys` of type `keynum` with value `[0,1,1]` (see https://oss.redislabs.com/redisai/master/commands/#aidagrun) Note: this solution is not perfect as the module writers can come up with anything, but at least we will be able to find the key args of the vast majority of commands. If one of the above specs can’t describe the key positions, the module writer can always fall back to the `getkeys-api` option. Some keys cannot be found easily (`KEYS` in `MIGRATE`: Imagine the argument for `AUTH` is the string “KEYS” - we will start searching in the wrong index). The guarantee is that the specs may be incomplete (`incomplete` will be specified in the spec to denote that) but we never report false information (assuming the command syntax is correct). For `MIGRATE` we start searching from the end - `startfrom=-1` - and if one of the keys is actually called "keys" we will report only a subset of all keys - hence the `incomplete` flag. Some `incomplete` specs can be completely empty (i.e. UNKNOWN begin_search) which should tell the client that COMMAND GETKEYS (or any other way to get the keys) must be used (Example: For `SORT` there is no way to describe the STORE keyword spec, as the word "store" can appear anywhere in the command). We will expose these key specs in the `COMMAND` command so that clients can learn, on startup, where the keys are for all commands instead of holding hardcoded tables or use `COMMAND GETKEYS` in runtime. Comments: 1. Redis doesn't internally use the new specs, they are only used for COMMAND output. 2. In order to support the current COMMAND INFO format (reply array indices 4, 5, 6) we created a synthetic range, called legacy_range, that, if possible, is built according to the new specs. 3. Redis currently uses only getkeys_proc or the legacy_range to get the keys indices (in COMMAND GETKEYS for example). "incomplete" specs: the command we have issues with are MIGRATE, STRALGO, and SORT for MIGRATE, because the token KEYS, if exists, must be the last token, we can search in reverse. it one of the keys is actually the string "keys" will return just a subset of the keys (hence, it's "incomplete") for SORT and STRALGO we can use this heuristic (the keys can be anywhere in the command) and therefore we added a key spec that is both "incomplete" and of "unknown type" if a client encounters an "incomplete" spec it means that it must find a different way (either COMMAND GETKEYS or have its own parser) to retrieve the keys. please note that all commands, apart from the three mentioned above, have "complete" key specs
2021-09-15 10:10:29 +02:00
int key_specs_num;
int key_specs_max;
dict *subcommands_dict; /* A dictionary that holds the subcommands, the key is the subcommand sds name
* (not the fullname), and the value is the redisCommand structure pointer. */
Treat subcommands as commands (#9504) ## Intro The purpose is to allow having different flags/ACL categories for subcommands (Example: CONFIG GET is ok-loading but CONFIG SET isn't) We create a small command table for every command that has subcommands and each subcommand has its own flags, etc. (same as a "regular" command) This commit also unites the Redis and the Sentinel command tables ## Affected commands CONFIG Used to have "admin ok-loading ok-stale no-script" Changes: 1. Dropped "ok-loading" in all except GET (this doesn't change behavior since there were checks in the code doing that) XINFO Used to have "read-only random" Changes: 1. Dropped "random" in all except CONSUMERS XGROUP Used to have "write use-memory" Changes: 1. Dropped "use-memory" in all except CREATE and CREATECONSUMER COMMAND No changes. MEMORY Used to have "random read-only" Changes: 1. Dropped "random" in PURGE and USAGE ACL Used to have "admin no-script ok-loading ok-stale" Changes: 1. Dropped "admin" in WHOAMI, GENPASS, and CAT LATENCY No changes. MODULE No changes. SLOWLOG Used to have "admin random ok-loading ok-stale" Changes: 1. Dropped "random" in RESET OBJECT Used to have "read-only random" Changes: 1. Dropped "random" in ENCODING and REFCOUNT SCRIPT Used to have "may-replicate no-script" Changes: 1. Dropped "may-replicate" in all except FLUSH and LOAD CLIENT Used to have "admin no-script random ok-loading ok-stale" Changes: 1. Dropped "random" in all except INFO and LIST 2. Dropped "admin" in ID, TRACKING, CACHING, GETREDIR, INFO, SETNAME, GETNAME, and REPLY STRALGO No changes. PUBSUB No changes. CLUSTER Changes: 1. Dropped "admin in countkeysinslots, getkeysinslot, info, nodes, keyslot, myid, and slots SENTINEL No changes. (note that DEBUG also fits, but we decided not to convert it since it's for debugging and anyway undocumented) ## New sub-command This commit adds another element to the per-command output of COMMAND, describing the list of subcommands, if any (in the same structure as "regular" commands) Also, it adds a new subcommand: ``` COMMAND LIST [FILTERBY (MODULE <module-name>|ACLCAT <cat>|PATTERN <pattern>)] ``` which returns a set of all commands (unless filters), but excluding subcommands. ## Module API A new module API, RM_CreateSubcommand, was added, in order to allow module writer to define subcommands ## ACL changes: 1. Now, that each subcommand is actually a command, each has its own ACL id. 2. The old mechanism of allowed_subcommands is redundant (blocking/allowing a subcommand is the same as blocking/allowing a regular command), but we had to keep it, to support the widespread usage of allowed_subcommands to block commands with certain args, that aren't subcommands (e.g. "-select +select|0"). 3. I have renamed allowed_subcommands to allowed_firstargs to emphasize the difference. 4. Because subcommands are commands in ACL too, you can now use "-" to block subcommands (e.g. "+client -client|kill"), which wasn't possible in the past. 5. It is also possible to use the allowed_firstargs mechanism with subcommand. For example: `+config -config|set +config|set|loglevel` will block all CONFIG SET except for setting the log level. 6. All of the ACL changes above required some amount of refactoring. ## Misc 1. There are two approaches: Either each subcommand has its own function or all subcommands use the same function, determining what to do according to argv[0]. For now, I took the former approaches only with CONFIG and COMMAND, while other commands use the latter approach (for smaller blamelog diff). 2. Deleted memoryGetKeys: It is no longer needed because MEMORY USAGE now uses the "range" key spec. 4. Bugfix: GETNAME was missing from CLIENT's help message. 5. Sentinel and Redis now use the same table, with the same function pointer. Some commands have a different implementation in Sentinel, so we redirect them (these are ROLE, PUBLISH, and INFO). 6. Command stats now show the stats per subcommand (e.g. instead of stats just for "config" you will have stats for "config|set", "config|get", etc.) 7. It is now possible to use COMMAND directly on subcommands: COMMAND INFO CONFIG|GET (The pipeline syntax was inspired from ACL, and can be used in functions lookupCommandBySds and lookupCommandByCString) 8. STRALGO is now a container command (has "help") ## Breaking changes: 1. Command stats now show the stats per subcommand (see (5) above)
2021-10-20 10:52:57 +02:00
struct redisCommand *parent;
};
struct redisError {
long long count;
};
struct redisFunctionSym {
char *name;
unsigned long pointer;
};
typedef struct _redisSortObject {
robj *obj;
union {
double score;
robj *cmpobj;
} u;
} redisSortObject;
typedef struct _redisSortOperation {
int type;
robj *pattern;
} redisSortOperation;
/* Structure to hold list iteration abstraction. */
typedef struct {
robj *subject;
unsigned char encoding;
unsigned char direction; /* Iteration direction */
quicklistIter *iter;
} listTypeIterator;
/* Structure for an entry while iterating over a list. */
typedef struct {
listTypeIterator *li;
quicklistEntry entry; /* Entry in quicklist */
} listTypeEntry;
/* Structure to hold set iteration abstraction. */
typedef struct {
robj *subject;
int encoding;
int ii; /* intset iterator */
dictIterator *di;
} setTypeIterator;
2013-01-16 18:00:20 +01:00
/* Structure to hold hash iteration abstraction. Note that iteration over
* hashes involves both fields and values. Because it is possible that
* not both are required, store pointers in the iterator to avoid
* unnecessary memory allocation for fields/values. */
typedef struct {
2012-01-03 07:14:10 +01:00
robj *subject;
int encoding;
2012-01-03 07:14:10 +01:00
unsigned char *fptr, *vptr;
dictIterator *di;
dictEntry *de;
} hashTypeIterator;
#include "stream.h" /* Stream data type header file. */
#define OBJ_HASH_KEY 1
#define OBJ_HASH_VALUE 2
Client eviction (#8687) ### Description A mechanism for disconnecting clients when the sum of all connected clients is above a configured limit. This prevents eviction or OOM caused by accumulated used memory between all clients. It's a complimentary mechanism to the `client-output-buffer-limit` mechanism which takes into account not only a single client and not only output buffers but rather all memory used by all clients. #### Design The general design is as following: * We track memory usage of each client, taking into account all memory used by the client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date after reading from the socket, after processing commands and after writing to the socket. * Based on the used memory we sort all clients into buckets. Each bucket contains all clients using up up to x2 memory of the clients in the bucket below it. For example up to 1m clients, up to 2m clients, up to 4m clients, ... * Before processing a command and before sleep we check if we're over the configured limit. If we are we start disconnecting clients from larger buckets downwards until we're under the limit. #### Config `maxmemory-clients` max memory all clients are allowed to consume, above this threshold we disconnect clients. This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%` would mean 10% of `maxmemory`). #### Important code changes * During the development I encountered yet more situations where our io-threads access global vars. And needed to fix them. I also had to handle keeps the clients sorted into the memory buckets (which are global) while their memory usage changes in the io-thread. To achieve this I decided to simplify how we check if we're in an io-thread and make it much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking if the client is in an io-thread (it wasn't used for anything else) and just used the global `io_threads_op` variable the same way to check during writes. * I optimized the cleanup of the client from the `clients_pending_read` list on client freeing. We now store a pointer in the `client` struct to this list so we don't need to search in it (`pending_read_list_node`). * Added `evicted_clients` stat to `INFO` command. * Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the client eviction mechanism. Added corrosponding 'e' flag in the client info string. * Added `multi-mem` field in the client info string to show how much memory is used up by buffered multi commands. * Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and channels (partially), tracking prefixes (partially). * CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so clients will be disconnected between processing different clients and not only before sleep. This new function can be used in the future for work we want to do outside the command processing loop but don't want to wait for all clients to be processed before we get to it. Specifically I wanted to handle output-buffer-limit related closing before we process client eviction in case the two race with each other. * Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction buckets. * Each client now holds a pointer to the client eviction memory usage bucket it belongs to and listNode to itself in that bucket for quick removal. * Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value indicating no io-threading is currently being executed. * In order to track memory used by each clients in real-time we can't rely on updating these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()` (used to be `clientsCronTrackClientsMemUsage()`) after command processing, after writing data to pubsub clients, after writing the output buffer and after reading from the socket (and maybe other places too). The function is written to be fast. * Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before processing a command (before performing oom-checks and key-eviction). * All clients memory usage buckets are grouped as follows: * All clients using less than 64k. * 64K..128K * 128K..256K * ... * 2G..4G * All clients using 4g and up. * Added client-eviction.tcl with a bunch of tests for the new mechanism. * Extended maxmemory.tcl to test the interaction between maxmemory and maxmemory-clients settings. * Added an option to flag a numeric configuration variable as a "percent", this means that if we encounter a '%' after the number in the config file (or config set command) we consider it as valid. Such a number is store internally as a negative value. This way an integer value can be interpreted as either a percent (negative) or absolute value (positive). This is useful for example if some numeric configuration can optionally be set to a percentage of something else. Co-authored-by: Oran Agra <oran@redislabs.com>
2021-09-23 13:02:16 +02:00
#define IO_THREADS_OP_IDLE 0
#define IO_THREADS_OP_READ 1
#define IO_THREADS_OP_WRITE 2
extern int io_threads_op;
/*-----------------------------------------------------------------------------
* Extern declarations
*----------------------------------------------------------------------------*/
extern struct redisServer server;
extern struct sharedObjectsStruct shared;
extern dictType objectKeyPointerValueDictType;
extern dictType objectKeyHeapPointerValueDictType;
extern dictType setDictType;
extern dictType BenchmarkDictType;
extern dictType zsetDictType;
extern dictType dbDictType;
extern double R_Zero, R_PosInf, R_NegInf, R_Nan;
2012-03-27 18:18:57 +02:00
extern dictType hashDictType;
extern dictType stringSetDictType;
extern dictType externalStringType;
extern dictType sdsHashDictType;
Limit the main db and expires dictionaries to expand (#7954) As we know, redis may reject user's requests or evict some keys if used memory is over maxmemory. Dictionaries expanding may make things worse, some big dictionaries, such as main db and expires dict, may eat huge memory at once for allocating a new big hash table and be far more than maxmemory after expanding. There are related issues: #4213 #4583 More details, when expand dict in redis, we will allocate a new big ht[1] that generally is double of ht[0], The size of ht[1] will be very big if ht[0] already is big. For db dict, if we have more than 64 million keys, we need to cost 1GB for ht[1] when dict expands. If the sum of used memory and new hash table of dict needed exceeds maxmemory, we shouldn't allow the dict to expand. Because, if we enable keys eviction, we still couldn't add much more keys after eviction and rehashing, what's worse, redis will keep less keys when redis only remains a little memory for storing new hash table instead of users' data. Moreover users can't write data in redis if disable keys eviction. What this commit changed ? Add a new member function expandAllowed for dict type, it provide a way for caller to allow expand or not. We expose two parameters for this function: more memory needed for expanding and dict current load factor, users can implement a function to make a decision by them. For main db dict and expires dict type, these dictionaries may be very big and cost huge memory for expanding, so we implement a judgement function: we can stop dict to expand provisionally if used memory will be over maxmemory after dict expands, but to guarantee the performance of redis, we still allow dict to expand if dict load factor exceeds the safe load factor. Add test cases to verify we don't allow main db to expand when left memory is not enough, so that avoid keys eviction. Other changes: For new hash table size when expand. Before this commit, the size is that double used of dict and later _dictNextPower. Actually we aim to control a dict load factor between 0.5 and 1.0. Now we replace *2 with +1, since the first check is that used >= size, the outcome of before will usually be the same as _dictNextPower(used+1). The only case where it'll differ is when dict_can_resize is false during fork, so that later the _dictNextPower(used*2) will cause the dict to jump to *4 (i.e. _dictNextPower(1025*2) will return 4096). Fix rehash test cases due to changing algorithm of new hash table size when expand.
2020-12-06 10:53:04 +01:00
extern dictType dbExpiresDictType;
2016-03-06 13:44:24 +01:00
extern dictType modulesDictType;
extern dictType sdsReplyDictType;
extern dict *modules;
/*-----------------------------------------------------------------------------
* Functions prototypes
*----------------------------------------------------------------------------*/
/* Command metadata */
A better approach for COMMAND INFO for movablekeys commands (#8324) Fix #7297 The problem: Today, there is no way for a client library or app to know the key name indexes for commands such as ZUNIONSTORE/EVAL and others with "numkeys", since COMMAND INFO returns no useful info for them. For cluster-aware redis clients, this requires to 'patch' the client library code specifically for each of these commands or to resolve each execution of these commands with COMMAND GETKEYS. The solution: Introducing key specs other than the legacy "range" (first,last,step) The 8th element of the command info array, if exists, holds an array of key specs. The array may be empty, which indicates the command doesn't take any key arguments or may contain one or more key-specs, each one may leads to the discovery of 0 or more key arguments. A client library that doesn't support this key-spec feature will keep using the first,last,step and movablekeys flag which will obviously remain unchanged. A client that supports this key-specs feature needs only to look at the key-specs array. If it finds an unrecognized spec, it must resort to using COMMAND GETKEYS if it wishes to get all key name arguments, but if all it needs is one key in order to know which cluster node to use, then maybe another spec (if the command has several) can supply that, and there's no need to use GETKEYS. Each spec is an array of arguments, first one is the spec name, the second is an array of flags, and the third is an array containing details about the spec (specific meaning for each spec type) The initial flags we support are "read" and "write" indicating if the keys that this key-spec finds are used for read or for write. clients should ignore any unfamiliar flags. In order to easily find the positions of keys in a given array of args we introduce keys specs. There are two logical steps of key specs: 1. `start_search`: Given an array of args, indicate where we should start searching for keys 2. `find_keys`: Given the output of start_search and an array of args, indicate all possible indices of keys. ### start_search step specs - `index`: specify an argument index explicitly - `index`: 0 based index (1 means the first command argument) - `keyword`: specify a string to match in `argv`. We should start searching for keys just after the keyword appears. - `keyword`: the string to search for - `start_search`: an index from which to start the keyword search (can be negative, which means to search from the end) Examples: - `SET` has start_search of type `index` with value `1` - `XREAD` has start_search of type `keyword` with value `[“STREAMS”,1]` - `MIGRATE` has start_search of type `keyword` with value `[“KEYS”,-2]` ### find_keys step specs - `range`: specify `[count, step, limit]`. - `lastkey`: index of the last key. relative to the index returned from begin_search. -1 indicating till the last argument, -2 one before the last - `step`: how many args should we skip after finding a key, in order to find the next one - `limit`: if count is -1, we use limit to stop the search by a factor. 0 and 1 mean no limit. 2 means ½ of the remaining args, 3 means ⅓, and so on. - “keynum”: specify `[keynum_index, first_key_index, step]`. - `keynum_index`: is relative to the return of the `start_search` spec. - `first_key_index`: is relative to `keynum_index`. - `step`: how many args should we skip after finding a key, in order to find the next one Examples: - `SET` has `range` of `[0,1,0]` - `MSET` has `range` of `[-1,2,0]` - `XREAD` has `range` of `[-1,1,2]` - `ZUNION` has `start_search` of type `index` with value `1` and `find_keys` of type `keynum` with value `[0,1,1]` - `AI.DAGRUN` has `start_search` of type `keyword` with value `[“LOAD“,1]` and `find_keys` of type `keynum` with value `[0,1,1]` (see https://oss.redislabs.com/redisai/master/commands/#aidagrun) Note: this solution is not perfect as the module writers can come up with anything, but at least we will be able to find the key args of the vast majority of commands. If one of the above specs can’t describe the key positions, the module writer can always fall back to the `getkeys-api` option. Some keys cannot be found easily (`KEYS` in `MIGRATE`: Imagine the argument for `AUTH` is the string “KEYS” - we will start searching in the wrong index). The guarantee is that the specs may be incomplete (`incomplete` will be specified in the spec to denote that) but we never report false information (assuming the command syntax is correct). For `MIGRATE` we start searching from the end - `startfrom=-1` - and if one of the keys is actually called "keys" we will report only a subset of all keys - hence the `incomplete` flag. Some `incomplete` specs can be completely empty (i.e. UNKNOWN begin_search) which should tell the client that COMMAND GETKEYS (or any other way to get the keys) must be used (Example: For `SORT` there is no way to describe the STORE keyword spec, as the word "store" can appear anywhere in the command). We will expose these key specs in the `COMMAND` command so that clients can learn, on startup, where the keys are for all commands instead of holding hardcoded tables or use `COMMAND GETKEYS` in runtime. Comments: 1. Redis doesn't internally use the new specs, they are only used for COMMAND output. 2. In order to support the current COMMAND INFO format (reply array indices 4, 5, 6) we created a synthetic range, called legacy_range, that, if possible, is built according to the new specs. 3. Redis currently uses only getkeys_proc or the legacy_range to get the keys indices (in COMMAND GETKEYS for example). "incomplete" specs: the command we have issues with are MIGRATE, STRALGO, and SORT for MIGRATE, because the token KEYS, if exists, must be the last token, we can search in reverse. it one of the keys is actually the string "keys" will return just a subset of the keys (hence, it's "incomplete") for SORT and STRALGO we can use this heuristic (the keys can be anywhere in the command) and therefore we added a key spec that is both "incomplete" and of "unknown type" if a client encounters an "incomplete" spec it means that it must find a different way (either COMMAND GETKEYS or have its own parser) to retrieve the keys. please note that all commands, apart from the three mentioned above, have "complete" key specs
2021-09-15 10:10:29 +02:00
void populateCommandLegacyRangeSpec(struct redisCommand *c);
int populateArgsStructure(struct redisCommandArg *args);
A better approach for COMMAND INFO for movablekeys commands (#8324) Fix #7297 The problem: Today, there is no way for a client library or app to know the key name indexes for commands such as ZUNIONSTORE/EVAL and others with "numkeys", since COMMAND INFO returns no useful info for them. For cluster-aware redis clients, this requires to 'patch' the client library code specifically for each of these commands or to resolve each execution of these commands with COMMAND GETKEYS. The solution: Introducing key specs other than the legacy "range" (first,last,step) The 8th element of the command info array, if exists, holds an array of key specs. The array may be empty, which indicates the command doesn't take any key arguments or may contain one or more key-specs, each one may leads to the discovery of 0 or more key arguments. A client library that doesn't support this key-spec feature will keep using the first,last,step and movablekeys flag which will obviously remain unchanged. A client that supports this key-specs feature needs only to look at the key-specs array. If it finds an unrecognized spec, it must resort to using COMMAND GETKEYS if it wishes to get all key name arguments, but if all it needs is one key in order to know which cluster node to use, then maybe another spec (if the command has several) can supply that, and there's no need to use GETKEYS. Each spec is an array of arguments, first one is the spec name, the second is an array of flags, and the third is an array containing details about the spec (specific meaning for each spec type) The initial flags we support are "read" and "write" indicating if the keys that this key-spec finds are used for read or for write. clients should ignore any unfamiliar flags. In order to easily find the positions of keys in a given array of args we introduce keys specs. There are two logical steps of key specs: 1. `start_search`: Given an array of args, indicate where we should start searching for keys 2. `find_keys`: Given the output of start_search and an array of args, indicate all possible indices of keys. ### start_search step specs - `index`: specify an argument index explicitly - `index`: 0 based index (1 means the first command argument) - `keyword`: specify a string to match in `argv`. We should start searching for keys just after the keyword appears. - `keyword`: the string to search for - `start_search`: an index from which to start the keyword search (can be negative, which means to search from the end) Examples: - `SET` has start_search of type `index` with value `1` - `XREAD` has start_search of type `keyword` with value `[“STREAMS”,1]` - `MIGRATE` has start_search of type `keyword` with value `[“KEYS”,-2]` ### find_keys step specs - `range`: specify `[count, step, limit]`. - `lastkey`: index of the last key. relative to the index returned from begin_search. -1 indicating till the last argument, -2 one before the last - `step`: how many args should we skip after finding a key, in order to find the next one - `limit`: if count is -1, we use limit to stop the search by a factor. 0 and 1 mean no limit. 2 means ½ of the remaining args, 3 means ⅓, and so on. - “keynum”: specify `[keynum_index, first_key_index, step]`. - `keynum_index`: is relative to the return of the `start_search` spec. - `first_key_index`: is relative to `keynum_index`. - `step`: how many args should we skip after finding a key, in order to find the next one Examples: - `SET` has `range` of `[0,1,0]` - `MSET` has `range` of `[-1,2,0]` - `XREAD` has `range` of `[-1,1,2]` - `ZUNION` has `start_search` of type `index` with value `1` and `find_keys` of type `keynum` with value `[0,1,1]` - `AI.DAGRUN` has `start_search` of type `keyword` with value `[“LOAD“,1]` and `find_keys` of type `keynum` with value `[0,1,1]` (see https://oss.redislabs.com/redisai/master/commands/#aidagrun) Note: this solution is not perfect as the module writers can come up with anything, but at least we will be able to find the key args of the vast majority of commands. If one of the above specs can’t describe the key positions, the module writer can always fall back to the `getkeys-api` option. Some keys cannot be found easily (`KEYS` in `MIGRATE`: Imagine the argument for `AUTH` is the string “KEYS” - we will start searching in the wrong index). The guarantee is that the specs may be incomplete (`incomplete` will be specified in the spec to denote that) but we never report false information (assuming the command syntax is correct). For `MIGRATE` we start searching from the end - `startfrom=-1` - and if one of the keys is actually called "keys" we will report only a subset of all keys - hence the `incomplete` flag. Some `incomplete` specs can be completely empty (i.e. UNKNOWN begin_search) which should tell the client that COMMAND GETKEYS (or any other way to get the keys) must be used (Example: For `SORT` there is no way to describe the STORE keyword spec, as the word "store" can appear anywhere in the command). We will expose these key specs in the `COMMAND` command so that clients can learn, on startup, where the keys are for all commands instead of holding hardcoded tables or use `COMMAND GETKEYS` in runtime. Comments: 1. Redis doesn't internally use the new specs, they are only used for COMMAND output. 2. In order to support the current COMMAND INFO format (reply array indices 4, 5, 6) we created a synthetic range, called legacy_range, that, if possible, is built according to the new specs. 3. Redis currently uses only getkeys_proc or the legacy_range to get the keys indices (in COMMAND GETKEYS for example). "incomplete" specs: the command we have issues with are MIGRATE, STRALGO, and SORT for MIGRATE, because the token KEYS, if exists, must be the last token, we can search in reverse. it one of the keys is actually the string "keys" will return just a subset of the keys (hence, it's "incomplete") for SORT and STRALGO we can use this heuristic (the keys can be anywhere in the command) and therefore we added a key spec that is both "incomplete" and of "unknown type" if a client encounters an "incomplete" spec it means that it must find a different way (either COMMAND GETKEYS or have its own parser) to retrieve the keys. please note that all commands, apart from the three mentioned above, have "complete" key specs
2021-09-15 10:10:29 +02:00
2016-03-06 13:44:24 +01:00
/* Modules */
void moduleInitModulesSystem(void);
void moduleInitModulesSystemLast(void);
2022-01-11 18:00:56 +01:00
void modulesCron(void);
Module Configurations (#10285) This feature adds the ability to add four different types (Bool, Numeric, String, Enum) of configurations to a module to be accessed via the redis config file, and the CONFIG command. **Configuration Names**: We impose a restriction that a module configuration always starts with the module name and contains a '.' followed by the config name. If a module passes "config1" as the name to a register function, it will be registered as MODULENAME.config1. **Configuration Persistence**: Module Configurations exist only as long as a module is loaded. If a module is unloaded, the configurations are removed. There is now also a minimal core API for removal of standardConfig objects from configs by name. **Get and Set Callbacks**: Storage of config values is owned by the module that registers them, and provides callbacks for Redis to access and manipulate the values. This is exposed through a GET and SET callback. The get callback returns a typed value of the config to redis. The callback takes the name of the configuration, and also a privdata pointer. Note that these only take the CONFIGNAME portion of the config, not the entire MODULENAME.CONFIGNAME. ``` typedef RedisModuleString * (*RedisModuleConfigGetStringFunc)(const char *name, void *privdata); typedef long long (*RedisModuleConfigGetNumericFunc)(const char *name, void *privdata); typedef int (*RedisModuleConfigGetBoolFunc)(const char *name, void *privdata); typedef int (*RedisModuleConfigGetEnumFunc)(const char *name, void *privdata); ``` Configs must also must specify a set callback, i.e. what to do on a CONFIG SET XYZ 123 or when loading configurations from cli/.conf file matching these typedefs. *name* is again just the CONFIGNAME portion, *val* is the parsed value from the core, *privdata* is the registration time privdata pointer, and *err* is for providing errors to a client. ``` typedef int (*RedisModuleConfigSetStringFunc)(const char *name, RedisModuleString *val, void *privdata, RedisModuleString **err); typedef int (*RedisModuleConfigSetNumericFunc)(const char *name, long long val, void *privdata, RedisModuleString **err); typedef int (*RedisModuleConfigSetBoolFunc)(const char *name, int val, void *privdata, RedisModuleString **err); typedef int (*RedisModuleConfigSetEnumFunc)(const char *name, int val, void *privdata, RedisModuleString **err); ``` Modules can also specify an optional apply callback that will be called after value(s) have been set via CONFIG SET: ``` typedef int (*RedisModuleConfigApplyFunc)(RedisModuleCtx *ctx, void *privdata, RedisModuleString **err); ``` **Flags:** We expose 7 new flags to the module, which are used as part of the config registration. ``` #define REDISMODULE_CONFIG_MODIFIABLE 0 /* This is the default for a module config. */ #define REDISMODULE_CONFIG_IMMUTABLE (1ULL<<0) /* Can this value only be set at startup? */ #define REDISMODULE_CONFIG_SENSITIVE (1ULL<<1) /* Does this value contain sensitive information */ #define REDISMODULE_CONFIG_HIDDEN (1ULL<<4) /* This config is hidden in `config get <pattern>` (used for tests/debugging) */ #define REDISMODULE_CONFIG_PROTECTED (1ULL<<5) /* Becomes immutable if enable-protected-configs is enabled. */ #define REDISMODULE_CONFIG_DENY_LOADING (1ULL<<6) /* This config is forbidden during loading. */ /* Numeric Specific Configs */ #define REDISMODULE_CONFIG_MEMORY (1ULL<<7) /* Indicates if this value can be set as a memory value */ ``` **Module Registration APIs**: ``` int (*RedisModule_RegisterBoolConfig)(RedisModuleCtx *ctx, char *name, int default_val, unsigned int flags, RedisModuleConfigGetBoolFunc getfn, RedisModuleConfigSetBoolFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_RegisterNumericConfig)(RedisModuleCtx *ctx, const char *name, long long default_val, unsigned int flags, long long min, long long max, RedisModuleConfigGetNumericFunc getfn, RedisModuleConfigSetNumericFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_RegisterStringConfig)(RedisModuleCtx *ctx, const char *name, const char *default_val, unsigned int flags, RedisModuleConfigGetStringFunc getfn, RedisModuleConfigSetStringFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_RegisterEnumConfig)(RedisModuleCtx *ctx, const char *name, int default_val, unsigned int flags, const char **enum_values, const int *int_values, int num_enum_vals, RedisModuleConfigGetEnumFunc getfn, RedisModuleConfigSetEnumFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_LoadConfigs)(RedisModuleCtx *ctx); ``` The module name will be auto appended along with a "." to the front of the name of the config. **What RM_Register[...]Config does**: A RedisModule struct now keeps a list of ModuleConfig objects which look like: ``` typedef struct ModuleConfig { sds name; /* Name of config without the module name appended to the front */ void *privdata; /* Optional data passed into the module config callbacks */ union get_fn { /* The get callback specificed by the module */ RedisModuleConfigGetStringFunc get_string; RedisModuleConfigGetNumericFunc get_numeric; RedisModuleConfigGetBoolFunc get_bool; RedisModuleConfigGetEnumFunc get_enum; } get_fn; union set_fn { /* The set callback specified by the module */ RedisModuleConfigSetStringFunc set_string; RedisModuleConfigSetNumericFunc set_numeric; RedisModuleConfigSetBoolFunc set_bool; RedisModuleConfigSetEnumFunc set_enum; } set_fn; RedisModuleConfigApplyFunc apply_fn; RedisModule *module; } ModuleConfig; ``` It also registers a standardConfig in the configs array, with a pointer to the ModuleConfig object associated with it. **What happens on a CONFIG GET/SET MODULENAME.MODULECONFIG:** For CONFIG SET, we do the same parsing as is done in config.c and pass that as the argument to the module set callback. For CONFIG GET, we call the module get callback and return that value to config.c to return to a client. **CONFIG REWRITE**: Starting up a server with module configurations in a .conf file but no module load directive will fail. The flip side is also true, specifying a module load and a bunch of module configurations will load those configurations in using the module defined set callbacks on a RM_LoadConfigs call. Configs being rewritten works the same way as it does for standard configs, as the module has the ability to specify a default value. If a module is unloaded with configurations specified in the .conf file those configurations will be commented out from the .conf file on the next config rewrite. **RM_LoadConfigs:** `RedisModule_LoadConfigs(RedisModuleCtx *ctx);` This last API is used to make configs available within the onLoad() after they have been registered. The expected usage is that a module will register all of its configs, then call LoadConfigs to trigger all of the set callbacks, and then can error out if any of them were malformed. LoadConfigs will attempt to set all configs registered to either a .conf file argument/loadex argument or their default value if an argument is not specified. **LoadConfigs is a required function if configs are registered. ** Also note that LoadConfigs **does not** call the apply callbacks, but a module can do that directly after the LoadConfigs call. **New Command: MODULE LOADEX [CONFIG NAME VALUE] [ARGS ...]:** This command provides the ability to provide startup context information to a module. LOADEX stands for "load extended" similar to GETEX. Note that provided config names need the full MODULENAME.MODULECONFIG name. Any additional arguments a module might want are intended to be specified after ARGS. Everything after ARGS is passed to onLoad as RedisModuleString **argv. Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Madelyn Olson <matolson@amazon.com> Co-authored-by: sundb <sundbcn@gmail.com> Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com> Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
2022-03-30 14:47:06 +02:00
int moduleLoad(const char *path, void **argv, int argc, int is_loadex);
int moduleUnload(sds name);
2016-03-06 13:44:24 +01:00
void moduleLoadFromQueue(void);
int moduleGetCommandKeysViaAPI(struct redisCommand *cmd, robj **argv, int argc, getKeysResult *result);
int moduleGetCommandChannelsViaAPI(struct redisCommand *cmd, robj **argv, int argc, getKeysResult *result);
moduleType *moduleTypeLookupModuleByID(uint64_t id);
void moduleTypeNameByID(char *name, uint64_t moduleid);
const char *moduleTypeModuleName(moduleType *mt);
const char *moduleNameFromCommand(struct redisCommand *cmd);
void moduleFreeContext(struct RedisModuleCtx *ctx);
void unblockClientFromModule(client *c);
void moduleHandleBlockedClients(void);
void moduleBlockedClientTimedOut(client *c);
Add event loop support to the module API (#10001) Modules can now register sockets/pipe to the Redis main thread event loop and do network operations asynchronously. Previously, modules had to maintain an event loop and another thread for asynchronous network operations. Also, if a module is calling API functions after doing some network operations, it had to synchronize its event loop thread's access with Redis main thread by locking the GIL, causing contention on the lock. After this commit, no synchronization is needed as module can operate in Redis main thread context. So, this commit may improve the performance for some use cases. Added three functions to the module API: * RedisModule_EventLoopAdd(int fd, int mask, RedisModuleEventLoopFunc func, void *user_data) * RedisModule_EventLoopDel(int fd, int mask) * RedisModule_EventLoopAddOneShot(RedisModuleEventLoopOneShotFunc func, void *user_data) - This function can be called from other threads to trigger callback on Redis main thread. Callback will be triggered only once. If Redis main thread is sleeping, this call will wake up the Redis main thread. Event loop callbacks are called by Redis main thread after locking the GIL. Inside callbacks, modules can operate as if they are holding the GIL. Added REDISMODULE_EVENT_EVENTLOOP event with two subevents: * REDISMODULE_SUBEVENT_EVENTLOOP_BEFORE_SLEEP * REDISMODULE_SUBEVENT_EVENTLOOP_AFTER_SLEEP These events are for modules that want to participate in the before and after sleep action. e.g It might be useful to implement batching : Read data from the network, write all to a file in one go on BEFORE_SLEEP event.
2022-01-18 12:10:07 +01:00
void modulePipeReadable(aeEventLoop *el, int fd, void *privdata, int mask);
size_t moduleCount(void);
void moduleAcquireGIL(void);
int moduleTryAcquireGIL(void);
void moduleReleaseGIL(void);
void moduleNotifyKeyspaceEvent(int type, const char *event, robj *key, int dbid);
2018-02-23 15:19:37 +01:00
void moduleCallCommandFilters(client *c);
void ModuleForkDoneHandler(int exitcode, int bysignal);
int TerminateModuleForkChild(int child_pid, int wait);
ssize_t rdbSaveModulesAux(rio *rdb, int when);
int moduleAllDatatypesHandleErrors();
Replica keep serving data during repl-diskless-load=swapdb for better availability (#9323) For diskless replication in swapdb mode, considering we already spend replica memory having a backup of current db to restore in case of failure, we can have the following benefits by instead swapping database only in case we succeeded in transferring db from master: - Avoid `LOADING` response during failed and successful synchronization for cases where the replica is already up and running with data. - Faster total time of diskless replication, because now we're moving from Transfer + Flush + Load time to Transfer + Load only. Flushing the tempDb is done asynchronously after swapping. - This could be implemented also for disk replication with similar benefits if consumers are willing to spend the extra memory usage. General notes: - The concept of `backupDb` becomes `tempDb` for clarity. - Async loading mode will only kick in if the replica is syncing from a master that has the same repl-id the one it had before. i.e. the data it's getting belongs to a different time of the same timeline. - New property in INFO: `async_loading` to differentiate from the blocking loading - Slot to Key mapping is now a field of `redisDb` as it's more natural to access it from both server.db and the tempDb that is passed around. - Because this is affecting replicas only, we assume that if they are not readonly and write commands during replication, they are lost after SYNC same way as before, but we're still denying CONFIG SET here anyways to avoid complications. Considerations for review: - We have many cases where server.loading flag is used and even though I tried my best, there may be cases where async_loading should be checked as well and cases where it shouldn't (would require very good understanding of whole code) - Several places that had different behavior depending on the loading flag where actually meant to just handle commands coming from the AOF client differently than ones coming from real clients, changed to check CLIENT_ID_AOF instead. **Additional for Release Notes** - Bugfix - server.dirty was not incremented for any kind of diskless replication, as effect it wouldn't contribute on triggering next database SAVE - New flag for RM_GetContextFlags module API: REDISMODULE_CTX_FLAGS_ASYNC_LOADING - Deprecated RedisModuleEvent_ReplBackup. Starting from Redis 7.0, we don't fire this event. Instead, we have the new RedisModuleEvent_ReplAsyncLoad holding 3 sub-events: STARTED, ABORTED and COMPLETED. - New module flag REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD for RedisModule_SetModuleOptions to allow modules to declare they support the diskless replication with async loading (when absent, we fall back to disk-based loading). Co-authored-by: Eduardo Semprebon <edus@saxobank.com> Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-04 09:46:50 +01:00
int moduleAllModulesHandleReplAsyncLoad();
sds modulesCollectInfo(sds info, dict *sections_dict, int for_crash_report, int sections);
void moduleFireServerEvent(uint64_t eid, int subid, void *data);
void processModuleLoadingProgressEvent(int is_aof);
int moduleTryServeClientBlockedOnKey(client *c, robj *key);
void moduleUnblockClient(client *c);
int moduleBlockedClientMayTimeout(client *c);
int moduleClientIsBlockedOnKeys(client *c);
void moduleNotifyUserChanged(client *c);
void moduleNotifyKeyUnlink(robj *key, robj *val, int dbid);
size_t moduleGetFreeEffort(robj *key, robj *val, int dbid);
size_t moduleGetMemUsage(robj *key, robj *val, size_t sample_size, int dbid);
robj *moduleTypeDupOrReply(client *c, robj *fromkey, robj *tokey, int todb, robj *value);
int moduleDefragValue(robj *key, robj *obj, long *defragged, int dbid);
int moduleLateDefrag(robj *key, robj *value, unsigned long *cursor, long long endtime, long long *defragged, int dbid);
long moduleDefragGlobals(void);
Treat subcommands as commands (#9504) ## Intro The purpose is to allow having different flags/ACL categories for subcommands (Example: CONFIG GET is ok-loading but CONFIG SET isn't) We create a small command table for every command that has subcommands and each subcommand has its own flags, etc. (same as a "regular" command) This commit also unites the Redis and the Sentinel command tables ## Affected commands CONFIG Used to have "admin ok-loading ok-stale no-script" Changes: 1. Dropped "ok-loading" in all except GET (this doesn't change behavior since there were checks in the code doing that) XINFO Used to have "read-only random" Changes: 1. Dropped "random" in all except CONSUMERS XGROUP Used to have "write use-memory" Changes: 1. Dropped "use-memory" in all except CREATE and CREATECONSUMER COMMAND No changes. MEMORY Used to have "random read-only" Changes: 1. Dropped "random" in PURGE and USAGE ACL Used to have "admin no-script ok-loading ok-stale" Changes: 1. Dropped "admin" in WHOAMI, GENPASS, and CAT LATENCY No changes. MODULE No changes. SLOWLOG Used to have "admin random ok-loading ok-stale" Changes: 1. Dropped "random" in RESET OBJECT Used to have "read-only random" Changes: 1. Dropped "random" in ENCODING and REFCOUNT SCRIPT Used to have "may-replicate no-script" Changes: 1. Dropped "may-replicate" in all except FLUSH and LOAD CLIENT Used to have "admin no-script random ok-loading ok-stale" Changes: 1. Dropped "random" in all except INFO and LIST 2. Dropped "admin" in ID, TRACKING, CACHING, GETREDIR, INFO, SETNAME, GETNAME, and REPLY STRALGO No changes. PUBSUB No changes. CLUSTER Changes: 1. Dropped "admin in countkeysinslots, getkeysinslot, info, nodes, keyslot, myid, and slots SENTINEL No changes. (note that DEBUG also fits, but we decided not to convert it since it's for debugging and anyway undocumented) ## New sub-command This commit adds another element to the per-command output of COMMAND, describing the list of subcommands, if any (in the same structure as "regular" commands) Also, it adds a new subcommand: ``` COMMAND LIST [FILTERBY (MODULE <module-name>|ACLCAT <cat>|PATTERN <pattern>)] ``` which returns a set of all commands (unless filters), but excluding subcommands. ## Module API A new module API, RM_CreateSubcommand, was added, in order to allow module writer to define subcommands ## ACL changes: 1. Now, that each subcommand is actually a command, each has its own ACL id. 2. The old mechanism of allowed_subcommands is redundant (blocking/allowing a subcommand is the same as blocking/allowing a regular command), but we had to keep it, to support the widespread usage of allowed_subcommands to block commands with certain args, that aren't subcommands (e.g. "-select +select|0"). 3. I have renamed allowed_subcommands to allowed_firstargs to emphasize the difference. 4. Because subcommands are commands in ACL too, you can now use "-" to block subcommands (e.g. "+client -client|kill"), which wasn't possible in the past. 5. It is also possible to use the allowed_firstargs mechanism with subcommand. For example: `+config -config|set +config|set|loglevel` will block all CONFIG SET except for setting the log level. 6. All of the ACL changes above required some amount of refactoring. ## Misc 1. There are two approaches: Either each subcommand has its own function or all subcommands use the same function, determining what to do according to argv[0]. For now, I took the former approaches only with CONFIG and COMMAND, while other commands use the latter approach (for smaller blamelog diff). 2. Deleted memoryGetKeys: It is no longer needed because MEMORY USAGE now uses the "range" key spec. 4. Bugfix: GETNAME was missing from CLIENT's help message. 5. Sentinel and Redis now use the same table, with the same function pointer. Some commands have a different implementation in Sentinel, so we redirect them (these are ROLE, PUBLISH, and INFO). 6. Command stats now show the stats per subcommand (e.g. instead of stats just for "config" you will have stats for "config|set", "config|get", etc.) 7. It is now possible to use COMMAND directly on subcommands: COMMAND INFO CONFIG|GET (The pipeline syntax was inspired from ACL, and can be used in functions lookupCommandBySds and lookupCommandByCString) 8. STRALGO is now a container command (has "help") ## Breaking changes: 1. Command stats now show the stats per subcommand (see (5) above)
2021-10-20 10:52:57 +02:00
void *moduleGetHandleByName(char *modulename);
int moduleIsModuleCommand(void *module_handle, struct redisCommand *cmd);
2016-03-06 13:44:24 +01:00
/* Utils */
long long ustime(void);
long long mstime(void);
void getRandomHexChars(char *p, size_t len);
void getRandomBytes(unsigned char *p, size_t len);
uint64_t crc64(uint64_t crc, const unsigned char *s, uint64_t l);
void exitFromChild(int retcode);
long long redisPopcount(void *s, long count);
int redisSetProcTitle(char *title);
int validateProcTitleTemplate(const char *template);
int redisCommunicateSystemd(const char *sd_notify_msg);
void redisSetCpuAffinity(const char *cpulist);
Sort out the mess around Lua error messages and error stats (#10329) This PR fix 2 issues on Lua scripting: * Server error reply statistics (some errors were counted twice). * Error code and error strings returning from scripts (error code was missing / misplaced). ## Statistics a Lua script user is considered part of the user application, a sophisticated transaction, so we want to count an error even if handled silently by the script, but when it is propagated outwards from the script we don't wanna count it twice. on the other hand, if the script decides to throw an error on its own (using `redis.error_reply`), we wanna count that too. Besides, we do count the `calls` in command statistics for the commands the script calls, we we should certainly also count `failed_calls`. So when a simple `eval "return redis.call('set','x','y')" 0` fails, it should count the failed call to both SET and EVAL, but the `errorstats` and `total_error_replies` should be counted only once. The PR changes the error object that is raised on errors. Instead of raising a simple Lua string, Redis will raise a Lua table in the following format: ``` { err='<error message (including error code)>', source='<User source file name>', line='<line where the error happned>', ignore_error_stats_update=true/false, } ``` The `luaPushError` function was modified to construct the new error table as describe above. The `luaRaiseError` was renamed to `luaError` and is now simply called `lua_error` to raise the table on the top of the Lua stack as the error object. The reason is that since its functionality is changed, in case some Redis branch / fork uses it, it's better to have a compilation error than a bug. The `source` and `line` fields are enriched by the error handler (if possible) and the `ignore_error_stats_update` is optional and if its not present then the default value is `false`. If `ignore_error_stats_update` is true, the error will not be counted on the error stats. When parsing Redis call reply, each error is translated to a Lua table on the format describe above and the `ignore_error_stats_update` field is set to `true` so we will not count errors twice (we counted this error when we invoke the command). The changes in this PR might have been considered as a breaking change for users that used Lua `pcall` function. Before, the error was a string and now its a table. To keep backward comparability the PR override the `pcall` implementation and extract the error message from the error table and return it. Example of the error stats update: ``` 127.0.0.1:6379> lpush l 1 (integer) 2 127.0.0.1:6379> eval "return redis.call('get', 'l')" 0 (error) WRONGTYPE Operation against a key holding the wrong kind of value. script: e471b73f1ef44774987ab00bdf51f21fd9f7974a, on @user_script:1. 127.0.0.1:6379> info Errorstats # Errorstats errorstat_WRONGTYPE:count=1 127.0.0.1:6379> info commandstats # Commandstats cmdstat_eval:calls=1,usec=341,usec_per_call=341.00,rejected_calls=0,failed_calls=1 cmdstat_info:calls=1,usec=35,usec_per_call=35.00,rejected_calls=0,failed_calls=0 cmdstat_lpush:calls=1,usec=14,usec_per_call=14.00,rejected_calls=0,failed_calls=0 cmdstat_get:calls=1,usec=10,usec_per_call=10.00,rejected_calls=0,failed_calls=1 ``` ## error message We can now construct the error message (sent as a reply to the user) from the error table, so this solves issues where the error message was malformed and the error code appeared in the middle of the error message: ```diff 127.0.0.1:6379> eval "return redis.call('set','x','y')" 0 -(error) ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: OOM command not allowed when used memory > 'maxmemory'. +(error) OOM command not allowed when used memory > 'maxmemory' @user_script:1. Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479) ``` ```diff 127.0.0.1:6379> eval "redis.call('get', 'l')" 0 -(error) ERR Error running script (call to f_8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1): @user_script:1: WRONGTYPE Operation against a key holding the wrong kind of value +(error) WRONGTYPE Operation against a key holding the wrong kind of value script: 8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1, on @user_script:1. ``` Notica that `redis.pcall` was not change: ``` 127.0.0.1:6379> eval "return redis.pcall('get', 'l')" 0 (error) WRONGTYPE Operation against a key holding the wrong kind of value ``` ## other notes Notice that Some commands (like GEOADD) changes the cmd variable on the client stats so we can not count on it to update the command stats. In order to be able to update those stats correctly we needed to promote `realcmd` variable to be located on the client struct. Tests was added and modified to verify the changes. Related PR's: #10279, #10218, #10278, #10309 Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-27 12:40:57 +01:00
/* afterErrorReply flags */
#define ERR_REPLY_FLAG_NO_STATS_UPDATE (1ULL<<0) /* Indicating that we should not update
error stats after sending error reply */
/* networking.c -- Networking and Client related operations */
client *createClient(connection *conn);
void freeClient(client *c);
void freeClientAsync(client *c);
void logInvalidUseAndFreeClientAsync(client *c, const char *fmt, ...);
Client eviction (#8687) ### Description A mechanism for disconnecting clients when the sum of all connected clients is above a configured limit. This prevents eviction or OOM caused by accumulated used memory between all clients. It's a complimentary mechanism to the `client-output-buffer-limit` mechanism which takes into account not only a single client and not only output buffers but rather all memory used by all clients. #### Design The general design is as following: * We track memory usage of each client, taking into account all memory used by the client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date after reading from the socket, after processing commands and after writing to the socket. * Based on the used memory we sort all clients into buckets. Each bucket contains all clients using up up to x2 memory of the clients in the bucket below it. For example up to 1m clients, up to 2m clients, up to 4m clients, ... * Before processing a command and before sleep we check if we're over the configured limit. If we are we start disconnecting clients from larger buckets downwards until we're under the limit. #### Config `maxmemory-clients` max memory all clients are allowed to consume, above this threshold we disconnect clients. This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%` would mean 10% of `maxmemory`). #### Important code changes * During the development I encountered yet more situations where our io-threads access global vars. And needed to fix them. I also had to handle keeps the clients sorted into the memory buckets (which are global) while their memory usage changes in the io-thread. To achieve this I decided to simplify how we check if we're in an io-thread and make it much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking if the client is in an io-thread (it wasn't used for anything else) and just used the global `io_threads_op` variable the same way to check during writes. * I optimized the cleanup of the client from the `clients_pending_read` list on client freeing. We now store a pointer in the `client` struct to this list so we don't need to search in it (`pending_read_list_node`). * Added `evicted_clients` stat to `INFO` command. * Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the client eviction mechanism. Added corrosponding 'e' flag in the client info string. * Added `multi-mem` field in the client info string to show how much memory is used up by buffered multi commands. * Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and channels (partially), tracking prefixes (partially). * CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so clients will be disconnected between processing different clients and not only before sleep. This new function can be used in the future for work we want to do outside the command processing loop but don't want to wait for all clients to be processed before we get to it. Specifically I wanted to handle output-buffer-limit related closing before we process client eviction in case the two race with each other. * Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction buckets. * Each client now holds a pointer to the client eviction memory usage bucket it belongs to and listNode to itself in that bucket for quick removal. * Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value indicating no io-threading is currently being executed. * In order to track memory used by each clients in real-time we can't rely on updating these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()` (used to be `clientsCronTrackClientsMemUsage()`) after command processing, after writing data to pubsub clients, after writing the output buffer and after reading from the socket (and maybe other places too). The function is written to be fast. * Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before processing a command (before performing oom-checks and key-eviction). * All clients memory usage buckets are grouped as follows: * All clients using less than 64k. * 64K..128K * 128K..256K * ... * 2G..4G * All clients using 4g and up. * Added client-eviction.tcl with a bunch of tests for the new mechanism. * Extended maxmemory.tcl to test the interaction between maxmemory and maxmemory-clients settings. * Added an option to flag a numeric configuration variable as a "percent", this means that if we encounter a '%' after the number in the config file (or config set command) we consider it as valid. Such a number is store internally as a negative value. This way an integer value can be interpreted as either a percent (negative) or absolute value (positive). This is useful for example if some numeric configuration can optionally be set to a percentage of something else. Co-authored-by: Oran Agra <oran@redislabs.com>
2021-09-23 13:02:16 +02:00
int beforeNextClient(client *c);
2022-01-11 18:00:56 +01:00
void clearClientConnectionState(client *c);
void resetClient(client *c);
void freeClientOriginalArgv(client *c);
acl check api for functions and eval (#10220) Changes: 1. Adds the `redis.acl_check_cmd()` api to lua scripts. It can be used to check if the current user has permissions to execute a given command. The new function receives the command to check as an argument exactly like `redis.call()` receives the command to execute as an argument. 2. In the PR I unified the code used to convert lua arguments to redis argv arguments from both the new `redis.acl_check_cmd()` API and the `redis.[p]call()` API. This cleans up potential duplicate code. 3. While doing the refactoring in 2 I noticed there's an optimization to reduce allocation calls when parsing lua arguments into an `argv` array in the `redis.[p]call()` implementation. These optimizations were introduced years ago in 48c49c485155ba9e4a7851fd1644c171674c6f0f and 4f686555ce962e6632235d824512ea8fdeda003c. It is unclear why this was added. The original commit message claims a 4% performance increase which I couldn't recreate and might not be worth it even if it did recreate. This PR removes that optimization. Following are details of the benchmark I did that couldn't reveal any performance improvements due to this optimization: ``` benchmark 1: src/redis-benchmark -P 500 -n 10000000 eval 'return redis.call("ping")' 0 benchmark 2: src/redis-benchmark -P 500 -r 1000 -n 1000000 eval 'return redis.call("mset","k1__rand_int__","v1__rand_int__","k2__rand_int__","v2__rand_int__","k3__rand_int__","v3__rand_int__","k4__rand_int__","v4__rand_int__")' 0 benchmark 3: src/redis-benchmark -P 500 -r 1000 -n 100000 eval "for i=1,100,1 do redis.call('set','kk'..i,'vv'..__rand_int__) end return redis.call('get','kk5')" 0 benchmark 4: src/redis-benchmark -P 500 -r 1000 -n 1000000 eval 'return redis.call("mset","k1__rand_int__","v1__rand_int__","k2__rand_int__","v2__rand_int__","k3__rand_int__","v3__rand_int__","k4__rand_int__","v4__rand_int__xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")' ``` I ran the benchmark on this branch with and without commit 68b71680a4d3bb8f0509e06578a9f15d05b92a47 Results in requests per second: cmd | without optimization | without optimization 2nd run | with original optimization | with original optimization 2nd run -- | -- | -- | -- | -- 1 | 461233.34 | 477395.31 | 471098.16 | 469946.91 2 | 34774.14 | 35469.8 | 35149.38 | 34464.93 3 | 6390.59 | 6281.41 | 6146.28 | 6464.12 4 | 28005.71 |   | 27965.77 |   As you can see, different use cases showed identical or negligible performance differences. So finally I decided to chuck the original optimization and simplify the code.
2022-02-07 07:04:01 +01:00
void freeClientArgv(client *c);
void sendReplyToClient(connection *conn);
void *addReplyDeferredLen(client *c);
void setDeferredArrayLen(client *c, void *node, long length);
void setDeferredMapLen(client *c, void *node, long length);
void setDeferredSetLen(client *c, void *node, long length);
void setDeferredAttributeLen(client *c, void *node, long length);
void setDeferredPushLen(client *c, void *node, long length);
int processInputBuffer(client *c);
void acceptTcpHandler(aeEventLoop *el, int fd, void *privdata, int mask);
void acceptTLSHandler(aeEventLoop *el, int fd, void *privdata, int mask);
void acceptUnixHandler(aeEventLoop *el, int fd, void *privdata, int mask);
void readQueryFromClient(connection *conn);
Replication backlog and replicas use one global shared replication buffer (#9166) ## Background For redis master, one replica uses one copy of replication buffer, that is a big waste of memory, more replicas more waste, and allocate/free memory for every reply list also cost much. If we set client-output-buffer-limit small and write traffic is heavy, master may disconnect with replicas and can't finish synchronization with replica. If we set client-output-buffer-limit big, master may be OOM when there are many replicas that separately keep much memory. Because replication buffers of different replica client are the same, one simple idea is that all replicas only use one replication buffer, that will effectively save memory. Since replication backlog content is the same as replicas' output buffer, now we can discard replication backlog memory and use global shared replication buffer to implement replication backlog mechanism. ## Implementation I create one global "replication buffer" which contains content of replication stream. The structure of "replication buffer" is similar to the reply list that exists in every client. But the node of list is `replBufBlock`, which has `id, repl_offset, refcount` fields. ```c /* Replication buffer blocks is the list of replBufBlock. * * +--------------+ +--------------+ +--------------+ * | refcount = 1 | ... | refcount = 0 | ... | refcount = 2 | * +--------------+ +--------------+ +--------------+ * | / \ * | / \ * | / \ * Repl Backlog Replia_A Replia_B * * Each replica or replication backlog increments only the refcount of the * 'ref_repl_buf_node' which it points to. So when replica walks to the next * node, it should first increase the next node's refcount, and when we trim * the replication buffer nodes, we remove node always from the head node which * refcount is 0. If the refcount of the head node is not 0, we must stop * trimming and never iterate the next node. */ /* Similar with 'clientReplyBlock', it is used for shared buffers between * all replica clients and replication backlog. */ typedef struct replBufBlock { int refcount; /* Number of replicas or repl backlog using. */ long long id; /* The unique incremental number. */ long long repl_offset; /* Start replication offset of the block. */ size_t size, used; char buf[]; } replBufBlock; ``` So now when we feed replication stream into replication backlog and all replicas, we only need to feed stream into replication buffer `feedReplicationBuffer`. In this function, we set some fields of replication backlog and replicas to references of the global replication buffer blocks. And we also need to check replicas' output buffer limit to free if exceeding `client-output-buffer-limit`, and trim replication backlog if exceeding `repl-backlog-size`. When sending reply to replicas, we also need to iterate replication buffer blocks and send its content, when totally sending one block for replica, we decrease current node count and increase the next current node count, and then free the block which reference is 0 from the head of replication buffer blocks. Since now we use linked list to manage replication backlog, it may cost much time for iterating all linked list nodes to find corresponding replication buffer node. So we create a rax tree to store some nodes for index, but to avoid rax tree occupying too much memory, i record one per 64 nodes for index. Currently, to make partial resynchronization as possible as much, we always let replication backlog as the last reference of replication buffer blocks, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks, and this method doesn't increase memory usage since they share replication buffer. To avoid freezing server for freeing unreferenced replication buffer blocks when we need to trim backlog for exceeding backlog size setting, we trim backlog incrementally (free 64 blocks per call now), and make it faster in `beforeSleep` (free 640 blocks). ### Other changes - `mem_total_replication_buffers`: we add this field in INFO command, it means the total memory of replication buffers used. - `mem_clients_slaves`: now even replica is slow to replicate, and its output buffer memory is not 0, but it still may be 0, since replication backlog and replicas share one global replication buffer, only if replication buffer memory is more than the repl backlog setting size, we consider the excess as replicas' memory. Otherwise, we think replication buffer memory is the consumption of repl backlog. - Key eviction Since all replicas and replication backlog share global replication buffer, we think only the part of exceeding backlog size the extra separate consumption of replicas. Because we trim backlog incrementally in the background, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks disconnect. To avoid massive eviction loop, we don't count the delayed freed replication backlog into used memory even if there are no replicas, i.e. we also regard this memory as replicas's memory. - `client-output-buffer-limit` check for replica clients It doesn't make sense to set the replica clients output buffer limit lower than the repl-backlog-size config (partial sync will succeed and then replica will get disconnected). Such a configuration is ignored (the size of repl-backlog-size will be used). This doesn't have memory consumption implications since the replica client will share the backlog buffers memory. - Drop replication backlog after loading data if needed We always create replication backlog if server is a master, we need it because we put DELs in it when loading expired keys in RDB, but if RDB doesn't have replication info or there is no rdb, it is not possible to support partial resynchronization, to avoid extra memory of replication backlog, we drop it. - Multi IO threads Since all replicas and replication backlog use global replication buffer, if I/O threads are enabled, to guarantee data accessing thread safe, we must let main thread handle sending the output buffer to all replicas. But before, other IO threads could handle sending output buffer of all replicas. ## Other optimizations This solution resolve some other problem: - When replicas disconnect with master since of out of output buffer limit, releasing the output buffer of replicas may freeze server if we set big `client-output-buffer-limit` for replicas, but now, it doesn't cause freezing. - This implementation may mitigate reply list copy cost time(also freezes server) when one replication has huge reply buffer and another replica can copy buffer for full synchronization. now, we just copy reference info, it is very light. - If we set replication backlog size big, it also may cost much time to copy replication backlog into replica's output buffer. But this commit eliminates this problem. - Resizing replication backlog size doesn't empty current replication backlog content.
2021-10-25 08:24:31 +02:00
int prepareClientToWrite(client *c);
2018-11-27 11:58:55 +01:00
void addReplyNull(client *c);
void addReplyNullArray(client *c);
2018-12-04 18:00:35 +01:00
void addReplyBool(client *c, int b);
void addReplyVerbatim(client *c, const char *s, size_t len, const char *ext);
void addReplyProto(client *c, const char *s, size_t len);
void AddReplyFromClient(client *c, client *src);
void addReplyBulk(client *c, robj *obj);
void addReplyBulkCString(client *c, const char *s);
void addReplyBulkCBuffer(client *c, const void *p, size_t len);
void addReplyBulkLongLong(client *c, long long ll);
void addReply(client *c, robj *obj);
void addReplySds(client *c, sds s);
void addReplyBulkSds(client *c, sds s);
void setDeferredReplyBulkSds(client *c, void *node, sds s);
void addReplyErrorObject(client *c, robj *err);
void addReplyOrErrorObject(client *c, robj *reply);
Sort out the mess around Lua error messages and error stats (#10329) This PR fix 2 issues on Lua scripting: * Server error reply statistics (some errors were counted twice). * Error code and error strings returning from scripts (error code was missing / misplaced). ## Statistics a Lua script user is considered part of the user application, a sophisticated transaction, so we want to count an error even if handled silently by the script, but when it is propagated outwards from the script we don't wanna count it twice. on the other hand, if the script decides to throw an error on its own (using `redis.error_reply`), we wanna count that too. Besides, we do count the `calls` in command statistics for the commands the script calls, we we should certainly also count `failed_calls`. So when a simple `eval "return redis.call('set','x','y')" 0` fails, it should count the failed call to both SET and EVAL, but the `errorstats` and `total_error_replies` should be counted only once. The PR changes the error object that is raised on errors. Instead of raising a simple Lua string, Redis will raise a Lua table in the following format: ``` { err='<error message (including error code)>', source='<User source file name>', line='<line where the error happned>', ignore_error_stats_update=true/false, } ``` The `luaPushError` function was modified to construct the new error table as describe above. The `luaRaiseError` was renamed to `luaError` and is now simply called `lua_error` to raise the table on the top of the Lua stack as the error object. The reason is that since its functionality is changed, in case some Redis branch / fork uses it, it's better to have a compilation error than a bug. The `source` and `line` fields are enriched by the error handler (if possible) and the `ignore_error_stats_update` is optional and if its not present then the default value is `false`. If `ignore_error_stats_update` is true, the error will not be counted on the error stats. When parsing Redis call reply, each error is translated to a Lua table on the format describe above and the `ignore_error_stats_update` field is set to `true` so we will not count errors twice (we counted this error when we invoke the command). The changes in this PR might have been considered as a breaking change for users that used Lua `pcall` function. Before, the error was a string and now its a table. To keep backward comparability the PR override the `pcall` implementation and extract the error message from the error table and return it. Example of the error stats update: ``` 127.0.0.1:6379> lpush l 1 (integer) 2 127.0.0.1:6379> eval "return redis.call('get', 'l')" 0 (error) WRONGTYPE Operation against a key holding the wrong kind of value. script: e471b73f1ef44774987ab00bdf51f21fd9f7974a, on @user_script:1. 127.0.0.1:6379> info Errorstats # Errorstats errorstat_WRONGTYPE:count=1 127.0.0.1:6379> info commandstats # Commandstats cmdstat_eval:calls=1,usec=341,usec_per_call=341.00,rejected_calls=0,failed_calls=1 cmdstat_info:calls=1,usec=35,usec_per_call=35.00,rejected_calls=0,failed_calls=0 cmdstat_lpush:calls=1,usec=14,usec_per_call=14.00,rejected_calls=0,failed_calls=0 cmdstat_get:calls=1,usec=10,usec_per_call=10.00,rejected_calls=0,failed_calls=1 ``` ## error message We can now construct the error message (sent as a reply to the user) from the error table, so this solves issues where the error message was malformed and the error code appeared in the middle of the error message: ```diff 127.0.0.1:6379> eval "return redis.call('set','x','y')" 0 -(error) ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: OOM command not allowed when used memory > 'maxmemory'. +(error) OOM command not allowed when used memory > 'maxmemory' @user_script:1. Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479) ``` ```diff 127.0.0.1:6379> eval "redis.call('get', 'l')" 0 -(error) ERR Error running script (call to f_8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1): @user_script:1: WRONGTYPE Operation against a key holding the wrong kind of value +(error) WRONGTYPE Operation against a key holding the wrong kind of value script: 8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1, on @user_script:1. ``` Notica that `redis.pcall` was not change: ``` 127.0.0.1:6379> eval "return redis.pcall('get', 'l')" 0 (error) WRONGTYPE Operation against a key holding the wrong kind of value ``` ## other notes Notice that Some commands (like GEOADD) changes the cmd variable on the client stats so we can not count on it to update the command stats. In order to be able to update those stats correctly we needed to promote `realcmd` variable to be located on the client struct. Tests was added and modified to verify the changes. Related PR's: #10279, #10218, #10278, #10309 Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-27 12:40:57 +01:00
void afterErrorReply(client *c, const char *s, size_t len, int flags);
void addReplyErrorSdsEx(client *c, sds err, int flags);
void addReplyErrorSds(client *c, sds err);
void addReplyError(client *c, const char *err);
void addReplyErrorArity(client *c);
void addReplyErrorExpireTime(client *c);
void addReplyStatus(client *c, const char *status);
void addReplyDouble(client *c, double d);
Unified Lua and modules reply parsing and added RESP3 support to RM_Call (#9202) ## Current state 1. Lua has its own parser that handles parsing `reds.call` replies and translates them to Lua objects that can be used by the user Lua code. The parser partially handles resp3 (missing big number, verbatim, attribute, ...) 2. Modules have their own parser that handles parsing `RM_Call` replies and translates them to RedisModuleCallReply objects. The parser does not support resp3. In addition, in the future, we want to add Redis Function (#8693) that will probably support more languages. At some point maintaining so many parsers will stop scaling (bug fixes and protocol changes will need to be applied on all of them). We will probably end up with different parsers that support different parts of the resp protocol (like we already have today with Lua and modules) ## PR Changes This PR attempt to unified the reply parsing of Lua and modules (and in the future Redis Function) by introducing a new parser unit (`resp_parser.c`). The new parser handles parsing the reply and calls different callbacks to allow the users (another unit that uses the parser, i.e, Lua, modules, or Redis Function) to analyze the reply. ### Lua API Additions The code that handles reply parsing on `scripting.c` was removed. Instead, it uses the resp_parser to parse and create a Lua object out of the reply. As mentioned above the Lua parser did not handle parsing big numbers, verbatim, and attribute. The new parser can handle those and so Lua also gets it for free. Those are translated to Lua objects in the following way: 1. Big Number - Lua table `{'big_number':'<str representation for big number>'}` 2. Verbatim - Lua table `{'verbatim_string':{'format':'<verbatim format>', 'string':'<verbatim string value>'}}` 3. Attribute - currently ignored and not expose to the Lua parser, another issue will be open to decide how to expose it. Tests were added to check resp3 reply parsing on Lua ### Modules API Additions The reply parsing code on `module.c` was also removed and the new resp_parser is used instead. In addition, the RedisModuleCallReply was also extracted to a separate unit located on `call_reply.c` (in the future, this unit will also be used by Redis Function). A nice side effect of unified parsing is that modules now also support resp3. Resp3 can be enabled by giving `3` as a parameter to the fmt argument of `RM_Call`. It is also possible to give `0`, which will indicate an auto mode. i.e, Redis will automatically chose the reply protocol base on the current client set on the RedisModuleCtx (this mode will mostly be used when the module want to pass the reply to the client as is). In addition, the following RedisModuleAPI were added to allow analyzing resp3 replies: * New RedisModuleCallReply types: * `REDISMODULE_REPLY_MAP` * `REDISMODULE_REPLY_SET` * `REDISMODULE_REPLY_BOOL` * `REDISMODULE_REPLY_DOUBLE` * `REDISMODULE_REPLY_BIG_NUMBER` * `REDISMODULE_REPLY_VERBATIM_STRING` * `REDISMODULE_REPLY_ATTRIBUTE` * New RedisModuleAPI: * `RedisModule_CallReplyDouble` - getting double value from resp3 double reply * `RedisModule_CallReplyBool` - getting boolean value from resp3 boolean reply * `RedisModule_CallReplyBigNumber` - getting big number value from resp3 big number reply * `RedisModule_CallReplyVerbatim` - getting format and value from resp3 verbatim reply * `RedisModule_CallReplySetElement` - getting element from resp3 set reply * `RedisModule_CallReplyMapElement` - getting key and value from resp3 map reply * `RedisModule_CallReplyAttribute` - getting a reply attribute * `RedisModule_CallReplyAttributeElement` - getting key and value from resp3 attribute reply * New context flags: * `REDISMODULE_CTX_FLAGS_RESP3` - indicate that the client is using resp3 Tests were added to check the new RedisModuleAPI ### Modules API Changes * RM_ReplyWithCallReply might return REDISMODULE_ERR if the given CallReply is in resp3 but the client expects resp2. This is not a breaking change because in order to get a resp3 CallReply one needs to specifically specify `3` as a parameter to the fmt argument of `RM_Call` (as mentioned above). Tests were added to check this change ### More small Additions * Added `debug set-disable-deny-scripts` that allows to turn on and off the commands no-script flag protection. This is used by the Lua resp3 tests so it will be possible to run `debug protocol` and check the resp3 parsing code. Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
2021-08-04 15:28:07 +02:00
void addReplyLongLongWithPrefix(client *c, long long ll, char prefix);
void addReplyBigNum(client *c, const char* num, size_t len);
void addReplyHumanLongDouble(client *c, long double d);
void addReplyLongLong(client *c, long long ll);
void addReplyArrayLen(client *c, long length);
void addReplyMapLen(client *c, long length);
void addReplySetLen(client *c, long length);
void addReplyAttributeLen(client *c, long length);
void addReplyPushLen(client *c, long length);
void addReplyHelp(client *c, const char **help);
void addReplySubcommandSyntaxError(client *c);
void addReplyLoadedModules(client *c);
Replication backlog and replicas use one global shared replication buffer (#9166) ## Background For redis master, one replica uses one copy of replication buffer, that is a big waste of memory, more replicas more waste, and allocate/free memory for every reply list also cost much. If we set client-output-buffer-limit small and write traffic is heavy, master may disconnect with replicas and can't finish synchronization with replica. If we set client-output-buffer-limit big, master may be OOM when there are many replicas that separately keep much memory. Because replication buffers of different replica client are the same, one simple idea is that all replicas only use one replication buffer, that will effectively save memory. Since replication backlog content is the same as replicas' output buffer, now we can discard replication backlog memory and use global shared replication buffer to implement replication backlog mechanism. ## Implementation I create one global "replication buffer" which contains content of replication stream. The structure of "replication buffer" is similar to the reply list that exists in every client. But the node of list is `replBufBlock`, which has `id, repl_offset, refcount` fields. ```c /* Replication buffer blocks is the list of replBufBlock. * * +--------------+ +--------------+ +--------------+ * | refcount = 1 | ... | refcount = 0 | ... | refcount = 2 | * +--------------+ +--------------+ +--------------+ * | / \ * | / \ * | / \ * Repl Backlog Replia_A Replia_B * * Each replica or replication backlog increments only the refcount of the * 'ref_repl_buf_node' which it points to. So when replica walks to the next * node, it should first increase the next node's refcount, and when we trim * the replication buffer nodes, we remove node always from the head node which * refcount is 0. If the refcount of the head node is not 0, we must stop * trimming and never iterate the next node. */ /* Similar with 'clientReplyBlock', it is used for shared buffers between * all replica clients and replication backlog. */ typedef struct replBufBlock { int refcount; /* Number of replicas or repl backlog using. */ long long id; /* The unique incremental number. */ long long repl_offset; /* Start replication offset of the block. */ size_t size, used; char buf[]; } replBufBlock; ``` So now when we feed replication stream into replication backlog and all replicas, we only need to feed stream into replication buffer `feedReplicationBuffer`. In this function, we set some fields of replication backlog and replicas to references of the global replication buffer blocks. And we also need to check replicas' output buffer limit to free if exceeding `client-output-buffer-limit`, and trim replication backlog if exceeding `repl-backlog-size`. When sending reply to replicas, we also need to iterate replication buffer blocks and send its content, when totally sending one block for replica, we decrease current node count and increase the next current node count, and then free the block which reference is 0 from the head of replication buffer blocks. Since now we use linked list to manage replication backlog, it may cost much time for iterating all linked list nodes to find corresponding replication buffer node. So we create a rax tree to store some nodes for index, but to avoid rax tree occupying too much memory, i record one per 64 nodes for index. Currently, to make partial resynchronization as possible as much, we always let replication backlog as the last reference of replication buffer blocks, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks, and this method doesn't increase memory usage since they share replication buffer. To avoid freezing server for freeing unreferenced replication buffer blocks when we need to trim backlog for exceeding backlog size setting, we trim backlog incrementally (free 64 blocks per call now), and make it faster in `beforeSleep` (free 640 blocks). ### Other changes - `mem_total_replication_buffers`: we add this field in INFO command, it means the total memory of replication buffers used. - `mem_clients_slaves`: now even replica is slow to replicate, and its output buffer memory is not 0, but it still may be 0, since replication backlog and replicas share one global replication buffer, only if replication buffer memory is more than the repl backlog setting size, we consider the excess as replicas' memory. Otherwise, we think replication buffer memory is the consumption of repl backlog. - Key eviction Since all replicas and replication backlog share global replication buffer, we think only the part of exceeding backlog size the extra separate consumption of replicas. Because we trim backlog incrementally in the background, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks disconnect. To avoid massive eviction loop, we don't count the delayed freed replication backlog into used memory even if there are no replicas, i.e. we also regard this memory as replicas's memory. - `client-output-buffer-limit` check for replica clients It doesn't make sense to set the replica clients output buffer limit lower than the repl-backlog-size config (partial sync will succeed and then replica will get disconnected). Such a configuration is ignored (the size of repl-backlog-size will be used). This doesn't have memory consumption implications since the replica client will share the backlog buffers memory. - Drop replication backlog after loading data if needed We always create replication backlog if server is a master, we need it because we put DELs in it when loading expired keys in RDB, but if RDB doesn't have replication info or there is no rdb, it is not possible to support partial resynchronization, to avoid extra memory of replication backlog, we drop it. - Multi IO threads Since all replicas and replication backlog use global replication buffer, if I/O threads are enabled, to guarantee data accessing thread safe, we must let main thread handle sending the output buffer to all replicas. But before, other IO threads could handle sending output buffer of all replicas. ## Other optimizations This solution resolve some other problem: - When replicas disconnect with master since of out of output buffer limit, releasing the output buffer of replicas may freeze server if we set big `client-output-buffer-limit` for replicas, but now, it doesn't cause freezing. - This implementation may mitigate reply list copy cost time(also freezes server) when one replication has huge reply buffer and another replica can copy buffer for full synchronization. now, we just copy reference info, it is very light. - If we set replication backlog size big, it also may cost much time to copy replication backlog into replica's output buffer. But this commit eliminates this problem. - Resizing replication backlog size doesn't empty current replication backlog content.
2021-10-25 08:24:31 +02:00
void copyReplicaOutputBuffer(client *dst, client *src);
void addListRangeReply(client *c, robj *o, long start, long end, int reverse);
void deferredAfterErrorReply(client *c, list *errors);
2016-05-18 07:08:43 +02:00
size_t sdsZmallocSize(sds s);
size_t getStringObjectSdsUsedMemory(robj *o);
void freeClientReplyValue(void *o);
void *dupClientReplyValue(void *o);
char *getClientPeerId(client *client);
char *getClientSockName(client *client);
sds catClientInfoString(sds s, client *client);
sds getAllClientsInfoString(int type);
void rewriteClientCommandVector(client *c, int argc, ...);
void rewriteClientCommandArgument(client *c, int i, robj *newval);
void replaceClientCommandVector(client *c, int argc, robj **argv);
void redactClientCommandArgument(client *c, int argc);
Client eviction (#8687) ### Description A mechanism for disconnecting clients when the sum of all connected clients is above a configured limit. This prevents eviction or OOM caused by accumulated used memory between all clients. It's a complimentary mechanism to the `client-output-buffer-limit` mechanism which takes into account not only a single client and not only output buffers but rather all memory used by all clients. #### Design The general design is as following: * We track memory usage of each client, taking into account all memory used by the client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date after reading from the socket, after processing commands and after writing to the socket. * Based on the used memory we sort all clients into buckets. Each bucket contains all clients using up up to x2 memory of the clients in the bucket below it. For example up to 1m clients, up to 2m clients, up to 4m clients, ... * Before processing a command and before sleep we check if we're over the configured limit. If we are we start disconnecting clients from larger buckets downwards until we're under the limit. #### Config `maxmemory-clients` max memory all clients are allowed to consume, above this threshold we disconnect clients. This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%` would mean 10% of `maxmemory`). #### Important code changes * During the development I encountered yet more situations where our io-threads access global vars. And needed to fix them. I also had to handle keeps the clients sorted into the memory buckets (which are global) while their memory usage changes in the io-thread. To achieve this I decided to simplify how we check if we're in an io-thread and make it much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking if the client is in an io-thread (it wasn't used for anything else) and just used the global `io_threads_op` variable the same way to check during writes. * I optimized the cleanup of the client from the `clients_pending_read` list on client freeing. We now store a pointer in the `client` struct to this list so we don't need to search in it (`pending_read_list_node`). * Added `evicted_clients` stat to `INFO` command. * Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the client eviction mechanism. Added corrosponding 'e' flag in the client info string. * Added `multi-mem` field in the client info string to show how much memory is used up by buffered multi commands. * Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and channels (partially), tracking prefixes (partially). * CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so clients will be disconnected between processing different clients and not only before sleep. This new function can be used in the future for work we want to do outside the command processing loop but don't want to wait for all clients to be processed before we get to it. Specifically I wanted to handle output-buffer-limit related closing before we process client eviction in case the two race with each other. * Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction buckets. * Each client now holds a pointer to the client eviction memory usage bucket it belongs to and listNode to itself in that bucket for quick removal. * Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value indicating no io-threading is currently being executed. * In order to track memory used by each clients in real-time we can't rely on updating these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()` (used to be `clientsCronTrackClientsMemUsage()`) after command processing, after writing data to pubsub clients, after writing the output buffer and after reading from the socket (and maybe other places too). The function is written to be fast. * Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before processing a command (before performing oom-checks and key-eviction). * All clients memory usage buckets are grouped as follows: * All clients using less than 64k. * 64K..128K * 128K..256K * ... * 2G..4G * All clients using 4g and up. * Added client-eviction.tcl with a bunch of tests for the new mechanism. * Extended maxmemory.tcl to test the interaction between maxmemory and maxmemory-clients settings. * Added an option to flag a numeric configuration variable as a "percent", this means that if we encounter a '%' after the number in the config file (or config set command) we consider it as valid. Such a number is store internally as a negative value. This way an integer value can be interpreted as either a percent (negative) or absolute value (positive). This is useful for example if some numeric configuration can optionally be set to a percentage of something else. Co-authored-by: Oran Agra <oran@redislabs.com>
2021-09-23 13:02:16 +02:00
size_t getClientOutputBufferMemoryUsage(client *c);
size_t getClientMemoryUsage(client *c, size_t *output_buffer_mem_usage);
int freeClientsInAsyncFreeQueue(void);
int closeClientOnOutputBufferLimitReached(client *c, int async);
int getClientType(client *c);
int getClientTypeByName(char *name);
char *getClientTypeName(int class);
2012-02-06 16:56:42 +01:00
void flushSlavesOutputBuffers(void);
void disconnectSlaves(void);
Client eviction (#8687) ### Description A mechanism for disconnecting clients when the sum of all connected clients is above a configured limit. This prevents eviction or OOM caused by accumulated used memory between all clients. It's a complimentary mechanism to the `client-output-buffer-limit` mechanism which takes into account not only a single client and not only output buffers but rather all memory used by all clients. #### Design The general design is as following: * We track memory usage of each client, taking into account all memory used by the client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date after reading from the socket, after processing commands and after writing to the socket. * Based on the used memory we sort all clients into buckets. Each bucket contains all clients using up up to x2 memory of the clients in the bucket below it. For example up to 1m clients, up to 2m clients, up to 4m clients, ... * Before processing a command and before sleep we check if we're over the configured limit. If we are we start disconnecting clients from larger buckets downwards until we're under the limit. #### Config `maxmemory-clients` max memory all clients are allowed to consume, above this threshold we disconnect clients. This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%` would mean 10% of `maxmemory`). #### Important code changes * During the development I encountered yet more situations where our io-threads access global vars. And needed to fix them. I also had to handle keeps the clients sorted into the memory buckets (which are global) while their memory usage changes in the io-thread. To achieve this I decided to simplify how we check if we're in an io-thread and make it much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking if the client is in an io-thread (it wasn't used for anything else) and just used the global `io_threads_op` variable the same way to check during writes. * I optimized the cleanup of the client from the `clients_pending_read` list on client freeing. We now store a pointer in the `client` struct to this list so we don't need to search in it (`pending_read_list_node`). * Added `evicted_clients` stat to `INFO` command. * Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the client eviction mechanism. Added corrosponding 'e' flag in the client info string. * Added `multi-mem` field in the client info string to show how much memory is used up by buffered multi commands. * Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and channels (partially), tracking prefixes (partially). * CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so clients will be disconnected between processing different clients and not only before sleep. This new function can be used in the future for work we want to do outside the command processing loop but don't want to wait for all clients to be processed before we get to it. Specifically I wanted to handle output-buffer-limit related closing before we process client eviction in case the two race with each other. * Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction buckets. * Each client now holds a pointer to the client eviction memory usage bucket it belongs to and listNode to itself in that bucket for quick removal. * Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value indicating no io-threading is currently being executed. * In order to track memory used by each clients in real-time we can't rely on updating these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()` (used to be `clientsCronTrackClientsMemUsage()`) after command processing, after writing data to pubsub clients, after writing the output buffer and after reading from the socket (and maybe other places too). The function is written to be fast. * Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before processing a command (before performing oom-checks and key-eviction). * All clients memory usage buckets are grouped as follows: * All clients using less than 64k. * 64K..128K * 128K..256K * ... * 2G..4G * All clients using 4g and up. * Added client-eviction.tcl with a bunch of tests for the new mechanism. * Extended maxmemory.tcl to test the interaction between maxmemory and maxmemory-clients settings. * Added an option to flag a numeric configuration variable as a "percent", this means that if we encounter a '%' after the number in the config file (or config set command) we consider it as valid. Such a number is store internally as a negative value. This way an integer value can be interpreted as either a percent (negative) or absolute value (positive). This is useful for example if some numeric configuration can optionally be set to a percentage of something else. Co-authored-by: Oran Agra <oran@redislabs.com>
2021-09-23 13:02:16 +02:00
void evictClients(void);
int listenToPort(int port, socketFds *fds);
Wait for replicas when shutting down (#9872) To avoid data loss, this commit adds a grace period for lagging replicas to catch up the replication offset. Done: * Wait for replicas when shutdown is triggered by SIGTERM and SIGINT. * Wait for replicas when shutdown is triggered by the SHUTDOWN command. A new blocked client type BLOCKED_SHUTDOWN is introduced, allowing multiple clients to call SHUTDOWN in parallel. Note that they don't expect a response unless an error happens and shutdown is aborted. * Log warning for each replica lagging behind when finishing shutdown. * CLIENT_PAUSE_WRITE while waiting for replicas. * Configurable grace period 'shutdown-timeout' in seconds (default 10). * New flags for the SHUTDOWN command: - NOW disables the grace period for lagging replicas. - FORCE ignores errors writing the RDB or AOF files which would normally prevent a shutdown. - ABORT cancels ongoing shutdown. Can't be combined with other flags. * New field in the output of the INFO command: 'shutdown_in_milliseconds'. The value is the remaining maximum time to wait for lagging replicas before finishing the shutdown. This field is present in the Server section **only** during shutdown. Not directly related: * When shutting down, if there is an AOF saving child, it is killed **even** if AOF is disabled. This can happen if BGREWRITEAOF is used when AOF is off. * Client pause now has end time and type (WRITE or ALL) per purpose. The different pause purposes are *CLIENT PAUSE command*, *failover* and *shutdown*. If clients are unpaused for one purpose, it doesn't affect client pause for other purposes. For example, the CLIENT UNPAUSE command doesn't affect client pause initiated by the failover or shutdown procedures. A completed failover or a failed shutdown doesn't unpause clients paused by the CLIENT PAUSE command. Notes: * DEBUG RESTART doesn't wait for replicas. * We already have a warning logged when a replica disconnects. This means that if any replica connection is lost during the shutdown, it is either logged as disconnected or as lagging at the time of exit. Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-02 08:50:15 +01:00
void pauseClients(pause_purpose purpose, mstime_t end, pause_type type);
void unpauseClients(pause_purpose purpose);
int areClientsPaused(void);
int checkClientPauseTimeoutAndReturnIfPaused(void);
void unblockPostponedClients();
void processEventsWhileBlocked(void);
void whileBlockedCron();
void blockingOperationStarts();
void blockingOperationEnds();
int handleClientsWithPendingWrites(void);
int handleClientsWithPendingWritesUsingThreads(void);
2019-03-31 21:59:50 +02:00
int handleClientsWithPendingReadsUsingThreads(void);
int stopThreadedIOIfNeeded(void);
int clientHasPendingReplies(client *c);
int islocalClient(client *c);
Client eviction (#8687) ### Description A mechanism for disconnecting clients when the sum of all connected clients is above a configured limit. This prevents eviction or OOM caused by accumulated used memory between all clients. It's a complimentary mechanism to the `client-output-buffer-limit` mechanism which takes into account not only a single client and not only output buffers but rather all memory used by all clients. #### Design The general design is as following: * We track memory usage of each client, taking into account all memory used by the client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date after reading from the socket, after processing commands and after writing to the socket. * Based on the used memory we sort all clients into buckets. Each bucket contains all clients using up up to x2 memory of the clients in the bucket below it. For example up to 1m clients, up to 2m clients, up to 4m clients, ... * Before processing a command and before sleep we check if we're over the configured limit. If we are we start disconnecting clients from larger buckets downwards until we're under the limit. #### Config `maxmemory-clients` max memory all clients are allowed to consume, above this threshold we disconnect clients. This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%` would mean 10% of `maxmemory`). #### Important code changes * During the development I encountered yet more situations where our io-threads access global vars. And needed to fix them. I also had to handle keeps the clients sorted into the memory buckets (which are global) while their memory usage changes in the io-thread. To achieve this I decided to simplify how we check if we're in an io-thread and make it much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking if the client is in an io-thread (it wasn't used for anything else) and just used the global `io_threads_op` variable the same way to check during writes. * I optimized the cleanup of the client from the `clients_pending_read` list on client freeing. We now store a pointer in the `client` struct to this list so we don't need to search in it (`pending_read_list_node`). * Added `evicted_clients` stat to `INFO` command. * Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the client eviction mechanism. Added corrosponding 'e' flag in the client info string. * Added `multi-mem` field in the client info string to show how much memory is used up by buffered multi commands. * Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and channels (partially), tracking prefixes (partially). * CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so clients will be disconnected between processing different clients and not only before sleep. This new function can be used in the future for work we want to do outside the command processing loop but don't want to wait for all clients to be processed before we get to it. Specifically I wanted to handle output-buffer-limit related closing before we process client eviction in case the two race with each other. * Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction buckets. * Each client now holds a pointer to the client eviction memory usage bucket it belongs to and listNode to itself in that bucket for quick removal. * Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value indicating no io-threading is currently being executed. * In order to track memory used by each clients in real-time we can't rely on updating these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()` (used to be `clientsCronTrackClientsMemUsage()`) after command processing, after writing data to pubsub clients, after writing the output buffer and after reading from the socket (and maybe other places too). The function is written to be fast. * Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before processing a command (before performing oom-checks and key-eviction). * All clients memory usage buckets are grouped as follows: * All clients using less than 64k. * 64K..128K * 128K..256K * ... * 2G..4G * All clients using 4g and up. * Added client-eviction.tcl with a bunch of tests for the new mechanism. * Extended maxmemory.tcl to test the interaction between maxmemory and maxmemory-clients settings. * Added an option to flag a numeric configuration variable as a "percent", this means that if we encounter a '%' after the number in the config file (or config set command) we consider it as valid. Such a number is store internally as a negative value. This way an integer value can be interpreted as either a percent (negative) or absolute value (positive). This is useful for example if some numeric configuration can optionally be set to a percentage of something else. Co-authored-by: Oran Agra <oran@redislabs.com>
2021-09-23 13:02:16 +02:00
int updateClientMemUsage(client *c);
void updateClientMemUsageBucket(client *c);
void unlinkClient(client *c);
int writeToClient(client *c, int handler_installed);
void linkClient(client *c);
void protectClient(client *c);
void unprotectClient(client *c);
void initThreadedIO(void);
client *lookupClientByID(uint64_t id);
int authRequired(client *c);
Durability enhancement for appendfsync=always policy (#9678) Durability of database is a big and old topic, in this regard Redis use AOF to support it, and `appendfsync=alwasys` policy is the most strict level, guarantee all data is both written and synced on disk before reply success to client. But there are some cases have been overlooked, and could lead to durability broken. 1. The most clear one is about threaded-io mode we should also set client's write handler with `ae_barrier` in `handleClientsWithPendingWritesUsingThreads`, or the write handler would be called after read handler in the next event loop, it means the write command result could be replied to client before flush to AOF. 2. About blocked client (mostly by module) in `beforeSleep()`, `handleClientsBlockedOnKeys()` should be called before `flushAppendOnlyFile()`, in case the unblocked clients modify data without persistence but send reply. 3. When handling `ProcessingEventsWhileBlocked` normally it takes place when lua/function/module timeout, and we give a chance to users to kill the slow operation, but we should call `flushAppendOnlyFile()` before `handleClientsWithPendingWrites()`, in case the other clients in the last event loop get acknowledge before data persistence. for a instance: ``` in the same event loop client A executes set foo bar client B executes eval "for var=1,10000000,1 do end" 0 ``` after the script timeout, client A will get `OK` but lose data after restart (kill redis when timeout) if we don't flush the write command to AOF. 4. A more complex case about `ProcessingEventsWhileBlocked` it is lua timeout in transaction, for example `MULTI; set foo bar; eval "for var=1,10000000,1 do end" 0; EXEC`, then client will get set command's result before the whole transaction done, that breaks atomicity too. fortunately, it's already fixed by #5428 (although it's not the original purpose just a side effect : )), but module timeout should be fixed too. case 1, 2, 3 are fixed in this commit, the module issue in case 4 needs a followup PR.
2022-04-11 10:08:39 +02:00
void putClientInPendingWriteQueue(client *c);
#ifdef __GNUC__
Sort out the mess around Lua error messages and error stats (#10329) This PR fix 2 issues on Lua scripting: * Server error reply statistics (some errors were counted twice). * Error code and error strings returning from scripts (error code was missing / misplaced). ## Statistics a Lua script user is considered part of the user application, a sophisticated transaction, so we want to count an error even if handled silently by the script, but when it is propagated outwards from the script we don't wanna count it twice. on the other hand, if the script decides to throw an error on its own (using `redis.error_reply`), we wanna count that too. Besides, we do count the `calls` in command statistics for the commands the script calls, we we should certainly also count `failed_calls`. So when a simple `eval "return redis.call('set','x','y')" 0` fails, it should count the failed call to both SET and EVAL, but the `errorstats` and `total_error_replies` should be counted only once. The PR changes the error object that is raised on errors. Instead of raising a simple Lua string, Redis will raise a Lua table in the following format: ``` { err='<error message (including error code)>', source='<User source file name>', line='<line where the error happned>', ignore_error_stats_update=true/false, } ``` The `luaPushError` function was modified to construct the new error table as describe above. The `luaRaiseError` was renamed to `luaError` and is now simply called `lua_error` to raise the table on the top of the Lua stack as the error object. The reason is that since its functionality is changed, in case some Redis branch / fork uses it, it's better to have a compilation error than a bug. The `source` and `line` fields are enriched by the error handler (if possible) and the `ignore_error_stats_update` is optional and if its not present then the default value is `false`. If `ignore_error_stats_update` is true, the error will not be counted on the error stats. When parsing Redis call reply, each error is translated to a Lua table on the format describe above and the `ignore_error_stats_update` field is set to `true` so we will not count errors twice (we counted this error when we invoke the command). The changes in this PR might have been considered as a breaking change for users that used Lua `pcall` function. Before, the error was a string and now its a table. To keep backward comparability the PR override the `pcall` implementation and extract the error message from the error table and return it. Example of the error stats update: ``` 127.0.0.1:6379> lpush l 1 (integer) 2 127.0.0.1:6379> eval "return redis.call('get', 'l')" 0 (error) WRONGTYPE Operation against a key holding the wrong kind of value. script: e471b73f1ef44774987ab00bdf51f21fd9f7974a, on @user_script:1. 127.0.0.1:6379> info Errorstats # Errorstats errorstat_WRONGTYPE:count=1 127.0.0.1:6379> info commandstats # Commandstats cmdstat_eval:calls=1,usec=341,usec_per_call=341.00,rejected_calls=0,failed_calls=1 cmdstat_info:calls=1,usec=35,usec_per_call=35.00,rejected_calls=0,failed_calls=0 cmdstat_lpush:calls=1,usec=14,usec_per_call=14.00,rejected_calls=0,failed_calls=0 cmdstat_get:calls=1,usec=10,usec_per_call=10.00,rejected_calls=0,failed_calls=1 ``` ## error message We can now construct the error message (sent as a reply to the user) from the error table, so this solves issues where the error message was malformed and the error code appeared in the middle of the error message: ```diff 127.0.0.1:6379> eval "return redis.call('set','x','y')" 0 -(error) ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: OOM command not allowed when used memory > 'maxmemory'. +(error) OOM command not allowed when used memory > 'maxmemory' @user_script:1. Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479) ``` ```diff 127.0.0.1:6379> eval "redis.call('get', 'l')" 0 -(error) ERR Error running script (call to f_8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1): @user_script:1: WRONGTYPE Operation against a key holding the wrong kind of value +(error) WRONGTYPE Operation against a key holding the wrong kind of value script: 8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1, on @user_script:1. ``` Notica that `redis.pcall` was not change: ``` 127.0.0.1:6379> eval "return redis.pcall('get', 'l')" 0 (error) WRONGTYPE Operation against a key holding the wrong kind of value ``` ## other notes Notice that Some commands (like GEOADD) changes the cmd variable on the client stats so we can not count on it to update the command stats. In order to be able to update those stats correctly we needed to promote `realcmd` variable to be located on the client struct. Tests was added and modified to verify the changes. Related PR's: #10279, #10218, #10278, #10309 Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-27 12:40:57 +01:00
void addReplyErrorFormatEx(client *c, int flags, const char *fmt, ...)
__attribute__((format(printf, 3, 4)));
void addReplyErrorFormat(client *c, const char *fmt, ...)
__attribute__((format(printf, 2, 3)));
void addReplyStatusFormat(client *c, const char *fmt, ...)
__attribute__((format(printf, 2, 3)));
#else
Sort out the mess around Lua error messages and error stats (#10329) This PR fix 2 issues on Lua scripting: * Server error reply statistics (some errors were counted twice). * Error code and error strings returning from scripts (error code was missing / misplaced). ## Statistics a Lua script user is considered part of the user application, a sophisticated transaction, so we want to count an error even if handled silently by the script, but when it is propagated outwards from the script we don't wanna count it twice. on the other hand, if the script decides to throw an error on its own (using `redis.error_reply`), we wanna count that too. Besides, we do count the `calls` in command statistics for the commands the script calls, we we should certainly also count `failed_calls`. So when a simple `eval "return redis.call('set','x','y')" 0` fails, it should count the failed call to both SET and EVAL, but the `errorstats` and `total_error_replies` should be counted only once. The PR changes the error object that is raised on errors. Instead of raising a simple Lua string, Redis will raise a Lua table in the following format: ``` { err='<error message (including error code)>', source='<User source file name>', line='<line where the error happned>', ignore_error_stats_update=true/false, } ``` The `luaPushError` function was modified to construct the new error table as describe above. The `luaRaiseError` was renamed to `luaError` and is now simply called `lua_error` to raise the table on the top of the Lua stack as the error object. The reason is that since its functionality is changed, in case some Redis branch / fork uses it, it's better to have a compilation error than a bug. The `source` and `line` fields are enriched by the error handler (if possible) and the `ignore_error_stats_update` is optional and if its not present then the default value is `false`. If `ignore_error_stats_update` is true, the error will not be counted on the error stats. When parsing Redis call reply, each error is translated to a Lua table on the format describe above and the `ignore_error_stats_update` field is set to `true` so we will not count errors twice (we counted this error when we invoke the command). The changes in this PR might have been considered as a breaking change for users that used Lua `pcall` function. Before, the error was a string and now its a table. To keep backward comparability the PR override the `pcall` implementation and extract the error message from the error table and return it. Example of the error stats update: ``` 127.0.0.1:6379> lpush l 1 (integer) 2 127.0.0.1:6379> eval "return redis.call('get', 'l')" 0 (error) WRONGTYPE Operation against a key holding the wrong kind of value. script: e471b73f1ef44774987ab00bdf51f21fd9f7974a, on @user_script:1. 127.0.0.1:6379> info Errorstats # Errorstats errorstat_WRONGTYPE:count=1 127.0.0.1:6379> info commandstats # Commandstats cmdstat_eval:calls=1,usec=341,usec_per_call=341.00,rejected_calls=0,failed_calls=1 cmdstat_info:calls=1,usec=35,usec_per_call=35.00,rejected_calls=0,failed_calls=0 cmdstat_lpush:calls=1,usec=14,usec_per_call=14.00,rejected_calls=0,failed_calls=0 cmdstat_get:calls=1,usec=10,usec_per_call=10.00,rejected_calls=0,failed_calls=1 ``` ## error message We can now construct the error message (sent as a reply to the user) from the error table, so this solves issues where the error message was malformed and the error code appeared in the middle of the error message: ```diff 127.0.0.1:6379> eval "return redis.call('set','x','y')" 0 -(error) ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: OOM command not allowed when used memory > 'maxmemory'. +(error) OOM command not allowed when used memory > 'maxmemory' @user_script:1. Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479) ``` ```diff 127.0.0.1:6379> eval "redis.call('get', 'l')" 0 -(error) ERR Error running script (call to f_8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1): @user_script:1: WRONGTYPE Operation against a key holding the wrong kind of value +(error) WRONGTYPE Operation against a key holding the wrong kind of value script: 8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1, on @user_script:1. ``` Notica that `redis.pcall` was not change: ``` 127.0.0.1:6379> eval "return redis.pcall('get', 'l')" 0 (error) WRONGTYPE Operation against a key holding the wrong kind of value ``` ## other notes Notice that Some commands (like GEOADD) changes the cmd variable on the client stats so we can not count on it to update the command stats. In order to be able to update those stats correctly we needed to promote `realcmd` variable to be located on the client struct. Tests was added and modified to verify the changes. Related PR's: #10279, #10218, #10278, #10309 Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-27 12:40:57 +01:00
void addReplyErrorFormatEx(client *c, int flags, const char *fmt, ...);
void addReplyErrorFormat(client *c, const char *fmt, ...);
void addReplyStatusFormat(client *c, const char *fmt, ...);
#endif
/* Client side caching (tracking mode) */
2020-02-21 16:39:42 +01:00
void enableTracking(client *c, uint64_t redirect_to, uint64_t options, robj **prefix, size_t numprefix);
void disableTracking(client *c);
void trackingRememberKeys(client *c);
void trackingInvalidateKey(client *c, robj *keyobj, int bcast);
void trackingScheduleKeyInvalidation(uint64_t client_id, robj *keyobj);
void trackingHandlePendingKeyInvalidations(void);
void trackingInvalidateKeysOnFlush(int async);
void freeTrackingRadixTree(rax *rt);
void freeTrackingRadixTreeAsync(rax *rt);
void trackingLimitUsedSlots(void);
uint64_t trackingGetTotalItems(void);
uint64_t trackingGetTotalKeys(void);
uint64_t trackingGetTotalPrefixes(void);
void trackingBroadcastInvalidationMessages(void);
int checkPrefixCollisionsOrReply(client *c, robj **prefix, size_t numprefix);
/* List data type */
void listTypePush(robj *subject, robj *value, int where);
robj *listTypePop(robj *subject, int where);
unsigned long listTypeLength(const robj *subject);
listTypeIterator *listTypeInitIterator(robj *subject, long index, unsigned char direction);
void listTypeReleaseIterator(listTypeIterator *li);
void listTypeSetIteratorDirection(listTypeIterator *li, unsigned char direction);
int listTypeNext(listTypeIterator *li, listTypeEntry *entry);
robj *listTypeGet(listTypeEntry *entry);
void listTypeInsert(listTypeEntry *entry, robj *value, int where);
void listTypeReplace(listTypeEntry *entry, robj *value);
int listTypeEqual(listTypeEntry *entry, robj *o);
void listTypeDelete(listTypeIterator *iter, listTypeEntry *entry);
robj *listTypeDup(robj *o);
int listTypeDelRange(robj *o, long start, long stop);
void unblockClientWaitingData(client *c);
void popGenericCommand(client *c, int where);
void listElementsRemoved(client *c, robj *key, int where, robj *o, long count, int *deleted);
/* MULTI/EXEC/WATCH... */
void unwatchAllKeys(client *c);
void initClientMultiState(client *c);
void freeClientMultiState(client *c);
void queueMultiCommand(client *c);
Client eviction (#8687) ### Description A mechanism for disconnecting clients when the sum of all connected clients is above a configured limit. This prevents eviction or OOM caused by accumulated used memory between all clients. It's a complimentary mechanism to the `client-output-buffer-limit` mechanism which takes into account not only a single client and not only output buffers but rather all memory used by all clients. #### Design The general design is as following: * We track memory usage of each client, taking into account all memory used by the client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date after reading from the socket, after processing commands and after writing to the socket. * Based on the used memory we sort all clients into buckets. Each bucket contains all clients using up up to x2 memory of the clients in the bucket below it. For example up to 1m clients, up to 2m clients, up to 4m clients, ... * Before processing a command and before sleep we check if we're over the configured limit. If we are we start disconnecting clients from larger buckets downwards until we're under the limit. #### Config `maxmemory-clients` max memory all clients are allowed to consume, above this threshold we disconnect clients. This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%` would mean 10% of `maxmemory`). #### Important code changes * During the development I encountered yet more situations where our io-threads access global vars. And needed to fix them. I also had to handle keeps the clients sorted into the memory buckets (which are global) while their memory usage changes in the io-thread. To achieve this I decided to simplify how we check if we're in an io-thread and make it much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking if the client is in an io-thread (it wasn't used for anything else) and just used the global `io_threads_op` variable the same way to check during writes. * I optimized the cleanup of the client from the `clients_pending_read` list on client freeing. We now store a pointer in the `client` struct to this list so we don't need to search in it (`pending_read_list_node`). * Added `evicted_clients` stat to `INFO` command. * Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the client eviction mechanism. Added corrosponding 'e' flag in the client info string. * Added `multi-mem` field in the client info string to show how much memory is used up by buffered multi commands. * Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and channels (partially), tracking prefixes (partially). * CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so clients will be disconnected between processing different clients and not only before sleep. This new function can be used in the future for work we want to do outside the command processing loop but don't want to wait for all clients to be processed before we get to it. Specifically I wanted to handle output-buffer-limit related closing before we process client eviction in case the two race with each other. * Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction buckets. * Each client now holds a pointer to the client eviction memory usage bucket it belongs to and listNode to itself in that bucket for quick removal. * Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value indicating no io-threading is currently being executed. * In order to track memory used by each clients in real-time we can't rely on updating these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()` (used to be `clientsCronTrackClientsMemUsage()`) after command processing, after writing data to pubsub clients, after writing the output buffer and after reading from the socket (and maybe other places too). The function is written to be fast. * Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before processing a command (before performing oom-checks and key-eviction). * All clients memory usage buckets are grouped as follows: * All clients using less than 64k. * 64K..128K * 128K..256K * ... * 2G..4G * All clients using 4g and up. * Added client-eviction.tcl with a bunch of tests for the new mechanism. * Extended maxmemory.tcl to test the interaction between maxmemory and maxmemory-clients settings. * Added an option to flag a numeric configuration variable as a "percent", this means that if we encounter a '%' after the number in the config file (or config set command) we consider it as valid. Such a number is store internally as a negative value. This way an integer value can be interpreted as either a percent (negative) or absolute value (positive). This is useful for example if some numeric configuration can optionally be set to a percentage of something else. Co-authored-by: Oran Agra <oran@redislabs.com>
2021-09-23 13:02:16 +02:00
size_t multiStateMemOverhead(client *c);
void touchWatchedKey(redisDb *db, robj *key);
int isWatchedKeyExpired(client *c);
void touchAllWatchedKeysInDb(redisDb *emptied, redisDb *replaced_with);
void discardTransaction(client *c);
void flagTransaction(client *c);
void execCommandAbort(client *c, sds error);
/* Redis object implementation */
void decrRefCount(robj *o);
void decrRefCountVoid(void *o);
void incrRefCount(robj *o);
robj *makeObjectShared(robj *o);
robj *resetRefCount(robj *obj);
void freeStringObject(robj *o);
void freeListObject(robj *o);
void freeSetObject(robj *o);
void freeZsetObject(robj *o);
void freeHashObject(robj *o);
Use madvise(MADV_DONTNEED) to release memory to reduce COW (#8974) ## Backgroud As we know, after `fork`, one process will copy pages when writing data to these pages(CoW), and another process still keep old pages, they totally cost more memory. For redis, we suffered that redis consumed much memory when the fork child is serializing key/values, even that maybe cause OOM. But actually we find, in redis fork child process, the child process don't need to keep some memory and parent process may write or update that, for example, child process will never access the key-value that is serialized but users may update it in parent process. So we think it may reduce COW if the child process release memory that it is not needed. ## Implementation For releasing key value in child process, we may think we call `decrRefCount` to free memory, but i find the fork child process still use much memory when we don't write any data to redis, and it costs much more time that slows down bgsave. Maybe because memory allocator doesn't really release memory to OS, and it may modify some inner data for this free operation, especially when we free small objects. Moreover, CoW is based on pages, so it is a easy way that we only free the memory bulk that is not less than kernel page size. madvise(MADV_DONTNEED) can quickly release specified region pages to OS bypassing memory allocator, and allocator still consider that this memory still is used and don't change its inner data. There are some buffers we can release in the fork child process: - **Serialized key-values** the fork child process never access serialized key-values, so we try to free them. Because we only can release big bulk memory, and it is time consumed to iterate all items/members/fields/entries of complex data type. So we decide to iterate them and try to release them only when their average size of item/member/field/entry is more than page size of OS. - **Replication backlog** Because replication backlog is a cycle buffer, it will be changed quickly if redis has heavy write traffic, but in fork child process, we don't need to access that. - **Client buffers** If clients have requests during having the fork child process, clients' buffer also be changed frequently. The memory includes client query buffer, output buffer, and client struct used memory. To get child process peak private dirty memory, we need to count peak memory instead of last used memory, because the child process may continue to release memory (since COW used to only grow till now, the last was equivalent to the peak). Also we're adding a new `current_cow_peak` info variable (to complement the existing `current_cow_size`) Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-04 22:01:46 +02:00
void dismissObject(robj *o, size_t dump_size);
robj *createObject(int type, void *ptr);
robj *createStringObject(const char *ptr, size_t len);
robj *createRawStringObject(const char *ptr, size_t len);
robj *createEmbeddedStringObject(const char *ptr, size_t len);
robj *tryCreateRawStringObject(const char *ptr, size_t len);
robj *tryCreateStringObject(const char *ptr, size_t len);
robj *dupStringObject(const robj *o);
int isSdsRepresentableAsLongLong(sds s, long long *llval);
int isObjectRepresentableAsLongLong(robj *o, long long *llongval);
robj *tryObjectEncoding(robj *o);
robj *getDecodedObject(robj *o);
size_t stringObjectLen(robj *o);
robj *createStringObjectFromLongLong(long long value);
robj *createStringObjectFromLongLongForValue(long long value);
robj *createStringObjectFromLongDouble(long double value, int humanfriendly);
robj *createQuicklistObject(void);
robj *createSetObject(void);
robj *createIntsetObject(void);
robj *createHashObject(void);
robj *createZsetObject(void);
Replace all usage of ziplist with listpack for t_zset (#9366) Part two of implementing #8702 (zset), after #8887. ## Description of the feature Replaced all uses of ziplist with listpack in t_zset, and optimized some of the code to optimize performance. ## Rdb format changes New `RDB_TYPE_ZSET_LISTPACK` rdb type. ## Rdb loading improvements: 1) Pre-expansion of dict for validation of duplicate data for listpack and ziplist. 2) Simplifying the release of empty key objects when RDB loading. 3) Unify ziplist and listpack data verify methods for zset and hash, and move code to rdb.c. ## Interface changes 1) New `zset-max-listpack-entries` config is an alias for `zset-max-ziplist-entries` (same with `zset-max-listpack-value`). 2) OBJECT ENCODING will return listpack instead of ziplist. ## Listpack improvements: 1) Add `lpDeleteRange` and `lpDeleteRangeWithEntry` functions to delete a range of entries from listpack. 2) Improve the performance of `lpCompare`, converting from string to integer is faster than converting from integer to string. 3) Replace `snprintf` with `ll2string` to improve performance in converting numbers to strings in `lpGet()`. ## Zset improvements: 1) Improve the performance of `zzlFind` method, use `lpFind` instead of `lpCompare` in a loop. 2) Use `lpDeleteRangeWithEntry` instead of `lpDelete` twice to delete a element of zset. ## Tests 1) Add some unittests for `lpDeleteRange` and `lpDeleteRangeWithEntry` function. 2) Add zset RDB loading test. 3) Add benchmark test for `lpCompare` and `ziplsitCompare`. 4) Add empty listpack zset corrupt dump test.
2021-09-09 17:18:53 +02:00
robj *createZsetListpackObject(void);
robj *createStreamObject(void);
robj *createModuleObject(moduleType *mt, void *value);
int getLongFromObjectOrReply(client *c, robj *o, long *target, const char *msg);
int getPositiveLongFromObjectOrReply(client *c, robj *o, long *target, const char *msg);
int getRangeLongFromObjectOrReply(client *c, robj *o, long min, long max, long *target, const char *msg);
int checkType(client *c, robj *o, int type);
int getLongLongFromObjectOrReply(client *c, robj *o, long long *target, const char *msg);
int getDoubleFromObjectOrReply(client *c, robj *o, double *target, const char *msg);
int getDoubleFromObject(const robj *o, double *target);
int getLongLongFromObject(robj *o, long long *target);
2011-11-12 19:27:35 +01:00
int getLongDoubleFromObject(robj *o, long double *target);
int getLongDoubleFromObjectOrReply(client *c, robj *o, long double *target, const char *msg);
int getIntFromObjectOrReply(client *c, robj *o, int *target, const char *msg);
char *strEncoding(int encoding);
int compareStringObjects(robj *a, robj *b);
int collateStringObjects(robj *a, robj *b);
int equalStringObjects(robj *a, robj *b);
unsigned long long estimateObjectIdleTime(robj *o);
void trimStringObjectIfNeeded(robj *o);
#define sdsEncodedObject(objptr) (objptr->encoding == OBJ_ENCODING_RAW || objptr->encoding == OBJ_ENCODING_EMBSTR)
/* Synchronous I/O with timeout */
ssize_t syncWrite(int fd, char *ptr, ssize_t size, long long timeout);
ssize_t syncRead(int fd, char *ptr, ssize_t size, long long timeout);
ssize_t syncReadLine(int fd, char *ptr, ssize_t size, long long timeout);
/* Replication */
void replicationFeedSlaves(list *slaves, int dictid, robj **argv, int argc);
Replication backlog and replicas use one global shared replication buffer (#9166) ## Background For redis master, one replica uses one copy of replication buffer, that is a big waste of memory, more replicas more waste, and allocate/free memory for every reply list also cost much. If we set client-output-buffer-limit small and write traffic is heavy, master may disconnect with replicas and can't finish synchronization with replica. If we set client-output-buffer-limit big, master may be OOM when there are many replicas that separately keep much memory. Because replication buffers of different replica client are the same, one simple idea is that all replicas only use one replication buffer, that will effectively save memory. Since replication backlog content is the same as replicas' output buffer, now we can discard replication backlog memory and use global shared replication buffer to implement replication backlog mechanism. ## Implementation I create one global "replication buffer" which contains content of replication stream. The structure of "replication buffer" is similar to the reply list that exists in every client. But the node of list is `replBufBlock`, which has `id, repl_offset, refcount` fields. ```c /* Replication buffer blocks is the list of replBufBlock. * * +--------------+ +--------------+ +--------------+ * | refcount = 1 | ... | refcount = 0 | ... | refcount = 2 | * +--------------+ +--------------+ +--------------+ * | / \ * | / \ * | / \ * Repl Backlog Replia_A Replia_B * * Each replica or replication backlog increments only the refcount of the * 'ref_repl_buf_node' which it points to. So when replica walks to the next * node, it should first increase the next node's refcount, and when we trim * the replication buffer nodes, we remove node always from the head node which * refcount is 0. If the refcount of the head node is not 0, we must stop * trimming and never iterate the next node. */ /* Similar with 'clientReplyBlock', it is used for shared buffers between * all replica clients and replication backlog. */ typedef struct replBufBlock { int refcount; /* Number of replicas or repl backlog using. */ long long id; /* The unique incremental number. */ long long repl_offset; /* Start replication offset of the block. */ size_t size, used; char buf[]; } replBufBlock; ``` So now when we feed replication stream into replication backlog and all replicas, we only need to feed stream into replication buffer `feedReplicationBuffer`. In this function, we set some fields of replication backlog and replicas to references of the global replication buffer blocks. And we also need to check replicas' output buffer limit to free if exceeding `client-output-buffer-limit`, and trim replication backlog if exceeding `repl-backlog-size`. When sending reply to replicas, we also need to iterate replication buffer blocks and send its content, when totally sending one block for replica, we decrease current node count and increase the next current node count, and then free the block which reference is 0 from the head of replication buffer blocks. Since now we use linked list to manage replication backlog, it may cost much time for iterating all linked list nodes to find corresponding replication buffer node. So we create a rax tree to store some nodes for index, but to avoid rax tree occupying too much memory, i record one per 64 nodes for index. Currently, to make partial resynchronization as possible as much, we always let replication backlog as the last reference of replication buffer blocks, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks, and this method doesn't increase memory usage since they share replication buffer. To avoid freezing server for freeing unreferenced replication buffer blocks when we need to trim backlog for exceeding backlog size setting, we trim backlog incrementally (free 64 blocks per call now), and make it faster in `beforeSleep` (free 640 blocks). ### Other changes - `mem_total_replication_buffers`: we add this field in INFO command, it means the total memory of replication buffers used. - `mem_clients_slaves`: now even replica is slow to replicate, and its output buffer memory is not 0, but it still may be 0, since replication backlog and replicas share one global replication buffer, only if replication buffer memory is more than the repl backlog setting size, we consider the excess as replicas' memory. Otherwise, we think replication buffer memory is the consumption of repl backlog. - Key eviction Since all replicas and replication backlog share global replication buffer, we think only the part of exceeding backlog size the extra separate consumption of replicas. Because we trim backlog incrementally in the background, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks disconnect. To avoid massive eviction loop, we don't count the delayed freed replication backlog into used memory even if there are no replicas, i.e. we also regard this memory as replicas's memory. - `client-output-buffer-limit` check for replica clients It doesn't make sense to set the replica clients output buffer limit lower than the repl-backlog-size config (partial sync will succeed and then replica will get disconnected). Such a configuration is ignored (the size of repl-backlog-size will be used). This doesn't have memory consumption implications since the replica client will share the backlog buffers memory. - Drop replication backlog after loading data if needed We always create replication backlog if server is a master, we need it because we put DELs in it when loading expired keys in RDB, but if RDB doesn't have replication info or there is no rdb, it is not possible to support partial resynchronization, to avoid extra memory of replication backlog, we drop it. - Multi IO threads Since all replicas and replication backlog use global replication buffer, if I/O threads are enabled, to guarantee data accessing thread safe, we must let main thread handle sending the output buffer to all replicas. But before, other IO threads could handle sending output buffer of all replicas. ## Other optimizations This solution resolve some other problem: - When replicas disconnect with master since of out of output buffer limit, releasing the output buffer of replicas may freeze server if we set big `client-output-buffer-limit` for replicas, but now, it doesn't cause freezing. - This implementation may mitigate reply list copy cost time(also freezes server) when one replication has huge reply buffer and another replica can copy buffer for full synchronization. now, we just copy reference info, it is very light. - If we set replication backlog size big, it also may cost much time to copy replication backlog into replica's output buffer. But this commit eliminates this problem. - Resizing replication backlog size doesn't empty current replication backlog content.
2021-10-25 08:24:31 +02:00
void replicationFeedStreamFromMasterStream(char *buf, size_t buflen);
void resetReplicationBuffer(void);
void feedReplicationBuffer(char *buf, size_t len);
void freeReplicaReferencedReplBuffer(client *replica);
void replicationFeedMonitors(client *c, list *monitors, int dictid, robj **argv, int argc);
void updateSlavesWaitingBgsave(int bgsaveerr, int type);
void replicationCron(void);
Accelerate diskless master connections, and general re-connections (#6271) Diskless master has some inherent latencies. 1) fork starts with delay from cron rather than immediately 2) replica is put online only after an ACK. but the ACK was sent only once a second. 3) but even if it would arrive immediately, it will not register in case cron didn't yet detect that the fork is done. Besides that, when a replica disconnects, it doesn't immediately attempts to re-connect, it waits for replication cron (one per second). in case it was already online, it may be important to try to re-connect as soon as possible, so that the backlog at the master doesn't vanish. In case it disconnected during rdb transfer, one can argue that it's not very important to re-connect immediately, but this is needed for the "diskless loading short read" test to be able to run 100 iterations in 5 seconds, rather than 3 (waiting for replication cron re-connection) changes in this commit: 1) sync command starts a fork immediately if no sync_delay is configured 2) replica sends REPLCONF ACK when done reading the rdb (rather than on 1s cron) 3) when a replica unexpectedly disconnets, it immediately tries to re-connect rather than waiting 1s 4) when when a child exits, if there is another replica waiting, we spawn a new one right away, instead of waiting for 1s replicationCron. 5) added a call to connectWithMaster from replicationSetMaster. which is called from the REPLICAOF command but also in 3 places in cluster.c, in all of these the connection attempt will now be immediate instead of delayed by 1 second. side note: we can add a call to rdbPipeReadHandler in replconfCommand when getting a REPLCONF ACK from the replica to solve a race where the replica got the entire rdb and EOF marker before we detected that the pipe was closed. in the test i did see this race happens in one about of some 300 runs, but i concluded that this race is unlikely in real life (where the replica is on another host and we're more likely to first detect the pipe was closed. the test runs 100 iterations in 3 seconds, so in some cases it'll take 4 seconds instead (waiting for another REPLCONF ACK). Removing unneeded startBgsaveForReplication from updateSlavesWaitingForBgsave Now that CheckChildrenDone is calling the new replicationStartPendingFork (extracted from serverCron) there's actually no need to call startBgsaveForReplication from updateSlavesWaitingForBgsave anymore, since as soon as updateSlavesWaitingForBgsave returns, CheckChildrenDone is calling replicationStartPendingFork that handles that anyway. The code in updateSlavesWaitingForBgsave had a bug in which it ignored repl-diskless-sync-delay, but removing that code shows that this bug was hiding another bug, which is that the max_idle should have used >= and not >, this one second delay has a big impact on my new test.
2020-08-06 15:53:06 +02:00
void replicationStartPendingFork(void);
void replicationHandleMasterDisconnection(void);
void replicationCacheMaster(client *c);
Multiparam config set (#9748) We can now do: `config set maxmemory 10m repl-backlog-size 5m` ## Basic algorithm to support "transaction like" config sets: 1. Backup all relevant current values (via get). 2. Run "verify" and "set" on everything, if we fail run "restore". 3. Run "apply" on everything (optional optimization: skip functions already run). If we fail run "restore". 4. Return success. ### restore 1. Run set on everything in backup. If we fail log it and continue (this puts us in an undefined state but we decided it's better than the alternative of panicking). This indicates either a bug or some unsupported external state. 2. Run apply on everything in backup (optimization: skip functions already run). If we fail log it (see comment above). 3. Return error. ## Implementation/design changes: * Apply function are idempotent (have no effect if they are run more than once for the same config). * No indication in set functions if we're reading the config or running from the `CONFIG SET` command (removed `update` argument). * Set function should set some config variable and assume an (optional) apply function will use that later to apply. If we know this setting can be safely applied immediately and can always be reverted and doesn't depend on any other configuration we can apply immediately from within the set function (and not store the setting anywhere). This is the case of this `dir` config, for example, which has no apply function. No apply function is need also in the case that setting the variable in the `server` struct is all that needs to be done to make the configuration take effect. Note that the original concept of `update_fn`, which received the old and new values was removed and replaced by the optional apply function. * Apply functions use settings written to the `server` struct and don't receive any inputs. * I take care that for the generic (non-special) configs if there's no change I avoid calling the setter (possible optimization: avoid calling the apply function as well). * Passing the same config parameter more than once to `config set` will fail. You can't do `config set my-setting value1 my-setting value2`. Note that getting `save` in the context of the conf file parsing to work here as before was a pain. The conf file supports an aggregate `save` definition, where each `save` line is added to the server's save params. This is unlike any other line in the config file where each line overwrites any previous configuration. Since we now support passing multiple save params in a single line (see top comments about `save` in https://github.com/redis/redis/pull/9644) we should deprecate the aggregate nature of this config line and perhaps reduce this ugly code in the future.
2021-12-01 09:15:11 +01:00
void resizeReplicationBacklog();
void replicationSetMaster(char *ip, int port);
void replicationUnsetMaster(void);
void refreshGoodSlavesCount(void);
Add new RM_Call flags for script mode, no writes, and error replies. (#10372) The PR extends RM_Call with 3 new capabilities using new flags that are given to RM_Call as part of the `fmt` argument. It aims to assist modules that are getting a list of commands to be executed from the user (not hard coded as part of the module logic), think of a module that implements a new scripting language... * `S` - Run the command in a script mode, this means that it will raise an error if a command which are not allowed inside a script (flaged with the `deny-script` flag) is invoked (like SHUTDOWN). In addition, on script mode, write commands are not allowed if there is not enough good replicas (as configured with `min-replicas-to-write`) and/or a disk error happened. * `W` - no writes mode, Redis will reject any command that is marked with `write` flag. Again can be useful to modules that implement a new scripting language and wants to prevent any write commands. * `E` - Return errors as RedisModuleCallReply. Today the errors that happened before the command was invoked (like unknown commands or acl error) return a NULL reply and set errno. This might be missing important information about the failure and it is also impossible to just pass the error to the user using RM_ReplyWithCallReply. This new flag allows you to get a RedisModuleCallReply object with the relevant error message and treat it as if it was an error that was raised by the command invocation. Tests were added to verify the new code paths. In addition small refactoring was done to share some code between modules, scripts, and `processCommand` function: 1. `getAclErrorMessage` was added to `acl.c` to unified to log message extraction from the acl result 2. `checkGoodReplicasStatus` was added to `replication.c` to check the status of good replicas. It is used on `scriptVerifyWriteCommandAllow`, `RM_Call`, and `processCommand`. 3. `writeCommandsGetDiskErrorMessage` was added to `server.c` to get the error message on persistence failure. Again it is used on `scriptVerifyWriteCommandAllow`, `RM_Call`, and `processCommand`.
2022-03-22 13:13:28 +01:00
int checkGoodReplicasStatus(void);
void processClientsWaitingReplicas(void);
void unblockClientWaitingReplicas(client *c);
int replicationCountAcksByOffset(long long offset);
void replicationSendNewlineToMaster(void);
long long replicationGetSlaveOffset(void);
char *replicationGetSlaveName(client *c);
long long getPsyncInitialOffset(void);
int replicationSetupSlaveForFullResync(client *slave, long long offset);
PSYNC2: different improvements to Redis replication. The gist of the changes is that now, partial resynchronizations between slaves and masters (without the need of a full resync with RDB transfer and so forth), work in a number of cases when it was impossible in the past. For instance: 1. When a slave is promoted to mastrer, the slaves of the old master can partially resynchronize with the new master. 2. Chained slalves (slaves of slaves) can be moved to replicate to other slaves or the master itsef, without requiring a full resync. 3. The master itself, after being turned into a slave, is able to partially resynchronize with the new master, when it joins replication again. In order to obtain this, the following main changes were operated: * Slaves also take a replication backlog, not just masters. * Same stream replication for all the slaves and sub slaves. The replication stream is identical from the top level master to its slaves and is also the same from the slaves to their sub-slaves and so forth. This means that if a slave is later promoted to master, it has the same replication backlong, and can partially resynchronize with its slaves (that were previously slaves of the old master). * A given replication history is no longer identified by the `runid` of a Redis node. There is instead a `replication ID` which changes every time the instance has a new history no longer coherent with the past one. So, for example, slaves publish the same replication history of their master, however when they are turned into masters, they publish a new replication ID, but still remember the old ID, so that they are able to partially resynchronize with slaves of the old master (up to a given offset). * The replication protocol was slightly modified so that a new extended +CONTINUE reply from the master is able to inform the slave of a replication ID change. * REPLCONF CAPA is used in order to notify masters that a slave is able to understand the new +CONTINUE reply. * The RDB file was extended with an auxiliary field that is able to select a given DB after loading in the slave, so that the slave can continue receiving the replication stream from the point it was disconnected without requiring the master to insert "SELECT" statements. This is useful in order to guarantee the "same stream" property, because the slave must be able to accumulate an identical backlog. * Slave pings to sub-slaves are now sent in a special form, when the top-level master is disconnected, in order to don't interfer with the replication stream. We just use out of band "\n" bytes as in other parts of the Redis protocol. An old design document is available here: https://gist.github.com/antirez/ae068f95c0d084891305 However the implementation is not identical to the description because during the work to implement it, different changes were needed in order to make things working well.
2016-11-09 11:31:06 +01:00
void changeReplicationId(void);
void clearReplicationId2(void);
void createReplicationBacklog(void);
Replication backlog and replicas use one global shared replication buffer (#9166) ## Background For redis master, one replica uses one copy of replication buffer, that is a big waste of memory, more replicas more waste, and allocate/free memory for every reply list also cost much. If we set client-output-buffer-limit small and write traffic is heavy, master may disconnect with replicas and can't finish synchronization with replica. If we set client-output-buffer-limit big, master may be OOM when there are many replicas that separately keep much memory. Because replication buffers of different replica client are the same, one simple idea is that all replicas only use one replication buffer, that will effectively save memory. Since replication backlog content is the same as replicas' output buffer, now we can discard replication backlog memory and use global shared replication buffer to implement replication backlog mechanism. ## Implementation I create one global "replication buffer" which contains content of replication stream. The structure of "replication buffer" is similar to the reply list that exists in every client. But the node of list is `replBufBlock`, which has `id, repl_offset, refcount` fields. ```c /* Replication buffer blocks is the list of replBufBlock. * * +--------------+ +--------------+ +--------------+ * | refcount = 1 | ... | refcount = 0 | ... | refcount = 2 | * +--------------+ +--------------+ +--------------+ * | / \ * | / \ * | / \ * Repl Backlog Replia_A Replia_B * * Each replica or replication backlog increments only the refcount of the * 'ref_repl_buf_node' which it points to. So when replica walks to the next * node, it should first increase the next node's refcount, and when we trim * the replication buffer nodes, we remove node always from the head node which * refcount is 0. If the refcount of the head node is not 0, we must stop * trimming and never iterate the next node. */ /* Similar with 'clientReplyBlock', it is used for shared buffers between * all replica clients and replication backlog. */ typedef struct replBufBlock { int refcount; /* Number of replicas or repl backlog using. */ long long id; /* The unique incremental number. */ long long repl_offset; /* Start replication offset of the block. */ size_t size, used; char buf[]; } replBufBlock; ``` So now when we feed replication stream into replication backlog and all replicas, we only need to feed stream into replication buffer `feedReplicationBuffer`. In this function, we set some fields of replication backlog and replicas to references of the global replication buffer blocks. And we also need to check replicas' output buffer limit to free if exceeding `client-output-buffer-limit`, and trim replication backlog if exceeding `repl-backlog-size`. When sending reply to replicas, we also need to iterate replication buffer blocks and send its content, when totally sending one block for replica, we decrease current node count and increase the next current node count, and then free the block which reference is 0 from the head of replication buffer blocks. Since now we use linked list to manage replication backlog, it may cost much time for iterating all linked list nodes to find corresponding replication buffer node. So we create a rax tree to store some nodes for index, but to avoid rax tree occupying too much memory, i record one per 64 nodes for index. Currently, to make partial resynchronization as possible as much, we always let replication backlog as the last reference of replication buffer blocks, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks, and this method doesn't increase memory usage since they share replication buffer. To avoid freezing server for freeing unreferenced replication buffer blocks when we need to trim backlog for exceeding backlog size setting, we trim backlog incrementally (free 64 blocks per call now), and make it faster in `beforeSleep` (free 640 blocks). ### Other changes - `mem_total_replication_buffers`: we add this field in INFO command, it means the total memory of replication buffers used. - `mem_clients_slaves`: now even replica is slow to replicate, and its output buffer memory is not 0, but it still may be 0, since replication backlog and replicas share one global replication buffer, only if replication buffer memory is more than the repl backlog setting size, we consider the excess as replicas' memory. Otherwise, we think replication buffer memory is the consumption of repl backlog. - Key eviction Since all replicas and replication backlog share global replication buffer, we think only the part of exceeding backlog size the extra separate consumption of replicas. Because we trim backlog incrementally in the background, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks disconnect. To avoid massive eviction loop, we don't count the delayed freed replication backlog into used memory even if there are no replicas, i.e. we also regard this memory as replicas's memory. - `client-output-buffer-limit` check for replica clients It doesn't make sense to set the replica clients output buffer limit lower than the repl-backlog-size config (partial sync will succeed and then replica will get disconnected). Such a configuration is ignored (the size of repl-backlog-size will be used). This doesn't have memory consumption implications since the replica client will share the backlog buffers memory. - Drop replication backlog after loading data if needed We always create replication backlog if server is a master, we need it because we put DELs in it when loading expired keys in RDB, but if RDB doesn't have replication info or there is no rdb, it is not possible to support partial resynchronization, to avoid extra memory of replication backlog, we drop it. - Multi IO threads Since all replicas and replication backlog use global replication buffer, if I/O threads are enabled, to guarantee data accessing thread safe, we must let main thread handle sending the output buffer to all replicas. But before, other IO threads could handle sending output buffer of all replicas. ## Other optimizations This solution resolve some other problem: - When replicas disconnect with master since of out of output buffer limit, releasing the output buffer of replicas may freeze server if we set big `client-output-buffer-limit` for replicas, but now, it doesn't cause freezing. - This implementation may mitigate reply list copy cost time(also freezes server) when one replication has huge reply buffer and another replica can copy buffer for full synchronization. now, we just copy reference info, it is very light. - If we set replication backlog size big, it also may cost much time to copy replication backlog into replica's output buffer. But this commit eliminates this problem. - Resizing replication backlog size doesn't empty current replication backlog content.
2021-10-25 08:24:31 +02:00
void freeReplicationBacklog(void);
void replicationCacheMasterUsingMyself(void);
Fix replication of SLAVEOF inside transaction. In Redis 4.0 replication, with the introduction of PSYNC2, masters and slaves replicate commands to cascading slaves and to the replication backlog itself in a different way compared to the past. Masters actually replicate the effects of client commands. Slaves just propagate what they receive from masters. This mechanism can cause problems when the configuration of an instance is changed from master to slave inside a transaction. For instance we could send to a master instance the following sequence: MULTI SLAVEOF 127.0.0.1 0 EXEC SLAVEOF NO ONE Before the fixes in this commit, the MULTI command used to be propagated into the replication backlog, however after the SLAVEOF command the instance is a slave, so the EXEC implementation failed to also propagate the EXEC command. When the slaves of the above instance reconnected, they were incrementally synchronized just sending a "MULTI". This put the master client (in the slaves) into MULTI state, breaking the replication. Notably even Redis Sentinel uses the above approach in order to guarantee that configuration changes are always performed together with rewrites of the configuration and with clients disconnection. Sentiel does: MULTI SLAVEOF ... CONFIG REWRITE CLIENT KILL TYPE normal EXEC So this was a really problematic issue. However even with the fix in this commit, that will add the final EXEC to the replication stream in case the instance was switched from master to slave during the transaction, the result would be to increment the slave replication offset, so a successive reconnection with the new master, will not permit a successful partial resynchronization: no way the new master can provide us with the backlog needed, we incremented our offset to a value that the new master cannot have. However the EXEC implementation waits to emit the MULTI, so that if the commands inside the transaction actually do not need to be replicated, no commands propagation happens at all. From multi.c: if (!must_propagate && !(c->cmd->flags & (CMD_READONLY|CMD_ADMIN))) { execCommandPropagateMulti(c); must_propagate = 1; } The above code is already modified by this commit you are reading. Now also ADMIN commands do not trigger the emission of MULTI. It is actually not clear why we do not just check for CMD_WRITE... Probably I wrote it this way in order to make the code more reliable: better to over-emit MULTI than not emitting it in time. So this commit should indeed fix issue #3836 (verified), however it looks like some reconsideration of this code path is needed in the long term. BONUS POINT: The reverse bug. Even in a read only slave "B", in a replication setup like: A -> B -> C There are commands without the READONLY nor the ADMIN flag, that are also not flagged as WRITE commands. An example is just the PING command. So if we send B the following sequence: MULTI PING SLAVEOF NO ONE EXEC The result will be the reverse bug, where only EXEC is emitted, but not the previous MULTI. However this apparently does not create problems in practice but it is yet another acknowledge of the fact some work is needed here in order to make this code path less surprising. Note that there are many different approaches we could follow. For instance MULTI/EXEC blocks containing administrative commands may be allowed ONLY if all the commands are administrative ones, otherwise they could be denined. When allowed, the commands could simply never be replicated at all.
2017-07-12 11:07:28 +02:00
void feedReplicationBacklog(void *ptr, size_t len);
Replication backlog and replicas use one global shared replication buffer (#9166) ## Background For redis master, one replica uses one copy of replication buffer, that is a big waste of memory, more replicas more waste, and allocate/free memory for every reply list also cost much. If we set client-output-buffer-limit small and write traffic is heavy, master may disconnect with replicas and can't finish synchronization with replica. If we set client-output-buffer-limit big, master may be OOM when there are many replicas that separately keep much memory. Because replication buffers of different replica client are the same, one simple idea is that all replicas only use one replication buffer, that will effectively save memory. Since replication backlog content is the same as replicas' output buffer, now we can discard replication backlog memory and use global shared replication buffer to implement replication backlog mechanism. ## Implementation I create one global "replication buffer" which contains content of replication stream. The structure of "replication buffer" is similar to the reply list that exists in every client. But the node of list is `replBufBlock`, which has `id, repl_offset, refcount` fields. ```c /* Replication buffer blocks is the list of replBufBlock. * * +--------------+ +--------------+ +--------------+ * | refcount = 1 | ... | refcount = 0 | ... | refcount = 2 | * +--------------+ +--------------+ +--------------+ * | / \ * | / \ * | / \ * Repl Backlog Replia_A Replia_B * * Each replica or replication backlog increments only the refcount of the * 'ref_repl_buf_node' which it points to. So when replica walks to the next * node, it should first increase the next node's refcount, and when we trim * the replication buffer nodes, we remove node always from the head node which * refcount is 0. If the refcount of the head node is not 0, we must stop * trimming and never iterate the next node. */ /* Similar with 'clientReplyBlock', it is used for shared buffers between * all replica clients and replication backlog. */ typedef struct replBufBlock { int refcount; /* Number of replicas or repl backlog using. */ long long id; /* The unique incremental number. */ long long repl_offset; /* Start replication offset of the block. */ size_t size, used; char buf[]; } replBufBlock; ``` So now when we feed replication stream into replication backlog and all replicas, we only need to feed stream into replication buffer `feedReplicationBuffer`. In this function, we set some fields of replication backlog and replicas to references of the global replication buffer blocks. And we also need to check replicas' output buffer limit to free if exceeding `client-output-buffer-limit`, and trim replication backlog if exceeding `repl-backlog-size`. When sending reply to replicas, we also need to iterate replication buffer blocks and send its content, when totally sending one block for replica, we decrease current node count and increase the next current node count, and then free the block which reference is 0 from the head of replication buffer blocks. Since now we use linked list to manage replication backlog, it may cost much time for iterating all linked list nodes to find corresponding replication buffer node. So we create a rax tree to store some nodes for index, but to avoid rax tree occupying too much memory, i record one per 64 nodes for index. Currently, to make partial resynchronization as possible as much, we always let replication backlog as the last reference of replication buffer blocks, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks, and this method doesn't increase memory usage since they share replication buffer. To avoid freezing server for freeing unreferenced replication buffer blocks when we need to trim backlog for exceeding backlog size setting, we trim backlog incrementally (free 64 blocks per call now), and make it faster in `beforeSleep` (free 640 blocks). ### Other changes - `mem_total_replication_buffers`: we add this field in INFO command, it means the total memory of replication buffers used. - `mem_clients_slaves`: now even replica is slow to replicate, and its output buffer memory is not 0, but it still may be 0, since replication backlog and replicas share one global replication buffer, only if replication buffer memory is more than the repl backlog setting size, we consider the excess as replicas' memory. Otherwise, we think replication buffer memory is the consumption of repl backlog. - Key eviction Since all replicas and replication backlog share global replication buffer, we think only the part of exceeding backlog size the extra separate consumption of replicas. Because we trim backlog incrementally in the background, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks disconnect. To avoid massive eviction loop, we don't count the delayed freed replication backlog into used memory even if there are no replicas, i.e. we also regard this memory as replicas's memory. - `client-output-buffer-limit` check for replica clients It doesn't make sense to set the replica clients output buffer limit lower than the repl-backlog-size config (partial sync will succeed and then replica will get disconnected). Such a configuration is ignored (the size of repl-backlog-size will be used). This doesn't have memory consumption implications since the replica client will share the backlog buffers memory. - Drop replication backlog after loading data if needed We always create replication backlog if server is a master, we need it because we put DELs in it when loading expired keys in RDB, but if RDB doesn't have replication info or there is no rdb, it is not possible to support partial resynchronization, to avoid extra memory of replication backlog, we drop it. - Multi IO threads Since all replicas and replication backlog use global replication buffer, if I/O threads are enabled, to guarantee data accessing thread safe, we must let main thread handle sending the output buffer to all replicas. But before, other IO threads could handle sending output buffer of all replicas. ## Other optimizations This solution resolve some other problem: - When replicas disconnect with master since of out of output buffer limit, releasing the output buffer of replicas may freeze server if we set big `client-output-buffer-limit` for replicas, but now, it doesn't cause freezing. - This implementation may mitigate reply list copy cost time(also freezes server) when one replication has huge reply buffer and another replica can copy buffer for full synchronization. now, we just copy reference info, it is very light. - If we set replication backlog size big, it also may cost much time to copy replication backlog into replica's output buffer. But this commit eliminates this problem. - Resizing replication backlog size doesn't empty current replication backlog content.
2021-10-25 08:24:31 +02:00
void incrementalTrimReplicationBacklog(size_t blocks);
int canFeedReplicaReplBuffer(client *replica);
void rebaseReplicationBuffer(long long base_repl_offset);
void showLatestBacklog(void);
void rdbPipeReadHandler(struct aeEventLoop *eventLoop, int fd, void *clientData, int mask);
void rdbPipeWriteHandlerConnRemoved(struct connection *conn);
void clearFailoverState(void);
void updateFailoverStatus(void);
void abortFailover(const char *err);
const char *getFailoverStateString();
/* Generic persistence functions */
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788) Implement Multi-Part AOF mechanism to avoid overheads during AOFRW. Introducing a folder with multiple AOF files tracked by a manifest file. The main issues with the the original AOFRW mechanism are: * buffering of commands that are processed during rewrite (consuming a lot of RAM) * freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it. * double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files) The main modifications of this PR: 1. Remove the AOF rewrite buffer and related code. 2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type, it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the incremental commands since the last AOFRW. 3. Use a AOF manifest file to record and manage these AOF files mentioned above. 4. The original configuration of `appendfilename` will be the base part of the new file name, for example: `appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof` 5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename` 6. Remove the `aof_rewrite_buffer_length` field in info. 7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs. It also gives users the opportunity to preserve the history AOFs. just for testing use now. 8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now), we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately. 9. Support upgrade (load) data from old version redis. 10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and manifest file will be placed in this directory. 11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if `aof-load-truncated` is enabled. Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-03 18:14:13 +01:00
void startLoadingFile(size_t size, char* filename, int rdbflags);
Replica keep serving data during repl-diskless-load=swapdb for better availability (#9323) For diskless replication in swapdb mode, considering we already spend replica memory having a backup of current db to restore in case of failure, we can have the following benefits by instead swapping database only in case we succeeded in transferring db from master: - Avoid `LOADING` response during failed and successful synchronization for cases where the replica is already up and running with data. - Faster total time of diskless replication, because now we're moving from Transfer + Flush + Load time to Transfer + Load only. Flushing the tempDb is done asynchronously after swapping. - This could be implemented also for disk replication with similar benefits if consumers are willing to spend the extra memory usage. General notes: - The concept of `backupDb` becomes `tempDb` for clarity. - Async loading mode will only kick in if the replica is syncing from a master that has the same repl-id the one it had before. i.e. the data it's getting belongs to a different time of the same timeline. - New property in INFO: `async_loading` to differentiate from the blocking loading - Slot to Key mapping is now a field of `redisDb` as it's more natural to access it from both server.db and the tempDb that is passed around. - Because this is affecting replicas only, we assume that if they are not readonly and write commands during replication, they are lost after SYNC same way as before, but we're still denying CONFIG SET here anyways to avoid complications. Considerations for review: - We have many cases where server.loading flag is used and even though I tried my best, there may be cases where async_loading should be checked as well and cases where it shouldn't (would require very good understanding of whole code) - Several places that had different behavior depending on the loading flag where actually meant to just handle commands coming from the AOF client differently than ones coming from real clients, changed to check CLIENT_ID_AOF instead. **Additional for Release Notes** - Bugfix - server.dirty was not incremented for any kind of diskless replication, as effect it wouldn't contribute on triggering next database SAVE - New flag for RM_GetContextFlags module API: REDISMODULE_CTX_FLAGS_ASYNC_LOADING - Deprecated RedisModuleEvent_ReplBackup. Starting from Redis 7.0, we don't fire this event. Instead, we have the new RedisModuleEvent_ReplAsyncLoad holding 3 sub-events: STARTED, ABORTED and COMPLETED. - New module flag REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD for RedisModule_SetModuleOptions to allow modules to declare they support the diskless replication with async loading (when absent, we fall back to disk-based loading). Co-authored-by: Eduardo Semprebon <edus@saxobank.com> Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-04 09:46:50 +01:00
void startLoading(size_t size, int rdbflags, int async);
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788) Implement Multi-Part AOF mechanism to avoid overheads during AOFRW. Introducing a folder with multiple AOF files tracked by a manifest file. The main issues with the the original AOFRW mechanism are: * buffering of commands that are processed during rewrite (consuming a lot of RAM) * freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it. * double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files) The main modifications of this PR: 1. Remove the AOF rewrite buffer and related code. 2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type, it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the incremental commands since the last AOFRW. 3. Use a AOF manifest file to record and manage these AOF files mentioned above. 4. The original configuration of `appendfilename` will be the base part of the new file name, for example: `appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof` 5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename` 6. Remove the `aof_rewrite_buffer_length` field in info. 7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs. It also gives users the opportunity to preserve the history AOFs. just for testing use now. 8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now), we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately. 9. Support upgrade (load) data from old version redis. 10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and manifest file will be placed in this directory. 11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if `aof-load-truncated` is enabled. Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-03 18:14:13 +01:00
void loadingAbsProgress(off_t pos);
void loadingIncrProgress(off_t size);
void stopLoading(int success);
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788) Implement Multi-Part AOF mechanism to avoid overheads during AOFRW. Introducing a folder with multiple AOF files tracked by a manifest file. The main issues with the the original AOFRW mechanism are: * buffering of commands that are processed during rewrite (consuming a lot of RAM) * freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it. * double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files) The main modifications of this PR: 1. Remove the AOF rewrite buffer and related code. 2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type, it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the incremental commands since the last AOFRW. 3. Use a AOF manifest file to record and manage these AOF files mentioned above. 4. The original configuration of `appendfilename` will be the base part of the new file name, for example: `appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof` 5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename` 6. Remove the `aof_rewrite_buffer_length` field in info. 7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs. It also gives users the opportunity to preserve the history AOFs. just for testing use now. 8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now), we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately. 9. Support upgrade (load) data from old version redis. 10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and manifest file will be placed in this directory. 11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if `aof-load-truncated` is enabled. Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-03 18:14:13 +01:00
void updateLoadingFileName(char* filename);
void startSaving(int rdbflags);
void stopSaving(int success);
int allPersistenceDisabled(void);
#define DISK_ERROR_TYPE_AOF 1 /* Don't accept writes: AOF errors. */
#define DISK_ERROR_TYPE_RDB 2 /* Don't accept writes: RDB errors. */
#define DISK_ERROR_TYPE_NONE 0 /* No problems, we can accept writes. */
int writeCommandsDeniedByDiskError(void);
Add new RM_Call flags for script mode, no writes, and error replies. (#10372) The PR extends RM_Call with 3 new capabilities using new flags that are given to RM_Call as part of the `fmt` argument. It aims to assist modules that are getting a list of commands to be executed from the user (not hard coded as part of the module logic), think of a module that implements a new scripting language... * `S` - Run the command in a script mode, this means that it will raise an error if a command which are not allowed inside a script (flaged with the `deny-script` flag) is invoked (like SHUTDOWN). In addition, on script mode, write commands are not allowed if there is not enough good replicas (as configured with `min-replicas-to-write`) and/or a disk error happened. * `W` - no writes mode, Redis will reject any command that is marked with `write` flag. Again can be useful to modules that implement a new scripting language and wants to prevent any write commands. * `E` - Return errors as RedisModuleCallReply. Today the errors that happened before the command was invoked (like unknown commands or acl error) return a NULL reply and set errno. This might be missing important information about the failure and it is also impossible to just pass the error to the user using RM_ReplyWithCallReply. This new flag allows you to get a RedisModuleCallReply object with the relevant error message and treat it as if it was an error that was raised by the command invocation. Tests were added to verify the new code paths. In addition small refactoring was done to share some code between modules, scripts, and `processCommand` function: 1. `getAclErrorMessage` was added to `acl.c` to unified to log message extraction from the acl result 2. `checkGoodReplicasStatus` was added to `replication.c` to check the status of good replicas. It is used on `scriptVerifyWriteCommandAllow`, `RM_Call`, and `processCommand`. 3. `writeCommandsGetDiskErrorMessage` was added to `server.c` to get the error message on persistence failure. Again it is used on `scriptVerifyWriteCommandAllow`, `RM_Call`, and `processCommand`.
2022-03-22 13:13:28 +01:00
sds writeCommandsGetDiskErrorMessage(int);
/* RDB persistence */
#include "rdb.h"
void killRDBChild(void);
int bg_unlink(const char *filename);
/* AOF persistence */
void flushAppendOnlyFile(int force);
void feedAppendOnlyFile(int dictid, robj **argv, int argc);
void aofRemoveTempFile(pid_t childpid);
int rewriteAppendOnlyFileBackground(void);
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788) Implement Multi-Part AOF mechanism to avoid overheads during AOFRW. Introducing a folder with multiple AOF files tracked by a manifest file. The main issues with the the original AOFRW mechanism are: * buffering of commands that are processed during rewrite (consuming a lot of RAM) * freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it. * double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files) The main modifications of this PR: 1. Remove the AOF rewrite buffer and related code. 2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type, it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the incremental commands since the last AOFRW. 3. Use a AOF manifest file to record and manage these AOF files mentioned above. 4. The original configuration of `appendfilename` will be the base part of the new file name, for example: `appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof` 5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename` 6. Remove the `aof_rewrite_buffer_length` field in info. 7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs. It also gives users the opportunity to preserve the history AOFs. just for testing use now. 8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now), we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately. 9. Support upgrade (load) data from old version redis. 10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and manifest file will be placed in this directory. 11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if `aof-load-truncated` is enabled. Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-03 18:14:13 +01:00
int loadAppendOnlyFiles(aofManifest *am);
void stopAppendOnly(void);
int startAppendOnly(void);
void backgroundRewriteDoneHandler(int exitcode, int bysignal);
2016-08-09 16:41:40 +02:00
ssize_t aofReadDiffFromParent(void);
void killAppendOnlyChild(void);
void restartAOFAfterSYNC();
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788) Implement Multi-Part AOF mechanism to avoid overheads during AOFRW. Introducing a folder with multiple AOF files tracked by a manifest file. The main issues with the the original AOFRW mechanism are: * buffering of commands that are processed during rewrite (consuming a lot of RAM) * freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it. * double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files) The main modifications of this PR: 1. Remove the AOF rewrite buffer and related code. 2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type, it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the incremental commands since the last AOFRW. 3. Use a AOF manifest file to record and manage these AOF files mentioned above. 4. The original configuration of `appendfilename` will be the base part of the new file name, for example: `appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof` 5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename` 6. Remove the `aof_rewrite_buffer_length` field in info. 7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs. It also gives users the opportunity to preserve the history AOFs. just for testing use now. 8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now), we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately. 9. Support upgrade (load) data from old version redis. 10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and manifest file will be placed in this directory. 11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if `aof-load-truncated` is enabled. Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-03 18:14:13 +01:00
void aofLoadManifestFromDisk(void);
void aofOpenIfNeededOnServerStart(void);
void aofManifestFree(aofManifest *am);
int aofDelHistoryFiles(void);
int aofRewriteLimited(void);
/* Child info */
void openChildInfoPipe(void);
void closeChildInfoPipe(void);
void sendChildInfoGeneric(childInfoType info_type, size_t keys, double progress, char *pname);
void sendChildCowInfo(childInfoType info_type, char *pname);
void sendChildInfo(childInfoType info_type, size_t keys, char *pname);
void receiveChildInfo(void);
/* Fork helpers */
int redisFork(int type);
int hasActiveChildProcess();
void resetChildState();
int isMutuallyExclusiveChildType(int type);
/* acl.c -- Authentication related prototypes. */
extern rax *Users;
extern user *DefaultUser;
2019-01-10 16:39:32 +01:00
void ACLInit(void);
/* Return values for ACLCheckAllPerm(). */
2019-01-16 18:31:05 +01:00
#define ACL_OK 0
#define ACL_DENIED_CMD 1
#define ACL_DENIED_KEY 2
2020-02-04 12:55:26 +01:00
#define ACL_DENIED_AUTH 3 /* Only used for ACL LOG entries. */
Adds pub/sub channel patterns to ACL (#7993) Fixes #7923. This PR appropriates the special `&` symbol (because `@` and `*` are taken), followed by a literal value or pattern for describing the Pub/Sub patterns that an ACL user can interact with. It is similar to the existing key patterns mechanism in function (additive) and implementation (copy-pasta). It also adds the allchannels and resetchannels ACL keywords, naturally. The default user is given allchannels permissions, whereas new users get whatever is defined by the acl-pubsub-default configuration directive. For backward compatibility in 6.2, the default of this directive is allchannels but this is likely to be changed to resetchannels in the next major version for stronger default security settings. Unless allchannels is set for the user, channel access permissions are checked as follows : * Calls to both PUBLISH and SUBSCRIBE will fail unless a pattern matching the argumentative channel name(s) exists for the user. * Calls to PSUBSCRIBE will fail unless the pattern(s) provided as an argument literally exist(s) in the user's list. Such failures are logged to the ACL log. Runtime changes to channel permissions for a user with existing subscribing clients cause said clients to disconnect unless the new permissions permit the connections to continue. Note, however, that PSUBSCRIBErs' patterns are matched literally, so given the change bar:* -> b*, pattern subscribers to bar:* will be disconnected. Notes/questions: * UNSUBSCRIBE, PUNSUBSCRIBE and PUBSUB remain unprotected due to lack of reasons for touching them.
2020-12-01 13:21:39 +01:00
#define ACL_DENIED_CHANNEL 4 /* Only used for pub/sub commands */
/* Context values for addACLLogEntry(). */
#define ACL_LOG_CTX_TOPLEVEL 0
#define ACL_LOG_CTX_LUA 1
#define ACL_LOG_CTX_MULTI 2
#define ACL_LOG_CTX_MODULE 3
/* ACL key permission types */
#define ACL_READ_PERMISSION (1<<0)
#define ACL_WRITE_PERMISSION (1<<1)
#define ACL_ALL_PERMISSION (ACL_READ_PERMISSION|ACL_WRITE_PERMISSION)
int ACLCheckUserCredentials(robj *username, robj *password);
int ACLAuthenticateUser(client *c, robj *username, robj *password);
unsigned long ACLGetCommandID(sds cmdname);
void ACLClearCommandID(void);
user *ACLGetUserByName(const char *name, size_t namelen);
int ACLUserCheckKeyPerm(user *u, const char *key, int keylen, int flags);
int ACLUserCheckChannelPerm(user *u, sds channel, int literal);
int ACLCheckAllUserCommandPerm(user *u, struct redisCommand *cmd, robj **argv, int argc, int *idxptr);
make sort/ro commands validate external keys access patterns (#10106) (#10340) Currently the sort and sort_ro can access external keys via `GET` and `BY` in order to make sure the user cannot violate the authorization ACL rules, the decision is to reject external keys access patterns unless ACL allows SORT full access to all keys. I.e. for backwards compatibility, SORT with GET/BY keeps working, but if ACL has restrictions to certain keys, these features get permission denied. ### Implemented solution We have discussed several potential solutions and decided to only allow the GET and BY arguments when the user has all key permissions with the SORT command. The reasons being that SORT with GET or BY is problematic anyway, for instance it is not supported in cluster mode since it doesn't declare keys, and we're not sure the combination of that feature with ACL key restriction is really required. **HOWEVER** If in the fullness of time we will identify a real need for fine grain access support for SORT, we would implement the complete solution which is the alternative described below. ### Alternative (Completion solution): Check sort ACL rules after executing it and before committing output (either via store or to COB). it would require making several changes to the sort command itself. and would potentially cause performance degradation since we will have to collect all the get keys instead of just applying them to a temp array and then scan the access keys against the ACL selectors. This solution can include an optimization to avoid the overheads of collecting the key names, in case the ACL rules grant SORT full key-access, or if the ACL key pattern literal matches the one used in GET/BY. It would also mean that authorization would be O(nlogn) since we will have to complete most of the command execution before we can perform verification Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Oran Agra <oran@redislabs.com>
2022-03-15 16:14:53 +01:00
int ACLUserCheckCmdWithUnrestrictedKeyAccess(user *u, struct redisCommand *cmd, robj **argv, int argc, int flags);
int ACLCheckAllPerm(client *c, int *idxptr);
int ACLSetUser(user *u, const char *op, ssize_t oplen);
uint64_t ACLGetCommandCategoryFlagByName(const char *name);
int ACLAppendUserForLoading(sds *argv, int argc, int *argc_err);
const char *ACLSetUserStringError(void);
int ACLLoadConfiguredUsers(void);
sds ACLDescribeUser(user *u);
void ACLLoadUsersAtStartup(void);
void addReplyCommandCategories(client *c, struct redisCommand *cmd);
user *ACLCreateUnlinkedUser();
void ACLFreeUserAndKillClients(user *u);
void addACLLogEntry(client *c, int reason, int context, int argpos, sds username, sds object);
Add new RM_Call flags for script mode, no writes, and error replies. (#10372) The PR extends RM_Call with 3 new capabilities using new flags that are given to RM_Call as part of the `fmt` argument. It aims to assist modules that are getting a list of commands to be executed from the user (not hard coded as part of the module logic), think of a module that implements a new scripting language... * `S` - Run the command in a script mode, this means that it will raise an error if a command which are not allowed inside a script (flaged with the `deny-script` flag) is invoked (like SHUTDOWN). In addition, on script mode, write commands are not allowed if there is not enough good replicas (as configured with `min-replicas-to-write`) and/or a disk error happened. * `W` - no writes mode, Redis will reject any command that is marked with `write` flag. Again can be useful to modules that implement a new scripting language and wants to prevent any write commands. * `E` - Return errors as RedisModuleCallReply. Today the errors that happened before the command was invoked (like unknown commands or acl error) return a NULL reply and set errno. This might be missing important information about the failure and it is also impossible to just pass the error to the user using RM_ReplyWithCallReply. This new flag allows you to get a RedisModuleCallReply object with the relevant error message and treat it as if it was an error that was raised by the command invocation. Tests were added to verify the new code paths. In addition small refactoring was done to share some code between modules, scripts, and `processCommand` function: 1. `getAclErrorMessage` was added to `acl.c` to unified to log message extraction from the acl result 2. `checkGoodReplicasStatus` was added to `replication.c` to check the status of good replicas. It is used on `scriptVerifyWriteCommandAllow`, `RM_Call`, and `processCommand`. 3. `writeCommandsGetDiskErrorMessage` was added to `server.c` to get the error message on persistence failure. Again it is used on `scriptVerifyWriteCommandAllow`, `RM_Call`, and `processCommand`.
2022-03-22 13:13:28 +01:00
const char* getAclErrorMessage(int acl_res);
void ACLUpdateDefaultUserPassword(sds password);
/* Sorted sets data type */
2016-04-14 15:58:49 +02:00
/* Input flags. */
#define ZADD_IN_NONE 0
#define ZADD_IN_INCR (1<<0) /* Increment the score instead of setting it. */
#define ZADD_IN_NX (1<<1) /* Don't touch elements not already existing. */
#define ZADD_IN_XX (1<<2) /* Only touch elements already existing. */
#define ZADD_IN_GT (1<<3) /* Only update existing when new scores are higher. */
#define ZADD_IN_LT (1<<4) /* Only update existing when new scores are lower. */
2016-04-14 15:58:49 +02:00
/* Output flags. */
#define ZADD_OUT_NOP (1<<0) /* Operation not performed because of conditionals.*/
#define ZADD_OUT_NAN (1<<1) /* Only touch elements already existing. */
#define ZADD_OUT_ADDED (1<<2) /* The element was new and was added. */
#define ZADD_OUT_UPDATED (1<<3) /* The element already existed, score updated. */
2016-04-14 15:58:49 +02:00
Squash merging 125 typo/grammar/comment/doc PRs (#7773) List of squashed commits or PRs =============================== commit 66801ea Author: hwware <wen.hui.ware@gmail.com> Date: Mon Jan 13 00:54:31 2020 -0500 typo fix in acl.c commit 46f55db Author: Itamar Haber <itamar@redislabs.com> Date: Sun Sep 6 18:24:11 2020 +0300 Updates a couple of comments Specifically: * RM_AutoMemory completed instead of pointing to docs * Updated link to custom type doc commit 61a2aa0 Author: xindoo <xindoo@qq.com> Date: Tue Sep 1 19:24:59 2020 +0800 Correct errors in code comments commit a5871d1 Author: yz1509 <pro-756@qq.com> Date: Tue Sep 1 18:36:06 2020 +0800 fix typos in module.c commit 41eede7 Author: bookug <bookug@qq.com> Date: Sat Aug 15 01:11:33 2020 +0800 docs: fix typos in comments commit c303c84 Author: lazy-snail <ws.niu@outlook.com> Date: Fri Aug 7 11:15:44 2020 +0800 fix spelling in redis.conf commit 1eb76bf Author: zhujian <zhujianxyz@gmail.com> Date: Thu Aug 6 15:22:10 2020 +0800 add a missing 'n' in comment commit 1530ec2 Author: Daniel Dai <764122422@qq.com> Date: Mon Jul 27 00:46:35 2020 -0400 fix spelling in tracking.c commit e517b31 Author: Hunter-Chen <huntcool001@gmail.com> Date: Fri Jul 17 22:33:32 2020 +0800 Update redis.conf Co-authored-by: Itamar Haber <itamar@redislabs.com> commit c300eff Author: Hunter-Chen <huntcool001@gmail.com> Date: Fri Jul 17 22:33:23 2020 +0800 Update redis.conf Co-authored-by: Itamar Haber <itamar@redislabs.com> commit 4c058a8 Author: 陈浩鹏 <chenhaopeng@heytea.com> Date: Thu Jun 25 19:00:56 2020 +0800 Grammar fix and clarification commit 5fcaa81 Author: bodong.ybd <bodong.ybd@alibaba-inc.com> Date: Fri Jun 19 10:09:00 2020 +0800 Fix typos commit 4caca9a Author: Pruthvi P <pruthvi@ixigo.com> Date: Fri May 22 00:33:22 2020 +0530 Fix typo eviciton => eviction commit b2a25f6 Author: Brad Dunbar <dunbarb2@gmail.com> Date: Sun May 17 12:39:59 2020 -0400 Fix a typo. commit 12842ae Author: hwware <wen.hui.ware@gmail.com> Date: Sun May 3 17:16:59 2020 -0400 fix spelling in redis conf commit ddba07c Author: Chris Lamb <chris@chris-lamb.co.uk> Date: Sat May 2 23:25:34 2020 +0100 Correct a "conflicts" spelling error. commit 8fc7bf2 Author: Nao YONASHIRO <yonashiro@r.recruit.co.jp> Date: Thu Apr 30 10:25:27 2020 +0900 docs: fix EXPIRE_FAST_CYCLE_DURATION to ACTIVE_EXPIRE_CYCLE_FAST_DURATION commit 9b2b67a Author: Brad Dunbar <dunbarb2@gmail.com> Date: Fri Apr 24 11:46:22 2020 -0400 Fix a typo. commit 0746f10 Author: devilinrust <63737265+devilinrust@users.noreply.github.com> Date: Thu Apr 16 00:17:53 2020 +0200 Fix typos in server.c commit 92b588d Author: benjessop12 <56115861+benjessop12@users.noreply.github.com> Date: Mon Apr 13 13:43:55 2020 +0100 Fix spelling mistake in lazyfree.c commit 1da37aa Merge: 2d4ba28 af347a8 Author: hwware <wen.hui.ware@gmail.com> Date: Thu Mar 5 22:41:31 2020 -0500 Merge remote-tracking branch 'upstream/unstable' into expiretypofix commit 2d4ba28 Author: hwware <wen.hui.ware@gmail.com> Date: Mon Mar 2 00:09:40 2020 -0500 fix typo in expire.c commit 1a746f7 Author: SennoYuki <minakami1yuki@gmail.com> Date: Thu Feb 27 16:54:32 2020 +0800 fix typo commit 8599b1a Author: dongheejeong <donghee950403@gmail.com> Date: Sun Feb 16 20:31:43 2020 +0000 Fix typo in server.c commit f38d4e8 Author: hwware <wen.hui.ware@gmail.com> Date: Sun Feb 2 22:58:38 2020 -0500 fix typo in evict.c commit fe143fc Author: Leo Murillo <leonardo.murillo@gmail.com> Date: Sun Feb 2 01:57:22 2020 -0600 Fix a few typos in redis.conf commit 1ab4d21 Author: viraja1 <anchan.viraj@gmail.com> Date: Fri Dec 27 17:15:58 2019 +0530 Fix typo in Latency API docstring commit ca1f70e Author: gosth <danxuedexing@qq.com> Date: Wed Dec 18 15:18:02 2019 +0800 fix typo in sort.c commit a57c06b Author: ZYunH <zyunhjob@163.com> Date: Mon Dec 16 22:28:46 2019 +0800 fix-zset-typo commit b8c92b5 Author: git-hulk <hulk.website@gmail.com> Date: Mon Dec 16 15:51:42 2019 +0800 FIX: typo in cluster.c, onformation->information commit 9dd981c Author: wujm2007 <jim.wujm@gmail.com> Date: Mon Dec 16 09:37:52 2019 +0800 Fix typo commit e132d7a Author: Sebastien Williams-Wynn <s.williamswynn.mail@gmail.com> Date: Fri Nov 15 00:14:07 2019 +0000 Minor typo change commit 47f44d5 Author: happynote3966 <01ssrmikururudevice01@gmail.com> Date: Mon Nov 11 22:08:48 2019 +0900 fix comment typo in redis-cli.c commit b8bdb0d Author: fulei <fulei@kuaishou.com> Date: Wed Oct 16 18:00:17 2019 +0800 Fix a spelling mistake of comments in defragDictBucketCallback commit 0def46a Author: fulei <fulei@kuaishou.com> Date: Wed Oct 16 13:09:27 2019 +0800 fix some spelling mistakes of comments in defrag.c commit f3596fd Author: Phil Rajchgot <tophil@outlook.com> Date: Sun Oct 13 02:02:32 2019 -0400 Typo and grammar fixes Redis and its documentation are great -- just wanted to submit a few corrections in the spirit of Hacktoberfest. Thanks for all your work on this project. I use it all the time and it works beautifully. commit 2b928cd Author: KangZhiDong <worldkzd@gmail.com> Date: Sun Sep 1 07:03:11 2019 +0800 fix typos commit 33aea14 Author: Axlgrep <axlgrep@gmail.com> Date: Tue Aug 27 11:02:18 2019 +0800 Fixed eviction spelling issues commit e282a80 Author: Simen Flatby <simen@oms.no> Date: Tue Aug 20 15:25:51 2019 +0200 Update comments to reflect prop name In the comments the prop is referenced as replica-validity-factor, but it is really named cluster-replica-validity-factor. commit 74d1f9a Author: Jim Green <jimgreen2013@qq.com> Date: Tue Aug 20 20:00:31 2019 +0800 fix comment error, the code is ok commit eea1407 Author: Liao Tonglang <liaotonglang@gmail.com> Date: Fri May 31 10:16:18 2019 +0800 typo fix fix cna't to can't commit 0da553c Author: KAWACHI Takashi <tkawachi@gmail.com> Date: Wed Jul 17 00:38:16 2019 +0900 Fix typo commit 7fc8fb6 Author: Michael Prokop <mika@grml.org> Date: Tue May 28 17:58:42 2019 +0200 Typo fixes s/familar/familiar/ s/compatiblity/compatibility/ s/ ot / to / s/itsef/itself/ commit 5f46c9d Author: zhumoing <34539422+zhumoing@users.noreply.github.com> Date: Tue May 21 21:16:50 2019 +0800 typo-fixes typo-fixes commit 321dfe1 Author: wxisme <850885154@qq.com> Date: Sat Mar 16 15:10:55 2019 +0800 typo fix commit b4fb131 Merge: 267e0e6 3df1eb8 Author: Nikitas Bastas <nikitasbst@gmail.com> Date: Fri Feb 8 22:55:45 2019 +0200 Merge branch 'unstable' of antirez/redis into unstable commit 267e0e6 Author: Nikitas Bastas <nikitasbst@gmail.com> Date: Wed Jan 30 21:26:04 2019 +0200 Minor typo fix commit 30544e7 Author: inshal96 <39904558+inshal96@users.noreply.github.com> Date: Fri Jan 4 16:54:50 2019 +0500 remove an extra 'a' in the comments commit 337969d Author: BrotherGao <yangdongheng11@gmail.com> Date: Sat Dec 29 12:37:29 2018 +0800 fix typo in redis.conf commit 9f4b121 Merge: 423a030 e504583 Author: BrotherGao <yangdongheng@xiaomi.com> Date: Sat Dec 29 11:41:12 2018 +0800 Merge branch 'unstable' of antirez/redis into unstable commit 423a030 Merge: 42b02b7 46a51cd Author: 杨东衡 <yangdongheng@xiaomi.com> Date: Tue Dec 4 23:56:11 2018 +0800 Merge branch 'unstable' of antirez/redis into unstable commit 42b02b7 Merge: 68c0e6e b8febe6 Author: Dongheng Yang <yangdongheng11@gmail.com> Date: Sun Oct 28 15:54:23 2018 +0800 Merge pull request #1 from antirez/unstable update local data commit 714b589 Author: Christian <crifei93@gmail.com> Date: Fri Dec 28 01:17:26 2018 +0100 fix typo "resulution" commit e23259d Author: garenchan <1412950785@qq.com> Date: Wed Dec 26 09:58:35 2018 +0800 fix typo: segfauls -> segfault commit a9359f8 Author: xjp <jianping_xie@aliyun.com> Date: Tue Dec 18 17:31:44 2018 +0800 Fixed REDISMODULE_H spell bug commit a12c3e4 Author: jdiaz <jrd.palacios@gmail.com> Date: Sat Dec 15 23:39:52 2018 -0600 Fixes hyperloglog hash function comment block description commit 770eb11 Author: 林上耀 <1210tom@163.com> Date: Sun Nov 25 17:16:10 2018 +0800 fix typo commit fd97fbb Author: Chris Lamb <chris@chris-lamb.co.uk> Date: Fri Nov 23 17:14:01 2018 +0100 Correct "unsupported" typo. commit a85522d Author: Jungnam Lee <jungnam.lee@oracle.com> Date: Thu Nov 8 23:01:29 2018 +0900 fix typo in test comments commit ade8007 Author: Arun Kumar <palerdot@users.noreply.github.com> Date: Tue Oct 23 16:56:35 2018 +0530 Fixed grammatical typo Fixed typo for word 'dictionary' commit 869ee39 Author: Hamid Alaei <hamid.a85@gmail.com> Date: Sun Aug 12 16:40:02 2018 +0430 fix documentations: (ThreadSafeContextStart/Stop -> ThreadSafeContextLock/Unlock), minor typo commit f89d158 Author: Mayank Jain <mayankjain255@gmail.com> Date: Tue Jul 31 23:01:21 2018 +0530 Updated README.md with some spelling corrections. Made correction in spelling of some misspelled words. commit 892198e Author: dsomeshwar <someshwar.dhayalan@gmail.com> Date: Sat Jul 21 23:23:04 2018 +0530 typo fix commit 8a4d780 Author: Itamar Haber <itamar@redislabs.com> Date: Mon Apr 30 02:06:52 2018 +0300 Fixes some typos commit e3acef6 Author: Noah Rosamilia <ivoahivoah@gmail.com> Date: Sat Mar 3 23:41:21 2018 -0500 Fix typo in /deps/README.md commit 04442fb Author: WuYunlong <xzsyeb@126.com> Date: Sat Mar 3 10:32:42 2018 +0800 Fix typo in readSyncBulkPayload() comment. commit 9f36880 Author: WuYunlong <xzsyeb@126.com> Date: Sat Mar 3 10:20:37 2018 +0800 replication.c comment: run_id -> replid. commit f866b4a Author: Francesco 'makevoid' Canessa <makevoid@gmail.com> Date: Thu Feb 22 22:01:56 2018 +0000 fix comment typo in server.c commit 0ebc69b Author: 줍 <jubee0124@gmail.com> Date: Mon Feb 12 16:38:48 2018 +0900 Fix typo in redis.conf Fix `five behaviors` to `eight behaviors` in [this sentence ](antirez/redis@unstable/redis.conf#L564) commit b50a620 Author: martinbroadhurst <martinbroadhurst@users.noreply.github.com> Date: Thu Dec 28 12:07:30 2017 +0000 Fix typo in valgrind.sup commit 7d8f349 Author: Peter Boughton <peter@sorcerersisle.com> Date: Mon Nov 27 19:52:19 2017 +0000 Update CONTRIBUTING; refer doc updates to redis-doc repo. commit 02dec7e Author: Klauswk <klauswk1@hotmail.com> Date: Tue Oct 24 16:18:38 2017 -0200 Fix typo in comment commit e1efbc8 Author: chenshi <baiwfg2@gmail.com> Date: Tue Oct 3 18:26:30 2017 +0800 Correct two spelling errors of comments commit 93327d8 Author: spacewander <spacewanderlzx@gmail.com> Date: Wed Sep 13 16:47:24 2017 +0800 Update the comment for OBJ_ENCODING_EMBSTR_SIZE_LIMIT's value The value of OBJ_ENCODING_EMBSTR_SIZE_LIMIT is 44 now instead of 39. commit 63d361f Author: spacewander <spacewanderlzx@gmail.com> Date: Tue Sep 12 15:06:42 2017 +0800 Fix <prevlen> related doc in ziplist.c According to the definition of ZIP_BIG_PREVLEN and other related code, the guard of single byte <prevlen> should be 254 instead of 255. commit ebe228d Author: hanael80 <hanael80@gmail.com> Date: Tue Aug 15 09:09:40 2017 +0900 Fix typo commit 6b696e6 Author: Matt Robenolt <matt@ydekproductions.com> Date: Mon Aug 14 14:50:47 2017 -0700 Fix typo in LATENCY DOCTOR output commit a2ec6ae Author: caosiyang <caosiyang@qiyi.com> Date: Tue Aug 15 14:15:16 2017 +0800 Fix a typo: form => from commit 3ab7699 Author: caosiyang <caosiyang@qiyi.com> Date: Thu Aug 10 18:40:33 2017 +0800 Fix a typo: replicationFeedSlavesFromMaster() => replicationFeedSlavesFromMasterStream() commit 72d43ef Author: caosiyang <caosiyang@qiyi.com> Date: Tue Aug 8 15:57:25 2017 +0800 fix a typo: servewr => server commit 707c958 Author: Bo Cai <charpty@gmail.com> Date: Wed Jul 26 21:49:42 2017 +0800 redis-cli.c typo: conut -> count. Signed-off-by: Bo Cai <charpty@gmail.com> commit b9385b2 Author: JackDrogon <jack.xsuperman@gmail.com> Date: Fri Jun 30 14:22:31 2017 +0800 Fix some spell problems commit 20d9230 Author: akosel <aaronjkosel@gmail.com> Date: Sun Jun 4 19:35:13 2017 -0500 Fix typo commit b167bfc Author: Krzysiek Witkowicz <krzysiekwitkowicz@gmail.com> Date: Mon May 22 21:32:27 2017 +0100 Fix #4008 small typo in comment commit 2b78ac8 Author: Jake Clarkson <jacobwclarkson@gmail.com> Date: Wed Apr 26 15:49:50 2017 +0100 Correct typo in tests/unit/hyperloglog.tcl commit b0f1cdb Author: Qi Luo <qiluo-msft@users.noreply.github.com> Date: Wed Apr 19 14:25:18 2017 -0700 Fix typo commit a90b0f9 Author: charsyam <charsyam@naver.com> Date: Thu Mar 16 18:19:53 2017 +0900 fix typos fix typos fix typos commit 8430a79 Author: Richard Hart <richardhart92@gmail.com> Date: Mon Mar 13 22:17:41 2017 -0400 Fixed log message typo in listenToPort. commit 481a1c2 Author: Vinod Kumar <kumar003vinod@gmail.com> Date: Sun Jan 15 23:04:51 2017 +0530 src/db.c: Correct "save" -> "safe" typo commit 586b4d3 Author: wangshaonan <wshn13@gmail.com> Date: Wed Dec 21 20:28:27 2016 +0800 Fix typo they->the in helloworld.c commit c1c4b5e Author: Jenner <hypxm@qq.com> Date: Mon Dec 19 16:39:46 2016 +0800 typo error commit 1ee1a3f Author: tielei <43289893@qq.com> Date: Mon Jul 18 13:52:25 2016 +0800 fix some comments commit 11a41fb Author: Otto Kekäläinen <otto@seravo.fi> Date: Sun Jul 3 10:23:55 2016 +0100 Fix spelling in documentation and comments commit 5fb5d82 Author: francischan <f1ancis621@gmail.com> Date: Tue Jun 28 00:19:33 2016 +0800 Fix outdated comments about redis.c file. It should now refer to server.c file. commit 6b254bc Author: lmatt-bit <lmatt123n@gmail.com> Date: Thu Apr 21 21:45:58 2016 +0800 Refine the comment of dictRehashMilliseconds func SLAVECONF->REPLCONF in comment - by andyli029 commit ee9869f Author: clark.kang <charsyam@naver.com> Date: Tue Mar 22 11:09:51 2016 +0900 fix typos commit f7b3b11 Author: Harisankar H <harisankarh@gmail.com> Date: Wed Mar 9 11:49:42 2016 +0530 Typo correction: "faield" --> "failed" Typo correction: "faield" --> "failed" commit 3fd40fc Author: Itamar Haber <itamar@redislabs.com> Date: Thu Feb 25 10:31:51 2016 +0200 Fixes a typo in comments commit 621c160 Author: Prayag Verma <prayag.verma@gmail.com> Date: Mon Feb 1 12:36:20 2016 +0530 Fix typo in Readme.md Spelling mistakes - `eviciton` > `eviction` `familar` > `familiar` commit d7d07d6 Author: WonCheol Lee <toctoc21c@gmail.com> Date: Wed Dec 30 15:11:34 2015 +0900 Typo fixed commit a4dade7 Author: Felix Bünemann <buenemann@louis.info> Date: Mon Dec 28 11:02:55 2015 +0100 [ci skip] Improve supervised upstart config docs This mentions that "expect stop" is required for supervised upstart to work correctly. See http://upstart.ubuntu.com/cookbook/#expect-stop for an explanation. commit d9caba9 Author: daurnimator <quae@daurnimator.com> Date: Mon Dec 21 18:30:03 2015 +1100 README: Remove trailing whitespace commit 72d42e5 Author: daurnimator <quae@daurnimator.com> Date: Mon Dec 21 18:29:32 2015 +1100 README: Fix typo. th => the commit dd6e957 Author: daurnimator <quae@daurnimator.com> Date: Mon Dec 21 18:29:20 2015 +1100 README: Fix typo. familar => familiar commit 3a12b23 Author: daurnimator <quae@daurnimator.com> Date: Mon Dec 21 18:28:54 2015 +1100 README: Fix typo. eviciton => eviction commit 2d1d03b Author: daurnimator <quae@daurnimator.com> Date: Mon Dec 21 18:21:45 2015 +1100 README: Fix typo. sever => server commit 3973b06 Author: Itamar Haber <itamar@garantiadata.com> Date: Sat Dec 19 17:01:20 2015 +0200 Typo fix commit 4f2e460 Author: Steve Gao <fu@2token.com> Date: Fri Dec 4 10:22:05 2015 +0800 Update README - fix typos commit b21667c Author: binyan <binbin.yan@nokia.com> Date: Wed Dec 2 22:48:37 2015 +0800 delete redundancy color judge in sdscatcolor commit 88894c7 Author: binyan <binbin.yan@nokia.com> Date: Wed Dec 2 22:14:42 2015 +0800 the example output shoule be HelloWorld commit 2763470 Author: binyan <binbin.yan@nokia.com> Date: Wed Dec 2 17:41:39 2015 +0800 modify error word keyevente Signed-off-by: binyan <binbin.yan@nokia.com> commit 0847b3d Author: Bruno Martins <bscmartins@gmail.com> Date: Wed Nov 4 11:37:01 2015 +0000 typo commit bbb9e9e Author: dawedawe <dawedawe@gmx.de> Date: Fri Mar 27 00:46:41 2015 +0100 typo: zimap -> zipmap commit 5ed297e Author: Axel Advento <badwolf.bloodseeker.rev@gmail.com> Date: Tue Mar 3 15:58:29 2015 +0800 Fix 'salve' typos to 'slave' commit edec9d6 Author: LudwikJaniuk <ludvig.janiuk@gmail.com> Date: Wed Jun 12 14:12:47 2019 +0200 Update README.md Co-Authored-By: Qix <Qix-@users.noreply.github.com> commit 692a7af Author: LudwikJaniuk <ludvig.janiuk@gmail.com> Date: Tue May 28 14:32:04 2019 +0200 grammar commit d962b0a Author: Nick Frost <nickfrostatx@gmail.com> Date: Wed Jul 20 15:17:12 2016 -0700 Minor grammar fix commit 24fff01aaccaf5956973ada8c50ceb1462e211c6 (typos) Author: Chad Miller <chadm@squareup.com> Date: Tue Sep 8 13:46:11 2020 -0400 Fix faulty comment about operation of unlink() commit 3cd5c1f3326c52aa552ada7ec797c6bb16452355 Author: Kevin <kevin.xgr@gmail.com> Date: Wed Nov 20 00:13:50 2019 +0800 Fix typo in server.c. From a83af59 Mon Sep 17 00:00:00 2001 From: wuwo <wuwo@wacai.com> Date: Fri, 17 Mar 2017 20:37:45 +0800 Subject: [PATCH] falure to failure From c961896 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=E5=B7=A6=E6=87=B6?= <veficos@gmail.com> Date: Sat, 27 May 2017 15:33:04 +0800 Subject: [PATCH] fix typo From e600ef2 Mon Sep 17 00:00:00 2001 From: "rui.zou" <rui.zou@yunify.com> Date: Sat, 30 Sep 2017 12:38:15 +0800 Subject: [PATCH] fix a typo From c7d07fa Mon Sep 17 00:00:00 2001 From: Alexandre Perrin <alex@kaworu.ch> Date: Thu, 16 Aug 2018 10:35:31 +0200 Subject: [PATCH] deps README.md typo From b25cb67 Mon Sep 17 00:00:00 2001 From: Guy Korland <gkorland@gmail.com> Date: Wed, 26 Sep 2018 10:55:37 +0300 Subject: [PATCH 1/2] fix typos in header From ad28ca6 Mon Sep 17 00:00:00 2001 From: Guy Korland <gkorland@gmail.com> Date: Wed, 26 Sep 2018 11:02:36 +0300 Subject: [PATCH 2/2] fix typos commit 34924cdedd8552466fc22c1168d49236cb7ee915 Author: Adrian Lynch <adi_ady_ade@hotmail.com> Date: Sat Apr 4 21:59:15 2015 +0100 Typos fixed commit fd2a1e7 Author: Jan <jsteemann@users.noreply.github.com> Date: Sat Oct 27 19:13:01 2018 +0200 Fix typos Fix typos commit e14e47c1a234b53b0e103c5f6a1c61481cbcbb02 Author: Andy Lester <andy@petdance.com> Date: Fri Aug 2 22:30:07 2019 -0500 Fix multiple misspellings of "following" commit 79b948ce2dac6b453fe80995abbcaac04c213d5a Author: Andy Lester <andy@petdance.com> Date: Fri Aug 2 22:24:28 2019 -0500 Fix misspelling of create-cluster commit 1fffde52666dc99ab35efbd31071a4c008cb5a71 Author: Andy Lester <andy@petdance.com> Date: Wed Jul 31 17:57:56 2019 -0500 Fix typos commit 204c9ba9651e9e05fd73936b452b9a30be456cfe Author: Xiaobo Zhu <xiaobo.zhu@shopee.com> Date: Tue Aug 13 22:19:25 2019 +0800 fix typos Squashed commit of the following: commit 1d9aaf8 Author: danmedani <danmedani@gmail.com> Date: Sun Aug 2 11:40:26 2015 -0700 README typo fix. Squashed commit of the following: commit 32bfa7c Author: Erik Dubbelboer <erik@dubbelboer.com> Date: Mon Jul 6 21:15:08 2015 +0200 Fixed grammer Squashed commit of the following: commit b24f69c Author: Sisir Koppaka <sisir.koppaka@gmail.com> Date: Mon Mar 2 22:38:45 2015 -0500 utils/hashtable/rehashing.c: Fix typos Squashed commit of the following: commit 4e04082 Author: Erik Dubbelboer <erik@dubbelboer.com> Date: Mon Mar 23 08:22:21 2015 +0000 Small config file documentation improvements Squashed commit of the following: commit acb8773 Author: ctd1500 <ctd1500@gmail.com> Date: Fri May 8 01:52:48 2015 -0700 Typo and grammar fixes in readme commit 2eb75b6 Author: ctd1500 <ctd1500@gmail.com> Date: Fri May 8 01:36:18 2015 -0700 fixed redis.conf comment Squashed commit of the following: commit a8249a2 Author: Masahiko Sawada <sawada.mshk@gmail.com> Date: Fri Dec 11 11:39:52 2015 +0530 Revise correction of typos. Squashed commit of the following: commit 3c02028 Author: zhaojun11 <zhaojun11@jd.com> Date: Wed Jan 17 19:05:28 2018 +0800 Fix typos include two code typos in cluster.c and latency.c Squashed commit of the following: commit 9dba47c Author: q191201771 <191201771@qq.com> Date: Sat Jan 4 11:31:04 2020 +0800 fix function listCreate comment in adlist.c Update src/server.c commit 2c7c2cb536e78dd211b1ac6f7bda00f0f54faaeb Author: charpty <charpty@gmail.com> Date: Tue May 1 23:16:59 2018 +0800 server.c typo: modules system dictionary type comment Signed-off-by: charpty <charpty@gmail.com> commit a8395323fb63cb59cb3591cb0f0c8edb7c29a680 Author: Itamar Haber <itamar@redislabs.com> Date: Sun May 6 00:25:18 2018 +0300 Updates test_helper.tcl's help with undocumented options Specifically: * Host * Port * Client commit bde6f9ced15755cd6407b4af7d601b030f36d60b Author: wxisme <850885154@qq.com> Date: Wed Aug 8 15:19:19 2018 +0800 fix comments in deps files commit 3172474ba991532ab799ee1873439f3402412331 Author: wxisme <850885154@qq.com> Date: Wed Aug 8 14:33:49 2018 +0800 fix some comments commit 01b6f2b6858b5cf2ce4ad5092d2c746e755f53f0 Author: Thor Juhasz <thor@juhasz.pro> Date: Sun Nov 18 14:37:41 2018 +0100 Minor fixes to comments Found some parts a little unclear on a first read, which prompted me to have a better look at the file and fix some minor things I noticed. Fixing minor typos and grammar. There are no changes to configuration options. These changes are only meant to help the user better understand the explanations to the various configuration options
2020-09-10 12:43:38 +02:00
/* Struct to hold an inclusive/exclusive range spec by score comparison. */
typedef struct {
double min, max;
int minex, maxex; /* are min or max exclusive? */
} zrangespec;
/* Struct to hold an inclusive/exclusive range spec by lexicographic comparison. */
typedef struct {
sds min, max; /* May be set to shared.(minstring|maxstring) */
int minex, maxex; /* are min or max exclusive? */
} zlexrangespec;
Sort out the mess around Lua error messages and error stats (#10329) This PR fix 2 issues on Lua scripting: * Server error reply statistics (some errors were counted twice). * Error code and error strings returning from scripts (error code was missing / misplaced). ## Statistics a Lua script user is considered part of the user application, a sophisticated transaction, so we want to count an error even if handled silently by the script, but when it is propagated outwards from the script we don't wanna count it twice. on the other hand, if the script decides to throw an error on its own (using `redis.error_reply`), we wanna count that too. Besides, we do count the `calls` in command statistics for the commands the script calls, we we should certainly also count `failed_calls`. So when a simple `eval "return redis.call('set','x','y')" 0` fails, it should count the failed call to both SET and EVAL, but the `errorstats` and `total_error_replies` should be counted only once. The PR changes the error object that is raised on errors. Instead of raising a simple Lua string, Redis will raise a Lua table in the following format: ``` { err='<error message (including error code)>', source='<User source file name>', line='<line where the error happned>', ignore_error_stats_update=true/false, } ``` The `luaPushError` function was modified to construct the new error table as describe above. The `luaRaiseError` was renamed to `luaError` and is now simply called `lua_error` to raise the table on the top of the Lua stack as the error object. The reason is that since its functionality is changed, in case some Redis branch / fork uses it, it's better to have a compilation error than a bug. The `source` and `line` fields are enriched by the error handler (if possible) and the `ignore_error_stats_update` is optional and if its not present then the default value is `false`. If `ignore_error_stats_update` is true, the error will not be counted on the error stats. When parsing Redis call reply, each error is translated to a Lua table on the format describe above and the `ignore_error_stats_update` field is set to `true` so we will not count errors twice (we counted this error when we invoke the command). The changes in this PR might have been considered as a breaking change for users that used Lua `pcall` function. Before, the error was a string and now its a table. To keep backward comparability the PR override the `pcall` implementation and extract the error message from the error table and return it. Example of the error stats update: ``` 127.0.0.1:6379> lpush l 1 (integer) 2 127.0.0.1:6379> eval "return redis.call('get', 'l')" 0 (error) WRONGTYPE Operation against a key holding the wrong kind of value. script: e471b73f1ef44774987ab00bdf51f21fd9f7974a, on @user_script:1. 127.0.0.1:6379> info Errorstats # Errorstats errorstat_WRONGTYPE:count=1 127.0.0.1:6379> info commandstats # Commandstats cmdstat_eval:calls=1,usec=341,usec_per_call=341.00,rejected_calls=0,failed_calls=1 cmdstat_info:calls=1,usec=35,usec_per_call=35.00,rejected_calls=0,failed_calls=0 cmdstat_lpush:calls=1,usec=14,usec_per_call=14.00,rejected_calls=0,failed_calls=0 cmdstat_get:calls=1,usec=10,usec_per_call=10.00,rejected_calls=0,failed_calls=1 ``` ## error message We can now construct the error message (sent as a reply to the user) from the error table, so this solves issues where the error message was malformed and the error code appeared in the middle of the error message: ```diff 127.0.0.1:6379> eval "return redis.call('set','x','y')" 0 -(error) ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: OOM command not allowed when used memory > 'maxmemory'. +(error) OOM command not allowed when used memory > 'maxmemory' @user_script:1. Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479) ``` ```diff 127.0.0.1:6379> eval "redis.call('get', 'l')" 0 -(error) ERR Error running script (call to f_8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1): @user_script:1: WRONGTYPE Operation against a key holding the wrong kind of value +(error) WRONGTYPE Operation against a key holding the wrong kind of value script: 8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1, on @user_script:1. ``` Notica that `redis.pcall` was not change: ``` 127.0.0.1:6379> eval "return redis.pcall('get', 'l')" 0 (error) WRONGTYPE Operation against a key holding the wrong kind of value ``` ## other notes Notice that Some commands (like GEOADD) changes the cmd variable on the client stats so we can not count on it to update the command stats. In order to be able to update those stats correctly we needed to promote `realcmd` variable to be located on the client struct. Tests was added and modified to verify the changes. Related PR's: #10279, #10218, #10278, #10309 Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-27 12:40:57 +01:00
/* flags for incrCommandFailedCalls */
#define ERROR_COMMAND_REJECTED (1<<0) /* Indicate to update the command rejected stats */
#define ERROR_COMMAND_FAILED (1<<1) /* Indicate to update the command failed stats */
zskiplist *zslCreate(void);
void zslFree(zskiplist *zsl);
zskiplistNode *zslInsert(zskiplist *zsl, double score, sds ele);
unsigned char *zzlInsert(unsigned char *zl, sds ele, double score);
int zslDelete(zskiplist *zsl, double score, sds ele, zskiplistNode **node);
zskiplistNode *zslFirstInRange(zskiplist *zsl, zrangespec *range);
zskiplistNode *zslLastInRange(zskiplist *zsl, zrangespec *range);
2011-03-14 13:30:06 +01:00
double zzlGetScore(unsigned char *sptr);
void zzlNext(unsigned char *zl, unsigned char **eptr, unsigned char **sptr);
void zzlPrev(unsigned char *zl, unsigned char **eptr, unsigned char **sptr);
2016-04-19 15:22:33 +02:00
unsigned char *zzlFirstInRange(unsigned char *zl, zrangespec *range);
2016-04-20 12:38:14 +02:00
unsigned char *zzlLastInRange(unsigned char *zl, zrangespec *range);
2017-12-08 08:37:08 +01:00
unsigned long zsetLength(const robj *zobj);
void zsetConvert(robj *zobj, int encoding);
void zsetConvertToListpackIfNeeded(robj *zobj, size_t maxelelen, size_t totelelen);
int zsetScore(robj *zobj, sds member, double *score);
unsigned long zslGetRank(zskiplist *zsl, double score, sds o);
int zsetAdd(robj *zobj, double score, sds ele, int in_flags, int *out_flags, double *newscore);
2016-04-15 12:46:18 +02:00
long zsetRank(robj *zobj, sds ele, int reverse);
2016-04-15 15:20:25 +02:00
int zsetDel(robj *zobj, sds ele);
robj *zsetDup(robj *o);
void genericZpopCommand(client *c, robj **keyv, int keyc, int where, int emitkey, long count, int use_nested_array, int reply_nil_when_empty, int *deleted);
Replace all usage of ziplist with listpack for t_zset (#9366) Part two of implementing #8702 (zset), after #8887. ## Description of the feature Replaced all uses of ziplist with listpack in t_zset, and optimized some of the code to optimize performance. ## Rdb format changes New `RDB_TYPE_ZSET_LISTPACK` rdb type. ## Rdb loading improvements: 1) Pre-expansion of dict for validation of duplicate data for listpack and ziplist. 2) Simplifying the release of empty key objects when RDB loading. 3) Unify ziplist and listpack data verify methods for zset and hash, and move code to rdb.c. ## Interface changes 1) New `zset-max-listpack-entries` config is an alias for `zset-max-ziplist-entries` (same with `zset-max-listpack-value`). 2) OBJECT ENCODING will return listpack instead of ziplist. ## Listpack improvements: 1) Add `lpDeleteRange` and `lpDeleteRangeWithEntry` functions to delete a range of entries from listpack. 2) Improve the performance of `lpCompare`, converting from string to integer is faster than converting from integer to string. 3) Replace `snprintf` with `ll2string` to improve performance in converting numbers to strings in `lpGet()`. ## Zset improvements: 1) Improve the performance of `zzlFind` method, use `lpFind` instead of `lpCompare` in a loop. 2) Use `lpDeleteRangeWithEntry` instead of `lpDelete` twice to delete a element of zset. ## Tests 1) Add some unittests for `lpDeleteRange` and `lpDeleteRangeWithEntry` function. 2) Add zset RDB loading test. 3) Add benchmark test for `lpCompare` and `ziplsitCompare`. 4) Add empty listpack zset corrupt dump test.
2021-09-09 17:18:53 +02:00
sds lpGetObject(unsigned char *sptr);
2016-04-19 17:02:24 +02:00
int zslValueGteMin(double value, zrangespec *spec);
int zslValueLteMax(double value, zrangespec *spec);
2016-04-20 23:01:40 +02:00
void zslFreeLexRange(zlexrangespec *spec);
2016-04-21 09:27:13 +02:00
int zslParseLexRange(robj *min, robj *max, zlexrangespec *spec);
unsigned char *zzlFirstInLexRange(unsigned char *zl, zlexrangespec *range);
unsigned char *zzlLastInLexRange(unsigned char *zl, zlexrangespec *range);
zskiplistNode *zslFirstInLexRange(zskiplist *zsl, zlexrangespec *range);
zskiplistNode *zslLastInLexRange(zskiplist *zsl, zlexrangespec *range);
2016-04-21 11:17:00 +02:00
int zzlLexValueGteMin(unsigned char *p, zlexrangespec *spec);
int zzlLexValueLteMax(unsigned char *p, zlexrangespec *spec);
int zslLexValueGteMin(sds value, zlexrangespec *spec);
int zslLexValueLteMax(sds value, zlexrangespec *spec);
/* Core functions */
int getMaxmemoryState(size_t *total, size_t *logical, size_t *tofree, float *level);
size_t freeMemoryGetNotCountedMemory();
Limit the main db and expires dictionaries to expand (#7954) As we know, redis may reject user's requests or evict some keys if used memory is over maxmemory. Dictionaries expanding may make things worse, some big dictionaries, such as main db and expires dict, may eat huge memory at once for allocating a new big hash table and be far more than maxmemory after expanding. There are related issues: #4213 #4583 More details, when expand dict in redis, we will allocate a new big ht[1] that generally is double of ht[0], The size of ht[1] will be very big if ht[0] already is big. For db dict, if we have more than 64 million keys, we need to cost 1GB for ht[1] when dict expands. If the sum of used memory and new hash table of dict needed exceeds maxmemory, we shouldn't allow the dict to expand. Because, if we enable keys eviction, we still couldn't add much more keys after eviction and rehashing, what's worse, redis will keep less keys when redis only remains a little memory for storing new hash table instead of users' data. Moreover users can't write data in redis if disable keys eviction. What this commit changed ? Add a new member function expandAllowed for dict type, it provide a way for caller to allow expand or not. We expose two parameters for this function: more memory needed for expanding and dict current load factor, users can implement a function to make a decision by them. For main db dict and expires dict type, these dictionaries may be very big and cost huge memory for expanding, so we implement a judgement function: we can stop dict to expand provisionally if used memory will be over maxmemory after dict expands, but to guarantee the performance of redis, we still allow dict to expand if dict load factor exceeds the safe load factor. Add test cases to verify we don't allow main db to expand when left memory is not enough, so that avoid keys eviction. Other changes: For new hash table size when expand. Before this commit, the size is that double used of dict and later _dictNextPower. Actually we aim to control a dict load factor between 0.5 and 1.0. Now we replace *2 with +1, since the first check is that used >= size, the outcome of before will usually be the same as _dictNextPower(used+1). The only case where it'll differ is when dict_can_resize is false during fork, so that later the _dictNextPower(used*2) will cause the dict to jump to *4 (i.e. _dictNextPower(1025*2) will return 4096). Fix rehash test cases due to changing algorithm of new hash table size when expand.
2020-12-06 10:53:04 +01:00
int overMaxmemoryAfterAlloc(size_t moremem);
int processCommand(client *c);
int processPendingCommandAndInputBuffer(client *c);
2011-03-06 17:49:22 +01:00
void setupSignalHandlers(void);
void removeSignalHandlers(void);
int createSocketAcceptHandler(socketFds *sfd, aeFileProc *accept_handler);
int changeListenPort(int port, socketFds *sfd, aeFileProc *accept_handler);
Multiparam config set (#9748) We can now do: `config set maxmemory 10m repl-backlog-size 5m` ## Basic algorithm to support "transaction like" config sets: 1. Backup all relevant current values (via get). 2. Run "verify" and "set" on everything, if we fail run "restore". 3. Run "apply" on everything (optional optimization: skip functions already run). If we fail run "restore". 4. Return success. ### restore 1. Run set on everything in backup. If we fail log it and continue (this puts us in an undefined state but we decided it's better than the alternative of panicking). This indicates either a bug or some unsupported external state. 2. Run apply on everything in backup (optimization: skip functions already run). If we fail log it (see comment above). 3. Return error. ## Implementation/design changes: * Apply function are idempotent (have no effect if they are run more than once for the same config). * No indication in set functions if we're reading the config or running from the `CONFIG SET` command (removed `update` argument). * Set function should set some config variable and assume an (optional) apply function will use that later to apply. If we know this setting can be safely applied immediately and can always be reverted and doesn't depend on any other configuration we can apply immediately from within the set function (and not store the setting anywhere). This is the case of this `dir` config, for example, which has no apply function. No apply function is need also in the case that setting the variable in the `server` struct is all that needs to be done to make the configuration take effect. Note that the original concept of `update_fn`, which received the old and new values was removed and replaced by the optional apply function. * Apply functions use settings written to the `server` struct and don't receive any inputs. * I take care that for the generic (non-special) configs if there's no change I avoid calling the setter (possible optimization: avoid calling the apply function as well). * Passing the same config parameter more than once to `config set` will fail. You can't do `config set my-setting value1 my-setting value2`. Note that getting `save` in the context of the conf file parsing to work here as before was a pain. The conf file supports an aggregate `save` definition, where each `save` line is added to the server's save params. This is unlike any other line in the config file where each line overwrites any previous configuration. Since we now support passing multiple save params in a single line (see top comments about `save` in https://github.com/redis/redis/pull/9644) we should deprecate the aggregate nature of this config line and perhaps reduce this ugly code in the future.
2021-12-01 09:15:11 +01:00
int changeBindAddr(void);
struct redisCommand *lookupSubcommand(struct redisCommand *container, sds sub_name);
struct redisCommand *lookupCommand(robj **argv, int argc);
Treat subcommands as commands (#9504) ## Intro The purpose is to allow having different flags/ACL categories for subcommands (Example: CONFIG GET is ok-loading but CONFIG SET isn't) We create a small command table for every command that has subcommands and each subcommand has its own flags, etc. (same as a "regular" command) This commit also unites the Redis and the Sentinel command tables ## Affected commands CONFIG Used to have "admin ok-loading ok-stale no-script" Changes: 1. Dropped "ok-loading" in all except GET (this doesn't change behavior since there were checks in the code doing that) XINFO Used to have "read-only random" Changes: 1. Dropped "random" in all except CONSUMERS XGROUP Used to have "write use-memory" Changes: 1. Dropped "use-memory" in all except CREATE and CREATECONSUMER COMMAND No changes. MEMORY Used to have "random read-only" Changes: 1. Dropped "random" in PURGE and USAGE ACL Used to have "admin no-script ok-loading ok-stale" Changes: 1. Dropped "admin" in WHOAMI, GENPASS, and CAT LATENCY No changes. MODULE No changes. SLOWLOG Used to have "admin random ok-loading ok-stale" Changes: 1. Dropped "random" in RESET OBJECT Used to have "read-only random" Changes: 1. Dropped "random" in ENCODING and REFCOUNT SCRIPT Used to have "may-replicate no-script" Changes: 1. Dropped "may-replicate" in all except FLUSH and LOAD CLIENT Used to have "admin no-script random ok-loading ok-stale" Changes: 1. Dropped "random" in all except INFO and LIST 2. Dropped "admin" in ID, TRACKING, CACHING, GETREDIR, INFO, SETNAME, GETNAME, and REPLY STRALGO No changes. PUBSUB No changes. CLUSTER Changes: 1. Dropped "admin in countkeysinslots, getkeysinslot, info, nodes, keyslot, myid, and slots SENTINEL No changes. (note that DEBUG also fits, but we decided not to convert it since it's for debugging and anyway undocumented) ## New sub-command This commit adds another element to the per-command output of COMMAND, describing the list of subcommands, if any (in the same structure as "regular" commands) Also, it adds a new subcommand: ``` COMMAND LIST [FILTERBY (MODULE <module-name>|ACLCAT <cat>|PATTERN <pattern>)] ``` which returns a set of all commands (unless filters), but excluding subcommands. ## Module API A new module API, RM_CreateSubcommand, was added, in order to allow module writer to define subcommands ## ACL changes: 1. Now, that each subcommand is actually a command, each has its own ACL id. 2. The old mechanism of allowed_subcommands is redundant (blocking/allowing a subcommand is the same as blocking/allowing a regular command), but we had to keep it, to support the widespread usage of allowed_subcommands to block commands with certain args, that aren't subcommands (e.g. "-select +select|0"). 3. I have renamed allowed_subcommands to allowed_firstargs to emphasize the difference. 4. Because subcommands are commands in ACL too, you can now use "-" to block subcommands (e.g. "+client -client|kill"), which wasn't possible in the past. 5. It is also possible to use the allowed_firstargs mechanism with subcommand. For example: `+config -config|set +config|set|loglevel` will block all CONFIG SET except for setting the log level. 6. All of the ACL changes above required some amount of refactoring. ## Misc 1. There are two approaches: Either each subcommand has its own function or all subcommands use the same function, determining what to do according to argv[0]. For now, I took the former approaches only with CONFIG and COMMAND, while other commands use the latter approach (for smaller blamelog diff). 2. Deleted memoryGetKeys: It is no longer needed because MEMORY USAGE now uses the "range" key spec. 4. Bugfix: GETNAME was missing from CLIENT's help message. 5. Sentinel and Redis now use the same table, with the same function pointer. Some commands have a different implementation in Sentinel, so we redirect them (these are ROLE, PUBLISH, and INFO). 6. Command stats now show the stats per subcommand (e.g. instead of stats just for "config" you will have stats for "config|set", "config|get", etc.) 7. It is now possible to use COMMAND directly on subcommands: COMMAND INFO CONFIG|GET (The pipeline syntax was inspired from ACL, and can be used in functions lookupCommandBySds and lookupCommandByCString) 8. STRALGO is now a container command (has "help") ## Breaking changes: 1. Command stats now show the stats per subcommand (see (5) above)
2021-10-20 10:52:57 +02:00
struct redisCommand *lookupCommandBySdsLogic(dict *commands, sds s);
struct redisCommand *lookupCommandBySds(sds s);
struct redisCommand *lookupCommandByCStringLogic(dict *commands, const char *s);
struct redisCommand *lookupCommandByCString(const char *s);
struct redisCommand *lookupCommandOrOriginal(robj **argv, int argc);
int commandCheckExistence(client *c, sds *err);
int commandCheckArity(client *c, sds *err);
Sort out the mess around Lua error messages and error stats (#10329) This PR fix 2 issues on Lua scripting: * Server error reply statistics (some errors were counted twice). * Error code and error strings returning from scripts (error code was missing / misplaced). ## Statistics a Lua script user is considered part of the user application, a sophisticated transaction, so we want to count an error even if handled silently by the script, but when it is propagated outwards from the script we don't wanna count it twice. on the other hand, if the script decides to throw an error on its own (using `redis.error_reply`), we wanna count that too. Besides, we do count the `calls` in command statistics for the commands the script calls, we we should certainly also count `failed_calls`. So when a simple `eval "return redis.call('set','x','y')" 0` fails, it should count the failed call to both SET and EVAL, but the `errorstats` and `total_error_replies` should be counted only once. The PR changes the error object that is raised on errors. Instead of raising a simple Lua string, Redis will raise a Lua table in the following format: ``` { err='<error message (including error code)>', source='<User source file name>', line='<line where the error happned>', ignore_error_stats_update=true/false, } ``` The `luaPushError` function was modified to construct the new error table as describe above. The `luaRaiseError` was renamed to `luaError` and is now simply called `lua_error` to raise the table on the top of the Lua stack as the error object. The reason is that since its functionality is changed, in case some Redis branch / fork uses it, it's better to have a compilation error than a bug. The `source` and `line` fields are enriched by the error handler (if possible) and the `ignore_error_stats_update` is optional and if its not present then the default value is `false`. If `ignore_error_stats_update` is true, the error will not be counted on the error stats. When parsing Redis call reply, each error is translated to a Lua table on the format describe above and the `ignore_error_stats_update` field is set to `true` so we will not count errors twice (we counted this error when we invoke the command). The changes in this PR might have been considered as a breaking change for users that used Lua `pcall` function. Before, the error was a string and now its a table. To keep backward comparability the PR override the `pcall` implementation and extract the error message from the error table and return it. Example of the error stats update: ``` 127.0.0.1:6379> lpush l 1 (integer) 2 127.0.0.1:6379> eval "return redis.call('get', 'l')" 0 (error) WRONGTYPE Operation against a key holding the wrong kind of value. script: e471b73f1ef44774987ab00bdf51f21fd9f7974a, on @user_script:1. 127.0.0.1:6379> info Errorstats # Errorstats errorstat_WRONGTYPE:count=1 127.0.0.1:6379> info commandstats # Commandstats cmdstat_eval:calls=1,usec=341,usec_per_call=341.00,rejected_calls=0,failed_calls=1 cmdstat_info:calls=1,usec=35,usec_per_call=35.00,rejected_calls=0,failed_calls=0 cmdstat_lpush:calls=1,usec=14,usec_per_call=14.00,rejected_calls=0,failed_calls=0 cmdstat_get:calls=1,usec=10,usec_per_call=10.00,rejected_calls=0,failed_calls=1 ``` ## error message We can now construct the error message (sent as a reply to the user) from the error table, so this solves issues where the error message was malformed and the error code appeared in the middle of the error message: ```diff 127.0.0.1:6379> eval "return redis.call('set','x','y')" 0 -(error) ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: OOM command not allowed when used memory > 'maxmemory'. +(error) OOM command not allowed when used memory > 'maxmemory' @user_script:1. Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479) ``` ```diff 127.0.0.1:6379> eval "redis.call('get', 'l')" 0 -(error) ERR Error running script (call to f_8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1): @user_script:1: WRONGTYPE Operation against a key holding the wrong kind of value +(error) WRONGTYPE Operation against a key holding the wrong kind of value script: 8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1, on @user_script:1. ``` Notica that `redis.pcall` was not change: ``` 127.0.0.1:6379> eval "return redis.pcall('get', 'l')" 0 (error) WRONGTYPE Operation against a key holding the wrong kind of value ``` ## other notes Notice that Some commands (like GEOADD) changes the cmd variable on the client stats so we can not count on it to update the command stats. In order to be able to update those stats correctly we needed to promote `realcmd` variable to be located on the client struct. Tests was added and modified to verify the changes. Related PR's: #10279, #10218, #10278, #10309 Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-27 12:40:57 +01:00
void startCommandExecution();
int incrCommandStatsOnError(struct redisCommand *cmd, int flags);
void call(client *c, int flags);
void alsoPropagate(int dbid, robj **argv, int argc, int target);
Sort out mess around propagation and MULTI/EXEC (#9890) The mess: Some parts use alsoPropagate for late propagation, others using an immediate one (propagate()), causing edge cases, ugly/hacky code, and the tendency for bugs The basic idea is that all commands are propagated via alsoPropagate (i.e. added to a list) and the top-most call() is responsible for going over that list and actually propagating them (and wrapping them in MULTI/EXEC if there's more than one command). This is done in the new function, propagatePendingCommands. Callers to propagatePendingCommands: 1. top-most call() (we want all nested call()s to add to the also_propagate array and just the top-most one to propagate them) - via `afterCommand` 2. handleClientsBlockedOnKeys: it is out of call() context and it may propagate stuff - via `afterCommand`. 3. handleClientsBlockedOnKeys edge case: if the looked-up key is already expired, we will propagate the expire but will not unblock any client so `afterCommand` isn't called. in that case, we have to propagate the deletion explicitly. 4. cron stuff: active-expire and eviction may also propagate stuff 5. modules: the module API allows to propagate stuff from just about anywhere (timers, keyspace notifications, threads). I could have tried to catch all the out-of-call-context places but it seemed easier to handle it in one place: when we free the context. in the spirit of what was done in call(), only the top-most freeing of a module context may cause propagation. 6. modules: when using a thread-safe ctx it's not clear when/if the ctx will be freed. we do know that the module must lock the GIL before calling RM_Replicate/RM_Call so we propagate the pending commands when releasing the GIL. A "known limitation", which were actually a bug, was fixed because of this commit (see propagate.tcl): When using a mix of RM_Call with `!` and RM_Replicate, the command would propagate out-of-order: first all the commands from RM_Call, and then the ones from RM_Replicate Another thing worth mentioning is that if, in the past, a client would issue a MULTI/EXEC with just one write command the server would blindly propagate the MULTI/EXEC too, even though it's redundant. not anymore. This commit renames propagate() to propagateNow() in order to cause conflicts in pending PRs. propagatePendingCommands is the only caller of propagateNow, which is now a static, internal helper function. Optimizations: 1. alsoPropagate will not add stuff to also_propagate if there's no AOF and replicas 2. alsoPropagate reallocs also_propagagte exponentially, to save calls to memmove Bugfixes: 1. CONFIG SET can create evictions, sending notifications which can cause to dirty++ with modules. we need to prevent it from propagating to AOF/replicas 2. We need to set current_client in RM_Call. buggy scenario: - CONFIG SET maxmemory, eviction notifications, module hook calls RM_Call - assertion in lookupKey crashes, because current_client has CONFIG SET, which isn't CMD_WRITE 3. minor: in eviction, call propagateDeletion after notification, like active-expire and all commands (we always send a notification before propagating the command)
2021-12-22 23:03:48 +01:00
void propagatePendingCommands();
void redisOpArrayInit(redisOpArray *oa);
void redisOpArrayFree(redisOpArray *oa);
void forceCommandPropagation(client *c, int flags);
void preventCommandPropagation(client *c);
void preventCommandAOF(client *c);
void preventCommandReplication(client *c);
2021-03-25 09:20:27 +01:00
void slowlogPushCurrentCommand(client *c, struct redisCommand *cmd, ustime_t duration);
Added INFO LATENCYSTATS section: latency by percentile distribution/latency by cumulative distribution of latencies (#9462) # Short description The Redis extended latency stats track per command latencies and enables: - exporting the per-command percentile distribution via the `INFO LATENCYSTATS` command. **( percentile distribution is not mergeable between cluster nodes ).** - exporting the per-command cumulative latency distributions via the `LATENCY HISTOGRAM` command. Using the cumulative distribution of latencies we can merge several stats from different cluster nodes to calculate aggregate metrics . By default, the extended latency monitoring is enabled since the overhead of keeping track of the command latency is very small. If you don't want to track extended latency metrics, you can easily disable it at runtime using the command: - `CONFIG SET latency-tracking no` By default, the exported latency percentiles are the p50, p99, and p999. You can alter them at runtime using the command: - `CONFIG SET latency-tracking-info-percentiles "0.0 50.0 100.0"` ## Some details: - The total size per histogram should sit around 40 KiB. We only allocate those 40KiB when a command was called for the first time. - With regards to the WRITE overhead As seen below, there is no measurable overhead on the achievable ops/sec or full latency spectrum on the client. Including also the measured redis-benchmark for unstable vs this branch. - We track from 1 nanosecond to 1 second ( everything above 1 second is considered +Inf ) ## `INFO LATENCYSTATS` exposition format - Format: `latency_percentiles_usec_<CMDNAME>:p0=XX,p50....` ## `LATENCY HISTOGRAM [command ...]` exposition format Return a cumulative distribution of latencies in the format of a histogram for the specified command names. The histogram is composed of a map of time buckets: - Each representing a latency range, between 1 nanosecond and roughly 1 second. - Each bucket covers twice the previous bucket's range. - Empty buckets are not printed. - Everything above 1 sec is considered +Inf. - At max there will be log2(1000000000)=30 buckets We reply a map for each command in the format: `<command name> : { `calls`: <total command calls> , `histogram` : { <bucket 1> : latency , < bucket 2> : latency, ... } }` Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-05 13:01:05 +01:00
void updateCommandLatencyHistogram(struct hdr_histogram** latency_histogram, int64_t duration_hist);
int prepareForShutdown(int flags);
Wait for replicas when shutting down (#9872) To avoid data loss, this commit adds a grace period for lagging replicas to catch up the replication offset. Done: * Wait for replicas when shutdown is triggered by SIGTERM and SIGINT. * Wait for replicas when shutdown is triggered by the SHUTDOWN command. A new blocked client type BLOCKED_SHUTDOWN is introduced, allowing multiple clients to call SHUTDOWN in parallel. Note that they don't expect a response unless an error happens and shutdown is aborted. * Log warning for each replica lagging behind when finishing shutdown. * CLIENT_PAUSE_WRITE while waiting for replicas. * Configurable grace period 'shutdown-timeout' in seconds (default 10). * New flags for the SHUTDOWN command: - NOW disables the grace period for lagging replicas. - FORCE ignores errors writing the RDB or AOF files which would normally prevent a shutdown. - ABORT cancels ongoing shutdown. Can't be combined with other flags. * New field in the output of the INFO command: 'shutdown_in_milliseconds'. The value is the remaining maximum time to wait for lagging replicas before finishing the shutdown. This field is present in the Server section **only** during shutdown. Not directly related: * When shutting down, if there is an AOF saving child, it is killed **even** if AOF is disabled. This can happen if BGREWRITEAOF is used when AOF is off. * Client pause now has end time and type (WRITE or ALL) per purpose. The different pause purposes are *CLIENT PAUSE command*, *failover* and *shutdown*. If clients are unpaused for one purpose, it doesn't affect client pause for other purposes. For example, the CLIENT UNPAUSE command doesn't affect client pause initiated by the failover or shutdown procedures. A completed failover or a failed shutdown doesn't unpause clients paused by the CLIENT PAUSE command. Notes: * DEBUG RESTART doesn't wait for replicas. * We already have a warning logged when a replica disconnects. This means that if any replica connection is lost during the shutdown, it is either logged as disconnected or as lagging at the time of exit. Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-02 08:50:15 +01:00
void replyToClientsBlockedOnShutdown(void);
int abortShutdown(void);
void afterCommand(client *c);
int mustObeyClient(client *c);
#ifdef __GNUC__
void _serverLog(int level, const char *fmt, ...)
__attribute__((format(printf, 2, 3)));
#else
void _serverLog(int level, const char *fmt, ...);
#endif
2015-07-26 15:17:43 +02:00
void serverLogRaw(int level, const char *msg);
void serverLogFromHandler(int level, const char *msg);
void usage(void);
void updateDictResizePolicy(void);
int htNeedsResize(dict *dict);
void populateCommandTable(void);
Treat subcommands as commands (#9504) ## Intro The purpose is to allow having different flags/ACL categories for subcommands (Example: CONFIG GET is ok-loading but CONFIG SET isn't) We create a small command table for every command that has subcommands and each subcommand has its own flags, etc. (same as a "regular" command) This commit also unites the Redis and the Sentinel command tables ## Affected commands CONFIG Used to have "admin ok-loading ok-stale no-script" Changes: 1. Dropped "ok-loading" in all except GET (this doesn't change behavior since there were checks in the code doing that) XINFO Used to have "read-only random" Changes: 1. Dropped "random" in all except CONSUMERS XGROUP Used to have "write use-memory" Changes: 1. Dropped "use-memory" in all except CREATE and CREATECONSUMER COMMAND No changes. MEMORY Used to have "random read-only" Changes: 1. Dropped "random" in PURGE and USAGE ACL Used to have "admin no-script ok-loading ok-stale" Changes: 1. Dropped "admin" in WHOAMI, GENPASS, and CAT LATENCY No changes. MODULE No changes. SLOWLOG Used to have "admin random ok-loading ok-stale" Changes: 1. Dropped "random" in RESET OBJECT Used to have "read-only random" Changes: 1. Dropped "random" in ENCODING and REFCOUNT SCRIPT Used to have "may-replicate no-script" Changes: 1. Dropped "may-replicate" in all except FLUSH and LOAD CLIENT Used to have "admin no-script random ok-loading ok-stale" Changes: 1. Dropped "random" in all except INFO and LIST 2. Dropped "admin" in ID, TRACKING, CACHING, GETREDIR, INFO, SETNAME, GETNAME, and REPLY STRALGO No changes. PUBSUB No changes. CLUSTER Changes: 1. Dropped "admin in countkeysinslots, getkeysinslot, info, nodes, keyslot, myid, and slots SENTINEL No changes. (note that DEBUG also fits, but we decided not to convert it since it's for debugging and anyway undocumented) ## New sub-command This commit adds another element to the per-command output of COMMAND, describing the list of subcommands, if any (in the same structure as "regular" commands) Also, it adds a new subcommand: ``` COMMAND LIST [FILTERBY (MODULE <module-name>|ACLCAT <cat>|PATTERN <pattern>)] ``` which returns a set of all commands (unless filters), but excluding subcommands. ## Module API A new module API, RM_CreateSubcommand, was added, in order to allow module writer to define subcommands ## ACL changes: 1. Now, that each subcommand is actually a command, each has its own ACL id. 2. The old mechanism of allowed_subcommands is redundant (blocking/allowing a subcommand is the same as blocking/allowing a regular command), but we had to keep it, to support the widespread usage of allowed_subcommands to block commands with certain args, that aren't subcommands (e.g. "-select +select|0"). 3. I have renamed allowed_subcommands to allowed_firstargs to emphasize the difference. 4. Because subcommands are commands in ACL too, you can now use "-" to block subcommands (e.g. "+client -client|kill"), which wasn't possible in the past. 5. It is also possible to use the allowed_firstargs mechanism with subcommand. For example: `+config -config|set +config|set|loglevel` will block all CONFIG SET except for setting the log level. 6. All of the ACL changes above required some amount of refactoring. ## Misc 1. There are two approaches: Either each subcommand has its own function or all subcommands use the same function, determining what to do according to argv[0]. For now, I took the former approaches only with CONFIG and COMMAND, while other commands use the latter approach (for smaller blamelog diff). 2. Deleted memoryGetKeys: It is no longer needed because MEMORY USAGE now uses the "range" key spec. 4. Bugfix: GETNAME was missing from CLIENT's help message. 5. Sentinel and Redis now use the same table, with the same function pointer. Some commands have a different implementation in Sentinel, so we redirect them (these are ROLE, PUBLISH, and INFO). 6. Command stats now show the stats per subcommand (e.g. instead of stats just for "config" you will have stats for "config|set", "config|get", etc.) 7. It is now possible to use COMMAND directly on subcommands: COMMAND INFO CONFIG|GET (The pipeline syntax was inspired from ACL, and can be used in functions lookupCommandBySds and lookupCommandByCString) 8. STRALGO is now a container command (has "help") ## Breaking changes: 1. Command stats now show the stats per subcommand (see (5) above)
2021-10-20 10:52:57 +02:00
void resetCommandTableStats(dict* commands);
void resetErrorTableStats(void);
2013-06-28 17:08:03 +02:00
void adjustOpenFilesLimit(void);
void incrementErrorCount(const char *fullerr, size_t namelen);
void closeListeningSockets(int unlink_unix_socket);
void updateCachedTime(int update_daylight_info);
void resetServerStats(void);
2016-12-30 02:37:52 +01:00
void activeDefragCycle(void);
unsigned int getLRUClock(void);
unsigned int LRU_CLOCK(void);
const char *evictPolicyToString(void);
struct redisMemOverhead *getMemoryOverheadData(void);
void freeMemoryOverheadData(struct redisMemOverhead *mh);
void checkChildrenDone(void);
int setOOMScoreAdj(int process_class);
Adds pub/sub channel patterns to ACL (#7993) Fixes #7923. This PR appropriates the special `&` symbol (because `@` and `*` are taken), followed by a literal value or pattern for describing the Pub/Sub patterns that an ACL user can interact with. It is similar to the existing key patterns mechanism in function (additive) and implementation (copy-pasta). It also adds the allchannels and resetchannels ACL keywords, naturally. The default user is given allchannels permissions, whereas new users get whatever is defined by the acl-pubsub-default configuration directive. For backward compatibility in 6.2, the default of this directive is allchannels but this is likely to be changed to resetchannels in the next major version for stronger default security settings. Unless allchannels is set for the user, channel access permissions are checked as follows : * Calls to both PUBLISH and SUBSCRIBE will fail unless a pattern matching the argumentative channel name(s) exists for the user. * Calls to PSUBSCRIBE will fail unless the pattern(s) provided as an argument literally exist(s) in the user's list. Such failures are logged to the ACL log. Runtime changes to channel permissions for a user with existing subscribing clients cause said clients to disconnect unless the new permissions permit the connections to continue. Note, however, that PSUBSCRIBErs' patterns are matched literally, so given the change bar:* -> b*, pattern subscribers to bar:* will be disconnected. Notes/questions: * UNSUBSCRIBE, PUNSUBSCRIBE and PUBSUB remain unprotected due to lack of reasons for touching them.
2020-12-01 13:21:39 +01:00
void rejectCommandFormat(client *c, const char *fmt, ...);
void *activeDefragAlloc(void *ptr);
robj *activeDefragStringOb(robj* ob, long *defragged);
Use madvise(MADV_DONTNEED) to release memory to reduce COW (#8974) ## Backgroud As we know, after `fork`, one process will copy pages when writing data to these pages(CoW), and another process still keep old pages, they totally cost more memory. For redis, we suffered that redis consumed much memory when the fork child is serializing key/values, even that maybe cause OOM. But actually we find, in redis fork child process, the child process don't need to keep some memory and parent process may write or update that, for example, child process will never access the key-value that is serialized but users may update it in parent process. So we think it may reduce COW if the child process release memory that it is not needed. ## Implementation For releasing key value in child process, we may think we call `decrRefCount` to free memory, but i find the fork child process still use much memory when we don't write any data to redis, and it costs much more time that slows down bgsave. Maybe because memory allocator doesn't really release memory to OS, and it may modify some inner data for this free operation, especially when we free small objects. Moreover, CoW is based on pages, so it is a easy way that we only free the memory bulk that is not less than kernel page size. madvise(MADV_DONTNEED) can quickly release specified region pages to OS bypassing memory allocator, and allocator still consider that this memory still is used and don't change its inner data. There are some buffers we can release in the fork child process: - **Serialized key-values** the fork child process never access serialized key-values, so we try to free them. Because we only can release big bulk memory, and it is time consumed to iterate all items/members/fields/entries of complex data type. So we decide to iterate them and try to release them only when their average size of item/member/field/entry is more than page size of OS. - **Replication backlog** Because replication backlog is a cycle buffer, it will be changed quickly if redis has heavy write traffic, but in fork child process, we don't need to access that. - **Client buffers** If clients have requests during having the fork child process, clients' buffer also be changed frequently. The memory includes client query buffer, output buffer, and client struct used memory. To get child process peak private dirty memory, we need to count peak memory instead of last used memory, because the child process may continue to release memory (since COW used to only grow till now, the last was equivalent to the peak). Also we're adding a new `current_cow_peak` info variable (to complement the existing `current_cow_size`) Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-04 22:01:46 +02:00
void dismissSds(sds s);
void dismissMemory(void* ptr, size_t size_hint);
void dismissMemoryInChild(void);
#define RESTART_SERVER_NONE 0
#define RESTART_SERVER_GRACEFULLY (1<<0) /* Do proper shutdown. */
#define RESTART_SERVER_CONFIG_REWRITE (1<<1) /* CONFIG REWRITE before restart.*/
int restartServer(int flags, mstime_t delay);
/* Set data type */
robj *setTypeCreate(sds value);
int setTypeAdd(robj *subject, sds value);
int setTypeRemove(robj *subject, sds value);
int setTypeIsMember(robj *subject, sds value);
setTypeIterator *setTypeInitIterator(robj *subject);
void setTypeReleaseIterator(setTypeIterator *si);
int setTypeNext(setTypeIterator *si, sds *sdsele, int64_t *llele);
sds setTypeNextObject(setTypeIterator *si);
int setTypeRandomElement(robj *setobj, sds *sdsele, int64_t *llele);
unsigned long setTypeRandomElements(robj *set, unsigned long count, robj *aux_set);
unsigned long setTypeSize(const robj *subject);
void setTypeConvert(robj *subject, int enc);
robj *setTypeDup(robj *o);
/* Hash data type */
2016-04-25 15:39:33 +02:00
#define HASH_SET_TAKE_FIELD (1<<0)
#define HASH_SET_TAKE_VALUE (1<<1)
#define HASH_SET_COPY 0
2012-01-03 07:14:10 +01:00
void hashTypeConvert(robj *o, int enc);
void hashTypeTryConversion(robj *subject, robj **argv, int start, int end);
int hashTypeExists(robj *o, sds key);
int hashTypeDelete(robj *o, sds key);
unsigned long hashTypeLength(const robj *o);
hashTypeIterator *hashTypeInitIterator(robj *subject);
void hashTypeReleaseIterator(hashTypeIterator *hi);
int hashTypeNext(hashTypeIterator *hi);
void hashTypeCurrentFromListpack(hashTypeIterator *hi, int what,
unsigned char **vstr,
unsigned int *vlen,
long long *vll);
sds hashTypeCurrentFromHashTable(hashTypeIterator *hi, int what);
void hashTypeCurrentObject(hashTypeIterator *hi, int what, unsigned char **vstr, unsigned int *vlen, long long *vll);
sds hashTypeCurrentObjectNewSds(hashTypeIterator *hi, int what);
robj *hashTypeLookupWriteOrCreate(client *c, robj *key);
robj *hashTypeGetValueObject(robj *o, sds field);
2016-04-25 15:39:33 +02:00
int hashTypeSet(robj *o, sds field, sds value, int flags);
robj *hashTypeDup(robj *o);
/* Pub / Sub */
int pubsubUnsubscribeAllChannels(client *c, int notify);
int pubsubUnsubscribeShardAllChannels(client *c, int notify);
void pubsubUnsubscribeShardChannels(robj **channels, unsigned int count);
int pubsubUnsubscribeAllPatterns(client *c, int notify);
int pubsubPublishMessage(robj *channel, robj *message, int sharded);
int pubsubPublishMessageAndPropagateToCluster(robj *channel, robj *message, int sharded);
2019-07-05 12:24:28 +02:00
void addReplyPubsubMessage(client *c, robj *channel, robj *msg);
int serverPubsubSubscriptionCount();
int serverPubsubShardSubscriptionCount();
/* Keyspace events notification */
void notifyKeyspaceEvent(int type, char *event, robj *key, int dbid);
int keyspaceEventsStringToFlags(char *classes);
sds keyspaceEventsFlagsToString(int flags);
/* Configuration */
Module Configurations (#10285) This feature adds the ability to add four different types (Bool, Numeric, String, Enum) of configurations to a module to be accessed via the redis config file, and the CONFIG command. **Configuration Names**: We impose a restriction that a module configuration always starts with the module name and contains a '.' followed by the config name. If a module passes "config1" as the name to a register function, it will be registered as MODULENAME.config1. **Configuration Persistence**: Module Configurations exist only as long as a module is loaded. If a module is unloaded, the configurations are removed. There is now also a minimal core API for removal of standardConfig objects from configs by name. **Get and Set Callbacks**: Storage of config values is owned by the module that registers them, and provides callbacks for Redis to access and manipulate the values. This is exposed through a GET and SET callback. The get callback returns a typed value of the config to redis. The callback takes the name of the configuration, and also a privdata pointer. Note that these only take the CONFIGNAME portion of the config, not the entire MODULENAME.CONFIGNAME. ``` typedef RedisModuleString * (*RedisModuleConfigGetStringFunc)(const char *name, void *privdata); typedef long long (*RedisModuleConfigGetNumericFunc)(const char *name, void *privdata); typedef int (*RedisModuleConfigGetBoolFunc)(const char *name, void *privdata); typedef int (*RedisModuleConfigGetEnumFunc)(const char *name, void *privdata); ``` Configs must also must specify a set callback, i.e. what to do on a CONFIG SET XYZ 123 or when loading configurations from cli/.conf file matching these typedefs. *name* is again just the CONFIGNAME portion, *val* is the parsed value from the core, *privdata* is the registration time privdata pointer, and *err* is for providing errors to a client. ``` typedef int (*RedisModuleConfigSetStringFunc)(const char *name, RedisModuleString *val, void *privdata, RedisModuleString **err); typedef int (*RedisModuleConfigSetNumericFunc)(const char *name, long long val, void *privdata, RedisModuleString **err); typedef int (*RedisModuleConfigSetBoolFunc)(const char *name, int val, void *privdata, RedisModuleString **err); typedef int (*RedisModuleConfigSetEnumFunc)(const char *name, int val, void *privdata, RedisModuleString **err); ``` Modules can also specify an optional apply callback that will be called after value(s) have been set via CONFIG SET: ``` typedef int (*RedisModuleConfigApplyFunc)(RedisModuleCtx *ctx, void *privdata, RedisModuleString **err); ``` **Flags:** We expose 7 new flags to the module, which are used as part of the config registration. ``` #define REDISMODULE_CONFIG_MODIFIABLE 0 /* This is the default for a module config. */ #define REDISMODULE_CONFIG_IMMUTABLE (1ULL<<0) /* Can this value only be set at startup? */ #define REDISMODULE_CONFIG_SENSITIVE (1ULL<<1) /* Does this value contain sensitive information */ #define REDISMODULE_CONFIG_HIDDEN (1ULL<<4) /* This config is hidden in `config get <pattern>` (used for tests/debugging) */ #define REDISMODULE_CONFIG_PROTECTED (1ULL<<5) /* Becomes immutable if enable-protected-configs is enabled. */ #define REDISMODULE_CONFIG_DENY_LOADING (1ULL<<6) /* This config is forbidden during loading. */ /* Numeric Specific Configs */ #define REDISMODULE_CONFIG_MEMORY (1ULL<<7) /* Indicates if this value can be set as a memory value */ ``` **Module Registration APIs**: ``` int (*RedisModule_RegisterBoolConfig)(RedisModuleCtx *ctx, char *name, int default_val, unsigned int flags, RedisModuleConfigGetBoolFunc getfn, RedisModuleConfigSetBoolFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_RegisterNumericConfig)(RedisModuleCtx *ctx, const char *name, long long default_val, unsigned int flags, long long min, long long max, RedisModuleConfigGetNumericFunc getfn, RedisModuleConfigSetNumericFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_RegisterStringConfig)(RedisModuleCtx *ctx, const char *name, const char *default_val, unsigned int flags, RedisModuleConfigGetStringFunc getfn, RedisModuleConfigSetStringFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_RegisterEnumConfig)(RedisModuleCtx *ctx, const char *name, int default_val, unsigned int flags, const char **enum_values, const int *int_values, int num_enum_vals, RedisModuleConfigGetEnumFunc getfn, RedisModuleConfigSetEnumFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_LoadConfigs)(RedisModuleCtx *ctx); ``` The module name will be auto appended along with a "." to the front of the name of the config. **What RM_Register[...]Config does**: A RedisModule struct now keeps a list of ModuleConfig objects which look like: ``` typedef struct ModuleConfig { sds name; /* Name of config without the module name appended to the front */ void *privdata; /* Optional data passed into the module config callbacks */ union get_fn { /* The get callback specificed by the module */ RedisModuleConfigGetStringFunc get_string; RedisModuleConfigGetNumericFunc get_numeric; RedisModuleConfigGetBoolFunc get_bool; RedisModuleConfigGetEnumFunc get_enum; } get_fn; union set_fn { /* The set callback specified by the module */ RedisModuleConfigSetStringFunc set_string; RedisModuleConfigSetNumericFunc set_numeric; RedisModuleConfigSetBoolFunc set_bool; RedisModuleConfigSetEnumFunc set_enum; } set_fn; RedisModuleConfigApplyFunc apply_fn; RedisModule *module; } ModuleConfig; ``` It also registers a standardConfig in the configs array, with a pointer to the ModuleConfig object associated with it. **What happens on a CONFIG GET/SET MODULENAME.MODULECONFIG:** For CONFIG SET, we do the same parsing as is done in config.c and pass that as the argument to the module set callback. For CONFIG GET, we call the module get callback and return that value to config.c to return to a client. **CONFIG REWRITE**: Starting up a server with module configurations in a .conf file but no module load directive will fail. The flip side is also true, specifying a module load and a bunch of module configurations will load those configurations in using the module defined set callbacks on a RM_LoadConfigs call. Configs being rewritten works the same way as it does for standard configs, as the module has the ability to specify a default value. If a module is unloaded with configurations specified in the .conf file those configurations will be commented out from the .conf file on the next config rewrite. **RM_LoadConfigs:** `RedisModule_LoadConfigs(RedisModuleCtx *ctx);` This last API is used to make configs available within the onLoad() after they have been registered. The expected usage is that a module will register all of its configs, then call LoadConfigs to trigger all of the set callbacks, and then can error out if any of them were malformed. LoadConfigs will attempt to set all configs registered to either a .conf file argument/loadex argument or their default value if an argument is not specified. **LoadConfigs is a required function if configs are registered. ** Also note that LoadConfigs **does not** call the apply callbacks, but a module can do that directly after the LoadConfigs call. **New Command: MODULE LOADEX [CONFIG NAME VALUE] [ARGS ...]:** This command provides the ability to provide startup context information to a module. LOADEX stands for "load extended" similar to GETEX. Note that provided config names need the full MODULENAME.MODULECONFIG name. Any additional arguments a module might want are intended to be specified after ARGS. Everything after ARGS is passed to onLoad as RedisModuleString **argv. Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Madelyn Olson <matolson@amazon.com> Co-authored-by: sundb <sundbcn@gmail.com> Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com> Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
2022-03-30 14:47:06 +02:00
/* Configuration Flags */
#define MODIFIABLE_CONFIG 0 /* This is the implied default for a standard
* config, which is mutable. */
#define IMMUTABLE_CONFIG (1ULL<<0) /* Can this value only be set at startup? */
#define SENSITIVE_CONFIG (1ULL<<1) /* Does this value contain sensitive information */
#define DEBUG_CONFIG (1ULL<<2) /* Values that are useful for debugging. */
#define MULTI_ARG_CONFIG (1ULL<<3) /* This config receives multiple arguments. */
#define HIDDEN_CONFIG (1ULL<<4) /* This config is hidden in `config get <pattern>` (used for tests/debugging) */
#define PROTECTED_CONFIG (1ULL<<5) /* Becomes immutable if enable-protected-configs is enabled. */
#define DENY_LOADING_CONFIG (1ULL<<6) /* This config is forbidden during loading. */
#define ALIAS_CONFIG (1ULL<<7) /* For configs with multiple names, this flag is set on the alias. */
#define MODULE_CONFIG (1ULL<<8) /* This config is a module config */
#define INTEGER_CONFIG 0 /* No flags means a simple integer configuration */
#define MEMORY_CONFIG (1<<0) /* Indicates if this value can be loaded as a memory value */
#define PERCENT_CONFIG (1<<1) /* Indicates if this value can be loaded as a percent (and stored as a negative int) */
#define OCTAL_CONFIG (1<<2) /* This value uses octal representation */
/* Enum Configs contain an array of configEnum objects that match a string with an integer. */
typedef struct configEnum {
char *name;
int val;
} configEnum;
/* Type of configuration. */
typedef enum {
BOOL_CONFIG,
NUMERIC_CONFIG,
STRING_CONFIG,
SDS_CONFIG,
ENUM_CONFIG,
SPECIAL_CONFIG,
} configType;
void loadServerConfig(char *filename, char config_from_stdin, char *options);
void appendServerSaveParams(time_t seconds, int changes);
void resetServerSaveParams(void);
struct rewriteConfigState; /* Forward declaration to export API. */
void rewriteConfigRewriteLine(struct rewriteConfigState *state, const char *option, sds line, int force);
void rewriteConfigMarkAsProcessed(struct rewriteConfigState *state, const char *option);
int rewriteConfig(char *path, int force_write);
void initConfigValues();
Module Configurations (#10285) This feature adds the ability to add four different types (Bool, Numeric, String, Enum) of configurations to a module to be accessed via the redis config file, and the CONFIG command. **Configuration Names**: We impose a restriction that a module configuration always starts with the module name and contains a '.' followed by the config name. If a module passes "config1" as the name to a register function, it will be registered as MODULENAME.config1. **Configuration Persistence**: Module Configurations exist only as long as a module is loaded. If a module is unloaded, the configurations are removed. There is now also a minimal core API for removal of standardConfig objects from configs by name. **Get and Set Callbacks**: Storage of config values is owned by the module that registers them, and provides callbacks for Redis to access and manipulate the values. This is exposed through a GET and SET callback. The get callback returns a typed value of the config to redis. The callback takes the name of the configuration, and also a privdata pointer. Note that these only take the CONFIGNAME portion of the config, not the entire MODULENAME.CONFIGNAME. ``` typedef RedisModuleString * (*RedisModuleConfigGetStringFunc)(const char *name, void *privdata); typedef long long (*RedisModuleConfigGetNumericFunc)(const char *name, void *privdata); typedef int (*RedisModuleConfigGetBoolFunc)(const char *name, void *privdata); typedef int (*RedisModuleConfigGetEnumFunc)(const char *name, void *privdata); ``` Configs must also must specify a set callback, i.e. what to do on a CONFIG SET XYZ 123 or when loading configurations from cli/.conf file matching these typedefs. *name* is again just the CONFIGNAME portion, *val* is the parsed value from the core, *privdata* is the registration time privdata pointer, and *err* is for providing errors to a client. ``` typedef int (*RedisModuleConfigSetStringFunc)(const char *name, RedisModuleString *val, void *privdata, RedisModuleString **err); typedef int (*RedisModuleConfigSetNumericFunc)(const char *name, long long val, void *privdata, RedisModuleString **err); typedef int (*RedisModuleConfigSetBoolFunc)(const char *name, int val, void *privdata, RedisModuleString **err); typedef int (*RedisModuleConfigSetEnumFunc)(const char *name, int val, void *privdata, RedisModuleString **err); ``` Modules can also specify an optional apply callback that will be called after value(s) have been set via CONFIG SET: ``` typedef int (*RedisModuleConfigApplyFunc)(RedisModuleCtx *ctx, void *privdata, RedisModuleString **err); ``` **Flags:** We expose 7 new flags to the module, which are used as part of the config registration. ``` #define REDISMODULE_CONFIG_MODIFIABLE 0 /* This is the default for a module config. */ #define REDISMODULE_CONFIG_IMMUTABLE (1ULL<<0) /* Can this value only be set at startup? */ #define REDISMODULE_CONFIG_SENSITIVE (1ULL<<1) /* Does this value contain sensitive information */ #define REDISMODULE_CONFIG_HIDDEN (1ULL<<4) /* This config is hidden in `config get <pattern>` (used for tests/debugging) */ #define REDISMODULE_CONFIG_PROTECTED (1ULL<<5) /* Becomes immutable if enable-protected-configs is enabled. */ #define REDISMODULE_CONFIG_DENY_LOADING (1ULL<<6) /* This config is forbidden during loading. */ /* Numeric Specific Configs */ #define REDISMODULE_CONFIG_MEMORY (1ULL<<7) /* Indicates if this value can be set as a memory value */ ``` **Module Registration APIs**: ``` int (*RedisModule_RegisterBoolConfig)(RedisModuleCtx *ctx, char *name, int default_val, unsigned int flags, RedisModuleConfigGetBoolFunc getfn, RedisModuleConfigSetBoolFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_RegisterNumericConfig)(RedisModuleCtx *ctx, const char *name, long long default_val, unsigned int flags, long long min, long long max, RedisModuleConfigGetNumericFunc getfn, RedisModuleConfigSetNumericFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_RegisterStringConfig)(RedisModuleCtx *ctx, const char *name, const char *default_val, unsigned int flags, RedisModuleConfigGetStringFunc getfn, RedisModuleConfigSetStringFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_RegisterEnumConfig)(RedisModuleCtx *ctx, const char *name, int default_val, unsigned int flags, const char **enum_values, const int *int_values, int num_enum_vals, RedisModuleConfigGetEnumFunc getfn, RedisModuleConfigSetEnumFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_LoadConfigs)(RedisModuleCtx *ctx); ``` The module name will be auto appended along with a "." to the front of the name of the config. **What RM_Register[...]Config does**: A RedisModule struct now keeps a list of ModuleConfig objects which look like: ``` typedef struct ModuleConfig { sds name; /* Name of config without the module name appended to the front */ void *privdata; /* Optional data passed into the module config callbacks */ union get_fn { /* The get callback specificed by the module */ RedisModuleConfigGetStringFunc get_string; RedisModuleConfigGetNumericFunc get_numeric; RedisModuleConfigGetBoolFunc get_bool; RedisModuleConfigGetEnumFunc get_enum; } get_fn; union set_fn { /* The set callback specified by the module */ RedisModuleConfigSetStringFunc set_string; RedisModuleConfigSetNumericFunc set_numeric; RedisModuleConfigSetBoolFunc set_bool; RedisModuleConfigSetEnumFunc set_enum; } set_fn; RedisModuleConfigApplyFunc apply_fn; RedisModule *module; } ModuleConfig; ``` It also registers a standardConfig in the configs array, with a pointer to the ModuleConfig object associated with it. **What happens on a CONFIG GET/SET MODULENAME.MODULECONFIG:** For CONFIG SET, we do the same parsing as is done in config.c and pass that as the argument to the module set callback. For CONFIG GET, we call the module get callback and return that value to config.c to return to a client. **CONFIG REWRITE**: Starting up a server with module configurations in a .conf file but no module load directive will fail. The flip side is also true, specifying a module load and a bunch of module configurations will load those configurations in using the module defined set callbacks on a RM_LoadConfigs call. Configs being rewritten works the same way as it does for standard configs, as the module has the ability to specify a default value. If a module is unloaded with configurations specified in the .conf file those configurations will be commented out from the .conf file on the next config rewrite. **RM_LoadConfigs:** `RedisModule_LoadConfigs(RedisModuleCtx *ctx);` This last API is used to make configs available within the onLoad() after they have been registered. The expected usage is that a module will register all of its configs, then call LoadConfigs to trigger all of the set callbacks, and then can error out if any of them were malformed. LoadConfigs will attempt to set all configs registered to either a .conf file argument/loadex argument or their default value if an argument is not specified. **LoadConfigs is a required function if configs are registered. ** Also note that LoadConfigs **does not** call the apply callbacks, but a module can do that directly after the LoadConfigs call. **New Command: MODULE LOADEX [CONFIG NAME VALUE] [ARGS ...]:** This command provides the ability to provide startup context information to a module. LOADEX stands for "load extended" similar to GETEX. Note that provided config names need the full MODULENAME.MODULECONFIG name. Any additional arguments a module might want are intended to be specified after ARGS. Everything after ARGS is passed to onLoad as RedisModuleString **argv. Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Madelyn Olson <matolson@amazon.com> Co-authored-by: sundb <sundbcn@gmail.com> Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com> Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
2022-03-30 14:47:06 +02:00
void removeConfig(sds name);
sds getConfigDebugInfo();
int allowProtectedAction(int config, client *c);
Module Configurations (#10285) This feature adds the ability to add four different types (Bool, Numeric, String, Enum) of configurations to a module to be accessed via the redis config file, and the CONFIG command. **Configuration Names**: We impose a restriction that a module configuration always starts with the module name and contains a '.' followed by the config name. If a module passes "config1" as the name to a register function, it will be registered as MODULENAME.config1. **Configuration Persistence**: Module Configurations exist only as long as a module is loaded. If a module is unloaded, the configurations are removed. There is now also a minimal core API for removal of standardConfig objects from configs by name. **Get and Set Callbacks**: Storage of config values is owned by the module that registers them, and provides callbacks for Redis to access and manipulate the values. This is exposed through a GET and SET callback. The get callback returns a typed value of the config to redis. The callback takes the name of the configuration, and also a privdata pointer. Note that these only take the CONFIGNAME portion of the config, not the entire MODULENAME.CONFIGNAME. ``` typedef RedisModuleString * (*RedisModuleConfigGetStringFunc)(const char *name, void *privdata); typedef long long (*RedisModuleConfigGetNumericFunc)(const char *name, void *privdata); typedef int (*RedisModuleConfigGetBoolFunc)(const char *name, void *privdata); typedef int (*RedisModuleConfigGetEnumFunc)(const char *name, void *privdata); ``` Configs must also must specify a set callback, i.e. what to do on a CONFIG SET XYZ 123 or when loading configurations from cli/.conf file matching these typedefs. *name* is again just the CONFIGNAME portion, *val* is the parsed value from the core, *privdata* is the registration time privdata pointer, and *err* is for providing errors to a client. ``` typedef int (*RedisModuleConfigSetStringFunc)(const char *name, RedisModuleString *val, void *privdata, RedisModuleString **err); typedef int (*RedisModuleConfigSetNumericFunc)(const char *name, long long val, void *privdata, RedisModuleString **err); typedef int (*RedisModuleConfigSetBoolFunc)(const char *name, int val, void *privdata, RedisModuleString **err); typedef int (*RedisModuleConfigSetEnumFunc)(const char *name, int val, void *privdata, RedisModuleString **err); ``` Modules can also specify an optional apply callback that will be called after value(s) have been set via CONFIG SET: ``` typedef int (*RedisModuleConfigApplyFunc)(RedisModuleCtx *ctx, void *privdata, RedisModuleString **err); ``` **Flags:** We expose 7 new flags to the module, which are used as part of the config registration. ``` #define REDISMODULE_CONFIG_MODIFIABLE 0 /* This is the default for a module config. */ #define REDISMODULE_CONFIG_IMMUTABLE (1ULL<<0) /* Can this value only be set at startup? */ #define REDISMODULE_CONFIG_SENSITIVE (1ULL<<1) /* Does this value contain sensitive information */ #define REDISMODULE_CONFIG_HIDDEN (1ULL<<4) /* This config is hidden in `config get <pattern>` (used for tests/debugging) */ #define REDISMODULE_CONFIG_PROTECTED (1ULL<<5) /* Becomes immutable if enable-protected-configs is enabled. */ #define REDISMODULE_CONFIG_DENY_LOADING (1ULL<<6) /* This config is forbidden during loading. */ /* Numeric Specific Configs */ #define REDISMODULE_CONFIG_MEMORY (1ULL<<7) /* Indicates if this value can be set as a memory value */ ``` **Module Registration APIs**: ``` int (*RedisModule_RegisterBoolConfig)(RedisModuleCtx *ctx, char *name, int default_val, unsigned int flags, RedisModuleConfigGetBoolFunc getfn, RedisModuleConfigSetBoolFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_RegisterNumericConfig)(RedisModuleCtx *ctx, const char *name, long long default_val, unsigned int flags, long long min, long long max, RedisModuleConfigGetNumericFunc getfn, RedisModuleConfigSetNumericFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_RegisterStringConfig)(RedisModuleCtx *ctx, const char *name, const char *default_val, unsigned int flags, RedisModuleConfigGetStringFunc getfn, RedisModuleConfigSetStringFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_RegisterEnumConfig)(RedisModuleCtx *ctx, const char *name, int default_val, unsigned int flags, const char **enum_values, const int *int_values, int num_enum_vals, RedisModuleConfigGetEnumFunc getfn, RedisModuleConfigSetEnumFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_LoadConfigs)(RedisModuleCtx *ctx); ``` The module name will be auto appended along with a "." to the front of the name of the config. **What RM_Register[...]Config does**: A RedisModule struct now keeps a list of ModuleConfig objects which look like: ``` typedef struct ModuleConfig { sds name; /* Name of config without the module name appended to the front */ void *privdata; /* Optional data passed into the module config callbacks */ union get_fn { /* The get callback specificed by the module */ RedisModuleConfigGetStringFunc get_string; RedisModuleConfigGetNumericFunc get_numeric; RedisModuleConfigGetBoolFunc get_bool; RedisModuleConfigGetEnumFunc get_enum; } get_fn; union set_fn { /* The set callback specified by the module */ RedisModuleConfigSetStringFunc set_string; RedisModuleConfigSetNumericFunc set_numeric; RedisModuleConfigSetBoolFunc set_bool; RedisModuleConfigSetEnumFunc set_enum; } set_fn; RedisModuleConfigApplyFunc apply_fn; RedisModule *module; } ModuleConfig; ``` It also registers a standardConfig in the configs array, with a pointer to the ModuleConfig object associated with it. **What happens on a CONFIG GET/SET MODULENAME.MODULECONFIG:** For CONFIG SET, we do the same parsing as is done in config.c and pass that as the argument to the module set callback. For CONFIG GET, we call the module get callback and return that value to config.c to return to a client. **CONFIG REWRITE**: Starting up a server with module configurations in a .conf file but no module load directive will fail. The flip side is also true, specifying a module load and a bunch of module configurations will load those configurations in using the module defined set callbacks on a RM_LoadConfigs call. Configs being rewritten works the same way as it does for standard configs, as the module has the ability to specify a default value. If a module is unloaded with configurations specified in the .conf file those configurations will be commented out from the .conf file on the next config rewrite. **RM_LoadConfigs:** `RedisModule_LoadConfigs(RedisModuleCtx *ctx);` This last API is used to make configs available within the onLoad() after they have been registered. The expected usage is that a module will register all of its configs, then call LoadConfigs to trigger all of the set callbacks, and then can error out if any of them were malformed. LoadConfigs will attempt to set all configs registered to either a .conf file argument/loadex argument or their default value if an argument is not specified. **LoadConfigs is a required function if configs are registered. ** Also note that LoadConfigs **does not** call the apply callbacks, but a module can do that directly after the LoadConfigs call. **New Command: MODULE LOADEX [CONFIG NAME VALUE] [ARGS ...]:** This command provides the ability to provide startup context information to a module. LOADEX stands for "load extended" similar to GETEX. Note that provided config names need the full MODULENAME.MODULECONFIG name. Any additional arguments a module might want are intended to be specified after ARGS. Everything after ARGS is passed to onLoad as RedisModuleString **argv. Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Madelyn Olson <matolson@amazon.com> Co-authored-by: sundb <sundbcn@gmail.com> Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com> Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
2022-03-30 14:47:06 +02:00
/* Module Configuration */
typedef struct ModuleConfig ModuleConfig;
int performModuleConfigSetFromName(sds name, sds value, const char **err);
int performModuleConfigSetDefaultFromName(sds name, const char **err);
void addModuleBoolConfig(const char *module_name, const char *name, int flags, void *privdata, int default_val);
void addModuleStringConfig(const char *module_name, const char *name, int flags, void *privdata, sds default_val);
void addModuleEnumConfig(const char *module_name, const char *name, int flags, void *privdata, int default_val, configEnum *enum_vals);
void addModuleNumericConfig(const char *module_name, const char *name, int flags, void *privdata, long long default_val, int conf_flags, long long lower, long long upper);
void addModuleConfigApply(list *module_configs, ModuleConfig *module_config);
int moduleConfigApplyConfig(list *module_configs, const char **err, const char **err_arg_name);
int getModuleBoolConfig(ModuleConfig *module_config);
int setModuleBoolConfig(ModuleConfig *config, int val, const char **err);
sds getModuleStringConfig(ModuleConfig *module_config);
int setModuleStringConfig(ModuleConfig *config, sds strval, const char **err);
int getModuleEnumConfig(ModuleConfig *module_config);
int setModuleEnumConfig(ModuleConfig *config, int val, const char **err);
long long getModuleNumericConfig(ModuleConfig *module_config);
int setModuleNumericConfig(ModuleConfig *config, long long val, const char **err);
/* db.c -- Keyspace access API */
int removeExpire(redisDb *db, robj *key);
void deleteExpiredKeyAndPropagate(redisDb *db, robj *keyobj);
Sort out mess around propagation and MULTI/EXEC (#9890) The mess: Some parts use alsoPropagate for late propagation, others using an immediate one (propagate()), causing edge cases, ugly/hacky code, and the tendency for bugs The basic idea is that all commands are propagated via alsoPropagate (i.e. added to a list) and the top-most call() is responsible for going over that list and actually propagating them (and wrapping them in MULTI/EXEC if there's more than one command). This is done in the new function, propagatePendingCommands. Callers to propagatePendingCommands: 1. top-most call() (we want all nested call()s to add to the also_propagate array and just the top-most one to propagate them) - via `afterCommand` 2. handleClientsBlockedOnKeys: it is out of call() context and it may propagate stuff - via `afterCommand`. 3. handleClientsBlockedOnKeys edge case: if the looked-up key is already expired, we will propagate the expire but will not unblock any client so `afterCommand` isn't called. in that case, we have to propagate the deletion explicitly. 4. cron stuff: active-expire and eviction may also propagate stuff 5. modules: the module API allows to propagate stuff from just about anywhere (timers, keyspace notifications, threads). I could have tried to catch all the out-of-call-context places but it seemed easier to handle it in one place: when we free the context. in the spirit of what was done in call(), only the top-most freeing of a module context may cause propagation. 6. modules: when using a thread-safe ctx it's not clear when/if the ctx will be freed. we do know that the module must lock the GIL before calling RM_Replicate/RM_Call so we propagate the pending commands when releasing the GIL. A "known limitation", which were actually a bug, was fixed because of this commit (see propagate.tcl): When using a mix of RM_Call with `!` and RM_Replicate, the command would propagate out-of-order: first all the commands from RM_Call, and then the ones from RM_Replicate Another thing worth mentioning is that if, in the past, a client would issue a MULTI/EXEC with just one write command the server would blindly propagate the MULTI/EXEC too, even though it's redundant. not anymore. This commit renames propagate() to propagateNow() in order to cause conflicts in pending PRs. propagatePendingCommands is the only caller of propagateNow, which is now a static, internal helper function. Optimizations: 1. alsoPropagate will not add stuff to also_propagate if there's no AOF and replicas 2. alsoPropagate reallocs also_propagagte exponentially, to save calls to memmove Bugfixes: 1. CONFIG SET can create evictions, sending notifications which can cause to dirty++ with modules. we need to prevent it from propagating to AOF/replicas 2. We need to set current_client in RM_Call. buggy scenario: - CONFIG SET maxmemory, eviction notifications, module hook calls RM_Call - assertion in lookupKey crashes, because current_client has CONFIG SET, which isn't CMD_WRITE 3. minor: in eviction, call propagateDeletion after notification, like active-expire and all commands (we always send a notification before propagating the command)
2021-12-22 23:03:48 +01:00
void propagateDeletion(redisDb *db, robj *key, int lazy);
int keyIsExpired(redisDb *db, robj *key);
long long getExpire(redisDb *db, robj *key);
Replication: fix the infamous key leakage of writable slaves + EXPIRE. BACKGROUND AND USE CASEj Redis slaves are normally write only, however the supprot a "writable" mode which is very handy when scaling reads on slaves, that actually need write operations in order to access data. For instance imagine having slaves replicating certain Sets keys from the master. When accessing the data on the slave, we want to peform intersections between such Sets values. However we don't want to intersect each time: to cache the intersection for some time often is a good idea. To do so, it is possible to setup a slave as a writable slave, and perform the intersection on the slave side, perhaps setting a TTL on the resulting key so that it will expire after some time. THE BUG Problem: in order to have a consistent replication, expiring of keys in Redis replication is up to the master, that synthesize DEL operations to send in the replication stream. However slaves logically expire keys by hiding them from read attempts from clients so that if the master did not promptly sent a DEL, the client still see logically expired keys as non existing. Because slaves don't actively expire keys by actually evicting them but just masking from the POV of read operations, if a key is created in a writable slave, and an expire is set, the key will be leaked forever: 1. No DEL will be received from the master, which does not know about such a key at all. 2. No eviction will be performed by the slave, since it needs to disable eviction because it's up to masters, otherwise consistency of data is lost. THE FIX In order to fix the problem, the slave should be able to tag keys that were created in the slave side and have an expire set in some way. My solution involved using an unique additional dictionary created by the writable slave only if needed. The dictionary is obviously keyed by the key name that we need to track: all the keys that are set with an expire directly by a client writing to the slave are tracked. The value in the dictionary is a bitmap of all the DBs where such a key name need to be tracked, so that we can use a single dictionary to track keys in all the DBs used by the slave (actually this limits the solution to the first 64 DBs, but the default with Redis is to use 16 DBs). This solution allows to pay both a small complexity and CPU penalty, which is zero when the feature is not used, actually. The slave-side eviction is encapsulated in code which is not coupled with the rest of the Redis core, if not for the hook to track the keys. TODO I'm doing the first smoke tests to see if the feature works as expected: so far so good. Unit tests should be added before merging into the 4.0 branch.
2016-12-13 10:20:06 +01:00
void setExpire(client *c, redisDb *db, robj *key, long long when);
int checkAlreadyExpired(long long when);
robj *lookupKeyRead(redisDb *db, robj *key);
robj *lookupKeyWrite(redisDb *db, robj *key);
robj *lookupKeyReadOrReply(client *c, robj *key, robj *reply);
robj *lookupKeyWriteOrReply(client *c, robj *key, robj *reply);
robj *lookupKeyReadWithFlags(redisDb *db, robj *key, int flags);
robj *lookupKeyWriteWithFlags(redisDb *db, robj *key, int flags);
robj *objectCommandLookup(client *c, robj *key);
robj *objectCommandLookupOrReply(client *c, robj *key, robj *reply);
int objectSetLRUOrLFU(robj *val, long long lfu_freq, long long lru_idle,
long long lru_clock, int lru_multiplier);
#define LOOKUP_NONE 0
#define LOOKUP_NOTOUCH (1<<0) /* Don't update LRU. */
#define LOOKUP_NONOTIFY (1<<1) /* Don't trigger keyspace event on key misses. */
#define LOOKUP_NOSTATS (1<<2) /* Don't update keyspace hits/misses counters. */
#define LOOKUP_WRITE (1<<3) /* Delete expired keys even in replicas. */
void dbAdd(redisDb *db, robj *key, robj *val);
int dbAddRDBLoad(redisDb *db, sds key, robj *val);
void dbOverwrite(redisDb *db, robj *key, robj *val);
#define SETKEY_KEEPTTL 1
#define SETKEY_NO_SIGNAL 2
#define SETKEY_ALREADY_EXIST 4
#define SETKEY_DOESNT_EXIST 8
void setKey(client *c, redisDb *db, robj *key, robj *val, int flags);
robj *dbRandomKey(redisDb *db);
int dbSyncDelete(redisDb *db, robj *key);
int dbDelete(redisDb *db, robj *key);
robj *dbUnshareStringValue(redisDb *db, robj *key, robj *o);
#define EMPTYDB_NO_FLAGS 0 /* No flags. */
#define EMPTYDB_ASYNC (1<<0) /* Reclaim memory in another thread. */
Change FUNCTION CREATE, DELETE and FLUSH to be WRITE commands instead of MAY_REPLICATE. (#9953) The issue with MAY_REPLICATE is that all automatic mechanisms to handle write commands will not work. This require have a special treatment for: * Not allow those commands to be executed on RO replica. * Allow those commands to be executed on RO replica from primary connection. * Allow those commands to be executed on the RO replica from AOF. By setting those commands as WRITE commands we are getting all those properties from Redis. Test was added to verify that those properties work as expected. In addition, rearrange when and where functions are flushed. Before this PR functions were flushed manually on `rdbLoadRio` and cleaned manually on failure. This contradicts the assumptions that functions are data and need to be created/deleted alongside with the data. A side effect of this, for example, `debug reload noflush` did not flush the data but did flush the functions, `debug loadaof` flush the data but not the functions. This PR move functions deletion into `emptyDb`. `emptyDb` (renamed to `emptyData`) will now accept an additional flag, `NOFUNCTIONS` which specifically indicate that we do not want to flush the functions (on all other cases, functions will be flushed). Used the new flag on FLUSHALL and FLUSHDB only! Tests were added to `debug reload` and `debug loadaof` to verify that functions behave the same as the data. Notice that because now functions will be deleted along side with the data we can not allow `CLUSTER RESET` to be called from within a function (it will cause the function to be released while running), this PR adds `NO_SCRIPT` flag to `CLUSTER RESET` so it will not be possible to be called from within a function. The other cluster commands are allowed from within a function (there are use-cases that uses `GETKEYSINSLOT` to iterate over all the keys on a given slot). Tests was added to verify `CLUSTER RESET` is denied from within a script. Another small change on this PR is that `RDBFLAGS_ALLOW_DUP` is also applicable on functions. When loading functions, if this flag is set, we will replace old functions with new ones on collisions.
2021-12-21 15:13:29 +01:00
#define EMPTYDB_NOFUNCTIONS (1<<1) /* Indicate not to flush the functions. */
long long emptyData(int dbnum, int flags, void(callback)(dict*));
long long emptyDbStructure(redisDb *dbarray, int dbnum, int async, void(callback)(dict*));
void flushAllDataAndResetRDB(int flags);
long long dbTotalServerKeyCount();
Replica keep serving data during repl-diskless-load=swapdb for better availability (#9323) For diskless replication in swapdb mode, considering we already spend replica memory having a backup of current db to restore in case of failure, we can have the following benefits by instead swapping database only in case we succeeded in transferring db from master: - Avoid `LOADING` response during failed and successful synchronization for cases where the replica is already up and running with data. - Faster total time of diskless replication, because now we're moving from Transfer + Flush + Load time to Transfer + Load only. Flushing the tempDb is done asynchronously after swapping. - This could be implemented also for disk replication with similar benefits if consumers are willing to spend the extra memory usage. General notes: - The concept of `backupDb` becomes `tempDb` for clarity. - Async loading mode will only kick in if the replica is syncing from a master that has the same repl-id the one it had before. i.e. the data it's getting belongs to a different time of the same timeline. - New property in INFO: `async_loading` to differentiate from the blocking loading - Slot to Key mapping is now a field of `redisDb` as it's more natural to access it from both server.db and the tempDb that is passed around. - Because this is affecting replicas only, we assume that if they are not readonly and write commands during replication, they are lost after SYNC same way as before, but we're still denying CONFIG SET here anyways to avoid complications. Considerations for review: - We have many cases where server.loading flag is used and even though I tried my best, there may be cases where async_loading should be checked as well and cases where it shouldn't (would require very good understanding of whole code) - Several places that had different behavior depending on the loading flag where actually meant to just handle commands coming from the AOF client differently than ones coming from real clients, changed to check CLIENT_ID_AOF instead. **Additional for Release Notes** - Bugfix - server.dirty was not incremented for any kind of diskless replication, as effect it wouldn't contribute on triggering next database SAVE - New flag for RM_GetContextFlags module API: REDISMODULE_CTX_FLAGS_ASYNC_LOADING - Deprecated RedisModuleEvent_ReplBackup. Starting from Redis 7.0, we don't fire this event. Instead, we have the new RedisModuleEvent_ReplAsyncLoad holding 3 sub-events: STARTED, ABORTED and COMPLETED. - New module flag REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD for RedisModule_SetModuleOptions to allow modules to declare they support the diskless replication with async loading (when absent, we fall back to disk-based loading). Co-authored-by: Eduardo Semprebon <edus@saxobank.com> Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-04 09:46:50 +01:00
redisDb *initTempDb(void);
void discardTempDb(redisDb *tempDb, void(callback)(dict*));
int selectDb(client *c, int id);
void signalModifiedKey(client *c, redisDb *db, robj *key);
void signalFlushedDb(int dbid, int async);
void scanGenericCommand(client *c, robj *o, unsigned long cursor);
int parseScanCursorOrReply(client *c, robj *o, unsigned long *cursor);
int dbAsyncDelete(redisDb *db, robj *key);
void emptyDbAsync(redisDb *db);
size_t lazyfreeGetPendingObjectsCount(void);
size_t lazyfreeGetFreedObjectsCount(void);
void lazyfreeResetStats(void);
void freeObjAsync(robj *key, robj *obj, int dbid);
Replication backlog and replicas use one global shared replication buffer (#9166) ## Background For redis master, one replica uses one copy of replication buffer, that is a big waste of memory, more replicas more waste, and allocate/free memory for every reply list also cost much. If we set client-output-buffer-limit small and write traffic is heavy, master may disconnect with replicas and can't finish synchronization with replica. If we set client-output-buffer-limit big, master may be OOM when there are many replicas that separately keep much memory. Because replication buffers of different replica client are the same, one simple idea is that all replicas only use one replication buffer, that will effectively save memory. Since replication backlog content is the same as replicas' output buffer, now we can discard replication backlog memory and use global shared replication buffer to implement replication backlog mechanism. ## Implementation I create one global "replication buffer" which contains content of replication stream. The structure of "replication buffer" is similar to the reply list that exists in every client. But the node of list is `replBufBlock`, which has `id, repl_offset, refcount` fields. ```c /* Replication buffer blocks is the list of replBufBlock. * * +--------------+ +--------------+ +--------------+ * | refcount = 1 | ... | refcount = 0 | ... | refcount = 2 | * +--------------+ +--------------+ +--------------+ * | / \ * | / \ * | / \ * Repl Backlog Replia_A Replia_B * * Each replica or replication backlog increments only the refcount of the * 'ref_repl_buf_node' which it points to. So when replica walks to the next * node, it should first increase the next node's refcount, and when we trim * the replication buffer nodes, we remove node always from the head node which * refcount is 0. If the refcount of the head node is not 0, we must stop * trimming and never iterate the next node. */ /* Similar with 'clientReplyBlock', it is used for shared buffers between * all replica clients and replication backlog. */ typedef struct replBufBlock { int refcount; /* Number of replicas or repl backlog using. */ long long id; /* The unique incremental number. */ long long repl_offset; /* Start replication offset of the block. */ size_t size, used; char buf[]; } replBufBlock; ``` So now when we feed replication stream into replication backlog and all replicas, we only need to feed stream into replication buffer `feedReplicationBuffer`. In this function, we set some fields of replication backlog and replicas to references of the global replication buffer blocks. And we also need to check replicas' output buffer limit to free if exceeding `client-output-buffer-limit`, and trim replication backlog if exceeding `repl-backlog-size`. When sending reply to replicas, we also need to iterate replication buffer blocks and send its content, when totally sending one block for replica, we decrease current node count and increase the next current node count, and then free the block which reference is 0 from the head of replication buffer blocks. Since now we use linked list to manage replication backlog, it may cost much time for iterating all linked list nodes to find corresponding replication buffer node. So we create a rax tree to store some nodes for index, but to avoid rax tree occupying too much memory, i record one per 64 nodes for index. Currently, to make partial resynchronization as possible as much, we always let replication backlog as the last reference of replication buffer blocks, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks, and this method doesn't increase memory usage since they share replication buffer. To avoid freezing server for freeing unreferenced replication buffer blocks when we need to trim backlog for exceeding backlog size setting, we trim backlog incrementally (free 64 blocks per call now), and make it faster in `beforeSleep` (free 640 blocks). ### Other changes - `mem_total_replication_buffers`: we add this field in INFO command, it means the total memory of replication buffers used. - `mem_clients_slaves`: now even replica is slow to replicate, and its output buffer memory is not 0, but it still may be 0, since replication backlog and replicas share one global replication buffer, only if replication buffer memory is more than the repl backlog setting size, we consider the excess as replicas' memory. Otherwise, we think replication buffer memory is the consumption of repl backlog. - Key eviction Since all replicas and replication backlog share global replication buffer, we think only the part of exceeding backlog size the extra separate consumption of replicas. Because we trim backlog incrementally in the background, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks disconnect. To avoid massive eviction loop, we don't count the delayed freed replication backlog into used memory even if there are no replicas, i.e. we also regard this memory as replicas's memory. - `client-output-buffer-limit` check for replica clients It doesn't make sense to set the replica clients output buffer limit lower than the repl-backlog-size config (partial sync will succeed and then replica will get disconnected). Such a configuration is ignored (the size of repl-backlog-size will be used). This doesn't have memory consumption implications since the replica client will share the backlog buffers memory. - Drop replication backlog after loading data if needed We always create replication backlog if server is a master, we need it because we put DELs in it when loading expired keys in RDB, but if RDB doesn't have replication info or there is no rdb, it is not possible to support partial resynchronization, to avoid extra memory of replication backlog, we drop it. - Multi IO threads Since all replicas and replication backlog use global replication buffer, if I/O threads are enabled, to guarantee data accessing thread safe, we must let main thread handle sending the output buffer to all replicas. But before, other IO threads could handle sending output buffer of all replicas. ## Other optimizations This solution resolve some other problem: - When replicas disconnect with master since of out of output buffer limit, releasing the output buffer of replicas may freeze server if we set big `client-output-buffer-limit` for replicas, but now, it doesn't cause freezing. - This implementation may mitigate reply list copy cost time(also freezes server) when one replication has huge reply buffer and another replica can copy buffer for full synchronization. now, we just copy reference info, it is very light. - If we set replication backlog size big, it also may cost much time to copy replication backlog into replica's output buffer. But this commit eliminates this problem. - Resizing replication backlog size doesn't empty current replication backlog content.
2021-10-25 08:24:31 +02:00
void freeReplicationBacklogRefMemAsync(list *blocks, rax *index);
/* API to get key arguments from commands */
#define GET_KEYSPEC_DEFAULT 0
#define GET_KEYSPEC_INCLUDE_NOT_KEYS (1<<0) /* Consider 'fake' keys as keys */
#define GET_KEYSPEC_RETURN_PARTIAL (1<<1) /* Return all keys that can be found */
int getKeysFromCommandWithSpecs(struct redisCommand *cmd, robj **argv, int argc, int search_flags, getKeysResult *result);
keyReference *getKeysPrepareResult(getKeysResult *result, int numkeys);
int getKeysFromCommand(struct redisCommand *cmd, robj **argv, int argc, getKeysResult *result);
int doesCommandHaveKeys(struct redisCommand *cmd);
int getChannelsFromCommand(struct redisCommand *cmd, robj **argv, int argc, getKeysResult *result);
int doesCommandHaveChannelsWithFlags(struct redisCommand *cmd, int flags);
void getKeysFreeResult(getKeysResult *result);
int sintercardGetKeys(struct redisCommand *cmd,robj **argv, int argc, getKeysResult *result);
int zunionInterDiffGetKeys(struct redisCommand *cmd,robj **argv, int argc, getKeysResult *result);
int zunionInterDiffStoreGetKeys(struct redisCommand *cmd,robj **argv, int argc, getKeysResult *result);
int evalGetKeys(struct redisCommand *cmd, robj **argv, int argc, getKeysResult *result);
int functionGetKeys(struct redisCommand *cmd, robj **argv, int argc, getKeysResult *result);
int sortGetKeys(struct redisCommand *cmd, robj **argv, int argc, getKeysResult *result);
int sortROGetKeys(struct redisCommand *cmd, robj **argv, int argc, getKeysResult *result);
int migrateGetKeys(struct redisCommand *cmd, robj **argv, int argc, getKeysResult *result);
int georadiusGetKeys(struct redisCommand *cmd, robj **argv, int argc, getKeysResult *result);
int xreadGetKeys(struct redisCommand *cmd, robj **argv, int argc, getKeysResult *result);
int lmpopGetKeys(struct redisCommand *cmd, robj **argv, int argc, getKeysResult *result);
int blmpopGetKeys(struct redisCommand *cmd, robj **argv, int argc, getKeysResult *result);
int zmpopGetKeys(struct redisCommand *cmd, robj **argv, int argc, getKeysResult *result);
int bzmpopGetKeys(struct redisCommand *cmd, robj **argv, int argc, getKeysResult *result);
int setGetKeys(struct redisCommand *cmd, robj **argv, int argc, getKeysResult *result);
int bitfieldGetKeys(struct redisCommand *cmd, robj **argv, int argc, getKeysResult *result);
2011-03-29 17:51:15 +02:00
unsigned short crc16(const char *buf, int len);
/* Sentinel */
void initSentinelConfig(void);
void initSentinel(void);
void sentinelTimer(void);
const char *sentinelHandleConfiguration(char **argv, int argc);
void queueSentinelConfig(sds *argv, int argc, int linenum, sds line);
void loadSentinelConfigFromQueue(void);
void sentinelIsRunning(void);
void sentinelCheckConfigFile(void);
Treat subcommands as commands (#9504) ## Intro The purpose is to allow having different flags/ACL categories for subcommands (Example: CONFIG GET is ok-loading but CONFIG SET isn't) We create a small command table for every command that has subcommands and each subcommand has its own flags, etc. (same as a "regular" command) This commit also unites the Redis and the Sentinel command tables ## Affected commands CONFIG Used to have "admin ok-loading ok-stale no-script" Changes: 1. Dropped "ok-loading" in all except GET (this doesn't change behavior since there were checks in the code doing that) XINFO Used to have "read-only random" Changes: 1. Dropped "random" in all except CONSUMERS XGROUP Used to have "write use-memory" Changes: 1. Dropped "use-memory" in all except CREATE and CREATECONSUMER COMMAND No changes. MEMORY Used to have "random read-only" Changes: 1. Dropped "random" in PURGE and USAGE ACL Used to have "admin no-script ok-loading ok-stale" Changes: 1. Dropped "admin" in WHOAMI, GENPASS, and CAT LATENCY No changes. MODULE No changes. SLOWLOG Used to have "admin random ok-loading ok-stale" Changes: 1. Dropped "random" in RESET OBJECT Used to have "read-only random" Changes: 1. Dropped "random" in ENCODING and REFCOUNT SCRIPT Used to have "may-replicate no-script" Changes: 1. Dropped "may-replicate" in all except FLUSH and LOAD CLIENT Used to have "admin no-script random ok-loading ok-stale" Changes: 1. Dropped "random" in all except INFO and LIST 2. Dropped "admin" in ID, TRACKING, CACHING, GETREDIR, INFO, SETNAME, GETNAME, and REPLY STRALGO No changes. PUBSUB No changes. CLUSTER Changes: 1. Dropped "admin in countkeysinslots, getkeysinslot, info, nodes, keyslot, myid, and slots SENTINEL No changes. (note that DEBUG also fits, but we decided not to convert it since it's for debugging and anyway undocumented) ## New sub-command This commit adds another element to the per-command output of COMMAND, describing the list of subcommands, if any (in the same structure as "regular" commands) Also, it adds a new subcommand: ``` COMMAND LIST [FILTERBY (MODULE <module-name>|ACLCAT <cat>|PATTERN <pattern>)] ``` which returns a set of all commands (unless filters), but excluding subcommands. ## Module API A new module API, RM_CreateSubcommand, was added, in order to allow module writer to define subcommands ## ACL changes: 1. Now, that each subcommand is actually a command, each has its own ACL id. 2. The old mechanism of allowed_subcommands is redundant (blocking/allowing a subcommand is the same as blocking/allowing a regular command), but we had to keep it, to support the widespread usage of allowed_subcommands to block commands with certain args, that aren't subcommands (e.g. "-select +select|0"). 3. I have renamed allowed_subcommands to allowed_firstargs to emphasize the difference. 4. Because subcommands are commands in ACL too, you can now use "-" to block subcommands (e.g. "+client -client|kill"), which wasn't possible in the past. 5. It is also possible to use the allowed_firstargs mechanism with subcommand. For example: `+config -config|set +config|set|loglevel` will block all CONFIG SET except for setting the log level. 6. All of the ACL changes above required some amount of refactoring. ## Misc 1. There are two approaches: Either each subcommand has its own function or all subcommands use the same function, determining what to do according to argv[0]. For now, I took the former approaches only with CONFIG and COMMAND, while other commands use the latter approach (for smaller blamelog diff). 2. Deleted memoryGetKeys: It is no longer needed because MEMORY USAGE now uses the "range" key spec. 4. Bugfix: GETNAME was missing from CLIENT's help message. 5. Sentinel and Redis now use the same table, with the same function pointer. Some commands have a different implementation in Sentinel, so we redirect them (these are ROLE, PUBLISH, and INFO). 6. Command stats now show the stats per subcommand (e.g. instead of stats just for "config" you will have stats for "config|set", "config|get", etc.) 7. It is now possible to use COMMAND directly on subcommands: COMMAND INFO CONFIG|GET (The pipeline syntax was inspired from ACL, and can be used in functions lookupCommandBySds and lookupCommandByCString) 8. STRALGO is now a container command (has "help") ## Breaking changes: 1. Command stats now show the stats per subcommand (see (5) above)
2021-10-20 10:52:57 +02:00
void sentinelCommand(client *c);
void sentinelInfoCommand(client *c);
void sentinelPublishCommand(client *c);
void sentinelRoleCommand(client *c);
/* redis-check-rdb & aof */
int redis_check_rdb(char *rdbfilename, FILE *fp);
int redis_check_rdb_main(int argc, char **argv, FILE *fp);
int redis_check_aof_main(int argc, char **argv);
/* Scripting */
void scriptingInit(int setup);
int ldbRemoveChild(pid_t pid);
void ldbKillForkedSessions(void);
int ldbPendingChildren(void);
sds luaCreateFunction(client *c, robj *body);
void luaLdbLineHook(lua_State *lua, lua_Debug *ar);
void freeLuaScriptsAsync(dict *lua_scripts);
Redis Function Libraries (#10004) # Redis Function Libraries This PR implements Redis Functions Libraries as describe on: https://github.com/redis/redis/issues/9906. Libraries purpose is to provide a better code sharing between functions by allowing to create multiple functions in a single command. Functions that were created together can safely share code between each other without worrying about compatibility issues and versioning. Creating a new library is done using 'FUNCTION LOAD' command (full API is described below) This PR introduces a new struct called libraryInfo, libraryInfo holds information about a library: * name - name of the library * engine - engine used to create the library * code - library code * description - library description * functions - the functions exposed by the library When Redis gets the `FUNCTION LOAD` command it creates a new empty libraryInfo. Redis passes the `CODE` to the relevant engine alongside the empty libraryInfo. As a result, the engine will create one or more functions by calling 'libraryCreateFunction'. The new funcion will be added to the newly created libraryInfo. So far Everything is happening locally on the libraryInfo so it is easy to abort the operation (in case of an error) by simply freeing the libraryInfo. After the library info is fully constructed we start the joining phase by which we will join the new library to the other libraries currently exist on Redis. The joining phase make sure there is no function collision and add the library to the librariesCtx (renamed from functionCtx). LibrariesCtx is used all around the code in the exact same way as functionCtx was used (with respect to RDB loading, replicatio, ...). The only difference is that apart from function dictionary (maps function name to functionInfo object), the librariesCtx contains also a libraries dictionary that maps library name to libraryInfo object. ## New API ### FUNCTION LOAD `FUNCTION LOAD <ENGINE> <LIBRARY NAME> [REPLACE] [DESCRIPTION <DESCRIPTION>] <CODE>` Create a new library with the given parameters: * ENGINE - REPLACE Engine name to use to create the library. * LIBRARY NAME - The new library name. * REPLACE - If the library already exists, replace it. * DESCRIPTION - Library description. * CODE - Library code. Return "OK" on success, or error on the following cases: * Library name already taken and REPLACE was not used * Name collision with another existing library (even if replace was uses) * Library registration failed by the engine (usually compilation error) ## Changed API ### FUNCTION LIST `FUNCTION LIST [LIBRARYNAME <LIBRARY NAME PATTERN>] [WITHCODE]` Command was modified to also allow getting libraries code (so `FUNCTION INFO` command is no longer needed and removed). In addition the command gets an option argument, `LIBRARYNAME` allows you to only get libraries that match the given `LIBRARYNAME` pattern. By default, it returns all libraries. ### INFO MEMORY Added number of libraries to `INFO MEMORY` ### Commands flags `DENYOOM` flag was set on `FUNCTION LOAD` and `FUNCTION RESTORE`. We consider those commands as commands that add new data to the dateset (functions are data) and so we want to disallows to run those commands on OOM. ## Removed API * FUNCTION CREATE - Decided on https://github.com/redis/redis/issues/9906 * FUNCTION INFO - Decided on https://github.com/redis/redis/issues/9899 ## Lua engine changes When the Lua engine gets the code given on `FUNCTION LOAD` command, it immediately runs it, we call this run the loading run. Loading run is not a usual script run, it is not possible to invoke any Redis command from within the load run. Instead there is a new API provided by `library` object. The new API's: * `redis.log` - behave the same as `redis.log` * `redis.register_function` - register a new function to the library The loading run purpose is to register functions using the new `redis.register_function` API. Any attempt to use any other API will result in an error. In addition, the load run is has a time limit of 500ms, error is raise on timeout and the entire operation is aborted. ### `redis.register_function` `redis.register_function(<function_name>, <callback>, [<description>])` This new API allows users to register a new function that will be linked to the newly created library. This API can only be called during the load run (see definition above). Any attempt to use it outside of the load run will result in an error. The parameters pass to the API are: * function_name - Function name (must be a Lua string) * callback - Lua function object that will be called when the function is invokes using fcall/fcall_ro * description - Function description, optional (must be a Lua string). ### Example The following example creates a library called `lib` with 2 functions, `f1` and `f1`, returns 1 and 2 respectively: ``` local function f1(keys, args)     return 1 end local function f2(keys, args)     return 2 end redis.register_function('f1', f1) redis.register_function('f2', f2) ``` Notice: Unlike `eval`, functions inside a library get the KEYS and ARGV as arguments to the functions and not as global. ### Technical Details On the load run we only want the user to be able to call a white list on API's. This way, in the future, if new API's will be added, the new API's will not be available to the load run unless specifically added to this white list. We put the while list on the `library` object and make sure the `library` object is only available to the load run by using [lua_setfenv](https://www.lua.org/manual/5.1/manual.html#lua_setfenv) API. This API allows us to set the `globals` of a function (and all the function it creates). Before starting the load run we create a new fresh Lua table (call it `g`) that only contains the `library` API (we make sure to set global protection on this table just like the general global protection already exists today), then we use [lua_setfenv](https://www.lua.org/manual/5.1/manual.html#lua_setfenv) to set `g` as the global table of the load run. After the load run finished we update `g` metatable and set `__index` and `__newindex` functions to be `_G` (Lua default globals), we also pop out the `library` object as we do not need it anymore. This way, any function that was created on the load run (and will be invoke using `fcall`) will see the default globals as it expected to see them and will not have the `library` API anymore. An important outcome of this new approach is that now we can achieve a distinct global table for each library (it is not yet like that but it is very easy to achieve it now). In the future we can decide to remove global protection because global on different libraries will not collide or we can chose to give different API to different libraries base on some configuration or input. Notice that this technique was meant to prevent errors and was not meant to prevent malicious user from exploit it. For example, the load run can still save the `library` object on some local variable and then using in `fcall` context. To prevent such a malicious use, the C code also make sure it is running in the right context and if not raise an error.
2022-01-06 12:39:38 +01:00
void freeFunctionsAsync(functionsLibCtx *lib_ctx);
int ldbIsEnabled();
void ldbLog(sds entry);
void ldbLogRedisReply(char *reply);
void sha1hex(char *digest, char *script, size_t len);
unsigned long evalMemory();
dict* evalScriptsDict();
unsigned long evalScriptsMemory();
typedef struct luaScript {
uint64_t flags;
robj *body;
} luaScript;
/* Blocked clients */
void processUnblockedClients(void);
void blockClient(client *c, int btype);
void unblockClient(client *c);
void queueClientForReprocessing(client *c);
void replyToBlockedClientTimedOut(client *c);
int getTimeoutFromObjectOrReply(client *c, robj *object, mstime_t *timeout, int unit);
void disconnectAllBlockedClients(void);
void handleClientsBlockedOnKeys(void);
void signalKeyAsReady(redisDb *db, robj *key, int type);
void blockForKeys(client *c, int btype, robj **keys, int numkeys, long count, mstime_t timeout, robj *target, struct blockPos *blockpos, streamID *ids);
void updateStatsOnUnblock(client *c, long blocked_us, long reply_us, int had_errors);
void scanDatabaseForDeletedStreams(redisDb *emptied, redisDb *replaced_with);
/* timeout.c -- Blocked clients timeout and connections timeout. */
void addClientToTimeoutTable(client *c);
void removeClientFromTimeoutTable(client *c);
void handleBlockedClientsTimeout(void);
int clientsCronHandleTimeout(client *c, mstime_t now_ms);
/* expire.c -- Handling of expired keys */
void activeExpireCycle(int type);
Replication: fix the infamous key leakage of writable slaves + EXPIRE. BACKGROUND AND USE CASEj Redis slaves are normally write only, however the supprot a "writable" mode which is very handy when scaling reads on slaves, that actually need write operations in order to access data. For instance imagine having slaves replicating certain Sets keys from the master. When accessing the data on the slave, we want to peform intersections between such Sets values. However we don't want to intersect each time: to cache the intersection for some time often is a good idea. To do so, it is possible to setup a slave as a writable slave, and perform the intersection on the slave side, perhaps setting a TTL on the resulting key so that it will expire after some time. THE BUG Problem: in order to have a consistent replication, expiring of keys in Redis replication is up to the master, that synthesize DEL operations to send in the replication stream. However slaves logically expire keys by hiding them from read attempts from clients so that if the master did not promptly sent a DEL, the client still see logically expired keys as non existing. Because slaves don't actively expire keys by actually evicting them but just masking from the POV of read operations, if a key is created in a writable slave, and an expire is set, the key will be leaked forever: 1. No DEL will be received from the master, which does not know about such a key at all. 2. No eviction will be performed by the slave, since it needs to disable eviction because it's up to masters, otherwise consistency of data is lost. THE FIX In order to fix the problem, the slave should be able to tag keys that were created in the slave side and have an expire set in some way. My solution involved using an unique additional dictionary created by the writable slave only if needed. The dictionary is obviously keyed by the key name that we need to track: all the keys that are set with an expire directly by a client writing to the slave are tracked. The value in the dictionary is a bitmap of all the DBs where such a key name need to be tracked, so that we can use a single dictionary to track keys in all the DBs used by the slave (actually this limits the solution to the first 64 DBs, but the default with Redis is to use 16 DBs). This solution allows to pay both a small complexity and CPU penalty, which is zero when the feature is not used, actually. The slave-side eviction is encapsulated in code which is not coupled with the rest of the Redis core, if not for the hook to track the keys. TODO I'm doing the first smoke tests to see if the feature works as expected: so far so good. Unit tests should be added before merging into the 4.0 branch.
2016-12-13 10:20:06 +01:00
void expireSlaveKeys(void);
void rememberSlaveKeyWithExpire(redisDb *db, robj *key);
void flushSlaveKeysWithExpireList(void);
size_t getSlaveKeyWithExpireCount(void);
/* evict.c -- maxmemory handling and LRU eviction. */
void evictionPoolAlloc(void);
#define LFU_INIT_VAL 5
unsigned long LFUGetTimeInMinutes(void);
uint8_t LFULogIncr(uint8_t value);
unsigned long LFUDecrAndReturn(robj *o);
#define EVICT_OK 0
#define EVICT_RUNNING 1
#define EVICT_FAIL 2
int performEvictions(void);
void startEvictionTimeProc(void);
/* Keys hashing / comparison functions for dict.c hash tables. */
Use SipHash hash function to mitigate HashDos attempts. This change attempts to switch to an hash function which mitigates the effects of the HashDoS attack (denial of service attack trying to force data structures to worst case behavior) while at the same time providing Redis with an hash function that does not expect the input data to be word aligned, a condition no longer true now that sds.c strings have a varialbe length header. Note that it is possible sometimes that even using an hash function for which collisions cannot be generated without knowing the seed, special implementation details or the exposure of the seed in an indirect way (for example the ability to add elements to a Set and check the return in which Redis returns them with SMEMBERS) may make the attacker's life simpler in the process of trying to guess the correct seed, however the next step would be to switch to a log(N) data structure when too many items in a single bucket are detected: this seems like an overkill in the case of Redis. SPEED REGRESION TESTS: In order to verify that switching from MurmurHash to SipHash had no impact on speed, a set of benchmarks involving fast insertion of 5 million of keys were performed. The result shows Redis with SipHash in high pipelining conditions to be about 4% slower compared to using the previous hash function. However this could partially be related to the fact that the current implementation does not attempt to hash whole words at a time but reads single bytes, in order to have an output which is endian-netural and at the same time working on systems where unaligned memory accesses are a problem. Further X86 specific optimizations should be tested, the function may easily get at the same level of MurMurHash2 if a few optimizations are performed.
2017-02-20 16:09:54 +01:00
uint64_t dictSdsHash(const void *key);
uint64_t dictSdsCaseHash(const void *key);
int dictSdsKeyCompare(dict *d, const void *key1, const void *key2);
int dictSdsKeyCaseCompare(dict *d, const void *key1, const void *key2);
void dictSdsDestructor(dict *d, void *val);
void *dictSdsDup(dict *d, const void *key);
/* Git SHA1 */
char *redisGitSHA1(void);
char *redisGitDirty(void);
uint64_t redisBuildId(void);
2019-10-02 11:21:24 +02:00
char *redisBuildIdString(void);
/* Commands prototypes */
void authCommand(client *c);
void pingCommand(client *c);
void echoCommand(client *c);
void commandCommand(client *c);
Treat subcommands as commands (#9504) ## Intro The purpose is to allow having different flags/ACL categories for subcommands (Example: CONFIG GET is ok-loading but CONFIG SET isn't) We create a small command table for every command that has subcommands and each subcommand has its own flags, etc. (same as a "regular" command) This commit also unites the Redis and the Sentinel command tables ## Affected commands CONFIG Used to have "admin ok-loading ok-stale no-script" Changes: 1. Dropped "ok-loading" in all except GET (this doesn't change behavior since there were checks in the code doing that) XINFO Used to have "read-only random" Changes: 1. Dropped "random" in all except CONSUMERS XGROUP Used to have "write use-memory" Changes: 1. Dropped "use-memory" in all except CREATE and CREATECONSUMER COMMAND No changes. MEMORY Used to have "random read-only" Changes: 1. Dropped "random" in PURGE and USAGE ACL Used to have "admin no-script ok-loading ok-stale" Changes: 1. Dropped "admin" in WHOAMI, GENPASS, and CAT LATENCY No changes. MODULE No changes. SLOWLOG Used to have "admin random ok-loading ok-stale" Changes: 1. Dropped "random" in RESET OBJECT Used to have "read-only random" Changes: 1. Dropped "random" in ENCODING and REFCOUNT SCRIPT Used to have "may-replicate no-script" Changes: 1. Dropped "may-replicate" in all except FLUSH and LOAD CLIENT Used to have "admin no-script random ok-loading ok-stale" Changes: 1. Dropped "random" in all except INFO and LIST 2. Dropped "admin" in ID, TRACKING, CACHING, GETREDIR, INFO, SETNAME, GETNAME, and REPLY STRALGO No changes. PUBSUB No changes. CLUSTER Changes: 1. Dropped "admin in countkeysinslots, getkeysinslot, info, nodes, keyslot, myid, and slots SENTINEL No changes. (note that DEBUG also fits, but we decided not to convert it since it's for debugging and anyway undocumented) ## New sub-command This commit adds another element to the per-command output of COMMAND, describing the list of subcommands, if any (in the same structure as "regular" commands) Also, it adds a new subcommand: ``` COMMAND LIST [FILTERBY (MODULE <module-name>|ACLCAT <cat>|PATTERN <pattern>)] ``` which returns a set of all commands (unless filters), but excluding subcommands. ## Module API A new module API, RM_CreateSubcommand, was added, in order to allow module writer to define subcommands ## ACL changes: 1. Now, that each subcommand is actually a command, each has its own ACL id. 2. The old mechanism of allowed_subcommands is redundant (blocking/allowing a subcommand is the same as blocking/allowing a regular command), but we had to keep it, to support the widespread usage of allowed_subcommands to block commands with certain args, that aren't subcommands (e.g. "-select +select|0"). 3. I have renamed allowed_subcommands to allowed_firstargs to emphasize the difference. 4. Because subcommands are commands in ACL too, you can now use "-" to block subcommands (e.g. "+client -client|kill"), which wasn't possible in the past. 5. It is also possible to use the allowed_firstargs mechanism with subcommand. For example: `+config -config|set +config|set|loglevel` will block all CONFIG SET except for setting the log level. 6. All of the ACL changes above required some amount of refactoring. ## Misc 1. There are two approaches: Either each subcommand has its own function or all subcommands use the same function, determining what to do according to argv[0]. For now, I took the former approaches only with CONFIG and COMMAND, while other commands use the latter approach (for smaller blamelog diff). 2. Deleted memoryGetKeys: It is no longer needed because MEMORY USAGE now uses the "range" key spec. 4. Bugfix: GETNAME was missing from CLIENT's help message. 5. Sentinel and Redis now use the same table, with the same function pointer. Some commands have a different implementation in Sentinel, so we redirect them (these are ROLE, PUBLISH, and INFO). 6. Command stats now show the stats per subcommand (e.g. instead of stats just for "config" you will have stats for "config|set", "config|get", etc.) 7. It is now possible to use COMMAND directly on subcommands: COMMAND INFO CONFIG|GET (The pipeline syntax was inspired from ACL, and can be used in functions lookupCommandBySds and lookupCommandByCString) 8. STRALGO is now a container command (has "help") ## Breaking changes: 1. Command stats now show the stats per subcommand (see (5) above)
2021-10-20 10:52:57 +02:00
void commandCountCommand(client *c);
void commandListCommand(client *c);
void commandInfoCommand(client *c);
void commandGetKeysCommand(client *c);
void commandGetKeysAndFlagsCommand(client *c);
Treat subcommands as commands (#9504) ## Intro The purpose is to allow having different flags/ACL categories for subcommands (Example: CONFIG GET is ok-loading but CONFIG SET isn't) We create a small command table for every command that has subcommands and each subcommand has its own flags, etc. (same as a "regular" command) This commit also unites the Redis and the Sentinel command tables ## Affected commands CONFIG Used to have "admin ok-loading ok-stale no-script" Changes: 1. Dropped "ok-loading" in all except GET (this doesn't change behavior since there were checks in the code doing that) XINFO Used to have "read-only random" Changes: 1. Dropped "random" in all except CONSUMERS XGROUP Used to have "write use-memory" Changes: 1. Dropped "use-memory" in all except CREATE and CREATECONSUMER COMMAND No changes. MEMORY Used to have "random read-only" Changes: 1. Dropped "random" in PURGE and USAGE ACL Used to have "admin no-script ok-loading ok-stale" Changes: 1. Dropped "admin" in WHOAMI, GENPASS, and CAT LATENCY No changes. MODULE No changes. SLOWLOG Used to have "admin random ok-loading ok-stale" Changes: 1. Dropped "random" in RESET OBJECT Used to have "read-only random" Changes: 1. Dropped "random" in ENCODING and REFCOUNT SCRIPT Used to have "may-replicate no-script" Changes: 1. Dropped "may-replicate" in all except FLUSH and LOAD CLIENT Used to have "admin no-script random ok-loading ok-stale" Changes: 1. Dropped "random" in all except INFO and LIST 2. Dropped "admin" in ID, TRACKING, CACHING, GETREDIR, INFO, SETNAME, GETNAME, and REPLY STRALGO No changes. PUBSUB No changes. CLUSTER Changes: 1. Dropped "admin in countkeysinslots, getkeysinslot, info, nodes, keyslot, myid, and slots SENTINEL No changes. (note that DEBUG also fits, but we decided not to convert it since it's for debugging and anyway undocumented) ## New sub-command This commit adds another element to the per-command output of COMMAND, describing the list of subcommands, if any (in the same structure as "regular" commands) Also, it adds a new subcommand: ``` COMMAND LIST [FILTERBY (MODULE <module-name>|ACLCAT <cat>|PATTERN <pattern>)] ``` which returns a set of all commands (unless filters), but excluding subcommands. ## Module API A new module API, RM_CreateSubcommand, was added, in order to allow module writer to define subcommands ## ACL changes: 1. Now, that each subcommand is actually a command, each has its own ACL id. 2. The old mechanism of allowed_subcommands is redundant (blocking/allowing a subcommand is the same as blocking/allowing a regular command), but we had to keep it, to support the widespread usage of allowed_subcommands to block commands with certain args, that aren't subcommands (e.g. "-select +select|0"). 3. I have renamed allowed_subcommands to allowed_firstargs to emphasize the difference. 4. Because subcommands are commands in ACL too, you can now use "-" to block subcommands (e.g. "+client -client|kill"), which wasn't possible in the past. 5. It is also possible to use the allowed_firstargs mechanism with subcommand. For example: `+config -config|set +config|set|loglevel` will block all CONFIG SET except for setting the log level. 6. All of the ACL changes above required some amount of refactoring. ## Misc 1. There are two approaches: Either each subcommand has its own function or all subcommands use the same function, determining what to do according to argv[0]. For now, I took the former approaches only with CONFIG and COMMAND, while other commands use the latter approach (for smaller blamelog diff). 2. Deleted memoryGetKeys: It is no longer needed because MEMORY USAGE now uses the "range" key spec. 4. Bugfix: GETNAME was missing from CLIENT's help message. 5. Sentinel and Redis now use the same table, with the same function pointer. Some commands have a different implementation in Sentinel, so we redirect them (these are ROLE, PUBLISH, and INFO). 6. Command stats now show the stats per subcommand (e.g. instead of stats just for "config" you will have stats for "config|set", "config|get", etc.) 7. It is now possible to use COMMAND directly on subcommands: COMMAND INFO CONFIG|GET (The pipeline syntax was inspired from ACL, and can be used in functions lookupCommandBySds and lookupCommandByCString) 8. STRALGO is now a container command (has "help") ## Breaking changes: 1. Command stats now show the stats per subcommand (see (5) above)
2021-10-20 10:52:57 +02:00
void commandHelpCommand(client *c);
Move doc metadata from COMMAND to COMMAND DOCS (#10056) Syntax: `COMMAND DOCS [<command name> ...]` Background: Apparently old version of hiredis (and thus also redis-cli) can't support more than 7 levels of multi-bulk nesting. The solution is to move all the doc related metadata from COMMAND to a new COMMAND DOCS sub-command. The new DOCS sub-command returns a map of commands (not an array like in COMMAND), And the same goes for the `subcommands` field inside it (also contains a map) Besides that, the remaining new fields of COMMAND (hints, key-specs, and sub-commands), are placed in the outer array rather than a nested map. this was done mainly for consistency with the old format. Other changes: --- * Allow COMMAND INFO with no arguments, which returns all commands, so that we can some day deprecated the plain COMMAND (no args) * Reduce the amount of deferred replies from both COMMAND and COMMAND DOCS, especially in the inner loops, since these create many small reply objects, which lead to many small write syscalls and many small TCP packets. To make this easier, when populating the command table, we count the history, args, and hints so we later know their size in advance. Additionally, the movablekeys flag was moved into the flags register. * Update generate-commands-json.py to take the data from both command, it now executes redis-cli directly, instead of taking input from stdin. * Sub-commands in both COMMAND (and COMMAND INFO), and also COMMAND DOCS, show their full name. i.e. CONFIG * GET will be shown as `config|get` rather than just `get`. This will be visible both when asking for `COMMAND INFO config` and COMMAND INFO config|get`, but is especially important for the later. i.e. imagine someone doing `COMMAND INFO slowlog|get config|get` not being able to distinguish between the two items in the array response.
2022-01-11 16:16:16 +01:00
void commandDocsCommand(client *c);
void setCommand(client *c);
void setnxCommand(client *c);
void setexCommand(client *c);
void psetexCommand(client *c);
void getCommand(client *c);
GETEX, GETDEL and SET PXAT/EXAT (#8327) This commit introduces two new command and two options for an existing command GETEX <key> [PERSIST][EX seconds][PX milliseconds] [EXAT seconds-timestamp] [PXAT milliseconds-timestamp] The getexCommand() function implements extended options and variants of the GET command. Unlike GET command this command is not read-only. Only one of the options can be used at a given time. 1. PERSIST removes any TTL associated with the key. 2. EX Set expiry TTL in seconds. 3. PX Set expiry TTL in milliseconds. 4. EXAT Same like EX instead of specifying the number of seconds representing the TTL (time to live), it takes an absolute Unix timestamp 5. PXAT Same like PX instead of specifying the number of milliseconds representing the TTL (time to live), it takes an absolute Unix timestamp Command would return either the bulk string, error or nil. GETDEL <key> Would delete the key after getting. SET key value [NX] [XX] [KEEPTTL] [GET] [EX <seconds>] [PX <milliseconds>] [EXAT <seconds-timestamp>][PXAT <milliseconds-timestamp>] Two new options added here are EXAT and PXAT Key implementation notes - `SET` with `PX/EX/EXAT/PXAT` is always translated to `PXAT` in `AOF`. When relative time is specified (`PX/EX`), replication will always use `PX`. - `setexCommand` and `psetexCommand` would no longer need translation in `feedAppendOnlyFile` as they are modified to invoke `setGenericCommand ` with appropriate flags which will take care of correct AOF translation. - `GETEX` without any optional argument behaves like `GET`. - `GETEX` command is never propagated, It is either propagated as `PEXPIRE[AT], or PERSIST`. - `GETDEL` command is propagated as `DEL` - Combined the validation for `SET` and `GETEX` arguments. - Test cases to validate AOF/Replication propagation
2021-01-27 18:47:26 +01:00
void getexCommand(client *c);
void getdelCommand(client *c);
void delCommand(client *c);
void unlinkCommand(client *c);
void existsCommand(client *c);
void setbitCommand(client *c);
void getbitCommand(client *c);
void bitfieldCommand(client *c);
void bitfieldroCommand(client *c);
void setrangeCommand(client *c);
void getrangeCommand(client *c);
void incrCommand(client *c);
void decrCommand(client *c);
void incrbyCommand(client *c);
void decrbyCommand(client *c);
void incrbyfloatCommand(client *c);
void selectCommand(client *c);
SWAPDB command. This new command swaps two Redis databases, so that immediately all the clients connected to a given DB will see the data of the other DB, and the other way around. Example: SWAPDB 0 1 This will swap DB 0 with DB 1. All the clients connected with DB 0 will immediately see the new data, exactly like all the clients connected with DB 1 will see the data that was formerly of DB 0. MOTIVATION AND HISTORY --- The command was recently demanded by Pedro Melo, but was suggested in the past multiple times, and always refused by me. The reason why it was asked: Imagine you have clients operating in DB 0. At the same time, you create a new version of the dataset in DB 1. When the new version of the dataset is available, you immediately want to swap the two views, so that the clients will transparently use the new version of the data. At the same time you'll likely destroy the DB 1 dataset (that contains the old data) and start to build a new version, to repeat the process. This is an interesting pattern, but the reason why I always opposed to implement this, was that FLUSHDB was a blocking command in Redis before Redis 4.0 improvements. Now we have FLUSHDB ASYNC that releases the old data in O(1) from the point of view of the client, to reclaim memory incrementally in a different thread. At this point, the pattern can really be supported without latency spikes, so I'm providing this implementation for the users to comment. In case a very compelling argument will be made against this new command it may be removed. BEHAVIOR WITH BLOCKING OPERATIONS --- If a client is blocking for a list in a given DB, after the swap it will still be blocked in the same DB ID, since this is the most logical thing to do: if I was blocked for a list push to list "foo", even after the swap I want still a LPUSH to reach the key "foo" in the same DB in order to unblock. However an interesting thing happens when a client is, for instance, blocked waiting for new elements in list "foo" of DB 0. Then the DB 0 and 1 are swapped with SWAPDB. However the DB 1 happened to have a list called "foo" containing elements. When this happens, this implementation can correctly unblock the client. It is possible that there are subtle corner cases that are not covered in the implementation, but since the command is self-contained from the POV of the implementation and the Redis core, it cannot cause anything bad if not used. Tests and documentation are yet to be provided.
2016-10-14 15:28:04 +02:00
void swapdbCommand(client *c);
void randomkeyCommand(client *c);
void keysCommand(client *c);
void scanCommand(client *c);
void dbsizeCommand(client *c);
void lastsaveCommand(client *c);
void saveCommand(client *c);
void bgsaveCommand(client *c);
void bgrewriteaofCommand(client *c);
void shutdownCommand(client *c);
Auto-generate the command table from JSON files (#9656) Delete the hardcoded command table and replace it with an auto-generated table, based on a JSON file that describes the commands (each command must have a JSON file). These JSON files are the SSOT of everything there is to know about Redis commands, and it is reflected fully in COMMAND INFO. These JSON files are used to generate commands.c (using a python script), which is then committed to the repo and compiled. The purpose is: * Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic. * drop the dependency between Redis-user and the commands.json in redis-doc. * delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be done in a separate PR) * redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release artifacts should be a large JSON, containing all the information about all of the commands, which will be generated from COMMAND's reply) * the byproduct of this is: * module commands will be able to provide that info and possibly be more of a first-class citizens * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info. ### Interface changes #### COMMAND INFO's reply change (and arg-less COMMAND) Before this commit the reply at index 7 contained the key-specs list and reply at index 8 contained the sub-commands list (Both unreleased). Now, reply at index 7 is a map of: - summary - short command description - since - debut version - group - command group - complexity - complexity string - doc-flags - flags used for documentation (e.g. "deprecated") - deprecated-since - if deprecated, from which version? - replaced-by - if deprecated, which command replaced it? - history - a list of (version, what-changed) tuples - hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876 - arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments) - key-specs - an array of keys specs (already in unstable, just changed location) - subcommands - a list of sub-commands (already in unstable, just changed location) - reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845) more details on these can be found in https://github.com/redis/redis-doc/pull/1697 only the first three fields are mandatory #### API changes (unreleased API obviously) now they take RedisModuleCommand opaque pointer instead of looking up the command by name - RM_CreateSubcommand - RM_AddCommandKeySpec - RM_SetCommandKeySpecBeginSearchIndex - RM_SetCommandKeySpecBeginSearchKeyword - RM_SetCommandKeySpecFindKeysRange - RM_SetCommandKeySpecFindKeysKeynum Currently, we did not add module API to provide additional information about their commands because we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944. ### Somehow related changes 1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command will be documented with M|KM|FT|MI and can take both lowercase and uppercase ### Unrelated changes 1. Bugfix: no_madaory_keys was absent in COMMAND's reply 2. expose CMD_MODULE as "module" via COMMAND 3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags) Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 20:23:15 +01:00
void slowlogCommand(client *c);
void moveCommand(client *c);
void copyCommand(client *c);
void renameCommand(client *c);
void renamenxCommand(client *c);
void lpushCommand(client *c);
void rpushCommand(client *c);
void lpushxCommand(client *c);
void rpushxCommand(client *c);
void linsertCommand(client *c);
void lpopCommand(client *c);
void rpopCommand(client *c);
void lmpopCommand(client *c);
void llenCommand(client *c);
void lindexCommand(client *c);
void lrangeCommand(client *c);
void ltrimCommand(client *c);
void typeCommand(client *c);
void lsetCommand(client *c);
void saddCommand(client *c);
void sremCommand(client *c);
void smoveCommand(client *c);
void sismemberCommand(client *c);
Implement SMISMEMBER key member [member ...] (#7615) This is a rebased version of #3078 originally by shaharmor with the following patches by TysonAndre made after rebasing to work with the updated C API: 1. Add 2 more unit tests (wrong argument count error message, integer over 64 bits) 2. Use addReplyArrayLen instead of addReplyMultiBulkLen. 3. Undo changes to src/help.h - for the ZMSCORE PR, I heard those should instead be automatically generated from the redis-doc repo if it gets updated Motivations: - Example use case: Client code to efficiently check if each element of a set of 1000 items is a member of a set of 10 million items. (Similar to reasons for working on #7593) - HMGET and ZMSCORE already exist. This may lead to developers deciding to implement functionality that's best suited to a regular set with a data type of sorted set or hash map instead, for the multi-get support. Currently, multi commands or lua scripting to call sismember multiple times would almost definitely be less efficient than a native smismember for the following reasons: - Need to fetch the set from the string every time instead of reusing the C pointer. - Using pipelining or multi-commands would result in more bytes sent and received by the client for the repeated SISMEMBER KEY sections. - Need to specially encode the data and decode it from the client for lua-based solutions. - Proposed solutions using Lua or SADD/SDIFF could trigger writes to memory, which is undesirable on a redis replica server or when commands get replicated to replicas. Co-Authored-By: Shahar Mor <shahar@peer5.com> Co-Authored-By: Tyson Andre <tysonandre775@hotmail.com>
2020-08-11 10:55:06 +02:00
void smismemberCommand(client *c);
void scardCommand(client *c);
void spopCommand(client *c);
void srandmemberCommand(client *c);
void sinterCommand(client *c);
void sinterCardCommand(client *c);
void sinterstoreCommand(client *c);
void sunionCommand(client *c);
void sunionstoreCommand(client *c);
void sdiffCommand(client *c);
void sdiffstoreCommand(client *c);
void sscanCommand(client *c);
void syncCommand(client *c);
void flushdbCommand(client *c);
void flushallCommand(client *c);
void sortCommand(client *c);
void sortroCommand(client *c);
void lremCommand(client *c);
2020-06-10 12:40:24 +02:00
void lposCommand(client *c);
void rpoplpushCommand(client *c);
void lmoveCommand(client *c);
void infoCommand(client *c);
void mgetCommand(client *c);
void monitorCommand(client *c);
void expireCommand(client *c);
void expireatCommand(client *c);
void pexpireCommand(client *c);
void pexpireatCommand(client *c);
void getsetCommand(client *c);
void ttlCommand(client *c);
void touchCommand(client *c);
void pttlCommand(client *c);
void expiretimeCommand(client *c);
void pexpiretimeCommand(client *c);
void persistCommand(client *c);
void replicaofCommand(client *c);
void roleCommand(client *c);
void debugCommand(client *c);
void msetCommand(client *c);
void msetnxCommand(client *c);
void zaddCommand(client *c);
void zincrbyCommand(client *c);
void zrangeCommand(client *c);
void zrangebyscoreCommand(client *c);
void zrevrangebyscoreCommand(client *c);
void zrangebylexCommand(client *c);
void zrevrangebylexCommand(client *c);
void zcountCommand(client *c);
void zlexcountCommand(client *c);
void zrevrangeCommand(client *c);
void zcardCommand(client *c);
void zremCommand(client *c);
void zscoreCommand(client *c);
void zmscoreCommand(client *c);
void zremrangebyscoreCommand(client *c);
void zremrangebylexCommand(client *c);
void zpopminCommand(client *c);
void zpopmaxCommand(client *c);
void zmpopCommand(client *c);
void bzpopminCommand(client *c);
void bzpopmaxCommand(client *c);
void bzmpopCommand(client *c);
void zrandmemberCommand(client *c);
void multiCommand(client *c);
void execCommand(client *c);
void discardCommand(client *c);
void blpopCommand(client *c);
void brpopCommand(client *c);
void blmpopCommand(client *c);
void brpoplpushCommand(client *c);
void blmoveCommand(client *c);
void appendCommand(client *c);
void strlenCommand(client *c);
void zrankCommand(client *c);
void zrevrankCommand(client *c);
void hsetCommand(client *c);
void hsetnxCommand(client *c);
void hgetCommand(client *c);
void hmgetCommand(client *c);
void hdelCommand(client *c);
void hlenCommand(client *c);
void hstrlenCommand(client *c);
void zremrangebyrankCommand(client *c);
void zunionstoreCommand(client *c);
void zinterstoreCommand(client *c);
void zdiffstoreCommand(client *c);
void zunionCommand(client *c);
void zinterCommand(client *c);
void zinterCardCommand(client *c);
void zrangestoreCommand(client *c);
void zdiffCommand(client *c);
void zscanCommand(client *c);
void hkeysCommand(client *c);
void hvalsCommand(client *c);
void hgetallCommand(client *c);
void hexistsCommand(client *c);
void hscanCommand(client *c);
void hrandfieldCommand(client *c);
Treat subcommands as commands (#9504) ## Intro The purpose is to allow having different flags/ACL categories for subcommands (Example: CONFIG GET is ok-loading but CONFIG SET isn't) We create a small command table for every command that has subcommands and each subcommand has its own flags, etc. (same as a "regular" command) This commit also unites the Redis and the Sentinel command tables ## Affected commands CONFIG Used to have "admin ok-loading ok-stale no-script" Changes: 1. Dropped "ok-loading" in all except GET (this doesn't change behavior since there were checks in the code doing that) XINFO Used to have "read-only random" Changes: 1. Dropped "random" in all except CONSUMERS XGROUP Used to have "write use-memory" Changes: 1. Dropped "use-memory" in all except CREATE and CREATECONSUMER COMMAND No changes. MEMORY Used to have "random read-only" Changes: 1. Dropped "random" in PURGE and USAGE ACL Used to have "admin no-script ok-loading ok-stale" Changes: 1. Dropped "admin" in WHOAMI, GENPASS, and CAT LATENCY No changes. MODULE No changes. SLOWLOG Used to have "admin random ok-loading ok-stale" Changes: 1. Dropped "random" in RESET OBJECT Used to have "read-only random" Changes: 1. Dropped "random" in ENCODING and REFCOUNT SCRIPT Used to have "may-replicate no-script" Changes: 1. Dropped "may-replicate" in all except FLUSH and LOAD CLIENT Used to have "admin no-script random ok-loading ok-stale" Changes: 1. Dropped "random" in all except INFO and LIST 2. Dropped "admin" in ID, TRACKING, CACHING, GETREDIR, INFO, SETNAME, GETNAME, and REPLY STRALGO No changes. PUBSUB No changes. CLUSTER Changes: 1. Dropped "admin in countkeysinslots, getkeysinslot, info, nodes, keyslot, myid, and slots SENTINEL No changes. (note that DEBUG also fits, but we decided not to convert it since it's for debugging and anyway undocumented) ## New sub-command This commit adds another element to the per-command output of COMMAND, describing the list of subcommands, if any (in the same structure as "regular" commands) Also, it adds a new subcommand: ``` COMMAND LIST [FILTERBY (MODULE <module-name>|ACLCAT <cat>|PATTERN <pattern>)] ``` which returns a set of all commands (unless filters), but excluding subcommands. ## Module API A new module API, RM_CreateSubcommand, was added, in order to allow module writer to define subcommands ## ACL changes: 1. Now, that each subcommand is actually a command, each has its own ACL id. 2. The old mechanism of allowed_subcommands is redundant (blocking/allowing a subcommand is the same as blocking/allowing a regular command), but we had to keep it, to support the widespread usage of allowed_subcommands to block commands with certain args, that aren't subcommands (e.g. "-select +select|0"). 3. I have renamed allowed_subcommands to allowed_firstargs to emphasize the difference. 4. Because subcommands are commands in ACL too, you can now use "-" to block subcommands (e.g. "+client -client|kill"), which wasn't possible in the past. 5. It is also possible to use the allowed_firstargs mechanism with subcommand. For example: `+config -config|set +config|set|loglevel` will block all CONFIG SET except for setting the log level. 6. All of the ACL changes above required some amount of refactoring. ## Misc 1. There are two approaches: Either each subcommand has its own function or all subcommands use the same function, determining what to do according to argv[0]. For now, I took the former approaches only with CONFIG and COMMAND, while other commands use the latter approach (for smaller blamelog diff). 2. Deleted memoryGetKeys: It is no longer needed because MEMORY USAGE now uses the "range" key spec. 4. Bugfix: GETNAME was missing from CLIENT's help message. 5. Sentinel and Redis now use the same table, with the same function pointer. Some commands have a different implementation in Sentinel, so we redirect them (these are ROLE, PUBLISH, and INFO). 6. Command stats now show the stats per subcommand (e.g. instead of stats just for "config" you will have stats for "config|set", "config|get", etc.) 7. It is now possible to use COMMAND directly on subcommands: COMMAND INFO CONFIG|GET (The pipeline syntax was inspired from ACL, and can be used in functions lookupCommandBySds and lookupCommandByCString) 8. STRALGO is now a container command (has "help") ## Breaking changes: 1. Command stats now show the stats per subcommand (see (5) above)
2021-10-20 10:52:57 +02:00
void configSetCommand(client *c);
void configGetCommand(client *c);
void configResetStatCommand(client *c);
void configRewriteCommand(client *c);
void configHelpCommand(client *c);
void hincrbyCommand(client *c);
void hincrbyfloatCommand(client *c);
void subscribeCommand(client *c);
void unsubscribeCommand(client *c);
void psubscribeCommand(client *c);
void punsubscribeCommand(client *c);
void publishCommand(client *c);
void pubsubCommand(client *c);
void spublishCommand(client *c);
void ssubscribeCommand(client *c);
void sunsubscribeCommand(client *c);
void watchCommand(client *c);
void unwatchCommand(client *c);
void clusterCommand(client *c);
void restoreCommand(client *c);
void migrateCommand(client *c);
void askingCommand(client *c);
void readonlyCommand(client *c);
void readwriteCommand(client *c);
int verifyDumpPayload(unsigned char *p, size_t len, uint16_t *rdbver_ptr);
void dumpCommand(client *c);
void objectCommand(client *c);
void memoryCommand(client *c);
void clientCommand(client *c);
void helloCommand(client *c);
void evalCommand(client *c);
void evalRoCommand(client *c);
void evalShaCommand(client *c);
void evalShaRoCommand(client *c);
void scriptCommand(client *c);
Auto-generate the command table from JSON files (#9656) Delete the hardcoded command table and replace it with an auto-generated table, based on a JSON file that describes the commands (each command must have a JSON file). These JSON files are the SSOT of everything there is to know about Redis commands, and it is reflected fully in COMMAND INFO. These JSON files are used to generate commands.c (using a python script), which is then committed to the repo and compiled. The purpose is: * Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic. * drop the dependency between Redis-user and the commands.json in redis-doc. * delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be done in a separate PR) * redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release artifacts should be a large JSON, containing all the information about all of the commands, which will be generated from COMMAND's reply) * the byproduct of this is: * module commands will be able to provide that info and possibly be more of a first-class citizens * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info. ### Interface changes #### COMMAND INFO's reply change (and arg-less COMMAND) Before this commit the reply at index 7 contained the key-specs list and reply at index 8 contained the sub-commands list (Both unreleased). Now, reply at index 7 is a map of: - summary - short command description - since - debut version - group - command group - complexity - complexity string - doc-flags - flags used for documentation (e.g. "deprecated") - deprecated-since - if deprecated, from which version? - replaced-by - if deprecated, which command replaced it? - history - a list of (version, what-changed) tuples - hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876 - arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments) - key-specs - an array of keys specs (already in unstable, just changed location) - subcommands - a list of sub-commands (already in unstable, just changed location) - reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845) more details on these can be found in https://github.com/redis/redis-doc/pull/1697 only the first three fields are mandatory #### API changes (unreleased API obviously) now they take RedisModuleCommand opaque pointer instead of looking up the command by name - RM_CreateSubcommand - RM_AddCommandKeySpec - RM_SetCommandKeySpecBeginSearchIndex - RM_SetCommandKeySpecBeginSearchKeyword - RM_SetCommandKeySpecFindKeysRange - RM_SetCommandKeySpecFindKeysKeynum Currently, we did not add module API to provide additional information about their commands because we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944. ### Somehow related changes 1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command will be documented with M|KM|FT|MI and can take both lowercase and uppercase ### Unrelated changes 1. Bugfix: no_madaory_keys was absent in COMMAND's reply 2. expose CMD_MODULE as "module" via COMMAND 3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags) Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 20:23:15 +01:00
void fcallCommand(client *c);
void fcallroCommand(client *c);
Redis Function Libraries (#10004) # Redis Function Libraries This PR implements Redis Functions Libraries as describe on: https://github.com/redis/redis/issues/9906. Libraries purpose is to provide a better code sharing between functions by allowing to create multiple functions in a single command. Functions that were created together can safely share code between each other without worrying about compatibility issues and versioning. Creating a new library is done using 'FUNCTION LOAD' command (full API is described below) This PR introduces a new struct called libraryInfo, libraryInfo holds information about a library: * name - name of the library * engine - engine used to create the library * code - library code * description - library description * functions - the functions exposed by the library When Redis gets the `FUNCTION LOAD` command it creates a new empty libraryInfo. Redis passes the `CODE` to the relevant engine alongside the empty libraryInfo. As a result, the engine will create one or more functions by calling 'libraryCreateFunction'. The new funcion will be added to the newly created libraryInfo. So far Everything is happening locally on the libraryInfo so it is easy to abort the operation (in case of an error) by simply freeing the libraryInfo. After the library info is fully constructed we start the joining phase by which we will join the new library to the other libraries currently exist on Redis. The joining phase make sure there is no function collision and add the library to the librariesCtx (renamed from functionCtx). LibrariesCtx is used all around the code in the exact same way as functionCtx was used (with respect to RDB loading, replicatio, ...). The only difference is that apart from function dictionary (maps function name to functionInfo object), the librariesCtx contains also a libraries dictionary that maps library name to libraryInfo object. ## New API ### FUNCTION LOAD `FUNCTION LOAD <ENGINE> <LIBRARY NAME> [REPLACE] [DESCRIPTION <DESCRIPTION>] <CODE>` Create a new library with the given parameters: * ENGINE - REPLACE Engine name to use to create the library. * LIBRARY NAME - The new library name. * REPLACE - If the library already exists, replace it. * DESCRIPTION - Library description. * CODE - Library code. Return "OK" on success, or error on the following cases: * Library name already taken and REPLACE was not used * Name collision with another existing library (even if replace was uses) * Library registration failed by the engine (usually compilation error) ## Changed API ### FUNCTION LIST `FUNCTION LIST [LIBRARYNAME <LIBRARY NAME PATTERN>] [WITHCODE]` Command was modified to also allow getting libraries code (so `FUNCTION INFO` command is no longer needed and removed). In addition the command gets an option argument, `LIBRARYNAME` allows you to only get libraries that match the given `LIBRARYNAME` pattern. By default, it returns all libraries. ### INFO MEMORY Added number of libraries to `INFO MEMORY` ### Commands flags `DENYOOM` flag was set on `FUNCTION LOAD` and `FUNCTION RESTORE`. We consider those commands as commands that add new data to the dateset (functions are data) and so we want to disallows to run those commands on OOM. ## Removed API * FUNCTION CREATE - Decided on https://github.com/redis/redis/issues/9906 * FUNCTION INFO - Decided on https://github.com/redis/redis/issues/9899 ## Lua engine changes When the Lua engine gets the code given on `FUNCTION LOAD` command, it immediately runs it, we call this run the loading run. Loading run is not a usual script run, it is not possible to invoke any Redis command from within the load run. Instead there is a new API provided by `library` object. The new API's: * `redis.log` - behave the same as `redis.log` * `redis.register_function` - register a new function to the library The loading run purpose is to register functions using the new `redis.register_function` API. Any attempt to use any other API will result in an error. In addition, the load run is has a time limit of 500ms, error is raise on timeout and the entire operation is aborted. ### `redis.register_function` `redis.register_function(<function_name>, <callback>, [<description>])` This new API allows users to register a new function that will be linked to the newly created library. This API can only be called during the load run (see definition above). Any attempt to use it outside of the load run will result in an error. The parameters pass to the API are: * function_name - Function name (must be a Lua string) * callback - Lua function object that will be called when the function is invokes using fcall/fcall_ro * description - Function description, optional (must be a Lua string). ### Example The following example creates a library called `lib` with 2 functions, `f1` and `f1`, returns 1 and 2 respectively: ``` local function f1(keys, args)     return 1 end local function f2(keys, args)     return 2 end redis.register_function('f1', f1) redis.register_function('f2', f2) ``` Notice: Unlike `eval`, functions inside a library get the KEYS and ARGV as arguments to the functions and not as global. ### Technical Details On the load run we only want the user to be able to call a white list on API's. This way, in the future, if new API's will be added, the new API's will not be available to the load run unless specifically added to this white list. We put the while list on the `library` object and make sure the `library` object is only available to the load run by using [lua_setfenv](https://www.lua.org/manual/5.1/manual.html#lua_setfenv) API. This API allows us to set the `globals` of a function (and all the function it creates). Before starting the load run we create a new fresh Lua table (call it `g`) that only contains the `library` API (we make sure to set global protection on this table just like the general global protection already exists today), then we use [lua_setfenv](https://www.lua.org/manual/5.1/manual.html#lua_setfenv) to set `g` as the global table of the load run. After the load run finished we update `g` metatable and set `__index` and `__newindex` functions to be `_G` (Lua default globals), we also pop out the `library` object as we do not need it anymore. This way, any function that was created on the load run (and will be invoke using `fcall`) will see the default globals as it expected to see them and will not have the `library` API anymore. An important outcome of this new approach is that now we can achieve a distinct global table for each library (it is not yet like that but it is very easy to achieve it now). In the future we can decide to remove global protection because global on different libraries will not collide or we can chose to give different API to different libraries base on some configuration or input. Notice that this technique was meant to prevent errors and was not meant to prevent malicious user from exploit it. For example, the load run can still save the `library` object on some local variable and then using in `fcall` context. To prevent such a malicious use, the C code also make sure it is running in the right context and if not raise an error.
2022-01-06 12:39:38 +01:00
void functionLoadCommand(client *c);
Auto-generate the command table from JSON files (#9656) Delete the hardcoded command table and replace it with an auto-generated table, based on a JSON file that describes the commands (each command must have a JSON file). These JSON files are the SSOT of everything there is to know about Redis commands, and it is reflected fully in COMMAND INFO. These JSON files are used to generate commands.c (using a python script), which is then committed to the repo and compiled. The purpose is: * Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic. * drop the dependency between Redis-user and the commands.json in redis-doc. * delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be done in a separate PR) * redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release artifacts should be a large JSON, containing all the information about all of the commands, which will be generated from COMMAND's reply) * the byproduct of this is: * module commands will be able to provide that info and possibly be more of a first-class citizens * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info. ### Interface changes #### COMMAND INFO's reply change (and arg-less COMMAND) Before this commit the reply at index 7 contained the key-specs list and reply at index 8 contained the sub-commands list (Both unreleased). Now, reply at index 7 is a map of: - summary - short command description - since - debut version - group - command group - complexity - complexity string - doc-flags - flags used for documentation (e.g. "deprecated") - deprecated-since - if deprecated, from which version? - replaced-by - if deprecated, which command replaced it? - history - a list of (version, what-changed) tuples - hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876 - arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments) - key-specs - an array of keys specs (already in unstable, just changed location) - subcommands - a list of sub-commands (already in unstable, just changed location) - reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845) more details on these can be found in https://github.com/redis/redis-doc/pull/1697 only the first three fields are mandatory #### API changes (unreleased API obviously) now they take RedisModuleCommand opaque pointer instead of looking up the command by name - RM_CreateSubcommand - RM_AddCommandKeySpec - RM_SetCommandKeySpecBeginSearchIndex - RM_SetCommandKeySpecBeginSearchKeyword - RM_SetCommandKeySpecFindKeysRange - RM_SetCommandKeySpecFindKeysKeynum Currently, we did not add module API to provide additional information about their commands because we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944. ### Somehow related changes 1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command will be documented with M|KM|FT|MI and can take both lowercase and uppercase ### Unrelated changes 1. Bugfix: no_madaory_keys was absent in COMMAND's reply 2. expose CMD_MODULE as "module" via COMMAND 3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags) Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 20:23:15 +01:00
void functionDeleteCommand(client *c);
void functionKillCommand(client *c);
void functionStatsCommand(client *c);
void functionListCommand(client *c);
void functionHelpCommand(client *c);
void functionFlushCommand(client *c);
void functionRestoreCommand(client *c);
void functionDumpCommand(client *c);
void timeCommand(client *c);
void bitopCommand(client *c);
void bitcountCommand(client *c);
void bitposCommand(client *c);
void replconfCommand(client *c);
void waitCommand(client *c);
void georadiusbymemberCommand(client *c);
void georadiusbymemberroCommand(client *c);
void georadiusCommand(client *c);
void georadiusroCommand(client *c);
void geoaddCommand(client *c);
void geohashCommand(client *c);
void geoposCommand(client *c);
void geodistCommand(client *c);
void geosearchCommand(client *c);
void geosearchstoreCommand(client *c);
void pfselftestCommand(client *c);
void pfaddCommand(client *c);
void pfcountCommand(client *c);
void pfmergeCommand(client *c);
void pfdebugCommand(client *c);
void latencyCommand(client *c);
2016-03-06 13:44:24 +01:00
void moduleCommand(client *c);
void securityWarningCommand(client *c);
void xaddCommand(client *c);
void xrangeCommand(client *c);
void xrevrangeCommand(client *c);
2017-09-06 12:03:17 +02:00
void xlenCommand(client *c);
2017-09-08 11:40:16 +02:00
void xreadCommand(client *c);
void xgroupCommand(client *c);
void xsetidCommand(client *c);
2018-01-25 16:39:49 +01:00
void xackCommand(client *c);
void xpendingCommand(client *c);
void xclaimCommand(client *c);
void xautoclaimCommand(client *c);
2018-03-07 16:08:06 +01:00
void xinfoCommand(client *c);
2018-04-18 13:12:09 +02:00
void xdelCommand(client *c);
2018-04-19 16:25:29 +02:00
void xtrimCommand(client *c);
2018-09-12 11:34:10 +02:00
void lolwutCommand(client *c);
void aclCommand(client *c);
void lcsCommand(client *c);
void quitCommand(client *c);
void resetCommand(client *c);
void failoverCommand(client *c);
#if defined(__GNUC__)
void *calloc(size_t count, size_t size) __attribute__ ((deprecated));
void free(void *ptr) __attribute__ ((deprecated));
void *malloc(size_t size) __attribute__ ((deprecated));
void *realloc(void *ptr, size_t size) __attribute__ ((deprecated));
#endif
/* Debugging stuff */
void _serverAssertWithInfo(const client *c, const robj *o, const char *estr, const char *file, int line);
void _serverAssert(const char *estr, const char *file, int line);
#ifdef __GNUC__
void _serverPanic(const char *file, int line, const char *msg, ...)
__attribute__ ((format (printf, 3, 4)));
#else
void _serverPanic(const char *file, int line, const char *msg, ...);
#endif
void serverLogObjectDebugInfo(const robj *o);
void sigsegvHandler(int sig, siginfo_t *info, void *secret);
const char *getSafeInfoString(const char *s, size_t len, char **tmp);
dict *genInfoSectionDict(robj **argv, int argc, char **defaults, int *out_all, int *out_everything);
void releaseInfoSectionDict(dict *sec);
sds genRedisInfoString(dict *section_dict, int all_sections, int everything);
sds genModulesInfoString(sds info);
Refactor config.c for generic setter interface (#9644) This refactors all `CONFIG SET`s and conf file loading arguments go through the generic config handling interface. Refactoring changes: - All config params go through the `standardConfig` interface (some stuff which is only related to the config file and not the `CONFIG` command still has special handling for rewrite/config file parsing, `loadmodule`, for example.) . - Added `MULTI_ARG_CONFIG` flag for configs to signify they receive a variable number of arguments instead of a single argument. This is used to break up space separated arguments to `CONFIG SET` so the generic setter interface can pass multiple arguments to the setter function. When parsing the config file we also break up anything after the config name into multiple arguments to the setter function. Interface changes: - A side effect of the above interface is that the `bind` argument in the config file can be empty (no argument at all) this is treated the same as passing an single empty string argument (same as `save` already used to work). - Support rewrite and setting `watchdog-period` from config file (was only supported by the CONFIG command till now). - Another side effect is that the `save T X` config argument now supports multiple Time-Changes pairs in a single line like its `CONFIG SET` counterpart. So in the config file you can either do: ``` save 3600 1 save 600 10 ``` or do ``` save 3600 1 600 10 ``` Co-authored-by: Bjorn Svensson <bjorn.a.svensson@est.tech>
2021-11-07 12:40:08 +01:00
void applyWatchdogPeriod();
2012-03-27 11:47:51 +02:00
void watchdogScheduleSignal(int period);
2015-07-26 15:17:43 +02:00
void serverLogHexDump(int level, char *descr, void *value, size_t len);
int memtest_preserving_test(unsigned long *m, size_t bytes, int passes);
void mixDigest(unsigned char *digest, const void *ptr, size_t len);
void xorDigest(unsigned char *digest, const void *ptr, size_t len);
sds catSubCommandFullname(const char *parent_name, const char *sub_name);
void commandAddSubcommand(struct redisCommand *parent, struct redisCommand *subcommand, const char *declared_name);
A better approach for COMMAND INFO for movablekeys commands (#8324) Fix #7297 The problem: Today, there is no way for a client library or app to know the key name indexes for commands such as ZUNIONSTORE/EVAL and others with "numkeys", since COMMAND INFO returns no useful info for them. For cluster-aware redis clients, this requires to 'patch' the client library code specifically for each of these commands or to resolve each execution of these commands with COMMAND GETKEYS. The solution: Introducing key specs other than the legacy "range" (first,last,step) The 8th element of the command info array, if exists, holds an array of key specs. The array may be empty, which indicates the command doesn't take any key arguments or may contain one or more key-specs, each one may leads to the discovery of 0 or more key arguments. A client library that doesn't support this key-spec feature will keep using the first,last,step and movablekeys flag which will obviously remain unchanged. A client that supports this key-specs feature needs only to look at the key-specs array. If it finds an unrecognized spec, it must resort to using COMMAND GETKEYS if it wishes to get all key name arguments, but if all it needs is one key in order to know which cluster node to use, then maybe another spec (if the command has several) can supply that, and there's no need to use GETKEYS. Each spec is an array of arguments, first one is the spec name, the second is an array of flags, and the third is an array containing details about the spec (specific meaning for each spec type) The initial flags we support are "read" and "write" indicating if the keys that this key-spec finds are used for read or for write. clients should ignore any unfamiliar flags. In order to easily find the positions of keys in a given array of args we introduce keys specs. There are two logical steps of key specs: 1. `start_search`: Given an array of args, indicate where we should start searching for keys 2. `find_keys`: Given the output of start_search and an array of args, indicate all possible indices of keys. ### start_search step specs - `index`: specify an argument index explicitly - `index`: 0 based index (1 means the first command argument) - `keyword`: specify a string to match in `argv`. We should start searching for keys just after the keyword appears. - `keyword`: the string to search for - `start_search`: an index from which to start the keyword search (can be negative, which means to search from the end) Examples: - `SET` has start_search of type `index` with value `1` - `XREAD` has start_search of type `keyword` with value `[“STREAMS”,1]` - `MIGRATE` has start_search of type `keyword` with value `[“KEYS”,-2]` ### find_keys step specs - `range`: specify `[count, step, limit]`. - `lastkey`: index of the last key. relative to the index returned from begin_search. -1 indicating till the last argument, -2 one before the last - `step`: how many args should we skip after finding a key, in order to find the next one - `limit`: if count is -1, we use limit to stop the search by a factor. 0 and 1 mean no limit. 2 means ½ of the remaining args, 3 means ⅓, and so on. - “keynum”: specify `[keynum_index, first_key_index, step]`. - `keynum_index`: is relative to the return of the `start_search` spec. - `first_key_index`: is relative to `keynum_index`. - `step`: how many args should we skip after finding a key, in order to find the next one Examples: - `SET` has `range` of `[0,1,0]` - `MSET` has `range` of `[-1,2,0]` - `XREAD` has `range` of `[-1,1,2]` - `ZUNION` has `start_search` of type `index` with value `1` and `find_keys` of type `keynum` with value `[0,1,1]` - `AI.DAGRUN` has `start_search` of type `keyword` with value `[“LOAD“,1]` and `find_keys` of type `keynum` with value `[0,1,1]` (see https://oss.redislabs.com/redisai/master/commands/#aidagrun) Note: this solution is not perfect as the module writers can come up with anything, but at least we will be able to find the key args of the vast majority of commands. If one of the above specs can’t describe the key positions, the module writer can always fall back to the `getkeys-api` option. Some keys cannot be found easily (`KEYS` in `MIGRATE`: Imagine the argument for `AUTH` is the string “KEYS” - we will start searching in the wrong index). The guarantee is that the specs may be incomplete (`incomplete` will be specified in the spec to denote that) but we never report false information (assuming the command syntax is correct). For `MIGRATE` we start searching from the end - `startfrom=-1` - and if one of the keys is actually called "keys" we will report only a subset of all keys - hence the `incomplete` flag. Some `incomplete` specs can be completely empty (i.e. UNKNOWN begin_search) which should tell the client that COMMAND GETKEYS (or any other way to get the keys) must be used (Example: For `SORT` there is no way to describe the STORE keyword spec, as the word "store" can appear anywhere in the command). We will expose these key specs in the `COMMAND` command so that clients can learn, on startup, where the keys are for all commands instead of holding hardcoded tables or use `COMMAND GETKEYS` in runtime. Comments: 1. Redis doesn't internally use the new specs, they are only used for COMMAND output. 2. In order to support the current COMMAND INFO format (reply array indices 4, 5, 6) we created a synthetic range, called legacy_range, that, if possible, is built according to the new specs. 3. Redis currently uses only getkeys_proc or the legacy_range to get the keys indices (in COMMAND GETKEYS for example). "incomplete" specs: the command we have issues with are MIGRATE, STRALGO, and SORT for MIGRATE, because the token KEYS, if exists, must be the last token, we can search in reverse. it one of the keys is actually the string "keys" will return just a subset of the keys (hence, it's "incomplete") for SORT and STRALGO we can use this heuristic (the keys can be anywhere in the command) and therefore we added a key spec that is both "incomplete" and of "unknown type" if a client encounters an "incomplete" spec it means that it must find a different way (either COMMAND GETKEYS or have its own parser) to retrieve the keys. please note that all commands, apart from the three mentioned above, have "complete" key specs
2021-09-15 10:10:29 +02:00
void populateCommandMovableKeys(struct redisCommand *cmd);
void debugDelay(int usec);
void killIOThreads(void);
void killThreads(void);
void makeThreadKillable(void);
Replica keep serving data during repl-diskless-load=swapdb for better availability (#9323) For diskless replication in swapdb mode, considering we already spend replica memory having a backup of current db to restore in case of failure, we can have the following benefits by instead swapping database only in case we succeeded in transferring db from master: - Avoid `LOADING` response during failed and successful synchronization for cases where the replica is already up and running with data. - Faster total time of diskless replication, because now we're moving from Transfer + Flush + Load time to Transfer + Load only. Flushing the tempDb is done asynchronously after swapping. - This could be implemented also for disk replication with similar benefits if consumers are willing to spend the extra memory usage. General notes: - The concept of `backupDb` becomes `tempDb` for clarity. - Async loading mode will only kick in if the replica is syncing from a master that has the same repl-id the one it had before. i.e. the data it's getting belongs to a different time of the same timeline. - New property in INFO: `async_loading` to differentiate from the blocking loading - Slot to Key mapping is now a field of `redisDb` as it's more natural to access it from both server.db and the tempDb that is passed around. - Because this is affecting replicas only, we assume that if they are not readonly and write commands during replication, they are lost after SYNC same way as before, but we're still denying CONFIG SET here anyways to avoid complications. Considerations for review: - We have many cases where server.loading flag is used and even though I tried my best, there may be cases where async_loading should be checked as well and cases where it shouldn't (would require very good understanding of whole code) - Several places that had different behavior depending on the loading flag where actually meant to just handle commands coming from the AOF client differently than ones coming from real clients, changed to check CLIENT_ID_AOF instead. **Additional for Release Notes** - Bugfix - server.dirty was not incremented for any kind of diskless replication, as effect it wouldn't contribute on triggering next database SAVE - New flag for RM_GetContextFlags module API: REDISMODULE_CTX_FLAGS_ASYNC_LOADING - Deprecated RedisModuleEvent_ReplBackup. Starting from Redis 7.0, we don't fire this event. Instead, we have the new RedisModuleEvent_ReplAsyncLoad holding 3 sub-events: STARTED, ABORTED and COMPLETED. - New module flag REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD for RedisModule_SetModuleOptions to allow modules to declare they support the diskless replication with async loading (when absent, we fall back to disk-based loading). Co-authored-by: Eduardo Semprebon <edus@saxobank.com> Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-04 09:46:50 +01:00
void swapMainDbWithTempDb(redisDb *tempDb);
/* Use macro for checking log level to avoid evaluating arguments in cases log
* should be ignored due to low level. */
#define serverLog(level, ...) do {\
if (((level)&0xff) < server.verbosity) break;\
_serverLog(level, __VA_ARGS__);\
} while(0)
/* TLS stuff */
void tlsInit(void);
void tlsCleanup(void);
int tlsConfigure(redisTLSContextConfig *ctx_config);
Multiparam config set (#9748) We can now do: `config set maxmemory 10m repl-backlog-size 5m` ## Basic algorithm to support "transaction like" config sets: 1. Backup all relevant current values (via get). 2. Run "verify" and "set" on everything, if we fail run "restore". 3. Run "apply" on everything (optional optimization: skip functions already run). If we fail run "restore". 4. Return success. ### restore 1. Run set on everything in backup. If we fail log it and continue (this puts us in an undefined state but we decided it's better than the alternative of panicking). This indicates either a bug or some unsupported external state. 2. Run apply on everything in backup (optimization: skip functions already run). If we fail log it (see comment above). 3. Return error. ## Implementation/design changes: * Apply function are idempotent (have no effect if they are run more than once for the same config). * No indication in set functions if we're reading the config or running from the `CONFIG SET` command (removed `update` argument). * Set function should set some config variable and assume an (optional) apply function will use that later to apply. If we know this setting can be safely applied immediately and can always be reverted and doesn't depend on any other configuration we can apply immediately from within the set function (and not store the setting anywhere). This is the case of this `dir` config, for example, which has no apply function. No apply function is need also in the case that setting the variable in the `server` struct is all that needs to be done to make the configuration take effect. Note that the original concept of `update_fn`, which received the old and new values was removed and replaced by the optional apply function. * Apply functions use settings written to the `server` struct and don't receive any inputs. * I take care that for the generic (non-special) configs if there's no change I avoid calling the setter (possible optimization: avoid calling the apply function as well). * Passing the same config parameter more than once to `config set` will fail. You can't do `config set my-setting value1 my-setting value2`. Note that getting `save` in the context of the conf file parsing to work here as before was a pain. The conf file supports an aggregate `save` definition, where each `save` line is added to the server's save params. This is unlike any other line in the config file where each line overwrites any previous configuration. Since we now support passing multiple save params in a single line (see top comments about `save` in https://github.com/redis/redis/pull/9644) we should deprecate the aggregate nature of this config line and perhaps reduce this ugly code in the future.
2021-12-01 09:15:11 +01:00
int isTlsConfigured(void);
#define redisDebug(fmt, ...) \
printf("DEBUG %s:%d > " fmt "\n", __FILE__, __LINE__, __VA_ARGS__)
#define redisDebugMark() \
printf("-- MARK %s:%d --\n", __FILE__, __LINE__)
int iAmMaster(void);
#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x)
#endif