Commit Graph

78 Commits

Author SHA1 Message Date
Pieter Cailliau 0b34396924
Change license from BSD-3 to dual RSALv2+SSPLv1 (#13157)
[Read more about the license change
here](https://redis.com/blog/redis-adopts-dual-source-available-licensing/)
Live long and prosper 🖖
2024-03-20 22:38:24 +00:00
Yanqi Lv 62153b3b2f
Refine the purpose of rdb saving with accurate flags (#12925)
In Redis, rdb is produced in three scenarios mainly.

- backup, such as `bgsave` and `save` command
- full sync in replication
- aof rewrite if `aof-use-rdb-preamble` is yes

We also have some RDB flags to identify the purpose of rdb saving.
```C
/* flags on the purpose of rdb save or load */
#define RDBFLAGS_NONE 0                 /* No special RDB loading. */
#define RDBFLAGS_AOF_PREAMBLE (1<<0)    /* Load/save the RDB as AOF preamble. */
#define RDBFLAGS_REPLICATION (1<<1)     /* Load/save for SYNC. */
```

But currently, it seems that these flags and purposes of rdb saving
don't exactly match. I find it in `rdbSaveRioWithEOFMark` which calls
`startSaving` with `RDBFLAGS_REPLICATION` but `rdbSaveRio` with
`RDBFLAGS_NONE`.
```C
int rdbSaveRioWithEOFMark(int req, rio *rdb, int *error, rdbSaveInfo *rsi) {
    char eofmark[RDB_EOF_MARK_SIZE];

    startSaving(RDBFLAGS_REPLICATION);
    getRandomHexChars(eofmark,RDB_EOF_MARK_SIZE);
    if (error) *error = 0;
    if (rioWrite(rdb,"$EOF:",5) == 0) goto werr;
    if (rioWrite(rdb,eofmark,RDB_EOF_MARK_SIZE) == 0) goto werr;
    if (rioWrite(rdb,"\r\n",2) == 0) goto werr;
    if (rdbSaveRio(req,rdb,error,RDBFLAGS_NONE,rsi) == C_ERR) goto werr;
    if (rioWrite(rdb,eofmark,RDB_EOF_MARK_SIZE) == 0) goto werr;
    stopSaving(1);
    return C_OK;

werr: /* Write error. */
    /* Set 'error' only if not already set by rdbSaveRio() call. */
    if (error && *error == 0) *error = errno;
    stopSaving(0);
    return C_ERR;
}
```

In this PR, I refine the purpose of rdb saving with accurate flags.
2024-02-01 13:41:02 +02:00
Oran Agra f9a0eb60f7
update redis-check-rdb types (#12969)
seems that we forgot to update the array in redis-check rdb.
2024-01-23 11:48:02 +02:00
Vitaly 0270abda82
Replace cluster metadata with slot specific dictionaries (#11695)
This is an implementation of https://github.com/redis/redis/issues/10589 that eliminates 16 bytes per entry in cluster mode, that are currently used to create a linked list between entries in the same slot.  Main idea is splitting main dictionary into 16k smaller dictionaries (one per slot), so we can perform all slot specific operations, such as iteration, without any additional info in the `dictEntry`. For Redis cluster, the expectation is that there will be a larger number of keys, so the fixed overhead of 16k dictionaries will be The expire dictionary is also split up so that each slot is logically decoupled, so that in subsequent revisions we will be able to atomically flush a slot of data.

## Important changes
* Incremental rehashing - one big change here is that it's not one, but rather up to 16k dictionaries that can be rehashing at the same time, in order to keep track of them, we introduce a separate queue for dictionaries that are rehashing. Also instead of rehashing a single dictionary, cron job will now try to rehash as many as it can in 1ms.
* getRandomKey - now needs to not only select a random key, from the random bucket, but also needs to select a random dictionary. Fairness is a major concern here, as it's possible that keys can be unevenly distributed across the slots. In order to address this search we introduced binary index tree). With that data structure we are able to efficiently find a random slot using binary search in O(log^2(slot count)) time.
* Iteration efficiency - when iterating dictionary with a lot of empty slots, we want to skip them efficiently. We can do this using same binary index that is used for random key selection, this index allows us to find a slot for a specific key index. For example if there are 10 keys in the slot 0, then we can quickly find a slot that contains 11th key using binary search on top of the binary index tree.
* scan API - in order to perform a scan across the entire DB, the cursor now needs to not only save position within the dictionary but also the slot id. In this change we append slot id into LSB of the cursor so it can be passed around between client and the server. This has interesting side effect, now you'll be able to start scanning specific slot by simply providing slot id as a cursor value. The plan is to not document this as defined behavior, however. It's also worth nothing the SCAN API is now technically incompatible with previous versions, although practically we don't believe it's an issue.
* Checksum calculation optimizations - During command execution, we know that all of the keys are from the same slot (outside of a few notable exceptions such as cross slot scripts and modules). We don't want to compute the checksum multiple multiple times, hence we are relying on cached slot id in the client during the command executions. All operations that access random keys, either should pass in the known slot or recompute the slot. 
* Slot info in RDB - in order to resize individual dictionaries correctly, while loading RDB, it's not enough to know total number of keys (of course we could approximate number of keys per slot, but it won't be precise). To address this issue, we've added additional metadata into RDB that contains number of keys in each slot, which can be used as a hint during loading.
* DB size - besides `DBSIZE` API, we need to know size of the DB in many places want, in order to avoid scanning all dictionaries and summing up their sizes in a loop, we've introduced a new field into `redisDb` that keeps track of `key_count`. This way we can keep DBSIZE operation O(1). This is also kept for O(1) expires computation as well.

## Performance
This change improves SET performance in cluster mode by ~5%, most of the gains come from us not having to maintain linked lists for keys in slot, non-cluster mode has same performance. For workloads that rely on evictions, the performance is similar because of the extra overhead for finding keys to evict. 

RDB loading performance is slightly reduced, as the slot of each key needs to be computed during the load.

## Interface changes
* Removed `overhead.hashtable.slot-to-keys` to `MEMORY STATS`
* Scan API will now require 64 bits to store the cursor, even on 32 bit systems, as the slot information will be stored.
* New RDB version to support the new op code for SLOT information. 

---------

Co-authored-by: Vitaly Arbuzov <arvit@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Roshan Khatri <rvkhatri@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2023-10-14 23:58:26 -07:00
Wen Hui 965dc90b72
change return type to be consistant (#12479)
Currently rdbSaveMillisecondTime, rdbSaveDoubleValue api's return type is
int but they return the value directly from rdbWriteRaw function which has the
return type of ssize_t. As this may cause overflow to int so changed to ssize_t.
2023-08-16 10:38:59 +03:00
Ozan Tezcan e55568edb5
Add RM_RdbLoad and RM_RdbSave module API functions (#11852)
Add `RM_RdbLoad()` and `RM_RdbSave()` to load/save RDB files from the module API. 

In our use case, we have our clustering implementation as a module. As part of this
implementation, the module needs to trigger RDB save operation at specific points.
Also, this module delivers RDB files to other nodes (not using Redis' replication).
When a node receives an RDB file, it should be able to load the RDB. Currently,
there is no module API to save/load RDB files. 


This PR adds four new APIs:
```c
RedisModuleRdbStream *RM_RdbStreamCreateFromFile(const char *filename);
void RM_RdbStreamFree(RedisModuleRdbStream *stream);

int RM_RdbLoad(RedisModuleCtx *ctx, RedisModuleRdbStream *stream, int flags);
int RM_RdbSave(RedisModuleCtx *ctx, RedisModuleRdbStream *stream, int flags);
```

The first step is to create a `RedisModuleRdbStream` object. This PR provides a function to
create RedisModuleRdbStream from the filename. (You can load/save RDB with the filename).
In the future, this API can be extended if needed: 
e.g., `RM_RdbStreamCreateFromFd()`, `RM_RdbStreamCreateFromSocket()` to save/load
RDB from an `fd` or a `socket`. 


Usage:
```c
/* Save RDB */
RedisModuleRdbStream *stream = RedisModule_RdbStreamCreateFromFile("example.rdb");
RedisModule_RdbSave(ctx, stream, 0);
RedisModule_RdbStreamFree(stream);

/* Load RDB */
RedisModuleRdbStream *stream = RedisModule_RdbStreamCreateFromFile("example.rdb");
RedisModule_RdbLoad(ctx, stream, 0);
RedisModule_RdbStreamFree(stream);
```
2023-04-09 12:07:32 +03:00
Tian 7dae142a2e
Reclaim page cache of RDB file (#11248)
# Background
The RDB file is usually generated and used once and seldom used again, but the content would reside in page cache until OS evicts it. A potential problem is that once the free memory exhausts, the OS have to reclaim some memory from page cache or swap anonymous page out, which may result in a jitters to the Redis service.

Supposing an exact scenario, a high-capacity machine hosts many redis instances, and we're upgrading the Redis together. The page cache in host machine increases as RDBs are generated. Once the free memory drop into low watermark(which is more likely to happen in older Linux kernel like 3.10, before [watermark_scale_factor](https://lore.kernel.org/lkml/1455813719-2395-1-git-send-email-hannes@cmpxchg.org/) is introduced, the `low watermark` is linear to `min watermark`, and there'is not too much buffer space for `kswapd` to be wake up to reclaim memory), a `direct reclaim` happens, which means the process would stall to wait for memory allocation.

# What the PR does
The PR introduces a capability to reclaim the cache when the RDB is operated. Generally there're two cases, read and write the RDB. For read it's a little messy to address the incremental reclaim, so the reclaim is done in one go in background after the load is finished to avoid blocking the work thread. For write, incremental reclaim amortizes the work of reclaim so no need to put it into background, and the peak watermark of cache can be reduced in this way.

Two cases are addresses specially, replication and restart, for both of which the cache is leveraged to speed up the processing, so the reclaim is postponed to a right time. To do this, a flag is added to`rdbSave` and `rdbLoad` to control whether the cache need to be kept, with the default value false.

# Something deserve noting
1. Though `posix_fadvise` is the POSIX standard, but only few platform support it, e.g. Linux, FreeBSD 10.0.
2. In Linux `posix_fadvise` only take effect on writeback-ed pages, so a `sync`(or `fsync`, `fdatasync`) is needed to flush the dirty page before `posix_fadvise` if we reclaim write cache.

# About test
A unit test is added to verify the effect of `posix_fadvise`.
In integration test overall cache increase is checked, as well as the cache backed by RDB as a specific TCL test is executed in isolated Github action job.
2023-02-12 09:23:29 +02:00
guybe7 72e90695ec
Stream consumers: Re-purpose seen-time, add active-time (#11099)
1. "Fixed" the current code so that seen-time/idle actually refers to interaction
  attempts (as documented; breaking change)
2. Added active-time/inactive to refer to successful interaction (what
  seen-time/idle used to be)

At first, I tried to avoid changing the behavior of seen-time/idle but then realized
that, in this case, the odds are the people read the docs and implemented their
code based on the docs (which didn't match the behavior).
For the most part, that would work fine, except that issue #9996 was found.

I was working under the assumption that people relied on the docs, and for
the most part, it could have worked well enough. so instead of fixing the docs,
as I would usually do, I fixed the code to match the docs in this particular case.

Note that, in case the consumer has never read any entries, the values
for both "active-time" (XINFO FULL) and "inactive" (XINFO CONSUMERS) will
be -1, meaning here that the consumer was never active.

Note that seen/active time is only affected by XREADGROUP / X[AUTO]CLAIM, not
by XPENDING, XINFO, and other "read-only" stream CG commands (always has been,
even before this PR)

Other changes:
* Another behavioral change (arguably a bugfix) is that XREADGROUP and X[AUTO]CLAIM
  create the consumer regardless of whether it was able to perform some reading/claiming
* RDB format change to save the `active_time`, and set it to the same value of `seen_time` in old rdb files.
2022-11-30 14:21:31 +02:00
Binbin 51887e61b8
sanitize dump payload: fix crash with empty set with listpack encoding (#11519)
The following example will create an empty set (listpack encoding):
```
> RESTORE key 0
"\x14\x25\x25\x00\x00\x00\x00\x00\x02\x01\x82\x5F\x37\x03\x06\x01\x82\x5F\x35\x03\x82\x5F\x33\x03\x00\x01\x82\x5F\x31\x03\x82\x5F\x39\x03\x04\xA9\x08\x01\xFF\x0B\x00\xA3\x26\x49\xB4\x86\xB0\x0F\x41"
OK
> SCARD key
(integer) 0
> SRANDMEMBER key
Error: Server closed the connection
```

In the spirit of #9297, skip empty set when loading RDB_TYPE_SET_LISTPACK.
Introduced in #11290
2022-11-20 12:12:15 +02:00
Viktor Söderqvist 4e472a1a7f
Listpack encoding for sets (#11290)
Small sets with not only integer elements are listpack encoded, by default
up to 128 elements, max 64 bytes per element, new config `set-max-listpack-entries`
and `set-max-listpack-value`. This saves memory for small sets compared to using a hashtable.

Sets with only integers, even very small sets, are still intset encoded (up to 1G
limit, etc.). Larger sets are hashtable encoded.

This PR increments the RDB version, and has an effect on OBJECT ENCODING

Possible conversions when elements are added:

    intset -> listpack
    listpack -> hashtable
    intset -> hashtable

Note: No conversion happens when elements are deleted. If all elements are
deleted and then added again, the set is deleted and recreated, thus implicitly
converted to a smaller encoding.
2022-11-09 19:50:07 +02:00
Meir Shpilraien (Spielrein) b43f254813
Avoid saving module aux on RDB if no aux data was saved by the module. (#11374)
### Background

The issue is that when saving an RDB with module AUX data, the module AUX metadata
(moduleid, when, ...) is saved to the RDB even though the module did not saved any actual data.
This prevent loading the RDB in the absence of the module (although there is no actual data in
the RDB that requires the module to be loaded).

### Solution

The solution suggested in this PR is that module AUX will be saved on the RDB only if the module
actually saved something during `aux_save` function.

To support backward compatibility, we introduce `aux_save2` callback that acts the same as
`aux_save` with the tiny change of avoid saving the aux field if no data was actually saved by
the module. Modules can use the new API to make sure that if they have no data to save,
then it will be possible to load the created RDB even without the module.

### Concerns

A module may register for the aux load and save hooks just in order to be notified when
saving or loading starts or completed (there are better ways to do that, but it still possible
that someone used it).

However, if a module didn't save a single field in the save callback, it means it's not allowed
to read in the read callback, since it has no way to distinguish between empty and non-empty
payloads. furthermore, it means that if the module did that, it must never change it, since it'll
break compatibility with it's old RDB files, so this is really not a valid use case.

Since some modules (ones who currently save one field indicating an empty payload), need
to know if saving an empty payload is valid, and if Redis is gonna ignore an empty payload
or store it, we opted to add a new API (rather than change behavior of an existing API and
expect modules to check the redis version)

### Technical Details

To avoid saving AUX data on RDB, we change the code to first save the AUX metadata
(moduleid, when, ...) into a temporary buffer. The buffer is then flushed to the rio at the first
time the module makes a write operation inside the `aux_save` function. If the module saves
nothing (and `aux_save2` was used), the entire temporary buffer is simply dropped and no
data about this AUX field is saved to the RDB. This make it possible to load the RDB even in
the absence of the module.

Test was added to verify the fix.
2022-10-18 19:45:46 +03:00
Oran Agra ac1cc5a6e1
Trim rdb loading code for pre-release formats (#11058)
The initial module format introduced in 4.0  RC1 and was changed in RC2
The initial function format introduced in 7.0 RC1 and changed in RC3
2022-08-15 21:41:44 +03:00
Moti Cohen 1aa6c4ab92
Adding parentheses and do-while(0) to macros (#11080)
Fixing few macros that doesn't follows most basic safety conventions
which is wrapping any usage of passed variable
with parentheses and if written more than one command, then wrap
it with do-while(0) (or parentheses).
2022-08-03 19:38:08 +03:00
Meir Shpilraien (Spielrein) ae020e3d56
Functions: Move library meta data to be part of the library payload. (#10500)
## Move library meta data to be part of the library payload.

Following the discussion on https://github.com/redis/redis/issues/10429 and the intention to add (in the future) library versioning support, we believe that the entire library metadata (like name and engine) should be part of the library payload and not provided by the `FUNCTION LOAD` command. The reasoning behind this is that the programmer who developed the library should be the one who set those values (name, engine, and in the future also version). **It is not the responsibility of the admin who load the library into the database.**

The PR moves all the library metadata (engine and function name) to be part of the library payload. The metadata needs to be provided on the first line of the payload using the shebang format (`#!<engine> name=<name>`), example:

```lua
#!lua name=test
redis.register_function('foo', function() return 1 end)
```

The above script will run on the Lua engine and will create a library called `test`.

## API Changes (compare to 7.0 rc2)

* `FUNCTION LOAD` command was change and now it simply gets the library payload and extract the engine and name from the payload. In addition, the command will now return the function name which can later be used on `FUNCTION DELETE` and `FUNCTION LIST`.
* The description field was completely removed from`FUNCTION LOAD`, and `FUNCTION LIST`


## Breaking Changes (compare to 7.0 rc2)

* Library description was removed (we can re-add it in the future either as part of the shebang line or an additional line).
* Loading an AOF file that was generated by either 7.0 rc1 or 7.0 rc2 will fail because the old command syntax is invalid.

## Notes

* Loading an RDB file that was generated by rc1 / rc2 **is** supported, Redis will automatically add the shebang to the libraries payloads (we can probably delete that code after 7.0.3 or so since there's no need to keep supporting upgrades from an RC build).
2022-04-05 10:27:24 +03:00
Itamar Haber c81c7f51c3
Add stream consumer group lag tracking and reporting (#9127)
Adds the ability to track the lag of a consumer group (CG), that is, the number
of entries yet-to-be-delivered from the stream.

The proposed constant-time solution is in the spirit of "best-effort."

Partially addresses #8737.

## Description of approach

We add a new "entries_added" property to the stream. This starts at 0 for a new
stream and is incremented by 1 with every `XADD`.  It is essentially an all-time
counter of the entries added to the stream.

Given the stream's length and this counter value, we can trivially find the logical
"entries_added" counter of the first ID if and only if the stream is contiguous.
A fragmented stream contains one or more tombstones generated by `XDEL`s.
The new "xdel_max_id" stream property tracks the latest tombstone.

The CG also tracks its last delivered ID's as an "entries_read" counter and
increments it independently when delivering new messages, unless the this
read counter is invalid (-1 means invalid offset). When the CG's counter is
available, the reported lag is the difference between added and read counters.

Lastly, this also adds a "first_id" field to the stream structure in order to make
looking it up cheaper in most cases.

## Limitations

There are two cases in which the mechanism isn't able to track the lag.
In these cases, `XINFO` replies with `null` in the "lag" field.

The first case is when a CG is created with an arbitrary last delivered ID,
that isn't "0-0", nor the first or the last entries of the stream. In this case,
it is impossible to obtain a valid read counter (short of an O(N) operation).
The second case is when there are one or more tombstones fragmenting
the stream's entries range.

In both cases, given enough time and assuming that the consumers are
active (reading and lacking) and advancing, the CG should be able to
catch up with the tip of the stream and report zero lag.
Once that's achieved, lag tracking would resume as normal (until the
next tombstone is set).

## API changes

* `XGROUP CREATE` added with the optional named argument `[ENTRIESREAD entries-read]`
  for explicitly specifying the new CG's counter.
* `XGROUP SETID` added with an optional positional argument `[ENTRIESREAD entries-read]`
  for specifying the CG's counter.
* `XINFO` reports the maximal tombstone ID, the recorded first entry ID, and total
  number of entries added to the stream.
* `XINFO` reports the current lag and logical read counter of CGs.
* `XSETID` is an internal command that's used in replication/aof. It has been added with
  the optional positional arguments `[ENTRIESADDED entries-added] [MAXDELETEDID max-deleted-entry-id]`
  for propagating the CG's offset and maximal tombstone ID of the stream.

## The generic unsolved problem

The current stream implementation doesn't provide an efficient way to obtain the
approximate/exact size of a range of entries. While it could've been nice to have
that ability (#5813) in general, let alone specifically in the context of CGs, the risk
and complexities involved in such implementation are in all likelihood prohibitive.

## A refactoring note

The `streamGetEdgeID` has been refactored to accommodate both the existing seek
of any entry as well as seeking non-deleted entries (the addition of the `skip_tombstones`
argument). Furthermore, this refactoring also migrated the seek logic to use the
`streamIterator` (rather than `raxIterator`) that was, in turn, extended with the
`skip_tombstones` Boolean struct field to control the emission of these.

Co-authored-by: Guy Benoish <guy.benoish@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-23 22:34:58 +02:00
Meir Shpilraien (Spielrein) 885f6b5ceb
Redis Function Libraries (#10004)
# Redis Function Libraries

This PR implements Redis Functions Libraries as describe on: https://github.com/redis/redis/issues/9906.

Libraries purpose is to provide a better code sharing between functions by allowing to create multiple
functions in a single command. Functions that were created together can safely share code between
each other without worrying about compatibility issues and versioning.

Creating a new library is done using 'FUNCTION LOAD' command (full API is described below)

This PR introduces a new struct called libraryInfo, libraryInfo holds information about a library:
* name - name of the library
* engine - engine used to create the library
* code - library code
* description - library description
* functions - the functions exposed by the library

When Redis gets the `FUNCTION LOAD` command it creates a new empty libraryInfo.
Redis passes the `CODE` to the relevant engine alongside the empty libraryInfo.
As a result, the engine will create one or more functions by calling 'libraryCreateFunction'.
The new funcion will be added to the newly created libraryInfo. So far Everything is happening
locally on the libraryInfo so it is easy to abort the operation (in case of an error) by simply
freeing the libraryInfo. After the library info is fully constructed we start the joining phase by
which we will join the new library to the other libraries currently exist on Redis.
The joining phase make sure there is no function collision and add the library to the
librariesCtx (renamed from functionCtx). LibrariesCtx is used all around the code in the exact
same way as functionCtx was used (with respect to RDB loading, replicatio, ...).
The only difference is that apart from function dictionary (maps function name to functionInfo
object), the librariesCtx contains also a libraries dictionary that maps library name to libraryInfo object.

## New API
### FUNCTION LOAD
`FUNCTION LOAD <ENGINE> <LIBRARY NAME> [REPLACE] [DESCRIPTION <DESCRIPTION>] <CODE>`
Create a new library with the given parameters:
* ENGINE - REPLACE Engine name to use to create the library.
* LIBRARY NAME - The new library name.
* REPLACE - If the library already exists, replace it.
* DESCRIPTION - Library description.
* CODE - Library code.

Return "OK" on success, or error on the following cases:
* Library name already taken and REPLACE was not used
* Name collision with another existing library (even if replace was uses)
* Library registration failed by the engine (usually compilation error)

## Changed API
### FUNCTION LIST
`FUNCTION LIST [LIBRARYNAME <LIBRARY NAME PATTERN>] [WITHCODE]`
Command was modified to also allow getting libraries code (so `FUNCTION INFO` command is no longer
needed and removed). In addition the command gets an option argument, `LIBRARYNAME` allows you to
only get libraries that match the given `LIBRARYNAME` pattern. By default, it returns all libraries.

### INFO MEMORY
Added number of libraries to `INFO MEMORY`

### Commands flags
`DENYOOM` flag was set on `FUNCTION LOAD` and `FUNCTION RESTORE`. We consider those commands
as commands that add new data to the dateset (functions are data) and so we want to disallows
to run those commands on OOM.

## Removed API
* FUNCTION CREATE - Decided on https://github.com/redis/redis/issues/9906
* FUNCTION INFO - Decided on https://github.com/redis/redis/issues/9899

## Lua engine changes
When the Lua engine gets the code given on `FUNCTION LOAD` command, it immediately runs it, we call
this run the loading run. Loading run is not a usual script run, it is not possible to invoke any
Redis command from within the load run.
Instead there is a new API provided by `library` object. The new API's: 
* `redis.log` - behave the same as `redis.log`
* `redis.register_function` - register a new function to the library

The loading run purpose is to register functions using the new `redis.register_function` API.
Any attempt to use any other API will result in an error. In addition, the load run is has a time
limit of 500ms, error is raise on timeout and the entire operation is aborted.

### `redis.register_function`
`redis.register_function(<function_name>, <callback>, [<description>])`
This new API allows users to register a new function that will be linked to the newly created library.
This API can only be called during the load run (see definition above). Any attempt to use it outside
of the load run will result in an error.
The parameters pass to the API are:
* function_name - Function name (must be a Lua string)
* callback - Lua function object that will be called when the function is invokes using fcall/fcall_ro
* description - Function description, optional (must be a Lua string).

### Example
The following example creates a library called `lib` with 2 functions, `f1` and `f1`, returns 1 and 2 respectively:
```
local function f1(keys, args)
    return 1
end

local function f2(keys, args)
    return 2
end

redis.register_function('f1', f1)
redis.register_function('f2', f2)
```

Notice: Unlike `eval`, functions inside a library get the KEYS and ARGV as arguments to the
functions and not as global.

### Technical Details

On the load run we only want the user to be able to call a white list on API's. This way, in
the future, if new API's will be added, the new API's will not be available to the load run
unless specifically added to this white list. We put the while list on the `library` object and
make sure the `library` object is only available to the load run by using [lua_setfenv](https://www.lua.org/manual/5.1/manual.html#lua_setfenv) API. This API allows us to set
the `globals` of a function (and all the function it creates). Before starting the load run we
create a new fresh Lua table (call it `g`) that only contains the `library` API (we make sure
to set global protection on this table just like the general global protection already exists
today), then we use [lua_setfenv](https://www.lua.org/manual/5.1/manual.html#lua_setfenv)
to set `g` as the global table of the load run. After the load run finished we update `g`
metatable and set `__index` and `__newindex` functions to be `_G` (Lua default globals),
we also pop out the `library` object as we do not need it anymore.
This way, any function that was created on the load run (and will be invoke using `fcall`) will
see the default globals as it expected to see them and will not have the `library` API anymore.

An important outcome of this new approach is that now we can achieve a distinct global table
for each library (it is not yet like that but it is very easy to achieve it now). In the future we can
decide to remove global protection because global on different libraries will not collide or we
can chose to give different API to different libraries base on some configuration or input.

Notice that this technique was meant to prevent errors and was not meant to prevent malicious
user from exploit it. For example, the load run can still save the `library` object on some local
variable and then using in `fcall` context. To prevent such a malicious use, the C code also make
sure it is running in the right context and if not raise an error.
2022-01-06 13:39:38 +02:00
yoav-steinberg 1bf6d6f11e
Generate RDB with Functions only via redis-cli --functions-rdb (#9968)
This is needed in order to ease the deployment of functions for ephemeral cases, where user
needs to spin up a server with functions pre-loaded.

#### Details:

* Added `--functions-rdb` option to _redis-cli_.
* Functions only rdb via `REPLCONF rdb-filter-only functions`. This is a placeholder for a space
  separated inclusion filter for the RDB. In the future can be `REPLCONF rdb-filter-only
  "functions db:3 key-patten:user*"` and a complementing `rdb-filter-exclude` `REPLCONF`
  can also be added.
* Handle "slave requirements" specification to RDB saving code so we can use the same RDB
  when different slaves express the same requirements (like functions-only) and not share the
  RDB when their requirements differ. This is currently just a flags `int`, but can be extended to
  a more complex structure with various filter fields.
* make sure to support filters only in diskless replication mode (not to override the persistence file),
  we do that by forcing diskless (even if disabled by config)

other changes:
* some refactoring in rdb.c (extract portion of a big function to a sub-function)
* rdb_key_save_delay used in AOFRW too
* sendChildInfo takes the number of updated keys (incremental, rather than absolute)

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-02 09:39:01 +02:00
Meir Shpilraien (Spielrein) 365cbf46a7
Add FUNCTION DUMP and RESTORE. (#9938)
Follow the conclusions to support Functions in redis cluster (#9899)

Added 2 new FUNCTION sub-commands:
1. `FUNCTION DUMP` - dump a binary payload representation of all the functions.
2. `FUNCTION RESTORE <PAYLOAD> [FLUSH|APPEND|REPLACE]` - give the binary payload extracted
   using `FUNCTION DUMP`, restore all the functions on the given payload. Restore policy can be given to
   control how to handle existing functions (default is APPEND):
   * FLUSH: delete all existing functions.
   * APPEND: appends the restored functions to the existing functions. On collision, abort.
   * REPLACE: appends the restored functions to the existing functions. On collision,
     replace the old function with the new function.

Modify `redis-cli --cluster add-node` to use `FUNCTION DUMP` to get existing functions from
one of the nodes in the cluster, and `FUNCTION RESTORE` to load the same set of functions
to the new node. `redis-cli` will execute this step before sending the `CLUSTER MEET` command
to the new node. If `FUNCTION DUMP` returns an error, assume the current Redis version do not
support functions and skip `FUNCTION RESTORE`. If `FUNCTION RESTORE` fails, abort and do not send
the `CLUSTER MEET` command. If the new node already contains functions (before the `FUNCTION RESTORE`
is sent), abort and do not add the node to the cluster. Test was added to verify
`redis-cli --cluster add-node` works as expected.
2021-12-26 09:03:37 +02:00
meir@redislabs.com cbd463175f Redis Functions - Added redis function unit and Lua engine
Redis function unit is located inside functions.c
and contains Redis Function implementation:
1. FUNCTION commands:
  * FUNCTION CREATE
  * FCALL
  * FCALL_RO
  * FUNCTION DELETE
  * FUNCTION KILL
  * FUNCTION INFO
2. Register engine

In addition, this commit introduce the first engine
that uses the Redis Function capabilities, the
Lua engine.
2021-12-02 19:35:52 +02:00
Eduardo Semprebon 91d0c758e5
Replica keep serving data during repl-diskless-load=swapdb for better availability (#9323)
For diskless replication in swapdb mode, considering we already spend replica memory
having a backup of current db to restore in case of failure, we can have the following benefits
by instead swapping database only in case we succeeded in transferring db from master:

- Avoid `LOADING` response during failed and successful synchronization for cases where the
  replica is already up and running with data.
- Faster total time of diskless replication, because now we're moving from Transfer + Flush + Load
  time to Transfer + Load only. Flushing the tempDb is done asynchronously after swapping.
- This could be implemented also for disk replication with similar benefits if consumers are willing
  to spend the extra memory usage.

General notes:
- The concept of `backupDb` becomes `tempDb` for clarity.
- Async loading mode will only kick in if the replica is syncing from a master that has the same
  repl-id the one it had before. i.e. the data it's getting belongs to a different time of the same timeline. 
- New property in INFO: `async_loading` to differentiate from the blocking loading
- Slot to Key mapping is now a field of `redisDb` as it's more natural to access it from both server.db
  and the tempDb that is passed around.
- Because this is affecting replicas only, we assume that if they are not readonly and write commands
  during replication, they are lost after SYNC same way as before, but we're still denying CONFIG SET
  here anyways to avoid complications.

Considerations for review:
- We have many cases where server.loading flag is used and even though I tried my best, there may
  be cases where async_loading should be checked as well and cases where it shouldn't (would require
  very good understanding of whole code)
- Several places that had different behavior depending on the loading flag where actually meant to just
  handle commands coming from the AOF client differently than ones coming from real clients, changed
  to check CLIENT_ID_AOF instead.

**Additional for Release Notes**
- Bugfix - server.dirty was not incremented for any kind of diskless replication, as effect it wouldn't
  contribute on triggering next database SAVE
- New flag for RM_GetContextFlags module API: REDISMODULE_CTX_FLAGS_ASYNC_LOADING
- Deprecated RedisModuleEvent_ReplBackup. Starting from Redis 7.0, we don't fire this event.
  Instead, we have the new RedisModuleEvent_ReplAsyncLoad holding 3 sub-events: STARTED,
  ABORTED and COMPLETED.
- New module flag REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD for RedisModule_SetModuleOptions
  to allow modules to declare they support the diskless replication with async loading (when absent, we fall
  back to disk-based loading).

Co-authored-by: Eduardo Semprebon <edus@saxobank.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-04 10:46:50 +02:00
perryitay f27083a4a8
Add support for list type to store elements larger than 4GB (#9357)
Redis lists are stored in quicklist, which is currently a linked list of ziplists.
Ziplists are limited to storing elements no larger than 4GB, so when bigger
items are added they're getting truncated.
This PR changes quicklists so that they're capable of storing large items
in quicklist nodes that are plain string buffers rather than ziplist.

As part of the PR there were few other changes in redis: 
1. new DEBUG sub-commands: 
   - QUICKLIST-PACKED-THRESHOLD - set the threshold of for the node type to
     be plan or ziplist. default (1GB)
   - QUICKLIST <key> - Shows low level info about the quicklist encoding of <key>
2. rdb format change:
   - A new type was added - RDB_TYPE_LIST_QUICKLIST_2 . 
   - container type (packed / plain) was added to the beginning of the rdb object
     (before the actual node list).
3. testing:
   - Tests that requires over 100MB will be by default skipped. a new flag was
     added to 'runtest' to run the large memory tests (not used by default)

Co-authored-by: sundb <sundbcn@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-03 20:47:18 +02:00
Binbin dd3ac97ffe
Cleanup typos, incorrect comments, and fixed small memory leak in redis-cli (#9153)
1. Remove forward declarations from header files to functions that do not exist:
hmsetCommand and rdbSaveTime.
2. Minor phrasing fixes in #9519
3. Add missing sdsfree(title) and fix typo in redis-benchmark.
4. Modify some error comments in some zset commands.
5. Fix copy-paste bug comment in syncWithMaster about `ip-address`.
2021-10-02 22:19:33 -07:00
zhaozhao.zz 794442b130
PSYNC2: make partial sync possible after master reboot (#8015)
The main idea is how to allow a master to load replication info from RDB file when rebooting, if master can load replication info it means that replicas may have the chance to psync with master, it can save much traffic.

The key point is we need guarantee safety and consistency, so there
are two differences between master and replica:

1. master would load the replication info as secondary ID and
   offset, in case other masters have the same replid.
2. when master loading RDB, it would propagate expired keys as DEL
   command to replication backlog, then replica can receive these
   commands to delete stale keys.
   p.s. the expired keys when RDB loading is useful for users, so
   we show it as `rdb_last_load_keys_expired` and `rdb_last_load_keys_loaded` in info persistence.

Moreover, after load replication info, master should update
`no_replica_time` in case loading RDB cost too long time.
2021-09-13 15:39:11 +08:00
sundb 3ca6972ecd
Replace all usage of ziplist with listpack for t_zset (#9366)
Part two of implementing #8702 (zset), after #8887.

## Description of the feature
Replaced all uses of ziplist with listpack in t_zset, and optimized some of the code to optimize performance.

## Rdb format changes
New `RDB_TYPE_ZSET_LISTPACK` rdb type.

## Rdb loading improvements:
1) Pre-expansion of dict for validation of duplicate data for listpack and ziplist.
2) Simplifying the release of empty key objects when RDB loading.
3) Unify ziplist and listpack data verify methods for zset and hash, and move code to rdb.c.

## Interface changes
1) New `zset-max-listpack-entries` config is an alias for `zset-max-ziplist-entries` (same with `zset-max-listpack-value`).
2) OBJECT ENCODING will return listpack instead of ziplist.

## Listpack improvements:
1) Add `lpDeleteRange` and `lpDeleteRangeWithEntry` functions to delete a range of entries from listpack.
2) Improve the performance of `lpCompare`, converting from string to integer is faster than converting from integer to string.
3) Replace `snprintf` with `ll2string` to improve performance in converting numbers to strings in `lpGet()`.

## Zset improvements:
1) Improve the performance of `zzlFind` method, use `lpFind` instead of `lpCompare` in a loop.
2) Use `lpDeleteRangeWithEntry` instead of `lpDelete` twice to delete a element of zset.

## Tests
1) Add some unittests for `lpDeleteRange` and `lpDeleteRangeWithEntry` function.
2) Add zset RDB loading test.
3) Add benchmark test for `lpCompare` and `ziplsitCompare`.
4) Add empty listpack zset corrupt dump test.
2021-09-09 18:18:53 +03:00
sundb 02fd76b97c
Replace all usage of ziplist with listpack for t_hash (#8887)
Part one of implementing #8702 (taking hashes first before other types)

## Description of the feature
1. Change ziplist encoded hash objects to listpack encoding.
2. Convert existing ziplists on RDB loading time. an O(n) operation.

## Rdb format changes
1. Add RDB_TYPE_HASH_LISTPACK rdb type.
2. Bump RDB_VERSION to 10

## Interface changes
1. New `hash-max-listpack-entries` config is an alias for `hash-max-ziplist-entries` (same with `hash-max-listpack-value`)
2. OBJECT ENCODING will return `listpack` instead of `ziplist`

## Listpack improvements:
1. Support direct insert, replace integer element (rather than convert back and forth from string)
3. Add more listpack capabilities to match the ziplist ones (like `lpFind`, `lpRandomPairs` and such)
4. Optimize element length fetching, avoid multiple calculations
5. Use inline to avoid function call overhead.

## Tests
1. Add a new test to the RDB load time conversion
2. Adding the listpack unit tests. (based on the one in ziplist.c)
3. Add a few "corrupt payload: fuzzer findings" tests, and slightly modify existing ones.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-10 09:18:49 +03:00
sundb 8ea777a6a0
Sanitize dump payload: fix empty keys when RDB loading and restore command (#9297)
When we load rdb or restore command, if we encounter a length of 0, it will result in the creation of an empty key.
This could either be a corrupt payload, or a result of a bug (see #8453 )

This PR mainly fixes the following:
1) When restore command will return `Bad data format` error.
2) When loading RDB, we will silently discard the key.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-05 22:42:20 +03:00
chenyang8094 e0cd3ad0de
Enhance mem_usage/free_effort/unlink/copy callbacks and add GetDbFromIO api. (#8999)
Create new module type enhanced callbacks: mem_usage2, free_effort2, unlink2, copy2.
These will be given a context point from which the module can obtain the key name and database id.
In addition the digest and defrag context can now be used to obtain the key name and database id.
2021-06-16 09:45:49 +03:00
Huang Zw 8f9958dc24
Fix typo and some out of date comments (#8449)
Fix typo and some out of date comments
2021-02-08 09:29:32 -08:00
Ariel Shtul 63a05dde46
Fix redis-check-rdb support for modules aux data (#7826)
redis-check-rdb was unable to parse rdb files containing module aux data.

Co-authored-by: Oran Agra <oran@redislabs.com>
2020-09-22 10:18:07 +03:00
Wang Yuan b002d2b4f1
Remove tmp rdb file in background thread (#7762)
We're already using bg_unlink in several places to delete the rdb file in the background,
and avoid paying the cost of the deletion from our main thread.
This commit uses bg_unlink to remove the temporary rdb file in the background too.

However, in case we delete that rdb file just before exiting, we don't actually wait for the
background thread or the main thread to delete it, and just let the OS clean up after us.
i.e. we open the file, unlink it and exit with the fd still open.

Furthermore, rdbRemoveTempFile can be called from a thread and was using snprintf which is
not async-signal-safe, we now use ll2string instead.
2020-09-17 18:20:10 +03:00
antirez 451872527c More powerful DEBUG RELOAD.
Related to #3243.
2020-04-09 12:10:10 +02:00
antirez 3d5b2d41b5 RDB: clarify a condition in rdbLoadRio(). 2020-04-09 11:09:40 +02:00
antirez 30adc62232 RDB: load files faster avoiding useless free+realloc.
Reloading of the RDB generated by

    DEBUG POPULATE 5000000
    SAVE

is now 25% faster.

This commit also prepares the ability to have more flexibility when
loading stuff from the RDB, since we no longer use dbAdd() but can
control exactly how things are added in the database.
2020-04-09 10:24:46 +02:00
Guy Benoish d72b7ed4fb DEBUG OBJECT should pass keyname to module when loading 2020-01-30 19:15:12 +05:30
Oran Agra 51c3ff8d75 Modules hooks: complete missing hooks for the initial set of hooks
* replication hooks: role change, master link status, replica online/offline
* persistence hooks: saving, loading, loading progress
* misc hooks: cron loop, shutdown, module loaded/unloaded
* change the way hooks test work, and add tests for all of the above

startLoading() now gets flag indicating what is loaded.
stopLoading() now gets an indication of success or failure.
adding startSaving() and stopSaving() with similar args and role.
2019-10-29 17:59:09 +02:00
Oran Agra 3b6aeea44c Implement module api for aux data in rdb
Other changes:
* fix memory leak in error handling of rdb loading of type OBJ_MODULE
2019-07-22 21:15:33 +03:00
antirez 42b6305964 RDB: try to make error handling code more readable. 2019-07-17 17:30:02 +02:00
Oran Agra c56b4ddc6f prevent diskless replica from terminating on short read
now that replica can read rdb directly from the socket, it should avoid exiting
on short read and instead try to re-sync.

this commit tries to have minimal effects on non-diskless rdb reading.
and includes a test that tries to trigger this scenario on various read cases.
2019-07-17 16:46:22 +02:00
Yossi Gottlieb a88264d934 Add RedisModule_GetKeyNameFromIO(). 2019-03-15 10:23:27 +02:00
Oran Agra f31b0405f0 fix redis-rdb-check to provide proper arguments to rdbLoadMillisecondTime
due to incorrect forward declaration, it didn't provide all arguments.
this lead to random value being read from the stack and return of incorrect time,
which in this case doesn't matter since no one uses it.
2018-06-19 16:54:22 +03:00
antirez 49147f36e9 Don't expire keys while loading RDB from AOF preamble.
The AOF tail of a combined RDB+AOF is based on the premise of applying
the AOF commands to the exact state that there was in the server while
the RDB was persisted. By expiring keys while loading the RDB file, we
change the state, so applying the AOF tail later may change the state.

Test case:

* Time1: SET a 10
* Time2: EXPIREAT a $time5
* Time3: INCR a
* Time4: PERSIT A. Start bgrewiteaof with RDB preamble. The value of a is 11 without expire time.
* Time5: Restart redis from the RDB+AOF: consistency violation.

Thanks to @soloestoy for providing the patch.
Thanks to @trevor211 for the original issue report and the initial fix.

Check issue #4950 for more info.
2018-05-29 12:37:42 +02:00
WuYunlong 2a887bd53f Fix rdb save by allowing dumping of expire keys, so that when
we add a new slave, and do a failover, eighter by manual or
not, other local slaves will delete the expired keys properly.
2018-05-29 12:35:15 +02:00
antirez 38bc8ea531 RDB version 9. 2018-03-16 13:48:44 +01:00
antirez 8b0cfb1e66 RDB: Implement future-proof module AUX data loading. 2018-03-16 13:47:10 +01:00
antirez d7a5c0eb71 RDB: Ability to save LFU/LRU info.
This is a big win for caching use cases, since on reloading Redis will
still have some idea about what is worth to evict and what not.
However this only solves part of the problem because the information is
only partially propagated to slaves (on write operations). Reads will
not affect slaves LFU and LRU counters, so after a failover the eviction
decisions are kinda random until keys start to collect some aging/freq info.

However since new slaves are initially populated via RDB file transfer,
this means that if we spin up a new slave from a master, and perform an
immediate manual failover (for instance in order to upgrade the master),
the slave will have eviction informations to use for some time.

The LFU/LRU info is persisted only if the maxmemory policy is set to one
of the relevant type, even if no actual "maxmemory"  memory limit is
set.
2018-03-15 13:15:55 +01:00
Oran Agra 60a4f12f8b fix processing of large bulks (above 2GB)
- protocol parsing (processMultibulkBuffer) was limitted to 32big positions in the buffer
  readQueryFromClient potential overflow
- rioWriteBulkCount used int, although rioWriteBulkString gave it size_t
- several places in sds.c that used int for string length or index.
- bugfix in RM_SaveAuxField (return was 1 or -1 and not length)
- RM_SaveStringBuffer was limitted to 32bit length
2017-12-29 12:24:19 +02:00
antirez 100d43c1ac Streams: assign value of 6 to OBJ_STREAM + some refactoring. 2017-12-01 10:24:24 +01:00
antirez 79866a6361 Streams: 12 commits squashed into the initial Streams implementation. 2017-12-01 10:24:24 +01:00
antirez c1c99e9f4e PSYNC2: Fix the way replication info is saved/loaded from RDB.
This commit attempts to fix a number of bugs reported in #4316.
They are related to the way replication info like replication ID,
offsets, and currently selected DB in the master client, are stored
and loaded by Redis. In order to avoid inconsistencies the changes in
this commit try to enforce that:

1. Replication information are only stored when the RDB file is
generated by a slave that has a valid 'master' client, so that we can
always extract the currently selected DB.
2. When replication informations are persisted in the RDB file, all the
info for a successful PSYNC or nothing is persisted.
3. The RDB replication informations are only loaded if the instance is
configured as a slave, otherwise a master can start with IDs that relate
to a different history of the data set, and stil retain such IDs in the
future while receiving unrelated writes.
2017-09-19 23:03:39 +02:00
antirez 365dd037dc RDB modules values serialization format version 2.
The original RDB serialization format was not parsable without the
module loaded, becuase the structure was managed only by the module
itself. Moreover RDB is a streaming protocol in the sense that it is
both produce di an append-only fashion, and is also sometimes directly
sent to the socket (in the case of diskless replication).

The fact that modules values cannot be parsed without the relevant
module loaded is a problem in many ways: RDB checking tools must have
loaded modules even for doing things not involving the value at all,
like splitting an RDB into N RDBs by key or alike, or just checking the
RDB for sanity.

In theory module values could be just a blob of data with a prefixed
length in order for us to be able to skip it. However prefixing the values
with a length would mean one of the following:

1. To be able to write some data at a previous offset. This breaks
stremaing.
2. To bufferize values before outputting them. This breaks performances.
3. To have some chunked RDB output format. This breaks simplicity.

Moreover, the above solution, still makes module values a totally opaque
matter, with the fowllowing problems:

1. The RDB check tool can just skip the value without being able to at
least check the general structure. For datasets composed mostly of
modules values this means to just check the outer level of the RDB not
actually doing any checko on most of the data itself.
2. It is not possible to do any recovering or processing of data for which a
module no longer exists in the future, or is unknown.

So this commit implements a different solution. The modules RDB
serialization API is composed if well defined calls to store integers,
floats, doubles or strings. After this commit, the parts generated by
the module API have a one-byte prefix for each of the above emitted
parts, and there is a final EOF byte as well. So even if we don't know
exactly how to interpret a module value, we can always parse it at an
high level, check the overall structure, understand the types used to
store the information, and easily skip the whole value.

The change is backward compatible: older RDB files can be still loaded
since the new encoding has a new RDB type: MODULE_2 (of value 7).
The commit also implements the ability to check RDB files for sanity
taking advantage of the new feature.
2017-06-27 13:19:16 +02:00