Commit Graph

11432 Commits

Author SHA1 Message Date
Oran Agra f35f36a265 Redis 7.0.15 2024-01-09 13:51:41 +02:00
Binbin 3e720bbcf5 Use CLZ in _dictNextExp to get the next power of two (#12815)
In the past, we did not call _dictNextExp frequently. It was only
called when the dictionary was expanded.

Later, dictTypeExpandAllowed was introduced in #7954, which is 6.2.
For the data dict and the expire dict, we can check maxmemory before
actually expanding the dict. This is a good optimization to avoid
maxmemory being exceeded due to the dict expansion.

And in #11692, we moved the dictTypeExpandAllowed check before the
threshold check, this caused a bit of performance degradation, every
time a key is added to the dict, dictTypeExpandAllowed is called to
check.

The main reason for degradation is that in a large dict, we need to
call _dictNextExp frequently, that is, every time we add a key, we
need to call _dictNextExp once. Then the threshold is checked to see
if the dict needs to be expanded. We can see that the order of checks
here can be optimized.

So we moved the dictTypeExpandAllowed check back to after the threshold
check in #12789. In this way, before the dict is actually expanded (that
is, before the threshold is reached), we will not do anything extra
compared to before, that is, we will not call _dictNextExp frequently.

But note we'll still hit the degradation when we over the thresholds.
When the threshold is reached, because #7954, we may delay the dict
expansion due to maxmemory limitations. In this case, we will call
_dictNextExp every time we add a key during this period.

This PR use CLZ in _dictNextExp to get the next power of two. CLZ (count
leading zeros) can easily give you the next power of two. It should be
noted that we have actually introduced the use of __builtin_clzl in
#8687,
which is 7.0. So i suppose all the platforms we use have it (even if the
CPU doesn't have an instruction).

We build 67108864 (2**26) keys through DEBUG POPULTE, which will use
approximately 5.49G memory (used_memory:5898522936). If expansion is
triggered, the additional hash table will consume approximately 1G
memory (2 ** 27 * 8). So we set maxmemory to 6871947673 (that is, 6.4G),
which will be less than 5.49G + 1G, so we will delay the dict rehash
while addint the keys.

After that, each time an element is added to the dict, an allow check
will be performed, that is, we can frequently call _dictNextExp to test
the comparison before and after the optimization. Using DEBUG HTSTATS 0
to
check and make sure that our dict expansion is dealyed.

Using `./src/redis-server redis.conf --save "" --maxmemory 6871947673`.
Using `./src/redis-benchmark -P 100 -r 1000000000 -t set -n 5000000`.
After ten rounds of testing:
```
unstable:           this PR:
769585.94           816860.00
771724.00           818196.69
775674.81           822368.44
781983.12           822503.69
783576.25           828088.75
784190.75           828637.75
791389.69           829875.50
794659.94           835660.69
798212.00           830013.25
801153.62           833934.56
```

We can see there is about 4-5% performance improvement in this case.

(cherry picked from commit 22cc9b5122)
2024-01-09 13:51:41 +02:00
Binbin 8263146227 Optimize dict expand check, move allow check after the thresholds check (#12789)
dictExpandAllowed (for the main db dict and the expire dict) seems to
involve a few function calls and memory accesses, and we can do it only
after the thresholds checks and can get some performance improvements.

A simple benchmark test: there are 11032768 fixed keys in the database,
start a redis-server with `--maxmemory big_number --save ""`,
start a redis-benchmark with `-P 100 -r 1000000000 -t set -n 5000000`,
collect `throughput summary: n requests per second` result.

After five rounds of testing:
```
unstable     this PR
848032.56    897988.56
854408.69    913408.88
858663.94    914076.81
871839.56    916758.31
882612.56    920640.75
```

We can see a 5% performance improvement in general condition.
But note we'll still hit the degradation when we over the thresholds.

(cherry picked from commit 463476933c)
2024-01-09 13:51:41 +02:00
Oran Agra e351099e11 Fix possible corruption in sdsResize (CVE-2023-41056)
#11766 introduced a bug in sdsResize where it could forget to update
the sds type in the sds header and then cause an overflow in sdsalloc.
it looks like the only implication of that is a possible assertion in HLL,
but it's hard to rule out possible heap corruption issues with clientsCronResizeQueryBuffer
2024-01-09 13:51:41 +02:00
Oran Agra c1d92a69c6 Redis 7.0.14 2023-10-18 10:43:44 +03:00
Binbin 6573acbd73 Support NO ONE block in REPLICAOF command json (#12633)
The current commands.json doesn't mention the special NO ONE arguments.
This change is also applied to SLAVEOF

(cherry picked from commit 8d92f7f2b7)
2023-10-18 10:43:44 +03:00
Jachin 8ada737f0a Fix compile on macOS 13 (#12611)
Use the __MAC_OS_X_VERSION_MIN_REQUIRED macro to detect the
macOS system version instead of using MAC_OS_X_VERSION_10_6.

From MacOSX14.0.sdk, the default definitions of MAC_OS_X_VERSION_xxx have
been removed in usr/include/AvailabilityMacros.h. It includes AvailabilityVersions.h,
where the following condition must be met:
`#if (!defined(_POSIX_C_SOURCE) && !defined(_XOPEN_SOURCE)) || defined(_DARWIN_C_SOURCE)`
Only then will MAC_OS_X_VERSION_xxx be defined.
However, in the project, _DARWIN_C_SOURCE is not defined, which leads to the
loss of the definition for MAC_OS_X_VERSION_10_6.

(cherry picked from commit a2b0701d2c)
2023-10-18 10:43:44 +03:00
Yossi Gottlieb 7f486ea6ee Fix issue of listen before chmod on Unix sockets (CVE-2023-45145)
Before this commit, Unix socket setup performed chmod(2) on the socket
file after calling listen(2). Depending on what umask is used, this
could leave the file with the wrong permissions for a short period of
time. As a result, another process could exploit this race condition and
establish a connection that would otherwise not be possible.

We now make sure the socket permissions are set up prior to calling
listen(2).

(cherry picked from commit a11b3bc34a)
2023-10-18 10:43:44 +03:00
Oran Agra 49dbedb1d5 Redis 7.0.13 2023-09-06 20:55:58 +03:00
bodong.ybd 0f14d32792 Fix sort_ro get-keys function return wrong key number (#12522)
Before:
```
127.0.0.1:6379> command getkeys sort_ro key
(empty array)
127.0.0.1:6379>
```
After:
```
127.0.0.1:6379> command getkeys sort_ro key
1) "key"
127.0.0.1:6379>
```

(cherry picked from commit b59f53efb3)
2023-09-06 20:55:58 +03:00
zhaozhao.zz 4d67bb6afa do not call handleClientsBlockedOnKeys inside yielding command (#12459)
Fix the assertion when a busy script (timeout) signal ready keys (like LPUSH),
and then an arbitrary client's `allow-busy` command steps into `handleClientsBlockedOnKeys`
try wake up clients blocked on keys (like BLPOP).

Reproduction process:
1. start a redis with aof
    `./redis-server --appendonly yes`
2. exec blpop
    `127.0.0.1:6379> blpop a 0`
3. use another client call a busy script and this script push the blocked key
    `127.0.0.1:6379> eval "redis.call('lpush','a','b') while(1) do end" 0`
4. user a new client call an allow-busy command like auth
    `127.0.0.1:6379> auth a`

BTW, this issue also break the atomicity of script.

This bug has been around for many years, the old versions only have the
atomic problem, only 7.0/7.2 has the assertion problem.

Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 8226f39fb2)
2023-09-06 20:55:58 +03:00
Meir Shpilraien (Spielrein) 37599fe75a Ensure that the function load timeout is disabled during loading from RDB/AOF and on replicas. (#12451)
When loading a function from either RDB/AOF or a replica, it is essential not to
fail on timeout errors. The loading time may vary due to various factors, such as
hardware specifications or the system's workload during the loading process.
Once a function has been successfully loaded, it should be allowed to load from
persistence or on replicas without encountering a timeout failure.

To maintain a clear separation between the engine and Redis internals, the
implementation refrains from directly checking the state of Redis within the
engine itself. Instead, the engine receives the desired timeout as part of the
library creation and duly respects this timeout value. If Redis wishes to disable
any timeout, it can simply send a value of 0.

(cherry picked from commit 2ee1bbb53b)
2023-09-06 20:55:58 +03:00
Sankar ea1bc6f623 Process loss of slot ownership in cluster bus (#12344)
Process loss of slot ownership in cluster bus

When a node no longer owns a slot, it clears the bit corresponding
to the slot in the cluster bus messages. The receiving nodes
currently don't record the fact that the sender stopped claiming
a slot until some other node in the cluster starts claiming the slot.
This can cause a slot to go missing during slot migration when subjected
to inopportune race with addition of new shards or a failover.
This fix forces the receiving nodes to process the loss of ownership
to avoid spreading wrong information.

(cherry picked from commit 1190f25ca7)
2023-09-06 20:55:58 +03:00
sundb 646069a900 Skip test for sdsRemoveFreeSpace when mem_allocator is not jemalloc (#11878)
Test `trim on SET with big value` (introduced from #11817) fails under mac m1 with libc mem_allocator.
The reason is that malloc(33000) will allocate 65536 bytes(>42000).
This test still passes under ubuntu with libc mem_allocator.

```
*** [err]: trim on SET with big value in tests/unit/type/string.tcl
Expected [r memory usage key] < 42000 (context: type source line 471 file /Users/iospack/data/redis_fork/tests/unit/type/string.tcl cmd {assert {[r memory usage key] < 42000}} proc ::test)
```

simple test under mac m1 with libc mem_allocator:
```c
void *p = zmalloc(33000);
printf("malloc size: %zu\n", zmalloc_size(p));

# output
malloc size: 65536
```

(cherry picked from commit 3fba3ccd96)
2023-09-06 20:55:58 +03:00
Oran Agra 8e73f9d348 Redis 7.0.12 2023-07-10 14:39:42 +03:00
sundb f90ecfb1f7 Fix compile errors when building with gcc-12 or clang (partial #12035)
This is a partial cherry-pick from Redis 7.2

## Fix various compilation warnings and errors

5) server.c

COMPILER: gcc-13 with FORTIFY_SOURCE

WARNING:
```
In function 'lookupCommandLogic',
    inlined from 'lookupCommandBySdsLogic' at server.c:3139:32:
server.c:3102:66: error: '*(robj **)argv' may be used uninitialized [-Werror=maybe-uninitialized]
 3102 |     struct redisCommand *base_cmd = dictFetchValue(commands, argv[0]->ptr);
      |                                                              ~~~~^~~
```

REASON: The compiler thinks that the `argc` returned by `sdssplitlen()` could be 0,
resulting in an empty array of size 0 being passed to lookupCommandLogic.
this should be a false positive, `argc` can't be 0 when strings are not NULL.

SOLUTION: add an assert to let the compiler know that `argc` is positive.

## Other changes
1) Fixed `ps -p [pid]`  doesn't output `<defunct>` when using procps 4.x causing `replication
  child dies when parent is killed - diskless` test to fail.

(cherry picked from commit 42c8c61813)
2023-07-10 14:39:42 +03:00
Lior Lahav bd1dac0c6e Fix possible crash in command getkeys (#12380)
When getKeysUsingKeySpecs processes a command with more than one key-spec,
and called with a total of more than 256 keys, it'll call getKeysPrepareResult again,
but since numkeys isn't updated, getKeysPrepareResult will not bother to copy key
names from the old result (leaving these slots uninitialized). Furthermore, it did not
consider the keys it already found when allocating more space.

Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit b7559d9f3b)
2023-07-10 14:39:42 +03:00
sundb 25f610fc29 Use Reservoir Sampling for random sampling of dict, and fix hang during fork (#12276)
## Issue:
When a dict has a long chain or the length of the chain is longer than
the number of samples, we will never be able to sample the elements
at the end of the chain using dictGetSomeKeys().
This could mean that SRANDMEMBER can be hang in and endless loop.
The most severe case, is the pathological case of when someone uses SCAN+DEL
or SSCAN+SREM creating an unevenly distributed dict.
This was amplified by the recent change in #11692 which prevented a
down-sizing rehashing while there is a fork.

## Solution
1. Before, we will stop sampling when we reach the maximum number
  of samples, even if there is more data after the current chain.
  Now when we reach the maximum we use the Reservoir Sampling
  algorithm to fairly sample the end of the chain that cannot be sampled
2. Fix the rehashing code, so that the same as it allows rehashing for up-sizing
  during fork when the ratio is extreme, it will allow it for down-sizing as well.

Issue was introduced (or became more severe) by #11692

Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit b00a235186)
2023-07-10 14:39:42 +03:00
Binbin eb64a97d33 Add missing return on -UNKILLABLE sent by master case (#12277)
We now no longer propagate scripts (started from 7.0), so this is a
very rare issue that in nearly-dead-code.

This is an overlook in #9780

(cherry picked from commit e4d183afd3)
2023-07-10 14:39:42 +03:00
Oran Agra 2ba8de9d5e Fix WAIT for clients being blocked in a module command (#12220)
So far clients being blocked and unblocked by a module command would
update the c->woff variable and so WAIT was ineffective and got released
without waiting for the command actions to propagate.

This seems to have existed since forever, but not for RM_BlockClientOnKeys.

It is problematic though to know if the module did or didn't propagate
anything in that command, so for now, instead of adding an API, we'll
just update the woff to the latest offset when unblocking, this will
cause the client to possibly wait excessively, but that's not that bad.

(cherry picked from commit 6117f28822)
2023-07-10 14:39:42 +03:00
Shaya Potter 1d2839a830 Fix memory leak when RM_Call's RUN_AS_USER fails (#12158)
previously the argv wasn't freed so would leak.  not a common case, but should be handled.

Solution: move RUN_AS_USER setup and error exit to the right place.
this way, when we do `goto cleanup` (instead of return) it'll automatically do the right thing (including autoMemoryAdd)
Removed the user argument from moduleAllocTempClient (reverted to the state before 6e993a5)

Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 71e6abe423)
2023-07-10 14:39:42 +03:00
Brennan c340fd5a39 Prevent repetitive backlog trimming (#12155)
When `replicationFeedSlaves()` serializes a command, it repeatedly calls
`feedReplicationBuffer()` to feed it to the replication backlog piece by piece.
It is unnecessary to call `incrementalTrimReplicationBacklog()` for every small
amount of data added with `feedReplicationBuffer()` as the chance of the conditions
being met for trimming are very low and these frequent calls add up to a notable
performance cost. Instead, we will only attempt trimming when a new block is added
to the replication backlog.

Using redis-benchmark to saturate a local redis server indicated a performance
improvement of around 3-3.5% for 100 byte SET commands with this change.

(cherry picked from commit 40e6131ba5)
2023-07-10 14:39:42 +03:00
zhaozhao.zz 88682ca305 Free backlog only if rsi is invalid when master reboot (#12088)
When master reboot from RDB, if rsi in RDB is valid we should not free replication backlog, even if master_repl_offset or repl-offset is 0.

Since if master doesn't send any data to replicas master_repl_offset is 0, it's a valid number.

A clear example:

1. start a master and apply some write commands, the master's master_repl_offset is 0 since it has no replicas.
2. stop write commands on master, and start another instance and replicaof the master, trigger an FULLRESYNC
3. the master's master_repl_offset is still 0 (set a large number for repl-ping-replica-period), do BGSAVE and restart the master
4. master load master_repl_offset from RDB's rsi and it's still 0, and we should make sure replica can partially resync with master.

(cherry picked from commit b0dd7b3245)
2023-07-10 14:39:42 +03:00
Oran Agra f6a7c9f9ec Lua cjson and cmsgpack integer overflow issues (CVE-2022-24834)
* Fix integer overflows due to using wrong integer size.
* Add assertions / panic when overflow still happens.
* Deletion of dead code to avoid need to maintain it
* Some changes are not because of bugs, but rather paranoia.
* Improve cmsgpack and cjson test coverage.

Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
2023-07-10 14:39:42 +03:00
Oran Agra 391aa4079c Redis 7.0.11 2023-04-17 15:54:03 +03:00
Oran Agra 6b17d824c5 fix false valgrind error on new hash test (#11200)
New test fails on valgrind because strtold("+inf") with valgrind returns a non-inf result
same thing is done in incr.tcl.

(cherry picked from commit c3b7bde914)
2023-04-17 15:54:03 +03:00
Oran Agra 5656cc82a6 Avoid valgrind fishy value warning on corrupt restore payloads (#10937)
The corrupt dump fuzzer uncovered a valgrind warning saying:
```
==76370== Argument 'size' of function malloc has a fishy (possibly negative) value: -3744781444216323815
```
This allocation would have failed (returning NULL) and being handled properly by redis (even before this change), but we also want to silence the valgrind warnings (which are checking that casting to ssize_t produces a non-negative value).

The solution i opted for is to explicitly fail these allocations (returning NULL), before even reaching `malloc` (which would have failed and return NULL too).

The implication is that we will not be able to support a single allocation of more than 2GB on a 32bit system (which i don't think is a realistic scenario).
i.e. i do think we could be facing cases were redis consumes more than 2gb on a 32bit system, but not in a single allocation.

The byproduct of this, is that i dropped the overflow assertions, since these will now lead to the same OOM panic we have for failed allocations.

(cherry picked from commit 599e59ebc5)
2023-04-17 15:54:03 +03:00
sundb 863fcfbf52 Use dummy allocator to make accesses defined as per standard (#11982)
NOTE: for 7.0 backport we don't declare malloc_size attributes in
zmalloc.h so that we don't take the risk of inducing any crashes in a
bugfix release, so will only have effect if LTO was enforced from
outside.

## Issue
When we use GCC-12 later or clang 9.0 later to build with `-D_FORTIFY_SOURCE=3`,
we can see the following buffer overflow:
```
=== REDIS BUG REPORT START: Cut & paste starting from here ===
6263:M 06 Apr 2023 08:59:12.915 # Redis 255.255.255 crashed by signal: 6, si_code: -6
6263:M 06 Apr 2023 08:59:12.915 # Crashed running the instruction at: 0x7f03d59efa7c

------ STACK TRACE ------
EIP:
/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f03d59efa7c]

Backtrace:
/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f03d599b520]
/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f03d59efa7c]
/lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7f03d599b476]
/lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7f03d59817f3]
/lib/x86_64-linux-gnu/libc.so.6(+0x896f6)[0x7f03d59e26f6]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x2a)[0x7f03d5a8f76a]
/lib/x86_64-linux-gnu/libc.so.6(+0x1350c6)[0x7f03d5a8e0c6]
src/redis-server 127.0.0.1:25111(+0xd5e80)[0x557cddd3be80]
src/redis-server 127.0.0.1:25111(feedReplicationBufferWithObject+0x78)[0x557cddd3c768]
src/redis-server 127.0.0.1:25111(replicationFeedSlaves+0x1a4)[0x557cddd3cbc4]
src/redis-server 127.0.0.1:25111(+0x8721a)[0x557cddced21a]
src/redis-server 127.0.0.1:25111(call+0x47a)[0x557cddcf38ea]
src/redis-server 127.0.0.1:25111(processCommand+0xbf4)[0x557cddcf4aa4]
src/redis-server 127.0.0.1:25111(processInputBuffer+0xe6)[0x557cddd22216]
src/redis-server 127.0.0.1:25111(readQueryFromClient+0x3a8)[0x557cddd22898]
src/redis-server 127.0.0.1:25111(+0x1b9134)[0x557cdde1f134]
src/redis-server 127.0.0.1:25111(aeMain+0x119)[0x557cddce5349]
src/redis-server 127.0.0.1:25111(main+0x466)[0x557cddcd6716]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f03d5982d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f03d5982e40]
src/redis-server 127.0.0.1:25111(_start+0x25)[0x557cddcd7025]
```

The main reason is that when FORTIFY_SOURCE is enabled, GCC or clang will enhance some
common functions, such as `strcpy`, `memcpy`, `fgets`, etc, so that they can detect buffer
overflow errors and stop program execution, thus improving the safety of the program.
We use `zmalloc_usable_size()` everywhere to use memory blocks, but that is an abuse since the
malloc_usable_size() isn't meant for this kind of use, it is for diagnostics only. That is also why the
behavior is flaky when built with _FORTIFY_SOURCE, the compiler can sense that we reach outside
the allocated block and SIGABRT.

### Solution
If we need to use the additional memory we got, we need to use a dummy realloc with `alloc_size` attribute
and no inlining, (see `extend_to_usable`) to let the compiler see the large of memory we need to use.
This can either be an implicit call inside `z*usable` that returns the size, so that the caller doesn't have any
other worry, or it can be a normal zmalloc call which means that if the caller wants to use
zmalloc_usable_size it must also use extend_to_usable.

### Changes

This PR does the following:
1) rename the current z[try]malloc_usable family to z[try]malloc_internal and don't expose them to users outside zmalloc.c,
2) expose a new set of `z[*]_usable` family that use z[*]_internal and `extend_to_usable()` implicitly, the caller gets the
  size of the allocation and it is safe to use.
3) go over all the users of `zmalloc_usable_size` and convert them to use the `z[*]_usable` family if possible.
4) in the places where the caller can't use `z[*]_usable` and store the real size, and must still rely on zmalloc_usable_size,
  we still make sure that the allocation used `z[*]_usable` (which has a call to `extend_to_usable()`) and ignores the
  returning size, this way a later call to `zmalloc_usable_size` is still safe.

[4] was done for module.c and listpack.c, all the others places (sds, reply proto list, replication backlog, client->buf)
are using [3].

Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit e0b378d22b)
2023-04-17 15:54:03 +03:00
Slava Koyfman 90f489b0a4 Disconnect pub-sub subscribers when revoking `allchannels` permission (#11992)
The existing logic for killing pub-sub clients did not handle the `allchannels`
permission correctly. For example, if you:

    ACL SETUSER foo allchannels

Have a client authenticate as the user `foo` and subscribe to a channel, and then:

    ACL SETUSER foo resetchannels

The subscribed client would not be disconnected, though new clients under that user
would be blocked from subscribing to any channels.

This was caused by an incomplete optimization in `ACLKillPubsubClientsIfNeeded`
checking whether the new channel permissions were a strict superset of the old ones.

(cherry picked from commit f38aa6bfb7)
2023-04-17 15:54:03 +03:00
Binbin 1788568425 Fix fork done handler wrongly update fsync metrics and enhance AOF_ FSYNC_ALWAYS (#11973)
This PR fix several unrelated bugs that were discovered by the same set of tests
(WAITAOF tests in #11713), could make the `WAITAOF` test hang.

The change in `backgroundRewriteDoneHandler` is about MP-AOF.
That leftover / old code assumes that we started a new AOF file just now
(when we have a new base into which we're gonna incrementally write), but
the fact is that with MP-AOF, the fork done handler doesn't really affect the
incremental file being maintained by the parent process, there's no reason to
re-issue `SELECT`, and no reason to update any of the fsync variables in that flow.
This should have been deleted with MP-AOF (introduced in #9788, 7.0).
The damage is that the update to `aof_fsync_offset` will cause us to miss an fsync
in `flushAppendOnlyFile`, that happens if we stop write commands in `AOF_FSYNC_EVERYSEC`
while an AOFRW is in progress. This caused a new `WAITAOF` test to sometime hang forever.

Also because of MP-AOF, we needed to change `aof_fsync_offset` to `aof_last_incr_fsync_offset`
and match it to `aof_last_incr_size` in `flushAppendOnlyFile`. This is because in the past we compared
`aof_fsync_offset` and `aof_current_size`, but with MP-AOF it could be the total AOF file will be
smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced.

The change in `flushAppendOnlyFile`, about the `AOF_FSYNC_ALWAYS`, it is follow #6053
(the details is in #5985), we also check `AOF_FSYNC_ALWAYS` to handle a case where
appendfsync is changed from everysec to always while there is data that's written but not yet fsynced.

(cherry picked from commit cb17178658)
2023-04-17 15:54:03 +03:00
chendianqiang 1c1bd618c9 fix hincrbyfloat not to create a key if the new value is invalid (#11149)
Check the validity of the value before performing the create operation,
prevents new data from being generated even if the request fails to execute.

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: chendianqiang <chendianqiang@meituan.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit bc7fe41e58)
2023-04-17 15:54:03 +03:00
Oran Agra f651708a19 Redis 7.0.10 2023-03-20 19:16:50 +02:00
Oran Agra 6956d15b77 Avoid assertion when MSETNX is used with the same key twice (CVE-2023-28425)
Using the same key twice in MSETNX command would trigger an assertion.

This reverts #11594 (introduced in Redis 7.0.8)
2023-03-20 19:16:50 +02:00
Binbin 66ff5e6974 Fix tail->repl_offset update in feedReplicationBuffer (#11905)
In #11666, we added a while loop and will split a big reply
node to multiple nodes. The update of tail->repl_offset may
be wrong. Like before #11666, we would have created at most
one new reply node, and now we will create multiple nodes if
it is a big reply node.

Now we are creating more than one node, and the tail->repl_offset
of all the nodes except the last one are incorrect. Because we
update master_repl_offset at the beginning, and then use it to
update the tail->repl_offset. This would have lead to an assertion
during PSYNC, a test was added to validate that case.

Besides that, the calculation of size was adjusted to fix
tests that failed due to a combination of a very low backlog size,
and some thresholds of that get violated because of the relatively
high overhead of replBufBlock. So now if the backlog size / 16 is too
small, we'll take PROTO_REPLY_CHUNK_BYTES instead.

Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 7997874f4d)
2023-03-20 19:16:50 +02:00
xbasel 8869589430 Large blocks of replica client output buffer could lead to psync loops and unnecessary memory usage (#11666)
This can happen when a key almost equal or larger than the
client output buffer limit of the replica is written.

Example:
1. DB is empty
2. Backlog size is 1 MB
3. Client out put buffer limit is 2 MB
4. Client writes a 3 MB key
5. The shared replication buffer will have a single node which contains
the key written above, and it exceeds the backlog size.

At this point the client output buffer usage calculation will report the
replica buffer to be 3 MB (or more) even after sending all the data to
the replica.
The primary drops the replica connection for exceeding the limits,
the replica reconnects and successfully executes partial sync but the
primary will drop the connection again because the buffer usage is still
3 MB. This happens over and over.

To mitigate the problem, this fix limits the maximum size of a single
backlog node to be (repl_backlog_size/16). This way a single node can't
exceed the limits of the COB (the COB has to be larger than the
backlog).
It also means that if the backlog has some excessive data it can't trim,
it would be at most about 6% overuse.

other notes:
1. a loop was added in feedReplicationBuffer which caused a massive LOC
  change due to indentation, the actual changes are just the `min(max` and the loop.
3. an unrelated change in an existing test to speed up a server termination which took 10 seconds.

Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 7be7834e65)
2023-03-20 19:16:50 +02:00
Binbin f8ae7a414c Fix the bug that CLIENT REPLY OFF|SKIP cannot receive push notifications (#11875)
This bug seems to be there forever, CLIENT REPLY OFF|SKIP will
mark the client with CLIENT_REPLY_OFF or CLIENT_REPLY_SKIP flags.
With these flags, prepareClientToWrite called by addReply* will
return C_ERR directly. So the client can't receive the Pub/Sub
messages and any other push notifications, e.g client side tracking.

In this PR, we adding a CLIENT_PUSHING flag, disables the reply
silencing flags. When adding push replies, set the flag, after the reply,
clear the flag. Then add the flag check in prepareClientToWrite.

Fixes #11874

Note, the SUBSCRIBE command response is a bit awkward,
see https://github.com/redis/redis-doc/pull/2327

Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 416842e6c0)
2023-03-20 19:16:50 +02:00
Madelyn Olson 17181517ec Always compact nodes in stream listpacks after creating new nodes (#11885)
This change attempts to alleviate a minor memory usage degradation for Redis 6.2 and onwards when using rather large objects (~2k) in streams. Introduced in #6281, we pre-allocate the head nodes of a stream to be 4kb, to limit the amount of unnecessary initial reallocations that are done. However, if we only ever allocate one object because 2 objects exceeds the max_stream_entry_size, we never actually shrink it to fit the single item. This can lead to a lot of excessive memory usage. For smaller item sizes this becomes less of an issue, as the overhead decreases as the items become smaller in size.

This commit also changes the MEMORY USAGE of streams, since it was reporting the lpBytes instead of the allocated size. This introduced an observability issue when diagnosing the memory issue, since Redis reported the same amount of used bytes pre and post change, even though the new implementation allocated more memory.

(cherry picked from commit 2bb29e4aa3)
2023-03-20 19:16:50 +02:00
Ozan Tezcan a39032214d Ignore RM_Call deny-oom flag if maxmemory is zero (#11319)
If a command gets an OOM response and then if we set maxmemory to zero
to disable the limit, server.pre_command_oom_state never gets updated
and it stays true. As RM_Call() calls with "respect deny-oom" flag checks
server.pre_command_oom_state, all calls will fail with OOM.

Added server.maxmemory check in RM_Call() to process deny-oom flag
only if maxmemory is configured.

(cherry picked from commit 18920813a9)
2023-03-20 19:16:50 +02:00
Oran Agra 86920532f7 Redis 7.0.9 2023-02-28 18:32:34 +02:00
Oran Agra 2a2a582e7c Integer Overflow in RAND commands can lead to assertion (CVE-2023-25155)
Issue happens when passing a negative long value that greater than
the max positive value that the long can store.
2023-02-28 18:32:34 +02:00
Tom Levy 0825552565 String pattern matching had exponential time complexity on pathological patterns (CVE-2022-36021)
Authenticated users can use string matching commands with a
specially crafted pattern to trigger a denial-of-service attack on Redis,
causing it to hang and consume 100% CPU time.
2023-02-28 18:32:34 +02:00
ranshid 7091b495a0 Fix possible memory corruption in FLUSHALL when a client watches more than one key (#11854)
Avoid calling unwatchAllKeys when running touchAllWatchedKeysInDb (which was unnecessary)
This can potentially lead to use-after-free and memory corruption when the next entry
pointer held by the watched keys iterator is freed when unwatching all keys of a specific client.
found with address sanitizer, added a test which will not always fail (depending on the random
dict hashing seed)
problem introduced in #9829 (Reids 7.0)

Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 18017df7c1)
2023-02-28 18:32:34 +02:00
Madelyn Olson ddf984d058 Prevent Redis from crashing from key tracking invalidations (#11814) 2023-02-28 18:32:34 +02:00
judeng feb796d312 add test case and comments for active expiry in the writeable replica (#11789)
This test case is to cover a edge scenario: when a writable replica enabled AOF
at the same time, active expiry keys which was created in writable replicas should
propagate to the AOF file, and some versions might crash (fixed by #11615).
For details, please refer to #11778

(cherry picked from commit 40659c3424)
2023-02-28 18:32:34 +02:00
zhaozhao.zz c2bedf2dd5 correct cluster inbound link keepalive time (#11785)
(cherry picked from commit a35e08370a)
2023-02-28 18:32:34 +02:00
guybe7 2db720591b SCAN/RANDOMKEY and lazy-expire (#11788)
Starting from Redis 7.0 (#9890) we started wrapping everything a command
 propagates with MULTI/EXEC. The problem is that both SCAN and RANDOMKEY can
lazy-expire arbitrary keys (similar behavior to active-expire), and put DELs in a transaction.

Fix: When these commands are called without a parent exec-unit (e.g. not in EVAL or
MULTI) we avoid wrapping their DELs in a transaction (for the same reasons active-expire
and eviction avoids a transaction)

This PR adds a per-command flag that indicates that the command may touch arbitrary
keys (not the ones in the arguments), and uses that flag to avoid the MULTI-EXEC.
For now, this flag is internal, since we're considering other solutions for the future.

Note for cluster mode: if SCAN/RANDOMKEY is inside EVAL/MULTI it can still cause the
same situation (as it always did), but it won't cause a CROSSSLOT because replicas and AOF
do not perform slot checks.
The problem with the above is mainly for 3rd party ecosystem tools that propagate commands
from master to master, or feed an AOF file with redis-cli into a master.
This PR aims to fix the regression in redis 7.0, and we opened #11792 to try to handle the
bigger problem with lazy expire better for another release.

(cherry picked from commit fd82bccd0e)
2023-02-28 18:32:34 +02:00
Ran Shidlansik ab05b28b22 fix cluster propagation in case of disconnected cluster node, see #11752
The mentioned PR which was fixed before 7.2 needed these adjustments in
order to fix the problem in redis 7.0.
2023-02-28 18:32:34 +02:00
Harkrishn Patro ca0b6caeed Propagate message to a node only if the cluster link is healthy. (#11752)
Currently while a sharded pubsub message publish tries to propagate the message across the cluster, a NULL check is missing for clusterLink. clusterLink could be NULL if the link is causing memory beyond the set threshold cluster-link-sendbuf-limit and server terminates the link.

This change introduces two things:

Avoids the engine crashes on the publishing node if a message is tried to be sent to a node and the link is NULL.
Adds a debugging tool CLUSTERLINK KILL to terminate the clusterLink between two nodes.

(cherry picked from commit fd3975684a)
2023-02-28 18:32:34 +02:00
Binbin 5aaa1a271c Document some fields history of CLIENT LIST command (#11729)
Change history:
- `user` added in 6.0.0, 0f42447a0e
- `argv-mem` and `tot-mem` added in 6.2.0, bea40e6a41
- `redir` added in 6.2.0, dd1f20edc5
- `resp` added in 7.0.0, 7c376398b1
- `multi-mem` added in 7.0.0, 2753429c99
- `rbs` and `rbp` added in 7.0.0, 47c51d0c78
- `ssub` added in 7.0.3, 35c2ee8716

(cherry picked from commit e7f35edb13)
2023-02-28 18:32:34 +02:00
uriyage af80a4a554 Optimization: sdsRemoveFreeSpace to avoid realloc on noop (#11766)
In #7875 (Redis 6.2), we changed the sds alloc to be the usable allocation
size in order to:

> reduce the need for realloc calls by making the sds implicitly take over
the internal fragmentation

This change was done most sds functions, excluding `sdsRemoveFreeSpace` and
`sdsResize`, the reason is that in some places (e.g. clientsCronResizeQueryBuffer)
we call sdsRemoveFreeSpace when we see excessive free space and want to trim it.
so if we don't trim it exactly to size, the caller may still see excessive free space and
call it again and again.

However, this resulted in some excessive calls to realloc, even when there's no need
and it's gonna be a no-op (e.g. when reducing 15 bytes allocation to 13).

It turns out that a call for realloc with jemalloc can be expensive even if it ends up
doing nothing, so this PR adds a check using `je_nallocx`, which is cheap to avoid
the call for realloc.

in addition to that this PR unifies sdsResize and sdsRemoveFreeSpace into common
code. the difference between them was that sdsResize would avoid using SDS_TYPE_5,
since it want to keep the string ready to be resized again, while sdsRemoveFreeSpace
would permit using SDS_TYPE_5 and get an optimal memory consumption.
now both methods take a `would_regrow` argument that makes it more explicit.

the only actual impact of that is that in clientsCronResizeQueryBuffer we call both sdsResize
and sdsRemoveFreeSpace for in different cases, and we now prevent the use of SDS_TYPE_5 in both.

The new test that was added to cover this concern used to pass before this PR as well,
this PR is just a performance optimization and cleanup.

Benchmark:
`redis-benchmark -c 100 -t set  -d 512 -P 10  -n  100000000`
on i7-9850H with jemalloc, shows improvement from 1021k ops/sec to 1067k (average of 3 runs).
some 4.5% improvement.

Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 46393f9819)
2023-02-28 18:32:34 +02:00