Commit Graph

9503 Commits

Author SHA1 Message Date
Oran Agra 5895d119b1 Redis 6.0.16 2021-10-04 13:59:19 +03:00
sundb aabe95125d Fix the timing of read and write events under kqueue (#9416)
Normally we execute the read event first and then the write event.
When the barrier is set, we will do it reverse.
However, under `kqueue`, if an `fd` has both read and write events,
reading the event using `kevent` will generate two events, which will
result in uncontrolled read and write timing.

This also means that the guarantees of AOF `appendfsync` = `always` are
not met on MacOS without this fix.

The main change to this pr is to cache the events already obtained when reading
them, so that if the same `fd` occurs again, only the mask in the cache is updated,
rather than a new event is generated.

This was exposed by the following test failure on MacOS:
```
*** [err]: AOF fsync always barrier issue in tests/integration/aof.tcl
Expected 544 != 544 (context: type eval line 26 cmd {assert {$size1 != $size2}} proc ::test)
```

(cherry picked from commit 306a5ccd2d)
2021-10-04 13:59:19 +03:00
Wang Yuan 03cb27e8ec Fix the wrong detection of sync_file_range system call (#9371)
If we want to check `defined(SYNC_FILE_RANGE_WAIT_BEFORE)`, we should include fcntl.h.
otherwise, SYNC_FILE_RANGE_WAIT_BEFORE is not defined, and there is alway not `sync_file_range` system call.
Introduced by #8532

(cherry picked from commit 8edc3cd62c)
2021-10-04 13:59:19 +03:00
Ewg-c dde1c975b8 Minor refactoring for rioConnRead and adding errno (#9280)
minor refactoring for rioConnRead and adding errno

(cherry picked from commit a403816405)
2021-10-04 13:59:19 +03:00
zhaozhao.zz 5a82df05aa more strict check in rioConnRead (#7564)
(cherry picked from commit da840e9851)
2021-10-04 13:59:19 +03:00
Oran Agra 12a6b438a3 Fix harmless bug in rioConnRead (#7557)
this code is in use only if the master is disk-based, and the replica is
diskless. In this case we use a buffered reader, but we must avoid reading
past the rdb file, into the command stream. which Luckly rdb.c doesn't
really attempt to do (it knows how much it should read).

When rioConnRead detects that the extra buffering attempt reaches beyond
the read limit it should read less, but if the caller actually requested
more, then it should return with an error rather than a short read. the
bug would have resulted in short read.

in order to fix it, the code must consider the real requested size, and
not the extra buffering size.

(cherry picked from commit 40d7fca368)
2021-10-04 13:59:19 +03:00
YiyuanGUO c6ad876774 Fix integer overflow in _sdsMakeRoomFor (CVE-2021-41099) 2021-10-04 13:59:19 +03:00
Oran Agra f6a40570fa Fix ziplist and listpack overflows and truncations (CVE-2021-32627, CVE-2021-32628)
- fix possible heap corruption in ziplist and listpack resulting by trying to
  allocate more than the maximum size of 4GB.
- prevent ziplist (hash and zset) from reaching size of above 1GB, will be
  converted to HT encoding, that's not a useful size.
- prevent listpack (stream) from reaching size of above 1GB.
- XADD will start a new listpack if the new record may cause the previous
  listpack to grow over 1GB.
- XADD will respond with an error if a single stream record is over 1GB
- List type (ziplist in quicklist) was truncating strings that were over 4GB,
  now it'll respond with an error.
2021-10-04 13:59:19 +03:00
meir@redislabs.com 666ed7facf Fix invalid memory write on lua stack overflow {CVE-2021-32626}
When LUA call our C code, by default, the LUA stack has room for 20
elements. In most cases, this is more than enough but sometimes it's not
and the caller must verify the LUA stack size before he pushes elements.

On 3 places in the code, there was no verification of the LUA stack size.
On specific inputs this missing verification could have lead to invalid
memory write:
1. On 'luaReplyToRedisReply', one might return a nested reply that will
   explode the LUA stack.
2. On 'redisProtocolToLuaType', the Redis reply might be deep enough
   to explode the LUA stack (notice that currently there is no such
   command in Redis that returns such a nested reply, but modules might
   do it)
3. On 'ldbRedis', one might give a command with enough arguments to
   explode the LUA stack (all the arguments will be pushed to the LUA
   stack)

This commit is solving all those 3 issues by calling 'lua_checkstack' and
verify that there is enough room in the LUA stack to push elements. In
case 'lua_checkstack' returns an error (there is not enough room in the
LUA stack and it's not possible to increase the stack), we will do the
following:
1. On 'luaReplyToRedisReply', we will return an error to the user.
2. On 'redisProtocolToLuaType' we will exit with panic (we assume this
   scenario is rare because it can only happen with a module).
3. On 'ldbRedis', we return an error.
2021-10-04 13:59:19 +03:00
meir@redislabs.com 6ac3c0b7ab Fix protocol parsing on 'ldbReplParseCommand' (CVE-2021-32672)
The protocol parsing on 'ldbReplParseCommand' (LUA debugging)
Assumed protocol correctness. This means that if the following
is given:
*1
$100
test
The parser will try to read additional 94 unallocated bytes after
the client buffer.
This commit fixes this issue by validating that there are actually enough
bytes to read. It also limits the amount of data that can be sent by
the debugger client to 1M so the client will not be able to explode
the memory.
2021-10-04 13:59:19 +03:00
Oran Agra 5674b0057f Prevent unauthenticated client from easily consuming lots of memory (CVE-2021-32675)
This change sets a low limit for multibulk and bulk length in the
protocol for unauthenticated connections, so that they can't easily
cause redis to allocate massive amounts of memory by sending just a few
characters on the network.
The new limits are 10 arguments of 16kb each (instead of 1m of 512mb)
2021-10-04 13:59:19 +03:00
Oran Agra bb7597f46e Fix redis-cli / redis-sential overflow on some platforms (CVE-2021-32762)
The redis-cli command line tool and redis-sentinel service may be vulnerable
to integer overflow when parsing specially crafted large multi-bulk network
replies. This is a result of a vulnerability in the underlying hiredis
library which does not perform an overflow check before calling the calloc()
heap allocation function.

This issue only impacts systems with heap allocators that do not perform their
own overflow checks. Most modern systems do and are therefore not likely to
be affected. Furthermore, by default redis-sentinel uses the jemalloc allocator
which is also not vulnerable.
2021-10-04 13:59:19 +03:00
Oran Agra a30d367a71 Fix Integer overflow issue with intsets (CVE-2021-32687)
The vulnerability involves changing the default set-max-intset-entries
configuration parameter to a very large value and constructing specially
crafted commands to manipulate sets
2021-10-04 13:59:19 +03:00
Oran Agra e0cf85b848 Redis 6.0.15 2021-07-21 21:07:02 +03:00
Huang Zhw 5f49f4fb40 On 32 bit platform, the bit position of GETBIT/SETBIT/BITFIELD/BITCOUNT,BITPOS may overflow (see CVE-2021-32761) (#9191)
GETBIT, SETBIT may access wrong address because of wrap.
BITCOUNT and BITPOS may return wrapped results.
BITFIELD may access the wrong address but also allocate insufficient memory and segfault (see CVE-2021-32761).

This commit uses `uint64_t` or `long long` instead of `size_t`.
related https://github.com/redis/redis/pull/8096

At 32bit platform:
> setbit bit 4294967295 1
(integer) 0
> config set proto-max-bulk-len 536870913
OK
> append bit "\xFF"
(integer) 536870913
> getbit bit 4294967296
(integer) 0

When the bit index is larger than 4294967295, size_t can't hold bit index. In the past,  `proto-max-bulk-len` is limit to 536870912, so there is no problem.

After this commit, bit position is stored in `uint64_t` or `long long`. So when `proto-max-bulk-len > 536870912`, 32bit platforms can still be correct.

For 64bit platform, this problem still exists. The major reason is bit pos 8 times of byte pos. When proto-max-bulk-len is very larger, bit pos may overflow.
But at 64bit platform, we don't have so long string. So this bug may never happen.

Additionally this commit add a test cost `512MB` memory which is tag as `large-memory`. Make freebsd ci and valgrind ci ignore this test.
* This test is disabled in this version since bitops doesn't rely on
proto-max-bulk-len. some of the overflows can still occur so we do want
the fixes.

(cherry picked from commit 71d452876e)
2021-07-21 21:07:02 +03:00
Huang Zhw 2cd271e5d5 Fix missing separator in module info line (usedby and using lists) (#9241)
Fix module info genModulesInfoStringRenderModulesList lack separator when there's more than one module in the list.

Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 1895e134a7)
2021-07-21 21:07:02 +03:00
Binbin 7d9878e4e5 SMOVE only notify dstset when the addition is successful. (#9244)
in case dest key already contains the member, the dest key isn't modified, so the command shouldn't invalidate watch.

(cherry picked from commit 11dc4e59b3)
2021-07-21 21:07:02 +03:00
qetu3790 240ff70ae6 Set TCP keepalive on inbound clusterbus connections (#9230)
Set TCP keepalive on inbound clusterbus connections to prevent memory leak

(cherry picked from commit f03af47a34)
2021-07-21 21:07:02 +03:00
Wen Hui 3941d2e062 redis-cli cluster import support source and target that need auth (#7994)
Make it possible for redis-cli cluster import to work with source and
target that require AUTH.

Adding two different flags --cluster-from-user, --cluster-from-pass
and --cluster-askpass for source node authentication.
Also for target authentication, using existing --user and --pass flag.

Example:

./redis-cli --cluster import 127.0.0.1:7000 --cluster-from 127.0.0.1:6379 --pass 1234 --user default --cluster-from-user default --cluster-from-pass 123456

./redis-cli --cluster import 127.0.0.1:7000 --cluster-from 127.0.0.1:6379 --askpass --cluster-from-user default --cluster-from-askpass


(cherry picked from commit 639b73cd2a)
2021-07-21 21:07:02 +03:00
Oran Agra 62bc09d92b Test infra, handle RESP3 attributes and big-numbers and bools (#9235)
- promote the code in DEBUG PROTOCOL to addReplyBigNum
- DEBUG PROTOCOL ATTRIB skips the attribute when client is RESP2
- networking.c addReply for push and attributes generate assertion when
  called on a RESP2 client, anything else would produce a broken
  protocol that clients can't handle.

(cherry picked from commit 6a5bac309e)
(cherry picked from commit 7f38aa8bc7)
2021-07-21 21:07:02 +03:00
Oran Agra c5446aca4c Tests: add a way to read raw RESP protocol reponses (#9193)
This makes it possible to distinguish between null response and an empty
array (currently the tests infra translates both to an empty string/list)

(cherry picked from commit 7103367ad4)
(cherry picked from commit e04bce2f01)
2021-07-21 21:07:02 +03:00
Huang Zhw 4f59673df5 redis-cli cluster import command may issue wrong MIGRATE command. (#8945)
In clusterManagerCommandImport strcat was used to concat COPY and
REPLACE, the space maybe not enough.
If we use --cluster-replace but not --cluster-copy, the MIGRATE
command contained COPY instead of REPLACE.

(cherry picked from commit a049f6295a)
(cherry picked from commit d4771a995e)
2021-07-21 21:07:02 +03:00
Binbin 1655576e23 Fix accidental deletion of sinterstore command when we meet wrong type error. (#9032)
SINTERSTORE would have deleted the dest key right away,
even when later on it is bound to fail on an (WRONGTYPE) error.

With this change it first picks up all the input keys, and only later
delete the dest key if one is empty.

Also add more tests for some commands.
Mainly focus on
- `wrong type error`:
	expand test case (base on sinter bug) in non-store variant
	add tests for store variant (although it exists in non-store variant, i think it would be better to have same tests)
- the dstkey result when we meet `non-exist key (empty set)` in *store

sdiff:
- improve test case about wrong type error (the one we found in sinter, although it is safe in sdiff)
- add test about using non-exist key (treat it like an empty set)
sdiffstore:
- according to sdiff test case, also add some tests about `wrong type error` and `non-exist key`
- the different is that in sdiffstore, we will consider the `dstkey` result

sunion/sunionstore add more tests (same as above)

sinter/sinterstore also same as above ...

(cherry picked from commit b8a5da80c4)
(cherry picked from commit f4702b8b7a)
2021-07-21 21:07:02 +03:00
Jason Elbaum 8c0f06c2e6 Change return value type for ZPOPMAX/MIN in RESP3 (#8981)
When using RESP3, ZPOPMAX/ZPOPMIN should return nested arrays for consistency
with other commands (e.g. ZRANGE).

We do that only when COUNT argument is present (similarly to how LPOP behaves).
for reasoning see https://github.com/redis/redis/issues/8824#issuecomment-855427955

This is a breaking change only when RESP3 is used, and COUNT argument is present!

(cherry picked from commit 7f342020dc)
(cherry picked from commit caaad2d686)
2021-07-21 21:07:02 +03:00
Rob Snyder 32d923f85a Fix ziplist length updates on bigendian platforms (#2080)
Adds call to intrev16ifbe to ensure ZIPLIST_LENGTH is compared correctly

(cherry picked from commit eaa52719a3)
(cherry picked from commit 4c18123085)
2021-07-21 21:07:02 +03:00
perryitay 8df81c0326 Fail EXEC command in case a watched key is expired (#9194)
There are two issues fixed in this commit:
1. we want to fail the EXEC command in case there is a watched key that's logically
   expired but not yet deleted by active expire or lazy expire.
2. we saw that currently cache time is update in every `call()` (including nested calls),
   this time is being also being use for the isKeyExpired comparison, we want to update
   the cache time only in the first call (execCommand)

Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit ac8b1df885)
2021-07-21 21:07:02 +03:00
Huang Zhw 78c1623553 Do not install a file event to send data to rewrite child when parent stop sending diff to child in aof rewrite. (#8767)
In aof rewrite, when parent stop sending data to child, if there is
new rewrite data, aofChildWriteDiffData write event will be installed.
Then this event is issued and deletes the file event without do anyting.
This will happen over and over again until aof rewrite finish.

This bug used to waste a few system calls per excessive wake-up
(epoll_ctl and epoll_wait) per cycle, each cycle triggered by receiving
a write command from a client.

(cherry picked from commit cb961d8c8e)
2021-07-21 21:07:02 +03:00
Omer Shadmi 518fd2dcc4 Avoid exiting to allow diskless loading to recover from RDB short read on module AUX data (#9199)
Currently a replica is able to recover from a short read (when diskless loading
is enabled) and avoid crashing/exiting, replying to the master and then the rdb
could be sent again by the master for another load attempt by the replica.
There were a few scenarios that were not behaving similarly, such as when
there is no end-of-file marker, or when module aux data failed to load, which
should be allowed to occur due to a short read.

(cherry picked from commit f06d782f5a)
2021-07-21 21:07:02 +03:00
Oran Agra d80c87111b Fix race in client side tracking (#9116)
The `Tracking gets notification of expired keys` test in tracking.tcl
used to hung in valgrind CI quite a lot.

It turns out the reason is that with valgrind and a busy machine, the
server cron active expire cycle could easily run in the same event loop
as the command that created `mykey`, so that when they key got expired,
there were two change events to broadcast, one that set the key and one
that expired it, but since we used raxTryInsert, the client that was
associated with the "last" change was the one that created the key, so
the NOLOOP filtered that event.

This commit adds a test that reproduces the problem by using lazy expire
in a multi-exec which makes sure the key expires in the same event loop
as the one that added it.

(cherry picked from commit 9b564b525d)
2021-07-21 21:07:02 +03:00
Maxim Galushka 981953a6ef redis-cli: support for REDIS_REPLY_SET in CSV and RAW output. (#7338)
Fixes #6792. Added support of REDIS_REPLY_SET in raw and csv output of `./redis-cli`

Test:

run commands to test:
  ./redis-cli -3 --csv COMMAND
  ./redis-cli -3 --raw COMMAND

Now they are returning resuts, were failing with: "Unknown reply type: 10" before the change.

(cherry picked from commit 96bb078577)
2021-07-21 21:07:02 +03:00
YaacovHazan da127ed3dd unregister AE_READABLE from the read pipe in backgroundSaveDoneHandlerSocket (#8991)
In diskless replication, we create a read pipe for the RDB, between the child and the parent.
When we close this pipe (fd), the read handler also needs to be removed from the event loop (if it still registered).
Otherwise, next time we will use the same fd, the registration will be fail (panic), because
we will use EPOLL_CTL_MOD (the fd still register in the event loop), on fd that already removed from epoll_ctl

(cherry picked from commit 501d775583)
2021-07-21 21:07:02 +03:00
guybe7 fe417b06d6 ReplicationCron: Prevent invalid access to freed pointer (#8799)
Fixes #8797 

(cherry picked from commit a60016e061)
2021-07-21 21:07:02 +03:00
guybe7 ea0a376432 Add a timeout mechanism for replicas stuck in fullsync (#8762)
Starting redis 6.0 (part of the TLS feature), diskless master uses pipe from the fork
child so that the parent is the one sending data to the replicas.
This mechanism has an issue in which a hung replica will cause the master to wait
for it to read the data sent to it forever, thus preventing the fork child from terminating
and preventing the creations of any other forks.

This PR adds a timeout mechanism, much like the ACK-based timeout,
we disconnect replicas that aren't reading the RDB file fast enough.

(cherry picked from commit d63d02601f)
2021-07-21 21:07:02 +03:00
Oran Agra 18429ce594 if diskless repl child is killed, make sure to reap the pid (#7742)
Starting redis 6.0 and the changes we made to the diskless master to be
suitable for TLS, I made the master avoid reaping (wait3) the pid of the
child until we know all replicas are done reading their rdb.

I did that in order to avoid a state where the rdb_child_pid is -1 but
we don't yet want to start another fork (still busy serving that data to
replicas).

It turns out that the solution used so far was problematic in case the
fork child was being killed (e.g. by the kernel OOM killer), in that
case there's a chance that we currently disabled the read event on the
rdb pipe, since we're waiting for a replica to become writable again.
and in that scenario the master would have never realized the child
exited, and the replica will remain hung too.
Note that there's no mechanism to detect a hung replica while it's in
rdb transfer state.

The solution here is to add another pipe which is used by the parent to
tell the child it is safe to exit. this mean that when the child exits,
for whatever reason, it is safe to reap it.

Besides that, i'm re-introducing an adjustment to REPLCONF ACK which was
part of #6271 (Accelerate diskless master connections) but was dropped
when that PR was rebased after the TLS fork/pipe changes (5a47794).
Now that RdbPipeCleanup no longer calls checkChildrenDone, and the ACK
has chance to detect that the child exited, it should be the one to call
it so that we don't have to wait for cron (server.hz) to do that.

(cherry picked from commit 573246f73c)
2021-07-21 21:07:02 +03:00
M Sazzadul Hoque f0a91d5a00
Fix 6.0.14 release month to June (#9028) 2021-06-01 20:09:50 +03:00
Oran Agra 941d58d4c9 Redis 6.0.14 2021-06-01 17:03:44 +03:00
Oran Agra dd27c4e15e Fix integer overflow in STRALGO LCS (CVE-2021-32625) (#9011)
An integer overflow bug in Redis version 6.0 or newer can be exploited using the
STRALGO LCS command to corrupt the heap and potentially result with remote code
execution. This is a result of an incomplete fix by CVE-2021-29477.

(cherry picked from commit 1ddecf1958)
2021-06-01 17:03:44 +03:00
patpatbear 0506d85122 sinterstore: add missing keyspace del event when any source set not exists. (#8949)
this patch fixes sinterstore by add missing keyspace del event when any source set not exists.

Co-authored-by: srzhao <srzhao@sysnew.com>
(cherry picked from commit 46d9f31e94)
2021-06-01 17:03:44 +03:00
Oran Agra 0d05570a82 Fix crash unlinking a stream with groups rax and no groups (#8932)
When estimating the effort for unlink, we try to compute the effort of
the first group and extrapolate.
If there's a groups rax that's empty, there'a an assertion.

reproduce:
xadd s * a b
xgroup create s bla $
xgroup destroy s bla
unlink s

(cherry picked from commit 97108845e2)
2021-06-01 17:03:44 +03:00
ikeberlein 1d4ebefbff
Make pubsub mode grep friendly (#8998) 2021-05-29 20:08:33 -07:00
Oran Agra 68e93c22c3 Redis 6.0.13 2021-05-03 22:56:49 +03:00
Oran Agra adf8f6f63f Resolve nonsense static analysis warnings
(cherry picked from commit fd7d51c353)
2021-05-03 22:56:49 +03:00
Oran Agra 394614a5f9 Fix integer overflow in STRALGO LCS (CVE-2021-29477)
An integer overflow bug in Redis version 6.0 or newer could be exploited using
the STRALGO LCS command to corrupt the heap and potentially result with remote
code execution.

(cherry picked from commit f0c5f920d0)
2021-05-03 22:56:49 +03:00
Oran Agra 789f101560 Fix integer overflow in intset (CVE-2021-29478)
An integer overflow bug in Redis 6.2 could be exploited to corrupt the heap and
potentially result with remote code execution.

The vulnerability involves changing the default set-max-intset-entries
configuration value, creating a large set key that consists of integer values
and using the COPY command to duplicate it.

The integer overflow bug exists in all versions of Redis starting with 2.6,
where it could result with a corrupted RDB or DUMP payload, but not exploited
through COPY (which did not exist before 6.2).

(cherry picked from commit 29900d4e6b)
2021-05-03 22:56:49 +03:00
Igor 5c6f0b26c0 Introduce fast path to bypass expensive serveClientsBlockedOnKeyByModule call (#8689)
Introduce fast path to bypass expensive serveClientsBlockedOnKeyByModule call

(cherry picked from commit cf0a909e2d)
2021-05-03 22:56:49 +03:00
Madelyn Olson 8bcca06cb0 Skip unecessary check for pong recieved (#8585)
(cherry picked from commit 98d2e0017d)
2021-05-03 22:56:49 +03:00
Oran Agra aae2d6e351 Fix crash report killed by message (#8683)
We sometimes see the crash report saying we were killed by a random
process even in cases where the crash was spontanius in redis.
for instance, crashes found by the corrupt-dump test.

It looks like this si_pid is sometimes left uninitialized, and a good
way to tell if the crash originated in redis or trigged by outside is to
look at si_code, real signal codes are always > 0, and ones generated by
kill are have si_code of 0 or below.

(cherry picked from commit b45b0d81bb)
2021-05-03 22:56:49 +03:00
Yossi Gottlieb 9469401088 Server won't start on alpine/libmusl without IPv6. (#8655)
listenToPort attempts to gracefully handle and ignore certain errors but does not store errno prior to logging, which in turn calls several libc functions that may overwrite errno.

This has been discovered due to libmusl strftime() always returning with errno set to EINVAL, which resulted with docker-library/redis#273.

(cherry picked from commit df5f543b65)
2021-05-03 22:56:49 +03:00
guybe7 1b03a0d754 Fix edge-case when a module client is unblocked (#8618)
Scenario:
1. A module client is blocked on keys with a timeout
2. Shortly before the timeout expires, the key is being populated and signaled
   as ready
3. Redis calls moduleTryServeClientBlockedOnKey (which replies to client) and
   then moduleUnblockClient
4. moduleUnblockClient doesn't really unblock the client, it writes to
   server.module_blocked_pipe and only marks the BC as unblocked.
5. beforeSleep kics in, by this time the client still exists and techincally
   timed-out. beforeSleep replies to the timeout client (double reply) and
   only then moduleHandleBlockedClients is called, reading from module_blocked_pipe
   and calling unblockClient

The solution is similar to what was done in moduleTryServeClientBlockedOnKey: we
should avoid re-processing an already-unblocked client

(cherry picked from commit e58118cda6)
2021-05-03 22:56:49 +03:00
Oran Agra f86385f70b Redis 6.0.12 2021-03-02 08:13:08 +02:00