Commit Graph

12111 Commits

Author SHA1 Message Date
Binbin 0e5a4a27ea
Call emptyData when disk-based sync rdbLoad fails (#12510)
We already do this in diskless on-empty-db mode: when diskless
loading fails, we call emptyData to remove the half-loaded data in
case we started with an empty replica.

Now, when a disk-based sync rdbLoad fails, we call emptyData too,
in case it loaded partial, incomplete data.

When the replica attempts another re-sync, it'll empty the dataset
again anyway, so this affects two things:
1. memory consumption in the time gap until the next rdb loading begins
2. if the unsynced replica is for some reason promoted, it would have kept
  the partial dataset instead of being empty.
2024-01-18 16:28:52 +02:00
Binbin 29e6245a05
Fix unexpected resize causing test failure (#12960)
Before #12850, we would only try to shrink the dict in serverCron,
which we could control by using a child process, but now every time
we delete a key, the shrink check is called.

In these tests (added in #12802), we meant to disable the resizing,
but during the delete, the dict meets the force-shrink condition,
e.g. 2 / 128 = 0.015 < 0.2, so the delete triggers a force resize
and causes the test to fail.

In this commit, we keep the load factor at 3 / 128 = 0.023, that is,
we do not meet the force-shrink condition.
2024-01-18 11:19:29 +02:00
Binbin 14b1edfd99
Fix dict resize ratio checks, avoid precision loss from integer division (#12952)
In the past we used integers to compare ratios. Let us assume we
have the following data when expanding:
```
used / size > 5
`80 / 16 > 5` is false
`81 / 16 > 5` is false
`95 / 16 > 5` is false
`96 / 16 > 5` is true
```

Because the integer result is truncated, our resize breaks the ratio
constraint. This has existed since the beginning and resulted in us not
strictly following the ratio (shrink has the same issue).

This PR changes it to a multiplication to avoid floating point
calculations.
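
For illustration, a self-contained C sketch of the difference (not the actual dict.c code):

```c
#include <stdio.h>

#define FORCE_RESIZE_RATIO 5

int main(void) {
    unsigned long size = 16;
    for (unsigned long used = 80; used <= 96; used += 8) {
        int by_division       = used / size > FORCE_RESIZE_RATIO;  /* truncates: misses 81..95 */
        int by_multiplication = used > size * FORCE_RESIZE_RATIO;  /* exact integer comparison */
        printf("used=%lu division=%d multiplication=%d\n", used, by_division, by_multiplication);
    }
    return 0;
}
```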
2024-01-18 11:16:50 +02:00
Binbin 131d95f203
Fix race in slot dict resize test (#12942)
The test has a race:
```
*** [err]: Redis can rewind and trigger smaller slot resizing in tests/unit/other.tcl
Expected '[Dictionary HT]
Hash table 0 stats (main hash table):
 table size: 12
 number of elements: 2
[Expires HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
' to match '*table size: 8*' (context: type eval line 12 cmd {assert_match "*table size: 8*" [r debug HTSTATS 0]} proc ::test)
```

When `r del "{alice}$j"` is executed in the loop, once the number of
remaining keys drops into [9, 12], the load factor has met
HASHTABLE_MIN_FILL; if serverCron happens to trigger the slot dict resize,
the test will fail, because there is no way to meet HASHTABLE_MIN_FILL
again in the subsequent dels.

The solution is to avoid triggering the resize in advance. We can
use MULTI to delete them at once, or we can disable the resize.
Since we disabled resize in the previous test, this fix also uses
the method of disabling resize.

The test is introduced in #12802.
2024-01-17 08:46:09 +02:00
Binbin ecc31bc697
Updated comments on dictResizeEnable for new dict shrink (#12946)
The new shrink was added in #12850.
Also updated outdated comments, see #11692.
2024-01-15 10:28:24 +02:00
Yanqi Lv e2b7932b34
Shrink dict when deleting dictEntry (#12850)
When we insert entries into a dict, it may autonomously expand if needed.
However, when we delete entries from a dict, it doesn't shrink to the
proper size. If there are few entries in a very large dict, it may cause
a huge waste of memory and inefficiency when iterating.

The main keyspace dicts (keys and expires) are shrunk by cron
(`tryResizeHashTables` calls `htNeedsResize` and `dictResize`),
and some data structures such as zset and hash also do that (call
`htNeedsResize`) right after a loop of calls to `dictDelete`,
but many other dicts are completely missing that call (they can only
expand).

In this PR, we provide the ability to automatically shrink the dict when
deleting. The conditions triggering the shrinking are the same as
`htNeedsResize` used to have, i.e. we expand when we're over 100%
utilization and shrink when we're below 10% utilization.
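
A rough sketch of those trigger conditions (illustrative names, not the actual dict.c code):

```c
#define DICT_HT_INITIAL_SIZE 4

/* Expand when over 100% utilization. */
static int needs_expand(unsigned long used, unsigned long buckets) {
    return used >= buckets;
}

/* Shrink when below 10% utilization, but never below the initial size. */
static int needs_shrink(unsigned long used, unsigned long buckets) {
    return buckets > DICT_HT_INITIAL_SIZE && used * 10 < buckets;
}
```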

Additionally:
* Add `dictPauseAutoResize` so that flows that do mass deletions, will
only trigger shrinkage at the end.
* Rename `dictResize` to `dictShrinkToFit` (same logic as it used to
have, but better name describing it)
* Rename `_dictExpand` to `_dictResize` (same logic as it used to have,
but better name describing it)
 
related to discussion
https://github.com/redis/redis/pull/12819#discussion_r1409293878

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
2024-01-15 08:20:53 +02:00
zhaozhao.zz bb2b6e2927
fix scripts access wrong slot if they disagree with pre-declared keys (#12906)
Regarding how to obtain the hash slot of a key, there is an optimization
in `getKeySlot()` that avoids redundant hash calculations for keys: when
the current client is in the process of executing a command, it can
directly use the slot of the current client, because the slot to access
has already been calculated in advance in `processCommand()`.
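
As a rough, self-contained sketch of that caching idea (illustrative stubs, not the actual getKeySlot() code):

```c
#include <string.h>

#define CLUSTER_SLOTS 16384

typedef struct { int slot; } client;   /* stand-in for the real client struct */

/* Stand-in for keyHashSlot(); the real one uses CRC16 and hash tags. */
static int key_hash_slot_stub(const char *key, size_t len) {
    unsigned h = 5381;
    for (size_t i = 0; i < len; i++) h = h * 33 + (unsigned char)key[i];
    return (int)(h % CLUSTER_SLOTS);
}

static int get_key_slot_sketch(client *c, const char *key) {
    /* Reuse the slot already computed for the current command, if any;
     * the bug fixed here is that for scripts this cached slot could be
     * stale when the command touches keys outside the declared ones. */
    if (c && c->slot != -1) return c->slot;
    return key_hash_slot_stub(key, strlen(key));
}
```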

However, scripts are a special case where, in default mode or with
`allow-cross-slot-keys` enabled, they are allowed to access keys beyond
the pre-declared range. This means that the keys they operate on may not
belong to the slot of the pre-declared keys. Currently, when the
commands in a script are executed, the slot of the original client
(i.e., the current client) is not correctly updated, leading to
subsequent access to the wrong slot.

This PR fixes the above issue. When checking the cluster constraints in
a script, the slot to be accessed by the current command is set for the
original client (i.e., the current client). This ensures that
`getKeySlot()` gets the correct slot cache.

Additionally, the following modifications are made:

1. The 'sort' and 'sort_ro' commands use `getKeySlot()` instead of
`c->slot`, because the client could be an engine client in a script,
which could lead to a potential bug.
2. `getKeySlot()` is also used in pubsub to obtain the slot for the
channel, standardizing the way slots are retrieved.
2024-01-15 09:57:12 +08:00
Binbin 284ef21ea0
Fix fd check in memtest_test_linux_anonymous_maps (#12943)
The open function returns a fd on success or -1 on failure, so here we
should check fd != -1; otherwise -1 will be judged as success.
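
A minimal sketch of the corrected check (illustrative file name): open(2) returns -1 on failure, and any non-negative value, including 0, is a valid descriptor:

```c
#include <fcntl.h>
#include <unistd.h>

int read_maps_sketch(void) {
    int fd = open("/proc/self/maps", O_RDONLY);
    if (fd == -1) return -1;   /* failure: any value other than -1 is a usable fd */
    /* ... read from fd ... */
    close(fd);
    return 0;
}
```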

This closes #12938.
2024-01-14 11:18:17 +02:00
Chen Tianjie 87786342a5
Correct bytes_per_key computing. (#12897)
Change the calculation method of bytes_per_key to make it closer to
the true average key size. The calculation method is as follows:

mh->bytes_per_key = mh->total_keys ? (mh->dataset / mh->total_keys) : 0;
2024-01-12 11:58:53 +08:00
Harkrishn Patro 964f4a4576
Avoid double free of cluster link (#12930)
Avoid a crash while performing `DEBUG CLUSTERLINK KILL` multiple times
(the cluster link might not be created/valid).
2024-01-11 15:59:22 -08:00
bentotten b3aaa0a136
When one shard, sole primary node marks potentially failed replica as FAIL instead of PFAIL (#12824)
Fixes issue where a single primary cannot mark a replica as failed in a
single-shard cluster.
2024-01-11 15:48:19 -08:00
Binbin b351a04b1e
Add announced-endpoints test to all_tests and fix tls related tests (#12927)
The test was introduced in #10745, but we forgot to add it to the
test_helper.tcl, so our CI did not actually run it. This PR adds it
and ensures it passes CI tests.
2024-01-09 18:18:59 -08:00
Oran Agra f7b1d0287d
Fix possible corruption in sdsResize (CVE-2023-41056) (#12924)
#11766 introduced a bug in sdsResize where it could forget to update the
sds type in the sds header and then cause an overflow in sdsalloc. It
looks like the only implication of that is a possible assertion in HLL,
but it's hard to rule out possible heap corruption issues with
clientsCronResizeQueryBuffer.
2024-01-09 13:51:56 +02:00
Madelyn Olson 8bb9a2895e
Address some failures with new tests for improving debug report (#12915)
Fix a daily test failure because Alpine doesn't support stack traces, and
add an extra assertion to make sure the stack trace was printed twice.
2024-01-08 17:56:06 -08:00
Binbin 14e4a9835a
Fix minor fd leak in rdbSaveToSlavesSockets (#12919)
We should close server.rdb_child_exit_pipe when redisFork fails,
otherwise the pipe fd will be leaked.

Just a cleanup.
2024-01-08 17:36:34 +02:00
Andy Pan 50b8b99763
Re-indent code and reduce code being compiled on Solaris for anetKeepAlive (#12914)
This is a follow-up PR for #12782, in which we introduced nested
preprocessor directives for TCP keep-alive on Solaris and added
redundant indentation for code. Besides, it could result in unreachable
code due to the lack of `#else` on the latest Solaris 11.4 where
`TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` are available. As a
result, this PR does three main things:

- To eliminate the redundant indentation for C code in nested preprocessor
directives
- To add `#else` directives and move the `TCP_KEEPALIVE_THRESHOLD` +
`TCP_KEEPALIVE_ABORT_THRESHOLD` settings under them, avoiding unreachable
code and compiler warnings when `#if defined(TCP_KEEPIDLE) &&
defined(TCP_KEEPINTVL) && defined(TCP_KEEPCNT)` is met on Solaris 11.4
- To remove a few trailing whitespaces in comments
2024-01-08 11:12:24 +02:00
Yanqi Lv c452e414a8
Optimize performance when many clients [p|s]unsubscribe simultaneously (#12838)
I've been testing the performance of Pub/Sub commands recently. I find
that if many clients unsubscribe or are killed simultaneously, Redis needs
a long time to deal with it.

In my experiment, I set up 5000 clients and each client subscribes to 100
channels. Then I call `client kill type pubsub` to simulate the
situation where clients unsubscribe all channels at the same time, and
measure the execution time. The result shows that it takes about 23s.
Using _perf_, I find that `listSearchKey` in
`pubsubUnsubscribeChannel` costs more than 90% of the CPU time. I think we
can optimize this situation.

In this PR, I replace the list with a dict to track the clients subscribing
to the channel more efficiently. This changes O(N) to O(1) in the search
phase. Then I repeat the experiment as above. The results are as
follows.

|                    | Execution Time (s) | used_memory (MB) |
| :----------------- | :----------------: | :--------------: |
| unstable (1bd0b54) | 23.734             | 65.41            |
| optimize-pubsub    | 0.288              | 67.66            |

Thanks to #11595, I use a no-value dict and the results show that the
performance improves significantly while the memory usage only increases
slightly.
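
As a rough illustration of the data-structure change (assuming the Redis adlist/dict APIs; not the actual pubsub.c code):

```c
#include "adlist.h"   /* Redis linked-list API */
#include "dict.h"     /* Redis hash-table API */

/* With a list, removing one unsubscribing client means a linear scan per
 * channel; with a (no-value) dict keyed by the client pointer, it is a
 * single O(1) deletion. */
void unsubscribe_from_list(list *subscribers, void *client) {
    listNode *ln = listSearchKey(subscribers, client);  /* O(N) scan */
    if (ln) listDelNode(subscribers, ln);
}

void unsubscribe_from_dict(dict *subscribers, void *client) {
    dictDelete(subscribers, client);                    /* O(1) removal */
}
```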

Notice:

- This PR will cause a performance degradation of about 20% in the
`[p|s]subscribe` commands, but won't freeze Redis.
2024-01-08 10:32:31 +02:00
debing.sun 4730563e93
Change destination key's key-spec flag from RW to OW for SINTERSTORE command (#12917)
In #10122, we set the destination key's flag of SINTERSTORE to `RW`;
however, this command doesn't actually read or modify the destination
key, it just overwrites it.
Therefore, we change it to `OW`, like all other *STORE commands.
2024-01-08 10:17:13 +02:00
Binbin 5b0c6a8255
Fix CLUSTER SHARDS crash in 7.0/7.2 mixed clusters where shard ids are not in sync (#12832)
Crash reported in #12695. In the process of upgrading the cluster from
7.0 to 7.2, because the 7.0 nodes will not gossip shard id, in 7.2 we
will rely on shard id to build the server.cluster->shards dict.

Consider, for example, a 7.0 master node with a 7.2 replica node. From the
view of the 7.2 replica node, the cluster->shards dictionary does not have
its master node. In this case calling CLUSTER SHARDS on the 7.2 replica
node may crash.

We should fix the underlying assumption of updateShardId, which is that the
shard dict should be always in sync with the node's shard_id. The fix was
suggested by PingXie, see more details in #12695.
2024-01-07 20:54:41 -08:00
debing.sun ca1f67af80
Make RM_Yield thread-safe (#12905)
## Issues and solutions from #12817
1. Touching ProcessingEventsWhileBlocked and calling moduleCount() without
the GIL in afterSleep()
    - Introduced: 
       Version: 7.0.0
       PR: #9963

   - Harm Level: Very High
If the module thread calls `RM_Yield()` before the main thread enters
afterSleep(),
and modifies `ProcessingEventsWhileBlocked`(+1), it will cause the main
thread to not wait for GIL,
which can lead to all kinds of unforeseen problems, including memory
data corruption.

   - Initial / Abandoned Solution:
      * Added `__thread` specifier for ProcessingEventsWhileBlocked.
`ProcessingEventsWhileBlocked` is used to protect against nested event
processing, but event processing
in the main thread and module threads should be completely independent
and unaffected, so it is safer
         to use TLS.
* Adding a cached module count to keep track of the current number of
modules, to avoid having to use `dictSize()`.
    
    - Related Warnings:
```
WARNING: ThreadSanitizer: data race (pid=1136)
  Write of size 4 at 0x0001045990c0 by thread T4 (mutexes: write M0):
    #0 processEventsWhileBlocked networking.c:4135 (redis-server:arm64+0x10006d124)
    #1 RM_Yield module.c:2410 (redis-server:arm64+0x10018b66c)
    #2 bg_call_worker <null>:83232836 (blockedclient.so:arm64+0x16a8)

  Previous read of size 4 at 0x0001045990c0 by main thread:
    #0 afterSleep server.c:1861 (redis-server:arm64+0x100024f98)
    #1 aeProcessEvents ae.c:408 (redis-server:arm64+0x10000fd64)
    #2 aeMain ae.c:496 (redis-server:arm64+0x100010f0c)
    #3 main server.c:7220 (redis-server:arm64+0x10003f38c)
```

2. aeApiPoll() is not thread-safe
When using RM_Yield to handle events in a module thread, if the main
thread has not yet
entered `afterSleep()`, both the module thread and the main thread may
touch `server.el` at the same time.

    - Introduced: 
       Version: 7.0.0
       PR: #9963

   - Old / Abandoned Solution:
Adding a new mutex to protect the window after beforeSleep() and
before afterSleep().
Defect: If the main thread enters the ae loop without any IO events, it
will wait until
the next timeout or until there is any event again, and the module
thread will
always hang until the main thread leaves the event loop.

    - Related Warnings:
```
SUMMARY: ThreadSanitizer: data race ae_kqueue.c:55 in addEventMask
==================
==================
WARNING: ThreadSanitizer: data race (pid=14682)
  Write of size 4 at 0x000100b54000 by thread T9 (mutexes: write M0):
    #0 aeApiPoll ae_kqueue.c:175 (redis-server:arm64+0x100010588)
    #1 aeProcessEvents ae.c:399 (redis-server:arm64+0x10000fb84)
    #2 processEventsWhileBlocked networking.c:4138 (redis-server:arm64+0x10006d3c4)
    #3 RM_Yield module.c:2410 (redis-server:arm64+0x10018b66c)
    #4 bg_call_worker <null>:16042052 (blockedclient.so:arm64+0x169c)

  Previous write of size 4 at 0x000100b54000 by main thread:
    #0 aeApiPoll ae_kqueue.c:175 (redis-server:arm64+0x100010588)
    #1 aeProcessEvents ae.c:399 (redis-server:arm64+0x10000fb84)
    #2 aeMain ae.c:496 (redis-server:arm64+0x100010da8)
    #3 main server.c:7238 (redis-server:arm64+0x10003f51c)
```

## The final fix, as per the comments:
https://github.com/redis/redis/pull/12817#discussion_r1436427232
Optimized solution based on the above comment:

First, we add `module_gil_acquring` to indicate whether the main thread
is currently in the acquiring GIL state.

When the module thread starts to yield, there are two possibilities (we
assume the caller keeps the GIL):
1. The main thread is in the middle of beforeSleep() and afterSleep(), that
is, `module_gil_acquring` is not 1 now.
At this point, the module thread will wake up the main thread through
the pipe and leave the yield, waiting for the next yield, when the main
thread may already be in the acquiring-GIL state.
    
2. The main thread is in the acquiring-GIL state.
The module thread releases the GIL, yielding the CPU to give the main
thread an opportunity to start event processing, and then acquires the
GIL again until the main thread releases it.
This is the direction mentioned in
https://github.com/redis/redis/pull/12817#discussion_r1436427232.

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
2024-01-07 12:10:29 +02:00
Binbin 4cae66f5e8
Use shard-id of the master if the replica does not support shard-id (#12805)
If there are nodes in the cluster that do not support shard-id, they
will gossip shard-id. From the perspective of nodes that support shard-id,
their shard-id is meaningless (since shard-id is randomly generated when
we create a node.)

Nodes that support shard-id will save the shard-id information in nodes.conf.
If the node is restarted from nodes.conf, the server will report a
corrupted cluster config file error, because auxShardIdSetter will reject
configurations with inconsistent master-replica shard-ids.

A cluster-wide consensus for the node's shard_id is not necessary. The key
is maintaining consistency of the shard_id on each individual 7.2 node.
As the cluster progressively upgrades to version 7.2, we can expect the
shard_ids across all nodes to naturally converge and align.

In this PR, when processing the gossip, if sender is a replica and does not
support shard-id, set the shard_id to the shard_id of its master.
2024-01-06 20:24:41 -08:00
dependabot[bot] 38f0234946
Bump cross-platform-actions/action from 0.21.1 to 0.22.0 (#12904)
Bumps [cross-platform-actions/action](https://github.com/cross-platform-actions/action) from 0.21.1 to 0.22.0.

Release notes for 0.22.0 (2023-12-27):
- Added support for using the action in multiple steps of the same job (cross-platform-actions/action#26). All inputs must be the same across steps, except `sync_files`, `shutdown_vm` and `run`.
- Added a `shutdown_vm` input for specifying that the VM should not shut down after the action has run, which should mitigate very frequent freezing of the VM during teardown (cross-platform-actions/action#61, cross-platform-actions/action#72).
- Always terminate the VM instead of shutting it down; this is more efficient and should also mitigate the teardown freezes (cross-platform-actions/action#61, cross-platform-actions/action#72).
- Use `unsafe` as the cache mode for QEMU disks, which should improve performance (cross-platform-actions/action#67).

Full diff: https://github.com/cross-platform-actions/action/compare/v0.21.1...v0.22.0

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-04 22:38:33 +02:00
Lior Kogan 5189838350
Update CONTRIBUTING.md (#12907)
- Referring to Redis Discord channel instead of the mailing list.
- Referring to the licensing instead of repeating it.
2024-01-03 17:21:19 +02:00
Madelyn Olson 068051e378
Handle recursive serverAsserts and provide more information for recursive segfaults (#12857)
This change is trying to make two failure modes a bit easier to deep dive:
1. If a serverPanic or serverAssert occurs during the info (or module)
printing, it will recursively panic, which is a lot of fun as it will
just keep recursively printing. It will eventually stack overflow, but
will generate a lot of text in the process.
2. When a segfault happens during the segfault handler, no information
is communicated other than it happened. This can be problematic because
`info` may help diagnose the real issue, but without fixing the
recursive crash it might be hard to get at that info.
2024-01-02 18:20:22 -08:00
AshMosh c3f8b542ee
Manage number of new connections per cycle (#12178)
There are situations (especially in TLS) in which the engine gets too occupied managing a large number of new connections. Existing connections may time out while the server is processing the new connections' initial TLS handshakes, which may cause new connections to be established, perpetuating the problem. To better manage the tradeoff between the new connection rate and other workloads, this change adds a new config to manage the maximum number of new connections per event loop cycle, instead of using a predetermined number (currently 1000).

This change introduces two new configurations, max-new-connections-per-cycle and max-new-tls-connections-per-cycle. The default value for TCP connections is 10 per cycle and the default value for TLS connections is 1 per cycle.
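For reference, a redis.conf-style sketch of the new settings, using the names and defaults stated above:
```
max-new-connections-per-cycle 10
max-new-tls-connections-per-cycle 1
```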
---------

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-01-02 15:15:03 -08:00
Chen Tianjie 9d0158bf89
Reorder signalModifiedKey in xaddCommand. (#12895)
This PR is a supplement to #11144, moving `signalModifiedKey` in
`xaddCommand` to after the trimming, to ensure the key is in its final
state when the modification is signaled. Currently there is no problem
with Redis, but it is better to avoid issues in future development.
2023-12-28 13:29:27 +02:00
dependabot[bot] 2c5b51ad26
Bump github/codeql-action from 2 to 3 (#12869)
Bumps [github/codeql-action](https://github.com/github/codeql-action)
from 2 to 3.

Release notes: the listed releases are the CodeQL Bundles (CodeQL CLI
v2.15.2 through v2.15.4), each including the CodeQL query and library
packs for C/C++, C#, Go, Java, JavaScript, Python, Ruby, and Swift.
Notable commits in the v2...v3 diff include switching all actions to
node20.

Full diff: https://github.com/github/codeql-action/compare/v2...v3

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-12-28 11:32:23 +02:00
guybe7 12b611b374
WAITAOF: Try to wake blocked clients ASAP in the next beforeSleep (#12627)
In case server.fsynced_reploff changed (e.g. flushAppendOnly set it to
server.master_repl_offset when there was nothing to fsync), we want to
avoid sleeping before the next beforeSleep so we can call
blockedBeforeSleep ASAP.
Without that, in case there's no incoming traffic, we could be waiting
for the next cron timer event to wake us up.
2023-12-28 11:27:58 +02:00
Binbin 99c468c38c
Fix crash caused by pubsubShardUnsubscribeAllChannelsInSlot not deleting the client (#12896)
The code does not delete the corresponding node when traversing the
clients, resulting in a loop and causing the dictDelete() == DICT_OK
assertion to fail.

In addition, we did a cleanup: in the dictCreate scenario, we can avoid a
dictFind call since the dict is empty.

Issue was introduced in #12804.
2023-12-28 08:32:51 +02:00
Binbin 5b1fe925f2
Adjust redis-cli --cluster create arity from -2 to -1 (#12892)
When arity is -2, it allows us to input two nodes, but returns:
```
*** ERROR: Invalid configuration for cluster creation.
*** Redis Cluster requires at least 3 master nodes.
*** This is not possible with 2 nodes and 0 replicas per node.
*** At least 3 nodes are required.
```

When we input one node, it returns:
```
[ERR] Wrong number of arguments for specified --cluster sub command
```

Strictly speaking -2 should also be rejected, because redis-cli
requires at least three nodes. However, the error message was not
very friendly, so we decided to change its arity to -1.

This closes #12891.
2023-12-28 08:26:23 +02:00
Chen Tianjie 8527959598
Replace slots_to_channels radix tree with slot specific dictionaries for shard channels. (#12804)
We have already replaced the `slots_to_keys` radix tree with a key->slot
linked list (#9356), and then replaced the list with slot-specific
dictionaries for keys (#11695).

Shard channels behave just like keys in many ways, and we also need a
slots->channels mapping. Currently this is still done by using a radix
tree. So we should split `server.pubsubshard_channels` into 16384 dicts
and drop the radix tree, just like what we did to DBs.

Some benefits (basically the benefits of what we've done to DBs):
1. Optimize counting channels in a slot. This is currently used only in
removing channels in a slot. But this is potentially more useful:
sometimes we need to know how many channels there are in a specific slot
when doing slot migration. Counting is now implemented by traversing the
radix tree, and with this PR it will be as simple as calling `dictSize`,
from O(n) to O(1).
2. The radix tree in the cluster has been removed. The shard channel
names no longer require additional storage, which can save memory.
3. Potentially useful in slot migration, as shard channels are logically
split by slots, thus making it easier to migrate, remove or add as a
whole.
4. Avoid rehashing a big dict when there is a large number of channels.

Drawbacks:
1. Takes more memory than using radix tree when there are relatively few
shard channels.

What this PR does:
1. in cluster mode, split `server.pubsubshard_channels` into 16384
dicts, in standalone mode, still use only one dict.
2. drop the `slots_to_channels` radix tree.
3. to save memory (to solve the drawback above), all 16384 dicts are
created lazily, which means the dict is only initialized when a channel
is about to be inserted into it, and when all its channels are deleted,
the dict deletes itself (see the sketch below).
4. use `server.shard_channel_count` to keep track of the number of all
shard channels.
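
A rough sketch of that lazy scheme (hypothetical names, assuming the Redis dict API; not the actual pubsub.c code):

```c
#include "dict.h"   /* Redis hash-table API */

#define CLUSTER_SLOTS 16384

static dict *shard_channels[CLUSTER_SLOTS]; /* NULL until a channel lands in the slot */

void shard_channel_add(dictType *type, int slot, void *channel) {
    if (!shard_channels[slot])
        shard_channels[slot] = dictCreate(type);   /* created lazily on first insert */
    dictAdd(shard_channels[slot], channel, NULL);  /* no-value dict */
}

void shard_channel_remove(int slot, void *channel) {
    if (!shard_channels[slot]) return;
    dictDelete(shard_channels[slot], channel);
    if (dictSize(shard_channels[slot]) == 0) {     /* last channel gone: free the dict */
        dictRelease(shard_channels[slot]);
        shard_channels[slot] = NULL;
    }
}
```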

---------

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2023-12-27 17:40:45 +08:00
Moshe Kaplan fa751f9bef
config.c: Avoid leaking file handle if file is 0 bytes (#12828)
If fopen() is successful and redis_fstat determines that the file is 0
bytes, the file handle stored in fp will leak. This change closes the
file handle stored in fp if the file is 0 bytes.
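
A minimal sketch of the corrected flow (illustrative, not the exact config.c code), closing fp on the early-return path once fopen() has succeeded:

```c
#include <stdio.h>
#include <sys/stat.h>

int load_file_sketch(const char *path) {
    FILE *fp = fopen(path, "r");
    if (!fp) return -1;

    struct stat sb;
    if (fstat(fileno(fp), &sb) == -1 || sb.st_size == 0) {
        fclose(fp);   /* previously leaked when the file was 0 bytes */
        return -1;
    }
    /* ... read and process the file ... */
    fclose(fp);
    return 0;
}
```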

Second attempt at fixing Coverity 390029

This is a follow-up to #12796
2023-12-27 08:53:56 +02:00
sundb bef5715374
Fix oom-score-adj test due to no permission (#12887)
Fix #12792

On Ubuntu 23 (Lunar), non-root users are not allowed to change the
oom_score_adj of a process to a value that is too low.
Since the terminal's default oom_score_adj is 200, if we run the test in
a terminal, we won't be able to set the oom_score_adj of the redis process
to 9 or 22, which are too low.

Reproduction in an Ubuntu 23 (Lunar) terminal:
```sh
$ cat /proc/`pgrep redis-server`/oom_score_adj
200
$ echo 100 > /proc/`pgrep redis-server`/oom_score_adj
# success without error
$ echo 99 > /proc/`pgrep redis-server`/oom_score_adj
echo: write error: Permission denied
```

As shown by the output above, the lowest oom_score_adj we can set for the
redis process is 100.
The test is modified so that oom_score_adj only increases upwards and
never decreases.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2023-12-27 08:42:46 +02:00
Andy Pan 1aa633d61b
Implement TCP Keep-Alives across most Unix-like systems (#12782)
## TCP Keep-Alives

[TCP
Keep-Alives](https://datatracker.ietf.org/doc/html/rfc9293#name-tcp-keep-alives)
provides a way to detect whether a TCP connection is alive or dead,
which can be useful for reducing system resources by cleaning up dead
connections.

There is full support of TCP Keep-Alives on Linux and partial support on
macOS in `redis` at present.

This PR intends to complete the rest.

## Unix-like OS's support

`TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` are not included in
the POSIX standard for `setsockopts`, while these three socket options
are widely available on most Unix-like systems and Windows.
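
As a rough illustration (not the actual anetKeepAlive() code), enabling keep-alive probes with these options typically looks like this on platforms that define them:

```c
/* Minimal sketch, assuming a connected TCP socket `fd`; error handling and
 * the Solaris/macOS fallbacks discussed below are omitted. */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

static void enable_keepalive_sketch(int fd, int idle) {
    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &yes, sizeof(yes));
#if defined(TCP_KEEPIDLE) && defined(TCP_KEEPINTVL) && defined(TCP_KEEPCNT)
    int intvl = idle / 3, cnt = 3;
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));    /* seconds before first probe */
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)); /* seconds between probes */
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt));       /* probes before dropping */
#endif
}
```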

### References

- [AIX](https://www.ibm.com/support/pages/ibm-aix-tcp-keepalive-probes)
- [DragonflyBSD](https://man.dragonflybsd.org/?command=tcp&section=4)
- [FreeBSD](https://www.freebsd.org/cgi/man.cgi?query=tcp)
-
[HP-UX](https://docstore.mik.ua/manuals/hp-ux/en/B2355-60130/TCP.7P.html)
- [illumos](https://illumos.org/man/4P/tcp)
- [Linux](https://man7.org/linux/man-pages/man7/tcp.7.html)
- [NetBSD](https://man.netbsd.org/NetBSD-8.0/tcp.4)
-
[Windows](https://learn.microsoft.com/en-us/windows/win32/winsock/ipproto-tcp-socket-options)

### Mac OS

In earlier versions, macOS only supported setting `TCP_KEEPALIVE` (the
equivalent of `TCP_KEEPIDLE` on other platforms), but since macOS 10.8
it has supported `TCP_KEEPINTVL` and `TCP_KEEPCNT`.

Check out [this mailing
list](https://lists.apple.com/archives/macnetworkprog/2012/Jul/msg00005.html)
and [the source
code](https://github.com/apple/darwin-xnu/blob/main/bsd/netinet/tcp.h#L215-L230)
for more details.

### Solaris

Solaris claimed it supported the TCP Keep-Alives mechanism, but
`TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` were not available on
Solaris until the latest version 11.4. Therefore, on earlier Solaris
releases we need to simulate the TCP Keep-Alives mechanism via
`TCP_KEEPALIVE_THRESHOLD` + `TCP_KEEPALIVE_ABORT_THRESHOLD`.

- [Solaris
11.3](https://docs.oracle.com/cd/E86824_01/html/E54777/tcp-7p.html)
- [Solaris
11.4](https://docs.oracle.com/cd/E88353_01/html/E37851/tcp-4p.html)

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
2023-12-26 18:44:18 +02:00
Jeff Liu 27a8e3b04e
fix missing comments (#12878)
add a missing comment for `dont_compress` and fix the bits calculation
2023-12-25 19:30:05 -08:00
zalj baf5699d77
fix comment of aeProcessEvents (#12884)
The implementation of aeProcessEvents seems to have different behavior
from the top comment.
The implementation processes file events first, then time events.
2023-12-25 18:58:14 -08:00
Binbin 71f31da66f
Add restart option to create-cluster script (#12885)
Previously, when testing and debugging the cluster code, you needed
to stop the cluster after making changes and then start it again.
This adds a restart option for ease of use.
2023-12-25 18:36:44 -08:00
Slava Koyfman 20214b26a4
Don't disconnect all clients in ACL LOAD (#12171)
The previous implementation would disconnect _all_ clients when running
`ACL LOAD`, which wasn't very useful.

This change brings the behavior in line with that of `ACL SETUSER`, `ACL
DELUSER`, in that only clients whose user is deleted or clients
subscribed to channels which they no longer have access to will be
disconnected.

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
2023-12-24 11:56:44 +02:00
Binbin 09e0d338f5
redis-cli adds -4 / -6 options to determine IPV4 / IPV6 priority in DNS lookup (#11315)
In this PR, we added -4 and -6 options to redis-cli to determine
IPv4 / IPv6 priority in DNS lookup.
This was mentioned in
https://github.com/redis/redis/pull/11151#issuecomment-1231570651

For now it's only used in CLUSTER MEET.

The options also make it possible to reliably test DNS lookup in CI;
using them, we can add some localhost tests for #11151.

The commit was cherry-picked from #11151, back then we decided to split
the PR.

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2023-12-24 10:40:34 +02:00
Binbin 23e980e77a
Move cliVersion to cli_common and add --version support for redis-check-aof (#10856)
Let us see which version of redis this tool is part of,
similarly to redis-cli, redis-benchmark and redis-check-rdb.

redis-rdb-check and redis-aof-check are actually symlinks to redis,
so they will directly use getVersion in server; the format becomes:
```
{title} v={redis_version} sha={sha}:{dirty} malloc={malloc} bits={bits} build={build}
```

Move cliVersion into cli_common; redis-cli and redis-benchmark will
use it, and the format is not changed:
```
{title} {redis_version} (git:{sha})
```
2023-12-21 13:51:46 +02:00
Wen Hui 5dc631d880
Add missing test cases for hash commands (#12851)
We don't have a test for HGETALL against a key that does not exist, so
this adds it to the test suite, and along with it, adds wrong-type cases
for other missing commands.
2023-12-17 14:02:53 +02:00
Binbin adbb534f03
Always keep an in-memory history of all commands in redis-cli (#12862)
redis-cli avoids saving sensitive commands in its history (it doesn't
persist them to the history file).
This means that if you had a typo and you want to re-run the command,
you can't easily do that.
This PR changes that to keep an in-memory history of all the redacted
commands, and just not persist them to disk. This way we are able to
press the up arrow and re-try the command freely, and it just won't
survive a redis-cli restart.
2023-12-15 17:22:02 +02:00
zhaozhao.zz d8a21c5767
Unified db rehash method for both standalone and cluster (#12848)
After #11695, we added two functions `rehashingStarted` and
`rehashingCompleted` to the dict structure. We also registered two
handlers for the main database's dict and expire structures. This allows
the main database to record the dict in `rehashing` list when rehashing
starts. Later, in `serverCron`, the `incrementallyRehash` function is
continuously called to perform the rehashing operation. However,
currently, when rehashing is completed, `rehashingCompleted` does not
remove the dict from the `rehashing` list. This results in the
`rehashing` list containing many invalid dicts. Although subsequent cron
checks and removes dicts that don't require rehashing, it is still
inefficient.

This PR implements the functionality to remove the dict from the
`rehashing` list in `rehashingCompleted`. This is achieved by adding
`metadata` to the dict structure, which keeps track of its position in
the `rehashing` list, allowing for quick removal. This approach avoids
storing duplicate dicts in the `rehashing` list.
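
A rough sketch of the O(1) removal idea (hypothetical names, assuming the Redis adlist API; not the actual dict metadata layout):

```c
#include "adlist.h"   /* Redis linked-list API */

/* Hypothetical metadata kept per dict: its node in the server-level
 * `rehashing` list, so completion can unlink it without scanning. */
typedef struct { listNode *rehashing_node; } rehashMeta;

list *rehashing;      /* server-level list of dicts currently rehashing */

void rehashing_started_sketch(void *d, rehashMeta *meta) {
    listAddNodeTail(rehashing, d);
    meta->rehashing_node = listLast(rehashing);   /* remember our position */
}

void rehashing_completed_sketch(rehashMeta *meta) {
    if (!meta->rehashing_node) return;
    listDelNode(rehashing, meta->rehashing_node); /* O(1) removal */
    meta->rehashing_node = NULL;
}
```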

Additionally, there are other modifications:

1. Whether in standalone or cluster mode, the dict in database is
inserted into the rehashing linked list when rehashing starts. This
eliminates the need to distinguish between standalone and cluster mode
in `incrementallyRehash`. The function only needs to focus on the dicts
in the `rehashing` list that require rehashing.
2. `rehashing` list is moved from per-database to Redis server level.
This decouples `incrementallyRehash` from the database ID, and in
standalone mode, there is no need to iterate over all databases,
avoiding unnecessary access to databases that do not require rehashing.
In the future, even if unsharded-cluster mode supports multiple
databases, there will be no risk involved.
3. The insertion and removal operations of dict structures in the
`rehashing` list are decoupled from `activerehashing` config.
`activerehashing` only controls whether `incrementallyRehash` is
executed in serverCron. There is no need for additional steps when
modifying the `activerehashing` switch, as in #12705.
2023-12-15 10:42:53 +08:00
Guillaume Koenig 967fb3c6e8
Extend rax usage by allowing any long long value (#12837)
The raxFind implementation uses a special pointer value (the address of
a static string) as the "not found" value. It works as long as actual
pointers are used. However, we've seen usages where long long,
non-pointer values have been used. This creates a risk that one of the
long long values is precisely the address of the special "not found"
value. This commit changes raxFind to return 1 or 0 to indicate
elementhood, and to take a new void **value parameter to optionally
return the associated value.

By extension, this also allow the RedisModule_DictSet/Replace operations
to also safely insert integers instead of just pointers.
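
A usage sketch of the changed interface described above (the exact parameter order is an assumption; membership is the return value and the stored value comes back through the out-parameter):

```c
#include "rax.h"   /* Redis radix tree API */

/* Works even when the stored "value" is a plain long long rather than a
 * real pointer, since existence is no longer signalled by a sentinel. */
long long lookup_sketch(rax *rt, unsigned char *key, size_t len, int *found) {
    void *value = NULL;
    *found = raxFind(rt, key, len, &value);   /* 1 if present, 0 otherwise */
    return *found ? (long long)value : 0;
}
```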
2023-12-14 14:50:18 -08:00
Chen Tianjie e95a5d4831
Support by/get options for sort(_ro) in cluster mode when pattern implies slot. (#12728)
The by/get options of the sort/sort_ro commands used to be forbidden in
cluster mode, since we are not sure which slot the pattern may be in.

With the optimization done in #12536, patterns can now be mapped to slots,
so we should allow by/get options in cluster mode when the pattern maps to
the same slot as the key.
2023-12-13 21:16:36 +02:00
Binbin 3c0fd25201
Redact ACL username information and mark *-key-file-pass configs as sensitive (#12860)
In #11489, we considered the ACL username to be sensitive information,
considered ACL GETUSER a sensitive command, and removed it
from the redis-cli history file.

This PR redacts username information in ACL GETUSER and ACL DELUSER
from SLOWLOG, and also removes ACL DELUSER from the redis-cli history file.

This PR also marks tls-key-file-pass and tls-client-key-file-pass
as sensitive configs, redacting them from SLOWLOG and also
removing them from the redis-cli history file.
2023-12-13 15:28:13 +02:00
Chen Tianjie f9cc25c1dd
Add metric to INFO CLIENTS: pubsub_clients. (#12849)
In the INFO CLIENTS section, we already have blocked_clients and
tracking_clients. We should add a new metric showing the number of
pubsub connections, which helps with performance monitoring and
troubleshooting.
2023-12-13 13:44:13 +08:00
Binbin c85a9b7896
Fix delKeysInSlot server events are not executed inside an execution unit (#12745)
This is a follow-up fix to #12733. We need to apply the same changes to
delKeysInSlot. Refer to #12733 for more details.

This PR contains some other minor cleanups / improvements to the test
suite and docs.
It uses the postnotifications test module in a cluster mode test which
revealed a leak in the test module (fixed).
2023-12-11 20:15:19 +02:00
Binbin 62419c01db
Handle missing fields in dbSwapDatabases and swapMainDbWithTempDb (#12763)
The change in dbSwapDatabases seems harmless, because in non-clustered
mode, dbBuckets calculations are strictly accurate, and in cluster mode,
we only have one DB. Modify it for uniformity (just like resize_cursor).

The change in swapMainDbWithTempDb is needed in case we swap with the
temp db, otherwise the overhead memory usage of db can be miscalculated.

In addition we will swap all fields (including rehashing list), just for
completeness (and reduce the chance of surprises in the future).

Introduced in #12697.
2023-12-10 17:30:20 +02:00
Binbin e6423b7a7e
Fix rehashingStarted miscalculating bucket_count in dict initialization (#12846)
In the old dictRehashingInfo implementation, for the initialization
scenario, it mistakenly set to_size directly to
DICTHT_SIZE(DICT_HT_INITIAL_EXP), which is 4 in our code by default.

In scenarios where dictExpand directly passes the target size as the
initialization, the code will calculate bucket_count incorrectly. For
example, in DEBUG POPULATE or RDB load scenarios, it will cause the final
bucket_count to be initialized to 65536 (16384 * 4), see:
```
before:
DB 0: 10000000 keys (0 volatile) in 65536 slots HT.

it should be:
DB 0: 10000000 keys (0 volatile) in 16777216 slots HT.
```

In this PR, the new ht will also be initialized before calling
rehashingStarted in _dictExpand, so that the calls in dictRehashingInfo
can be unified.

Bug was introduced in #12697.
2023-12-10 10:55:30 +02:00