redis

Commit Graph


Author

SHA1

Message

Date

Pieter Cailliau

0b34396924

Change license from BSD-3 to dual RSALv2+SSPLv1 (#13157)

[Read more about the license change
here](https://redis.com/blog/redis-adopts-dual-source-available-licensing/)
Live long and prosper 🖖

2024-03-20 22:38:24 +00:00

yoav-steinberg

843a4cdc07

Add warning for suspected slow system clocksource setting (#10636)

This PR does 2 main things:
1) Add warning for suspected slow system clocksource setting. This is Linux specific.
2) Add a `--check-system` argument to redis which runs all system checks and prints a report.

## System checks
Add a command line option `--check-system` which runs all known system checks and provides
a report to stdout of which system checks have failed, with details on how to reconfigure the
system for optimized redis performance.
The `--check-system` mode exits with an appropriate error code after running all the checks.
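
As a rough illustration of how such a mode could be structured (this is a sketch, not the code in _syscheck.c_), the runner below walks a table of check functions and prints a report. The 1 / -1 / 0 return convention is the one listed under "Other changes in the PR"; the names `check_fn`, `check_entry`, `run_checks` and the `demo_*` placeholder checks are hypothetical.

```c
#include <stdio.h>

/* Return convention from the commit message:
 *   1 = check passed, -1 = check failed, 0 = check could not be completed. */
typedef int (*check_fn)(const char **error_msg);

typedef struct {
    const char *name; /* label used in the report */
    check_fn fn;      /* the check itself */
} check_entry;

/* Placeholder checks for demonstration only (hypothetical, not real system checks). */
int demo_pass(const char **error_msg) { (void)error_msg; return 1; }
int demo_fail(const char **error_msg) {
    *error_msg = "example advice on how to reconfigure the system";
    return -1;
}
int demo_skip(const char **error_msg) { (void)error_msg; return 0; }

/* Runs every check, writes a report to `out`, and returns the number of
 * checks that failed. Checks that could not be completed are reported
 * but not counted as failures. */
int run_checks(const check_entry *checks, size_t n, FILE *out) {
    int failed = 0;
    for (size_t i = 0; i < n; i++) {
        const char *msg = NULL;
        int res = checks[i].fn(&msg);
        if (res == 1) {
            fprintf(out, "[%s] OK\n", checks[i].name);
        } else if (res == 0) {
            fprintf(out, "[%s] skipped (check could not be completed)\n", checks[i].name);
        } else {
            fprintf(out, "[%s] WARNING: %s\n", checks[i].name, msg ? msg : "(no details)");
            failed++;
        }
    }
    return failed;
}
```

A caller could map a non-zero return value to a non-zero process exit status. The exact exit codes redis uses are not stated in this message, so that mapping is left out of the sketch.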

## Slow clocksource details
We check the system's clocksource performance by running `clock_gettime()` in a loop and then
checking how much time was spent in system calls (via `getrusage()`). If we spend more than
10% of the time in the kernel then we print a warning. I verified that we get this warning with
the slow clock sources `acpi_pm` (~90% in the kernel on my laptop) and `xen` (~30% in the kernel
on an ec2 `m4.large`).

The check runs 5 system ticks so we can detect time spent in kernel at 20% jumps (0%,20%,40%...).
Anything more accurate will require the test to run longer. Typically 5 ticks are 50ms. This means
running the test on startup will delay startup by 50ms. To avoid this we make sure the test is only
executed in the `--check-system` mode.
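
The measurement described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from _syscheck.c_: the function name `clocksource_check_sketch` is hypothetical, and the tick rate is assumed to come from `sysconf(_SC_CLK_TCK)`. With a tick rate of 100 the loop runs for 5 ticks, i.e. 50ms, and since each tick is 20% of that window, a single tick of kernel time already exceeds the 10% threshold.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Convert a struct timeval to microseconds. */
static long long timeval_us(struct timeval tv) {
    return (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
}

/* Returns 1 if the clocksource looks fine, -1 if more than 10% of the
 * busy loop was spent in the kernel, 0 if the check could not be completed. */
int clocksource_check_sketch(void) {
    struct rusage ru_start, ru_end;
    struct timespec ts;

    long hz = sysconf(_SC_CLK_TCK); /* system ticks per second */
    if (hz <= 0) return 0;
    long long test_time_us = 5 * 1000000LL / hz; /* 5 ticks */

    if (getrusage(RUSAGE_SELF, &ru_start) != 0) return 0;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) return 0;
    long long start_us = (long long)ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;

    /* Busy loop on clock_gettime(). With a fast clocksource this stays in
     * user space; with a slow one each call ends up in the kernel. */
    for (;;) {
        if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) return 0;
        long long now_us = (long long)ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
        if (now_us - start_us >= test_time_us) break;
    }

    if (getrusage(RUSAGE_SELF, &ru_end) != 0) return 0;
    long long stime_us = timeval_us(ru_end.ru_stime) - timeval_us(ru_start.ru_stime);
    long long utime_us = timeval_us(ru_end.ru_utime) - timeval_us(ru_start.ru_utime);
    if (stime_us + utime_us <= 0) return 0; /* nothing measurable was accounted */

    /* More than 10% of the accounted CPU time was spent in the kernel. */
    return (stime_us * 10 > stime_us + utime_us) ? -1 : 1;
}
```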

For a quick startup check, we specifically warn if we see that the system is using the `xen` clocksource,
which we know has bad performance and isn't recommended (at least on ec2). In such a case the
user should manually run redis with `--check-system` to force the slower clocksource test described
above.
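
A quick check of this kind only needs to read the name of the active clocksource. The sketch below assumes it is read from the standard Linux sysfs file `/sys/devices/system/clocksource/clocksource0/current_clocksource`; the commit message does not say how the startup check obtains the name, and the helper names `clocksource_name_is_xen` and `xen_clocksource_check` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Standard Linux sysfs location of the active clocksource name. */
#define CLOCKSOURCE_PATH \
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"

/* Returns 1 if the given line (optionally newline-terminated) names the
 * `xen` clocksource, 0 otherwise. */
int clocksource_name_is_xen(const char *line) {
    size_t len = strcspn(line, "\n"); /* length without trailing newline */
    return len == 3 && strncmp(line, "xen", 3) == 0;
}

/* Returns 1 if the clocksource is not `xen`, -1 if it is, and 0 if the
 * file could not be read (for example on a non-Linux system). */
int xen_clocksource_check(const char *path) {
    char buf[64];
    FILE *fp = fopen(path, "r");
    if (fp == NULL) return 0;
    char *line = fgets(buf, sizeof(buf), fp);
    fclose(fp);
    if (line == NULL) return 0;
    return clocksource_name_is_xen(buf) ? -1 : 1;
}
```

Calling `xen_clocksource_check(CLOCKSOURCE_PATH)` costs a single small file read, which is why a check like this is cheap enough to run on every startup, unlike the 5-tick busy loop.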

## Other changes in the PR

* All the system checks are now implemented as functions in _syscheck.c_.
They are implemented using a standard interface (see details in _syscheck.c_).
To do this I moved the checking functions `linuxOvercommitMemoryValue()`,
`THPIsEnabled()`, `linuxMadvFreeForkBugCheck()` out of _server.c_ and _latency.c_
and into the new _syscheck.c_. When moving these functions I made sure they don't
depend on other functionality provided in _server.c_ and made them use a standard
"check functions" interface. Specifically:
  * I removed all logging from `linuxMadvFreeForkBugCheck()`. If an unexpected error
    occurs during the check, it aborts as before, but without any logging.
    It returns 0, meaning the check did not complete.
  * All these functions now return 1 on success, -1 on failure, and 0 if the check itself
    cannot be completed.
  * The `linuxMadvFreeForkBugCheck()` function now internally calls `exit()` and not
    `exitFromChild()` because the latter is only available in _server.c_ and I wanted to
    remove that dependency. This isn't an issue because we don't need to worry about the
    child process created by the test doing anything related to the rdb/aof files, which
    is why `exitFromChild()` was created.

* This also fixes parsing of other /proc/\<pid\>/stat fields to correctly handle spaces
in the process name and be more robust in general. Note that before this fix the rss
info in `INFO memory` was corrupt in case of spaces in the process name. To
recreate just rename `redis-server` to `redis server`, start it, and run `INFO memory`.
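
To illustrate why spaces in the process name break naive parsing: the second field of /proc/\<pid\>/stat is the process name in parentheses, so splitting the whole line on spaces shifts every later field when the name contains a space. One common way to handle this, sketched below, is to locate the last `)` and count fields from there. This is not necessarily how the fix is implemented, and the helper name `proc_stat_field` is hypothetical. Field numbers follow proc(5), where rss is field 24.

```c
#include <string.h>

/* Copies field number `field` (1-based, as numbered in proc(5)) of a
 * /proc/<pid>/stat line into `out`. Only fields after the process name
 * (field >= 3) are supported. Returns 1 on success, 0 on failure. */
int proc_stat_field(const char *stat_line, int field, char *out, size_t outlen) {
    /* The process name may contain spaces and parentheses, so search for
     * the LAST ')' instead of splitting the whole line on spaces. */
    const char *p = strrchr(stat_line, ')');
    if (p == NULL || field < 3 || outlen == 0) return 0;
    p++;

    int cur = 2; /* fields 1 (pid) and 2 (name) are already behind us */
    while (*p != '\0') {
        while (*p == ' ' || *p == '\n') p++; /* skip separators */
        if (*p == '\0') break;
        const char *end = p;
        while (*end != '\0' && *end != ' ' && *end != '\n') end++;
        cur++;
        if (cur == field) {
            size_t len = (size_t)(end - p);
            if (len >= outlen) return 0; /* output buffer too small */
            memcpy(out, p, len);
            out[len] = '\0';
            return 1;
        }
        p = end;
    }
    return 0; /* line has fewer fields than requested */
}
```

For a line that starts with `1234 (redis server) S 1 ...`, requesting field 3 yields `S` regardless of the space in the name, and field 24 yields the rss value in pages.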

2022-05-22 17:10:31 +03:00

2 Commits