redis/tests/support/util.tcl

1042 lines
31 KiB
Tcl
Raw Normal View History

proc randstring {min max {type binary}} {
set len [expr {$min+int(rand()*($max-$min+1))}]
set output {}
if {$type eq {binary}} {
set minval 0
set maxval 255
Client eviction (#8687) ### Description A mechanism for disconnecting clients when the sum of all connected clients is above a configured limit. This prevents eviction or OOM caused by accumulated used memory between all clients. It's a complimentary mechanism to the `client-output-buffer-limit` mechanism which takes into account not only a single client and not only output buffers but rather all memory used by all clients. #### Design The general design is as following: * We track memory usage of each client, taking into account all memory used by the client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date after reading from the socket, after processing commands and after writing to the socket. * Based on the used memory we sort all clients into buckets. Each bucket contains all clients using up up to x2 memory of the clients in the bucket below it. For example up to 1m clients, up to 2m clients, up to 4m clients, ... * Before processing a command and before sleep we check if we're over the configured limit. If we are we start disconnecting clients from larger buckets downwards until we're under the limit. #### Config `maxmemory-clients` max memory all clients are allowed to consume, above this threshold we disconnect clients. This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%` would mean 10% of `maxmemory`). #### Important code changes * During the development I encountered yet more situations where our io-threads access global vars. And needed to fix them. I also had to handle keeps the clients sorted into the memory buckets (which are global) while their memory usage changes in the io-thread. To achieve this I decided to simplify how we check if we're in an io-thread and make it much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking if the client is in an io-thread (it wasn't used for anything else) and just used the global `io_threads_op` variable the same way to check during writes. * I optimized the cleanup of the client from the `clients_pending_read` list on client freeing. We now store a pointer in the `client` struct to this list so we don't need to search in it (`pending_read_list_node`). * Added `evicted_clients` stat to `INFO` command. * Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the client eviction mechanism. Added corrosponding 'e' flag in the client info string. * Added `multi-mem` field in the client info string to show how much memory is used up by buffered multi commands. * Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and channels (partially), tracking prefixes (partially). * CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so clients will be disconnected between processing different clients and not only before sleep. This new function can be used in the future for work we want to do outside the command processing loop but don't want to wait for all clients to be processed before we get to it. Specifically I wanted to handle output-buffer-limit related closing before we process client eviction in case the two race with each other. * Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction buckets. * Each client now holds a pointer to the client eviction memory usage bucket it belongs to and listNode to itself in that bucket for quick removal. * Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value indicating no io-threading is currently being executed. * In order to track memory used by each clients in real-time we can't rely on updating these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()` (used to be `clientsCronTrackClientsMemUsage()`) after command processing, after writing data to pubsub clients, after writing the output buffer and after reading from the socket (and maybe other places too). The function is written to be fast. * Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before processing a command (before performing oom-checks and key-eviction). * All clients memory usage buckets are grouped as follows: * All clients using less than 64k. * 64K..128K * 128K..256K * ... * 2G..4G * All clients using 4g and up. * Added client-eviction.tcl with a bunch of tests for the new mechanism. * Extended maxmemory.tcl to test the interaction between maxmemory and maxmemory-clients settings. * Added an option to flag a numeric configuration variable as a "percent", this means that if we encounter a '%' after the number in the config file (or config set command) we consider it as valid. Such a number is store internally as a negative value. This way an integer value can be interpreted as either a percent (negative) or absolute value (positive). This is useful for example if some numeric configuration can optionally be set to a percentage of something else. Co-authored-by: Oran Agra <oran@redislabs.com>
2021-09-23 13:02:16 +02:00
} elseif {$type eq {alpha} || $type eq {simplealpha}} {
set minval 48
set maxval 122
} elseif {$type eq {compr}} {
set minval 48
set maxval 52
}
while {$len} {
set num [expr {$minval+int(rand()*($maxval-$minval+1))}]
set rr [format "%c" $num]
Client eviction (#8687) ### Description A mechanism for disconnecting clients when the sum of all connected clients is above a configured limit. This prevents eviction or OOM caused by accumulated used memory between all clients. It's a complimentary mechanism to the `client-output-buffer-limit` mechanism which takes into account not only a single client and not only output buffers but rather all memory used by all clients. #### Design The general design is as following: * We track memory usage of each client, taking into account all memory used by the client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date after reading from the socket, after processing commands and after writing to the socket. * Based on the used memory we sort all clients into buckets. Each bucket contains all clients using up up to x2 memory of the clients in the bucket below it. For example up to 1m clients, up to 2m clients, up to 4m clients, ... * Before processing a command and before sleep we check if we're over the configured limit. If we are we start disconnecting clients from larger buckets downwards until we're under the limit. #### Config `maxmemory-clients` max memory all clients are allowed to consume, above this threshold we disconnect clients. This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%` would mean 10% of `maxmemory`). #### Important code changes * During the development I encountered yet more situations where our io-threads access global vars. And needed to fix them. I also had to handle keeps the clients sorted into the memory buckets (which are global) while their memory usage changes in the io-thread. To achieve this I decided to simplify how we check if we're in an io-thread and make it much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking if the client is in an io-thread (it wasn't used for anything else) and just used the global `io_threads_op` variable the same way to check during writes. * I optimized the cleanup of the client from the `clients_pending_read` list on client freeing. We now store a pointer in the `client` struct to this list so we don't need to search in it (`pending_read_list_node`). * Added `evicted_clients` stat to `INFO` command. * Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the client eviction mechanism. Added corrosponding 'e' flag in the client info string. * Added `multi-mem` field in the client info string to show how much memory is used up by buffered multi commands. * Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and channels (partially), tracking prefixes (partially). * CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so clients will be disconnected between processing different clients and not only before sleep. This new function can be used in the future for work we want to do outside the command processing loop but don't want to wait for all clients to be processed before we get to it. Specifically I wanted to handle output-buffer-limit related closing before we process client eviction in case the two race with each other. * Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction buckets. * Each client now holds a pointer to the client eviction memory usage bucket it belongs to and listNode to itself in that bucket for quick removal. * Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value indicating no io-threading is currently being executed. * In order to track memory used by each clients in real-time we can't rely on updating these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()` (used to be `clientsCronTrackClientsMemUsage()`) after command processing, after writing data to pubsub clients, after writing the output buffer and after reading from the socket (and maybe other places too). The function is written to be fast. * Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before processing a command (before performing oom-checks and key-eviction). * All clients memory usage buckets are grouped as follows: * All clients using less than 64k. * 64K..128K * 128K..256K * ... * 2G..4G * All clients using 4g and up. * Added client-eviction.tcl with a bunch of tests for the new mechanism. * Extended maxmemory.tcl to test the interaction between maxmemory and maxmemory-clients settings. * Added an option to flag a numeric configuration variable as a "percent", this means that if we encounter a '%' after the number in the config file (or config set command) we consider it as valid. Such a number is store internally as a negative value. This way an integer value can be interpreted as either a percent (negative) or absolute value (positive). This is useful for example if some numeric configuration can optionally be set to a percentage of something else. Co-authored-by: Oran Agra <oran@redislabs.com>
2021-09-23 13:02:16 +02:00
if {$type eq {simplealpha} && ![string is alnum $rr]} {continue}
if {$type eq {alpha} && $num eq 92} {continue} ;# avoid putting '\' char in the string, it can mess up TCL processing
Client eviction (#8687) ### Description A mechanism for disconnecting clients when the sum of all connected clients is above a configured limit. This prevents eviction or OOM caused by accumulated used memory between all clients. It's a complimentary mechanism to the `client-output-buffer-limit` mechanism which takes into account not only a single client and not only output buffers but rather all memory used by all clients. #### Design The general design is as following: * We track memory usage of each client, taking into account all memory used by the client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date after reading from the socket, after processing commands and after writing to the socket. * Based on the used memory we sort all clients into buckets. Each bucket contains all clients using up up to x2 memory of the clients in the bucket below it. For example up to 1m clients, up to 2m clients, up to 4m clients, ... * Before processing a command and before sleep we check if we're over the configured limit. If we are we start disconnecting clients from larger buckets downwards until we're under the limit. #### Config `maxmemory-clients` max memory all clients are allowed to consume, above this threshold we disconnect clients. This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%` would mean 10% of `maxmemory`). #### Important code changes * During the development I encountered yet more situations where our io-threads access global vars. And needed to fix them. I also had to handle keeps the clients sorted into the memory buckets (which are global) while their memory usage changes in the io-thread. To achieve this I decided to simplify how we check if we're in an io-thread and make it much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking if the client is in an io-thread (it wasn't used for anything else) and just used the global `io_threads_op` variable the same way to check during writes. * I optimized the cleanup of the client from the `clients_pending_read` list on client freeing. We now store a pointer in the `client` struct to this list so we don't need to search in it (`pending_read_list_node`). * Added `evicted_clients` stat to `INFO` command. * Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the client eviction mechanism. Added corrosponding 'e' flag in the client info string. * Added `multi-mem` field in the client info string to show how much memory is used up by buffered multi commands. * Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and channels (partially), tracking prefixes (partially). * CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so clients will be disconnected between processing different clients and not only before sleep. This new function can be used in the future for work we want to do outside the command processing loop but don't want to wait for all clients to be processed before we get to it. Specifically I wanted to handle output-buffer-limit related closing before we process client eviction in case the two race with each other. * Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction buckets. * Each client now holds a pointer to the client eviction memory usage bucket it belongs to and listNode to itself in that bucket for quick removal. * Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value indicating no io-threading is currently being executed. * In order to track memory used by each clients in real-time we can't rely on updating these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()` (used to be `clientsCronTrackClientsMemUsage()`) after command processing, after writing data to pubsub clients, after writing the output buffer and after reading from the socket (and maybe other places too). The function is written to be fast. * Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before processing a command (before performing oom-checks and key-eviction). * All clients memory usage buckets are grouped as follows: * All clients using less than 64k. * 64K..128K * 128K..256K * ... * 2G..4G * All clients using 4g and up. * Added client-eviction.tcl with a bunch of tests for the new mechanism. * Extended maxmemory.tcl to test the interaction between maxmemory and maxmemory-clients settings. * Added an option to flag a numeric configuration variable as a "percent", this means that if we encounter a '%' after the number in the config file (or config set command) we consider it as valid. Such a number is store internally as a negative value. This way an integer value can be interpreted as either a percent (negative) or absolute value (positive). This is useful for example if some numeric configuration can optionally be set to a percentage of something else. Co-authored-by: Oran Agra <oran@redislabs.com>
2021-09-23 13:02:16 +02:00
append output $rr
incr len -1
}
return $output
}
# Useful for some test
proc zlistAlikeSort {a b} {
if {[lindex $a 0] > [lindex $b 0]} {return 1}
if {[lindex $a 0] < [lindex $b 0]} {return -1}
string compare [lindex $a 1] [lindex $b 1]
}
# Return all log lines starting with the first line that contains a warning.
# Generally, this will be an assertion error with a stack trace.
proc crashlog_from_file {filename} {
set lines [split [exec cat $filename] "\n"]
set matched 0
set logall 0
set result {}
foreach line $lines {
if {[string match {*REDIS BUG REPORT START*} $line]} {
set logall 1
}
if {[regexp {^\[\d+\]\s+\d+\s+\w+\s+\d{2}:\d{2}:\d{2} \#} $line]} {
set matched 1
}
if {$logall || $matched} {
lappend result $line
}
}
join $result "\n"
}
# Return sanitizer log lines
proc sanitizer_errors_from_file {filename} {
set log [exec cat $filename]
set lines [split [exec cat $filename] "\n"]
foreach line $lines {
# Ignore huge allocation warnings
if ([string match {*WARNING: AddressSanitizer failed to allocate*} $line]) {
continue
}
# GCC UBSAN output does not contain 'Sanitizer' but 'runtime error'.
if {[string match {*runtime error*} $log] ||
[string match {*Sanitizer*} $log]} {
return $log
}
}
return ""
}
proc getInfoProperty {infostr property} {
if {[regexp "\r\n$property:(.*?)\r\n" $infostr _ value]} {
set _ $value
}
}
# Return value for INFO property
proc status {r property} {
set _ [getInfoProperty [{*}$r info] $property]
}
proc waitForBgsave r {
while 1 {
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788) Implement Multi-Part AOF mechanism to avoid overheads during AOFRW. Introducing a folder with multiple AOF files tracked by a manifest file. The main issues with the the original AOFRW mechanism are: * buffering of commands that are processed during rewrite (consuming a lot of RAM) * freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it. * double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files) The main modifications of this PR: 1. Remove the AOF rewrite buffer and related code. 2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type, it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the incremental commands since the last AOFRW. 3. Use a AOF manifest file to record and manage these AOF files mentioned above. 4. The original configuration of `appendfilename` will be the base part of the new file name, for example: `appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof` 5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename` 6. Remove the `aof_rewrite_buffer_length` field in info. 7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs. It also gives users the opportunity to preserve the history AOFs. just for testing use now. 8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now), we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately. 9. Support upgrade (load) data from old version redis. 10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and manifest file will be placed in this directory. 11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if `aof-load-truncated` is enabled. Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-03 18:14:13 +01:00
if {[status $r rdb_bgsave_in_progress] eq 1} {
if {$::verbose} {
puts -nonewline "\nWaiting for background save to finish... "
flush stdout
}
after 50
} else {
break
}
}
}
proc waitForBgrewriteaof r {
while 1 {
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788) Implement Multi-Part AOF mechanism to avoid overheads during AOFRW. Introducing a folder with multiple AOF files tracked by a manifest file. The main issues with the the original AOFRW mechanism are: * buffering of commands that are processed during rewrite (consuming a lot of RAM) * freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it. * double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files) The main modifications of this PR: 1. Remove the AOF rewrite buffer and related code. 2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type, it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the incremental commands since the last AOFRW. 3. Use a AOF manifest file to record and manage these AOF files mentioned above. 4. The original configuration of `appendfilename` will be the base part of the new file name, for example: `appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof` 5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename` 6. Remove the `aof_rewrite_buffer_length` field in info. 7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs. It also gives users the opportunity to preserve the history AOFs. just for testing use now. 8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now), we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately. 9. Support upgrade (load) data from old version redis. 10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and manifest file will be placed in this directory. 11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if `aof-load-truncated` is enabled. Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-03 18:14:13 +01:00
if {[status $r aof_rewrite_in_progress] eq 1} {
if {$::verbose} {
puts -nonewline "\nWaiting for background AOF rewrite to finish... "
flush stdout
}
after 50
} else {
break
}
}
}
proc wait_for_sync r {
wait_for_condition 50 100 {
[status $r master_link_status] eq "up"
} else {
Set repl-diskless-sync to yes by default, add repl-diskless-sync-max-replicas (#10092) 1. enable diskless replication by default 2. add a new config named repl-diskless-sync-max-replicas that enables replication to start before the full repl-diskless-sync-delay was reached. 3. put replica online sooner on the master (see below) 4. test suite uses repl-diskless-sync-delay of 0 to be faster 5. a few tests that use multiple replica on a pre-populated master, are now using the new repl-diskless-sync-max-replicas 6. fix possible timing issues in a few cluster tests (see below) put replica online sooner on the master ---------------------------------------------------- there were two tests that failed because they needed for the master to realize that the replica is online, but the test code was actually only waiting for the replica to realize it's online, and in diskless it could have been before the master realized it. changes include two things: 1. the tests wait on the right thing 2. issues in the master, putting the replica online in two steps. the master used to put the replica as online in 2 steps. the first step was to mark it as online, and the second step was to enable the write event (only after getting ACK), but in fact the first step didn't contains some of the tasks to put it online (like updating good slave count, and sending the module event). this meant that if a test was waiting to see that the replica is online form the point of view of the master, and then confirm that the module got an event, or that the master has enough good replicas, it could fail due to timing issues. so now the full effect of putting the replica online, happens at once, and only the part about enabling the writes is delayed till the ACK. fix cluster tests -------------------- I added some code to wait for the replica to sync and avoid race conditions. later realized the sentinel and cluster tests where using the original 5 seconds delay, so changed it to 0. this means the other changes are probably not needed, but i suppose they're still better (avoid race conditions)
2022-01-17 13:11:11 +01:00
fail "replica didn't sync in time"
}
}
proc wait_replica_online r {
wait_for_condition 50 100 {
[string match "*slave0:*,state=online*" [$r info replication]]
} else {
fail "replica didn't online in time"
}
}
2018-11-11 08:22:42 +01:00
proc wait_for_ofs_sync {r1 r2} {
wait_for_condition 50 100 {
[status $r1 master_repl_offset] eq [status $r2 master_repl_offset]
} else {
fail "replica offset didn't match in time"
2018-11-11 08:22:42 +01:00
}
}
proc wait_done_loading r {
wait_for_condition 50 100 {
[catch {$r ping} e] == 0
} else {
fail "Loading DB is taking too much time."
}
}
proc wait_lazyfree_done r {
wait_for_condition 50 100 {
[status $r lazyfree_pending_objects] == 0
} else {
fail "lazyfree isn't done"
}
}
# count current log lines in server's stdout
proc count_log_lines {srv_idx} {
set _ [string trim [exec wc -l < [srv $srv_idx stdout]]]
}
# returns the number of times a line with that pattern appears in a file
proc count_message_lines {file pattern} {
set res 0
# exec fails when grep exists with status other than 0 (when the patter wasn't found)
catch {
set res [string trim [exec grep $pattern $file 2> /dev/null | wc -l]]
}
return $res
}
# returns the number of times a line with that pattern appears in the log
proc count_log_message {srv_idx pattern} {
set stdout [srv $srv_idx stdout]
return [count_message_lines $stdout $pattern]
}
# verify pattern exists in server's sdtout after a certain line number
proc verify_log_message {srv_idx pattern from_line} {
incr from_line
set result [exec tail -n +$from_line < [srv $srv_idx stdout]]
if {![string match $pattern $result]} {
error "assertion:expected message not found in log file: $pattern"
}
}
# wait for pattern to be found in server's stdout after certain line number
# return value is a list containing the line that matched the pattern and the line number
proc wait_for_log_messages {srv_idx patterns from_line maxtries delay} {
set retry $maxtries
set next_line [expr $from_line + 1] ;# searching form the line after
set stdout [srv $srv_idx stdout]
while {$retry} {
# re-read the last line (unless it's before to our first), last time we read it, it might have been incomplete
set next_line [expr $next_line - 1 > $from_line + 1 ? $next_line - 1 : $from_line + 1]
set result [exec tail -n +$next_line < $stdout]
set result [split $result "\n"]
foreach line $result {
foreach pattern $patterns {
if {[string match $pattern $line]} {
return [list $line $next_line]
}
}
incr next_line
}
incr retry -1
after $delay
}
if {$retry == 0} {
if {$::verbose} {
puts "content of $stdout from line: $from_line:"
puts [exec tail -n +$from_line < $stdout]
}
fail "log message of '$patterns' not found in $stdout after line: $from_line till line: [expr $next_line -1]"
}
}
# write line to server log file
proc write_log_line {srv_idx msg} {
set logfile [srv $srv_idx stdout]
set fd [open $logfile "a+"]
puts $fd "### $msg"
close $fd
}
2013-06-25 15:32:37 +02:00
# Random integer between 0 and max (excluded).
proc randomInt {max} {
expr {int(rand()*$max)}
}
# Random integer between min and max (excluded).
proc randomRange {min max} {
expr {int(rand()*[expr $max - $min]) + $min}
}
2013-06-25 15:32:37 +02:00
# Random signed integer between -max and max (both extremes excluded).
proc randomSignedInt {max} {
set i [randomInt $max]
if {rand() > 0.5} {
set i -$i
}
return $i
}
proc randpath args {
set path [expr {int(rand()*[llength $args])}]
uplevel 1 [lindex $args $path]
}
proc randomValue {} {
randpath {
# Small enough to likely collide
randomSignedInt 1000
} {
# 32 bit compressible signed/unsigned
randpath {randomSignedInt 2000000000} {randomSignedInt 4000000000}
} {
# 64 bit
randpath {randomSignedInt 1000000000000}
} {
# Random string
randpath {randstring 0 256 alpha} \
{randstring 0 256 compr} \
{randstring 0 256 binary}
}
}
proc randomKey {} {
randpath {
# Small enough to likely collide
randomInt 1000
} {
# 32 bit compressible signed/unsigned
randpath {randomInt 2000000000} {randomInt 4000000000}
} {
# 64 bit
randpath {randomInt 1000000000000}
} {
# Random string
randpath {randstring 1 256 alpha} \
{randstring 1 256 compr}
}
}
proc findKeyWithType {r type} {
for {set j 0} {$j < 20} {incr j} {
set k [{*}$r randomkey]
if {$k eq {}} {
return {}
}
if {[{*}$r type $k] eq $type} {
return $k
}
}
return {}
}
2010-08-03 13:38:39 +02:00
proc createComplexDataset {r ops {opt {}}} {
Improve test suite to handle external servers better. (#9033) This commit revives the improves the ability to run the test suite against external servers, instead of launching and managing `redis-server` processes as part of the test fixture. This capability existed in the past, using the `--host` and `--port` options. However, it was quite limited and mostly useful when running a specific tests. Attempting to run larger chunks of the test suite experienced many issues: * Many tests depend on being able to start and control `redis-server` themselves, and there's no clear distinction between external server compatible and other tests. * Cluster mode is not supported (resulting with `CROSSSLOT` errors). This PR cleans up many things and makes it possible to run the entire test suite against an external server. It also provides more fine grained controls to handle cases where the external server supports a subset of the Redis commands, limited number of databases, cluster mode, etc. The tests directory now contains a `README.md` file that describes how this works. This commit also includes additional cleanups and fixes: * Tests can now be tagged. * Tag-based selection is now unified across `start_server`, `tags` and `test`. * More information is provided about skipped or ignored tests. * Repeated patterns in tests have been extracted to common procedures, both at a global level and on a per-test file basis. * Cleaned up some cases where test setup was based on a previous test executing (a major anti-pattern that repeats itself in many places). * Cleaned up some cases where test teardown was not part of a test (in the future we should have dedicated teardown code that executes even when tests fail). * Fixed some tests that were flaky running on external servers.
2021-06-09 14:13:24 +02:00
set useexpire [expr {[lsearch -exact $opt useexpire] != -1}]
if {[lsearch -exact $opt usetag] != -1} {
set tag "{t}"
} else {
set tag ""
}
for {set j 0} {$j < $ops} {incr j} {
Improve test suite to handle external servers better. (#9033) This commit revives the improves the ability to run the test suite against external servers, instead of launching and managing `redis-server` processes as part of the test fixture. This capability existed in the past, using the `--host` and `--port` options. However, it was quite limited and mostly useful when running a specific tests. Attempting to run larger chunks of the test suite experienced many issues: * Many tests depend on being able to start and control `redis-server` themselves, and there's no clear distinction between external server compatible and other tests. * Cluster mode is not supported (resulting with `CROSSSLOT` errors). This PR cleans up many things and makes it possible to run the entire test suite against an external server. It also provides more fine grained controls to handle cases where the external server supports a subset of the Redis commands, limited number of databases, cluster mode, etc. The tests directory now contains a `README.md` file that describes how this works. This commit also includes additional cleanups and fixes: * Tests can now be tagged. * Tag-based selection is now unified across `start_server`, `tags` and `test`. * More information is provided about skipped or ignored tests. * Repeated patterns in tests have been extracted to common procedures, both at a global level and on a per-test file basis. * Cleaned up some cases where test setup was based on a previous test executing (a major anti-pattern that repeats itself in many places). * Cleaned up some cases where test teardown was not part of a test (in the future we should have dedicated teardown code that executes even when tests fail). * Fixed some tests that were flaky running on external servers.
2021-06-09 14:13:24 +02:00
set k [randomKey]$tag
set k2 [randomKey]$tag
set f [randomValue]
set v [randomValue]
2010-08-03 13:38:39 +02:00
Improve test suite to handle external servers better. (#9033) This commit revives the improves the ability to run the test suite against external servers, instead of launching and managing `redis-server` processes as part of the test fixture. This capability existed in the past, using the `--host` and `--port` options. However, it was quite limited and mostly useful when running a specific tests. Attempting to run larger chunks of the test suite experienced many issues: * Many tests depend on being able to start and control `redis-server` themselves, and there's no clear distinction between external server compatible and other tests. * Cluster mode is not supported (resulting with `CROSSSLOT` errors). This PR cleans up many things and makes it possible to run the entire test suite against an external server. It also provides more fine grained controls to handle cases where the external server supports a subset of the Redis commands, limited number of databases, cluster mode, etc. The tests directory now contains a `README.md` file that describes how this works. This commit also includes additional cleanups and fixes: * Tests can now be tagged. * Tag-based selection is now unified across `start_server`, `tags` and `test`. * More information is provided about skipped or ignored tests. * Repeated patterns in tests have been extracted to common procedures, both at a global level and on a per-test file basis. * Cleaned up some cases where test setup was based on a previous test executing (a major anti-pattern that repeats itself in many places). * Cleaned up some cases where test teardown was not part of a test (in the future we should have dedicated teardown code that executes even when tests fail). * Fixed some tests that were flaky running on external servers.
2021-06-09 14:13:24 +02:00
if {$useexpire} {
2010-08-03 13:38:39 +02:00
if {rand() < 0.1} {
{*}$r expire [randomKey] [randomInt 2]
}
}
randpath {
set d [expr {rand()}]
} {
set d [expr {rand()}]
} {
set d [expr {rand()}]
} {
set d [expr {rand()}]
} {
set d [expr {rand()}]
} {
randpath {set d +inf} {set d -inf}
}
set t [{*}$r type $k]
if {$t eq {none}} {
randpath {
{*}$r set $k $v
} {
{*}$r lpush $k $v
} {
{*}$r sadd $k $v
} {
{*}$r zadd $k $d $v
} {
{*}$r hset $k $f $v
} {
{*}$r del $k
}
set t [{*}$r type $k]
}
switch $t {
{string} {
# Nothing to do
}
{list} {
randpath {{*}$r lpush $k $v} \
{{*}$r rpush $k $v} \
{{*}$r lrem $k 0 $v} \
{{*}$r rpop $k} \
{{*}$r lpop $k}
}
{set} {
randpath {{*}$r sadd $k $v} \
{{*}$r srem $k $v} \
{
2010-11-04 10:48:49 +01:00
set otherset [findKeyWithType {*}$r set]
if {$otherset ne {}} {
randpath {
{*}$r sunionstore $k2 $k $otherset
} {
{*}$r sinterstore $k2 $k $otherset
} {
{*}$r sdiffstore $k2 $k $otherset
}
}
}
}
{zset} {
randpath {{*}$r zadd $k $d $v} \
{{*}$r zrem $k $v} \
{
2010-11-04 10:48:49 +01:00
set otherzset [findKeyWithType {*}$r zset]
if {$otherzset ne {}} {
randpath {
{*}$r zunionstore $k2 2 $k $otherzset
} {
{*}$r zinterstore $k2 2 $k $otherzset
}
}
}
}
{hash} {
randpath {{*}$r hset $k $f $v} \
{{*}$r hdel $k $f}
}
}
}
}
proc formatCommand {args} {
set cmd "*[llength $args]\r\n"
foreach a $args {
append cmd "$[string length $a]\r\n$a\r\n"
}
set _ $cmd
}
2010-07-27 14:42:11 +02:00
proc csvdump r {
set o {}
Improve test suite to handle external servers better. (#9033) This commit revives the improves the ability to run the test suite against external servers, instead of launching and managing `redis-server` processes as part of the test fixture. This capability existed in the past, using the `--host` and `--port` options. However, it was quite limited and mostly useful when running a specific tests. Attempting to run larger chunks of the test suite experienced many issues: * Many tests depend on being able to start and control `redis-server` themselves, and there's no clear distinction between external server compatible and other tests. * Cluster mode is not supported (resulting with `CROSSSLOT` errors). This PR cleans up many things and makes it possible to run the entire test suite against an external server. It also provides more fine grained controls to handle cases where the external server supports a subset of the Redis commands, limited number of databases, cluster mode, etc. The tests directory now contains a `README.md` file that describes how this works. This commit also includes additional cleanups and fixes: * Tests can now be tagged. * Tag-based selection is now unified across `start_server`, `tags` and `test`. * More information is provided about skipped or ignored tests. * Repeated patterns in tests have been extracted to common procedures, both at a global level and on a per-test file basis. * Cleaned up some cases where test setup was based on a previous test executing (a major anti-pattern that repeats itself in many places). * Cleaned up some cases where test teardown was not part of a test (in the future we should have dedicated teardown code that executes even when tests fail). * Fixed some tests that were flaky running on external servers.
2021-06-09 14:13:24 +02:00
if {$::singledb} {
set maxdb 1
} else {
set maxdb 16
}
for {set db 0} {$db < $maxdb} {incr db} {
if {!$::singledb} {
{*}$r select $db
}
2015-08-05 12:27:15 +02:00
foreach k [lsort [{*}$r keys *]] {
set type [{*}$r type $k]
append o [csvstring $db] , [csvstring $k] , [csvstring $type] ,
switch $type {
string {
append o [csvstring [{*}$r get $k]] "\n"
2010-07-27 14:42:11 +02:00
}
2015-08-05 12:27:15 +02:00
list {
foreach e [{*}$r lrange $k 0 -1] {
append o [csvstring $e] ,
}
append o "\n"
2010-07-27 14:42:11 +02:00
}
2015-08-05 12:27:15 +02:00
set {
foreach e [lsort [{*}$r smembers $k]] {
append o [csvstring $e] ,
}
append o "\n"
2010-07-27 14:42:11 +02:00
}
2015-08-05 12:27:15 +02:00
zset {
foreach e [{*}$r zrange $k 0 -1 withscores] {
append o [csvstring $e] ,
}
append o "\n"
2010-07-27 14:42:11 +02:00
}
2015-08-05 12:27:15 +02:00
hash {
set fields [{*}$r hgetall $k]
set newfields {}
foreach {k v} $fields {
lappend newfields [list $k $v]
}
set fields [lsort -index 0 $newfields]
foreach kv $fields {
append o [csvstring [lindex $kv 0]] ,
append o [csvstring [lindex $kv 1]] ,
}
append o "\n"
2010-07-27 14:42:11 +02:00
}
}
}
}
Improve test suite to handle external servers better. (#9033) This commit revives the improves the ability to run the test suite against external servers, instead of launching and managing `redis-server` processes as part of the test fixture. This capability existed in the past, using the `--host` and `--port` options. However, it was quite limited and mostly useful when running a specific tests. Attempting to run larger chunks of the test suite experienced many issues: * Many tests depend on being able to start and control `redis-server` themselves, and there's no clear distinction between external server compatible and other tests. * Cluster mode is not supported (resulting with `CROSSSLOT` errors). This PR cleans up many things and makes it possible to run the entire test suite against an external server. It also provides more fine grained controls to handle cases where the external server supports a subset of the Redis commands, limited number of databases, cluster mode, etc. The tests directory now contains a `README.md` file that describes how this works. This commit also includes additional cleanups and fixes: * Tests can now be tagged. * Tag-based selection is now unified across `start_server`, `tags` and `test`. * More information is provided about skipped or ignored tests. * Repeated patterns in tests have been extracted to common procedures, both at a global level and on a per-test file basis. * Cleaned up some cases where test setup was based on a previous test executing (a major anti-pattern that repeats itself in many places). * Cleaned up some cases where test teardown was not part of a test (in the future we should have dedicated teardown code that executes even when tests fail). * Fixed some tests that were flaky running on external servers.
2021-06-09 14:13:24 +02:00
if {!$::singledb} {
{*}$r select 9
}
2010-07-27 14:42:11 +02:00
return $o
}
proc csvstring s {
return "\"$s\""
}
proc roundFloat f {
format "%.10g" $f
}
set ::last_port_attempted 0
proc find_available_port {start count} {
set port [expr $::last_port_attempted + 1]
for {set attempts 0} {$attempts < $count} {incr attempts} {
if {$port < $start || $port >= $start+$count} {
set port $start
}
set fd1 -1
if {[catch {set fd1 [socket -server 127.0.0.1 $port]}] ||
[catch {set fd2 [socket -server 127.0.0.1 [expr $port+10000]]}]} {
if {$fd1 != -1} {
close $fd1
}
} else {
close $fd1
close $fd2
set ::last_port_attempted $port
return $port
}
incr port
}
error "Can't find a non busy port in the $start-[expr {$start+$count-1}] range."
}
2014-02-17 17:36:50 +01:00
# Test if TERM looks like to support colors
proc color_term {} {
expr {[info exists ::env(TERM)] && [string match *xterm* $::env(TERM)]}
}
proc colorstr {color str} {
if {[color_term]} {
set b 0
if {[string range $color 0 4] eq {bold-}} {
set b 1
set color [string range $color 5 end]
}
switch $color {
red {set colorcode {31}}
green {set colorcode {32}}
yellow {set colorcode {33}}
blue {set colorcode {34}}
magenta {set colorcode {35}}
cyan {set colorcode {36}}
white {set colorcode {37}}
default {set colorcode {37}}
}
if {$colorcode ne {}} {
return "\033\[$b;${colorcode};49m$str\033\[0m"
2014-02-17 17:36:50 +01:00
}
} else {
return $str
}
}
2014-07-10 11:24:59 +02:00
Sanitize dump payload: fuzz tester and fixes for segfaults and leaks it exposed The test creates keys with various encodings, DUMP them, corrupt the payload and RESTORES it. It utilizes the recently added use-exit-on-panic config to distinguish between asserts and segfaults. If the restore succeeds, it runs random commands on the key to attempt to trigger a crash. It runs in two modes, one with deep sanitation enabled and one without. In the first one we don't expect any assertions or segfaults, in the second one we expect assertions, but no segfaults. We also check for leaks and invalid reads using valgrind, and if we find them we print the commands that lead to that issue. Changes in the code (other than the test): - Replace a few NPD (null pointer deference) flows and division by zero with an assertion, so that it doesn't fail the test. (since we set the server to use `exit` rather than `abort` on assertion). - Fix quite a lot of flows in rdb.c that could have lead to memory leaks in RESTORE command (since it now responds with an error rather than panic) - Add a DEBUG flag for SET-SKIP-CHECKSUM-VALIDATION so that the test don't need to bother with faking a valid checksum - Remove a pile of code in serverLogObjectDebugInfo which is actually unsafe to run in the crash report (see comments in the code) - fix a missing boundary check in lzf_decompress test suite infra improvements: - be able to run valgrind checks before the process terminates - rotate log files when restarting servers
2020-08-14 15:05:34 +02:00
proc find_valgrind_errors {stderr on_termination} {
set fd [open $stderr]
set buf [read $fd]
close $fd
# Look for stack trace (" at 0x") and other errors (Invalid, Mismatched, etc).
# Look for "Warnings", but not the "set address range perms". These don't indicate any real concern.
Sanitize dump payload: fuzz tester and fixes for segfaults and leaks it exposed The test creates keys with various encodings, DUMP them, corrupt the payload and RESTORES it. It utilizes the recently added use-exit-on-panic config to distinguish between asserts and segfaults. If the restore succeeds, it runs random commands on the key to attempt to trigger a crash. It runs in two modes, one with deep sanitation enabled and one without. In the first one we don't expect any assertions or segfaults, in the second one we expect assertions, but no segfaults. We also check for leaks and invalid reads using valgrind, and if we find them we print the commands that lead to that issue. Changes in the code (other than the test): - Replace a few NPD (null pointer deference) flows and division by zero with an assertion, so that it doesn't fail the test. (since we set the server to use `exit` rather than `abort` on assertion). - Fix quite a lot of flows in rdb.c that could have lead to memory leaks in RESTORE command (since it now responds with an error rather than panic) - Add a DEBUG flag for SET-SKIP-CHECKSUM-VALIDATION so that the test don't need to bother with faking a valid checksum - Remove a pile of code in serverLogObjectDebugInfo which is actually unsafe to run in the crash report (see comments in the code) - fix a missing boundary check in lzf_decompress test suite infra improvements: - be able to run valgrind checks before the process terminates - rotate log files when restarting servers
2020-08-14 15:05:34 +02:00
# corrupt-dump unit, not sure why but it seems they don't indicate any real concern.
if {[regexp -- { at 0x} $buf] ||
[regexp -- {^(?=.*Warning)(?:(?!set address range perms).)*$} $buf] ||
[regexp -- {Invalid} $buf] ||
[regexp -- {Mismatched} $buf] ||
[regexp -- {uninitialized} $buf] ||
[regexp -- {has a fishy} $buf] ||
Sanitize dump payload: fuzz tester and fixes for segfaults and leaks it exposed The test creates keys with various encodings, DUMP them, corrupt the payload and RESTORES it. It utilizes the recently added use-exit-on-panic config to distinguish between asserts and segfaults. If the restore succeeds, it runs random commands on the key to attempt to trigger a crash. It runs in two modes, one with deep sanitation enabled and one without. In the first one we don't expect any assertions or segfaults, in the second one we expect assertions, but no segfaults. We also check for leaks and invalid reads using valgrind, and if we find them we print the commands that lead to that issue. Changes in the code (other than the test): - Replace a few NPD (null pointer deference) flows and division by zero with an assertion, so that it doesn't fail the test. (since we set the server to use `exit` rather than `abort` on assertion). - Fix quite a lot of flows in rdb.c that could have lead to memory leaks in RESTORE command (since it now responds with an error rather than panic) - Add a DEBUG flag for SET-SKIP-CHECKSUM-VALIDATION so that the test don't need to bother with faking a valid checksum - Remove a pile of code in serverLogObjectDebugInfo which is actually unsafe to run in the crash report (see comments in the code) - fix a missing boundary check in lzf_decompress test suite infra improvements: - be able to run valgrind checks before the process terminates - rotate log files when restarting servers
2020-08-14 15:05:34 +02:00
[regexp -- {overlap} $buf]} {
return $buf
}
# If the process didn't terminate yet, we can't look for the summary report
if {!$on_termination} {
return ""
}
# Look for the absence of a leak free summary (happens when redis isn't terminated properly).
Sanitize dump payload: fuzz tester and fixes for segfaults and leaks it exposed The test creates keys with various encodings, DUMP them, corrupt the payload and RESTORES it. It utilizes the recently added use-exit-on-panic config to distinguish between asserts and segfaults. If the restore succeeds, it runs random commands on the key to attempt to trigger a crash. It runs in two modes, one with deep sanitation enabled and one without. In the first one we don't expect any assertions or segfaults, in the second one we expect assertions, but no segfaults. We also check for leaks and invalid reads using valgrind, and if we find them we print the commands that lead to that issue. Changes in the code (other than the test): - Replace a few NPD (null pointer deference) flows and division by zero with an assertion, so that it doesn't fail the test. (since we set the server to use `exit` rather than `abort` on assertion). - Fix quite a lot of flows in rdb.c that could have lead to memory leaks in RESTORE command (since it now responds with an error rather than panic) - Add a DEBUG flag for SET-SKIP-CHECKSUM-VALIDATION so that the test don't need to bother with faking a valid checksum - Remove a pile of code in serverLogObjectDebugInfo which is actually unsafe to run in the crash report (see comments in the code) - fix a missing boundary check in lzf_decompress test suite infra improvements: - be able to run valgrind checks before the process terminates - rotate log files when restarting servers
2020-08-14 15:05:34 +02:00
if {(![regexp -- {definitely lost: 0 bytes} $buf] &&
![regexp -- {no leaks are possible} $buf])} {
return $buf
}
return ""
}
2014-07-10 11:24:59 +02:00
# Execute a background process writing random data for the specified number
# of seconds to the specified Redis instance.
proc start_write_load {host port seconds} {
set tclsh [info nameofexecutable]
exec $tclsh tests/helpers/gen_write_load.tcl $host $port $seconds $::tls &
2014-07-10 11:24:59 +02:00
}
# Stop a process generating write load executed with start_write_load.
proc stop_write_load {handle} {
catch {exec /bin/kill -9 $handle}
}
proc wait_load_handlers_disconnected {{level 0}} {
wait_for_condition 50 100 {
![string match {*name=LOAD_HANDLER*} [r $level client list]]
} else {
fail "load_handler(s) still connected after too long time."
}
}
proc K { x y } { set x }
# Shuffle a list with Fisher-Yates algorithm.
proc lshuffle {list} {
set n [llength $list]
while {$n>1} {
set j [expr {int(rand()*$n)}]
incr n -1
if {$n==$j} continue
set v [lindex $list $j]
lset list $j [lindex $list $n]
lset list $n $v
}
return $list
}
# Execute a background process writing complex data for the specified number
# of ops to the specified Redis instance.
proc start_bg_complex_data {host port db ops} {
set tclsh [info nameofexecutable]
exec $tclsh tests/helpers/bg_complex_data.tcl $host $port $db $ops $::tls &
}
# Stop a process generating write load executed with start_bg_complex_data.
proc stop_bg_complex_data {handle} {
catch {exec /bin/kill -9 $handle}
}
Wait for replicas when shutting down (#9872) To avoid data loss, this commit adds a grace period for lagging replicas to catch up the replication offset. Done: * Wait for replicas when shutdown is triggered by SIGTERM and SIGINT. * Wait for replicas when shutdown is triggered by the SHUTDOWN command. A new blocked client type BLOCKED_SHUTDOWN is introduced, allowing multiple clients to call SHUTDOWN in parallel. Note that they don't expect a response unless an error happens and shutdown is aborted. * Log warning for each replica lagging behind when finishing shutdown. * CLIENT_PAUSE_WRITE while waiting for replicas. * Configurable grace period 'shutdown-timeout' in seconds (default 10). * New flags for the SHUTDOWN command: - NOW disables the grace period for lagging replicas. - FORCE ignores errors writing the RDB or AOF files which would normally prevent a shutdown. - ABORT cancels ongoing shutdown. Can't be combined with other flags. * New field in the output of the INFO command: 'shutdown_in_milliseconds'. The value is the remaining maximum time to wait for lagging replicas before finishing the shutdown. This field is present in the Server section **only** during shutdown. Not directly related: * When shutting down, if there is an AOF saving child, it is killed **even** if AOF is disabled. This can happen if BGREWRITEAOF is used when AOF is off. * Client pause now has end time and type (WRITE or ALL) per purpose. The different pause purposes are *CLIENT PAUSE command*, *failover* and *shutdown*. If clients are unpaused for one purpose, it doesn't affect client pause for other purposes. For example, the CLIENT UNPAUSE command doesn't affect client pause initiated by the failover or shutdown procedures. A completed failover or a failed shutdown doesn't unpause clients paused by the CLIENT PAUSE command. Notes: * DEBUG RESTART doesn't wait for replicas. * We already have a warning logged when a replica disconnects. This means that if any replica connection is lost during the shutdown, it is either logged as disconnected or as lagging at the time of exit. Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-02 08:50:15 +01:00
# Write num keys with the given key prefix and value size (in bytes). If idx is
# given, it's the index (AKA level) used with the srv procedure and it specifies
# to which Redis instance to write the keys.
proc populate {num {prefix key:} {size 3} {idx 0}} {
set rd [redis_deferring_client $idx]
for {set j 0} {$j < $num} {incr j} {
$rd set $prefix$j [string repeat A $size]
}
for {set j 0} {$j < $num} {incr j} {
$rd read
}
$rd close
}
if diskless repl child is killed, make sure to reap the pid (#7742) Starting redis 6.0 and the changes we made to the diskless master to be suitable for TLS, I made the master avoid reaping (wait3) the pid of the child until we know all replicas are done reading their rdb. I did that in order to avoid a state where the rdb_child_pid is -1 but we don't yet want to start another fork (still busy serving that data to replicas). It turns out that the solution used so far was problematic in case the fork child was being killed (e.g. by the kernel OOM killer), in that case there's a chance that we currently disabled the read event on the rdb pipe, since we're waiting for a replica to become writable again. and in that scenario the master would have never realized the child exited, and the replica will remain hung too. Note that there's no mechanism to detect a hung replica while it's in rdb transfer state. The solution here is to add another pipe which is used by the parent to tell the child it is safe to exit. this mean that when the child exits, for whatever reason, it is safe to reap it. Besides that, i'm re-introducing an adjustment to REPLCONF ACK which was part of #6271 (Accelerate diskless master connections) but was dropped when that PR was rebased after the TLS fork/pipe changes (5a47794). Now that RdbPipeCleanup no longer calls checkChildrenDone, and the ACK has chance to detect that the child exited, it should be the one to call it so that we don't have to wait for cron (server.hz) to do that.
2020-09-06 15:43:57 +02:00
proc get_child_pid {idx} {
set pid [srv $idx pid]
if {[file exists "/usr/bin/pgrep"]} {
set fd [open "|pgrep -P $pid" "r"]
set child_pid [string trim [lindex [split [read $fd] \n] 0]]
} else {
set fd [open "|ps --ppid $pid -o pid" "r"]
set child_pid [string trim [lindex [split [read $fd] \n] 1]]
}
if diskless repl child is killed, make sure to reap the pid (#7742) Starting redis 6.0 and the changes we made to the diskless master to be suitable for TLS, I made the master avoid reaping (wait3) the pid of the child until we know all replicas are done reading their rdb. I did that in order to avoid a state where the rdb_child_pid is -1 but we don't yet want to start another fork (still busy serving that data to replicas). It turns out that the solution used so far was problematic in case the fork child was being killed (e.g. by the kernel OOM killer), in that case there's a chance that we currently disabled the read event on the rdb pipe, since we're waiting for a replica to become writable again. and in that scenario the master would have never realized the child exited, and the replica will remain hung too. Note that there's no mechanism to detect a hung replica while it's in rdb transfer state. The solution here is to add another pipe which is used by the parent to tell the child it is safe to exit. this mean that when the child exits, for whatever reason, it is safe to reap it. Besides that, i'm re-introducing an adjustment to REPLCONF ACK which was part of #6271 (Accelerate diskless master connections) but was dropped when that PR was rebased after the TLS fork/pipe changes (5a47794). Now that RdbPipeCleanup no longer calls checkChildrenDone, and the ACK has chance to detect that the child exited, it should be the one to call it so that we don't have to wait for cron (server.hz) to do that.
2020-09-06 15:43:57 +02:00
close $fd
return $child_pid
}
proc cmdrstat {cmd r} {
if {[regexp "\r\ncmdstat_$cmd:(.*?)\r\n" [$r info commandstats] _ value]} {
set _ $value
}
}
Sanitize dump payload: fuzz tester and fixes for segfaults and leaks it exposed The test creates keys with various encodings, DUMP them, corrupt the payload and RESTORES it. It utilizes the recently added use-exit-on-panic config to distinguish between asserts and segfaults. If the restore succeeds, it runs random commands on the key to attempt to trigger a crash. It runs in two modes, one with deep sanitation enabled and one without. In the first one we don't expect any assertions or segfaults, in the second one we expect assertions, but no segfaults. We also check for leaks and invalid reads using valgrind, and if we find them we print the commands that lead to that issue. Changes in the code (other than the test): - Replace a few NPD (null pointer deference) flows and division by zero with an assertion, so that it doesn't fail the test. (since we set the server to use `exit` rather than `abort` on assertion). - Fix quite a lot of flows in rdb.c that could have lead to memory leaks in RESTORE command (since it now responds with an error rather than panic) - Add a DEBUG flag for SET-SKIP-CHECKSUM-VALIDATION so that the test don't need to bother with faking a valid checksum - Remove a pile of code in serverLogObjectDebugInfo which is actually unsafe to run in the crash report (see comments in the code) - fix a missing boundary check in lzf_decompress test suite infra improvements: - be able to run valgrind checks before the process terminates - rotate log files when restarting servers
2020-08-14 15:05:34 +02:00
proc errorrstat {cmd r} {
if {[regexp "\r\nerrorstat_$cmd:(.*?)\r\n" [$r info errorstats] _ value]} {
set _ $value
}
}
Added INFO LATENCYSTATS section: latency by percentile distribution/latency by cumulative distribution of latencies (#9462) # Short description The Redis extended latency stats track per command latencies and enables: - exporting the per-command percentile distribution via the `INFO LATENCYSTATS` command. **( percentile distribution is not mergeable between cluster nodes ).** - exporting the per-command cumulative latency distributions via the `LATENCY HISTOGRAM` command. Using the cumulative distribution of latencies we can merge several stats from different cluster nodes to calculate aggregate metrics . By default, the extended latency monitoring is enabled since the overhead of keeping track of the command latency is very small. If you don't want to track extended latency metrics, you can easily disable it at runtime using the command: - `CONFIG SET latency-tracking no` By default, the exported latency percentiles are the p50, p99, and p999. You can alter them at runtime using the command: - `CONFIG SET latency-tracking-info-percentiles "0.0 50.0 100.0"` ## Some details: - The total size per histogram should sit around 40 KiB. We only allocate those 40KiB when a command was called for the first time. - With regards to the WRITE overhead As seen below, there is no measurable overhead on the achievable ops/sec or full latency spectrum on the client. Including also the measured redis-benchmark for unstable vs this branch. - We track from 1 nanosecond to 1 second ( everything above 1 second is considered +Inf ) ## `INFO LATENCYSTATS` exposition format - Format: `latency_percentiles_usec_<CMDNAME>:p0=XX,p50....` ## `LATENCY HISTOGRAM [command ...]` exposition format Return a cumulative distribution of latencies in the format of a histogram for the specified command names. The histogram is composed of a map of time buckets: - Each representing a latency range, between 1 nanosecond and roughly 1 second. - Each bucket covers twice the previous bucket's range. - Empty buckets are not printed. - Everything above 1 sec is considered +Inf. - At max there will be log2(1000000000)=30 buckets We reply a map for each command in the format: `<command name> : { `calls`: <total command calls> , `histogram` : { <bucket 1> : latency , < bucket 2> : latency, ... } }` Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-05 13:01:05 +01:00
proc latencyrstat_percentiles {cmd r} {
if {[regexp "\r\nlatency_percentiles_usec_$cmd:(.*?)\r\n" [$r info latencystats] _ value]} {
set _ $value
}
}
Sanitize dump payload: fuzz tester and fixes for segfaults and leaks it exposed The test creates keys with various encodings, DUMP them, corrupt the payload and RESTORES it. It utilizes the recently added use-exit-on-panic config to distinguish between asserts and segfaults. If the restore succeeds, it runs random commands on the key to attempt to trigger a crash. It runs in two modes, one with deep sanitation enabled and one without. In the first one we don't expect any assertions or segfaults, in the second one we expect assertions, but no segfaults. We also check for leaks and invalid reads using valgrind, and if we find them we print the commands that lead to that issue. Changes in the code (other than the test): - Replace a few NPD (null pointer deference) flows and division by zero with an assertion, so that it doesn't fail the test. (since we set the server to use `exit` rather than `abort` on assertion). - Fix quite a lot of flows in rdb.c that could have lead to memory leaks in RESTORE command (since it now responds with an error rather than panic) - Add a DEBUG flag for SET-SKIP-CHECKSUM-VALIDATION so that the test don't need to bother with faking a valid checksum - Remove a pile of code in serverLogObjectDebugInfo which is actually unsafe to run in the crash report (see comments in the code) - fix a missing boundary check in lzf_decompress test suite infra improvements: - be able to run valgrind checks before the process terminates - rotate log files when restarting servers
2020-08-14 15:05:34 +02:00
proc generate_fuzzy_traffic_on_key {key duration} {
# Commands per type, blocking commands removed
# TODO: extract these from COMMAND DOCS, and improve to include other types
set string_commands {APPEND BITCOUNT BITFIELD BITOP BITPOS DECR DECRBY GET GETBIT GETRANGE GETSET INCR INCRBY INCRBYFLOAT MGET MSET MSETNX PSETEX SET SETBIT SETEX SETNX SETRANGE LCS STRLEN}
set hash_commands {HDEL HEXISTS HGET HGETALL HINCRBY HINCRBYFLOAT HKEYS HLEN HMGET HMSET HSCAN HSET HSETNX HSTRLEN HVALS HRANDFIELD}
set zset_commands {ZADD ZCARD ZCOUNT ZINCRBY ZINTERSTORE ZLEXCOUNT ZPOPMAX ZPOPMIN ZRANGE ZRANGEBYLEX ZRANGEBYSCORE ZRANK ZREM ZREMRANGEBYLEX ZREMRANGEBYRANK ZREMRANGEBYSCORE ZREVRANGE ZREVRANGEBYLEX ZREVRANGEBYSCORE ZREVRANK ZSCAN ZSCORE ZUNIONSTORE ZRANDMEMBER}
Sanitize dump payload: fuzz tester and fixes for segfaults and leaks it exposed The test creates keys with various encodings, DUMP them, corrupt the payload and RESTORES it. It utilizes the recently added use-exit-on-panic config to distinguish between asserts and segfaults. If the restore succeeds, it runs random commands on the key to attempt to trigger a crash. It runs in two modes, one with deep sanitation enabled and one without. In the first one we don't expect any assertions or segfaults, in the second one we expect assertions, but no segfaults. We also check for leaks and invalid reads using valgrind, and if we find them we print the commands that lead to that issue. Changes in the code (other than the test): - Replace a few NPD (null pointer deference) flows and division by zero with an assertion, so that it doesn't fail the test. (since we set the server to use `exit` rather than `abort` on assertion). - Fix quite a lot of flows in rdb.c that could have lead to memory leaks in RESTORE command (since it now responds with an error rather than panic) - Add a DEBUG flag for SET-SKIP-CHECKSUM-VALIDATION so that the test don't need to bother with faking a valid checksum - Remove a pile of code in serverLogObjectDebugInfo which is actually unsafe to run in the crash report (see comments in the code) - fix a missing boundary check in lzf_decompress test suite infra improvements: - be able to run valgrind checks before the process terminates - rotate log files when restarting servers
2020-08-14 15:05:34 +02:00
set list_commands {LINDEX LINSERT LLEN LPOP LPOS LPUSH LPUSHX LRANGE LREM LSET LTRIM RPOP RPOPLPUSH RPUSH RPUSHX}
set set_commands {SADD SCARD SDIFF SDIFFSTORE SINTER SINTERSTORE SISMEMBER SMEMBERS SMOVE SPOP SRANDMEMBER SREM SSCAN SUNION SUNIONSTORE}
set stream_commands {XACK XADD XCLAIM XDEL XGROUP XINFO XLEN XPENDING XRANGE XREAD XREADGROUP XREVRANGE XTRIM}
set commands [dict create string $string_commands hash $hash_commands zset $zset_commands list $list_commands set $set_commands stream $stream_commands]
set type [r type $key]
set cmds [dict get $commands $type]
set start_time [clock seconds]
set sent {}
set succeeded 0
while {([clock seconds]-$start_time) < $duration} {
# find a random command for our key type
set cmd_idx [expr {int(rand()*[llength $cmds])}]
set cmd [lindex $cmds $cmd_idx]
# get the command details from redis
if { [ catch {
set cmd_info [lindex [r command info $cmd] 0]
} err ] } {
# if we failed, it means redis crashed after the previous command
return $sent
}
# try to build a valid command argument
set arity [lindex $cmd_info 1]
set arity [expr $arity < 0 ? - $arity: $arity]
set firstkey [lindex $cmd_info 3]
set lastkey [lindex $cmd_info 4]
Sanitize dump payload: fuzz tester and fixes for segfaults and leaks it exposed The test creates keys with various encodings, DUMP them, corrupt the payload and RESTORES it. It utilizes the recently added use-exit-on-panic config to distinguish between asserts and segfaults. If the restore succeeds, it runs random commands on the key to attempt to trigger a crash. It runs in two modes, one with deep sanitation enabled and one without. In the first one we don't expect any assertions or segfaults, in the second one we expect assertions, but no segfaults. We also check for leaks and invalid reads using valgrind, and if we find them we print the commands that lead to that issue. Changes in the code (other than the test): - Replace a few NPD (null pointer deference) flows and division by zero with an assertion, so that it doesn't fail the test. (since we set the server to use `exit` rather than `abort` on assertion). - Fix quite a lot of flows in rdb.c that could have lead to memory leaks in RESTORE command (since it now responds with an error rather than panic) - Add a DEBUG flag for SET-SKIP-CHECKSUM-VALIDATION so that the test don't need to bother with faking a valid checksum - Remove a pile of code in serverLogObjectDebugInfo which is actually unsafe to run in the crash report (see comments in the code) - fix a missing boundary check in lzf_decompress test suite infra improvements: - be able to run valgrind checks before the process terminates - rotate log files when restarting servers
2020-08-14 15:05:34 +02:00
set i 1
if {$cmd == "XINFO"} {
lappend cmd "STREAM"
lappend cmd $key
lappend cmd "FULL"
incr i 3
}
if {$cmd == "XREAD"} {
lappend cmd "STREAMS"
lappend cmd $key
randpath {
lappend cmd \$
} {
lappend cmd [randomValue]
}
incr i 3
}
if {$cmd == "XADD"} {
lappend cmd $key
randpath {
lappend cmd "*"
} {
lappend cmd [randomValue]
}
lappend cmd [randomValue]
lappend cmd [randomValue]
incr i 4
}
for {} {$i < $arity} {incr i} {
if {$i == $firstkey || $i == $lastkey} {
Sanitize dump payload: fuzz tester and fixes for segfaults and leaks it exposed The test creates keys with various encodings, DUMP them, corrupt the payload and RESTORES it. It utilizes the recently added use-exit-on-panic config to distinguish between asserts and segfaults. If the restore succeeds, it runs random commands on the key to attempt to trigger a crash. It runs in two modes, one with deep sanitation enabled and one without. In the first one we don't expect any assertions or segfaults, in the second one we expect assertions, but no segfaults. We also check for leaks and invalid reads using valgrind, and if we find them we print the commands that lead to that issue. Changes in the code (other than the test): - Replace a few NPD (null pointer deference) flows and division by zero with an assertion, so that it doesn't fail the test. (since we set the server to use `exit` rather than `abort` on assertion). - Fix quite a lot of flows in rdb.c that could have lead to memory leaks in RESTORE command (since it now responds with an error rather than panic) - Add a DEBUG flag for SET-SKIP-CHECKSUM-VALIDATION so that the test don't need to bother with faking a valid checksum - Remove a pile of code in serverLogObjectDebugInfo which is actually unsafe to run in the crash report (see comments in the code) - fix a missing boundary check in lzf_decompress test suite infra improvements: - be able to run valgrind checks before the process terminates - rotate log files when restarting servers
2020-08-14 15:05:34 +02:00
lappend cmd $key
} else {
lappend cmd [randomValue]
}
}
# execute the command, we expect commands to fail on syntax errors
lappend sent $cmd
if { ! [ catch {
r {*}$cmd
} err ] } {
incr succeeded
} else {
set err [format "%s" $err] ;# convert to string for pattern matching
if {[string match "*SIGTERM*" $err]} {
puts "command caused test to hang? $cmd"
exit 1
}
Sanitize dump payload: fuzz tester and fixes for segfaults and leaks it exposed The test creates keys with various encodings, DUMP them, corrupt the payload and RESTORES it. It utilizes the recently added use-exit-on-panic config to distinguish between asserts and segfaults. If the restore succeeds, it runs random commands on the key to attempt to trigger a crash. It runs in two modes, one with deep sanitation enabled and one without. In the first one we don't expect any assertions or segfaults, in the second one we expect assertions, but no segfaults. We also check for leaks and invalid reads using valgrind, and if we find them we print the commands that lead to that issue. Changes in the code (other than the test): - Replace a few NPD (null pointer deference) flows and division by zero with an assertion, so that it doesn't fail the test. (since we set the server to use `exit` rather than `abort` on assertion). - Fix quite a lot of flows in rdb.c that could have lead to memory leaks in RESTORE command (since it now responds with an error rather than panic) - Add a DEBUG flag for SET-SKIP-CHECKSUM-VALIDATION so that the test don't need to bother with faking a valid checksum - Remove a pile of code in serverLogObjectDebugInfo which is actually unsafe to run in the crash report (see comments in the code) - fix a missing boundary check in lzf_decompress test suite infra improvements: - be able to run valgrind checks before the process terminates - rotate log files when restarting servers
2020-08-14 15:05:34 +02:00
}
}
# print stats so that we know if we managed to generate commands that actually made senes
#if {$::verbose} {
# set count [llength $sent]
# puts "Fuzzy traffic sent: $count, succeeded: $succeeded"
#}
# return the list of commands we sent
return $sent
}
proc string2printable s {
set res {}
set has_special_chars false
foreach i [split $s {}] {
scan $i %c int
# non printable characters, including space and excluding: " \ $ { }
if {$int < 32 || $int > 122 || $int == 34 || $int == 36 || $int == 92} {
set has_special_chars true
}
# TCL8.5 has issues mixing \x notation and normal chars in the same
# source code string, so we'll convert the entire string.
append res \\x[format %02X $int]
}
if {!$has_special_chars} {
return $s
}
set res "\"$res\""
return $res
}
# Calculation value of Chi-Square Distribution. By this value
# we can verify the random distribution sample confidence.
# Based on the following wiki:
# https://en.wikipedia.org/wiki/Chi-square_distribution
#
# param res Random sample list
# return Value of Chi-Square Distribution
#
# x2_value: return of chi_square_value function
# df: Degrees of freedom, Number of independent values minus 1
#
# By using x2_value and df to back check the cardinality table,
# we can know the confidence of the random sample.
proc chi_square_value {res} {
unset -nocomplain mydict
foreach key $res {
dict incr mydict $key 1
}
set x2_value 0
set p [expr [llength $res] / [dict size $mydict]]
foreach key [dict keys $mydict] {
set value [dict get $mydict $key]
# Aggregate the chi-square value of each element
set v [expr {pow($value - $p, 2) / $p}]
set x2_value [expr {$x2_value + $v}]
}
return $x2_value
}
#subscribe to Pub/Sub channels
proc consume_subscribe_messages {client type channels} {
set numsub -1
set counts {}
for {set i [llength $channels]} {$i > 0} {incr i -1} {
set msg [$client read]
assert_equal $type [lindex $msg 0]
# when receiving subscribe messages the channels names
# are ordered. when receiving unsubscribe messages
# they are unordered
set idx [lsearch -exact $channels [lindex $msg 1]]
if {[string match "*unsubscribe" $type]} {
assert {$idx >= 0}
} else {
assert {$idx == 0}
}
set channels [lreplace $channels $idx $idx]
# aggregate the subscription count to return to the caller
lappend counts [lindex $msg 2]
}
# we should have received messages for channels
assert {[llength $channels] == 0}
return $counts
}
proc subscribe {client channels} {
$client subscribe {*}$channels
consume_subscribe_messages $client subscribe $channels
}
proc ssubscribe {client channels} {
$client ssubscribe {*}$channels
consume_subscribe_messages $client ssubscribe $channels
}
proc unsubscribe {client {channels {}}} {
$client unsubscribe {*}$channels
consume_subscribe_messages $client unsubscribe $channels
}
proc sunsubscribe {client {channels {}}} {
$client sunsubscribe {*}$channels
consume_subscribe_messages $client sunsubscribe $channels
}
proc psubscribe {client channels} {
$client psubscribe {*}$channels
consume_subscribe_messages $client psubscribe $channels
}
proc punsubscribe {client {channels {}}} {
$client punsubscribe {*}$channels
consume_subscribe_messages $client punsubscribe $channels
}
Improve test suite to handle external servers better. (#9033) This commit revives the improves the ability to run the test suite against external servers, instead of launching and managing `redis-server` processes as part of the test fixture. This capability existed in the past, using the `--host` and `--port` options. However, it was quite limited and mostly useful when running a specific tests. Attempting to run larger chunks of the test suite experienced many issues: * Many tests depend on being able to start and control `redis-server` themselves, and there's no clear distinction between external server compatible and other tests. * Cluster mode is not supported (resulting with `CROSSSLOT` errors). This PR cleans up many things and makes it possible to run the entire test suite against an external server. It also provides more fine grained controls to handle cases where the external server supports a subset of the Redis commands, limited number of databases, cluster mode, etc. The tests directory now contains a `README.md` file that describes how this works. This commit also includes additional cleanups and fixes: * Tests can now be tagged. * Tag-based selection is now unified across `start_server`, `tags` and `test`. * More information is provided about skipped or ignored tests. * Repeated patterns in tests have been extracted to common procedures, both at a global level and on a per-test file basis. * Cleaned up some cases where test setup was based on a previous test executing (a major anti-pattern that repeats itself in many places). * Cleaned up some cases where test teardown was not part of a test (in the future we should have dedicated teardown code that executes even when tests fail). * Fixed some tests that were flaky running on external servers.
2021-06-09 14:13:24 +02:00
proc debug_digest_value {key} {
if {[lsearch $::denytags "needs:debug"] >= 0 || $::ignoredigest} {
Improve test suite to handle external servers better. (#9033) This commit revives the improves the ability to run the test suite against external servers, instead of launching and managing `redis-server` processes as part of the test fixture. This capability existed in the past, using the `--host` and `--port` options. However, it was quite limited and mostly useful when running a specific tests. Attempting to run larger chunks of the test suite experienced many issues: * Many tests depend on being able to start and control `redis-server` themselves, and there's no clear distinction between external server compatible and other tests. * Cluster mode is not supported (resulting with `CROSSSLOT` errors). This PR cleans up many things and makes it possible to run the entire test suite against an external server. It also provides more fine grained controls to handle cases where the external server supports a subset of the Redis commands, limited number of databases, cluster mode, etc. The tests directory now contains a `README.md` file that describes how this works. This commit also includes additional cleanups and fixes: * Tests can now be tagged. * Tag-based selection is now unified across `start_server`, `tags` and `test`. * More information is provided about skipped or ignored tests. * Repeated patterns in tests have been extracted to common procedures, both at a global level and on a per-test file basis. * Cleaned up some cases where test setup was based on a previous test executing (a major anti-pattern that repeats itself in many places). * Cleaned up some cases where test teardown was not part of a test (in the future we should have dedicated teardown code that executes even when tests fail). * Fixed some tests that were flaky running on external servers.
2021-06-09 14:13:24 +02:00
return "dummy-digest-value"
}
r debug digest-value $key
}
proc debug_digest {{level 0}} {
if {[lsearch $::denytags "needs:debug"] >= 0 || $::ignoredigest} {
return "dummy-digest"
}
r $level debug digest
Improve test suite to handle external servers better. (#9033) This commit revives the improves the ability to run the test suite against external servers, instead of launching and managing `redis-server` processes as part of the test fixture. This capability existed in the past, using the `--host` and `--port` options. However, it was quite limited and mostly useful when running a specific tests. Attempting to run larger chunks of the test suite experienced many issues: * Many tests depend on being able to start and control `redis-server` themselves, and there's no clear distinction between external server compatible and other tests. * Cluster mode is not supported (resulting with `CROSSSLOT` errors). This PR cleans up many things and makes it possible to run the entire test suite against an external server. It also provides more fine grained controls to handle cases where the external server supports a subset of the Redis commands, limited number of databases, cluster mode, etc. The tests directory now contains a `README.md` file that describes how this works. This commit also includes additional cleanups and fixes: * Tests can now be tagged. * Tag-based selection is now unified across `start_server`, `tags` and `test`. * More information is provided about skipped or ignored tests. * Repeated patterns in tests have been extracted to common procedures, both at a global level and on a per-test file basis. * Cleaned up some cases where test setup was based on a previous test executing (a major anti-pattern that repeats itself in many places). * Cleaned up some cases where test teardown was not part of a test (in the future we should have dedicated teardown code that executes even when tests fail). * Fixed some tests that were flaky running on external servers.
2021-06-09 14:13:24 +02:00
}
proc wait_for_blocked_client {} {
wait_for_condition 50 100 {
[s blocked_clients] ne 0
} else {
fail "no blocked clients"
}
}
proc wait_for_blocked_clients_count {count {maxtries 100} {delay 10}} {
wait_for_condition $maxtries $delay {
[s blocked_clients] == $count
} else {
fail "Timeout waiting for blocked clients"
}
}
proc read_from_aof {fp} {
# Input fp is a blocking binary file descriptor of an opened AOF file.
if {[gets $fp count] == -1} return ""
set count [string range $count 1 end]
# Return a list of arguments for the command.
set res {}
for {set j 0} {$j < $count} {incr j} {
read $fp 1
set arg [::redis::redis_bulk_read $fp]
if {$j == 0} {set arg [string tolower $arg]}
lappend res $arg
}
return $res
}
proc assert_aof_content {aof_path patterns} {
set fp [open $aof_path r]
fconfigure $fp -translation binary
fconfigure $fp -blocking 1
for {set j 0} {$j < [llength $patterns]} {incr j} {
assert_match [lindex $patterns $j] [read_from_aof $fp]
}
}
Improve test suite to handle external servers better. (#9033) This commit revives the improves the ability to run the test suite against external servers, instead of launching and managing `redis-server` processes as part of the test fixture. This capability existed in the past, using the `--host` and `--port` options. However, it was quite limited and mostly useful when running a specific tests. Attempting to run larger chunks of the test suite experienced many issues: * Many tests depend on being able to start and control `redis-server` themselves, and there's no clear distinction between external server compatible and other tests. * Cluster mode is not supported (resulting with `CROSSSLOT` errors). This PR cleans up many things and makes it possible to run the entire test suite against an external server. It also provides more fine grained controls to handle cases where the external server supports a subset of the Redis commands, limited number of databases, cluster mode, etc. The tests directory now contains a `README.md` file that describes how this works. This commit also includes additional cleanups and fixes: * Tests can now be tagged. * Tag-based selection is now unified across `start_server`, `tags` and `test`. * More information is provided about skipped or ignored tests. * Repeated patterns in tests have been extracted to common procedures, both at a global level and on a per-test file basis. * Cleaned up some cases where test setup was based on a previous test executing (a major anti-pattern that repeats itself in many places). * Cleaned up some cases where test teardown was not part of a test (in the future we should have dedicated teardown code that executes even when tests fail). * Fixed some tests that were flaky running on external servers.
2021-06-09 14:13:24 +02:00
proc config_set {param value {options {}}} {
set mayfail 0
foreach option $options {
switch $option {
"mayfail" {
set mayfail 1
}
default {
error "Unknown option $option"
}
}
}
if {[catch {r config set $param $value} err]} {
if {!$mayfail} {
error $err
} else {
if {$::verbose} {
puts "Ignoring CONFIG SET $param $value failure: $err"
}
}
}
}
proc delete_lines_with_pattern {filename tmpfilename pattern} {
set fh_in [open $filename r]
set fh_out [open $tmpfilename w]
while {[gets $fh_in line] != -1} {
if {![regexp $pattern $line]} {
puts $fh_out $line
}
}
close $fh_in
close $fh_out
file rename -force $tmpfilename $filename
}
proc get_nonloopback_addr {} {
set addrlist [list {}]
catch { set addrlist [exec hostname -I] }
return [lindex $addrlist 0]
}
proc get_nonloopback_client {} {
return [redis [get_nonloopback_addr] [srv 0 "port"] 0 $::tls]
}
# The following functions and variables are used only when running large-memory
# tests. We avoid defining them when not running large-memory tests because the
# global variables takes up lots of memory.
proc init_large_mem_vars {} {
if {![info exists ::str500]} {
set ::str500 [string repeat x 500000000] ;# 500mb
set ::str500_len [string length $::str500]
}
}
# Utility function to write big argument into redis client connection
proc write_big_bulk {size {prefix ""} {skip_read no}} {
init_large_mem_vars
assert {[string length prefix] <= $size}
r write "\$$size\r\n"
r write $prefix
incr size -[string length $prefix]
while {$size >= 500000000} {
r write $::str500
incr size -500000000
}
if {$size > 0} {
r write [string repeat x $size]
}
r write "\r\n"
if {!$skip_read} {
r flush
r read
}
}
# Utility to read big bulk response (work around Tcl limitations)
proc read_big_bulk {code {compare no} {prefix ""}} {
init_large_mem_vars
r readraw 1
set resp_len [uplevel 1 $code] ;# get the first line of the RESP response
assert_equal [string range $resp_len 0 0] "$"
set resp_len [string range $resp_len 1 end]
set prefix_len [string length $prefix]
if {$compare} {
assert {$prefix_len <= $resp_len}
assert {$prefix_len <= $::str500_len}
}
set remaining $resp_len
while {$remaining > 0} {
set l $remaining
if {$l > $::str500_len} {set l $::str500_len} ; # can't read more than 2gb at a time, so read 500mb so we can easily verify read data
set read_data [r rawread $l]
set nbytes [string length $read_data]
if {$compare} {
set comp_len $nbytes
# Compare prefix part
if {$remaining == $resp_len} {
assert_equal $prefix [string range $read_data 0 [expr $prefix_len - 1]]
set read_data [string range $read_data $prefix_len $nbytes]
incr comp_len -$prefix_len
}
# Compare rest of data, evaluate and then assert to avoid huge print in case of failure
set data_equal [expr {$read_data == [string range $::str500 0 [expr $comp_len - 1]]}]
assert $data_equal
}
incr remaining -$nbytes
}
assert_equal [r rawread 2] "\r\n"
r readraw 0
return $resp_len
}
proc prepare_value {size} {
set _v "c"
for {set i 1} {$i < $size} {incr i} {
append _v 0
}
return $_v
}
proc memory_usage {key} {
set usage [r memory usage $key]
if {![string match {*jemalloc*} [s mem_allocator]]} {
# libc allocator can sometimes return a different size allocation for the same requested size
# this makes tests that rely on MEMORY USAGE unreliable, so instead we return a constant 1
set usage 1
}
return $usage
}