From time to time the "slave buffer are counted correctly" test times out. It happens more often when I run the test suite with --clients 8 and less often when I run it with --clients 16. The error looks like this:
[TIMEOUT]: clients state report follows.
sock55cd97dc2cc0 => (IN PROGRESS) slave buffer are counted correctly
Killing still running Redis server 3568
Killing still running Redis server 3599
I tried to skip this test, but then the test suite fails with an exception. However, if I run only this test, it always passes. The exception looks like this:
[ready]: 32
Testing unit/maxmemory
[ok]: eviction due to output buffers of many MGET clients, client eviction: false (32 ms)
[ok]: eviction due to input buffer of a dead client, client eviction: false (233 ms)
[ok]: eviction due to output buffers of pubsub, client eviction: false (1082 ms)
[ok]: eviction due to output buffers of many MGET clients, client eviction: true (35 ms)
[ok]: eviction due to input buffer of a dead client, client eviction: true (242 ms)
[ok]: eviction due to output buffers of pubsub, client eviction: true (429 ms)
[ok]: Without maxmemory small integers are shared (1 ms)
[ok]: With maxmemory and non-LRU policy integers are still shared (0 ms)
[ok]: With maxmemory and LRU policy integers are not shared (1 ms)
[ok]: maxmemory - is the memory limit honoured? (policy allkeys-random) (81 ms)
[ok]: maxmemory - is the memory limit honoured? (policy allkeys-lru) (202 ms)
[ok]: maxmemory - is the memory limit honoured? (policy allkeys-lfu) (170 ms)
[ok]: maxmemory - is the memory limit honoured? (policy volatile-lru) (175 ms)
[ok]: maxmemory - is the memory limit honoured? (policy volatile-lfu) (168 ms)
[ok]: maxmemory - is the memory limit honoured? (policy volatile-random) (174 ms)
[ok]: maxmemory - is the memory limit honoured? (policy volatile-ttl) (168 ms)
[ok]: maxmemory - only allkeys-* should remove non-volatile keys (allkeys-random) (258 ms)
[ok]: maxmemory - only allkeys-* should remove non-volatile keys (allkeys-lru) (264 ms)
[ok]: maxmemory - only allkeys-* should remove non-volatile keys (volatile-lru) (284 ms)
[ok]: maxmemory - only allkeys-* should remove non-volatile keys (volatile-random) (286 ms)
[ok]: maxmemory - only allkeys-* should remove non-volatile keys (volatile-ttl) (275 ms)
[ok]: maxmemory - policy volatile-lru should only remove volatile keys. (238 ms)
[ok]: maxmemory - policy volatile-lfu should only remove volatile keys. (231 ms)
[ok]: maxmemory - policy volatile-random should only remove volatile keys. (227 ms)
[ok]: maxmemory - policy volatile-ttl should only remove volatile keys. (227 ms)
[skip]: slave buffer are counted correctly
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
32 100 1 1 pts/1 1 Sl+ 250 0:00 src/redis-server 127.0.0.1:21114
[exception]: Executing test client: assertion:process was not stopped.
assertion:process was not stopped
while executing
"error "assertion:$msg""
(procedure "fail" line 2)
invoked from within
"fail "process was not stopped""
("uplevel" body line 3)
invoked from within
"uplevel 1 $elsescript"
(procedure "wait_for_condition" line 12)
invoked from within
"wait_for_condition 50 1000 {
[string match "T*" [exec ps -o state= -p $pid]]
} else {
puts [exec ps j $pid]
fail "process ..."
(procedure "resume_process" line 2)
invoked from within
"resume_process $slave_pid"
("uplevel" body line 98)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 2)
invoked from within
"start_server {} {
set slave_pid [s process_id]
test "$test_name" {
set slave [srv 0 client]
set slave_host [sr..."
("uplevel" body line 2)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 2)
invoked from within
"start_server {tags {"maxmemory external:skip"}} {
start_server {} {
set slave_pid [s process_id]
test "$test_name" {
..."
(procedure "test_slave_buffers" line 2)
invoked from within
"test_slave_buffers {slave buffer are counted correctly} 1000000 10 0 1"
(file "tests/unit/maxmemory.tcl" line 415)
invoked from within
"source $path"
(procedure "execute_test_file" line 4)
invoked from within
"execute_test_file $data"
(procedure "test_client_main" line 10)
invoked from within
"test_client_main $::test_server_port "
To reproduce
My minimal reproducer (used for the exception failure above) looks like this:
./runtest --clients 1 --timeout 180 --single unit/maxmemory --skiptest 'slave buffer are counted correctly'
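Since the timeout variant only shows up from time to time, it can help to run a command repeatedly and count the failures. The following is a minimal sketch, not part of the Redis test suite: count_failures is a hypothetical helper, and it assumes that ./runtest exits with a non-zero status when a test fails, times out, or raises an exception.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Redis test suite):
# run a command N times and report how many of the runs failed.
# Assumption: the command signals failure through a non-zero exit status.
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the command's output; only the exit status is counted.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures/$runs runs failed"
}

# Usage from a Redis checkout, with the reproducer above:
# count_failures 20 ./runtest --clients 1 --timeout 180 --single unit/maxmemory --skiptest 'slave buffer are counted correctly'
```

The same loop can wrap the full suite with --clients 8 or --clients 16 to compare how often the timeout occurs with each setting.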
Additional information
I used this reproducer to bisect the code base, and the first bad commit is 73a9b916c9f4 - Rdb channel replication (#13732). I also found a similar bug report in valkey, https://github.com/valkey-io/valkey/issues/841, with a comment that the issue was potentially fixed by https://github.com/valkey-io/valkey/pull/1737. However, I backported that fix to the first bad commit and I can still reproduce the exception.
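For anyone who wants to check the replica's state by hand: the trace above shows resume_process polling ps -o state= -p $pid until the state matches T* (stopped), while the ps output printed on failure shows the server in state Sl+. The following shell sketch approximates that polling check outside the test suite. wait_for_stopped is a hypothetical name, and the defaults of 50 tries with a 1 second delay are taken from the wait_for_condition 50 1000 call in the trace, on the assumption that the second argument is a delay in milliseconds.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Redis test suite):
# poll a PID until ps reports a state starting with "T" (stopped).
# Returns 0 once the process is stopped, 1 if it never stops in time.
wait_for_stopped() {
    pid=$1
    tries=${2:-50}   # assumed to mirror "wait_for_condition 50 1000"
    delay=${3:-1}    # seconds between polls
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Same ps invocation as in the trace; strip padding spaces.
        state=$(ps -o state= -p "$pid" 2>/dev/null | tr -d ' ')
        case "$state" in
            T*) return 0 ;;
        esac
        i=$((i + 1))
        sleep "$delay"
    done
    # Like the test helper, print the job-format ps line on failure.
    ps j "$pid" 2>/dev/null || true
    return 1
}

# Usage: wait_for_stopped <pid of the replica redis-server>
```

A return value of 1 together with a state such as Sl+ would match the "process was not stopped" assertion shown in the exception.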