I am experiencing frequent test TIMEOUTs in the 7.2, 7.4 and 8.0 version lines when I run the test suite in the Gentoo test environment. I noticed them after the latest security releases, 7.2.10, 7.4.5 and 8.0.3. I don't think I saw similar timeouts when I was testing the previous security batch, when 7.2.9, 7.4.4 and 8.0.2 were released, but those versions are now affected as well. grep -C5 TIMEOUT redis-{7.2.10,7.4.5,8.0.3}/temp/build.log gives me this:

redis-7.2.10/temp/build.log-[ignore]: hash with one huge field: large memory flag not provided
redis-7.2.10/temp/build.log-[91/91 done]: violations (1 seconds)
redis-7.2.10/temp/build.log-Testing solo test
redis-7.2.10/temp/build.log-[ok]: Active defrag (46825 ms)
redis-7.2.10/temp/build.log-[skip]: Active defrag eval scripts
redis-7.2.10/temp/build.log:[TIMEOUT]: clients state report follows.
redis-7.2.10/temp/build.log-sock561b0c21d610 => (IN PROGRESS) Active defrag big keys
redis-7.2.10/temp/build.log-Killing still running Redis server 1751
redis-7.2.10/temp/build.log-Killing still running Redis server 1797
redis-7.2.10/temp/build.log-Killing still running Redis server 1794
redis-7.2.10/temp/build.log-Killing still running Redis server 1967
--
redis-7.2.10/temp/build.log-  0 seconds - bitops-large-memory
redis-7.2.10/temp/build.log-  1 seconds - violations
redis-7.2.10/temp/build.log-
redis-7.2.10/temp/build.log-!!! WARNING The following tests failed:
redis-7.2.10/temp/build.log-
redis-7.2.10/temp/build.log:*** [TIMEOUT]: clients state report follows.
redis-7.2.10/temp/build.log-Cleanup: may take some time... OK
redis-7.2.10/temp/build.log- * ERROR: dev-db/redis-7.2.10::gentoo failed (test phase):
redis-7.2.10/temp/build.log- *   Failed to run command: ./runtest
redis-7.2.10/temp/build.log- *
redis-7.2.10/temp/build.log- * Call stack:
--
redis-7.4.5/temp/build.log-[ok]: Active defrag big keys: cluster (28088 ms)
redis-7.4.5/temp/build.log-[ok]: Active defrag pubsub: cluster (35031 ms)
redis-7.4.5/temp/build.log-[ok]: Active Defrag HFE: cluster (8165 ms)
redis-7.4.5/temp/build.log-[ok]: Active defrag main dictionary: standalone (48346 ms)
redis-7.4.5/temp/build.log-[ok]: Active defrag eval scripts: standalone (5361 ms)
redis-7.4.5/temp/build.log:[TIMEOUT]: clients state report follows.
redis-7.4.5/temp/build.log-sock5555f20636e0 => (IN PROGRESS) Active defrag big keys: standalone
redis-7.4.5/temp/build.log-Killing still running Redis server 20333
redis-7.4.5/temp/build.log-
redis-7.4.5/temp/build.log-                   The End
redis-7.4.5/temp/build.log-
--
redis-7.4.5/temp/build.log-  162 seconds - integration/replication-psync
redis-7.4.5/temp/build.log-  1 seconds - bitops-large-memory
redis-7.4.5/temp/build.log-
redis-7.4.5/temp/build.log-!!! WARNING The following tests failed:
redis-7.4.5/temp/build.log-
redis-7.4.5/temp/build.log:*** [TIMEOUT]: clients state report follows.
redis-7.4.5/temp/build.log-Cleanup: may take some time... OK
redis-7.4.5/temp/build.log- * ERROR: dev-db/redis-7.4.5::gentoo failed (test phase):
redis-7.4.5/temp/build.log- *   Failed to run command: ./runtest
redis-7.4.5/temp/build.log- *
redis-7.4.5/temp/build.log- * Call stack:
--
redis-8.0.3/temp/build.log-[ignore]: SETBIT values larger than UINT32_MAX and lzf_compress/lzf_decompress correctly: large memory flag not provided
redis-8.0.3/temp/build.log-[95/95 done]: bitops-large-memory (0 seconds)
redis-8.0.3/temp/build.log-Testing solo test
redis-8.0.3/temp/build.log-[ok]: Active defrag main dictionary: cluster (53920 ms)
redis-8.0.3/temp/build.log-[ok]: Active defrag eval scripts: cluster (7182 ms)
redis-8.0.3/temp/build.log:[TIMEOUT]: clients state report follows.
redis-8.0.3/temp/build.log-sock55bd580abee0 => (IN PROGRESS) Active defrag big keys: cluster
redis-8.0.3/temp/build.log-Killing still running Redis server 23878
redis-8.0.3/temp/build.log-
redis-8.0.3/temp/build.log-                   The End
redis-8.0.3/temp/build.log-
--
redis-8.0.3/temp/build.log-  240 seconds - integration/replication-psync
redis-8.0.3/temp/build.log-  0 seconds - bitops-large-memory
redis-8.0.3/temp/build.log-
redis-8.0.3/temp/build.log-!!! WARNING The following tests failed:
redis-8.0.3/temp/build.log-
redis-8.0.3/temp/build.log:*** [TIMEOUT]: clients state report follows.
redis-8.0.3/temp/build.log-Cleanup: may take some time... OK
redis-8.0.3/temp/build.log- * ERROR: dev-db/redis-8.0.3::gentoo failed (test phase):
redis-8.0.3/temp/build.log- *   Failed to run command: ./runtest
redis-8.0.3/temp/build.log- *
redis-8.0.3/temp/build.log- * Call stack:
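
For anyone triaging a similar build.log, here is a minimal Python sketch that pulls out the tests still running when the runner gave up. It assumes the runner keeps printing client lines in the form seen above ("sock... => (IN PROGRESS) test name"); the function name stuck_tests and the sample text are hypothetical and only for illustration.

```python
import re

# Matches the per-client line printed after a [TIMEOUT], e.g.
# "sock561b0c21d610 => (IN PROGRESS) Active defrag big keys".
IN_PROGRESS = re.compile(r"^(sock\w+) => \(IN PROGRESS\) (.+)$")


def stuck_tests(log_text):
    """Return (socket id, test name) pairs for tests still running at timeout."""
    found = []
    for line in log_text.splitlines():
        match = IN_PROGRESS.match(line.strip())
        if match:
            found.append((match.group(1), match.group(2)))
    return found


# Sample copied from the 7.4.5 log above, without the grep file-name prefix.
sample = """\
[ok]: Active defrag eval scripts: standalone (5361 ms)
[TIMEOUT]: clients state report follows.
sock5555f20636e0 => (IN PROGRESS) Active defrag big keys: standalone
Killing still running Redis server 20333
"""
print(stuck_tests(sample))
```

In all three logs above, the test reported this way is a variant of "Active defrag big keys".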

Interestingly, version 6.2.19 is not affected, so I bisected the code between 8.0.3 and 6.2.19. git bisect pointed to commit 98b3f52599cc - add test suite infra to test RESP3 attributes (#10247), and the tests pass if I revert this change in 8.0.3. The tests are executed with the following parameters in the Gentoo ebuild:

./runtest --clients 16 --skiptest '/Active defrag for argv retained by the main thread from IO thread.*' --skipunit unit/oom-score-adj --skiptest 'CONFIG SET rollback on apply error' --tls

and the timeout is related to the --tls parameter, because the tests pass without it.

Comment From: sundb

@arkamar this timeout has also happened frequently in GitHub Actions recently. Could you share your detailed steps to reproduce it?

Gentoo test environment

Do you mean running this test on Gentoo?

Comment From: arkamar

@sundb, yes, I can. I have a running Docker container with an up-to-date ~amd64 (testing) Gentoo. I installed all necessary dependencies

emerge -qav1ok --with-test-deps --with-bdeps=y dev-db/redis

and then I executed the test

ebuild /var/db/repos/gentoo/dev-db/redis/redis-8.0.3.ebuild clean test

I also tried downgrading this Gentoo container to the state it was in when redis 8.0.2 was released, and the tests timed out as well. This is a little bit weird, because I really think they passed when redis 8.0.2 came out. The only difference I can see is the host kernel, which was originally 6.11.4 and is now 6.15.2.

Comment From: arkamar

I tried it with different openssl versions (3.3.3, 3.4.1, 3.4.2, 3.5.0 and 3.5.1); the tests time out with all of them.

Comment From: sundb

@arkamar great, the timeout is likely related to https://github.com/redis/redis/commit/98b3f52599cc3106ddc882d0dcc744bcaf9e0264 . When I reverted this commit, the GH CI passed without a timeout, but I don't know why this change would cause the timeout, since the code in this commit is rather simple. Do you have any more clues?

Comment From: arkamar

Well, it really seems to be related to the Linux kernel. I created a virtual machine and ran the same test on systems with kernels 6.11.4 and 6.15.2. The tests pass with kernel 6.11.4 and time out with 6.15.2. I am currently bisecting the kernel to find out what causes the regression. I'll let you know, but so far it seems that 6.12 and 6.13 are fine.

Comment From: sundb

@arkamar thanks a lot, this has been bothering me for a long time, and I still don't know the root cause. The reason 6.x is ok might be that https://github.com/redis/redis/pull/10247 was only introduced in 7.0.

Comment From: arkamar

Yeah, I know.

Comment From: arkamar

This regression was introduced by commit https://github.com/torvalds/linux/commit/8c670bdfa58e48abad1d5b6ca1ee843ca91f7303, which is part of Linux kernel 6.14. It is the fix for CVE-2025-21710. I guess this thread might also be related, as it describes what is most probably happening, but I haven't investigated it any deeper.

Comment From: sundb

@arkamar good catch.

Comment From: arkamar

I tested with the 6.16 release candidates and the tests passed without timeouts, so I bisected for the commit which makes them work again. It is this one: https://github.com/torvalds/linux/commit/572be9bf9d0d96242dd7977ce456009b6c690dce. Well, I guess that is a bit unfortunate, as it does not look like a fix, and I have no idea why a buffer size change affects the test behavior.

Comment From: sundb

@arkamar thanks a lot, I once suspected net.ipv4.tcp_rmem or net.ipv4.tcp_wmem, but I couldn't verify it. This test writes a large number of commands without reading the replies. I'm not sure whether the write was blocked forever because the socket buffer was full.
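
To illustrate the suspected mechanism (not the actual test code), here is a minimal Python sketch of a writer whose peer never reads. It uses a plain TCP connection over loopback, without TLS and without Redis; fill_until_blocked and its parameters are hypothetical names. The writer is made non-blocking so that the point where a blocking write would stall shows up as BlockingIOError instead of a hang.

```python
import socket


def fill_until_blocked(chunk_size=65536, limit=1 << 30):
    """Write to a TCP peer that never reads.

    Returns (bytes accepted by the kernel, whether the socket would block).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)

    writer = socket.create_connection(listener.getsockname())
    reader, _ = listener.accept()  # connected, but recv() is never called
    writer.setblocking(False)  # a blocking socket would hang instead

    sent = 0
    blocked = False
    chunk = b"x" * chunk_size
    try:
        while sent < limit:
            sent += writer.send(chunk)
    except BlockingIOError:
        blocked = True  # buffers are full: a blocking write would stall here
    finally:
        writer.close()
        reader.close()
        listener.close()
    return sent, blocked


sent, blocked = fill_until_blocked()
print(f"kernel accepted {sent} bytes, would block: {blocked}")
```

The number of bytes accepted depends on the kernel's buffer sizing, so it differs between machines and kernel versions. That would be consistent with a test that passes on one kernel and stalls on another, but the sketch does not prove that this is what happens in the defrag test.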

Comment From: sundb

@arkamar Could you increase the value of net.ipv4.tcp_rmem locally to see if it still times out? thx.

Comment From: arkamar

@sundb The tests stop timing out if I increase net.ipv4.tcp_rmem to 18MB or more.
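
For checking a machine against that threshold, here is a small Python sketch, assuming "18MB" means 18 MiB and that it refers to the third (maximum) field of net.ipv4.tcp_rmem. The helper names are hypothetical, and the first sample string is a commonly seen default that may differ on your system.

```python
THRESHOLD = 18 * 1024 * 1024  # "18MB" read as 18 MiB; an assumption


def parse_tcp_rmem(text):
    """Split the 'min default max' triple (in bytes) into three ints."""
    minimum, default, maximum = (int(field) for field in text.split())
    return minimum, default, maximum


def max_is_at_least(text, threshold=THRESHOLD):
    """True if the maximum receive buffer size reaches the threshold."""
    return parse_tcp_rmem(text)[2] >= threshold


# Illustrative values, not taken from the affected machines.
print(max_is_at_least("4096 131072 6291456"))   # a commonly seen default
print(max_is_at_least("4096 131072 18874368"))  # maximum raised to 18 MiB
```

The input string is what sysctl -n net.ipv4.tcp_rmem prints on Linux. Running sysctl -w net.ipv4.tcp_rmem="4096 131072 18874368" as root would raise the maximum until the next reboot; the first two values in that command are placeholders.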

Comment From: sundb

@arkamar great, thank you so much.

Comment From: sundb

@arkamar can you try again with https://github.com/redis/redis/pull/14217, thx.