Hi - About a week ago we started using Redis in our app, and it started crashing. Our system admin suspects this is a Redis bug. I am including some information below:
Crash report
Jun 19 17:28:02 server.co systemd[1]: redis.service: Main process exited, code=dumped, status=6/ABRT
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://wiki.almalinux.org/Help-and-Support
░░
░░ An ExecStart= process belonging to unit redis.service has exited.
------ FAST MEMORY TEST ------
3831371:C 19 Jun 2025 21:37:01.047 # Bio thread for job type #0 terminated
2436112:M 19 Jun 2025 21:37:01.540 # Background saving terminated by signal 11
=== REDIS BUG REPORT START: Cut & paste starting from here ===
2436112:M 19 Jun 2025 21:37:05.298 # Redis 6.2.18 crashed by signal: 11, si_code: 1
2436112:M 19 Jun 2025 21:37:05.298 # Accessing address: 0xffffffffffffffff
2436112:M 19 Jun 2025 21:37:05.298 # Crashed running the instruction at: 0x55ab8422d014
------ STACK TRACE ------
EIP:
/usr/bin/redis-server 127.0.0.1:6379(dictSdsKeyCompare+0x34)[0x55ab8422d014]
Backtrace:
/lib64/libc.so.6(+0x3ebf0)[0x7f82ad83ebf0]
/usr/bin/redis-server 127.0.0.1:6379(dictSdsKeyCompare+0x34)[0x55ab8422d014]
/usr/bin/redis-server 127.0.0.1:6379(dictFind+0x75)[0x55ab84232635]
/usr/bin/redis-server 127.0.0.1:6379(lookupKey+0x16)[0x55ab8425e9e6]
/usr/bin/redis-server 127.0.0.1:6379(lookupKeyReadWithFlags+0x6e)[0x55ab8425eafe]
/usr/bin/redis-server 127.0.0.1:6379(getGenericCommand+0x34)[0x55ab84272c44]
/usr/bin/redis-server 127.0.0.1:6379(call+0xfd)[0x55ab8423fa9d]
/usr/bin/redis-server 127.0.0.1:6379(processCommand+0x633)[0x55ab842428e3]
/usr/bin/redis-server 127.0.0.1:6379(processInputBuffer+0x113)[0x55ab84252c03]
/usr/bin/redis-server 127.0.0.1:6379(+0x12824c)[0x55ab8430f24c]
/usr/bin/redis-server 127.0.0.1:6379(aeProcessEvents+0x2e2)[0x55ab84231152]
/usr/bin/redis-server 127.0.0.1:6379(aeMain+0x1d)[0x55ab8423134d]
/usr/bin/redis-server 127.0.0.1:6379(main+0x369)[0x55ab8422ca09]
/lib64/libc.so.6(+0x295d0)[0x7f82ad8295d0]
/lib64/libc.so.6(__libc_start_main+0x80)[0x7f82ad829680]
/usr/bin/redis-server 127.0.0.1:6379(_start+0x25)[0x55ab8422cf15]
redis-check-rdb /var/lib/redis/dump.rdb
[offset 0] Checking RDB file /var/lib/redis/dump.rdb
[offset 27] AUX FIELD redis-ver = '6.2.18'
[offset 41] AUX FIELD redis-bits = '64'
[offset 53] AUX FIELD ctime = '1750385588'
[offset 68] AUX FIELD used-mem = '28616400'
[offset 84] AUX FIELD aof-preamble = '0'
[offset 86] Selecting DB ID 0
[offset 12107] Selecting DB ID 1
[offset 1937098] Selecting DB ID 3
[offset 9885307] Selecting DB ID 10
[offset 11712827] Checksum OK
[offset 11712827] \o/ RDB looks OK! \o/
[info] 12673 keys read
[info] 201 expires
[info] 9 already expired
Additional information
- AlmaLinux kernel: 5.14.0-503.34.1.el9_5.x86_64
Comment From: sundb
@moazam1 thx, can you share the full crash report?
Comment From: moazam1
@sundb I got AI help to generate a crash report for me. I hope it's useful.
Redis Crash Report - June 21, 2025
Environment Information
- Redis Version: 6.2.18 (build: 797283b6387a0075)
- OS: Linux 5.14.0-503.34.1.el9_5.x86_64 x86_64
- Architecture: 64-bit
- Memory Allocator: jemalloc-5.1.0
- GCC Version: 11.5.0
- Multiplexing API: epoll
- Process Supervision: systemd
Crash Pattern Analysis
Recent Crashes (Last 7 Days)
SystemD Journal Entries:
- Jun 17 20:57:34: Main process exited, code=dumped, status=11/SEGV
- Jun 17 20:58:01: Main process exited, code=dumped, status=6/ABRT
- Jun 18 05:06:11: Main process exited, code=dumped, status=6/ABRT
- Jun 18 05:07:02: Main process exited, code=dumped, status=6/ABRT
- Jun 19 00:49:29: Main process exited, code=dumped, status=11/SEGV
- Jun 19 05:32:17: Main process exited, code=dumped, status=11/SEGV
- Jun 19 17:27:22: Main process exited, code=dumped, status=11/SEGV
- Jun 19 17:28:02: Main process exited, code=dumped, status=6/ABRT
- Jun 19 21:37:05: Main process exited, code=dumped, status=11/SEGV
- Jun 20 07:03:54: Main process exited, code=dumped, status=6/ABRT
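To see how the exits split between signals, the journal entries above can be tallied with a few lines of Python. This is only a sketch: the entries are pasted in as a string, and the names `JOURNAL`, `STATUS_RE`, and `tally_signals` are made up for illustration rather than taken from any Redis or systemd tooling.

```python
import re
from collections import Counter

# Journal entries copied verbatim from the list above.
JOURNAL = """\
Jun 17 20:57:34: Main process exited, code=dumped, status=11/SEGV
Jun 17 20:58:01: Main process exited, code=dumped, status=6/ABRT
Jun 18 05:06:11: Main process exited, code=dumped, status=6/ABRT
Jun 18 05:07:02: Main process exited, code=dumped, status=6/ABRT
Jun 19 00:49:29: Main process exited, code=dumped, status=11/SEGV
Jun 19 05:32:17: Main process exited, code=dumped, status=11/SEGV
Jun 19 17:27:22: Main process exited, code=dumped, status=11/SEGV
Jun 19 17:28:02: Main process exited, code=dumped, status=6/ABRT
Jun 19 21:37:05: Main process exited, code=dumped, status=11/SEGV
Jun 20 07:03:54: Main process exited, code=dumped, status=6/ABRT
"""

# Matches the trailing "status=<number>/<signal name>" field of each entry.
STATUS_RE = re.compile(r"status=(\d+)/(\w+)$")


def tally_signals(text):
    """Count journal entries per signal name (hypothetical helper)."""
    counts = Counter()
    for line in text.splitlines():
        match = STATUS_RE.search(line.strip())
        if match:
            counts[match.group(2)] += 1
    return counts


counts = tally_signals(JOURNAL)
for name, n in sorted(counts.items()):
    print(f"{name}: {n}")
```

For the ten entries listed, this gives five SEGV and five ABRT exits.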
Historical Recovery Pattern:
- April 7, 2025: Redis auto-recovered with empty database
- May 21, 2025: Redis auto-recovered with empty database
- June 5, 2025: Redis auto-recovered with empty database
- June 16, 2025: Redis auto-recovered with empty database
- June 17, 2025 (2 crashes): Redis auto-recovered with empty database
- June 18, 2025: Redis auto-recovered with empty database
- June 19, 2025: Redis auto-recovered with empty database
Current Crash Details (June 21, 2025)
Latest Crash Information
Signal: 11 (SEGV - Segmentation Violation)
Crash Location: rdbSaveStringObject+0x1d at address 0x556767e29c0d
Fault Address: 0xffffffffffffffff (invalid memory access)
Stack Trace
EIP: redis-rdb-bgsave 127.0.0.1:6379(rdbSaveStringObject+0x1d)[0x556767e29c0d]
Backtrace:
/lib64/libc.so.6(+0x3ebf0)[0x7fabf503ebf0]
redis-rdb-bgsave 127.0.0.1:6379(rdbSaveStringObject+0x1d)[0x556767e29c0d]
redis-rdb-bgsave 127.0.0.1:6379(rdbSaveKeyValuePair+0x88)[0x556767e2b3e8]
redis-rdb-bgsave 127.0.0.1:6379(rdbSaveRio+0x290)[0x556767e2bb70]
redis-rdb-bgsave 127.0.0.1:6379(rdbSave+0x123)[0x556767e2c303]
redis-rdb-bgsave 127.0.0.1:6379(rdbSaveBackground+0xd3)[0x556767e2c653]
redis-rdb-bgsave 127.0.0.1:6379(serverCron+0x27e)[0x556767df1e6e]
redis-rdb-bgsave 127.0.0.1:6379(aeProcessEvents+0x12d)[0x556767debf9d]
redis-rdb-bgsave 127.0.0.1:6379(aeMain+0x1d)[0x556767dec34d]
redis-rdb-bgsave 127.0.0.1:6379(main+0x369)[0x556767de7a09]
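The frames in a backtrace like the one above follow a regular `binary(symbol+offset)[address]` shape, so they can be split into fields when comparing crashes. The following is a rough sketch, not part of Redis: `FRAME_RE`, `Frame`, and `parse_frames` are illustrative names, and the pattern assumes the frame format shown in this thread.

```python
import re
from typing import List, NamedTuple

# A few frames copied from the backtrace above.
BACKTRACE = """\
/lib64/libc.so.6(+0x3ebf0)[0x7fabf503ebf0]
redis-rdb-bgsave 127.0.0.1:6379(rdbSaveStringObject+0x1d)[0x556767e29c0d]
redis-rdb-bgsave 127.0.0.1:6379(rdbSaveKeyValuePair+0x88)[0x556767e2b3e8]
redis-rdb-bgsave 127.0.0.1:6379(rdbSaveRio+0x290)[0x556767e2bb70]
"""

# binary(symbol+0xOFFSET)[0xADDRESS]; the symbol is empty when unresolved.
FRAME_RE = re.compile(
    r"^(?P<binary>.+?)\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-f]+)\)"
    r"\[(?P<address>0x[0-9a-f]+)\]$"
)


class Frame(NamedTuple):
    binary: str
    symbol: str   # "" when the frame has no symbol name
    offset: int
    address: int


def parse_frames(text: str) -> List[Frame]:
    """Parse backtrace lines into Frame tuples, skipping non-matching lines."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(Frame(
                binary=match.group("binary"),
                symbol=match.group("symbol"),
                offset=int(match.group("offset"), 16),
                address=int(match.group("address"), 16),
            ))
    return frames


frames = parse_frames(BACKTRACE)
# The first frame with a symbol name is the function that faulted.
crashing = next(f for f in frames if f.symbol)
print(crashing.symbol, hex(crashing.offset))
```

Running the same parser over the trace from the original post would give `dictSdsKeyCompare` as the first resolved frame, which makes it easy to see that the two crashes hit different functions.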
Register State at Crash
RAX:0000000000000000 RBX:0000000000000000
RCX:0000000000000c00 RDX:0000000000000000
RDI:00007ffdf2dea490 RSI:0000000000000000
RBP:00007ffdf2dea490 RSP:00007ffdf2dea348
RIP:0000556767e29c0d EFL:0000000000010246
Memory State at Crash
Memory Usage
- Used Memory: 59.6MB / 1.86GB limit (2.98% utilization)
- RSS Memory: 57.4-57.9MB
- Peak Memory: 59.14MB
- Memory Fragmentation: 1.01-1.02 ratio (normal)
- Allocator: jemalloc-5.1.0 with 1.04 fragmentation ratio
Database Contents
- DB0: 48 keys (46 with expiry, avg TTL: 3.88M seconds)
- DB1: 220 keys (113 with expiry, avg TTL: 37.9M seconds)
- DB3: 21,599 keys (20 with expiry, avg TTL: 1.59B seconds)
- DB10: 12,916 keys (13 with expiry, avg TTL: 215M seconds)
- Total Keys: ~34,783
Performance Metrics
- Connected Clients: 2
- Hit Ratio: ~91.2% (1,927,950 hits / 186,286 misses)
- Operations/sec: 5-109 at crash time
- Total Commands: 2,474,460
- Total Connections: 283,194
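The hit ratio above can be re-derived from the two counters it quotes. A minimal sketch, assuming the usual definition hits / (hits + misses); the variable names are illustrative:

```python
# Counters quoted in the Performance Metrics list above.
hits = 1_927_950
misses = 186_286

# Hit ratio as the fraction of lookups that found a key.
hit_ratio = hits / (hits + misses)
print(f"{hit_ratio:.1%}")
```

With these counters the ratio rounds to 91.2%, matching the figure in the list.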
Root Cause Analysis
Primary Issue: RDB Save Process Corruption
The crash consistently occurs in the rdbSaveStringObject function during background saving operations. The fault address 0xffffffffffffffff indicates an attempt to access invalid memory, suggesting:
- Memory Corruption: String objects being saved contain corrupted pointers
- Data Structure Corruption: Redis internal data structures are corrupted
- Hardware Memory Issues: Potential RAM hardware problems
Background Save Status
- RDB Background Save: In progress during all crashes
- Last Successful Save: Very old (timestamp: 1750468912)
- Save Status: "err" (failing consistently)
- Save Interval: 1 change in 900 seconds trigger
- Changes Since Last Save: 92,703 pending
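The last-save value in the list above is a Unix epoch timestamp, so it can be converted to a calendar date to check how old it really is. A small sketch using only the standard library; the variable names are illustrative:

```python
from datetime import datetime, timezone

# "Last Successful Save" timestamp quoted in the list above (seconds since epoch).
last_save_epoch = 1750468912

# Convert to an explicit UTC datetime to avoid local-timezone ambiguity.
last_save = datetime.fromtimestamp(last_save_epoch, tz=timezone.utc)
print(last_save.isoformat())
```

This prints `2025-06-21T01:21:52+00:00`, which is the same day as the date in the report title, so how stale the last save is depends on when the snapshot of these metrics was taken.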
Performance Degradation Indicators
- Low Hit Ratio Alerts: Consistent 67.9-68.2% hit ratio (below 85% threshold)
- Memory Pressure: Not evident (only 2.98% of max memory used)
- High Key Count: 34K+ keys, particularly in DB3 and DB10
Core Dump Files Available
Multiple core dumps found in /var/lib/systemd/coredump/:
- 10+ core files from recent crashes
- All compressed with .zst format
- Can be analyzed with coredumpctl or gdb
Recommended Actions for Redis Team
- Investigate rdbSaveStringObject Function: Focus on string object validation before serialization
- RDB Format Validation: Check for corruption in existing RDB files
- Memory Corruption Detection: Implement additional memory validation in save operations
- Background Save Robustness: Improve error handling in RDB save process
Configuration Context
- Max Memory: 2GB limit with volatile-lru eviction policy
- Save Configuration: Background saves enabled with 900-second threshold
- TCP Backlog: 65,535 (recently increased from 511)
- Connection Timeout: 300 seconds
- TCP Keepalive: 60 seconds
Comment From: sundb
@moazam1 no, i mean the block after
/lib64/libc.so.6(+0x3ebf0)[0x7f82ad83ebf0]
/usr/bin/redis-server 127.0.0.1:6379(dictSdsKeyCompare+0x34)[0x55ab8422d014]
/usr/bin/redis-server 127.0.0.1:6379(dictFind+0x75)[0x55ab84232635]
/usr/bin/redis-server 127.0.0.1:6379(lookupKey+0x16)[0x55ab8425e9e6]
/usr/bin/redis-server 127.0.0.1:6379(lookupKeyReadWithFlags+0x6e)[0x55ab8425eafe]
/usr/bin/redis-server 127.0.0.1:6379(getGenericCommand+0x34)[0x55ab84272c44]
/usr/bin/redis-server 127.0.0.1:6379(call+0xfd)[0x55ab8423fa9d]
/usr/bin/redis-server 127.0.0.1:6379(processCommand+0x633)[0x55ab842428e3]
/usr/bin/redis-server 127.0.0.1:6379(processInputBuffer+0x113)[0x55ab84252c03]
/usr/bin/redis-server 127.0.0.1:6379(+0x12824c)[0x55ab8430f24c]
/usr/bin/redis-server 127.0.0.1:6379(aeProcessEvents+0x2e2)[0x55ab84231152]
/usr/bin/redis-server 127.0.0.1:6379(aeMain+0x1d)[0x55ab8423134d]
/usr/bin/redis-server 127.0.0.1:6379(main+0x369)[0x55ab8422ca09]
/lib64/libc.so.6(+0x295d0)[0x7f82ad8295d0]
/lib64/libc.so.6(__libc_start_main+0x80)[0x7f82ad829680]
/usr/bin/redis-server 127.0.0.1:6379(_start+0x25)[0x55ab8422cf15]
like
------ REGISTERS ------
38:S 21 May 2025 21:40:26.198 #
RAX:000000000049aa90 RBX:00007f395600fc00
RCX:00007f3573a4d770 RDX:0000000000000000
RDI:0000000000000000 RSI:00007f32beaf6568
RBP:0000000000000000 RSP:00007ffe558e8000
R8 :00007f3956001350 R9 :00000000000001f0
R10:00007f3956417758 R11:00007f3956417760
R12:00000000005f3370 R13:0000000000000002
R14:00007f37a5ef2f40 R15:0000000000000000
RIP:000000000049aa99 EFL:0000000000010206
CSGSFS:002b000000000033
....
...
=== REDIS BUG REPORT END
Comment From: sundb
there is a similar crash issue: https://github.com/redis/redis/issues/13832
Comment From: moazam1
@sundb is there any solution available?
Comment From: sundb
@moazam1 I saw that a similar issue was raised in https://github.com/ClickHouse/ClickHouse/issues/78509. Can you also try downgrading the kernel to the version suggested there?
Comment From: sundb
@moazam1, which AlmaLinux version (not kernel version) are you using?