Describe the bug

I am using Redis v7.2.5. Recently I have been seeing CrashLoopBackOff in the replica pods. The logs show that the connection with the master was lost. It occurs randomly, and I don't know the reason for this sudden failure. We did not make any changes to the master.

To reproduce

Occurs randomly; unable to reproduce.

Expected behavior

The replicas should not lose their connection to the master.

Additional information

Please find the logs below and check whether this is a known issue. If it is fixed in a later version, please share the details.

Master:

1:M 24 Mar 2025 08:32:10.562 * Ready to accept connections tcp
1:M 24 Mar 2025 08:32:11.271 * Replica redis-replicas-1.redis-headless.redis.svc.cluster.local:6379 asks for synchronization
1:M 24 Mar 2025 08:32:11.271 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'c18bc3d01b86ae6948f29a6eb2f11f1b30ec2d22', my replication IDs are '6113a7a6a9e0c40ea0f3035b4c200968ccb5c05d' and '0000000000000000000000000000000000000000')
1:M 24 Mar 2025 08:32:11.271 * Replication backlog created, my new replication IDs are '6cb353b64079e59c560d476f9ac868dddd5471f0' and '0000000000000000000000000000000000000000'
1:M 24 Mar 2025 08:32:11.271 * Delay next BGSAVE for diskless SYNC
1:M 24 Mar 2025 08:32:11.469 * Replica redis-replicas-2.redis-headless.redis.svc.cluster.local:6379 asks for synchronization
1:M 24 Mar 2025 08:32:11.469 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'c18bc3d01b86ae6948f29a6eb2f11f1b30ec2d22', my replication IDs are '6cb353b64079e59c560d476f9ac868dddd5471f0' and '0000000000000000000000000000000000000000')
1:M 24 Mar 2025 08:32:11.469 * Delay next BGSAVE for diskless SYNC
1:M 24 Mar 2025 08:32:16.489 * Starting BGSAVE for SYNC with target: replicas sockets
1:M 24 Mar 2025 08:32:16.490 * Background RDB transfer started by pid 14
14:C 24 Mar 2025 08:32:16.497 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 0 MB
1:M 24 Mar 2025 08:32:16.497 * Diskless rdb transfer, done reading from pipe, 2 replicas still up.
1:M 24 Mar 2025 08:32:16.508 * Background RDB transfer terminated with success
1:M 24 Mar 2025 08:32:16.508 * Streamed RDB transfer with replica redis-replicas-1.redis-headless.redis.svc.cluster.local:6379 succeeded (socket). Waiting for REPLCONF ACK from replica to enable streaming
1:M 24 Mar 2025 08:32:16.508 * Synchronization with replica redis-replicas-1.redis-headless.redis.svc.cluster.local:6379 succeeded

Replicas:

1:S 24 Mar 2025 07:29:21.951 * MASTER <-> REPLICA sync: Finished with success
1:S 24 Mar 2025 07:29:21.952 * Creating AOF incr file temp-appendonly.aof.incr on background rewrite
1:S 24 Mar 2025 07:29:22.041 * Background append only file rewriting started by pid 19
19:C 24 Mar 2025 07:29:26.059 * Successfully created the temporary AOF base file temp-rewriteaof-bg-19.aof
19:C 24 Mar 2025 07:29:26.060 * Fork CoW for AOF rewrite: current 1 MB, peak 1 MB, average 1 MB
1:S 24 Mar 2025 07:29:26.141 * Background AOF rewrite terminated with success
1:S 24 Mar 2025 07:29:26.141 * Successfully renamed the temporary AOF base file temp-rewriteaof-bg-19.aof into appendonly.aof.34.base.rdb
1:S 24 Mar 2025 07:29:26.141 * Successfully renamed the temporary AOF incr file temp-appendonly.aof.incr into appendonly.aof.34.incr.aof
1:S 24 Mar 2025 07:29:26.145 * Removing the history file appendonly.aof.33.incr.aof in the background
1:S 24 Mar 2025 07:29:26.145 * Removing the history file appendonly.aof.33.base.rdb in the background
1:S 24 Mar 2025 07:29:26.148 * Background AOF rewrite finished successfully
1:S 24 Mar 2025 08:32:07.977 * Connection with master lost.
1:S 24 Mar 2025 08:32:07.977 * Caching the disconnected master state.
1:S 24 Mar 2025 08:32:07.977 * Reconnecting to MASTER redis-master-0.redis-headless.redis.svc.cluster.local:6379
1:S 24 Mar 2025 08:32:07.977 * Connection with master lost.
1:S 24 Mar 2025 08:32:07.978 * Caching the disconnected master state.
1:S 24 Mar 2025 08:32:07.978 * Reconnecting to MASTER redis-master-0.redis-headless.redis.svc.cluster.local:6379
1:S 24 Mar 2025 08:32:07.979 * MASTER <-> REPLICA sync started
1:S 24 Mar 2025 08:32:07.980 # Error condition on socket for SYNC: Connection refused
1:S 24 Mar 2025 08:32:07.983 * MASTER <-> REPLICA sync started
1:S 24 Mar 2025 08:32:07.983 # Error condition on socket for SYNC: Connection refused
1:S 24 Mar 2025 08:32:08.078 * Connecting to MASTER redis-master-0.redis-headless.redis.svc.cluster.local:6379
1:S 24 Mar 2025 08:32:08.079 * MASTER <-> REPLICA sync started
1:S 24 Mar 2025 08:32:08.080 # Error condition on socket for SYNC: Connection refused
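To help correlate the timestamps above with pod restarts, here is a minimal sketch (assuming cluster access via a kubeconfig and the kubernetes Python client; the pod name redis-master-0 and the redis namespace are taken from the DNS names in the logs) that reads the master pod's restart count and last termination reason:

```python
from kubernetes import client, config

# Assumes a kubeconfig with access to the cluster; use
# config.load_incluster_config() when running inside the cluster.
config.load_kube_config()
v1 = client.CoreV1Api()

# Pod and namespace names inferred from the DNS entries in the logs above.
pod = v1.read_namespaced_pod(name="redis-master-0", namespace="redis")

for cs in pod.status.container_statuses:
    print(f"container={cs.name} restarts={cs.restart_count}")
    # If the master process restarted (which would also explain why partial
    # resynchronization failed with a replication ID mismatch), the previous
    # termination reason and time show up here.
    if cs.last_state and cs.last_state.terminated:
        t = cs.last_state.terminated
        print(f"  last termination: reason={t.reason} exit_code={t.exit_code} at {t.finished_at}")
```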

Comment From: ShooterIT

I don't know what happened. If the only log is "Connection with master lost", this is usually due to network errors. If the heartbeat mechanism between the master and replica (rather than a TCP problem) had caused the disconnection, there would be other logs, such as "MASTER timeout: no data nor PING received". You could also check the network metrics of your system.
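If it helps, here is a minimal sketch (assuming the redis-py client; the host is one of the replica DNS names from the logs, and the password is a placeholder for your deployment) that checks the replication link status on a replica and reads the heartbeat-related settings that the timeout log line refers to:

```python
import redis

# Placeholder connection details; point this at one of the replica pods.
r = redis.Redis(
    host="redis-replicas-1.redis-headless.redis.svc.cluster.local",
    port=6379,
    password="<password-if-set>",
    socket_timeout=5,
)

# On a replica, INFO replication exposes the state of the link to the master.
info = r.info("replication")
print("role:", info.get("role"))
print("master_link_status:", info.get("master_link_status"))
print("master_last_io_seconds_ago:", info.get("master_last_io_seconds_ago"))

# Heartbeat settings: the master pings replicas every repl-ping-replica-period
# seconds, and the replica drops the link after repl-timeout seconds of
# silence, which is when "MASTER timeout: no data nor PING received" is logged.
print(r.config_get("repl-ping-replica-period"))
print(r.config_get("repl-timeout"))
```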