Summary
My team ran into an issue this week where Sentinel took a long time (approximately 15 minutes) to recognize that a master server had crashed and begin the fail-over process.
From looking through the Sentinel code, the issue appears to occur if the master crashes or otherwise becomes unresponsive to network traffic right before the Sentinel process is going to send an INFO command. When an INFO refresh is overdue, sentinelSendPeriodicCommands will only send INFO commands (up to the max pending command limit), skipping the call to sentinelSendPing. Because sentinelCheckSubjectivelyDown is based on last_ping_time, this prevents the sentinel from detecting that the master is down until the established TCP connection is closed due to retransmit timeouts.
Repro steps
I'm able to reproduce this issue using Vagrant, though it does require careful attention to timing. I used Redis version 2.8.23, but it looks like a similar issue would affect the 3.0 branch as well. 1. Set up a cluster of three virtual machines, similar to "Example 2: basic setup with three boxes" in the Sentinel documentation. - Sentinel01 runs master and a Sentinel process. - Sentinel02 runs slave and a Sentinel process. - Sentinel03 runs slave and a Sentinel process. 2. On sentinel02: Run "watch -n 0.25 redis-cli -p 26379 sentinel master mymaster". Pay attention to the last-ping-reply and info-refresh values. The last-ping-reply value will range between 0 and 1000, and info-refresh value will range between 0 and 10000. We want to take the master offline during the time window between PING and INFO commands, which corresponds to info-refresh > 9000 and (info-refresh % 1000) > (last-ping-reply % 1000). (I generally aim for info-refresh of value >= 9500 and last-ping-reply < 500.) 3. On sentinel01: During the time window, down the interface that the Redis server is bound to. (e.g. sudo ifdown eth1).
At this point, sentinel02 quickly marks the sentinel process on sentinel01 as subjectively down, but the master stays up. Running "sentinel master mymaster" on sentinel02 shows info-refresh, last-ping-reply and pending-commands keep increasing, but last-ping-sent stays 0. Running netstat -on shows an established connection to sentinel01:6379 with retransmit timers running.
Eventually, the last retransmit timer expires, and OS closes the command connection. At this point, sentinel02 marks the master as subjectively down and failover attempts begin.
Possible fix
If INFO commands can preempt PING commands, it seems like Sentinel should update the ping times when it sends and receives INFO commands. This would allow the checks in sentinelCheckSubjectivelyDown to work for both cases.
Comment From: drmontgomery
sentinel01.log
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 2.8.23 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in sentinel mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 26379
| `-._ `._ / _.-' | PID: 5341
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'
[5341] 23 Oct 01:30:57.073 # Sentinel runid is cd8cd658659da4dfb0b80ecd593aed30669c63cf
[5341] 23 Oct 01:30:57.073 # +monitor master mymaster 10.100.0.21 6379 quorum 2
[5341] 23 Oct 01:30:57.074 . +cmd-link master mymaster 10.100.0.21 6379
[5341] 23 Oct 01:30:57.075 . +pubsub-link master mymaster 10.100.0.21 6379
[5341] 23 Oct 01:32:03.415 - Accepted 10.100.0.22:47196
[5341] 23 Oct 01:32:04.478 * +sentinel sentinel 10.100.0.22:26379 10.100.0.22 26379 @ mymaster 10.100.0.21 6379
[5341] 23 Oct 01:32:04.516 . +cmd-link sentinel 10.100.0.22:26379 10.100.0.22 26379 @ mymaster 10.100.0.21 6379
[5341] 23 Oct 01:32:07.300 * +slave slave 10.100.0.22:6379 10.100.0.22 6379 @ mymaster 10.100.0.21 6379
[5341] 23 Oct 01:32:07.353 . +cmd-link slave 10.100.0.22:6379 10.100.0.22 6379 @ mymaster 10.100.0.21 6379
[5341] 23 Oct 01:32:07.353 . +pubsub-link slave 10.100.0.22:6379 10.100.0.22 6379 @ mymaster 10.100.0.21 6379
[5341] 23 Oct 01:33:12.874 - Accepted 10.100.0.23:60481
[5341] 23 Oct 01:33:13.604 * +sentinel sentinel 10.100.0.23:26379 10.100.0.23 26379 @ mymaster 10.100.0.21 6379
[5341] 23 Oct 01:33:13.627 . +cmd-link sentinel 10.100.0.23:26379 10.100.0.23 26379 @ mymaster 10.100.0.21 6379
[5341] 23 Oct 01:33:17.567 * +slave slave 10.100.0.23:6379 10.100.0.23 6379 @ mymaster 10.100.0.21 6379
[5341] 23 Oct 01:33:17.640 . +cmd-link slave 10.100.0.23:6379 10.100.0.23 6379 @ mymaster 10.100.0.21 6379
[5341] 23 Oct 01:33:17.640 . +pubsub-link slave 10.100.0.23:6379 10.100.0.23 6379 @ mymaster 10.100.0.21 6379
sentinel02.log
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 2.8.23 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in sentinel mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 26379
| `-._ `._ / _.-' | PID: 5339
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'
[5339] 23 Oct 01:32:02.440 # Sentinel runid is 715e0ce508bef61a40d95a88fa296184f79605bb
[5339] 23 Oct 01:32:02.440 # +monitor master mymaster 10.100.0.21 6379 quorum 2
[5339] 23 Oct 01:32:02.442 . +cmd-link master mymaster 10.100.0.21 6379
[5339] 23 Oct 01:32:02.442 . +pubsub-link master mymaster 10.100.0.21 6379
[5339] 23 Oct 01:32:02.442 * +slave slave 10.100.0.22:6379 10.100.0.22 6379 @ mymaster 10.100.0.21 6379
[5339] 23 Oct 01:32:02.543 . +cmd-link slave 10.100.0.22:6379 10.100.0.22 6379 @ mymaster 10.100.0.21 6379
[5339] 23 Oct 01:32:02.543 . +pubsub-link slave 10.100.0.22:6379 10.100.0.22 6379 @ mymaster 10.100.0.21 6379
[5339] 23 Oct 01:32:03.314 * +sentinel sentinel 10.100.0.21:26379 10.100.0.21 26379 @ mymaster 10.100.0.21 6379
[5339] 23 Oct 01:32:03.406 . +cmd-link sentinel 10.100.0.21:26379 10.100.0.21 26379 @ mymaster 10.100.0.21 6379
[5339] 23 Oct 01:32:04.508 - Accepted 10.100.0.21:46219
[5339] 23 Oct 01:33:12.286 - Accepted 10.100.0.23:44026
[5339] 23 Oct 01:33:12.757 * +slave slave 10.100.0.23:6379 10.100.0.23 6379 @ mymaster 10.100.0.21 6379
[5339] 23 Oct 01:33:12.835 . +cmd-link slave 10.100.0.23:6379 10.100.0.23 6379 @ mymaster 10.100.0.21 6379
[5339] 23 Oct 01:33:12.835 . +pubsub-link slave 10.100.0.23:6379 10.100.0.23 6379 @ mymaster 10.100.0.21 6379
[5339] 23 Oct 01:33:13.572 * +sentinel sentinel 10.100.0.23:26379 10.100.0.23 26379 @ mymaster 10.100.0.21 6379
[5339] 23 Oct 01:33:13.616 . +cmd-link sentinel 10.100.0.23:26379 10.100.0.23 26379 @ mymaster 10.100.0.21 6379
[5339] 23 Oct 01:38:33.558 # +sdown sentinel 10.100.0.21:26379 10.100.0.21 26379 @ mymaster 10.100.0.21 6379
[5339] 23 Oct 01:53:35.642 . -cmd-link master mymaster 10.100.0.21 6379 #Connection timed out
[5339] 23 Oct 01:54:05.716 # +sdown master mymaster 10.100.0.21 6379
[5339] 23 Oct 01:54:05.790 # +odown master mymaster 10.100.0.21 6379 #quorum 2/2
[5339] 23 Oct 01:54:05.790 # +new-epoch 1
[5339] 23 Oct 01:54:05.790 # +try-failover master mymaster 10.100.0.21 6379
[5339] 23 Oct 01:54:05.791 # +vote-for-leader 715e0ce508bef61a40d95a88fa296184f79605bb 1
[5339] 23 Oct 01:54:05.794 # 10.100.0.23:26379 voted for 715e0ce508bef61a40d95a88fa296184f79605bb 1
[5339] 23 Oct 01:54:05.881 # +elected-leader master mymaster 10.100.0.21 6379
[5339] 23 Oct 01:54:05.882 # +failover-state-select-slave master mymaster 10.100.0.21 6379
[5339] 23 Oct 01:54:05.972 # -failover-abort-no-good-slave master mymaster 10.100.0.21 6379
[5339] 23 Oct 01:54:06.044 # Next failover delay: I will not start a failover before Fri Oct 23 01:56:06 2015
sentinel03.log
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 2.8.23 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in sentinel mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 26379
| `-._ `._ / _.-' | PID: 5331
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'
[5331] 23 Oct 01:33:11.477 # Sentinel runid is 1cca6f7ab2394bfa51ae66a7fc34264e4edf6655
[5331] 23 Oct 01:33:11.477 # +monitor master mymaster 10.100.0.21 6379 quorum 2
[5331] 23 Oct 01:33:11.477 . +cmd-link master mymaster 10.100.0.21 6379
[5331] 23 Oct 01:33:11.477 . +pubsub-link master mymaster 10.100.0.21 6379
[5331] 23 Oct 01:33:11.478 * +slave slave 10.100.0.22:6379 10.100.0.22 6379 @ mymaster 10.100.0.21 6379
[5331] 23 Oct 01:33:11.479 * +slave slave 10.100.0.23:6379 10.100.0.23 6379 @ mymaster 10.100.0.21 6379
[5331] 23 Oct 01:33:11.570 . +cmd-link slave 10.100.0.23:6379 10.100.0.23 6379 @ mymaster 10.100.0.21 6379
[5331] 23 Oct 01:33:11.570 . +pubsub-link slave 10.100.0.23:6379 10.100.0.23 6379 @ mymaster 10.100.0.21 6379
[5331] 23 Oct 01:33:11.570 . +cmd-link slave 10.100.0.22:6379 10.100.0.22 6379 @ mymaster 10.100.0.21 6379
[5331] 23 Oct 01:33:11.570 . +pubsub-link slave 10.100.0.22:6379 10.100.0.22 6379 @ mymaster 10.100.0.21 6379
[5331] 23 Oct 01:33:12.191 * +sentinel sentinel 10.100.0.22:26379 10.100.0.22 26379 @ mymaster 10.100.0.21 6379
[5331] 23 Oct 01:33:12.268 . +cmd-link sentinel 10.100.0.22:26379 10.100.0.22 26379 @ mymaster 10.100.0.21 6379
[5331] 23 Oct 01:33:12.821 * +sentinel sentinel 10.100.0.21:26379 10.100.0.21 26379 @ mymaster 10.100.0.21 6379
[5331] 23 Oct 01:33:12.822 . +cmd-link sentinel 10.100.0.21:26379 10.100.0.21 26379 @ mymaster 10.100.0.21 6379
[5331] 23 Oct 01:33:13.576 - Accepted 10.100.0.21:43587
[5331] 23 Oct 01:33:13.597 - Accepted 10.100.0.22:46079
[5331] 23 Oct 01:38:33.762 # +sdown master mymaster 10.100.0.21 6379
[5331] 23 Oct 01:38:33.762 # +sdown sentinel 10.100.0.21:26379 10.100.0.21 26379 @ mymaster 10.100.0.21 6379
[5331] 23 Oct 01:54:05.735 # +new-epoch 1
[5331] 23 Oct 01:54:05.736 # +vote-for-leader 715e0ce508bef61a40d95a88fa296184f79605bb 1
[5331] 23 Oct 01:54:06.671 # +odown master mymaster 10.100.0.21 6379 #quorum 2/2
[5331] 23 Oct 01:54:06.671 # Next failover delay: I will not start a failover before Fri Oct 23 01:56:06 2015
Comment From: drmontgomery
Output from "sentinel master mymaster" on sentinel02 @ 23 Oct 01:40:56 UTC 2015
name
mymaster
ip
10.100.0.21
port
6379
runid
8e6dda9cd75d4654b019938c0c660ded2e572f0f
flags
master
pending-commands
111
last-ping-sent
0
last-ok-ping-reply
173856
last-ping-reply
173856
down-after-milliseconds
30000
info-refresh
182977
role-reported
master
role-reported-time
534320
config-epoch
0
num-slaves
2
num-other-sentinels
2
quorum
2
failover-timeout
60000
parallel-syncs
1
Comment From: therealbill
Nice find.
Would an alternate fix be to have the code which calls INFO first call a PING? In other words sentinelSendPing should always be called before a PING. It is possible this merely reduces the risk of the occurrence by shrinking the window to a very tiny one.
So perhaps renaming the last-ping time to something representing "last-command-success-time", and having both ping and info update them on success is a better route to go. That way it makes sense in the code why info is updating the entry and it can be used to track the last successful command which may be more useful during troubleshooting or analysis. Also if the is-server-down code is checking said entry for the last response either case should be caught.
Comment From: drmontgomery
Yes, I believe modifying sentinelSendPeriodicCommands to send PING before any other command is a valid fix. I was initially concerned about starvation (PINGs constantly trumping INFO, etc.), but this seems unlikely with the given timing parameters. This is probably the best fix for this immediate issue, in terms of limited scope and ease of porting into 2.8 branch.
Comment From: moiyer
Is this issue fixed? I happen to hit this issue when I use redis 2.8.14
Comment From: reclosedev
Looks like It happens in 3.2.11 too.
Comment From: antirez
Hello, I noticed this issue and problem only now, I'm checking if all this still applies to modern Redis.
Comment From: antirez
Problem acknowledged. I think the best fix could be just to let the function send the three types of messages independently, without sending just a single message per iteration. I can't find a strong justification for doing so (even if the original code seemed explicit about that, probably to save some message). So I propose this fix:
diff --git a/src/sentinel.c b/src/sentinel.c
index 6c6a3a0c..ef1be729 100644
--- a/src/sentinel.c
+++ b/src/sentinel.c
@@ -2599,20 +2599,24 @@ void sentinelSendPeriodicCommands(sentinelRedisInstance *ri) {
ping_period = ri->down_after_period;
if (ping_period > SENTINEL_PING_PERIOD) ping_period = SENTINEL_PING_PERIOD;
+ /* Send INFO to masters and slaves, not sentinels. */
if ((ri->flags & SRI_SENTINEL) == 0 &&
(ri->info_refresh == 0 ||
(now - ri->info_refresh) > info_period))
{
- /* Send INFO to masters and slaves, not sentinels. */
retval = redisAsyncCommand(ri->link->cc,
sentinelInfoReplyCallback, ri, "INFO");
if (retval == C_OK) ri->link->pending_commands++;
- } else if ((now - ri->link->last_pong_time) > ping_period &&
+ }
+
+ /* Send PING to all the three kinds of instances. */
+ if ((now - ri->link->last_pong_time) > ping_period &&
(now - ri->link->last_ping_time) > ping_period/2) {
- /* Send PING to all the three kinds of instances. */
sentinelSendPing(ri);
- } else if ((now - ri->last_pub_time) > SENTINEL_PUBLISH_PERIOD) {
- /* PUBLISH hello messages to all the three kinds of instances. */
+ }
+
+ /* PUBLISH hello messages to all the three kinds of instances. */
+ if ((now - ri->last_pub_time) > SENTINEL_PUBLISH_PERIOD) {
sentinelSendHello(ri);
}
}
What do you think? The fix passes all the Sentinel tests out of the box.
Comment From: antirez
I'm not sure if @soloestoy also focuses on Sentinel internals, but if so I would love to have his opinion on the above patch. Thanks!
Comment From: saamich
I also have this problem on 3.2.11
Comment From: soloestoy
Thank you for your trust @antirez ! Sorry, I'm not familiar with Sentinel code, I'll try my best to understand this patch.
Comment From: petterroea
Hello, I just hit a similar problem on Redis 6. Same symptom, a sentinel is elected but fails due to failover-abort-no-good-slave.
Sentinel config:
port 26379
sentinel monitor xxx xxx 6379 2
sentinel down-after-milliseconds xxx 1000
sentinel announce-ip "xxx"
sentinel announce-port 26379
sentinel resolve-hostnames yes
sentinel announce-hostnames yes
Looking at the conditions for finding a "good slave" from https://github.com/redis/redis/issues/7825 (in particular, rejecting slaves with info-refresh > 5s) and observing the master and replicas using sentinel master and sentinel replicas it seems info-refresh goes up to 10s before reaching 0 again, so i suspect this is a configuration issue on my side and that tuneables aren't sufficiently documented?
It is not immediately clear to me what info-refresh actually means