I know this is not the usual use case, but we are managing 2000 masters with a single Redis Sentinel cluster on Linux.
When the master count reaches 2000, client connections to Sentinel itself (e.g. redis-cli -h localhost) sometimes get stuck. The more masters there are, the more likely a client is to get stuck. At around 2400 masters, not even a single client can connect to the Sentinel.
The process's open file descriptor limit is not reached, and the conntrack limit is not hit. All masters receive Sentinel messages just fine. There are no error logs, and CPU is not at 100%.
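For reference, the checks above were done roughly along these lines (a sketch; the PID, host, and log/proc paths are placeholders and the exact commands may vary):

    # open file descriptors vs the process limit
    ls /proc/<sentinel-pid>/fd | wc -l
    grep 'Max open files' /proc/<sentinel-pid>/limits
    # conntrack usage vs limit
    cat /proc/sys/net/netfilter/nf_conntrack_count /proc/sys/net/netfilter/nf_conntrack_max
    # reproduce the stuck connection with a bounded wait
    timeout 5 redis-cli -h localhost -p 26379 PING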
What could be the reason for this?
Comment From: Nklya
I'm not part of the Redis team, but I recommend adding metrics collection from the Sentinels with redis_exporter and checking client connections. Maybe they're exhausted.
By default I think the limit is 10k, and as I recall you need to restart the Sentinels to change it.
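A quick way to check this even without the exporter (a sketch; 26379 is the default Sentinel port, the host is a placeholder, and the maxclients field in INFO clients only appears on reasonably recent Redis versions):

    redis-cli -h <sentinel-host> -p 26379 INFO clients
    # look at connected_clients (and maxclients, if present) in the output
    redis-cli -h <sentinel-host> -p 26379 CLIENT LIST | wc -l
    # rough count of the connections the Sentinel is currently holding open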
Comment From: ShooterIT
I am not very familiar with Sentinel, but managing more than 2000 masters is too heavy for it.
Do you see any logs indicating that Sentinel entered TILT mode?
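One way to check (a sketch; the log path is a placeholder): Sentinel logs +tilt / -tilt events, and the INFO sentinel section exposes a sentinel_tilt flag.

    grep -E '[+-]tilt' /var/log/redis/sentinel.log
    redis-cli -h <sentinel-host> -p 26379 INFO sentinel | grep sentinel_tilt
    # sentinel_tilt:1 means the Sentinel is currently in TILT mode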