Enhancing Observability and Control in Redis Cluster Data Migration

When a Redis cluster runs operations such as CLUSTER REPLICATE, an automatic failover after a node failure, or a slot reassignment, the resulting data migration is a "black box" to operators. There are no tools to observe migration progress, evaluate its impact on cluster performance, or intervene in a fine-grained way when problems arise.

Lack of Transparent Progress Monitoring: Commands such as CLUSTER SLOTS and INFO report the slot mapping and general node status, but they give no direct progress details such as the percentage of migration completed, the number of keys remaining, or the current migration rate. For slots that hold a large amount of data, operators need a way to estimate how long the migration will take.
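As a rough illustration of the progress figures requested here, the sketch below derives a completion percentage, an average rate, and an ETA from two key counts for one slot. It assumes the counts are sampled on the source node (for example with CLUSTER COUNTKEYSINSLOT) at the start of the migration and again later. The names `SlotProgress` and `estimate_slot_progress` are hypothetical and not part of Redis.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlotProgress:
    percent_done: float            # 0.0 .. 100.0
    keys_per_second: float         # average rate since the first sample
    eta_seconds: Optional[float]   # None when no rate can be measured yet


def estimate_slot_progress(initial_keys: int, remaining_keys: int,
                           elapsed_seconds: float) -> SlotProgress:
    """Estimate progress from two key counts taken for the same slot.

    Assumption: both counts come from the source node, e.g. via
    CLUSTER COUNTKEYSINSLOT, sampled `elapsed_seconds` apart.
    """
    if initial_keys <= 0:
        # An empty slot has nothing to move.
        return SlotProgress(100.0, 0.0, 0.0)
    # Clamp so that new writes landing in the slot cannot yield negative progress.
    moved = max(0, initial_keys - remaining_keys)
    percent = min(100.0, 100.0 * moved / initial_keys)
    rate = moved / elapsed_seconds if elapsed_seconds > 0 else 0.0
    if remaining_keys <= 0:
        eta: Optional[float] = 0.0
    elif rate > 0:
        eta = remaining_keys / rate
    else:
        eta = None
    return SlotProgress(percent, rate, eta)


p = estimate_slot_progress(initial_keys=1_000_000, remaining_keys=250_000,
                           elapsed_seconds=300)
print(f"{p.percent_done:.1f}% done, {p.keys_per_second:.0f} keys/s, "
      f"ETA {p.eta_seconds:.0f}s")
```

Key counts are only a proxy: a slot dominated by a few very large keys will not progress linearly in keys, so a byte-based figure would be more accurate where it is available.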

Resource Contention During Migration: Migration, especially when large keys are involved, consumes network bandwidth, CPU, and disk I/O on both the source and target nodes. A rate-limiting feature for migration would help prevent disruption to online traffic.

No Dynamic Throttling or Pause/Resume Options: There is no way to dynamically throttle, pause, or resume an individual migration task. If a migration impacts critical online services, the only options are to wait for it to complete or to take high-risk actions, both of which are operationally painful.
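One way to picture the requested controls is a token bucket placed in front of each migration batch. The sketch below is only an illustration of that idea, not how Redis implements migration. `MigrationThrottle` and all of its methods are hypothetical names, and the unit (keys per second) is an assumption; bytes per second would work the same way.

```python
import time
from typing import Callable


class MigrationThrottle:
    """Token bucket that caps keys moved per second and can be paused.

    Hypothetical helper for illustration; a batch larger than `burst`
    can never be admitted, so callers should size batches accordingly.
    """

    def __init__(self, keys_per_second: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = float(keys_per_second)
        self.burst = burst
        self.paused = False
        self._clock = clock
        self._tokens = float(burst)
        self._last = clock()

    def _refill(self) -> None:
        # Credit tokens for the time elapsed since the last refill.
        now = self._clock()
        self._tokens = min(self.burst,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now

    def set_rate(self, keys_per_second: float) -> None:
        self._refill()  # settle tokens earned at the old rate first
        self.rate = float(keys_per_second)

    def pause(self) -> None:
        self._refill()  # keep tokens earned before the pause
        self.paused = True

    def resume(self) -> None:
        self._last = self._clock()  # do not credit time spent paused
        self.paused = False

    def try_acquire(self, keys: int) -> bool:
        """Return True if a batch of `keys` may be migrated now."""
        if self.paused:
            return False
        self._refill()
        if self._tokens >= keys:
            self._tokens -= keys
            return True
        return False
```

A migration loop would call `try_acquire(batch_size)` before sending each batch and back off briefly when it returns False. An operator-facing command could then map onto `set_rate`, `pause`, and `resume` without restarting the task.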

Risks of Large Key Migration: Large keys pose significant risks during migration, such as prolonged migration times and blocking of the source or target node. Ideally, a pre-migration check would warn about large keys and offer recommendations, such as splitting them into smaller keys before migrating.
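A pre-migration check of this kind could look roughly like the sketch below. It assumes per-key sizes have already been collected for the slot, for example by listing keys with CLUSTER GETKEYSINSLOT and measuring each with MEMORY USAGE. The function name `find_large_keys` and the 10 MiB default threshold are arbitrary choices for illustration.

```python
from typing import Dict, List, Tuple


def find_large_keys(key_sizes: Dict[str, int],
                    threshold_bytes: int = 10 * 1024 * 1024
                    ) -> List[Tuple[str, int]]:
    """Return (key, size) pairs at or above the threshold, largest first.

    Assumption: `key_sizes` maps key name to size in bytes, gathered
    beforehand (e.g. from MEMORY USAGE on the source node).
    """
    large = [(k, s) for k, s in key_sizes.items() if s >= threshold_bytes]
    return sorted(large, key=lambda item: item[1], reverse=True)


# Made-up sizes, purely to show the shape of the report.
sizes = {
    "user:1": 2_048,
    "feed:all": 64 * 1024 * 1024,
    "cart:9": 12 * 1024 * 1024,
}
for key, size in find_large_keys(sizes):
    print(f"WARNING: {key} is {size / (1024 * 1024):.0f} MiB; "
          "consider splitting it before migration")
```

Scanning every key in a busy slot has a cost of its own, so a real check would likely sample keys or run against a replica.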

Comment From: ShooterIT

Thanks @ddmg-250, we are enhancing data migration now.