Tile38 is not publishing its db stats to redis sentinel which causes sentinel to pick wrong master #750

Kilowhisky · 2024-08-30T18:07:10Z

Describe the bug
Tile38 supports nearly all the commands necessary in order to operate as a Redis node with Redis Sentinel based system. One thing it apparently is missing is reporting its DB status to sentinel so that way sentinel can pick the most up to date server as master when a failover occurs.

We've been dealing with an issue where:

A node in the pool of 3 tile38 servers crashes, the one that crashed happened to be the master.
This kicks sentinel into a master election.
During the sleep period before election occurs, a new node is booted by EKS and joins the cluster. This node happens to be completely empty. So now we have 2 nodes that have data and one node that doesn't.
Sentinel then proceeds to pick a new master. Because sentinel doesn't know anything about the state of the instances it just ends up randomly picking one of the nodes. It has a 33% chance of picking the node that is empty... Which happens.
Sentinel picks the empty node as master and tells the other tile38 instances to follow it.
The tile38 instances happily start following the empty master.
BONUS the other two tile38's that had data in them still have the data as their AOF wasn't cleared when they were told to follow a new master. This leads to followers that have data that the master doesn't and will happily respond as such. This is probably another bug....

Anyways... Back to the issue..

Redis Sentinel uses the following commands to interact with client instances.

In particular it uses the INFO command in order to determine the current offset status of the server.

link to parser is here: https://github.com/redis/redis/blob/3fcddfb61f903d7112da186cba8b1c93a99dc87f/src/sentinel.c#L2490

It is looking for the following fields to be present.

run_id : https://github.com/redis/redis/blob/3fcddfb61f903d7112da186cba8b1c93a99dc87f/src/sentinel.c#L2510
slave(N) (ip, port) https://github.com/redis/redis/blob/3fcddfb61f903d7112da186cba8b1c93a99dc87f/src/sentinel.c#L2530
master_link_down_since_seconds
https://github.com/redis/redis/blob/3fcddfb61f903d7112da186cba8b1c93a99dc87f/src/sentinel.c#L2569
role https://github.com/redis/redis/blob/3fcddfb61f903d7112da186cba8b1c93a99dc87f/src/sentinel.c#L2576
master_link_status: https://github.com/redis/redis/blob/3fcddfb61f903d7112da186cba8b1c93a99dc87f/src/sentinel.c#L2602
slave_priority:https://github.com/redis/redis/blob/3fcddfb61f903d7112da186cba8b1c93a99dc87f/src/sentinel.c#L2610
slave_repl_offset: https://github.com/redis/redis/blob/3fcddfb61f903d7112da186cba8b1c93a99dc87f/src/sentinel.c#L2614
replica_announced: https://github.com/redis/redis/blob/3fcddfb61f903d7112da186cba8b1c93a99dc87f/src/sentinel.c#L2618

In particular without the slave_repl_offset sentinel has no idea the status of the connected nodes. So when a new node of equal priority comes in and a failover happens. It can accidentally pick a empty node.

Expected behavior
The fix is to add the slave_repl_offset to the INFO [replication] command so that the sentinel knows who has data and who does not.

There also appear to be quite a few other pieces of data here that might be useful that are not sent. I'm still digging into what each field is used for and if we need to surface it.

Logs
This is an example of what tile38 outputs for INFO replication

# Server
tile38_version:20240502_000441
redis_version:20240502_000441
uptime_in_seconds:16071

# Clients
connected_clients:37

# Memory
used_memory:1321460168

# Persistence
aof_enabled:1
aof_rewrite_in_progress:0
aof_last_rewrite_time_sec:0
aof_current_rewrite_time_sec:0

# Stats
total_connections_received:187889
total_commands_processed:200368972
total_messages_sent:71002
expired_keys:36

# Replication
role:master
slave0:ip=tile38-node-1.tile38-headless.core-geofence.svc.cluster.local,port=6379,state=online
slave1:ip=tile38-node-3.tile38-headless.core-geofence.svc.cluster.local,port=6379,state=online
connected_slaves:2

# CPU
used_cpu_sys:3319.00
used_cpu_user:15522.00
used_cpu_sys_children:0.00
used_cpu_user_children:0.00

# Cluster
cluster_enabled:0

This is what a regular redis server outputs: https://stackoverflow.com/questions/40726175/what-do-master-and-slave-offsets-mean-in-redis

Operating System (please complete the following information):

OS: Linux (EKS)
CPU: unknown
Version: latest
Container: EKS

Additional context
We are running tile38 inside EKS utilizing the Bitnami-Redis helm chart. This chart utilizes Sentinel in order to maintain the health of the cluster and properly failover when nodes are killed.

The text was updated successfully, but these errors were encountered:

Kilowhisky · 2024-08-30T18:24:06Z

Digging into this a bit more I would suggest adding the following fields to the INFO replication command. ( i can get a PR for this together once we are good with it). Here's a good rundown of the fields that are in the INFO command and what they are for. https://redis.io/docs/latest/commands/info/

run_id expects 40 hex characters.. the server.id generated by tile38 appears to be 33. Not sure if this is necessary.
master_link_status = 'up' or 'down' depending if its connected to the master. Not sure if this is necessary.
slave_priority = a number indicating server affinity. I don't need this...
replica_announced = Flag indicating if the replica is announced by Sentinel. Not sure if this is necessary.
slave_repl_offset = This is the follower offset against the master.

The big one is the last one. See this code block as to why.

    /* If priority is the same, select the slave with greater replication
     * offset (processed more data from the master). */
    if ((*sa)->slave_repl_offset > (*sb)->slave_repl_offset) {
        return -1; /* a < b */
    } else if ((*sa)->slave_repl_offset < (*sb)->slave_repl_offset) {
        return 1; /* a > b */
    }

https://github.com/redis/redis/blob/3fcddfb61f903d7112da186cba8b1c93a99dc87f/src/sentinel.c#L5021-L5027

I'm tempted that this should just be the AOF offset, but i'm wondering if there is a better stat that can be used.

tidwall · 2024-09-03T11:36:11Z

The AOF offset is probably the stat to use.
I'm cool is you want to submit a PR.

Kilowhisky mentioned this issue Sep 3, 2024

Add 'slave_repl_offset' to the INFO replication command output #752

Merged

tidwall closed this as completed in #752 Sep 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tile38 is not publishing its db stats to redis sentinel which causes sentinel to pick wrong master #750

Tile38 is not publishing its db stats to redis sentinel which causes sentinel to pick wrong master #750

Kilowhisky commented Aug 30, 2024 •

edited

Loading

Kilowhisky commented Aug 30, 2024

tidwall commented Sep 3, 2024

Tile38 is not publishing its db stats to redis sentinel which causes sentinel to pick wrong master #750

Tile38 is not publishing its db stats to redis sentinel which causes sentinel to pick wrong master #750

Comments

Kilowhisky commented Aug 30, 2024 • edited Loading

Kilowhisky commented Aug 30, 2024

tidwall commented Sep 3, 2024

Kilowhisky commented Aug 30, 2024 •

edited

Loading