not trigger the channelInactive() method in ConnectionWatchdog #3100

Closed
TomGitHome opened this issue Dec 26, 2024 · 7 comments

@TomGitHome

Environment

  • Lettuce version(s): [6.1.5.RELEASE]
  • Redis version: [6.2.7]

Basic Information:

There are two servers, one acting as the primary server and the other as the standby.
Both the primary and standby servers deploy Redis in Docker containers, and both are running.
The Spring Boot project is deployed in a Docker container.
The Spring Boot project uses Lettuce to connect to the Redis service.

Issue Description:

Currently, the Redis virtual IP is on the standby server. When I run iptables -A INPUT -p tcp --dport 6379 -j REJECT on the standby server to make Redis unreachable, the Redis virtual IP automatically switches to the primary server. This triggers the channelInactive() method in ConnectionWatchdog, and the connection to the Redis service is restored successfully.
I then restore Redis on the standby server with iptables -D INPUT -p tcp --dport 6379 -j REJECT and block Redis on the primary server with iptables -A INPUT -p tcp --dport 6379 -j REJECT. The Redis virtual IP automatically switches to the standby server, but this time the channelInactive() method in ConnectionWatchdog is not triggered, and the connection to the Redis service stays broken for about 18 minutes before recovering.

@TomGitHome
Author

[image: 1735183076331]

@tishun
Collaborator

tishun commented Dec 27, 2024

Hey @TomGitHome,

When changing the iptables rules of the container, you might end up in a state where the driver does not receive any notification from the server that the connection was dropped. Depending on your KEEPALIVE and TCP_USER_TIMEOUT options, you may end up waiting a long time (how long is specific to the OS of the container) until the connection drops.

You can read more details about this condition in the following article.
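
For a rough sense of how long that wait can be when neither keepalive nor TCP_USER_TIMEOUT is tuned, the sketch below sums the exponential retransmission backoff for a segment that is never acknowledged. It assumes Linux defaults (net.ipv4.tcp_retries2 = 15, a 200 ms minimum and 120 s maximum retransmission timeout); those values and the class name are assumptions for illustration, not figures taken from this issue. The real timeout starts from the measured round-trip time, so the actual wait will differ.

```java
import java.time.Duration;

/** Sums the backoff intervals for one segment that is never acknowledged. */
final class RetransmitTimeoutEstimate {

    // Assumed Linux defaults; check the actual values inside your container.
    static final long RTO_MIN_MILLIS = 200;      // minimum retransmission timeout
    static final long RTO_MAX_MILLIS = 120_000;  // cap on the retransmission timeout
    static final int TCP_RETRIES2 = 15;          // net.ipv4.tcp_retries2

    static Duration estimate(long rtoMinMillis, long rtoMaxMillis, int retries) {
        long totalMillis = 0;
        long rto = rtoMinMillis;
        // One wait after the original send plus one wait after each retransmission.
        for (int attempt = 0; attempt <= retries; attempt++) {
            totalMillis += rto;
            rto = Math.min(rto * 2, rtoMaxMillis); // double the wait, up to the cap
        }
        return Duration.ofMillis(totalMillis);
    }

    public static void main(String[] args) {
        Duration total = estimate(RTO_MIN_MILLIS, RTO_MAX_MILLIS, TCP_RETRIES2);
        System.out.println(total.toMillis() / 1000.0 + " seconds");
    }
}
```

Under these assumptions the total is 924.6 seconds, a little over 15 minutes, which is why an untuned client can sit on a dead connection for a long time before the OS gives up.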

To make sure you do not run into the same problem, could you verify that you are following the best practices described in the Redis documentation?

Important

The suggested values in the article are sample values; you need to use the ones that make sense for your system. The basic recommendation is to follow the rule TCP_USER_TIMEOUT = TCP_KEEPIDLE + TCP_KEEPINTVL * TCP_KEEPCNT.
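
As a quick worked example of that rule, the sketch below computes the timeout from the three keepalive settings. The input values (5 s idle, 5 s interval, 3 probes) and the class and method names are illustrative assumptions; substitute the values that suit your system.

```java
import java.time.Duration;

/** Applies the rule: user timeout = keepalive idle + keepalive interval * probe count. */
final class TcpUserTimeoutRule {

    static Duration tcpUserTimeout(Duration keepIdle, Duration keepInterval, int keepCount) {
        return keepIdle.plus(keepInterval.multipliedBy(keepCount));
    }

    public static void main(String[] args) {
        // Illustrative values only: 5 s + 5 s * 3 = 20 s.
        Duration timeout = tcpUserTimeout(Duration.ofSeconds(5), Duration.ofSeconds(5), 3);
        System.out.println(timeout.getSeconds() + " seconds");
    }
}
```

With these inputs the rule gives 20 seconds: the connection is declared dead at about the moment the last keepalive probe has gone unanswered.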

@tishun added the "status: waiting-for-feedback" label (We need additional information before we can continue) Dec 27, 2024
@TomGitHome
Author

Hey @tishun, thanks!
The current configuration is as follows.
[image: screenshot of the current configuration]

@TomGitHome
Author

@tishun
I made the following changes on top of version 6.1.5.RELEASE. After testing, the issue still occurs. As you mentioned, it is possible that the client still does not receive the server's disconnection signal. I will run a packet capture to confirm this.
[image: screenshot of the changes]
My current workaround is as follows: when a RedisTimeoutException occurs, I check whether the Redis service is available through the handshake returned by the RedisClient and, if it is not, re-establish the connection pool.
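
A minimal sketch of a workaround of this shape, not the actual code used here: it uses only the JDK, with java.util.concurrent.TimeoutException standing in for RedisTimeoutException and a generic type P standing in for the connection pool. The class name, the Command interface, the health check, and the single retry are all hypothetical choices made for illustration.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical wrapper that rebuilds a pool when a command times out and a probe fails. */
final class RebuildOnTimeout<P> {

    /** A command run against the pool; may time out. */
    interface Command<P, R> {
        R run(P pool) throws TimeoutException;
    }

    private final Supplier<P> poolFactory;   // creates a fresh pool
    private final Predicate<P> healthCheck;  // e.g. a handshake or PING probe
    private P pool;

    RebuildOnTimeout(Supplier<P> poolFactory, Predicate<P> healthCheck) {
        this.poolFactory = poolFactory;
        this.healthCheck = healthCheck;
        this.pool = poolFactory.get();
    }

    synchronized <R> R execute(Command<P, R> command) throws TimeoutException {
        try {
            return command.run(pool);
        } catch (TimeoutException timeout) {
            if (healthCheck.test(pool)) {
                throw timeout; // service is reachable, so the timeout has another cause
            }
            pool = poolFactory.get();  // real code should also close the old pool
            return command.run(pool);  // retry once on the fresh pool
        }
    }
}
```

The health check keeps the wrapper from tearing down a healthy pool on an ordinary slow command; only a timeout combined with a failed probe triggers a rebuild.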

@tishun
Collaborator

tishun commented Dec 27, 2024

Hey @TomGitHome ,

I do not see the TCP_USER_TIMEOUT setting in your configuration.
In your case it would have to be part of the SocketOptions settings:

    SocketOptions socketOptions = SocketOptions.builder()
            .keepAlive(keepAliveOptions)
            .connectTimeout(Duration.ofSeconds(10))
            .tcpUserTimeout(tcpUserTimeout)               // <--- THIS IS MISSING
            .build();

... and if you want to use the same setting as the one in the example you'd have to add this too:

    SocketOptions.TcpUserTimeoutOptions tcpUserTimeout = SocketOptions.TcpUserTimeoutOptions.builder()
            .tcpUserTimeout(Duration.ofSeconds(20))
            .enable()
            .build();

@TomGitHome
Author

Hey @tishun,
I was previously using version 6.1.5, which does not have TcpUserTimeoutOptions. After upgrading to version 6.3.2 and configuring TcpUserTimeoutOptions, the issue was resolved. Thank you very much!

@tishun removed the "status: waiting-for-feedback" label (We need additional information before we can continue) Jan 3, 2025
@tishun
Collaborator

tishun commented Jan 3, 2025

> Hey @tishun I was using version 6.1.5 previously, which didn't have the TcpUserTimeoutOptions.

Oh, my bad, I missed that.

> After upgrading to version 6.3.2 and configuring the TcpUserTimeoutOptions, the issue I encountered before was resolved. Thank you very much!

Happy to help! I will close this issue; please ping me if there is anything more to look at.

@tishun tishun closed this as completed Jan 3, 2025