Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't close incoming connection on failed dial back. #255

Merged
merged 2 commits into from
Nov 6, 2019
Merged

Conversation

a-urth
Copy link

@a-urth a-urth commented Nov 4, 2019

Problem is that after restarting a node it much longer than expected take time to connect.

node1 is always running, node2 is one which is reconnecting
After node2 is disconnecting, connLoop constantly retries this connection up until it either will be READY again or after 30*3s will pass.

What happens if node2 is getting started again - node2 is dialing node1, grpc dial is successful, credentials check as well, so we store this connection to peers and start connLoop goroutine. Meanwhile node1 is trying to create its client back to node2 in async way, but it fails, since it ALREADY HAS connection to it in peers map and its broken (its in transient failure state and did not yet recover). So node1 is closing new connection from node2. Then somewhere around same time node2 after dialing node1 tries to Ping it using closed connection so ping fails and now both node1 and node2 have connections to each other which are broken and being constantly retried. After 30 tries one of the nodes creates new connection, but, if other node already has broken connection to it nothing will happen and this loop will continue. Situation resolves, but non deterministic and only when both nodes are getting fresh connections at the same time.

I agree with proposed by @rkeene solution, which is in this PR - do not close incoming connection if we're not able to dial it back. This way incoming connection remains and node which tried to connect to us will have working connection and will be able to gather information from us even if we do not benefit from this communication. Meanwhile if we already have connection to this node but it in failure state (like in example above) it will be recovered by means of grpc very shortly. This also resolves cases if user's computers do not have required ports open for example.

@a-urth a-urth self-assigned this Nov 4, 2019
@rkeene rkeene added the enhancement Small overall improvements label Nov 4, 2019
skademlia/protocol.go Outdated Show resolved Hide resolved
Copy link

@rkeene rkeene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome !

@a-urth a-urth merged commit bf3f562 into cleanup Nov 6, 2019
@a-urth a-urth deleted the fix-conn-close branch November 6, 2019 06:16
creamwhip pushed a commit to SwingbyProtocol/noise that referenced this pull request Nov 11, 2019
)

* skademlia-protocol: dont close conn on fail dial back

* Update skademlia/protocol.go

Co-Authored-By: Roy Keene <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Small overall improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants