[Merged by Bors] - P2P decentralization improvements #5329

Closed
ivan4th wants to merge 32 commits into develop from p2p-improvements

@ivan4th ivan4th commented Dec 5, 2023

Motivation

This PR implements changes needed for spacemeshos/pm#275, except for measurement

Changes

  • Introduce Routing Discovery to contact peers behind NATs
  • Introduce dynamic v2 relay discovery which is needed for hole punching. The idea is to have a wider array of circuit-v2 passive relays which should be much safer than old libp2p active relays (which were disabled in e.g. Filecoin due to security concerns)
  • Introduce QUIC transport to improve chances at hole punching, with testnet-mainnet "crosstalk" protection based on a transport-level handshake mechanism
  • Make it possible to listen on multiple addresses and advertise multiple addresses
  • Extend DebugService with additional P2P info needed for hole punching diagnostics (needs spacemeshos/api#285, "Extend NetworkInfoResponse with more P2P info")
  • Add ping-peers config option to facilitate P2P network issue diagnostics
  • Add force-dht-server config option that is useful during troubleshooting DHT and hole-punching issues

ping-peers and force-dht-server were initially considered to be temporary features, but I think it might make sense to keep them for various P2P network troubleshooting scenarios.
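
For the "listen on multiple addresses" item in the list above, a flag that accepts several addresses could look roughly like the following. This is a minimal sketch, assuming the addresses are passed as comma-separated multiaddr strings; the `AddressList` type, the `--listen` flag name and the example port are hypothetical and are not taken from the PR's code.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"strings"
)

// AddressList is a hypothetical flag type holding several listen or
// advertise addresses. It is an illustration, not the type used in the PR.
type AddressList []string

// String implements flag.Value by joining the addresses with commas.
func (l *AddressList) String() string {
	if l == nil {
		return ""
	}
	return strings.Join(*l, ",")
}

// Set implements flag.Value. It accepts a comma-separated list and appends
// to the existing value, so the flag may also be repeated.
func (l *AddressList) Set(value string) error {
	for _, part := range strings.Split(value, ",") {
		addr := strings.TrimSpace(part)
		if addr == "" {
			return errors.New("empty address in list")
		}
		// Assumption: addresses are multiaddr strings, which start with "/".
		if !strings.HasPrefix(addr, "/") {
			return fmt.Errorf("not a multiaddr: %q", addr)
		}
		*l = append(*l, addr)
	}
	return nil
}

func main() {
	var listen AddressList
	fs := flag.NewFlagSet("node", flag.ContinueOnError)
	fs.Var(&listen, "listen", "comma-separated listen addresses (hypothetical flag name)")
	args := []string{"--listen", "/ip4/0.0.0.0/tcp/7513,/ip4/0.0.0.0/udp/7513/quic-v1"}
	if err := fs.Parse(args); err != nil {
		panic(err)
	}
	for _, addr := range listen {
		fmt.Println(addr)
	}
}
```

Appending in `Set` rather than overwriting means that both a single comma-separated value and a repeated flag end up in the same list.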

All of the changes are disabled in the config by default, except for:

  • libp2p Ping service is enabled by default to make diagnostics easier
  • DHT Values and Providers are enabled, as these will make DHT Routing Peer discovery work efficiently from the beginning when we enable this feature in the configs
  • Bootnodes aren't used as relays by default anymore: v2 relays have very limited capacity by default, and bootnode relay servers' reservations are exhausted very quickly. A node now needs either a static relay list or routing discovery enabled, which searches for more available relays as needed

Test Plan

  • Tested using several k8s clusters with cone NATs enabled via the bridge CNI plugin (via Multus) -- backported to v1.2.8
  • Added a Mac node for testing

TODO

  • Have spacemeshos/api#285 ("Extend NetworkInfoResponse with more P2P info") merged and update to the new api release
  • Retest using an image based on this branch (not backport)
  • Decide on whether/how to extend systests to include NAT testing
  • To check: TCP hole punching tends to happen more often than QUIC hole punching (might be related to the handshake mechanism)
  • To consider: try picking up some % (e.g.: 50%) of non-infra peers during routing discovery (doesn't work too well, need something more involved for that)

Maybe as a follow-up (depending on how soon this gets reviewed):

  • Include new metrics / check if they're already present
    • NAT type (UDP / TCP) - Cone / Symmetric / Unknown
    • Reachability - Public / Private / Unknown
    • N of "advertised" peers found via routing discovery
    • N of TCP and UDP (QUIC) peers
    • N of peers reached via relayed connections (these being present for a long time may indicate hole-punching troubles, usually relayed connections go away relatively quickly)
    • N of relay reservations this node managed to obtain
    • Whether routing discovery is active or suspended (e.g. b/c low-peers N of peers has been reached)
    • Whether DHT is in the Server or Client mode
  • systests checking NATed connections

@ivan4th ivan4th mentioned this pull request Dec 5, 2023

codecov bot commented Dec 5, 2023

Codecov Report

Attention: 263 lines in your changes are missing coverage. Please review.

Comparison is base (4d1467c) 77.5% compared to head (3cea314) 77.3%.
Report is 3 commits behind head on develop.

| Files | Patch % | Lines |
|---|---|---|
| p2p/upgrade.go | 24.3% | 76 Missing and 8 partials ⚠️ |
| p2p/host.go | 37.7% | 63 Missing and 8 partials ⚠️ |
| p2p/dhtdiscovery/discovery.go | 84.5% | 23 Missing and 9 partials ⚠️ |
| p2p/handshake/handshake.go | 83.1% | 19 Missing and 7 partials ⚠️ |
| p2p/ping.go | 75.7% | 17 Missing and 9 partials ⚠️ |
| node/mapstructureutil/addresslist.go | 69.5% | 7 Missing ⚠️ |
| api/grpcserver/debug_service.go | 81.8% | 6 Missing ⚠️ |
| node/flags/addresslist.go | 89.0% | 4 Missing and 2 partials ⚠️ |
| p2p/addresslist.go | 83.8% | 3 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff            @@
##           develop   #5329     +/-   ##
=========================================
- Coverage     77.5%   77.3%   -0.2%     
=========================================
  Files          252     257      +5     
  Lines        29695   30441    +746     
=========================================
+ Hits         23026   23546    +520     
- Misses        5211    5401    +190     
- Partials      1458    1494     +36     


@ivan4th ivan4th force-pushed the p2p-improvements branch 3 times, most recently from 36fbe1c to a1a2e40 Compare December 12, 2023 13:28
select {
case <-ctx.Done():
    return nil
case p, ok := <-peerCh:
Contributor

peerCh could be nil at this point (if d.disc.FindPeers(ctx, ns) failed). Reading from it would block forever then.

Contributor Author

Forgot to continue the loop in the if above, fixed, thanks
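
The issue discussed in this thread comes from a general Go rule: a receive from a nil channel never becomes ready, so a `select` that reads a nil `peerCh` can only leave through its other cases. A minimal, self-contained sketch of the "continue on error" pattern follows; `findPeers`, `collect` and `demo` are hypothetical stand-ins and are not the code from `p2p/dhtdiscovery`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// findPeers is a hypothetical stand-in for a discovery call. It fails on the
// first attempt and afterwards returns a channel holding a single peer.
func findPeers(attempt int) (<-chan string, error) {
	if attempt == 0 {
		return nil, errors.New("discovery failed")
	}
	ch := make(chan string, 1)
	ch <- "peer-1"
	close(ch)
	return ch, nil
}

// collect retries discovery until one peer is received or ctx is done.
func collect(ctx context.Context) (string, error) {
	for attempt := 0; ; attempt++ {
		peerCh, err := findPeers(attempt)
		if err != nil {
			// Back off briefly, then retry. Without the continue below,
			// peerCh would be nil and the receive in the second select
			// could never fire; only ctx.Done() would end the wait.
			select {
			case <-ctx.Done():
				return "", ctx.Err()
			case <-time.After(10 * time.Millisecond):
			}
			continue
		}
		select {
		case <-ctx.Done():
			return "", ctx.Err()
		case p, ok := <-peerCh:
			if !ok {
				continue // channel drained, start a new lookup
			}
			return p, nil
		}
	}
}

// demo runs collect with a one-second deadline.
func demo() (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	return collect(ctx)
}

func main() {
	p, err := demo()
	fmt.Println(p, err) // prints "peer-1 <nil>"
}
```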

@ivan4th ivan4th force-pushed the p2p-improvements branch 2 times, most recently from 51db827 to 3f7e8ba Compare December 15, 2023 13:38
ivan4th commented Dec 15, 2023

ivan4th commented Dec 18, 2023

Disabled routing discovery advertisement by default, and made the nodes use bootnodes as relays when routing discovery is disabled. This keeps the behaviour even more consistent with the existing one when an old config is used.
Updated README.md and CHANGELOG.md
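
The relay selection described in this comment can be sketched as a small decision function. This is only an illustration under stated assumptions: the `RelayConfig` fields and `pickRelays` are hypothetical names, and giving a static relay list priority over routing discovery is an assumption rather than something the PR states.

```go
package main

import "fmt"

// RelayConfig holds hypothetical settings. The field names are illustrative
// and do not mirror the node's real config keys.
type RelayConfig struct {
	StaticRelays     []string
	Bootnodes        []string
	RoutingDiscovery bool
}

// relaySource describes where relay candidates come from.
type relaySource string

const (
	sourceStatic    relaySource = "static"
	sourceDiscovery relaySource = "routing-discovery"
	sourceBootnodes relaySource = "bootnodes"
)

// pickRelays returns the candidate source and the initially known relays.
func pickRelays(cfg RelayConfig) (relaySource, []string) {
	switch {
	case len(cfg.StaticRelays) > 0:
		// Assumption: an explicit static list wins over everything else.
		return sourceStatic, cfg.StaticRelays
	case cfg.RoutingDiscovery:
		// Candidates are searched for at runtime, so none are known up front.
		return sourceDiscovery, nil
	default:
		// Old configs without routing discovery keep using bootnodes.
		return sourceBootnodes, cfg.Bootnodes
	}
}

func main() {
	oldConfig := RelayConfig{Bootnodes: []string{"bootnode-1", "bootnode-2"}}
	src, relays := pickRelays(oldConfig)
	fmt.Println(src, relays)

	newConfig := RelayConfig{Bootnodes: oldConfig.Bootnodes, RoutingDiscovery: true}
	src, relays = pickRelays(newConfig)
	fmt.Println(src, relays)
}
```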

ivan4th commented Dec 19, 2023

@poszu pls check if the README / changelog changes are sufficient (or if they're excessive)

ivan4th commented Dec 19, 2023

bors merge

spacemesh-bors bot pushed a commit that referenced this pull request Dec 19, 2023
## Motivation

This PR implements changes needed for spacemeshos/pm#275, except for measurement

## Changes
* Introduce Routing Discovery to contact peers behind NATs
* Introduce dynamic v2 relay discovery which is needed for hole punching. The idea is to have a wider array of circuit-v2 passive relays which should be much safer than old libp2p active relays (which were disabled in e.g. Filecoin due to security concerns)
* Introduce QUIC transport to improve chances at hole punching, with testnet-mainnet "crosstalk" protection based on a transport-level handshake mechanism
  * the handshake is not used on mainnet. That way, connections between mainnet and testnet nodes are still prevented, as testnet peers expect the handshake, but if/when my libp2p changes are merged (libp2p/go-libp2p#2658) or libp2p gets private network support
* Make it possible to listen on multiple addresses and advertise multiple addresses
* Extend DebugService with additional P2P info needed for hole punching diagnostics (needs spacemeshos/api#285)
* Add `ping-peers` config option to facilitate P2P network issue diagnostics
* Add `force-dht-server` config option that is useful during troubleshooting DHT and hole-punching issues

`ping-peers` and `force-dht-server` were initially considered to be temporary features, but I think it might make sense to keep them for various P2P network troubleshooting scenarios.
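
One way to picture the transport-level "crosstalk" protection from the Changes list is a cookie exchange performed right after a connection is established. The sketch below is an assumption about the general idea, not the code in `p2p/handshake`: `handshake`, `run`, `cookieLen` and the cookie values are hypothetical, and it models two networks with different cookies rather than the mainnet case where no handshake is sent at all.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"net"
)

// cookieLen is a hypothetical fixed cookie size.
const cookieLen = 8

// handshake sends the local network cookie and verifies that the remote
// side sent the same one.
func handshake(conn net.Conn, cookie [cookieLen]byte) error {
	// Write concurrently: net.Pipe is unbuffered, so both sides writing
	// before reading would otherwise block each other.
	errCh := make(chan error, 1)
	go func() {
		_, err := conn.Write(cookie[:])
		errCh <- err
	}()
	var remote [cookieLen]byte
	if _, err := io.ReadFull(conn, remote[:]); err != nil {
		return err
	}
	if err := <-errCh; err != nil {
		return err
	}
	if !bytes.Equal(remote[:], cookie[:]) {
		return errors.New("network cookie mismatch")
	}
	return nil
}

// run performs the handshake on both ends of an in-memory connection and
// returns the result seen by each side.
func run(a, b [cookieLen]byte) (errA, errB error) {
	c1, c2 := net.Pipe()
	defer c1.Close()
	defer c2.Close()
	res := make(chan error, 1)
	go func() { res <- handshake(c2, b) }()
	errA = handshake(c1, a)
	errB = <-res
	return errA, errB
}

func main() {
	netA := [cookieLen]byte{'n', 'e', 't', '-', 'a'}
	netB := [cookieLen]byte{'n', 'e', 't', '-', 'b'}

	errA, errB := run(netA, netA)
	fmt.Println("same network:", errA, errB)

	errA, errB = run(netA, netB)
	fmt.Println("different networks:", errA, errB)
}
```

Because the check happens before any application protocol runs, a peer from another network is rejected without higher layers ever seeing the connection.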

All of the changes are disabled in the config by default, except for:
* libp2p Ping service is enabled by default to make diagnostics easier
* DHT Values and Providers as these will make DHT Routing Peer discovery work efficiently from the beginning when we enable this feature in the configs
* Bootnodes aren't used as relays by default anymore. v2 relays have very limited capacity by default and bootnode relay servers' reservations are very quickly exhausted. Need to either specify a static relay list or enable routing discovery, which searches for more available relays as needed

## Test Plan
* Tested using several k8s clusters with cone NATs enabled via the `bridge` CNI plugin (via Multus) -- backported to v1.2.8
* Added a Mac node for testing

## TODO
- [x] Have spacemeshos/api#285 merged and updated to the new `api` release
- [x] Retest using an image based on this branch (not backport)
- [ ] Decide on whether/how to extend systests to include NAT testing
- [ ] To check: TCP holepunching tends to happen more than QUIC (might be related to the handshake mechanism)
- [ ] ~~To consider: try picking up some % (e.g.: 50%) of non-infra peers during routing discovery~~ (doesn't work too well, need something more involved for that)

Maybe as a follow-up (depending on how soon this gets reviewed):
- Include new metrics / check if they're already present
  - NAT type (UDP / TCP) - Cone / Symmetric / Unknown
  - Reachability - Public / Private / Unknown
  - N of "advertised" peers found via routing discovery
  - N of TCP and UDP (QUIC) peers
  - N of peers reached via relayed connections (these being present for a long time may indicate hole-punching troubles, usually relayed connections go away relatively quickly)
  - N of relay reservations this node managed to obtain
  - Whether routing discovery is active or suspended (e.g. b/c `low-peers` N of peers has been reached)
  - Whether DHT is in the `Server` or `Client` mode
- systests checking NATed connections

Co-authored-by: Ivan Shvedunov <[email protected]>
@spacemesh-bors

Pull request successfully merged into develop.

Build succeeded:

@spacemesh-bors spacemesh-bors bot changed the title P2P decentralization improvements [Merged by Bors] - P2P decentralization improvements Dec 19, 2023
@spacemesh-bors spacemesh-bors bot closed this Dec 19, 2023
@spacemesh-bors spacemesh-bors bot deleted the p2p-improvements branch December 19, 2023 12:36
ivan4th added a commit that referenced this pull request Dec 20, 2023
ivan4th added a commit that referenced this pull request Dec 20, 2023
ivan4th added a commit that referenced this pull request Dec 25, 2023
ivan4th added a commit that referenced this pull request Dec 26, 2023
spacemesh-bors bot pushed a commit that referenced this pull request Dec 26, 2023
This is a backport of #5329.
It also includes changes from #5287 to avoid conflicts and updates `go-libp2p` to a more recent version.
spacemesh-bors bot pushed a commit that referenced this pull request Dec 26, 2023
spacemesh-bors bot pushed a commit that referenced this pull request Dec 27, 2023
spacemesh-bors bot pushed a commit that referenced this pull request Dec 27, 2023
dsmello pushed a commit that referenced this pull request Dec 28, 2023
ivan4th added a commit that referenced this pull request Dec 29, 2023
spacemesh-bors bot pushed a commit that referenced this pull request Dec 29, 2023
spacemesh-bors bot pushed a commit that referenced this pull request Dec 29, 2023