decentralize p2p network #275
Comments
## Motivation

This PR implements the changes needed for spacemeshos/pm#275, except for measurement.

## Changes

* Introduce Routing Discovery to contact peers behind NATs
* Introduce dynamic v2 relay discovery, which is needed for hole punching. The idea is to have a wider array of circuit-v2 passive relays, which should be much safer than the old libp2p active relays (which were disabled in e.g. Filecoin due to security concerns)
* Introduce QUIC transport to improve chances at hole punching, with testnet-mainnet "crosstalk" protection based on a transport-level handshake mechanism
  * the handshake is not used on mainnet. That way, connections between mainnet and testnet nodes are still prevented, as testnet peers expect the handshake, but if/when my libp2p changes are merged (libp2p/go-libp2p#2658) or libp2p gets private network support
* Make it possible to listen on multiple addresses and advertise multiple addresses
* Extend DebugService with additional P2P info needed for hole punching diagnostics (needs spacemeshos/api#285)
* Add `ping-peers` config option to facilitate P2P network issue diagnostics
* Add `force-dht-server` config option that is useful when troubleshooting DHT and hole-punching issues

`ping-peers` and `force-dht-server` were initially considered to be temporary features, but I think it might make sense to keep them for various P2P network troubleshooting scenarios.

All of the changes are disabled in the config by default, except for:

* libp2p Ping service is enabled by default to make diagnostics easier
* DHT Values and Providers, as these will make DHT Routing Peer discovery work efficiently from the beginning when we enable this feature in the configs
* Bootnodes aren't used as relays by default anymore. v2 relays have very limited capacity by default and bootnode relay servers' reservations are very quickly exhausted. You need to either specify a static relay list or enable routing discovery, which searches for more available relays as needed

## Test Plan

* Tested using several k8s clusters with cone NATs enabled via the `bridge` CNI plugin (via Multus), backported to v1.2.8
* Added a Mac node for testing

## TODO

- [x] Have spacemeshos/api#285 merged and updated to the new `api` release
- [x] Retest using an image based on this branch (not backport)
- [ ] Decide on whether/how to extend systests to include NAT testing
- [ ] To check: TCP holepunching tends to happen more than QUIC (might be related to the handshake mechanism)
- [ ] ~~To consider: try picking up some % (e.g.: 50%) of non-infra peers during routing discovery~~ (doesn't work too well, need something more involved for that)

Maybe as a follow-up (depending on how soon this gets reviewed):

- Include new metrics / check if they're already present
  - NAT type (UDP / TCP)
    - Cone / Symmetric / Unknown
  - Reachability
    - Public / Private / Unknown
  - N of "advertised" peers found via routing discovery
  - N of TCP and UDP (QUIC) peers
  - N of peers reached via relayed connections (these being present for a long time may indicate hole-punching troubles, usually relayed connections go away relatively quickly)
  - N of relay reservations this node managed to obtain
  - Whether routing discovery is active or suspended (e.g. b/c `low-peers` N of peers has been reached)
  - Whether DHT is in the `Server` or `Client` mode
- systests checking NATed connections

Co-authored-by: Ivan Shvedunov <[email protected]>
With P2P decentralization (#5329) and fetch streaming (#5562) merged, we're currently trying to enable decentralization features in the network. The current problem appears to be that the routing discovery mechanism puts too much network load on user nodes. The following items are planned:
Testing QUIC on the testnet. There was an issue with malfeasance sync on the testnet, now fixed: spacemeshos/go-spacemesh#5851. Looking into reports that streaming mode is somehow causing too many TCP connections (?).
spacemeshos/go-spacemesh#5792 should facilitate diagnostics of possible network issues the next time we try to enable decentralization. It would be best to include corresponding views in SMAPP. spacemeshos/go-spacemesh#5882 should reduce the network strain from DHT and connection activity once routing discovery is enabled.
spacemeshos/go-spacemesh#5902 adds QUIC mode systests.
This is done: the default smapp setup now has QUIC and discovery enabled. Improvements will come in separate tasks.
The problem with the existing setup is that many nodes are undialable, so spacemesh has to run CDN-like nodes (which we call boosters internally) that help with network connectivity.
This is not good for the network's long-term health, and we want to fix it when time allows. In the past we failed with the libp2p hole punching protocol, but that could have been due to rushing things. We want to:
This doesn't introduce any immediate issues, but it is not good for network health in the long term.