WIP: WIP-0013
Title: Apr 2021 Chain Fork Post-Mortem
Author: The witnet-rust developers
Discussions-To: `#dev-general` channel on Witnet Community's Discord server
Status: Final
Type: Informational
Created: 2021-04-24
License: PD
For reasons much like those described in WIP-0010, on April 22, 2021, the superblock consensus committee size began to shrink progressively until it reached the minimum of `1`.
After several hours of a total lack of superblock consensus, different segments of the network consolidated different superblocks. This resulted in multiple, incompatible chain forks. The last block that was common to all of them was #364269, with hash `a23f647471f0976a358de9d7ac901a3f715e78335e6f412d58e5b037c11668ea`.
At that point, at least two forks were registered:
| Chain | First block number after #364269 | First block hash after #364269 |
|---|---|---|
| A | #364270 | `462d4fc6815a8492a07d133f796e66919ba459ee6431ce3376ad47f311744821` |
| B | #365292 | `8a09fd775d473340bec892c7c6cd543e01922fce5ba3e94a1892800b11c43d3e` |
By Apr 24, the `A` chain above seemed healthy enough, as it presented a consistent committee size of `100` and a low rate of rollbacks.
The root cause for this network disruption is assumed to be the same as for WIP-0010.
These immediate actions are geared towards resuming the normal operation of the network. That is, making sure that all nodes can get a valid chain state on top of which new blocks can be appended and validated by the entire network.
These actions are aimed specifically at nodes that were not able to keep up with the `A` chain above.
- Release a `witnet-rust` version, tagged as `1.2.1`.
- Optionally, make this release preemptively soften some peering restrictions that are not consensus-critical (e.g. inbound Sybil protection, icing period, outbound peers target, etc.) to facilitate the discovery of potential peers and speed up the recovery.
- Release a recovery script that evaluates whether a local node is on the `A` chain and, if it is not, rewinds the chain back to the common checkpoint (`364269`) using the `rewind` CLI method. This script can also let the node know about the public addresses of stable nodes that are known to be on the `A` chain.
- Bundle the recovery script into the `witnet/witnet-rust:1.2.1` and `witnet/witnet-rust:latest` docker images, and modify `migrations.sh` so the recovery script is run upon starting or restarting a container.
Nodes that are not on the `A` chain SHOULD upgrade to `1.2.1` as soon as possible.
Nodes that are already on the `A` chain SHOULD NOT upgrade to `1.2.1`: upgrading would make no difference to them, and it would reduce the set of available connections to peers that are already on the leading chain.
Node operators can check whether a local node is on the `A` chain by running this one-liner:
docker exec witnet_node sh -c "witnet node blockchain --epoch 368699 --limit 1 2>&1 | grep -q '#368699 had digest 3b0b03df' && echo 'Your node seems to be OK' || echo 'Oh no! Your node is forked'"
This is the equivalent for nodes not using docker:
./witnet node blockchain --epoch 368699 --limit 1 2>&1 | grep -q '#368699 had digest 3b0b03df' && echo 'Your node seems to be OK' || echo 'Oh no! Your node is forked'
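The check in the one-liners above can also be split into small reusable functions. What follows is a minimal sketch and not the actual recovery script: the function and variable names are hypothetical, and it only prints a recommendation instead of running any rewind. The epoch, digest prefix, checkpoint and version number are the ones quoted in this document.

```shell
#!/bin/sh
# Sketch only: all function and variable names below are hypothetical.
# The values are the ones quoted in this document.
CHECK_EPOCH=368699            # epoch inspected by the one-liners
CHECK_DIGEST_PREFIX=3b0b03df  # digest prefix expected on the A chain
COMMON_CHECKPOINT=364269      # last block common to all forks
RECOVERY_VERSION=1.2.1        # release that bundles the recovery script

# Reads the output of `witnet node blockchain` on stdin.
# Succeeds only if the expected "#<epoch> had digest <prefix>" text is found.
matches_chain_a() {
    grep -q "#${CHECK_EPOCH} had digest ${CHECK_DIGEST_PREFIX}"
}

# Prints the same verdicts as the one-liners, based on stdin.
report_chain_status() {
    if matches_chain_a; then
        echo 'Your node seems to be OK'
    else
        echo 'Oh no! Your node is forked'
    fi
}

# Dry run: prints the recommended action without touching the node.
plan_recovery() {
    if matches_chain_a; then
        echo "on chain A: no upgrade needed"
    else
        echo "forked: upgrade to ${RECOVERY_VERSION} and rewind to ${COMMON_CHECKPOINT}"
    fi
}
```

Assuming the functions are loaded in the current shell, they take the same input as the one-liners, for example `./witnet node blockchain --epoch 368699 --limit 1 2>&1 | plan_recovery`.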
The source code for the recovery script can be found in the witnet-rust repository.
Most medium-term actions mandated by WIP-0010 have been addressed by WIP-0009, WIP-0011, and WIP-0012. All these protocol improvements are set to be activated on 28 April 2021 at 9am UTC, so no further action is needed at this time.