[epoch] Do not start Narwhal if epoch mismatch (MystenLabs#7452)
We currently read the Narwhal committee from the system state object during
startup.
This can be wrong in a crash-recovery state: we have executed the last
transaction of the epoch (so the on-chain system state object is already at
the next epoch), but we haven't reconfigured Sui yet.
Narwhal and Sui would then be running at different epochs, which is fatal.
A proper fix is likely to put the Narwhal committee in the epoch store, just
like the Sui committee, but that requires a larger refactoring and will take
some time.
This PR is a quick fix: we simply don't start Narwhal if the two epochs don't
match, and we rely on the reconfig process to start it properly.
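
The guard described above can be sketched in isolation as follows. This is a minimal illustration, not the code in `crates/sui-node`: `NarwhalCommittee`, `EpochStore`, `StartupDecision` and `decide_narwhal_startup` are hypothetical stand-ins invented for the example, and only the comparison `committee.epoch == epoch_store.epoch()` is taken from the change itself.

```rust
// Hypothetical stand-in for the Narwhal committee read from the on-chain
// system state object. Only the `epoch` field matters for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NarwhalCommittee {
    pub epoch: u64,
}

// Hypothetical stand-in for Sui's per-epoch store, which reflects the epoch
// the node has actually been configured for.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EpochStore {
    epoch: u64,
}

impl EpochStore {
    pub fn new(epoch: u64) -> Self {
        Self { epoch }
    }

    pub fn epoch(&self) -> u64 {
        self.epoch
    }
}

// Outcome of the startup check (an assumption of this sketch; the real code
// either calls `narwhal_manager.start(..)` or logs a warning).
#[derive(Debug, PartialEq, Eq)]
pub enum StartupDecision {
    // Both sides agree on the epoch, so Narwhal can be started now.
    StartNarwhal,
    // The system state is ahead of (or otherwise differs from) the node's
    // epoch; leave Narwhal stopped and let reconfiguration start it.
    DeferToReconfig {
        sui_epoch: u64,
        system_state_epoch: u64,
    },
}

// Start Narwhal only when the committee epoch matches the epoch store.
pub fn decide_narwhal_startup(
    committee: &NarwhalCommittee,
    epoch_store: &EpochStore,
) -> StartupDecision {
    if committee.epoch == epoch_store.epoch() {
        StartupDecision::StartNarwhal
    } else {
        StartupDecision::DeferToReconfig {
            sui_epoch: epoch_store.epoch(),
            system_state_epoch: committee.epoch,
        }
    }
}
```

In the crash-recovery case from the message, the system state would report epoch N+1 while the epoch store is still at N, so the sketch returns `DeferToReconfig` rather than starting Narwhal with a committee the rest of the node is not yet using.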
lxfind authored Jan 17, 2023
1 parent 69413f9 commit 9a8f1bb
Showing 1 changed file with 15 additions and 9 deletions.
crates/sui-node/src/lib.rs
@@ -52,7 +52,7 @@ use sui_types::messages::QuorumDriverResponse;
 use tokio::sync::{watch, Mutex};
 use tokio::task::JoinHandle;
 use tower::ServiceBuilder;
-use tracing::info;
+use tracing::{info, warn};
 use typed_store::DBMetrics;
 pub mod admin;
 mod handle;
@@ -499,7 +499,7 @@ impl SuiNode {
         );

         let consensus_handler = Arc::new(ConsensusHandler::new(
-            epoch_store,
+            epoch_store.clone(),
             checkpoint_service.clone(),
             state.transaction_manager().clone(),
             state.db(),
@@ -521,13 +521,19 @@ impl SuiNode {
             .address;
         let worker_cache = system_state.get_current_epoch_narwhal_worker_cache(transactions_addr);

-        narwhal_manager
-            .start(
-                committee.clone(),
-                SharedWorkerCache::from(worker_cache),
-                consensus_handler,
-            )
-            .await;
+        if committee.epoch == epoch_store.epoch() {
+            narwhal_manager
+                .start(
+                    committee.clone(),
+                    SharedWorkerCache::from(worker_cache),
+                    consensus_handler,
+                )
+                .await;
+        } else {
+            warn!(
+                "Current Sui epoch doesn't match the system state epoch. Not starting Narwhal yet"
+            );
+        }

         Ok(ValidatorComponents {
             validator_server_handle,