Skip to content

Commit

Permalink
[Consensus] refine logging & small timeout changes (#18795)
Browse files Browse the repository at this point in the history
## Description 

* Increase timeout when waiting for consensus to start from `30s ->
60s`. Came across cases where commit recovery was taking a bit longer
and might make sense to have a bit more tolerance.
* Add more logs in consensus components and switch them to `info` - it's
ok since those should be printed once per epoch
* Make the consensus protocol choice log a `debug` . The method
`get_consensus_protocol_in_epoch` is now being used from
`get_max_accumulated_txn_cost_per_object_in_commit` which is called on
every sequenced transaction leading to a very spammy log output:

<img width="1459" alt="Screenshot 2024-07-25 at 11 37 09"
src="https://github.com/user-attachments/assets/4971ae31-c033-46f6-bd16-bb6c4052640d">



## Test plan 

How did you test the new or updated feature?

---

## Release notes

Check each box that your changes affect. If none of the boxes relate to
your changes, release notes aren't required.

For each box you select, include information after the relevant heading
that describes the impact of your changes that a user might notice and
any actions they must take to implement updates.

- [ ] Protocol: 
- [ ] Nodes (Validators and Full nodes): 
- [ ] Indexer: 
- [ ] JSON-RPC: 
- [ ] GraphQL: 
- [ ] CLI: 
- [ ] Rust SDK:
- [ ] REST API:
  • Loading branch information
akichidis authored Jul 25, 2024
1 parent a4812b9 commit d1efa1c
Show file tree
Hide file tree
Showing 5 changed files with 13 additions and 8 deletions.
5 changes: 3 additions & 2 deletions consensus/core/src/commit_observer.rs
Original file line number Diff line number Diff line change
Expand Up @@ -121,7 +121,7 @@ impl CommitObserver {
.scan_commits(((last_processed_commit_index + 1)..=CommitIndex::MAX).into())
.expect("Scanning commits should not fail");

debug!("Recovering commit observer after index {last_processed_commit_index} with last commit {:?} and {} unsent commits", last_commit, unsent_commits.len());
info!("Recovering commit observer after index {last_processed_commit_index} with last commit {} and {} unsent commits", last_commit.map(|c|c.index()).unwrap_or_default(), unsent_commits.len());

// Resend all the committed subdags to the consensus output channel
// for all the commits above the last processed index.
Expand All @@ -144,6 +144,7 @@ impl CommitObserver {
vec![]
};

info!("Sending commit {} during recovery", commit.index());
let committed_sub_dag =
load_committed_subdag_from_store(self.store.as_ref(), commit, reputation_scores);
self.sender.send(committed_sub_dag).unwrap_or_else(|e| {
Expand All @@ -156,7 +157,7 @@ impl CommitObserver {
last_sent_commit_index += 1;
}

debug!(
info!(
"Commit observer recovery completed, took {:?}",
now.elapsed()
);
Expand Down
2 changes: 1 addition & 1 deletion consensus/core/src/core.rs
Original file line number Diff line number Diff line change
Expand Up @@ -204,7 +204,7 @@ impl Core {
.unwrap();
}

debug!(
info!(
"Core recovery completed with last proposed block {:?}",
self.last_proposed_block
);
Expand Down
8 changes: 6 additions & 2 deletions consensus/core/src/network/tonic_network.rs
Original file line number Diff line number Diff line change
Expand Up @@ -659,7 +659,7 @@ impl<S: NetworkService> NetworkManager<S> for TonicManager {
.with_label_values(&["tonic"])
.set(1);

debug!("Starting tonic service");
info!("Starting tonic service");

let authority = self.context.committee.authority(self.context.own_index);
// Bind to localhost in unit tests since only local networking is needed.
Expand Down Expand Up @@ -742,10 +742,14 @@ impl<S: NetworkService> NetworkManager<S> for TonicManager {
};
}

info!("Binding tonic server to address {:?}", own_address);

// Create TcpListener via TCP socket.
let socket = create_socket(&own_address);
match socket.bind(own_address) {
Ok(_) => {}
Ok(_) => {
info!("Successfully bound tonic server to address {:?}", own_address)
}
Err(e) => {
warn!("Error binding to {own_address}: {e:?}");
tokio::time::sleep(Duration::from_secs(1)).await;
Expand Down
4 changes: 2 additions & 2 deletions crates/sui-core/src/consensus_manager/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ use sui_types::error::SuiResult;
use sui_types::messages_consensus::ConsensusTransaction;
use tokio::sync::{Mutex, MutexGuard};
use tokio::time::{sleep, timeout};
use tracing::info;
use tracing::{debug, info};

pub mod mysticeti_manager;
pub mod narwhal_manager;
Expand Down Expand Up @@ -170,7 +170,7 @@ impl ConsensusManager {
return protocol;
}
_ => {
info!("Invalid consensus choice {} in env var. Continue to pick consensus with protocol config", consensus_choice);
debug!("Invalid consensus choice {} in env var. Continue to pick consensus with protocol config", consensus_choice);
}
};
}
Expand Down
2 changes: 1 addition & 1 deletion crates/sui-core/src/mysticeti_adapter.rs
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@ impl LazyMysticetiClient {

// We expect this to get called during the SUI process start. After that at least one
// object will have initialised and won't need to call again.
const MYSTICETI_START_TIMEOUT: Duration = Duration::from_secs(30);
const MYSTICETI_START_TIMEOUT: Duration = Duration::from_secs(60);
const LOAD_RETRY_TIMEOUT: Duration = Duration::from_millis(100);
if let Ok(client) = timeout(MYSTICETI_START_TIMEOUT, async {
loop {
Expand Down

0 comments on commit d1efa1c

Please sign in to comment.