[consensus] rename Pacemaker to RoundState
To better represent the responsibility of the struct, rename Pacemaker
to RoundState which contains information about a specific round (round,
pending votes, local vote, round deadline etc).

Closes: aptos-labs#4103
Approved by: dmitri-perelman
zekun000 authored and bors-libra committed May 28, 2020
1 parent 6560105 commit ffe7fad
Showing 20 changed files with 100 additions and 101 deletions.
4 changes: 2 additions & 2 deletions config/src/config/consensus_config.rs
@@ -11,7 +11,7 @@ pub struct ConsensusConfig {
pub max_block_size: u64,
pub contiguous_rounds: u32,
pub max_pruned_blocks_in_mem: usize,
-pub pacemaker_initial_timeout_ms: u64,
+pub round_initial_timeout_ms: u64,
pub proposer_type: ConsensusProposerType,
pub safety_rules: SafetyRulesConfig,
}
@@ -26,7 +26,7 @@ impl Default for ConsensusConfig {
}),
contiguous_rounds: 2,
max_pruned_blocks_in_mem: 10000,
-pacemaker_initial_timeout_ms: 1000,
+round_initial_timeout_ms: 1000,
safety_rules: SafetyRulesConfig::default(),
}
}
@@ -14,7 +14,7 @@ role = "validator"
[consensus]
max_block_size = 1000
max_pruned_blocks_in_mem = 10000
-pacemaker_initial_timeout_ms = 1000
+round_initial_timeout_ms = 1000
contiguous_rounds = 2

[consensus.proposer_type]
2 changes: 1 addition & 1 deletion config/src/config/test_data/single.node.config.toml
@@ -50,7 +50,7 @@ seed_peers_file = ""
[consensus]
max_block_size = 1000
max_pruned_blocks_in_mem = 10000
-pacemaker_initial_timeout_ms = 1000
+round_initial_timeout_ms = 1000
contiguous_rounds = 2

[consensus.proposer_type]
8 changes: 4 additions & 4 deletions consensus/README.md
@@ -29,12 +29,12 @@ A block is committed when a contiguous 3-chain commit rule is met. A block at ro

We evaluated several BFT-based protocols against the dimensions of performance, reliability, security, ease of robust implementation, and operational overhead for validators. Our goal was to choose a protocol that would initially support at least 100 validators and would be able to evolve over time to support 500–1,000 validators. We had three reasons for selecting the HotStuff protocol as the basis for LibraBFT: (i) simplicity and modularity; (ii) ability to easily integrate consensus with execution; and (iii) promising performance in early experiments.

-The HotStuff protocol decomposes into modules for safety (voting and commit rules) and liveness (pacemaker). This decoupling provides the ability to develop and experiment independently and on different modules in parallel. Due to the simple voting and commit rules, protocol safety is easy to implement and verify. It is straightforward to integrate execution as a part of consensus to avoid forking issues that arise from non-deterministic execution in a leader-based protocol. Finally, our early prototypes confirmed high throughput and low transaction latency as independently measured in [HotStuff]((https://arxiv.org/pdf/1803.05069.pdf)). We did not consider proof-of-work based protocols, such as [Bitcoin](https://bitcoin.org/bitcoin.pdf), due to their poor performance
+The HotStuff protocol decomposes into modules for safety (voting and commit rules) and liveness (round_state). This decoupling provides the ability to develop and experiment independently and on different modules in parallel. Due to the simple voting and commit rules, protocol safety is easy to implement and verify. It is straightforward to integrate execution as a part of consensus to avoid forking issues that arise from non-deterministic execution in a leader-based protocol. Finally, our early prototypes confirmed high throughput and low transaction latency as independently measured in [HotStuff]((https://arxiv.org/pdf/1803.05069.pdf)). We did not consider proof-of-work based protocols, such as [Bitcoin](https://bitcoin.org/bitcoin.pdf), due to their poor performance
and high energy (and environmental) costs.

### HotStuff Extensions and Modifications

-In LibraBFT, to better support the goals of the Libra ecosystem, we extend and adapt the core HotStuff protocol and implementation in several ways. Importantly, we reformulate the safety conditions and provide extended proofs of safety, liveness, and optimistic responsiveness. We also implement a number of additional features. First, we make the protocol more resistant to non-determinism bugs, by having validators collectively sign the resulting state of a block rather than just the sequence of transactions. This also allows clients to use quorum certificates to authenticate reads from the database. Second, we design a pacemaker that emits explicit timeouts, and validators rely on a quorum of those to move to the next round — without requiring synchronized clocks. Third, we intend to design an unpredictable leader election mechanism in which the leader of a round is determined by the proposer of the latest committed block using a verifiable random function [VRF](https://people.csail.mit.edu/silvio/Selected%20Scientific%20Papers/Pseudo%20Randomness/Verifiable_Random_Functions.pdf). This mechanism limits the window of time in which an adversary can launch an effective denial-of-service attack against a leader. Fourth, we use aggregate signatures that preserve the identity of validators who sign quorum certificates. This allows us to provide incentives to validators that contribute to quorum certificates. Aggregate signatures also do not require a complex [threshold key setup](https://www.cypherpunks.ca/~iang/pubs/DKG.pdf).
+In LibraBFT, to better support the goals of the Libra ecosystem, we extend and adapt the core HotStuff protocol and implementation in several ways. Importantly, we reformulate the safety conditions and provide extended proofs of safety, liveness, and optimistic responsiveness. We also implement a number of additional features. First, we make the protocol more resistant to non-determinism bugs, by having validators collectively sign the resulting state of a block rather than just the sequence of transactions. This also allows clients to use quorum certificates to authenticate reads from the database. Second, we design a round_state that emits explicit timeouts, and validators rely on a quorum of those to move to the next round — without requiring synchronized clocks. Third, we intend to design an unpredictable leader election mechanism in which the leader of a round is determined by the proposer of the latest committed block using a verifiable random function [VRF](https://people.csail.mit.edu/silvio/Selected%20Scientific%20Papers/Pseudo%20Randomness/Verifiable_Random_Functions.pdf). This mechanism limits the window of time in which an adversary can launch an effective denial-of-service attack against a leader. Fourth, we use aggregate signatures that preserve the identity of validators who sign quorum certificates. This allows us to provide incentives to validators that contribute to quorum certificates. Aggregate signatures also do not require a complex [threshold key setup](https://www.cypherpunks.ca/~iang/pubs/DKG.pdf).

## Implementation Details

@@ -44,7 +44,7 @@ The consensus component is mostly implemented in the [Actor](https://en.wikipedi
* **StateComputer** is the interface for accessing the execution component. It can execute blocks, commit blocks, and can synchronize state.
* **BlockStore** maintains the tree of proposal blocks, block execution, votes, quorum certificates, and persistent storage. It is responsible for maintaining the consistency of the combination of these data structures and can be concurrently accessed by other subcomponents.
* **RoundManager** is responsible for processing the individual events (e.g., process_new_round, process_proposal, process_vote). It exposes the async processing functions for each event type and drives the protocol.
-* **Pacemaker** is responsible for the liveness of the consensus protocol. It changes rounds due to timeout certificates or quorum certificates and proposes blocks when it is the proposer for the current round.
+* **RoundState** is responsible for the liveness of the consensus protocol. It changes rounds due to timeout certificates or quorum certificates and proposes blocks when it is the proposer for the current round.
* **SafetyRules** is responsible for the safety of the consensus protocol. It processes quorum certificates and LedgerInfo to learn about new commits and guarantees that the two voting rules are followed — even in the case of restart (since all safety data is persisted to local storage).

All consensus messages are signed by their creators and verified by their receivers. Message verification occurs closest to the network layer to avoid invalid or unnecessary data from entering the consensus protocol.
@@ -55,7 +55,7 @@ All consensus messages are signed by their creators and verified by their receiv
├── src
│   ├── block_storage # In-memory storage of blocks and related data structures
│   ├── consensusdb # Database interaction to persist consensus data for safety and liveness
-│   ├── liveness # Pacemaker, proposer, and other liveness related code
+│   ├── liveness # RoundState, proposer, and other liveness related code
│   └── test_utils # Mock implementations that are used for testing only
└── consensus-types # Consensus data types (i.e. quorum certificates)
└── safety-rules # Safety (voting) rules
3 changes: 1 addition & 2 deletions consensus/src/consensus_provider.rs
@@ -48,8 +48,7 @@ pub fn start_consensus(
));
let time_service = Arc::new(ClockTimeService::new(runtime.handle().clone()));

-let (timeout_sender, timeout_receiver) =
-channel::new(1_024, &counters::PENDING_PACEMAKER_TIMEOUTS);
+let (timeout_sender, timeout_receiver) = channel::new(1_024, &counters::PENDING_ROUND_TIMEOUTS);
let (self_sender, self_receiver) = channel::new(1_024, &counters::PENDING_SELF_MESSAGES);

let epoch_mgr = EpochManager::new(
14 changes: 7 additions & 7 deletions consensus/src/counters.rs
@@ -57,11 +57,11 @@ pub static PREFERRED_BLOCK_ROUND: Lazy<IntGauge> = Lazy::new(|| {
.unwrap()
});

-/// This counter is set to the last round reported by the local pacemaker.
+/// This counter is set to the last round reported by the local round_state.
pub static CURRENT_ROUND: Lazy<IntGauge> = Lazy::new(|| {
register_int_gauge!(
"libra_consensus_current_round",
-"This counter is set to the last round reported by the local pacemaker."
+"This counter is set to the last round reported by the local round_state."
)
.unwrap()
});
@@ -153,7 +153,7 @@ pub static VOTE_NIL_COUNT: Lazy<IntCounter> = Lazy::new(|| {
});

//////////////////////
-// PACEMAKER COUNTERS
+// RoundState COUNTERS
//////////////////////
/// Count of the rounds that gathered QC since last restart.
pub static QC_ROUNDS_COUNT: Lazy<IntCounter> = Lazy::new(|| {
@@ -418,11 +418,11 @@ pub static PENDING_SELF_MESSAGES: Lazy<IntGauge> = Lazy::new(|| {
.unwrap()
});

-/// Count of the pending outbound pacemaker timeouts
-pub static PENDING_PACEMAKER_TIMEOUTS: Lazy<IntGauge> = Lazy::new(|| {
+/// Count of the pending outbound round timeouts
+pub static PENDING_ROUND_TIMEOUTS: Lazy<IntGauge> = Lazy::new(|| {
register_int_gauge!(
-"libra_consensus_pending_pacemaker_timeouts",
-"Count of the pending outbound pacemaker timeouts"
+"libra_consensus_pending_round_timeouts",
+"Count of the pending outbound round timeouts"
)
.unwrap()
});
22 changes: 11 additions & 11 deletions consensus/src/epoch_manager.rs
@@ -7,10 +7,10 @@ use crate::{
liveness::{
leader_reputation::{ActiveInactiveHeuristic, LeaderReputation, LibraDBBackend},
multi_proposer_election::MultiProposer,
-pacemaker::{ExponentialTimeInterval, Pacemaker},
proposal_generator::ProposalGenerator,
proposer_election::ProposerElection,
rotating_proposer_election::{choose_leader, RotatingProposer},
+round_state::{ExponentialTimeInterval, RoundState},
},
network::{IncomingBlockRetrievalRequest, NetworkReceivers, NetworkSender},
network_interface::{ConsensusMsg, ConsensusNetworkSender},
@@ -126,19 +126,19 @@ impl<T: Payload> EpochManager<T> {
self.epoch_info().epoch
}

-fn create_pacemaker(
+fn create_round_state(
&self,
time_service: Arc<dyn TimeService>,
timeout_sender: channel::Sender<Round>,
-) -> Pacemaker {
+) -> RoundState {
// 1.5^6 ~= 11
// Timeout goes from initial_timeout to initial_timeout*11 in 6 steps
let time_interval = Box::new(ExponentialTimeInterval::new(
-Duration::from_millis(self.config.pacemaker_initial_timeout_ms),
+Duration::from_millis(self.config.round_initial_timeout_ms),
1.5,
6,
));
-Pacemaker::new(time_interval, time_service, timeout_sender)
+RoundState::new(time_interval, time_service, timeout_sender)
}

/// Create a proposer election handler based on proposers
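Reviewer note on the timeout schedule configured in `create_round_state` above: the comment "1.5^6 ~= 11" can be checked with a small standalone sketch. `ExponentialInterval`, its field names, and the rounding-up step are assumptions made for illustration; only the capped exponent and the `powf` multiplier are taken from the `get_round_duration` lines shown in this diff.

```rust
use std::time::Duration;

/// Hypothetical, simplified stand-in for `ExponentialTimeInterval`.
struct ExponentialInterval {
    base_ms: u64,
    exponent_base: f64,
    max_exponent: usize,
}

impl ExponentialInterval {
    /// Duration of a round that is `rounds_after_commit` rounds past the last commit.
    fn round_duration(&self, rounds_after_commit: usize) -> Duration {
        // Cap the exponent so the timeout stops growing after `max_exponent` steps.
        let pow = rounds_after_commit.min(self.max_exponent) as u32;
        let multiplier = self.exponent_base.powf(f64::from(pow));
        // Rounding up to whole milliseconds is an assumption of this sketch.
        Duration::from_millis((self.base_ms as f64 * multiplier).ceil() as u64)
    }
}

fn main() {
    // Same parameters as the diff: 1000 ms initial timeout, base 1.5, 6 steps.
    let interval = ExponentialInterval {
        base_ms: 1000,
        exponent_base: 1.5,
        max_exponent: 6,
    };
    for index in 0..=7 {
        println!("round index {} -> {:?}", index, interval.round_duration(index));
    }
}
```

With these parameters the timeout starts at one second, grows by a factor of 1.5 per uncommitted round, and stays at roughly 11.4 seconds from index 6 onward.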
@@ -306,9 +306,9 @@ impl<T: Payload> EpochManager<T> {
self.config.max_block_size,
);

-info!("Create Pacemaker");
-let pacemaker =
-self.create_pacemaker(self.time_service.clone(), self.timeout_sender.clone());
+info!("Create RoundState");
+let round_state =
+self.create_round_state(self.time_service.clone(), self.timeout_sender.clone());

info!("Create ProposerElection");
let proposer_election = self.create_proposer_election(&epoch_info);
@@ -322,7 +322,7 @@ impl<T: Payload> EpochManager<T> {
let mut processor = RoundManager::new(
epoch_info,
block_store,
-pacemaker,
+round_state,
proposer_election,
proposal_generator,
safety_rules,
@@ -485,7 +485,7 @@ impl<T: Payload> EpochManager<T> {

pub async fn start(
mut self,
-mut pacemaker_timeout_sender_rx: channel::Receiver<Round>,
+mut round_timeout_sender_rx: channel::Receiver<Round>,
mut network_receivers: NetworkReceivers<T>,
mut reconfig_events: libra_channel::Receiver<(), OnChainConfigPayload>,
) {
@@ -509,7 +509,7 @@ impl<T: Payload> EpochManager<T> {
idle_duration = pre_select_instant.elapsed();
self.process_block_retrieval(block_retrieval).await
}
-round = pacemaker_timeout_sender_rx.select_next_some() => {
+round = round_timeout_sender_rx.select_next_some() => {
idle_duration = pre_select_instant.elapsed();
self.process_local_timeout(round).await
}
6 changes: 3 additions & 3 deletions consensus/src/liveness/mod.rs
@@ -3,16 +3,16 @@

pub(crate) mod leader_reputation;
pub(crate) mod multi_proposer_election;
-pub(crate) mod pacemaker;
pub(crate) mod proposal_generator;
pub(crate) mod proposer_election;
pub(crate) mod rotating_proposer_election;
+pub(crate) mod round_state;

#[cfg(test)]
mod leader_reputation_test;
#[cfg(test)]
mod multi_proposer_test;
-#[cfg(test)]
-mod pacemaker_test;
#[cfg(test)]
mod rotating_proposer_test;
+#[cfg(test)]
+mod round_state_test;
2 changes: 1 addition & 1 deletion consensus/src/liveness/proposal_generator.rs
@@ -28,7 +28,7 @@ mod proposal_generator_test;
/// used by a validator that believes it's a valid candidate for serving as a proposer at a given
/// round.
/// ProposalGenerator is the one choosing the branch to extend:
-/// - round is given by the caller (typically determined by Pacemaker).
+/// - round is given by the caller (typically determined by RoundState).
/// The transactions for the proposed block are delivered by TxnManager.
///
/// TxnManager should be aware of the pending transactions in the branch that it is extending,
@@ -31,7 +31,7 @@ impl fmt::Display for NewRoundReason {
}
}

-/// NewRoundEvents produced by Pacemaker are guaranteed to be monotonically increasing.
+/// NewRoundEvents produced by RoundState are guaranteed to be monotonically increasing.
/// NewRoundEvents are consumed by the rest of the system: they can cause sending new proposals
/// or voting for some proposals that wouldn't have been voted otherwise.
/// The duration is populated for debugging and testing
@@ -54,7 +54,7 @@ impl fmt::Display for NewRoundEvent {

/// Determines the maximum round duration based on the round difference between the current
/// round and the committed round
-pub trait PacemakerTimeInterval: Send + Sync + 'static {
+pub trait RoundTimeInterval: Send + Sync + 'static {
/// Use the index of the round after the highest quorum certificate to commit a block and
/// return the duration for this round
///
@@ -94,7 +94,7 @@ impl ExponentialTimeInterval {
pub fn new(base: Duration, exponent_base: f64, max_exponent: usize) -> Self {
assert!(
max_exponent < 32,
-"max_exponent for PacemakerTimeInterval should be <32"
+"max_exponent for RoundTimeInterval should be <32"
);
assert!(
exponent_base.powf(max_exponent as f64).ceil() < f64::from(std::u32::MAX),
@@ -108,7 +108,7 @@ impl ExponentialTimeInterval {
}
}

-impl PacemakerTimeInterval for ExponentialTimeInterval {
+impl RoundTimeInterval for ExponentialTimeInterval {
fn get_round_duration(&self, round_index_after_committed_qc: usize) -> Duration {
let pow = round_index_after_committed_qc.min(self.max_exponent) as u32;
let base_multiplier = self.exponent_base.powf(f64::from(pow));
@@ -117,27 +117,27 @@ impl PacemakerTimeInterval for ExponentialTimeInterval {
}
}

-/// `Pacemaker` is a Pacemaker implementation that is responsible for generating the new round
-/// and local timeout events.
+/// `RoundState` contains information about a specific round and moves forward when
+/// it receives new certificates.
///
/// A round `r` starts in the following cases:
/// * there is a QuorumCert for round `r-1`,
/// * there is a TimeoutCertificate for round `r-1`.
///
-/// Round interval calculation is the responsibility of the PacemakerTimeoutInterval trait. It
+/// Round interval calculation is the responsibility of the RoundTimeInterval trait. It
/// depends on the delta between the current round and the highest committed round (the intuition is
/// that we want to exponentially grow the interval the further the current round is from the last
/// committed round).
///
/// Whenever a new round starts a local timeout is set following the round interval. This local
/// timeout is going to send the timeout events once in interval until the new round starts.
-pub struct Pacemaker {
+pub struct RoundState {
// Determines the time interval for a round given the number of non-committed rounds since
// last commit.
-time_interval: Box<dyn PacemakerTimeInterval>,
+time_interval: Box<dyn RoundTimeInterval>,
// Highest known committed round as reported by the caller. The caller might choose not to
-// inform the Pacemaker about certain committed rounds (e.g., NIL blocks): in this case the
-// committed round in Pacemaker might lag behind the committed round of a block tree.
+// inform the RoundState about certain committed rounds (e.g., NIL blocks): in this case the
+// committed round in RoundState might lag behind the committed round of a block tree.
highest_committed_round: Round,
// Current round is max{highest_qc, highest_tc} + 1.
current_round: Round,
@@ -155,9 +155,9 @@ pub struct Pacemaker {
}

#[allow(dead_code)]
-impl Pacemaker {
+impl RoundState {
pub fn new(
-time_interval: Box<dyn PacemakerTimeInterval>,
+time_interval: Box<dyn RoundTimeInterval>,
time_service: Arc<dyn TimeService>,
timeout_sender: channel::Sender<Round>,
) -> Self {
@@ -201,7 +201,7 @@ impl Pacemaker {
true
}

-/// Notify the Pacemaker about the potentially new QC, TC, and highest committed round.
+/// Notify the RoundState about the potentially new QC, TC, and highest committed round.
/// Note that some of these values might not be available by the caller.
pub fn process_certificates(&mut self, sync_info: SyncInfo) -> Option<NewRoundEvent> {
if sync_info.highest_commit_round() > self.highest_committed_round {
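Reviewer note: the rest of `process_certificates` is not shown in this diff. The rule stated in the struct comments (current round is max{highest_qc, highest_tc} + 1, with monotonically increasing new-round events) can be sketched as follows. `RoundTracker`, the `NewRoundReason` variants, and the plain-integer arguments are hypothetical stand-ins for `RoundState`, the real enum, and `SyncInfo`; this is not the body of the actual function.

```rust
type Round = u64;

/// Hypothetical reasons for entering a new round.
#[derive(Debug, PartialEq)]
enum NewRoundReason {
    QcReady,
    Timeout,
}

/// Hypothetical, simplified stand-in for `RoundState`.
struct RoundTracker {
    highest_committed_round: Round,
    current_round: Round,
}

impl RoundTracker {
    fn new() -> Self {
        RoundTracker { highest_committed_round: 0, current_round: 0 }
    }

    /// Returns the new round and the reason for entering it, or `None` when the
    /// certificates do not advance the round (so events only ever increase).
    fn process_certificates(
        &mut self,
        highest_qc_round: Round,
        highest_tc_round: Option<Round>,
        highest_commit_round: Round,
    ) -> Option<(Round, NewRoundReason)> {
        if highest_commit_round > self.highest_committed_round {
            self.highest_committed_round = highest_commit_round;
        }
        // A missing timeout certificate is treated as round 0 in this sketch.
        let tc_round = highest_tc_round.unwrap_or(0);
        let new_round = highest_qc_round.max(tc_round) + 1;
        if new_round <= self.current_round {
            return None;
        }
        self.current_round = new_round;
        let reason = if highest_qc_round >= tc_round {
            NewRoundReason::QcReady
        } else {
            NewRoundReason::Timeout
        };
        Some((new_round, reason))
    }
}

fn main() {
    let mut tracker = RoundTracker::new();
    println!("{:?}", tracker.process_certificates(3, None, 1));
    println!("{:?}", tracker.process_certificates(3, None, 1));
    println!("{:?}", tracker.process_certificates(3, Some(5), 1));
    println!("committed round: {}", tracker.highest_committed_round);
}
```

Returning `None` for stale certificates is what keeps the emitted rounds strictly increasing in this sketch.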