Bulk fetch of batches in executor (MystenLabs#9624)
## Description 

TL;DR: This PR introduces bulk fetching of all batches required to
execute a committed subdag, which improves catchup speed by about 2.5x.
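
Here is a rough illustration of what bulk fetching for a committed subdag means. The sketch collects every batch digest referenced by the subdag's certificates, deduplicates them, and issues one request instead of one per batch. It then reassembles each certificate's payload in order. This is a minimal std-only sketch, not the executor's code. `Certificate`, `Digest`, `Batch`, `fetch_subdag_batches` and the `fetch_many` callback are hypothetical stand-ins, and the sketch assumes the fetcher returns every requested digest.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for a batch digest and a batch payload.
type Digest = u64;
type Batch = Vec<u8>;

/// Hypothetical certificate: only the digests of the batches it references.
struct Certificate {
    batch_digests: Vec<Digest>,
}

/// Fetches the payloads of every certificate in a committed subdag with a
/// single bulk request. `fetch_many` is an assumed bulk-fetch callback and
/// must return every requested digest (a missing one panics here).
fn fetch_subdag_batches<F>(certs: &[Certificate], fetch_many: F) -> Vec<Vec<Batch>>
where
    F: FnOnce(&[Digest]) -> HashMap<Digest, Batch>,
{
    // Deduplicate digests across the whole subdag, keeping first-seen order.
    let mut seen = HashSet::new();
    let digests: Vec<Digest> = certs
        .iter()
        .flat_map(|c| c.batch_digests.iter().copied())
        .filter(|d| seen.insert(*d))
        .collect();

    // One round trip for the whole subdag instead of one per batch.
    let fetched = fetch_many(&digests);

    // Reassemble each certificate's payload in its original order.
    certs
        .iter()
        .map(|c| {
            c.batch_digests
                .iter()
                .map(|d| fetched[d].clone())
                .collect::<Vec<Batch>>()
        })
        .collect()
}

fn main() {
    // Two certificates that share batch 2.
    let certs = vec![
        Certificate { batch_digests: vec![1, 2] },
        Certificate { batch_digests: vec![2, 3] },
    ];
    let mut requests = 0;
    let payloads = fetch_subdag_batches(&certs, |digests| {
        requests += 1;
        // Fake remote: the payload of digest d is the single byte d.
        digests.iter().map(|d| (*d, vec![*d as u8])).collect()
    });
    println!("{} request(s), payloads = {:?}", requests, payloads);
}
```

Grouping the request per subdag removes the per-batch round trips. A real implementation would also have to handle digests that no worker can serve; this sketch simply panics on them.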

I held off on submitting this PR because of another PR, still being
tested, that introduces batching for payload fetches in the
synchronizer and unifies the following three batch fetch paths behind a
shared `BatchFetcher` worker component:

1. Synchronize header batch (blocking)
2. Synchronize certificate batches (non-blocking)
3. Synchronize certificate batches & fetch batches for execution
(blocking)

The main goal of the synchronizer is to get the payload into the
worker's local store, so that at fetch time we hit our local worker
rather than a remote worker most of the time. The synchronizer will
feed digests to the `BatchFetcher`, which keeps a queue of missing
digests and can bulk fetch them from the local store or from remote
workers, either in the background or, when they are needed
immediately, in a blocking call. This should significantly reduce the
total number of fetch requests, because digests are fetched in bulk
and deduplicated.

Unfortunately, we have not yet seen major gains from those changes, as
there seem to be other bottlenecks that need to be fixed first. That is
why we are deploying this incremental change first, to speed up catchup
and reduce tail latencies. I will send the follow-up PR once its
measured impact justifies the risk of the refactor it introduces.

## Test Plan 

Unit tests and benchmark cluster runs:

- 8-node geo-distributed cluster
-- **Before batching** ~16-72 certs per second
Example total catchup time: [2 hours of downtime, about 1 hour to
catchup](https://mysten.grafana.net/explore?left=%7B%22datasource%22:%228Xt1pVoVk%22,%22queries%22:%5B%7B%22datasource%22:%7B%22type%22:%22prometheus%22,%22uid%22:%228Xt1pVoVk%22%7D,%22editorMode%22:%22builder%22,%22expr%22:%22sum%20by%28host%29%20%28last_committed_round%7Bhost%3D~%5C%22ams-bnc-val-00%7Cewr-bnc-val-00%5C%22,%20network%3D%5C%22benchmark%5C%22%7D%29%22,%22hide%22:false,%22range%22:true,%22refId%22:%22C%22,%22interval%22:%22%22%7D,%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22prometheus%22,%22uid%22:%228Xt1pVoVk%22%7D,%22editorMode%22:%22builder%22,%22expr%22:%22sum%20by%28host%29%20%28rate%28subscriber_processed_batches%7Bhost%3D~%5C%22ams-bnc-val-00%7Cewr-bnc-val-00%5C%22,%20network%3D%5C%22benchmark%5C%22%7D%5B5m%5D%29%29%22,%22legendFormat%22:%22__auto%22,%22range%22:true,%22instant%22:true,%22hide%22:true%7D%5D,%22range%22:%7B%22from%22:%221678411411288%22,%22to%22:%221678424058450%22%7D%7D&orgId=1&right=%7B%22datasource%22:%228Xt1pVoVk%22,%22queries%22:%5B%7B%22datasource%22:%7B%22type%22:%22prometheus%22,%22uid%22:%228Xt1pVoVk%22%7D,%22editorMode%22:%22builder%22,%22expr%22:%22sum%20by%28host%29%20%28last_committed_round%7Bhost%3D~%5C%22ams-bnc-val-00%7Cewr-bnc-val-00%5C%22,%20network%3D%5C%22benchmark%5C%22%7D%29%22,%22hide%22:true,%22range%22:true,%22refId%22:%22C%22,%22interval%22:%22%22%7D,%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22prometheus%22,%22uid%22:%228Xt1pVoVk%22%7D,%22editorMode%22:%22builder%22,%22expr%22:%22sum%20by%28host%29%20%28rate%28subscriber_processed_batches%7Bhost%3D~%5C%22ams-bnc-val-00%7Cewr-bnc-val-00%5C%22,%20network%3D%5C%22benchmark%5C%22%7D%5B5m%5D%29%29%22,%22legendFormat%22:%22__auto%22,%22range%22:true,%22instant%22:true,%22hide%22:true%7D,%7B%22refId%22:%22B%22,%22datasource%22:%7B%22type%22:%22prometheus%22,%22uid%22:%228Xt1pVoVk%22%7D,%22editorMode%22:%22code%22,%22expr%22:%22sum%20by%20%28host%29%20%28rate%28sequencing_certificate_attempt%7Bhost%3D~%5C%22.%2A%5C%22,%20network%3D%5C%2
2benchmark%5C%22%7D%5B$__rate_interval%5D%29%29%22,%22legendFormat%22:%22__auto%22,%22range%22:true,%22instant%22:true%7D%5D,%22range%22:%7B%22from%22:%221678411411288%22,%22to%22:%221678424058450%22%7D%7D)
Commit round rate: [2-9 rounds/s, with one spike up to 15
r/s](https://mysten.grafana.net/explore?left=%7B%22datasource%22:%228Xt1pVoVk%22,%22queries%22:%5B%7B%22datasource%22:%7B%22type%22:%22prometheus%22,%22uid%22:%228Xt1pVoVk%22%7D,%22editorMode%22:%22builder%22,%22expr%22:%22sum%20by%28host%29%20%28rate%28last_committed_round%7Bhost%3D~%5C%22ams-bnc-val-00%7Cewr-bnc-val-00%5C%22,%20network%3D%5C%22benchmark%5C%22%7D%5B5m%5D%29%29%22,%22hide%22:false,%22range%22:true,%22refId%22:%22C%22,%22interval%22:%22%22%7D,%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22prometheus%22,%22uid%22:%228Xt1pVoVk%22%7D,%22editorMode%22:%22builder%22,%22expr%22:%22sum%20by%28host%29%20%28rate%28subscriber_processed_batches%7Bhost%3D~%5C%22ams-bnc-val-00%7Cewr-bnc-val-00%5C%22,%20network%3D%5C%22benchmark%5C%22%7D%5B5m%5D%29%29%22,%22legendFormat%22:%22__auto%22,%22range%22:true,%22instant%22:true,%22hide%22:true%7D%5D,%22range%22:%7B%22from%22:%221678394911392%22,%22to%22:%221678483239964%22%7D%7D&orgId=1&right=%7B%22datasource%22:%228Xt1pVoVk%22,%22queries%22:%5B%7B%22datasource%22:%7B%22type%22:%22prometheus%22,%22uid%22:%228Xt1pVoVk%22%7D,%22editorMode%22:%22builder%22,%22expr%22:%22sum%20by%28host%29%20%28last_committed_round%7Bhost%3D~%5C%22ams-bnc-val-00%7Cewr-bnc-val-00%5C%22,%20network%3D%5C%22benchmark%5C%22%7D%29%22,%22hide%22:true,%22range%22:true,%22refId%22:%22C%22,%22interval%22:%22%22%7D,%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22prometheus%22,%22uid%22:%228Xt1pVoVk%22%7D,%22editorMode%22:%22builder%22,%22expr%22:%22sum%20by%28host%29%20%28rate%28subscriber_processed_batches%7Bhost%3D~%5C%22ams-bnc-val-00%7Cewr-bnc-val-00%5C%22,%20network%3D%5C%22benchmark%5C%22%7D%5B5m%5D%29%29%22,%22legendFormat%22:%22__auto%22,%22range%22:true,%22instant%22:true,%22hide%22:true%7D,%7B%22refId%22:%22B%22,%22datasource%22:%7B%22type%22:%22prometheus%22,%22uid%22:%228Xt1pVoVk%22%7D,%22editorMode%22:%22code%22,%22expr%22:%22sum%20by%20%28host%29%20%28rate%28sequencing_certificate_attempt%7Bhost%3D~%5C%22.%2A%5C%22,%20n
etwork%3D%5C%22benchmark%5C%22%7D%5B$__rate_interval%5D%29%29%22,%22legendFormat%22:%22__auto%22,%22range%22:true,%22instant%22:true%7D%5D,%22range%22:%7B%22from%22:%221678394911392%22,%22to%22:%221678483239964%22%7D%7D)
-- **After batching** ~96-184 certs per second
Example total catchup time: [7.5 hours of downtime, about 45 minutes to
catchup](https://mysten.grafana.net/explore?left=%7B%22datasource%22:%228Xt1pVoVk%22,%22queries%22:%5B%7B%22datasource%22:%7B%22type%22:%22prometheus%22,%22uid%22:%228Xt1pVoVk%22%7D,%22editorMode%22:%22builder%22,%22expr%22:%22sum%20by%28host%29%20%28last_committed_round%7Bhost%3D~%5C%22ams-bnc-val-00%7Cewr-bnc-val-00%5C%22,%20network%3D%5C%22benchmark%5C%22%7D%29%22,%22hide%22:false,%22range%22:true,%22refId%22:%22C%22,%22interval%22:%22%22%7D,%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22prometheus%22,%22uid%22:%228Xt1pVoVk%22%7D,%22editorMode%22:%22builder%22,%22expr%22:%22sum%20by%28host%29%20%28rate%28subscriber_processed_batches%7Bhost%3D~%5C%22ams-bnc-val-00%7Cewr-bnc-val-00%5C%22,%20network%3D%5C%22benchmark%5C%22%7D%5B5m%5D%29%29%22,%22legendFormat%22:%22__auto%22,%22range%22:true,%22instant%22:true,%22hide%22:true%7D%5D,%22range%22:%7B%22from%22:%221678520418067%22,%22to%22:%221678552191281%22%7D%7D&orgId=1&right=%7B%22datasource%22:%228Xt1pVoVk%22,%22queries%22:%5B%7B%22datasource%22:%7B%22type%22:%22prometheus%22,%22uid%22:%228Xt1pVoVk%22%7D,%22editorMode%22:%22builder%22,%22expr%22:%22sum%20by%28host%29%20%28last_committed_round%7Bhost%3D~%5C%22ams-bnc-val-00%7Cewr-bnc-val-00%5C%22,%20network%3D%5C%22benchmark%5C%22%7D%29%22,%22hide%22:true,%22range%22:true,%22refId%22:%22C%22,%22interval%22:%22%22%7D,%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22prometheus%22,%22uid%22:%228Xt1pVoVk%22%7D,%22editorMode%22:%22builder%22,%22expr%22:%22sum%20by%28host%29%20%28rate%28subscriber_processed_batches%7Bhost%3D~%5C%22ams-bnc-val-00%7Cewr-bnc-val-00%5C%22,%20network%3D%5C%22benchmark%5C%22%7D%5B5m%5D%29%29%22,%22legendFormat%22:%22__auto%22,%22range%22:true,%22instant%22:true,%22hide%22:true%7D,%7B%22refId%22:%22B%22,%22datasource%22:%7B%22type%22:%22prometheus%22,%22uid%22:%228Xt1pVoVk%22%7D,%22editorMode%22:%22code%22,%22expr%22:%22sum%20by%20%28host%29%20%28rate%28sequencing_certificate_attempt%7Bhost%3D~%5C%22.%2A%5C%22,%20network%3D%5C%2
2benchmark%5C%22%7D%5B$__rate_interval%5D%29%29%22,%22legendFormat%22:%22__auto%22,%22range%22:true,%22instant%22:true%7D%5D,%22range%22:%7B%22from%22:%221678520418067%22,%22to%22:%221678552191281%22%7D%7D)
Commit round rate: [12-23 rounds/s, with one spike of about 40
r/s](https://mysten.grafana.net/explore?left=%7B%22datasource%22:%228Xt1pVoVk%22,%22queries%22:%5B%7B%22datasource%22:%7B%22type%22:%22prometheus%22,%22uid%22:%228Xt1pVoVk%22%7D,%22editorMode%22:%22builder%22,%22expr%22:%22sum%20by%28host%29%20%28rate%28last_committed_round%7Bhost%3D~%5C%22ams-bnc-val-00%7Cewr-bnc-val-00%5C%22,%20network%3D%5C%22benchmark%5C%22%7D%5B5m%5D%29%29%22,%22hide%22:false,%22range%22:true,%22refId%22:%22C%22,%22interval%22:%22%22%7D,%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22prometheus%22,%22uid%22:%228Xt1pVoVk%22%7D,%22editorMode%22:%22builder%22,%22expr%22:%22sum%20by%28host%29%20%28rate%28subscriber_processed_batches%7Bhost%3D~%5C%22ams-bnc-val-00%7Cewr-bnc-val-00%5C%22,%20network%3D%5C%22benchmark%5C%22%7D%5B5m%5D%29%29%22,%22legendFormat%22:%22__auto%22,%22range%22:true,%22instant%22:true,%22hide%22:true%7D%5D,%22range%22:%7B%22from%22:%221678520418067%22,%22to%22:%221678552191281%22%7D%7D&orgId=1&right=%7B%22datasource%22:%228Xt1pVoVk%22,%22queries%22:%5B%7B%22datasource%22:%7B%22type%22:%22prometheus%22,%22uid%22:%228Xt1pVoVk%22%7D,%22editorMode%22:%22builder%22,%22expr%22:%22sum%20by%28host%29%20%28last_committed_round%7Bhost%3D~%5C%22ams-bnc-val-00%7Cewr-bnc-val-00%5C%22,%20network%3D%5C%22benchmark%5C%22%7D%29%22,%22hide%22:true,%22range%22:true,%22refId%22:%22C%22,%22interval%22:%22%22%7D,%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22prometheus%22,%22uid%22:%228Xt1pVoVk%22%7D,%22editorMode%22:%22builder%22,%22expr%22:%22sum%20by%28host%29%20%28rate%28subscriber_processed_batches%7Bhost%3D~%5C%22ams-bnc-val-00%7Cewr-bnc-val-00%5C%22,%20network%3D%5C%22benchmark%5C%22%7D%5B5m%5D%29%29%22,%22legendFormat%22:%22__auto%22,%22range%22:true,%22instant%22:true,%22hide%22:true%7D,%7B%22refId%22:%22B%22,%22datasource%22:%7B%22type%22:%22prometheus%22,%22uid%22:%228Xt1pVoVk%22%7D,%22editorMode%22:%22code%22,%22expr%22:%22sum%20by%20%28host%29%20%28rate%28sequencing_certificate_attempt%7Bhost%3D~%5C%22.%2A%5C%22,%20n
etwork%3D%5C%22benchmark%5C%22%7D%5B$__rate_interval%5D%29%29%22,%22legendFormat%22:%22__auto%22,%22range%22:true,%22instant%22:true%7D%5D,%22range%22:%7B%22from%22:%221678520418067%22,%22to%22:%221678552191281%22%7D%7D)

- 100-node geo-distributed cluster
-- [**Before batching** ~200-300 certs per second (2-3 rounds per
second)](https://mysten.grafana.net/d/ORCQSHfVk/subscriber-bulk-fetch-dashboard?var-Environment=mysten-metrics-internal&var-network=benchmark&var-validator=atl-bnc-val-00&var-validator=atl-bnc-val-01&orgId=1&from=1679379442288&to=1679398925376&viewPanel=1)
-- [**After batching** ~300-500 certs per second (3-5 rounds per
second)](https://mysten.grafana.net/d/ORCQSHfVk/subscriber-bulk-fetch-dashboard?var-Environment=mysten-metrics-internal&var-network=benchmark&var-validator=atl-bnc-val-00&var-validator=atl-bnc-val-01&orgId=1&from=1679540599207&to=1679542836527)
arun-koshy authored Mar 24, 2023
1 parent 018aeb7 commit 2a14e83
Showing 13 changed files with 734 additions and 154 deletions.
2 changes: 2 additions & 0 deletions Cargo.lock


1 change: 1 addition & 0 deletions narwhal/executor/Cargo.toml
@@ -21,6 +21,7 @@ thiserror = "1.0.35"
 tokio = { workspace = true, features = ["sync"] }
 tonic = "0.8.2"
 tracing = "0.1.36"
+itertools = "0.10.5"
 prometheus = "0.13.3"
 storage = { path = "../storage", package = "narwhal-storage" }
 
40 changes: 32 additions & 8 deletions narwhal/executor/src/metrics.rs
@@ -1,8 +1,9 @@
 // Copyright (c) Mysten Labs, Inc.
 // SPDX-License-Identifier: Apache-2.0
 use prometheus::{
-default_registry, register_histogram_with_registry, register_int_counter_with_registry,
-register_int_gauge_with_registry, Histogram, IntCounter, IntGauge, Registry,
+default_registry, register_histogram_with_registry, register_int_counter_vec_with_registry,
+register_int_counter_with_registry, register_int_gauge_with_registry, Histogram, IntCounter,
+IntCounterVec, IntGauge, Registry,
 };
 
 // buckets defined in seconds
@@ -11,6 +12,10 @@ const LATENCY_SEC_BUCKETS: &[f64] = &[
 100.0, 200.0,
 ];
 
+const POSITIVE_INT_BUCKETS: &[f64] = &[
+1., 2., 5., 10., 20., 50., 100., 200., 500., 1000., 2000., 5000., 10000., 20000., 50000.,
+];
+
 #[derive(Clone, Debug)]
 pub struct ExecutorMetrics {
 /// occupancy of the channel from the `Subscriber` to `Notifier`
@@ -19,8 +24,6 @@ pub struct ExecutorMetrics {
 pub subscriber_local_fetch_latency: Histogram,
 /// Time it takes to download a payload from remote peer
 pub subscriber_remote_fetch_latency: Histogram,
-/// Number of times certificate was found locally
-pub subscriber_local_hit: IntCounter,
 /// Number of batches processed by subscriber
 pub subscriber_processed_batches: IntCounter,
 /// Round of last certificate seen by subscriber
@@ -38,6 +41,13 @@ pub struct ExecutorMetrics {
 /// Latency between the time when the batch has been
 /// created and when it has been fetched for execution
 pub batch_execution_latency: Histogram,
+/// The number of batches per committed subdag to be fetched
+pub committed_subdag_batch_count: Histogram,
+/// Latency for time taken to fetch all batches for committed subdag
+/// either from local or remote worker.
+pub batch_fetch_for_committed_subdag_total_latency: Histogram,
+/// Counter of remote/local batch fetch statuses.
+pub subscriber_batch_fetch: IntCounterVec,
 }
 
 impl ExecutorMetrics {
@@ -68,11 +78,19 @@ impl ExecutorMetrics {
 "The number of certificates processed by Subscriber during the recovery period to fetch their payloads",
 registry
 ).unwrap(),
-subscriber_local_hit: register_int_counter_with_registry!(
-"subscriber_local_hit",
-"Number of times certificate was found locally",
+committed_subdag_batch_count: register_histogram_with_registry!(
+"committed_subdag_batch_count",
+"The number of batches per committed subdag to be fetched",
+POSITIVE_INT_BUCKETS.to_vec(),
 registry
 ).unwrap(),
+batch_fetch_for_committed_subdag_total_latency: register_histogram_with_registry!(
+"batch_fetch_for_committed_subdag_total_latency",
+"Latency for time taken to fetch all batches for committed subdag either from local or remote worker",
+LATENCY_SEC_BUCKETS.to_vec(),
+registry
+)
+.unwrap(),
 subscriber_processed_batches: register_int_counter_with_registry!(
 "subscriber_processed_batches",
 "Number of batches processed by subscriber",
@@ -104,7 +122,13 @@ impl ExecutorMetrics {
 "Latency between when the certificate has been created and when it reached the executor",
 LATENCY_SEC_BUCKETS.to_vec(),
 registry
-).unwrap()
+).unwrap(),
+subscriber_batch_fetch: register_int_counter_vec_with_registry!(
+"subscriber_batch_fetch",
+"Counter of remote/local batch fetch statuses",
+&["source", "status"],
+registry
+).unwrap(),
 }
 }
 }
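
Two of the new metrics work as follows. `committed_subdag_batch_count` presumably receives one observation per committed subdag, bucketed by `POSITIVE_INT_BUCKETS`. `subscriber_batch_fetch` is incremented per fetch with a `source` label and a `status` label. With the `prometheus` crate those calls would be `Histogram::observe` and `IntCounterVec::with_label_values(&[...]).inc()`. The std-only model below only illustrates cumulative `le` bucket semantics and label keying. `Histogram` and `CounterVec` are simplified stand-ins, and the label values `"local"`, `"remote"`, `"success"` and `"failure"` are assumptions, not values taken from this commit.

```rust
use std::collections::HashMap;

/// Same bounds as `POSITIVE_INT_BUCKETS` in the diff above.
const POSITIVE_INT_BUCKETS: &[f64] = &[
    1., 2., 5., 10., 20., 50., 100., 200., 500., 1000., 2000., 5000., 10000., 20000., 50000.,
];

/// Simplified stand-in for a Prometheus histogram. `counts[i]` holds the
/// cumulative count for the bucket `le = bounds[i]`; `count` plays the
/// role of the implicit `le = +Inf` bucket.
struct Histogram {
    bounds: &'static [f64],
    counts: Vec<u64>,
    sum: f64,
    count: u64,
}

impl Histogram {
    fn new(bounds: &'static [f64]) -> Self {
        Self { bounds, counts: vec![0; bounds.len()], sum: 0.0, count: 0 }
    }

    fn observe(&mut self, v: f64) {
        // Cumulative semantics: a value lands in every bucket whose upper
        // bound is >= the value.
        for (i, bound) in self.bounds.iter().enumerate() {
            if v <= *bound {
                self.counts[i] += 1;
            }
        }
        self.sum += v;
        self.count += 1;
    }
}

/// Simplified stand-in for an `IntCounterVec` labelled by (source, status).
#[derive(Default)]
struct CounterVec(HashMap<(String, String), u64>);

impl CounterVec {
    fn inc(&mut self, source: &str, status: &str) {
        *self
            .0
            .entry((source.to_string(), status.to_string()))
            .or_insert(0) += 1;
    }

    fn get(&self, source: &str, status: &str) -> u64 {
        self.0
            .get(&(source.to_string(), status.to_string()))
            .copied()
            .unwrap_or(0)
    }
}

fn main() {
    // One committed subdag that referenced 37 batches (made-up value).
    let mut batch_count = Histogram::new(POSITIVE_INT_BUCKETS);
    batch_count.observe(37.0);
    println!(
        "le=20: {}, le=50: {}, +Inf: {}, sum: {}",
        batch_count.counts[4], batch_count.counts[5], batch_count.count, batch_count.sum
    );

    // Hypothetical label values for three fetches.
    let mut fetches = CounterVec::default();
    fetches.inc("local", "success");
    fetches.inc("local", "success");
    fetches.inc("remote", "success");
    println!("local/success = {}", fetches.get("local", "success"));
}
```

Because the buckets are cumulative, the share of subdags needing at most N batches can be read directly from the `le = N` series.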