-
Notifications
You must be signed in to change notification settings - Fork 11.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
indexer-alt: simplify duration command-line parsing #20459
Merged
Merged
+1,572
−517
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
## Description Switch the `IngestionConfig::retry_interval: Duration` field to `IngestionConfig::retry_interval_ms: u64`. This is to avoid dealing with parsing/deserializing a `Duration` as part of loading a config. This is the first step towards converting configs to a file-based format. ## Test plan CI.
## Description Switch - `PipelineConfig::commit_interval: Duration` and - `PipelineConfig::watermark_interval: Duration` to - `PipelineConfig::commit_interval_ms: u64` and - `PipelineConfig::watermark_interval_ms: u64` respectively. To avoid dealing with parsing/deserializing a `Duration` as part of loading a config. Continuing the work to support converting indexer's configs to a file-based format. ## Test plan CI.
…elay}_ms ## Description Switch - `ConsistencyConfig::consistency_pruning_interval: Duration` and - `ConsistencyConfig::pruner_delay: Duration` to - `ConsistencyConfig::consistency_pruning_interval_ms: u64` and - `ConsistencyConfig::pruner_delay_ms: u64` respectively. Note that this is a bigger change than other flags, because we are changing the unit that the value is written in from seconds to milliseconds. This was: - for consistency (no pun intended) -- it seems like it would be more confusing in a file-based config to have a different duration values using different units. - for precision: the inputs are all integers, so if we did want to set these values to half a second or something, we wouldn't be able to if it was denominated in whole seconds. ## Test plan CI
## Description Switch - `PrunerConfig::interval: Duration` and - `PrunerConfig::delay: Duration` to - `PrunerConfig::interval_ms: u64` and - `PrunerConfig::delay_ms: u64` respectively, to prepare for porting to file-based configs. ## Test plan CI
## Description Change `DbConfig::connection_timeout: Duration` to `DbConfig::connection_timeout: u64`, so we can drop the dependency on `const_str`. Also introduce a default value for the database URL so that we can implement a `Default` impl for `DbConfig` overall, and which serves as a single source of truth for defaults. ## Test plan ``` sui$ cargo run -p sui-indexer-alt -- --help ```
## Description `GraphQLConfig` is a derive macro for deriving `Serialize`, `Deserialize`, and `Debug`, and introducing `#[serde(default = ...)]` annotations that rely on the fields of the type's `Default` implementation. This allows config types to rely on their `Default` implementation as a single source of truth for default values, and is used through-out `sui-graphql-rpc` to define its config types. This change generalizes it (renames it) so it can be used in the indexer as well. ## Test plan Build and test `sui-graphql-rpc`, `sui-mvr-graphql-rpc`: ``` sui$ cargo nextest run -p sui-graphql-rpc sui$ cargo nextest run -p sui-graphql-e2e-tests sui$ cargo nextest run -p sui-mvr-graphql-rpc ``` This is a behaviour preserving change for the moment.
## Description `DefaultConfig` previously automatically added derives for `Serialize`, `Deserialize`, and `Clone`, `Eq` and `PartialEq`, which was convenient for its use in `sui-graphql-rpc`, but it does not generalize well, because: - Currently, it requires that the traits are in scope, which somewhat defeats the point of not having to explicitly derive them, in the case of `Serialize`/`Deserialize`. - The logic to detect when a derive is already applied is fragile, because it works by name (so it wouldn't play nicely with fully-qualified or aliased derives). This change removes that support in favour of a simpler approach: - `DefaultConfig` adds derives for `serde::Serialize` and `serde::Deserialize`, using fully-qualified names. - Other derives need to be added explicitly. - The logic to only add a derive if it hasn't already been added has been removed. ## Test plan CI for existing use cases.
## Description Add support to `sui-default-config` to change the naming scheme of its fields (previously always used kebab-case). This allows pipelines to retain the names they were given in source code (which use underscores). ## Test plan Will be used and tested in a future PR.
## Description Introduce file-based configs to `sui-indexer-alt`. This is done by: - Splitting arguments into ones that are passed by command-line argument (ending in `*Args`), and arguments that are passed by configuration file (ending in `*Config`). - Using `DefaultConfig` to ensure there's a single source of truth for the default values of file-based configs (their `Default` impl). - Fleshing out configurations for pipelines: - `PipelineConfig` becomes `CommitterConfig` because it only dealt with the commit side. - Introduced new `ConcurrentConfig` and `SequentialConfig` structs to represent the configs for a given pipeline. These were plumbed through to the functions that add pipelines to the indexer, which means that each pipeline gets its own configuration now, where previously they shared a committer configuration and their pruner could not be configured at all. This does mean that configuring the committer is more cumbersome now (changes needs to be replicated for each pipeline), but it gives the ability to configure the pruner, and add special cases to the committer as well. In future PRs, we will: - Re-introduce the ability to configure the committer in one place, and then override it for specific pipelines. - Introduce a command to generate a default config, to make it easier to start writing your own. - Move some more configuration into the file that were moved to various other places to make them easier to modify during experimentation: - Which pipelines are enabled. - Write concurrency. This PR also performs some smaller changes: - Some re-orderings to make function argument order the same, and to pull config structs up to be the first struct in their respective modules. - Introducing a test that we don't try and enable `skip_watermark` for a sequential pipeline (more relevant now that we configure the committers per pipeline). ## Test plan CI + we will be able to test the actual configuration part more easily once we have some more machinery set-up.
## Description Basic command for generating a config TOML based on the Default config. ## Test plan ``` sui$ cargo run -p sui-indexer-alt -- generate-config [ingestion] checkpoint-buffer-size = 5000 ingest-concurrency = 200 retry-interval-ms = 200 [consistency] consistent-pruning-interval-ms = 300000 pruner-delay-ms = 120000 [pipeline.sum-coin-balances] write-concurrency = 5 collect-interval-ms = 500 watermark-interval-ms = 500 skip-watermark = false [pipeline.wal-coin-balances] write-concurrency = 5 collect-interval-ms = 500 watermark-interval-ms = 500 skip-watermark = false [pipeline.sum-obj-types] write-concurrency = 5 collect-interval-ms = 500 watermark-interval-ms = 500 skip-watermark = false [pipeline.wal-obj-types] write-concurrency = 5 collect-interval-ms = 500 watermark-interval-ms = 500 skip-watermark = false [pipeline.sum-displays.committer] write-concurrency = 5 collect-interval-ms = 500 watermark-interval-ms = 500 skip-watermark = false [pipeline.sum-packages.committer] write-concurrency = 5 collect-interval-ms = 500 watermark-interval-ms = 500 skip-watermark = false [pipeline.ev-emit-mod.committer] write-concurrency = 5 collect-interval-ms = 500 watermark-interval-ms = 500 skip-watermark = false [pipeline.ev-struct-inst.committer] write-concurrency = 5 collect-interval-ms = 500 watermark-interval-ms = 500 skip-watermark = false [pipeline.kv-checkpoints.committer] write-concurrency = 5 collect-interval-ms = 500 watermark-interval-ms = 500 skip-watermark = false [pipeline.kv-epoch-ends.committer] write-concurrency = 5 collect-interval-ms = 500 watermark-interval-ms = 500 skip-watermark = false [pipeline.kv-epoch-starts.committer] write-concurrency = 5 collect-interval-ms = 500 watermark-interval-ms = 500 skip-watermark = false [pipeline.kv-feature-flags.committer] write-concurrency = 5 collect-interval-ms = 500 watermark-interval-ms = 500 skip-watermark = false [pipeline.kv-objects.committer] write-concurrency = 5 collect-interval-ms = 500 watermark-interval-ms = 500 skip-watermark = false [pipeline.kv-protocol-configs.committer] write-concurrency = 5 collect-interval-ms = 500 watermark-interval-ms = 500 skip-watermark = false [pipeline.kv-transactions.committer] write-concurrency = 5 collect-interval-ms = 500 watermark-interval-ms = 500 skip-watermark = false [pipeline.obj-versions.committer] write-concurrency = 5 collect-interval-ms = 500 watermark-interval-ms = 500 skip-watermark = false [pipeline.tx-affected-addresses.committer] write-concurrency = 5 collect-interval-ms = 500 watermark-interval-ms = 500 skip-watermark = false [pipeline.tx-affected-objects.committer] write-concurrency = 5 collect-interval-ms = 500 watermark-interval-ms = 500 skip-watermark = false [pipeline.tx-balance-changes.committer] write-concurrency = 5 collect-interval-ms = 500 watermark-interval-ms = 500 skip-watermark = false [pipeline.tx-calls.committer] write-concurrency = 5 collect-interval-ms = 500 watermark-interval-ms = 500 skip-watermark = false [pipeline.tx-digests.committer] write-concurrency = 5 collect-interval-ms = 500 watermark-interval-ms = 500 skip-watermark = false [pipeline.tx-kinds.committer] write-concurrency = 5 collect-interval-ms = 500 watermark-interval-ms = 500 skip-watermark = false ```
## Description Add a common `CommitterConfig` and per-pipeline overrides, to avoid repetition for parameters that are often shared. ## Test plan ``` sui$ cargo run -p sui-indexer-alt -- generate-config [ingestion] checkpoint-buffer-size = 5000 ingest-concurrency = 200 retry-interval-ms = 200 [consistency] consistent-pruning-interval-ms = 300000 pruner-delay-ms = 120000 [committer] write-concurrency = 5 collect-interval-ms = 500 watermark-interval-ms = 500 skip-watermark = false [pipeline.sum-coin-balances] [pipeline.wal-coin-balances] [pipeline.sum-obj-types] [pipeline.wal-obj-types] [pipeline.sum-displays] [pipeline.sum-packages] [pipeline.ev-emit-mod] [pipeline.ev-struct-inst] [pipeline.kv-checkpoints] [pipeline.kv-epoch-ends] [pipeline.kv-epoch-starts] [pipeline.kv-feature-flags] [pipeline.kv-objects] [pipeline.kv-protocol-configs] [pipeline.kv-transactions] [pipeline.obj-versions] [pipeline.tx-affected-addresses] [pipeline.tx-affected-objects] [pipeline.tx-balance-changes] [pipeline.tx-calls] [pipeline.tx-digests] [pipeline.tx-kinds] ```
## Description It doesn't make much sense to skip watermarks for some pipelines and not others from one running instance of the indexer, and it is more common to keep everything else fixed in a configuration and change this flag from run to run, because it is related to the `--first-checkpoint` and `--last-checkpoint` flags, so moving it from the file-based config to the command-line config. ## Test plan ``` sui$ cargo run -p sui-indexer-alt -- generate-config [ingestion] checkpoint-buffer-size = 5000 ingest-concurrency = 200 retry-interval-ms = 200 [consistency] consistent-pruning-interval-ms = 300000 pruner-delay-ms = 120000 [committer] write-concurrency = 5 collect-interval-ms = 500 watermark-interval-ms = 500 [pipeline.sum-coin-balances] [pipeline.wal-coin-balances] [pipeline.sum-obj-types] [pipeline.wal-obj-types] [pipeline.sum-displays] [pipeline.sum-packages] [pipeline.ev-emit-mod] [pipeline.ev-struct-inst] [pipeline.kv-checkpoints] [pipeline.kv-epoch-ends] [pipeline.kv-epoch-starts] [pipeline.kv-feature-flags] [pipeline.kv-objects] [pipeline.kv-protocol-configs] [pipeline.kv-transactions] [pipeline.obj-versions] [pipeline.tx-affected-addresses] [pipeline.tx-affected-objects] [pipeline.tx-balance-changes] [pipeline.tx-calls] [pipeline.tx-digests] [pipeline.tx-kinds] ```
## Description Rather than deciding which pipelines to run according to a `--pipeline` flag, decide them based on which pipelines have been configured in the TOML file. It suffices that the pipeline is mentioned in the config (no config fields need to be overridden for it). Additionally, this change makes it so that the write-ahead log portion of consistent tables is only written to when a checkpoint lag has been specified. ## Test plan Run indexers with the following configs: ``` [ingestion] remote-store-url = "https://checkpoints.mainnet.sui.io" [pipeline.kv-objects] [pipeline.kv-transactions] ``` ``` [ingestion] remote-store-url = "https://checkpoints.mainnet.sui.io" [consistency] consistent-range = 1000 [pipeline.sum-obj-types] ``` ``` [ingestion] remote-store-url = "https://checkpoints.mainnet.sui.io" [committer] collect-interval-ms = 1000 [pipeline.tx-calls] [pipeline.tx-affected-objects] collect-interval-ms = 5000 ```
## Description Move the source of checkpoint data (remote store or local path) into its own struct that is parsed using clap, instead of serde. This is to cater to the case where the same indexer configuration might be used to index different networks (mainnet, testnet, devnet, etc). ## Test plan Run the indexer against a variety of configs: ``` sui$ cargo run -p sui-indexer-alt --release -- \ indexer --last-checkpoint 10000 \ --remote-store-url https://checkpoints.mainnet.sui.io \ --config $CONFIG ``` Where config is a file that contains one of the following: ``` [pipeline.kv_objects] [pipeline.kv_transactions] ``` ``` [consistency] consistent-range = 1000 [pipeline.sum_obj_types] ``` ``` [committer] collect-interval-ms = 1000 [pipeline.tx_calls] [pipeline.tx_affected_objects] collect-interval-ms = 5000 ```
## Description TOML deserialization ignores fields that are not relevant to the struct being targeted. Given all the fields in our config have default values, this behaviour can cause confusion because it means that a typo on a field name will result in the value being ignored and replaced with its default value without warning. It's tough to change this behaviour wholesale (to detect and error on any unrecognised field name), but this chane tries to bridge most of the gap by detecting unexpected pipeline configurations and erroring for those. ## Test plan Run the indexer on a config with typos in its pipeline configs. It will now produce an error explaining that it doesn't recognise those configs: ``` sui$ cargo run -p sui-indexer-alt -- indexer --config /tmp/tx.toml --last-checkpoint 10000 --remote-store-url 'https://checkpoints.mainnet.sui.io' Error: Unexpected pipeline configurations: [tx-calls] [tx-affected-objects] collect-interval-ms = 5000 ```
## Description This override can now be performed using file-based configuration, so we don't need to hard code overrides in the binary. ## Test plan CI
## Description Introduce a `*Layer` type for each file-based `*Config` type. `*Layer` types typically follow the same structure as their corresponding config, but all their fields are optional. This abstraction will serve the following purposes (to be realised in future changes -- this change is a pure refactoring): - It allows us to isolate the dependency on `sui_default_config` to code that will only belong in our indexer, and not into the indexer framework. - Layers can be merged together, this allows us to accept multiple configuration files and merge them into one file. This can be used for storing the configs for each pipeline in their own files to mix and match, and/or to support parameterised indexing in future. - When converting from a Layer to its Config type (finishing it), we can check that the layer does not have any extra (unexpected) fields. ## Test plan CI + tests + run the indexer against some existing configs
The latest updates on your projects. Learn more about Vercel for Git ↗︎
3 Skipped Deployments
|
amnn
temporarily deployed
to
sui-typescript-aws-kms-test-env
November 28, 2024 13:19 — with
GitHub Actions
Inactive
This was referenced Nov 28, 2024
This was referenced Nov 28, 2024
bmwill
approved these changes
Nov 28, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, modulo the probably unintentional removal of some code in one commit.
## Description Add back the `--pipeline` command-line argument, layered on top of the file-based configuration. ## Test plan ``` sui$ cargo run -p sui-indexer-alt -- generate-config > /tmp/indexer.toml # Remove some pipelines sui$ cargo run -p sui-indexer-alt -- indexer \ --last-checkpoint 10000 \ --remote-store-url https://checkpoints.mainnet.sui.io \ --config /tmp/indexer.toml \ --pipeline i_dont_exist sui$ cargo run -p sui-indexer-alt -- indexer \ --last-checkpoint 10000 \ --remote-store-url https://checkpoints.mainnet.sui.io \ --config /tmp/indexer.toml \ --pipeline kv_objects --pipeline kv_transactions sui$ cargo run -p sui-indexer-alt -- indexer \ --last-checkpoint 10000 \ --remote-store-url https://checkpoints.mainnet.sui.io \ --config /tmp/indexer.toml ```
## Description Document how the macros for adding pipelines work. ## Test plan :eyes:
## Description Add an ability to merge together configurations. This is useful for configuring each pipeline in its own file and then merging them into pods, or combining the configurations of multiple application-specific indexers into one configuration. ## Test plan New unit tests: ``` sui$ cargo nextest run -p sui-indexer-alt ``` Run a test command: ``` sui$ cargo run -p sui-indexer-alt -- merge-configs \ --config tx.toml \ --config obj.toml \ --config kv.toml [ingestion] [consistency] consistent-range = 1000 [committer] collect-interval-ms = 1000 [pipeline.sum_obj_types] [pipeline.kv_objects] [pipeline.kv_transactions] [pipeline.tx_affected_objects] [pipeline.tx_calls] ```
## Description Similar to the change introducing a shared committer config, this change introduces a shared pruner config between all concurrent pipelines. The values in this config are only used to fill out concurrent pipelines who have already specified that they want pruning enabled (i.e. unlike the committer override which is always inherited, this will only be inherited if the pruner config is called for). Additionally, unlike other fields where the last write wins when merging configs, `PrunerConfig.retention` will be merged by `max`. This is to deal with merging the configs of two different apps that need to index the same table, but with different retentions. ## Test plan New unit tests: ``` sui$ cargo nextest run -p sui-indexer-alt ```
## Description Add an extra flattened generic, `toml::Table` to every file-based config struct to pick up any flags that were not recognised by the indexer. Extra fields will be logged as a warning and then ignored when processing flags, so that we are aware of potential typos, but will not cause an error to preserve some form of backwards compatibility (e.g. imagine a new pipeline is added to the indexer, and we update its TOML and try and run an old version of the indexer with the config that mentions the new pipeline). ## Test plan Try merging together some configs that contain typos in them: STDERR: ``` 2024-11-27T00:29:37.177152Z WARN sui_indexer_alt::config: Found unrecognized committer field which will be ignored by this indexer. This could be because of a typo, or because it was introduced in a newer version of the indexer: collect-interval-secs = 1000 2024-11-27T00:29:37.177242Z WARN sui_indexer_alt::config: Found unrecognized pipeline field which will be ignored by this indexer. This could be because of a typo, or because it was introduced in a newer version of the indexer: [tx-affected-objects] collect-interval-ms = 5000 2024-11-27T00:29:37.177328Z WARN sui_indexer_alt::config: Found unrecognized consistency field which will be ignored by this indexer. This could be because of a typo, or because it was introduced in a newer version of the indexer: consistency-range = 1000 ``` STDOUT: ``` [ingestion] [consistency] [committer] [pruner] [pipeline.sum_obj_types] [pipeline.kv_objects] [pipeline.kv_transactions] [pipeline.tx_calls] ```
## Description Using the default impl for `IndexerConfig` to generate the output for `generate-config` lead to sub-par results because most of its fields are optional, and so will not show up. This change introduces an explicit `example` function that is responsible for generating a non-default output that does a better job of documenting what fields and pipelines are available to configure. ## Test plan Generate a config: ``` sui$ cargo run -p sui-indexer-alt -- generate-config [ingestion] checkpoint-buffer-size = 5000 ingest-concurrency = 200 retry-interval-ms = 200 [consistency] consistent-pruning-interval-ms = 300000 pruner-delay-ms = 120000 [committer] write-concurrency = 5 collect-interval-ms = 500 watermark-interval-ms = 500 [pruner] interval-ms = 300000 delay-ms = 120000 retention = 4000000 max-chunk-size = 2000 [pipeline.sum_coin_balances] [pipeline.wal_coin_balances] [pipeline.sum_obj_types] [pipeline.wal_obj_types] [pipeline.sum_displays] [pipeline.sum_packages] [pipeline.ev_emit_mod] [pipeline.ev_struct_inst] [pipeline.kv_checkpoints] [pipeline.kv_epoch_ends] [pipeline.kv_epoch_starts] [pipeline.kv_feature_flags] [pipeline.kv_objects] [pipeline.kv_protocol_configs] [pipeline.kv_transactions] [pipeline.obj_versions] [pipeline.tx_affected_addresses] [pipeline.tx_affected_objects] [pipeline.tx_balance_changes] [pipeline.tx_calls] [pipeline.tx_digests] [pipeline.tx_kinds] ```
## Description Replace use of macros with a `Merge` trait. ## Test plan CI
## Description Avoid catch-all (`..`) patterns, and be explicit about the `extra` pattern, so that if we introduce new fields in future, we notice. ## Test plan CI
## Description Pluralize it to match the other instances of the name. ## Test plan CI
amnn
temporarily deployed
to
sui-typescript-aws-kms-test-env
November 29, 2024 17:12 — with
GitHub Actions
Inactive
amnn
temporarily deployed
to
sui-typescript-aws-kms-test-env
November 29, 2024 18:17 — with
GitHub Actions
Inactive
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
When accepting a duration as a command-line (or eventually file-based) configuration. Rather than using the
Duration
type directly, accept it as a number in milliseconds and expose a constructor on the config type to build theDuration
from it.This is helpful for
Duration
directly is more complicated.Duration
as adefault_value_t
which meant we could not easily share a default value between a command-line default and theimpl Default
.Test plan
CI +
Release notes
Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.
For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.