feat: migrate and refactor replica-healthcheck from op-replica
annieke committed Jul 30, 2021
1 parent 7ffb835 commit 4319e45
Showing 20 changed files with 553 additions and 1 deletion.
5 changes: 5 additions & 0 deletions .changeset/thick-peaches-learn.md
@@ -0,0 +1,5 @@
---
'@eth-optimism/replica-healthcheck': minor
---

Add replica-healthcheck to monorepo
1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -15,4 +15,5 @@
# packages/message-relayer/ @K-Ho
# packages/batch-submitter/ @annieke @karlfloersch
# packages/data-transport-layer/ @annieke
# packages/replica-healthcheck/ @annieke
# integration-tests/ @tynes
1 change: 1 addition & 0 deletions README.md
@@ -32,6 +32,7 @@ Extensive documentation is available [here](http://community.optimism.io/docs/).
* [`data-transport-layer`](./packages/data-transport-layer): Event indexer, allowing the `l2geth` node to access L1 data
* [`batch-submitter`](./packages/batch-submitter): Daemon for submitting L2 transaction and state root batches to L1
* [`message-relayer`](./packages/message-relayer): Service for relaying L2 messages to L1
* [`replica-healthcheck`](./packages/replica-healthcheck): Service to monitor the health of different replica deployments
* [`l2geth`](./l2geth): Fork of [go-ethereum v1.9.10](https://github.com/ethereum/go-ethereum/tree/v1.9.10) implementing the [OVM](https://research.paradigm.xyz/optimism#optimistic-geth).
* [`integration-tests`](./integration-tests): Integration tests between a L1 testnet, `l2geth`,
* [`ops`](./ops): Contains Dockerfiles for containerizing each service involved in the protocol,
1 change: 1 addition & 0 deletions ops/docker/Dockerfile.monorepo
@@ -32,6 +32,7 @@ COPY packages/contracts/package.json ./packages/contracts/package.json
COPY packages/data-transport-layer/package.json ./packages/data-transport-layer/package.json
COPY packages/batch-submitter/package.json ./packages/batch-submitter/package.json
COPY packages/message-relayer/package.json ./packages/message-relayer/package.json
COPY packages/replica-healthcheck/package.json ./packages/replica-healthcheck/package.json
COPY integration-tests/package.json ./integration-tests/package.json

RUN yarn install --frozen-lockfile
4 changes: 4 additions & 0 deletions packages/replica-healthcheck/.env.example
@@ -0,0 +1,4 @@
REPLICA_HEALTHCHECK__ETH_NETWORK=mainnet
REPLICA_HEALTHCHECK__ETH_NETWORK_RPC_PROVIDER=https://mainnet.optimism.io
REPLICA_HEALTHCHECK__ETH_REPLICA_RPC_PROVIDER=http://localhost:9991
REPLICA_HEALTHCHECK__L2GETH_IMAGE_TAG=0.4.7
3 changes: 3 additions & 0 deletions packages/replica-healthcheck/.eslintrc.js
@@ -0,0 +1,3 @@
module.exports = {
extends: '../../.eslintrc.js',
}
2 changes: 2 additions & 0 deletions packages/replica-healthcheck/.gitignore
@@ -0,0 +1,2 @@
node_modules/
build/
2 changes: 2 additions & 0 deletions packages/replica-healthcheck/.lintstagedrc.yml
@@ -0,0 +1,2 @@
"*.{ts,js}":
- eslint
3 changes: 3 additions & 0 deletions packages/replica-healthcheck/.prettierrc.js
@@ -0,0 +1,3 @@
module.exports = {
...require('../../.prettierrc.js'),
};
22 changes: 22 additions & 0 deletions packages/replica-healthcheck/LICENSE
@@ -0,0 +1,22 @@
(The MIT License)

Copyright 2020-2021 Optimism

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
37 changes: 37 additions & 0 deletions packages/replica-healthcheck/README.md
@@ -0,0 +1,37 @@
# @eth-optimism/replica-healthcheck

## What is this?

`replica-healthcheck` is an Express server that runs alongside a replica instance to check that the replica is healthy. Currently, it exposes metrics on syncing stats and exits when the replica's state root does not match the sequencer's.

## Getting started

### Building and usage

After cloning and switching to the repository, install dependencies:

```bash
$ yarn
```

Use the following commands to build, run, test, and lint:

```bash
$ yarn build
$ yarn start
$ yarn test
$ yarn lint
```

### Configuration

We're using `dotenv` for our configuration.
To configure the project, clone this repository and copy the `.env.example` file to `.env`.
Here's a list of environment variables:

| Variable | Purpose | Default |
| ----------------------------------------------- | -------------------------------------------------------- | ---------------------------------------------------------------------------------------- |
| REPLICA_HEALTHCHECK\_\_ETH_NETWORK | Ethereum Layer1 and Layer2 network (`mainnet`, `kovan`, or `goerli`) | mainnet (change to `kovan` for the test network) |
| REPLICA_HEALTHCHECK\_\_ETH_NETWORK_RPC_PROVIDER | Layer2 source of truth endpoint, used for the sync check | https://mainnet.optimism.io (change to `https://kovan.optimism.io` for the test network) |
| REPLICA_HEALTHCHECK\_\_ETH_REPLICA_RPC_PROVIDER | Layer2 local replica endpoint, used for the sync check | http://localhost:9991 |
| REPLICA_HEALTHCHECK\_\_L2GETH_IMAGE_TAG | L2geth version | 0.4. |
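
The variables in the table above could be validated along the following lines. This is a minimal sketch, not the code from this commit: `parseHealthcheckEnv`, `HealthcheckEnv`, and `ALLOWED_NETWORKS` are hypothetical names, and the sketch throws on bad input rather than exiting the process so that it is easy to test. The variable names and allowed networks are taken from `.env.example` and `src/helpers.ts`.

```typescript
// Illustrative sketch only: parseHealthcheckEnv is a hypothetical helper,
// not a function from this commit.
type HealthcheckEnv = {
  network: string
  gethRelease: string
  sequencerRpcProvider: string
  replicaRpcProvider: string
}

// Networks accepted by readConfig in src/helpers.ts.
const ALLOWED_NETWORKS = ['mainnet', 'kovan', 'goerli']

const parseHealthcheckEnv = (
  env: Record<string, string | undefined>
): HealthcheckEnv => {
  // Read one required variable, failing loudly if it is missing or empty.
  const read = (name: string): string => {
    const value = env[name]
    if (!value) {
      throw new Error(`Missing environment variable: ${name}`)
    }
    return value
  }

  const config: HealthcheckEnv = {
    network: read('REPLICA_HEALTHCHECK__ETH_NETWORK'),
    gethRelease: read('REPLICA_HEALTHCHECK__L2GETH_IMAGE_TAG'),
    sequencerRpcProvider: read('REPLICA_HEALTHCHECK__ETH_NETWORK_RPC_PROVIDER'),
    replicaRpcProvider: read('REPLICA_HEALTHCHECK__ETH_REPLICA_RPC_PROVIDER'),
  }

  if (!ALLOWED_NETWORKS.includes(config.network)) {
    throw new Error(`Invalid ETH_NETWORK: ${config.network}`)
  }
  return config
}

// Usage with the values from .env.example.
const exampleConfig = parseHealthcheckEnv({
  REPLICA_HEALTHCHECK__ETH_NETWORK: 'mainnet',
  REPLICA_HEALTHCHECK__ETH_NETWORK_RPC_PROVIDER: 'https://mainnet.optimism.io',
  REPLICA_HEALTHCHECK__ETH_REPLICA_RPC_PROVIDER: 'http://localhost:9991',
  REPLICA_HEALTHCHECK__L2GETH_IMAGE_TAG: '0.4.7',
})
console.log(exampleConfig.network)
```

Passing the environment in as an argument, rather than reading `process.env` directly, is a design choice of this sketch; it keeps the validation free of side effects.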
39 changes: 39 additions & 0 deletions packages/replica-healthcheck/package.json
@@ -0,0 +1,39 @@
{
"name": "@eth-optimism/replica-healthcheck",
"version": "0.1.0",
"private": true,
"main": "dist/index",
"files": [
"dist/index"
],
"types": "dist/index",
"author": "Optimism PBC",
"license": "MIT",
"scripts": {
"clean": "rimraf ./dist ./tsconfig.build.tsbuildinfo",
"lint": "yarn run lint:fix && yarn run lint:check",
"lint:fix": "yarn lint:check --fix",
"lint:check": "eslint .",
"build": "tsc -p tsconfig.build.json",
"pre-commit": "lint-staged",
"test": "ts-mocha test/*.spec.ts",
"start": "ts-node ./src/exec/run-healthcheck-server.ts"
},
"devDependencies": {
"@types/express": "^4.17.12",
"@types/node": "^15.12.2",
"dotenv": "^10.0.0",
"supertest": "^6.1.4",
"ts-mocha": "^8.0.0",
"ts-node": "^10.0.0",
"typescript": "^4.3.2"
},
"dependencies": {
"@eth-optimism/common-ts": "0.1.5",
"@eth-optimism/core-utils": "^0.5.1",
"ethers": "^5.3.0",
"express": "^4.17.1",
"express-prom-bundle": "^6.3.6",
"prom-client": "^13.1.0"
}
}
14 changes: 14 additions & 0 deletions packages/replica-healthcheck/src/exec/run-healthcheck-server.ts
@@ -0,0 +1,14 @@
import * as dotenv from 'dotenv'

import { HealthcheckServer, readConfig } from '..'
;(async () => {
dotenv.config()

const healthcheckServer = new HealthcheckServer(readConfig())

healthcheckServer.init()
await healthcheckServer.runSyncCheck()
})().catch((err) => {
console.error(err)
process.exit(1)
})
153 changes: 153 additions & 0 deletions packages/replica-healthcheck/src/healthcheck-server.ts
@@ -0,0 +1,153 @@
import express from 'express'
import { Server } from 'net'
import promBundle from 'express-prom-bundle'
import { Gauge } from 'prom-client'
import { providers } from 'ethers'
import { Metrics, Logger } from '@eth-optimism/common-ts'
import { injectL2Context, sleep } from '@eth-optimism/core-utils'

import { binarySearchForMismatch } from './helpers'

export interface HealthcheckServerOptions {
network: string
gethRelease: string
sequencerRpcProvider: string
replicaRpcProvider: string
logger: Logger
}

export interface ReplicaMetrics {
lastMatchingStateRootHeight: Gauge<string>
replicaHeight: Gauge<string>
sequencerHeight: Gauge<string>
}

export class HealthcheckServer {
protected options: HealthcheckServerOptions
protected app: express.Express
protected logger: Logger
protected metrics: ReplicaMetrics
server: Server

constructor(options: HealthcheckServerOptions) {
this.options = options
this.app = express()
this.logger = options.logger
}

init = () => {
this.metrics = this.initMetrics()
this.server = this.initServer()
}

initMetrics = (): ReplicaMetrics => {
const metrics = new Metrics({
labels: {
network: this.options.network,
gethRelease: this.options.gethRelease,
},
})
const metricsMiddleware = promBundle({
includeMethod: true,
includePath: true,
})
this.app.use(metricsMiddleware)

return {
lastMatchingStateRootHeight: new metrics.client.Gauge({
name: 'replica_health_last_matching_state_root_height',
help: 'Height of last matching state root of replica',
registers: [metrics.registry],
}),
replicaHeight: new metrics.client.Gauge({
name: 'replica_health_height',
help: 'Block number of the latest block from the replica',
registers: [metrics.registry],
}),
sequencerHeight: new metrics.client.Gauge({
name: 'replica_health_sequencer_height',
help: 'Block number of the latest block from the sequencer',
registers: [metrics.registry],
}),
}
}

initServer = (): Server => {
this.app.get('/', (req, res) => {
res.send(`
<html>
<head><title>Replica healthcheck</title></head>
<body>
<h1>Replica healthcheck</h1>
<p><a href="/metrics">Metrics</a></p>
</body>
</html>
`)
})

const server = this.app.listen(3000, () => {
this.logger.info('Listening on port 3000')
})

return server
}

runSyncCheck = async () => {
const sequencerProvider = injectL2Context(
new providers.JsonRpcProvider(this.options.sequencerRpcProvider)
)
const replicaProvider = injectL2Context(
new providers.JsonRpcBatchProvider(this.options.replicaRpcProvider)
)

// Continuously loop while replica runs
while (true) {
let replicaLatest = (await replicaProvider.getBlock('latest')) as any
const sequencerCorresponding = (await sequencerProvider.getBlock(
replicaLatest.number
)) as any

if (replicaLatest.stateRoot !== sequencerCorresponding.stateRoot) {
this.logger.error(
'Latest replica state root is mismatched from sequencer'
)
const firstMismatch = await binarySearchForMismatch(
sequencerProvider,
replicaProvider,
replicaLatest.number,
this.logger
)
this.logger.error('First state root mismatch found', {
blockNumber: firstMismatch,
})
// The gauge tracks the last matching height, one block before the first mismatch
this.metrics.lastMatchingStateRootHeight.set(firstMismatch - 1)

throw new Error('Replica state root mismatched')
}

this.logger.info('State roots matching', {
blockNumber: replicaLatest.number,
})
this.metrics.lastMatchingStateRootHeight.set(replicaLatest.number)

replicaLatest = await replicaProvider.getBlock('latest')
const sequencerLatest = await sequencerProvider.getBlock('latest')
this.logger.info('Syncing from sequencer', {
sequencerHeight: sequencerLatest.number,
replicaHeight: replicaLatest.number,
heightDifference: sequencerLatest.number - replicaLatest.number,
})

this.metrics.replicaHeight.set(replicaLatest.number)
this.metrics.sequencerHeight.set(sequencerLatest.number)
// Fetch next block and sleep if not new
while (replicaLatest.number === sequencerCorresponding.number) {
this.logger.info(
'Replica caught up with sequencer, waiting for next block'
)
await sleep(1_000)
replicaLatest = await replicaProvider.getBlock('latest')
}
}
}
}
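
The core comparison in `runSyncCheck` can be exercised without a live node. The sketch below is an assumption-laden illustration, not code from this commit: `BlockSource`, `fakeSource`, and `checkOnce` are hypothetical names, and the in-memory sources stand in for the ethers providers. It performs a single iteration of the check: fetch the replica's latest block, fetch the sequencer's block at the same height, and compare state roots.

```typescript
// Illustrative sketch only: these types and helpers are hypothetical and
// replace the ethers providers with in-memory stand-ins.
interface BlockWithStateRoot {
  number: number
  stateRoot: string
}

interface BlockSource {
  getBlock(tag: number | 'latest'): Promise<BlockWithStateRoot>
}

// Build a fake chain where block i has state root roots[i].
const fakeSource = (roots: string[]): BlockSource => ({
  getBlock: async (tag) => {
    const height = tag === 'latest' ? roots.length - 1 : tag
    return { number: height, stateRoot: roots[height] }
  },
})

// One iteration of the check: compare the replica's latest block with the
// sequencer's block at the same height.
const checkOnce = async (sequencer: BlockSource, replica: BlockSource) => {
  const replicaLatest = await replica.getBlock('latest')
  const sequencerBlock = await sequencer.getBlock(replicaLatest.number)
  return {
    height: replicaLatest.number,
    matches: replicaLatest.stateRoot === sequencerBlock.stateRoot,
  }
}

// Usage: the replica lags the sequencer by one block but agrees on state.
const sequencerChain = fakeSource(['0xa', '0xb', '0xc'])
const laggingReplica = fakeSource(['0xa', '0xb'])
checkOnce(sequencerChain, laggingReplica).then((result) => {
  console.log(result)
})
```

Comparing at the replica's height, rather than at the sequencer's latest height, means a replica that is merely behind is not reported as mismatched.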
80 changes: 80 additions & 0 deletions packages/replica-healthcheck/src/helpers.ts
@@ -0,0 +1,80 @@
import { providers } from 'ethers'
import { Logger } from '@eth-optimism/common-ts'

import { HealthcheckServerOptions } from './healthcheck-server'

export const readEnvOrQuitProcess = (envName: string): string => {
// Read once so the returned value is known to be a non-empty string
const value = process.env[envName]
if (!value) {
console.error(`Missing environment variable: ${envName}`)
process.exit(1)
}
return value
}

export const readConfig = (): HealthcheckServerOptions => {
const network = readEnvOrQuitProcess('REPLICA_HEALTHCHECK__ETH_NETWORK')
const gethRelease = readEnvOrQuitProcess(
'REPLICA_HEALTHCHECK__L2GETH_IMAGE_TAG'
)
const sequencerRpcProvider = readEnvOrQuitProcess(
'REPLICA_HEALTHCHECK__ETH_NETWORK_RPC_PROVIDER'
)
const replicaRpcProvider = readEnvOrQuitProcess(
'REPLICA_HEALTHCHECK__ETH_REPLICA_RPC_PROVIDER'
)

if (!['mainnet', 'kovan', 'goerli'].includes(network)) {
console.error(
'Invalid ETH_NETWORK specified. Must be one of mainnet, kovan, or goerli'
)
process.exit(1)
}

const logger = new Logger({ name: 'replica-healthcheck' })

return {
network,
gethRelease,
sequencerRpcProvider,
replicaRpcProvider,
logger,
}
}

export const binarySearchForMismatch = async (
sequencerProvider: providers.JsonRpcProvider,
replicaProvider: providers.JsonRpcProvider,
latest: number,
logger: Logger
): Promise<number> => {
logger.info(
'Executing a binary search to determine the first mismatched block...'
)

let start = 0
let end = latest
while (start !== end) {
const middle = Math.floor((start + end) / 2)

logger.info('Checking block', { blockNumber: middle })
const [replicaBlock, sequencerBlock] = await Promise.all([
replicaProvider.getBlock(middle) as any,
sequencerProvider.getBlock(middle) as any,
])

if (replicaBlock.stateRoot === sequencerBlock.stateRoot) {
logger.info('State roots still matching', { blockNumber: middle })
// Advance past the matching block so the search interval always shrinks
start = middle + 1
} else {
logger.error('Found mismatched state roots', {
blockNumber: middle,
sequencerBlock,
replicaBlock,
})

end = middle
}
}

return end
}
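
The search in `binarySearchForMismatch` can be sketched on plain arrays to make its termination condition easier to see. This is a minimal sketch under stated assumptions, not the commit's code: `firstMismatchIndex` is a hypothetical name, the arrays replace the RPC providers, the block at `latest` is assumed to be mismatched, and every block after the first mismatch is assumed to be mismatched as well.

```typescript
// Illustrative sketch only: firstMismatchIndex is a hypothetical,
// array-based stand-in for the provider-based search.
const firstMismatchIndex = (
  sequencerRoots: string[],
  replicaRoots: string[],
  latest: number
): number => {
  let start = 0
  let end = latest
  // Invariant: the first mismatched block lies in [start, end].
  while (start < end) {
    const middle = Math.floor((start + end) / 2)
    if (sequencerRoots[middle] === replicaRoots[middle]) {
      // middle matches, so the first mismatch is strictly after it.
      start = middle + 1
    } else {
      // middle mismatches, so the first mismatch is at or before it.
      end = middle
    }
  }
  return end
}

// Usage: the chains diverge at block 2.
const sequencerRoots = ['0xa', '0xb', '0xc', '0xd', '0xe']
const replicaRoots = ['0xa', '0xb', '0xx', '0xy', '0xz']
console.log(firstMismatchIndex(sequencerRoots, replicaRoots, 4))
```

Setting `start` to `middle + 1` on a match is what guarantees progress: with `start = middle`, an interval of two adjacent blocks would yield the same `middle` on every iteration.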