Test accuracy and speed of different function-signature extractors
For results, refer to the main README.md.
- Get N Etherscan-verified contracts, save the bytecode and ABI to `datasets/NAME/ADDR.json`.
- Extract function signatures from the bytecode. Each tool runs inside a Docker container and is limited to 1 CPU (see `providers/NAME` and `Makefile`).
- Assume selectors from Etherscan's ABI as ground truth.
- Compare each tool's results against the ground truth and count False Positives and False Negatives.
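The comparison step can be illustrated with plain set arithmetic. This is a minimal sketch, not the code in `compare.py`: the function name `compare_selectors` and the sample selectors are hypothetical, and selectors are assumed to be hex strings.

```python
def compare_selectors(extracted, ground_truth):
    """Count false positives and false negatives for one contract.

    Hypothetical helper for illustration; selectors are assumed to be
    hex strings and are compared case-insensitively.
    """
    extracted = {s.lower() for s in extracted}
    ground_truth = {s.lower() for s in ground_truth}
    # Reported by the tool but absent from the Etherscan ABI
    false_positives = extracted - ground_truth
    # Present in the Etherscan ABI but missed by the tool
    false_negatives = ground_truth - extracted
    return len(false_positives), len(false_negatives)


# Made-up example: one selector is spurious, one is missed
fp, fn = compare_selectors(
    extracted=["a9059cbb", "deadbeef"],
    ground_truth=["a9059cbb", "70a08231"],
)
print(fp, fn)
```

A tool that reports exactly the ground-truth set yields zero for both counts.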
Set the performance mode using `sudo cpupower frequency-set -g performance` and run `make` (GNU Make) inside the `benchmark/` directory.
To use Podman instead of Docker: DOCKER=podman make
You can run only a specific step, for example:
# Only build docker-images
$ make build
# Only run tests
$ make run
# Build `etherscan` docker image
$ make etherscan.build
# Run `etherscan` on dataset `largest1k`
$ make etherscan/largest1k
To process the results, run `compare.py`:
$ python3 compare.py
# compare in web-browser
$ ../.venv/bin/python3 compare.py --web-listen 127.0.0.1:8080
- Find all Solidity contracts:
$ cd smart-contract-sanctuary/ethereum/contracts/mainnet/
# (contract_size_in_bytes) (contract_file_path)
$ find ./ -name "*.sol" -printf "%s %p\n" > all.txt
- Get ~1200 largest (by size) contracts:
$ cat all.txt | sort -rn | head -n 1200 | cut -d'/' -f3 | cut -d'_' -f1 > top.txt
- Get ~55,000 random contracts:
$ cat all.txt | cut -d'/' -f3 | cut -d'_' -f1 | sort -u | shuf | head -n 55000 > random.txt
- Get all Vyper contracts:
$ find ./ -type f -name '*.vy' | cut -d'/' -f3 | cut -d'_' -f1 > vyper.txt
- Download contract code and ABI:
$ poetry run python3 datasets/download.py --etherscan-api-key=CHANGE_ME --addrs-list=top.txt --out-dir=datasets/largest1k --limit=1000 --code-regexp='^0x(?!73).'
$ poetry run python3 datasets/download.py --etherscan-api-key=CHANGE_ME --addrs-list=random.txt --out-dir=datasets/random50k --limit=50000 --code-regexp='^0x(?!73).'
$ poetry run python3 datasets/download.py --etherscan-api-key=CHANGE_ME --addrs-list=vyper.txt --out-dir=datasets/vyper --code-regexp='^0x(?!73).'
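The `cut -d'/' -f3 | cut -d'_' -f1` pipelines in the steps above reduce each path to a contract address. The sketch below mirrors that logic in Python for a single line. It assumes paths of the form `./PREFIX/ADDRESS_Name.sol`; the function name `address_from_line` and the sample line are made up for illustration.

```python
def address_from_line(line: str) -> str:
    """Mimic `cut -d'/' -f3 | cut -d'_' -f1` for one input line.

    Assumes at least three '/'-separated fields, as in the
    `find` output shown above (an assumption about the layout).
    """
    third_field = line.rstrip("\n").split("/")[2]  # cut -d'/' -f3
    return third_field.split("_")[0]               # cut -d'_' -f1


# Made-up sample line in the "(size) (path)" format of all.txt
print(address_from_line("1234 ./0a/0a1b2c_Token.sol"))
```

The same function works on lines without the size prefix, since the prefix lands in the first field and is discarded.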
We use `--code-regexp='^0x(?!73).'` to:
- Skip contracts with empty code (`{"code": "0x",`) - these are self-destructed contracts.
- Skip contracts whose code starts with `0x73` (the `PUSH20` opcode). Compiled Solidity libraries begin with this opcode, and because non-storage structs are referred to by their fully qualified name, they are not yet supported by our reference Etherscan extractor (`providers/etherscan`). This issue may be fixed later.
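To see how this filter behaves, the regular expression can be tried on a few sample strings. This is a sketch only: how `datasets/download.py` applies the pattern is not shown here, so the use of `re.search` and the helper name `keep_contract` are assumptions, and the bytecode strings are shortened, made-up samples.

```python
import re

# Same pattern as passed to --code-regexp
CODE_RE = re.compile(r"^0x(?!73).")


def keep_contract(code: str) -> bool:
    """Return True if the bytecode string passes the filter.

    Hypothetical helper; assumes the pattern is applied with re.search.
    """
    return CODE_RE.search(code) is not None


print(keep_contract("0x"))            # empty code: '.' has nothing to match
print(keep_contract("0x73abcdef"))    # starts with PUSH20: rejected by (?!73)
print(keep_contract("0x6080604052"))  # ordinary contract prologue
```

The trailing `.` is what rejects empty code, and the negative lookahead `(?!73)` is what rejects library bytecode.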