About The Project

The Web Conference 2021 (WWW '21), April 19-23, 2021, Ljubljana, Slovenia

Lin Zhao, Sourav Sen Gupta, Arijit Khan, Robby Luo

Nanyang Technological University, Singapore

With a market capitalization of over 42 billion USD (October 2020), Ethereum is the largest public blockchain that supports smart contracts. Recent works have modeled transactions, tokens, and other interactions in the Ethereum blockchain as static graphs and derived new observations and insights from graph analysis. Surprisingly, the evolution and temporal properties of these networks have been studied far less. In this paper, we investigate the evolutionary nature of Ethereum interaction networks from a temporal graph perspective. We study the growth rate and growth model of four Ethereum blockchain networks, and the active lifespan and update rate of high-degree vertices. We detect anomalies based on temporal changes in global network properties, and forecast the survival of network communities in succeeding months using the relevant graph features and machine learning models.

Data Extraction

Due to size limitations, we do not upload the dataset; instead, we describe the extraction method used to obtain the data. Each folder also contains a sample arc list and the corresponding address hash table, split by year and, for ContractNet only, by month.

Google BigQuery

  1. Sign up for a Google Cloud Platform account and log in.
  2. Create a bucket to store your files.
  3. Go to BigQuery and find the data set 'ethereum_blockchain'.
  4. Select the table with the desired timestamp and choose 'Export to GCS'.
  5. Select the GCS location (the bucket created in step 2).
  6. If CSV is preferred, use a destination of the form //file*.csv (e.g. tmpbucket/blocks/blocks*.csv).
  7. The * numbers the output files, because exporting a table splits the data into multiple files.

Replace .csv with .txt or .json as preferred.

Install gsutil:

    pip install gsutil

To download an entire folder:

    gsutil -m cp -r gs://bucketname/folder-name local-location

To download multiple files:

    gsutil -m cp -r gs://bucketname/folder-name/filename* local-location
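
Once the exported files are on disk, they can be read back as one stream of rows. The following is a minimal sketch, not one of the repository's scripts: the function name read_shards is hypothetical, and it assumes each exported CSV file carries its own header row.

```python
import csv
import glob


def read_shards(pattern):
    """Yield every row (as a dict) from all CSV shards matching a glob pattern.

    Assumption: each shard starts with its own header row, so each file can
    be parsed independently with csv.DictReader.
    """
    # Sort the paths so the numbered shards are read in a stable order.
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                yield row
```

For example, list(read_shards("local-location/blocks*.csv")) would collect the rows of all downloaded shards whose names start with "blocks".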


Kaggle can be used to preview the data table columns.


Please refer to the GitHub page for more details.

Table Explanation

We extract all relevant data from the dataset on Google Cloud up to 2019-12-31 23:59:45 UTC, which covers all blocks from genesis (#0) up to #9193265. The blockchain data is stored in seven tables, of which we use the contracts, token transfers, traces, and transactions tables for our temporal analysis.

  • The traces table stores the executions of all recorded messages and transactions (successful ones) in the Ethereum blockchain. It is the most comprehensive table for analysis.

  • The transactions table contains all transaction details such as source and target address, and amount of ether transferred.

  • The contracts table contains all contract accounts, their bytecode, and other properties such as block_timestamp, block_number, and token type (e.g., ERC721, ERC20).

  • The token transfers table focuses on all transactions with tokens from one 20-byte address to another 20-byte address on the blockchain.

Scripts Explanation

All scripts are written in Python 3.7. To run a script, launch a Python tool such as Anaconda, or run "python" directly.

Network Extraction

Link to the folder

The folder contains four subfolders, one each for transactionNet, traceNet, tokenNet and contractNet, covering arc list and account extraction.

For transactionNet, traceNet, tokenNet

  1. Annual graph

    The raw data obtained from Google BigQuery is organized by year. The scripts named "","" and "" process the annual raw data to form the annual arc list and the corresponding hash table.

  2. Result

    Due to the file size limit on GitHub, only the Year2015 annual arc list and hash table are uploaded as a reference.
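
The arc list and hash table construction described above can be sketched as follows. This is an illustration under stated assumptions, not the repository's script: the function name build_arc_list is hypothetical, the column names from_address and to_address are assumed, indices are assumed to start at 0 in order of first appearance, and rows with a missing endpoint are skipped.

```python
def build_arc_list(rows, source_key="from_address", target_key="to_address"):
    """Build an address hash table and an index-based arc list.

    rows is any iterable of dicts (for example, rows from csv.DictReader).
    Returns (address_index, arcs): address_index maps each address to an
    integer, and arcs is a list of (source_index, target_index) pairs.
    """
    address_index = {}
    arcs = []
    for row in rows:
        source = row.get(source_key)
        target = row.get(target_key)
        if not source or not target:
            # Assumption: rows without both endpoints do not form an arc.
            continue
        # setdefault assigns the next free index on first sight of an address.
        u = address_index.setdefault(source, len(address_index))
        v = address_index.setdefault(target, len(address_index))
        arcs.append((u, v))
    return address_index, arcs
```

Storing integer indices in the arc list and the addresses once in the hash table keeps the arc list small, which matters when the addresses are long hexadecimal strings.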

For contractNet

  1. Annual graph

    The raw data obtained from Google BigQuery is organized by year. The script named "" processes the annual raw data to form the annual arc list and the corresponding hash table.

  2. Monthly graph

    The script named "" not only forms the arc list and hash table but also partitions the arc list by month, by matching each arc with the timestamp in the raw data.

  3. Result

    Due to the file size limit on GitHub, only the ContractNet Year2015 annual arc list and hash table are uploaded as a reference, in the folders "contractNet_address_hash" and "contractNet_edgelist_example".
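
The monthly partition can be illustrated with a short sketch. It is not the repository's script: the function name partition_by_month and the column names block_timestamp, from_address and to_address are assumptions, as is the timestamp layout "YYYY-MM-DD HH:MM:SS" (optionally followed by a suffix such as " UTC").

```python
from collections import defaultdict
from datetime import datetime


def partition_by_month(rows, timestamp_key="block_timestamp",
                       source_key="from_address", target_key="to_address"):
    """Group arcs by the (year, month) of their timestamp.

    Returns a dict mapping (year, month) to a list of (source, target) arcs.
    """
    monthly = defaultdict(list)
    for row in rows:
        # Keep only the first 19 characters so a trailing " UTC" is ignored.
        stamp = datetime.strptime(row[timestamp_key][:19], "%Y-%m-%d %H:%M:%S")
        monthly[(stamp.year, stamp.month)].append(
            (row[source_key], row[target_key])
        )
    return dict(monthly)
```

Each value of the returned dict is the arc list of one month and can be written to its own file.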

Graph Analysis

Link to the folder

  1. Find number of vertices, arcs and self-loops of each network

    An example that counts the vertices, arcs and self-loops of contractNet_2019 for Figures 2 and 3.

    Find common accounts in consecutive years

    An example that analyzes contractNet for Figure 2.

    Find common accounts in consecutive years

    An example that analyzes contractNet for Figure 3.

  2. Analyze network reciprocity, assortativity, connectedComponent, kcore properties

    Analyze network pathLength, radius, diameter

    Analyze network triangle, transitivity, aveClusteringCoeff

    Analyze network weakly connected components and strongly connected components

    An example of extracting the network properties for Sections 4, 5 and 6

  3. Extract the degree of each vertex

    An example that calculates the degree, in-degree and out-degree of each vertex in the network. The input is the network edge list; the output is a CSV file with each account and its corresponding results.

    Find the top-10 degree accounts in tokenNet

    Reads the vertex degree distribution file (from the previous step) and lists the top 10 values for each year for Tables 6 and 7.
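
The simpler counts in this section can be sketched in plain Python. The functions below are illustrative and are not the repository's scripts; all names are hypothetical. They assume that every row of the arc list counts as one arc (parallel arcs are not merged) and that a self-loop contributes to both the in-degree and the out-degree of its vertex.

```python
from collections import Counter


def basic_counts(arcs):
    """Return (number of vertices, number of arcs, number of self-loops)."""
    vertices = {vertex for arc in arcs for vertex in arc}
    self_loops = sum(1 for u, v in arcs if u == v)
    return len(vertices), len(arcs), self_loops


def degree_table(arcs):
    """Return (in_degree, out_degree, total_degree) as Counter objects."""
    out_degree = Counter(u for u, _ in arcs)
    in_degree = Counter(v for _, v in arcs)
    return in_degree, out_degree, in_degree + out_degree


def top_k(degree, k=10):
    """Return the k (vertex, degree) pairs with the highest degree."""
    return degree.most_common(k)


def common_accounts(arcs_a, arcs_b):
    """Return the vertices that appear in both arc lists."""
    vertices_a = {vertex for arc in arcs_a for vertex in arc}
    vertices_b = {vertex for arc in arcs_b for vertex in arc}
    return vertices_a & vertices_b
```

Calling common_accounts on the arc lists of two consecutive years gives the accounts active in both, and top_k on the total degree gives a per-year ranking of the kind reported in the tables.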

Community detection and prediction

  1. Community detection

    There are 3 steps in community detection:

    Step 1: Identify communities using the multi-level algorithm

    Note: the Python igraph library outputs the community arc lists using vertex indices instead of the real node values. To perform the matching in the next step, the values (the annual-basis indices) must be attached to each node.

    Step 2: Match communities in the 3-month dataset and the 1-month dataset

    This script uses the VF2 algorithm for subisomorphism matching. The matching considers not only the graph shape but also the node values.

    Step 3: Extract properties for each community

    The script extracts local and global properties of each community to serve as training/testing data.

  2. Community prediction

    Link to the folder


    Scripts and are used for the prediction of each time period. The scripts are generalized: they only require the class 1 and class 0 training features and labels as input. A random selection function in the script balances class 1 and class 0; it needs to be adjusted based on the input data.


    Scripts and random_forest_combine_allMonth.py are for complete-year prediction. The training data are combined prior to being input into the scripts, so these scripts are otherwise almost the same as the individual ones.
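
Two of the steps above lend themselves to short sketches: attaching the real node values to index-based communities (the note in Step 1) and balancing class 1 against class 0 before training. Both functions are hypothetical illustrations, not the repository's code. The first assumes a membership list in which entry i is the community id of vertex i, such as the membership attribute of an igraph VertexClustering. The second assumes that balancing means randomly undersampling the larger class, which may differ from the selection used in the scripts.

```python
import random
from collections import defaultdict


def label_communities(membership, labels, arcs):
    """Return the arc list of each community, using real node values.

    membership[i] is the community id of vertex i, labels[i] is its real
    value, and arcs holds (source_index, target_index) pairs. Arcs that
    cross two communities are dropped, so a community without internal
    arcs does not appear in the result.
    """
    communities = defaultdict(list)
    for u, v in arcs:
        if membership[u] == membership[v]:
            communities[membership[u]].append((labels[u], labels[v]))
    return dict(communities)


def balance_classes(class1, class0, seed=0):
    """Randomly undersample the larger class so both classes have equal size.

    class1 and class0 are lists of feature vectors. Returns (features,
    labels) with the class 1 samples first.
    """
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    size = min(len(class1), len(class0))
    features = rng.sample(class1, size) + rng.sample(class0, size)
    labels = [1] * size + [0] * size
    return features, labels
```

The balanced features and labels can then be passed to any classifier; the seed parameter only makes the random selection reproducible between runs.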

Useful links

  1. GitHub Ethereum
  2. Google BigQuery
  3. Kaggle


  1. Evgeny Medvedev and the D5 team, "Ethereum ETL," 2018.
  2. Ethereum Blockchain, 2020.

