See the cypher examples for cool screenshots.
This script runs through a Bitcoin blockchain and inserts it into a Neo4j graph database.
Important:
- The resulting Neo4j database is roughly 6x the size of the blockchain. If the blockchain is 170GB, your Neo4j database will be about 1,020GB, or roughly 1TB.
- It may take 60+ days to finish importing the entire blockchain. Instead of doing a bulk import of the entire blockchain, this script runs through each blk.dat file and inserts each block and transaction it encounters. So whilst it takes "a while" for an initial import, when it's complete it will continuously add new blocks as they arrive.
Nonetheless, you can still browse whatever is in the database whilst this script is running.
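To illustrate the file-by-file approach described above, here is a minimal sketch of walking the block records in one blk.dat file. It is written in Python for brevity and is not the script's own PHP code; the function name iter_raw_blocks is hypothetical. It assumes the usual Bitcoin Core on-disk layout: a 4-byte network magic, a 4-byte little-endian block size, then the raw block.

```python
import io
import struct

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # magic bytes that precede each block on mainnet
TESTNET_MAGIC = bytes.fromhex("0b110907")  # magic bytes used on testnet3


def iter_raw_blocks(stream, magic=MAINNET_MAGIC):
    """Yield (offset, raw_block) for each complete block record in a blk.dat stream."""
    while True:
        offset = stream.tell()
        header = stream.read(8)
        if len(header) < 8 or header[:4] != magic:
            return  # end of file, zero padding, or unexpected data
        (size,) = struct.unpack("<I", header[4:8])
        raw_block = stream.read(size)
        if len(raw_block) < size:
            return  # block not fully written yet; a reader could retry from `offset`
        yield offset, raw_block


# Demo on an in-memory stream holding two fake "blocks" (not real block data).
fake = (MAINNET_MAGIC + struct.pack("<I", 3) + b"abc"
        + MAINNET_MAGIC + struct.pack("<I", 2) + b"de")
for offset, raw in iter_raw_blocks(io.BytesIO(fake)):
    print(offset, len(raw))  # prints "0 3" then "11 2"
```

Remembering the offset of the last complete record is the kind of state that lets a reader stop and pick up again later from the same place.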
Only tested on Linux (Ubuntu 16.04).
This script makes use of the following software:
- Bitcoin Core - This downloads the blockchain and stores it in the blk.dat files that this script reads.
sudo add-apt-repository ppa:bitcoin/bitcoin
sudo apt update
sudo apt install bitcoind
- Neo4j - The graph database that the blockchain is inserted into. Neo4j needs Java 8, which is installed first.
sudo add-apt-repository ppa:webupd8team/java
sudo apt update
sudo apt install oracle-java8-installer
wget -O - https://debian.neo4j.org/neotechnology.gpg.key | sudo apt-key add -
echo 'deb http://debian.neo4j.org/repo stable/' >/tmp/neo4j.list
sudo mv /tmp/neo4j.list /etc/apt/sources.list.d
sudo apt update
sudo apt install neo4j
- PHP 7.0+ - The main script and its library functions are written in PHP.
# The extra php7.0-* libraries are needed for this script to run.
sudo apt install php7.0 php7.0-dev php7.0-gmp php7.0-curl php7.0-bcmath php7.0-mbstring
- Redis 3.2+ - This is used for storing the state of the import, so that the script can be stopped and started at any time.
sudo apt install build-essential
cd /usr/local/share
sudo wget http://download.redis.io/releases/redis-3.2.11.tar.gz
sudo tar -xvzf redis-3.2.11.tar.gz
sudo rm redis-3.2.11.tar.gz
cd redis-3.2.11
cd deps
sudo make geohash-int jemalloc lua hiredis linenoise
cd ..
sudo make
sudo make install
cd utils
sudo ./install_server.sh
1. neo4j-php-client (install via composer).
This is the driver that allows PHP to connect to your Neo4j database. I have included a composer.json file, so navigate to the project's home directory and install it with:
composer install
2. phpredis
This allows PHP to connect to Redis. These instructions should install the version needed for PHP 7 (which differs from the default installation instructions that come with phpredis, which are aimed at PHP 5).
# This is needed for phpize (used in a moment)
sudo apt install php7.0-dev
# Install phpredis
cd /usr/local/share
sudo wget https://github.com/phpredis/phpredis/archive/3.1.6.zip
sudo unzip 3.1.6.zip
sudo rm 3.1.6.zip
cd phpredis-3.1.6/
sudo phpize
sudo ./configure
sudo make
sudo make install
# Install mod
sudo touch /etc/php/7.0/mods-available/redis.ini
sudo bash -c "echo extension=redis.so > /etc/php/7.0/mods-available/redis.ini"
sudo ln -s /etc/php/7.0/mods-available/redis.ini /etc/php/7.0/cli/conf.d/20-redis.ini
The config.php file contains all the configuration settings. You probably only need to change:
- The location of your ~/.bitcoin/blocks folder.
- Your Neo4j username and password.
define("BLOCKS", '/home/user/.bitcoin/blocks'); // the location of the blk.dat files you want to read
define("TESTNET", false); // are you reading blk.dat files from Bitcoin's testnet?
define("NEO4J_USER", 'neo4j');
define("NEO4J_PASS", 'neo4j');
define("NEO4J_IP", 'localhost');
define("NEO4J_PORT", '7687'); // this is the port used for the bolt protocol
define("REDIS_IP", 'localhost');
define("REDIS_PORT", '6379');
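Before starting an import, it can help to confirm that the BLOCKS path really contains block files. The following Python sketch is illustrative only and is not part of the script; the function names are hypothetical. It lists the blk*.dat files in a directory in order, skipping the rev*.dat undo files that Bitcoin Core stores alongside them.

```python
import os
import re

# Block files are named blk00000.dat, blk00001.dat, and so on.
BLK_NAME = re.compile(r"^blk\d+\.dat$")


def select_blk_files(names):
    """Return only the blk*.dat names, sorted (zero padding keeps them in numeric order)."""
    return sorted(name for name in names if BLK_NAME.match(name))


def list_blk_files(blocks_dir):
    """List the block files found in blocks_dir, e.g. the BLOCKS path from config.php."""
    return select_blk_files(os.listdir(blocks_dir))


# Demo with a hand-written directory listing rather than a real folder.
print(select_blk_files(["rev00000.dat", "blk00001.dat", "blk00000.dat", "index"]))
```

An empty result from list_blk_files would suggest that the path is wrong, or that bitcoind has not downloaded any blocks yet.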
Make sure Neo4j is running (sudo service neo4j start), then start running the script with:
php main.php
This will start importing into Neo4j, printing out the results as it goes.
Here's an annotated explanation of the results
You can stop and restart the script at any time, as the script stores its position using Redis.
The script sets the following keys in Redis:
- bitcoin-to-neo4j - This stores the number of the current blk.dat file, and its position in that file.
- bitcoin-to-neo4j:orphans - This stores the blockhashes of orphan blocks. You see, the blocks in the blk.dat files are not stored in order (based on their height), so by saving blocks that we cannot calculate a height for yet (because we haven't encountered the block it builds upon), we are able to set the height later on.
- bitcoin-to-neo4j:tip - This is the height of the current longest chain we have got in Neo4j. It's not needed for the script to work, but it's useful for seeing the progress of the script.
When Redis is installed, you can look at each of these with:
redis-cli hgetall bitcoin-to-neo4j
redis-cli hgetall bitcoin-to-neo4j:orphans
redis-cli hgetall bitcoin-to-neo4j:tip
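The idea behind the orphans key can be sketched as follows. This is a simplified Python illustration, not the script's actual PHP logic: it keeps everything in plain dictionaries instead of Redis, and the function name assign_heights and the data layout are assumptions made for the example.

```python
def assign_heights(blocks, genesis_hash):
    """Assign heights to blocks that arrive out of order.

    blocks is an iterable of (block_hash, prev_hash) pairs in file order.
    Returns (heights, orphans): heights maps hash -> height, and orphans maps
    a missing parent hash -> the hashes still waiting for that parent.
    """
    heights = {}
    orphans = {}
    for block_hash, prev_hash in blocks:
        if block_hash == genesis_hash:
            heights[block_hash] = 0
        elif prev_hash in heights:
            heights[block_hash] = heights[prev_hash] + 1
        else:
            # Parent not seen yet: park the block until its parent turns up.
            orphans.setdefault(prev_hash, []).append(block_hash)
            continue
        # This block now has a height, so any blocks waiting on it can be resolved.
        pending = [block_hash]
        while pending:
            parent = pending.pop()
            for child in orphans.pop(parent, []):
                heights[child] = heights[parent] + 1
                pending.append(child)
    return heights, orphans


# Blocks "c" and "b" appear in the file before their parents do.
heights, orphans = assign_heights(
    [("g", None), ("c", "b"), ("b", "a"), ("a", "g")], genesis_hash="g"
)
print(max(heights.values()), orphans)  # prints "3 {}"
```

The largest height assigned so far plays the same role as the tip key: it is not needed for the bookkeeping, but it shows how far the import has got.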
Here are some example cypher queries, including some screenshots.
As with any indexing environment, to maximise performance, your source data (the Bitcoin blocks) and your target data (the graph database) should be on different physical storage volumes. This avoids reading from and writing to the same volume at the same time.
Once again, if you are familiar with large volume indexing, RAM is King.
First build used to test the Bitcoin to Neo4j solution:
- HP z820 Workstation Dual Xeon
- 10TB OS / DATA Volume
- 128 GB RAM
===============OLD=========================
- A really ****ing big SSD.
Other than that, I run this on my Thinkpad X220 (8GB RAM, 4x2.60GHz CPU) without any problems. It took about 2 weeks to import the full testnet blockchain (50GB total), but my laptop didn't explode.
However, if you want to help things along:
- Make sure you're using an SSD for fast write speeds.
- Give as much RAM to Neo4j as possible. This helps with looking up existing nodes in the database, which this script does continually as it merges new transactions on to old ones.
- Heap Size: I think a minimum 4GB does the trick.
- Page Cache: Whatever RAM you have got left over.
CPU isn't much of a factor in comparison to RAM and a fast disk.
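As a rough illustration of the heap and page cache advice above, this Python sketch splits a machine's RAM between the two. The 1GB reserved for the operating system is my own assumption, as are the function names. The setting names dbms.memory.heap.max_size and dbms.memory.pagecache.size are, as far as I know, the ones used in neo4j.conf by Neo4j 3.x, so check them against your version.

```python
def neo4j_memory_split(total_ram_gb, heap_gb=4, os_reserve_gb=1):
    """Give Neo4j a fixed heap and put whatever RAM is left over into the page cache."""
    pagecache_gb = total_ram_gb - heap_gb - os_reserve_gb
    if pagecache_gb < 1:
        raise ValueError("not enough RAM for a %dGB heap" % heap_gb)
    return {"heap_gb": heap_gb, "pagecache_gb": pagecache_gb}


def as_conf_lines(split):
    """Format the split as neo4j.conf lines (setting names assumed from Neo4j 3.x)."""
    return [
        "dbms.memory.heap.max_size=%dg" % split["heap_gb"],
        "dbms.memory.pagecache.size=%dg" % split["pagecache_gb"],
    ]


# An 8GB laptop: 4GB heap, 1GB left for the OS, 3GB page cache.
for line in as_conf_lines(neo4j_memory_split(8)):
    print(line)
```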
===============END OLD=========================
See Neo4j Performance for more details.
The database is constantly growing, but as of 17 May 2017 (blockchain height: 466,874, blockchain size: 114GB):
- Nodes: 1,587,199,550
- Relationships: 2,503,359,310
- Size: 625 GB
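The figures above give a measured ratio of about 5.5x between database size and blockchain size, which is in line with the "roughly 6x" rule of thumb. A small Python sketch for scaling that ratio to another blockchain size follows; it assumes the ratio stays constant as the chain grows, which may not hold, and the function name is hypothetical.

```python
# Figures from 17 May 2017, as listed above.
BLOCKCHAIN_GB = 114
NEO4J_GB = 625


def estimate_neo4j_size_gb(blockchain_gb):
    """Scale the measured database-to-blockchain ratio to a different blockchain size."""
    return blockchain_gb * NEO4J_GB / BLOCKCHAIN_GB


print(round(NEO4J_GB / BLOCKCHAIN_GB, 1))  # measured ratio: 5.5
print(round(estimate_neo4j_size_gb(170)))  # estimate for a 170GB blockchain: 932
```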
All of the blockchain data is stored in the graph; no data is left behind. If you really wanted to, you could convert the data back into binary as it is found in the raw blk.dat files.
For example, the "serialized" transaction data on my explorer is actually data from the graph converted back into its original format: Transaction: be56667fed4336efc08c6a1addfba0008169861af906e7f436ffcc86935d2b2e (click on "serialized" in the top-right)
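To give a feel for what converting back to the original format involves, here is a minimal Python sketch that serializes one transaction output. It is not the explorer's code. It assumes the output's value (in satoshis) and its locking script (as hex) are available as two fields; those field names are hypothetical. The byte layout is the standard Bitcoin one: an 8-byte little-endian value, a compact-size script length, then the script.

```python
import struct


def compact_size(n):
    """Encode an integer as a Bitcoin variable-length integer (CompactSize)."""
    if n < 0xfd:
        return struct.pack("<B", n)
    if n <= 0xffff:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xffffffff:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def serialize_output(value_satoshis, script_hex):
    """Serialize one output: value, script length, script."""
    script = bytes.fromhex(script_hex)
    return struct.pack("<Q", value_satoshis) + compact_size(len(script)) + script


# A 50 BTC output with a 25-byte pay-to-pubkey-hash script (all-zero hash, for illustration).
example_script = "76a914" + "00" * 20 + "88ac"
print(serialize_output(5000000000, example_script).hex())
```

A full transaction repeats the same idea for the version, inputs, outputs and locktime, so as long as each field is kept in the graph, the raw bytes can be rebuilt.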
This script imports the blockchain block by block, rather than in bulk, because I needed a script that would add blocks as they arrived.
A bulk import would involve writing another tool. I haven't tried.
The script is written in PHP because it's the language I knew best when I started this.
Or in other words, I'm not the king of programming, and PHP does the job.