Import the Bitcoin blockchain in to a Neo4j graph database.

monsirto/bitcoin-to-neo4j

 
 


Bitcoin to Neo4j

See the cypher examples for cool screenshots.

Summary.

This script runs through a Bitcoin blockchain and inserts it into a Neo4j graph database.

Important:

  • The resulting Neo4j database is roughly 6x the size of the blockchain. If the blockchain is 170GB, expect a Neo4j database of around 1,000GB (about 1TB).

  • It may take 60+ days to import the entire blockchain. Rather than bulk-importing the whole blockchain, this script runs through each blk.dat file and inserts each block and transaction it encounters. So whilst the initial import takes "a while", once it is complete the script will continuously add new blocks as they arrive.

Nonetheless, you can still browse whatever is in the database whilst this script is running.
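
To illustrate what "running through a blk.dat file" involves, here is a minimal sketch of splitting the raw file contents into blocks. It is not the code used by this script: the function name readBlocks() is hypothetical, and it assumes the standard mainnet framing (4 magic bytes, a 4-byte little-endian length, then the block itself). Testnet files use different magic bytes.

```php
<?php
// Sketch only: split the raw contents of a blk.dat file into blocks.
// Assumes mainnet magic bytes; readBlocks() is a hypothetical helper.
const MAINNET_MAGIC = "\xf9\xbe\xb4\xd9";

function readBlocks(string $data): array
{
    $blocks = [];
    $pos = 0;
    $len = strlen($data);
    while ($pos + 8 <= $len) {
        if (substr($data, $pos, 4) !== MAINNET_MAGIC) {
            // Not at a record boundary (for example zero padding): advance one byte.
            $pos++;
            continue;
        }
        // The 4-byte little-endian block length follows the magic bytes.
        $size = unpack('V', substr($data, $pos + 4, 4))[1];
        if ($pos + 8 + $size > $len) {
            break; // Partial block: the node may still be writing this file.
        }
        $blocks[] = ['offset' => $pos, 'raw' => substr($data, $pos + 8, $size)];
        $pos += 8 + $size;
    }
    return $blocks;
}

// Two fake "blocks" to show the framing; real blocks start with an 80-byte header.
$file = MAINNET_MAGIC . pack('V', 3) . "abc" . MAINNET_MAGIC . pack('V', 2) . "de";
foreach (readBlocks($file) as $b) {
    echo $b['offset'], ' ', strlen($b['raw']), " bytes\n";
}
```

Keeping the byte offset of each record is what makes it possible to stop and resume part-way through a file.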

Install.

Only tested on Linux (Ubuntu 16.04).

Software.

This script makes use of the following software:

  1. Bitcoin Core
sudo add-apt-repository ppa:bitcoin/bitcoin
sudo apt update
sudo apt install bitcoind
  2. Java 8
sudo add-apt-repository ppa:webupd8team/java
sudo apt update
sudo apt install oracle-java8-installer
  3. Neo4j 3.0+
wget -O - https://debian.neo4j.org/neotechnology.gpg.key | sudo apt-key add -
echo 'deb http://debian.neo4j.org/repo stable/' >/tmp/neo4j.list
sudo mv /tmp/neo4j.list /etc/apt/sources.list.d
sudo apt update
sudo apt install neo4j
  4. PHP 7.0+ - The main script and its library functions are written in PHP.
# The extra php7.0-* libraries are needed for this script to run.
sudo apt install php7.0 php7.0-dev php7.0-gmp php7.0-curl php7.0-bcmath php7.0-mbstring
  5. Redis 3.2+ - This is used for storing the state of the import, so that the script can be stopped and started at any time.
sudo apt install build-essential

cd /usr/local/share
sudo wget http://download.redis.io/releases/redis-3.2.11.tar.gz
sudo tar -xvzf redis-3.2.11.tar.gz
sudo rm redis-3.2.11.tar.gz

cd redis-3.2.11
cd deps
sudo make geohash-int jemalloc lua hiredis linenoise
cd ..
sudo make
sudo make install

cd utils
sudo ./install_server.sh

Dependencies.

1. neo4j-php-client (install via composer).

This is the driver that allows PHP to connect to your Neo4j database. I have included a composer.json file, so navigate to the project's home directory and install it with:

composer install

2. phpredis

This allows PHP to connect to Redis. These instructions should install the version needed for PHP 7, which differs from the default installation instructions that come with phpredis (those are aimed at PHP 5).

# This is needed for phpize (used in a moment)
sudo apt install php7.0-dev

# Install phpredis
cd /usr/local/share
sudo wget https://github.com/phpredis/phpredis/archive/3.1.6.zip
sudo unzip 3.1.6.zip
sudo rm 3.1.6.zip

cd phpredis-3.1.6/
sudo phpize
sudo ./configure
sudo make
sudo make install

# Install mod
sudo touch /etc/php/7.0/mods-available/redis.ini
sudo bash -c "echo extension=redis.so > /etc/php/7.0/mods-available/redis.ini"
sudo ln -s /etc/php/7.0/mods-available/redis.ini /etc/php/7.0/cli/conf.d/20-redis.ini

Config.

The config.php file contains all the configuration settings. You probably only need to change:

  1. The location of your ~/.bitcoin/blocks folder
  2. Your Neo4j username and password.
define("BLOCKS", '/home/user/.bitcoin/blocks'); // the location of the blk.dat files you want to read
define("TESTNET", false); // are you reading blk.dat files from Bitcoin's testnet?

define("NEO4J_USER", 'neo4j');
define("NEO4J_PASS", 'neo4j');
define("NEO4J_IP", 'localhost');    
define("NEO4J_PORT", '7687'); // this is the port used for the bolt protocol

define("REDIS_IP", 'localhost');    
define("REDIS_PORT", '6379');

Run.

Make sure Neo4j is running (sudo service neo4j start), then start running the script with:

php main.php

This will start importing into Neo4j, printing out the results as it goes.

Here's an annotated explanation of the results

Tip:

You can stop and restart the script at any time, as the script stores its position using Redis.

The script sets the following keys in Redis:

  • bitcoin-to-neo4j - This stores the number of the current blk.dat file, and its position in that file.
  • bitcoin-to-neo4j:orphans - This stores the blockhashes of orphan blocks. The blocks in the blk.dat files are not stored in order of height, so by saving blocks that we cannot calculate a height for yet (because we haven't encountered the block they build upon), we are able to set the height later on.
  • bitcoin-to-neo4j:tip - This is the height of the current longest chain we have got in Neo4j. It's not needed for the script to work, but it's useful for seeing the progress of the script.
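
The orphan handling described above can be sketched roughly as follows. This is an illustration rather than the script's actual code: plain PHP arrays stand in for Neo4j and Redis, the function name addBlock() is hypothetical, and keying the orphans by their parent's hash is an assumption made to keep the lookup simple.

```php
<?php
// Sketch only: assign heights to blocks that arrive out of order.
// $heights stands in for the graph, $orphans for the Redis orphans key.
function addBlock(string $hash, string $prev, array &$heights, array &$orphans)
{
    if (!isset($heights[$prev])) {
        // Parent not seen yet: park this block under its parent's hash.
        $orphans[$prev][] = $hash;
        return;
    }
    $heights[$hash] = $heights[$prev] + 1;

    // Any orphans waiting on this block can now be given a height too.
    $waiting = $orphans[$hash] ?? [];
    unset($orphans[$hash]);
    foreach ($waiting as $child) {
        addBlock($child, $hash, $heights, $orphans);
    }
}

$heights = ['A' => 0]; // "A" plays the role of the genesis block
$orphans = [];
addBlock('C', 'B', $heights, $orphans); // B is unknown, so C is parked
addBlock('B', 'A', $heights, $orphans); // B gets height 1, then C gets height 2
print_r($heights);
```

In this toy version the tip is simply the largest value in $heights.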

When Redis is installed, you can look at each of these with:

redis-cli hgetall bitcoin-to-neo4j
redis-cli hgetall bitcoin-to-neo4j:orphans
redis-cli hgetall bitcoin-to-neo4j:tip

FAQ

How can I query this database?

Here are some example cypher queries, including some screenshots.

What are the hardware requirements?

As with any indexing environment, to maximise performance, your source location/data (Bitcoin Blocks) and your target location/data (Graph DB) should be on different physical storage volumes. This will limit reading and writing to the same volume.

Once again, if you are familiar with large volume indexing, RAM is King.

Monsirto's Build - Feb 2018

First build to test Bitcoin to Neo4j solution.

  • HP z820 Workstation Dual Xeon
  • 10TB OS / DATA Volume
  • 128 GB RAM

===============OLD=========================

  1. A really ****ing big SSD.

Other than that, I run this on my Thinkpad X220 (8GB Ram, 4x2.60GHz CPU) without any problems. It took about 2 weeks to import the full testnet blockchain (50GB total), but my laptop didn't explode.

However, if you want to help things along:

  • Make sure you're using an SSD for fast write speeds.
  • Give as much RAM to Neo4j as possible. This helps with looking up existing nodes in the database, which this script does continually as it merges new transactions on to old ones.
    • Heap Size: I think a minimum of 4GB does the trick.
    • Page Cache: Whatever RAM you have got left over.

CPU isn't much of a factor in comparison to RAM and a fast disk.

===============END OLD=========================

See Neo4j Performance for more details.

How big is this graph database?

It's constantly growing, but as of 17 May 2017 (blockchain height: 466,874, blockchain size: 114GB):

  • Nodes: 1,587,199,550
  • Relationships: 2,503,359,310
  • Size: 625 GB
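
As a quick check on these figures, the arithmetic below works out the database-to-blockchain ratio (about 5.5x, in line with the "roughly 6x" rule of thumb) and projects a database size for a larger blockchain. The helper estimateDatabaseGb() is hypothetical, and the projection assumes the ratio stays constant as the chain grows, which may not hold.

```php
<?php
// Figures quoted above for 17 May 2017.
$blockchainGb = 114;
$databaseGb   = 625;
$nodes        = 1587199550;
$rels         = 2503359310;

$ratio = $databaseGb / $blockchainGb;
printf("Database is %.1fx the blockchain size\n", $ratio);
printf("Relationships per node: %.2f\n", $rels / $nodes);

// Hypothetical projection: assumes the ratio stays constant over time.
function estimateDatabaseGb(float $blockchainGb, float $ratio): float
{
    return $blockchainGb * $ratio;
}

printf("170GB blockchain -> about %.0fGB\n", estimateDatabaseGb(170, $ratio));
```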

Does this import the entire blockchain?

Yes, no data is left behind. If you really wanted to, you could convert the data back into binary as it is found in the raw blk.dat files.

For example, the "serialized" transaction data on my explorer is actually data from the graph converted back into its original format: Transaction: be56667fed4336efc08c6a1addfba0008169861af906e7f436ffcc86935d2b2e (click on "serialized" in the top-right)
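
To give a flavour of what converting back to binary means, here is a small sketch that rebuilds an 80-byte block header from individual field values and hashes it. A block header is used instead of a transaction because it is shorter. The array keys are hypothetical and are not the property names used in this graph; the values are those of the Bitcoin genesis block.

```php
<?php
// Sketch only: rebuild the 80-byte block header from its fields and hash it.
// The array keys are hypothetical; the values are from the genesis block.
function serializeHeader(array $h): string
{
    return pack('V', $h['version'])              // 4 bytes, little-endian
        . strrev(hex2bin($h['prevblock']))       // hashes are displayed byte-reversed
        . strrev(hex2bin($h['merkleroot']))
        . pack('V', $h['time'])
        . pack('V', $h['bits'])
        . pack('V', $h['nonce']);
}

function blockHash(string $header): string
{
    // Double SHA-256, then reverse the bytes for the usual display order.
    return bin2hex(strrev(hash('sha256', hash('sha256', $header, true), true)));
}

$genesis = [
    'version'    => 1,
    'prevblock'  => str_repeat('00', 32),
    'merkleroot' => '4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b',
    'time'       => 1231006505,
    'bits'       => 0x1d00ffff,
    'nonce'      => 2083236893,
];

$raw = serializeHeader($genesis);
echo strlen($raw), " bytes\n";
echo blockHash($raw), "\n";
```

If the fields are stored without loss, the rebuilt header hashes to the same block hash as the original bytes, which is a handy way to check a round trip.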

Why doesn't this use Neo4j's Bulk Import Tool?

Because I needed a script that would add blocks as they arrived.

It would involve writing another tool for a bulk import. I haven't tried.

Why is this written in PHP?

Because it's the language I knew best when I started this.

Or in other words, I'm not the king of programming, and PHP does the job.
