Kruitwagen, L., Klaas, J., Baghaei Lakeh, A., & Fan, J.
A repo for building a network model of the global fossil fuel supply chain using asset-level data. A project in support of the Oxford Martin Programme on the Post-Carbon Transition generously supported by Microsoft AI for Earth.
Climate change risks manifest in the real economy, with grave consequences for welfare of human populations and the biosphere; the economic returns of industrial sectors; and geopolitical stability. Understanding the diffusion of risks in real infrastructure networks is an urgent priority for delivering climate change mitigation, adaptation, and resiliency. The oil, gas, and coal supply chains are the most salient progenitors and inheritors of these environmental risks. We prepare a geospatial arrangement of the global oil, gas, and coal supply chains using open-source asset-level data. The resulting complex network has 6.09mn nodes and 15.70mn edges and is implemented in a graph database. With estimates of annual coal, gas, and oil demand in 13,229 global population centres and 8,165 global power stations, we use a minimum-cost flow method to estimate global asset-level energy flows. We develop a method for cross-validating and tuning our network flow simulation using aggregate country-level import and export statistics. We demonstrate two analyses of asset-level transition risk: a counter-factual demand shock scenario consistent with the IEA Sustainable Development Scenario; and supply shock scenarios developed by interdicting regionally-aggregated coal, oil, and gas supplies. Our contribution lies in the global scope of our asset-level supply chain and the novelty of our minimum-cost flow method. We conclude with a discussion of further research directions and make the graph database and supporting code publicly available.
The full paper is available here.
Figure 1: Global coal, oil, and gas asset-level data
Our work uses only open-source data in order to maximise accessibility and reporduceability. We build on the substantial efforts of public sector organisations to develop and publish open datasets of energy system infrastrucutre and activity, particularly the work of the Global Energy Monitor, the World Resources Institute ResourceWatch, the Global Oil and Gas Features Database, the European Commision Joint Research Council, OpenStreetMap.
Data must be downloaded and unzipped in a folder data/
in the main directory.
On a fresh linux install you will require the following:
sudo apt-get install python3-dev build-essential libspatialindex-dev openjdk-8-jre
We use conda for environment management. Create a new environment:
conda create -n ffsc python=3.7
Activate your conda environment:
conda activate ffsc
Clone and change directory into this repo:
git clone https://github.com/Lkruitwagen/global-fossil-fuel-supply-chain.git
cd global-fossil-fuel-supply-chain
Install pip package manager to the environment if it isn't already:
conda install pip
Install the project packages. Conda is used to install geospatial packages with c binary dependancies:
pip install -r requirements.txt
conda install -c conda-forge --file conda_reqs.txt
Save the environment variables we need in activation and deactivation scripts in the conda environment. Follow the conda instructions for your os, and adapt the following:
cd CONDA_PREFIX
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
touch ./etc/conda/activate.d/env_vars.sh
touch ./etc/conda/deactivate.d/env_vars.sh
edit ./etc/conda/activate.d/env_vars.sh
as follows:
#!/bin/sh
export USE_PYGEOS=1
export PYTHONPATH=$PWD
export NUMEXPR_MAX_THREADS=24 # or something suiting your resources
edit ./etc/conda/deactivate.d/env_vars.sh
as follows:
#!/bin/sh
unset PYTHONPATH
unset USE_PYGEOS
unset NUMEXPR_MAX_THREADS
Save and close both files.
Clone the DirectedLouvain repository
git clone https://github.com/nicolasdugue/DirectedLouvain.git
Enter the repository and run the MakeFile
cd DirectedLouvain
make
We use PyPy to significantly speed-up dijkstra minimum-cost path search. PyPy uses a just-in-time compiler to significantly speed-up Python code execution. Download the correct version of PyPy to bin/
.
- Note if ssh tunnelling you will need to redirect port 8678 and 8888
- Install neo4j server. We use Neo4j 3.5 in our experiments, but everything described here should also work with Neo4j 4.0
The main pipeline entrypoint is cli.py
. The commands available to cli.py
can be found with:
python cli.py --help
The following top-level commands are available: network-assembly
,solve-flow
,shock-analysis
, and visualisation
.
The pipeline makes extensive use of Kedro, which is a computational graph manager. Kedro uses of a data catalog, which is here: conf/base/catalog.yml. The catalog can also be used in other scripts and environments:
from kedro.io import DataCatalog
import yaml
catalog = DataCatalog(yaml.load(open('/path/to/conf/base/catalog.yaml','r'),Loader=yaml.SafeLoader))
dataframe = catalog.load('<dataframe-key>')
Each pipeline is a computational graph with computation nodes and dataset edges, creating a version-controlled reproduceable workflow from raw data through to analysis. Computation nodes are labelled with tags so the pipeline can be run one-node-at-a-time, if desired. The computational graph for the network-assembly
pipeline is shown in the figure below. Each pipeline can be run using:
python cli.py <pipeline-command> --tags=<list,of,comma-separated,tags>
Tags are nested such that top-level tags will execute a while pipeline step, and nested tags will execute substeps for certain asset types or energy carriers.
Figure A-1: Network assembly computational graph
This pipeline assembles the asset-level infrastructure network. It can be run with python cli.py network-assembly
. A comma-separated list of --tags
can also be passed. Passing --help
produces the following documentation:
Usage: cli.py network-assembly [OPTIONS]
Assemble the basic network from asset-level data. See
ffsc.pipeline.pipeline.py for detailed tags.
AVAILABLE TOP-LEVEL TAGS:
-------------------------
--preprocess : Preprocessing and homogenisation of all raw asset data.
--prep : Geospatial preparation operations on all data.
--sjoin : Spatial join operations matching linear and point assets.
--flmile : First- and last-mile matching operations to gapfill missing data.
--explode : Geospatial post-processing of joining and matching.
--simplify : Simplification operations to reduce the number of nodes.
Options:
--tags TEXT Optionally specify any individual node tags you want to run in
a comma-separated list.
--help Show this message and exit.
This pipeline calculates asset-level energy flow through the network, from energy sources (i.e. coalmines, and oil and gas fields and wells) through to energy sinks (population centres and powerstations). It can be run with python cli.py solve-flow
. A comma-separated list of --tags
can also be passed. Passing --help
produces the following documentation:
Usage: cli.py solve-flow [OPTIONS]
Assemble the basic network from asset-level data. See
ffsc.flow.flow_pipeline.py, ffsc.communities.community_pipeline.py, and
ffsc.interdiction.interdiction_pipeline.py for detailed tags.
AVAILABLE TOP-LEVEL TAGS:
-------------------------
--flow_edges : Prepare network edges dataframe.
--flow_nodes : Prepare network nodes dataframe.
--flow_nx : Test network connectivity and prepared for flow calculations.
--community-prep : Prepare to add communities to network.
--community-run : Run community detection algorithm.
--community-post-nodes : Post-process community detection onto node dataframe.
--community-post-edges : Post-process community detection onto edge dataframe.
--dijkstra-pickle : Pickle edges in preparation for dijkstra mincost path.
--dijkstra-paths : Run async dijkstra mincost path.
--dijkstra-adj : Post-process dijkstra to mincost adjacency matrix.
--dijkstra-flow : Solve flow using iterative cost-scaling.
Options:
--tags TEXT Optionally specify any individual node tags you want to run
--help Show this message and exit.
The energy infrastructure network can be treated with demand and supply shocks. Flow can be recalculated to see the effect of the shocks on asset-level energy flow and costs. The pipeline can be run with python cli.py shock-analysis
. A comma-separated list of --tags
can also be passed. Passing --help
produces the following documentation:
Usage: cli.py shock-analysis [OPTIONS]
Prepare demand and supply shock analysis. See
ffsc.interdiction.interdiction_pipeline.py for detailed tags.
AVAILABLE TOP-LEVEL TAGS:
-------------------------
--sds_counterfactual : Prepare Sustainable Development Scenario demand shock analysis.
--supply-interdiction : Prepare supply interdiction shock analysis.
--post-supply-interdiction : Post-process supply interdiction shock analysis.
Options:
--tags TEXT Optionally specify any individual node tags you want to run
--help Show this message and exit.
The asset-level data can be visualised on a world map, and scaled and colored by the amount of energy flow passing through each asset. The pipeline can be run with python cli.py visualisation
. A comma-separated list of --tags
can also be passed. Passing --help
produces the following documentation:
Usage: cli.py visualisation [OPTIONS]
Prepare visualisation of assets, flow, and demand shock counterfactual.
See ffsc.visualisation.visualise_pipeline.py for detailed tags.
AVAILABLE TOP-LEVEL TAGS:
-------------------------
--visualise-assets : Visualise all assets.
--visualise-iso2 : Add iso2 country codes to dataframes.
--visualise-trade-prep : Prepare trade dataframes for comparison.
--visualise-trade : Visualise actual trade and production vs simulated.
--visualise-flow : Visualise energy flow.
--compare-flow : Compare energy flow to SDS demand shock energy flow.
Options:
--tags TEXT Optionally specify any individual node tags you want to run
--help Show this message and exit.
- Make sure Neo4j is shut down. The installer might start up neo4j under a different user (e.g. Neo4j).
In this case, you might want to find the process under which Neo4j runs using
sudo ps -a | grep neo4j
. Find the PID of the process and kill it usingsudo kill
. - As Neo4j's files may be restricted, you want to do the next steps as root.
- Delete Neo4j's data folder from the old database. On Linux, this is stored under
/var/lib/neo4j/data
. - Import the data by executing the
bin/import.sh
script, which you can find undersrc/neo4j_commands
in this repository. - After the import is complete, restart Neo4j using
neo4j start