The Python dependencies are listed in `env.yaml` as a conda environment specification. You might need to adapt the `cudatoolkit=11.3` version in the YAML file to match your CUDA setup.
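For example, the pinned line in `env.yaml` should look something like this (the surrounding structure here is an assumption; only the version number needs changing):

```yaml
dependencies:
  - cudatoolkit=11.3  # change to the CUDA version supported by your GPU driver
```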
Create and activate the environment:

```bash
conda env create -f env.yaml -p ./conda-env
conda activate ./conda-env
```
We need to install `scikit-sparse` separately because it is problematic to install with conda on Windows systems. For updated instructions, see the package's README and this thread.
On Windows:

```powershell
$env:SUITESPARSE_INCLUDE_DIR="$env:CONDA_PREFIX/Library/include/suitesparse"
$env:SUITESPARSE_LIBRARY_DIR="$env:CONDA_PREFIX/Library/lib"
pip install scikit-sparse
```
On Linux (not tested on macOS):

```bash
conda install -c conda-forge scikit-sparse
```
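Either way, a quick way to verify that the build links against SuiteSparse is to run a small Cholesky factorization. This is just a sanity-check sketch, not part of the pipeline:

```python
# Verify scikit-sparse by factorizing a trivially positive-definite matrix
import numpy as np
import scipy.sparse as sp
from sksparse.cholmod import cholesky

A = sp.csc_matrix(np.eye(3))   # identity matrix is symmetric positive definite
factor = cholesky(A)           # CHOLMOD factorization
print(factor(np.ones(3)))      # solves A x = b; expect [1. 1. 1.]
```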
Downloading and pre-processing the atmospheric and air pollution data also requires R. It wasn't possible to create a reproducible R conda environment, since the packages required for this task conflict with conda. We use Docker instead (or you can install the required packages manually by examining the `Dockerfile`). Build the image with:

```bash
docker build -t r-env .
```
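To confirm the image built correctly before running the pipeline, the following should print the R session info (assuming `Rscript` is on the image's PATH):

```bash
docker run --rm r-env Rscript -e 'sessionInfo()'
```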
Generate the simulated datasets:

```bash
python generate_simdata.py --verbose --output_dir=simulations/basic --ksize=13 --nsims=10
python generate_simdata.py --verbose --output_dir=simulations/nonlinear --ksize=13 --nsims=10 --nonlinear
```
For convenience, the simulation outputs are also available on the `data_branch` branch of the repository.
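A minimal sketch of pulling those outputs without regenerating them, assuming they live under `simulations/` on that branch:

```bash
git fetch origin data_branch
git checkout origin/data_branch -- simulations/
```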
Train the models on each simulated dataset:

```bash
for i in {0..9}; do python train_simstudy.py --sim="$i" --dir=simulations/basic --output=results/simstudy/basic --epochs 20000; done
for i in {0..9}; do python train_simstudy.py --sim="$i" --dir=simulations/nonlinear --output=results/simstudy/nonlinear --epochs 20000; done
for i in {0..9}; do python train_simstudy.py --sparse --sim="$i" --dir=simulations/basic --output=results/simstudy/basic_sparse --epochs 10000; done
for i in {0..9}; do python train_simstudy.py --sparse --sim="$i" --dir=simulations/nonlinear --output=results/simstudy/nonlinear_sparse --epochs 10000; done
```
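The loops above run sequentially. If you have GNU parallel installed, an equivalent concurrent version of the first loop looks like this (a convenience sketch, not part of the repository):

```bash
# Run the ten basic simulations concurrently; {} is replaced by each value of seq
parallel python train_simstudy.py --sim={} --dir=simulations/basic \
  --output=results/simstudy/basic --epochs 20000 ::: $(seq 0 9)
```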
Then check out the notebooks:

- `notebooks/explore_simdata.ipynb`
- `notebooks/potentials.ipynb`
- `notebooks/simstudy_results.ipynb`
It used to be possible to download the SO4 data automatically using Python and FTP, but as of May 2022 you will need to download it manually from this link. For background, or if the link stops working, see https://sites.wustl.edu/acag/datasets/surface-pm2-5/. The exact dataset needed is under `Monthly/ASCII/SO4` in the monthly `V4.NA.02` PM2.5 total mass and composition product described in van Donkelaar et al. (2019). The `.asc` data files must be extracted to `data/SO4/ASCII/*`.
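The `.asc` files are plain-text rasters (typically ESRI ASCII grid format); a minimal sketch for inspecting one, with a hypothetical filename, is:

```python
# Read the 6-line ESRI ASCII header, then the grid values
import numpy as np

path = "data/SO4/ASCII/example.asc"  # hypothetical filename; use any extracted file
with open(path) as f:
    header = dict(f.readline().split() for _ in range(6))  # ncols, nrows, cellsize, ...
grid = np.loadtxt(path, skiprows=6)
print(header, grid.shape)
```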
In addition, you need to manually download the Power Plant Emission Data file `AMPD_Unit_with_Sulfur_Content_and_Regulations_with_Facility_Attributes.csv` from here and place it in the `data/` folder.
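A quick sanity check that the file landed in the right place (a load-and-print sketch only):

```python
import pandas as pd

df = pd.read_csv("data/AMPD_Unit_with_Sulfur_Content_and_Regulations_with_Facility_Attributes.csv")
print(df.shape)
print(df.columns.tolist()[:10])  # first few column names
```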
From the root folder, run:

```bash
docker run --rm -it -v $(pwd):/workspace/ r-env bash preprocessing/preprocess_raw.sh
```
Note: in Windows PowerShell you can replace `$(pwd)` with `${pwd}`. The pipeline seems to run slowly on Windows (tested on Windows 11); if that affects you, you can run the preprocessing scripts in `prepare_training` using your native R installation. If you are running out of memory, try reducing the number of parallel processes at line 8 of `preprocessing/process_so4.R`.
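If you go the native-R route, the entry point is the same shell script the container runs (this assumes the packages listed in the `Dockerfile` are installed in your local R library):

```bash
bash preprocessing/preprocess_raw.sh
```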
It might take a few hours for the process to finish. After it's done, run (inside the conda environment):

```bash
python preprocessing/prepare_training.py
```
This should create a file `data/training_data.pkl` of approximately 266 MB.
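To confirm the artifact, a small check (sketch only; the pickle's internal structure depends on `prepare_training.py`):

```python
import os
import pickle

path = "data/training_data.pkl"
print(f"{os.path.getsize(path) / 1e6:.0f} MB")  # expect roughly 266 MB
with open(path, "rb") as f:
    data = pickle.load(f)
print(type(data))
```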
Run the commands:

```bash
python train_app1_self_narr.py --radius=1 --odir=r1_w2vec
python train_app1_self_narr.py --radius=3 --odir=r3_w2vec
python train_app1_self_narr.py --radius=5 --odir=r5_w2vec
python train_app1_self_narr.py --radius=7 --odir=r7_w2vec
python train_app1_self_narr.py --radius=9 --odir=r9_w2vec
python train_app1_self_narr.py --radius=3 --odir=r3_nbrs --nbrs_av=3
python train_app1_self_narr.py --radius=9 --odir=r9_nbrs --nbrs_av=9
python train_app1_self_narr.py --radius=3 --odir=r3_local --local
python train_app1_self_narr.py --radius=9 --odir=r9_local --local
```
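The first five commands are a sweep over the window radius; an equivalent loop form, if you prefer it:

```bash
# Sweep the radius for the word2vec-style runs
for r in 1 3 5 7 9; do
  python train_app1_self_narr.py --radius="$r" --odir="r${r}_w2vec"
done
```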
Then run the notebook `notebooks/visualize_selfsupervised.ipynb`.
First, clone the DAPSm repository containing the previous analysis:

```bash
git clone https://github.com/gpapadog/DAPSm-Analysis dapsm
```
Run the following commands:

```bash
python train_app2_detrend.py --odir=w2vec
python train_app2_detrend.py --odir=local --local
python train_app2_detrend.py --odir=n9 --av-nbrs=9
python train_app2_detrend.py --odir=unadjusted --unadjusted
```
Then, to produce the plots and analysis, run the Jupyter notebook `notebooks/visualize_supervised.ipynb`.