This project examines the relationship between ozone concentration in urban locations and their physical features using Convolutional Neural Networks (CNNs). We train two models: one trained on satellite imagery ("Satellite CNN") to capture higher-level features such as the location's geography, and the other trained on street-level imagery ("Street CNN") to learn ground-level features such as motor vehicle activity. These features are then concatenated, and a neural network ("Concat NN") is trained on this shared representation to predict the location's ozone level as measured in parts per billion.
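The fusion step can be summarized with a short PyTorch sketch. The exact layer sizes are not given in this README, so the 512-dimensional branch features, hidden width, and dropout rate below are illustrative assumptions, not the project's actual configuration.

```python
import torch
import torch.nn as nn

class ConcatNN(nn.Module):
    """Sketch of the fusion head: a small MLP trained on the concatenated
    satellite + street feature vector, regressing ozone in parts per billion.
    The 1024-d input (512 per branch, matching a ResNet-18 penultimate layer),
    hidden width, and dropout rate are illustrative assumptions."""

    def __init__(self, feature_dim=1024, hidden=256, dropout=0.5):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feature_dim, hidden),
            nn.BatchNorm1d(hidden),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, 1),  # single output unit: ozone (ppb)
        )

    def forward(self, x):
        return self.head(x)

# Usage: concatenate one satellite and one street feature vector per location,
# then regress ozone on the shared representation.
sat_feats = torch.randn(8, 512)      # features from the Satellite CNN
street_feats = torch.randn(8, 512)   # features from the Street CNN
shared = torch.cat([sat_feats, street_feats], dim=1)  # shape (8, 1024)
ozone_ppb = ConcatNN()(shared)                        # shape (8, 1)
```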
- The `02_Scripts/` directory comprises the code to scrape and preprocess the ozone concentration data, which is sourced from the AirNow API (a sketch of such a query appears after this list). It also contains a `01_Data_Exploration` directory with code to visualize elements of the dataset, such as particular data points and the geographical distribution of the locations with ozone readings.
In "imagery" we find the scripts to retrieve satellite imagery from Google Earth Engine (
imagery/getting_imagery_no_mask.py
) and the street level images from Google Street View (training set is build withimagery/get_street_imagery_train_set.py
and .... ). The scriptvisualization_L8SR
provides functionality to visualize a Landsat 8 satellite image. -
- The `Models` directory comprises `CNNs.py`, which implements the CNNs used for training. We use a ResNet-18 model, pretrained on the ImageNet dataset, for both the satellite and the street-level imagery. Adjustments include modifying the input layer to accommodate the larger number of image channels in the satellite dataset (7), adding regularization through `Fully Connected -> Batch Normalization -> Dropout` blocks in the highest layers, and adjusting the number of units in the final layer to fit our regression and classification tasks (a sketch of these modifications appears after this list). The `data_loaders.py` script includes the `SatelliteData` and `StreetData` classes, implemented to work with Torch's DataLoaders. The DataLoaders call the functions in `build_dataset.py` to build the train/dev/test splits for the satellite data if they have not already been generated.
- `train.py`, `evaluate.py` and `search_hyperparams.py` implement the code to train our models (a minimal training-loop sketch appears after this list). It is important to note that this training code has been largely adapted from Stanford CS 230's Computer Vision project code examples.
- `extract_features` loads the Satellite CNN and the Street CNN with a chosen set of weights, extracts features for each satellite and street image from the next-to-last linear layer of each model, concatenates the features, and saves them to a new set of train/dev/test files in HDF5 format (see the sketch after this list).
- `predict` uses a chosen set of weights for the Concat NN to predict the ozone measurement for a (satellite image, street image) pair of concatenated features (see the inference sketch after this list).
- `error_analysis` takes a set of predictions on the concatenated features (generated by `predict.py`) and performs quantitative and qualitative analyses of the errors, their distributions, and their geographic dynamics (an illustrative summary pass is sketched after this list).
- `utils.py` implements helper classes and functions used to log our model training process, load and write dictionaries, and plot learning loss curves (illustrated by the final sketch after this list).
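For the `02_Scripts/` item above, here is a minimal sketch of querying ozone observations around a coordinate from the AirNow API with `requests`. The endpoint path, query parameters, response fields, and the `AIRNOW_API_KEY` environment variable are assumptions about the public API, not code taken from the repository.

```python
import os
import requests

# Hypothetical AirNow query; the endpoint and field names are assumptions.
AIRNOW_URL = "https://www.airnowapi.org/aq/observation/latLong/current/"

def get_ozone_observations(lat, lon, distance_miles=25):
    """Return the current ozone observations reported near (lat, lon)."""
    params = {
        "format": "application/json",
        "latitude": lat,
        "longitude": lon,
        "distance": distance_miles,
        "API_KEY": os.environ["AIRNOW_API_KEY"],  # assumed environment variable
    }
    resp = requests.get(AIRNOW_URL, params=params, timeout=30)
    resp.raise_for_status()
    # Keep only the ozone records among the returned parameters.
    return [obs for obs in resp.json() if obs.get("ParameterName") == "OZONE"]

if __name__ == "__main__":
    print(get_ozone_observations(34.05, -118.24))  # e.g. downtown Los Angeles
```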
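For the `imagery` scripts, the snippet below sketches the two retrieval paths under stated assumptions: the Earth Engine Python client (`ee`) for a Landsat 8 surface-reflectance patch and the Street View Static API for the ground-level photo. The collection ID, date range, buffer size, and request parameters are illustrative choices, not those of `imagery/getting_imagery_no_mask.py` or `imagery/get_street_imagery_train_set.py`.

```python
import ee
import requests

ee.Initialize()  # assumes Earth Engine credentials are already configured

def landsat8_patch_url(lat, lon, start="2018-01-01", end="2018-12-31", scale=30):
    """Download URL for a low-cloud Landsat 8 SR patch around a point."""
    point = ee.Geometry.Point(lon, lat)
    image = ee.Image(
        ee.ImageCollection("LANDSAT/LC08/C01/T1_SR")
        .filterBounds(point)
        .filterDate(start, end)
        .sort("CLOUD_COVER")
        .first()
    )
    region = point.buffer(1000).bounds()  # roughly a 2 km box around the site
    # Older client versions may require the region as GeoJSON coordinates.
    return image.getDownloadURL({"scale": scale, "region": region})

def save_street_view(lat, lon, api_key, out_path="street.jpg"):
    """Fetch a Google Street View Static API image at the same location."""
    params = {"size": "640x640", "location": f"{lat},{lon}", "key": api_key}
    resp = requests.get("https://maps.googleapis.com/maps/api/streetview",
                        params=params, timeout=30)
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)
```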
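The ResNet-18 adjustments described for `CNNs.py` can be sketched as follows; the hidden width, dropout rate, and the handling of pretrained weights for the 7-channel input layer are assumptions rather than the project's exact choices.

```python
import torch.nn as nn
from torchvision import models

def build_branch_cnn(num_outputs=1, in_channels=3, dropout=0.5):
    """Sketch of a branch CNN: ImageNet-pretrained ResNet-18 with a
    Fully Connected -> BatchNorm -> Dropout head. Use in_channels=7 for the
    satellite branch, 3 for the street branch; num_outputs is 1 for ozone
    regression or the number of classes for classification."""
    # On newer torchvision, pass weights=... instead of pretrained=True.
    model = models.resnet18(pretrained=True)

    if in_channels != 3:
        # Replace the first convolution so it accepts 7-band satellite images
        # (the pretrained 3-channel weights are simply discarded here).
        model.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7, stride=2,
                                padding=3, bias=False)

    # Swap the classifier for a regularized head; width 256 is illustrative.
    model.fc = nn.Sequential(
        nn.Linear(model.fc.in_features, 256),
        nn.BatchNorm1d(256),
        nn.Dropout(dropout),
        nn.Linear(256, num_outputs),
    )
    return model

satellite_cnn = build_branch_cnn(in_channels=7)
street_cnn = build_branch_cnn(in_channels=3)
```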
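For `train.py` and its companions, a bare-bones regression training loop looks roughly like the following; this is a generic sketch, not the CS 230-derived code actually used in the repository.

```python
import torch

def train_one_epoch(model, loader, optimizer, device="cpu"):
    """One pass over a DataLoader yielding (images, ozone_ppb) batches,
    minimizing mean squared error on the ozone regression target."""
    criterion = torch.nn.MSELoss()
    model.train()
    running = 0.0
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device).float()
        optimizer.zero_grad()
        loss = criterion(model(images).squeeze(1), targets)
        loss.backward()
        optimizer.step()
        running += loss.item() * images.size(0)
    return running / len(loader.dataset)
```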
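The `extract_features` step can be sketched with a forward hook on each branch's next-to-last linear layer and an HDF5 write; the `model.fc[0]` lookup (matching the head structure in the ResNet sketch above) and the dataset names `"features"` and `"labels"` are assumptions.

```python
import h5py
import torch

@torch.no_grad()
def extract_and_save(sat_cnn, street_cnn, pair_loader, out_path, device="cpu"):
    """Capture the next-to-last linear layer's output of each branch with a
    forward hook, concatenate satellite and street features, write to HDF5."""
    captured = {}

    def grab(key):
        def _hook(module, inputs, output):
            captured[key] = output.detach().cpu()
        return _hook

    # Assumes each branch ends in an nn.Sequential head whose first entry is
    # the penultimate Linear layer (as in the ResNet sketch above).
    sat_cnn.fc[0].register_forward_hook(grab("sat"))
    street_cnn.fc[0].register_forward_hook(grab("street"))
    sat_cnn.to(device).eval()
    street_cnn.to(device).eval()

    rows, labels = [], []
    for sat_img, street_img, ozone in pair_loader:
        sat_cnn(sat_img.to(device))
        street_cnn(street_img.to(device))
        rows.append(torch.cat([captured["sat"], captured["street"]], dim=1))
        labels.append(ozone)

    with h5py.File(out_path, "w") as f:
        f.create_dataset("features", data=torch.cat(rows).numpy())
        f.create_dataset("labels", data=torch.cat(labels).float().numpy())
```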
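Inference with the Concat NN (`predict`) then reduces to loading the chosen weights and running the saved concatenated features through the model; the direct `state_dict` checkpoint format and the `"features"` dataset name are assumptions carried over from the sketches above.

```python
import h5py
import torch

@torch.no_grad()
def predict_ozone(concat_nn, weights_path, features_path, device="cpu"):
    """Predict ozone (ppb) for every concatenated feature row in an HDF5 file."""
    # Assumes the checkpoint stores the model's state_dict directly.
    concat_nn.load_state_dict(torch.load(weights_path, map_location=device))
    concat_nn.to(device).eval()
    with h5py.File(features_path, "r") as f:
        features = torch.from_numpy(f["features"][:]).float().to(device)
    return concat_nn(features).squeeze(1).cpu()
```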
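A quantitative pass of the kind `error_analysis` performs might look like the sketch below, assuming a pandas DataFrame with hypothetical `y_true`, `y_pred`, `lat`, and `lon` columns.

```python
import numpy as np
import pandas as pd

def summarize_errors(df: pd.DataFrame) -> pd.Series:
    """Report overall error metrics and a coarse geographic breakdown of the
    mean absolute error (per 1-degree latitude/longitude cell)."""
    err = df["y_pred"] - df["y_true"]
    print(f"MAE:  {err.abs().mean():.2f} ppb")
    print(f"RMSE: {np.sqrt((err ** 2).mean()):.2f} ppb")
    cells = (df.assign(abs_err=err.abs(),
                       lat_bin=df["lat"].round(0),
                       lon_bin=df["lon"].round(0))
               .groupby(["lat_bin", "lon_bin"])["abs_err"]
               .mean())
    return cells.sort_values(ascending=False)  # worst-served areas first
```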
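Finally, for `utils.py`, the kind of helpers described could resemble the following sketch; class and function names here are illustrative, not necessarily those in the repository.

```python
import json
import matplotlib.pyplot as plt

class RunningAverage:
    """Keep a running mean of a training metric (e.g. the per-batch loss)."""
    def __init__(self):
        self.total, self.steps = 0.0, 0

    def update(self, value):
        self.total += value
        self.steps += 1

    def __call__(self):
        return self.total / max(self.steps, 1)

def save_dict_to_json(d, path):
    """Write a dictionary of metrics to a JSON file."""
    with open(path, "w") as f:
        json.dump({k: float(v) for k, v in d.items()}, f, indent=4)

def plot_loss_curves(train_losses, dev_losses, out_path="loss_curves.png"):
    """Plot train and dev loss per epoch and save the figure."""
    plt.plot(train_losses, label="train")
    plt.plot(dev_losses, label="dev")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.legend()
    plt.savefig(out_path)
```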