Air Pollution Forecasting Using A Transformer Neural Network

Semester-long project for IST707 - Data Analytics

Project Goals

The objective of the project is to use the main skills taught in this class to solve a real data mining problem.
For this project, you must choose your own dataset. It can be one that you created yourself or found from other resources, such as the Kaggle competitions and the UCI repository.
Define a problem on the dataset and describe it in terms of its real-world organizational or business application. The complexity level of the problem should be comparable to homework assignments.

Data provided by the UCI repository.

See full_pipeline_model.ipynb for the following:

Data pipeline transformation
Splitting training and test files
Build Transformer model using PyTorch
Train model
Save model
Forecast entire test set
Run cost projection on air pollution fines
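
The splitting and forecasting steps above can be illustrated with a minimal sketch. It assumes a single PM2.5 series held in a NumPy array; the function names (chronological_split, make_windows), the 24-step input window, and the placeholder data are hypothetical and are not taken from full_pipeline_model.ipynb. The idea is only that a time series is split in time order (no shuffling) and then cut into fixed-length input windows with the following value(s) as targets.

```python
import numpy as np


def chronological_split(series, test_fraction=0.2):
    """Split a time-ordered array into train and test parts without shuffling."""
    n_test = int(round(len(series) * test_fraction))
    split_at = len(series) - n_test
    return series[:split_at], series[split_at:]


def make_windows(series, input_len, horizon):
    """Turn a 1-D series into (inputs, targets) pairs for supervised forecasting."""
    inputs, targets = [], []
    for start in range(len(series) - input_len - horizon + 1):
        stop = start + input_len
        inputs.append(series[start:stop])            # past values fed to the model
        targets.append(series[stop:stop + horizon])  # future values to predict
    return np.array(inputs), np.array(targets)


# Placeholder data standing in for hourly PM2.5 readings (not real measurements).
pm25 = np.arange(100, dtype=float)

train, test = chronological_split(pm25, test_fraction=0.2)
X_train, y_train = make_windows(train, input_len=24, horizon=1)
```

Windows are built after the split so that no training window reaches into the test period.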

Other Files and Scripts:

ts_transformer.py: Time Series Transformer neural network architecture
torch_utils.py: Custom module of PyTorch helper functions, a PyTorch dataset class for the pollution data, and a wrapper class that allows ts_transformer to be used in a Scikit-learn pipeline
sklearn_utils.py: Custom Transformer steps for a Scikit-learn pipeline and a pipeline creation function (bejing_pipeline)
preprocessor.py: Preprocessor class that uses Kalman filtering to impute missing data in the datasets during data processing
process_data.py: command line script to impute missing data and clean the data
tseries.R: Time Series plotting and analysis of sample data
tseries_eda.Rmd: extensive EDA and time series plotting and forecasting
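
As a rough illustration of Kalman-filter imputation, the sketch below fills gaps in a 1-D series with a hand-written local-level (random walk) filter in NumPy. It is not the repository's Preprocessor class and does not use PyKalman; the function name kalman_impute and the variance parameters are assumptions chosen for the example. Missing points skip the update step and take the filter's current prediction.

```python
import numpy as np


def kalman_impute(values, process_var=1.0, obs_var=1.0):
    """Fill NaNs in a 1-D series using a forward local-level Kalman filter.

    Observed values are returned unchanged; each NaN is replaced by the
    filter's predicted state at that time step.
    """
    values = np.asarray(values, dtype=float)
    observed = ~np.isnan(values)
    if not observed.any():
        raise ValueError("series has no observed values")

    filled = values.copy()
    # Start at the first observed value with a very uncertain prior.
    state = values[observed][0]
    variance = 1e6

    for t, y in enumerate(values):
        # Predict: a random walk keeps the mean, while uncertainty grows.
        variance += process_var
        if np.isnan(y):
            # No measurement: use the prediction as the imputed value.
            filled[t] = state
        else:
            # Update: blend the prediction and measurement via the Kalman gain.
            gain = variance / (variance + obs_var)
            state = state + gain * (y - state)
            variance = (1.0 - gain) * variance
    return filled


# Toy series with a two-step gap (values are illustrative only).
series = [10.0, np.nan, np.nan, 13.0]
filled = kalman_impute(series)
```

A forward filter only uses past observations, so a gap is filled with the last filtered level; a smoother, which also uses later observations, would typically give values that move toward the next observed point.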

Software Requirements of the Project

For all Python files (.ipynb and .py extensions):

Custom modules:
- torch_utils
- ts_transformer
- preprocessor
PyTorch
pathlib
NumPy
Pandas
PyKalman
Scikit-learn
Matplotlib
multiprocessing
os
glob
click

For all R scripts:

tseries
TSstudio
forecast
xts
tidyverse

Daily Heatmap of Particle Matter 2.5 Concentrations

Transformer's Forecasted Rolling Mean Particle Matter 2.5 Concentrations
