This project aims to predict the hazard level of various atmospheric pollutants as a function of different variables, such as the type of area in which the measuring station is located and the surrounding environmental factors. Hazard level prediction is essential to understand and mitigate the potential risks associated with exposure to pollutants at different locations.
This project matters because predicting and understanding air pollution levels helps to:
- Protect public health.
- Facilitate sustainable urban planning.
- Improve people's quality of life.
- Promote environmental awareness.
- Contribute to scientific research in air quality.
The data come from the official website of the Government of Spain, Ministry for Ecological Transition and Demographic Challenge.
Additional information taken from the site's documentation was incorporated to support the goals of the project.
Source:
- FECHA. Date recorded at the time the pollutant values were taken (numeric).
- N_CCAA. Name of each autonomous community in Spain where measurement data were collected for each pollutant (categorical).
- PROVINCIA. Name of each province in which measurement data for each pollutant were collected (categorical).
- N_MUNICIPIO. Name of each municipality in which measurement data were collected for each pollutant (categorical).
- ESTACION. Number assigned to each station in each autonomous community that has recorded pollutant measurement data (numeric).
- MAGNITUD. Each pollutant that has been registered in the different stations (categorical).
- TIPO_AREA. Type of area in which the station is located: urban, suburban or rural (categorical).
- TIPO_ESTACION. Station type according to the main emission source: traffic, industrial or background (categorical).
- LATITUD, LONGITUD. Geographical data of each station where pollutants have been recorded (numeric).
- H01, H02, H03, ..., H24. Hourly recorded value for each pollutant. All pollutants are reported in a unified measurement unit (μg/m³) (numeric).
All concentration data are expressed in micrograms per cubic meter (μg/m³).
Pollutant | Full Name |
---|---|
C6H6 | Benzene |
CO | Carbon Monoxide |
NO2 | Nitrogen Dioxide |
NOX | Nitrogen Oxides |
O3 | Ozone |
PM10 | Particulate Matter (diameter up to 10 μm) |
PM2.5 | Particulate Matter (diameter up to 2.5 μm) |
SO2 | Sulfur Dioxide |
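As an illustration of how the column layout described above can be handled, the following minimal sketch reshapes the wide hourly columns (H01 to H24) into one row per hour and computes a daily summary per pollutant. The values are invented, the integer format of `FECHA` is an assumption, and the names `HORA`, `VALOR`, `long_df` and `daily` are hypothetical; this is not necessarily how the project's notebooks process the data.

```python
import pandas as pd

# Hourly column names as described in the variable list: H01 ... H24.
hour_cols = [f"H{h:02d}" for h in range(1, 25)]

# Tiny hand-made frame mimicking the dataset layout.
# All values are invented for illustration; they are not real measurements.
# Assumption: FECHA is stored as an integer such as 20230101.
wide = pd.DataFrame(
    [
        {"FECHA": 20230101, "ESTACION": 1, "MAGNITUD": "NO2",
         **{c: float(i) for i, c in enumerate(hour_cols, start=1)}},
        {"FECHA": 20230101, "ESTACION": 1, "MAGNITUD": "O3",
         **{c: 50.0 for c in hour_cols}},
    ]
)

# Reshape to one row per (date, station, pollutant, hour).
long_df = wide.melt(
    id_vars=["FECHA", "ESTACION", "MAGNITUD"],
    value_vars=hour_cols,
    var_name="HORA",    # hypothetical column name for the hour label
    value_name="VALOR", # hypothetical column name for the concentration
)

# Turn the label "H07" into the integer hour 7.
long_df["HORA"] = long_df["HORA"].str[1:].astype(int)

# Daily mean and maximum concentration per station and pollutant.
daily = (
    long_df.groupby(["FECHA", "ESTACION", "MAGNITUD"])["VALOR"]
    .agg(["mean", "max"])
    .reset_index()
)
print(daily)
```

The long format makes it straightforward to group by hour, station or pollutant, while the wide format keeps one row per station, pollutant and day.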
- Exploratory Data Analysis (EDA)
- Data visualization through graphs and tables.
- Feature engineering: New features have been created from existing features to improve model performance.
- Feature selection: Deciding which features to include in the final model based on their importance and relevance.
- Creation and training of a machine learning model to predict the hazard level of each pollutant.
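The workflow listed above can be sketched roughly as follows. This is a minimal example on synthetic data, not the project's actual pipeline: the engineered features (`daily_mean`, `daily_max`), the three-level `hazard_level` target built from terciles, and the choice of a Random Forest classifier (rather than a regressor) are all assumptions made for illustration. Because the target here is derived from one of the features, the resulting accuracy says nothing about real-world performance; only the sequence of steps matters.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 300
hour_cols = [f"H{h:02d}" for h in range(1, 25)]

# Synthetic stand-in for the real dataset (values are random, not measurements).
df = pd.DataFrame(rng.uniform(0, 200, size=(n, 24)), columns=hour_cols)
df["TIPO_AREA"] = rng.choice(["urban", "suburban", "rural"], size=n)
df["TIPO_ESTACION"] = rng.choice(["traffic", "industrial", "background"], size=n)

# Feature engineering: summarise the 24 hourly values (hypothetical features).
df["daily_mean"] = df[hour_cols].mean(axis=1)
df["daily_max"] = df[hour_cols].max(axis=1)

# Hypothetical target: three hazard levels from terciles of the daily mean.
df["hazard_level"] = pd.qcut(
    df["daily_mean"], q=3, labels=["low", "medium", "high"]
).astype(str)

# One-hot encode the categorical columns so the model receives numeric input.
X = pd.get_dummies(df[["daily_mean", "daily_max", "TIPO_AREA", "TIPO_ESTACION"]])
y = df["hazard_level"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Feature selection aid: rank features by the forest's impurity-based importance.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)

print(f"Test accuracy: {model.score(X_test, y_test):.2f}")
print(importances)
```

Ranking features this way is one common starting point for feature selection; low-importance columns are candidates for removal before retraining.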
- Python
- Pandas
- NumPy
- Matplotlib/Seaborn
- Pickle
- Scikit-learn (for the machine learning model).
- Streamlit (for developing the interactive web application).
- `data/`: Folder containing the used, processed and temporary datasets.
- `models/`: Folder containing the machine learning models.
- `notebooks/`: Folder containing the notebooks used for the Exploratory Data Analysis (EDA) and the construction of the Random Forest model.
- `src/`: Folder containing the code to deploy the model using Streamlit.
- `README.md`: Project documentation (this file).
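Since the project lists Pickle among its tools and keeps trained models in a `models/` folder, a trained model could be saved and reloaded along these lines. This is only a sketch under stated assumptions: the file name `hazard_model.pkl` is hypothetical, the toy training data are invented, and a temporary directory stands in for the repository's `models/` folder.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.ensemble import RandomForestClassifier

# Toy training data (invented): one feature, two hazard levels.
X = [[10.0], [20.0], [150.0], [180.0]]
y = ["low", "low", "high", "high"]
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# A temporary directory stands in for the repository's models/ folder.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "models" / "hazard_model.pkl"  # hypothetical name
    model_path.parent.mkdir(parents=True)

    # Serialise the fitted model to disk.
    with model_path.open("wb") as f:
        pickle.dump(model, f)

    # Load it back, as a deployment script would do before predicting.
    with model_path.open("rb") as f:
        restored = pickle.load(f)

# The restored model predicts without retraining.
prediction = restored.predict([[170.0]])[0]
print(prediction)
```

Pickle files should only be loaded from trusted sources, and a model is best loaded with the same scikit-learn version that was used to save it.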
- Clone this repository:
  `git clone https://github.com/Yakondra/Final-project-air-quality.git` or `git clone https://github.com/Pilizmt/Final-project-air-quality.git`
- Install the dependencies:
  `pip install -r requirements.t`
- To create the website we have used: https://render.com/
- Link to the interactive website: https://air-quality-predict.onrender.com/
Contributions are welcome! If you encounter problems, have ideas for improvements or want to add new features, feel free to send a pull request.