the-franks/atmoseer

AtmoSeer

About

This project provides a pipeline to build rainfall forecast models using 1D Convolutional Neural Networks. The pipeline can be configured with different meteorological data sources.

Install

In the root directory of this repository, run the following command (conda must be installed on your system):

./setup.sh

Project pipeline

The project pipeline is a sequence of three steps: (1) data retrieval, (2) data pre-processing and (3) model training. These steps are implemented as Python scripts in the ./src directory.

Data retrieving scripts

All datasets generated by the scripts described in this section are stored in the ./data folder.

Script retrieve_ws_cor.py

This script has four command line arguments:

  • -s or --sta, which defines the station to select. You must provide the weather station of interest by name: alto_da_boa_vista, guaratiba, iraja, jardim_botanico, riocentro, santa_cruz, sao_cristovao, vidigal.
  • -a or --all, which, if set to 1, indicates that data from all stations will be retrieved.
  • -b or --begin and -e or --end, which set the interval of years for retrieving the data.
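As an illustration of how these flags fit together, here is a minimal sketch of a parser that mirrors the description above. It is not the code of retrieve_ws_cor.py: the function names build_parser and selected_stations are hypothetical, and the handling of a missing -s is an assumption.

```python
import argparse

# Station names listed in this README for this data source.
STATIONS = [
    "alto_da_boa_vista", "guaratiba", "iraja", "jardim_botanico",
    "riocentro", "santa_cruz", "sao_cristovao", "vidigal",
]


def build_parser():
    """Hypothetical parser mirroring the flags described above."""
    parser = argparse.ArgumentParser(description="Retrieve weather station data")
    parser.add_argument("-s", "--sta", choices=STATIONS, help="station name")
    parser.add_argument("-a", "--all", type=int, default=0, help="1 = all stations")
    parser.add_argument("-b", "--begin", type=int, help="first year of the interval")
    parser.add_argument("-e", "--end", type=int, help="last year of the interval")
    return parser


def selected_stations(args):
    """Return the stations a given invocation would cover (assumed behavior)."""
    if args.all == 1:
        return list(STATIONS)
    if args.sta is None:
        raise SystemExit("Provide -s/--sta or -a 1")
    return [args.sta]


# Parse an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(["-a", "1", "-b", "2000", "-e", "2015"])
print(selected_stations(args), args.begin, args.end)
```

Restricting -s with choices makes a misspelled station name fail early with a usage message instead of producing an empty download.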

Example 1:

python retrieve_ws_cor.py -s sao_cristovao

The above command retrieves the São Cristóvão station observations.

Example 2:

python retrieve_ws_cor.py -a 1 -b 2000 -e 2015

The above command retrieves the observations from all stations for the period from 2000 to 2015.

Script retrieve_ws_inmet.py

This script has five command line arguments:

  • -s or --sta, which defines the station to select. You must provide the weather station by its code. The possible codes are A652 (Forte de Copacabana), A636 (Jacarepagua), A621 (Vila Militar), A602 (Marambaia).
  • -a or --all, which, if set to 1, indicates that data from all stations will be retrieved.
  • -b or --begin and -e or --end, which set the interval of years for retrieving the data.
  • -t, which defines the INMET API token used to access the data.

Example 1:

python retrieve_ws_inmet.py -s A652 -api_token <token_string>

The above command retrieves the observations from the station with code A652 and saves the downloaded content to a file named 'A652_1997_2022.csv'.

Example 2:

python retrieve_ws_inmet.py -a 1 -b 1999 -e 2017 -api_token <token_string>

The above command retrieves the observations from all stations between 1999 and 2017.

Script retrieve_sounding.py

This script has two command line arguments:

  • -b or --begin and -e or --end, which set the year interval for data retrieval (the default interval is 1997 to 2022).

Running this script retrieves the observations dataset from the Galeão Airport sounding station.

Script gen_sounding_indices.py

This script generates atmospheric instability indices for the data retrieved by the script retrieve_sounding.py. Data from the SBGL sounding station (located at Galeão Airport, Rio de Janeiro, Brazil) is used to calculate the indices, producing a new dataset in CSV format. This dataset contains one entry per sounding probe. The SBGL sounding station produces two probes per day (at 00:00 and 12:00 UTC). Each entry contains the values of the instability indices computed for one probe. The following instability indices are computed:

  • CAPE
  • CIN
  • Lifted index
  • K index
  • Total totals
  • Showalter index
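Two of these indices have simple closed forms in terms of temperature (T) and dew point (Td) at standard pressure levels, so they make a convenient illustration. The sketch below uses the textbook definitions; it is not taken from gen_sounding_indices.py, which may compute them differently (for example through a meteorological library). The function names and the sample values are made up for the example.

```python
def k_index(t850, td850, t700, td700, t500):
    """K index in deg C: (T850 - T500) + Td850 - (T700 - Td700)."""
    return (t850 - t500) + td850 - (t700 - td700)


def total_totals(t850, td850, t500):
    """Total totals in deg C: (T850 - T500) + (Td850 - T500)."""
    return (t850 - t500) + (td850 - t500)


# Illustrative values in deg C at 850, 700 and 500 hPa.
# These are made-up numbers, not real SBGL observations.
probe = {"t850": 20.0, "td850": 15.0, "t700": 5.0, "td700": 0.0, "t500": -10.0}

print(k_index(**probe))                                            # 40.0
print(total_totals(probe["t850"], probe["td850"], probe["t500"]))  # 55.0
```

CAPE, CIN, the lifted index and the Showalter index require lifting a parcel along the full sounding profile, so they are not reducible to a one-line formula like the two above.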

Script preprocessing.py

Pre Processing

The preprocessing script performs several operations on the original dataset, such as creating variables or aggregating data, that can be useful for model training and its final result. To run it, use the command python pre_processing.py. The script accepts three arguments, of which only the first is required.

The arguments are:

  • -f or --file Mandatory argument. The name of the data file used as the basis for the model. It must match the name of one of the files in the project's ./data folder.
  • -d or --data Defines the data sources that will be used to assemble the dataset. The possible options are the following:
    • 'E': Weather station only
    • 'E-N': Weather station and numerical model
    • 'E-R': Weather station and radiosonde
    • 'E-N-R': Weather station, numerical model, and radiosonde
  • -n or --neighbors Defines how many nearby meteorological stations will be used to enrich the dataset
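To make the -d option codes concrete, the following sketch expands an option string into the data sources it names. The helper parse_data_sources is hypothetical and is not part of the repository; only the four option strings and their meanings come from the list above.

```python
# Code-to-source mapping taken from the option list above.
SOURCE_NAMES = {
    "E": "weather station",
    "N": "numerical model",
    "R": "radiosonde",
}

# The four combinations the README lists as valid.
VALID_OPTIONS = {"E", "E-N", "E-R", "E-N-R"}


def parse_data_sources(option):
    """Expand a -d option such as 'E-N-R' into a list of data source names."""
    if option not in VALID_OPTIONS:
        raise ValueError(f"Unknown data source option: {option!r}")
    return [SOURCE_NAMES[code] for code in option.split("-")]


print(parse_data_sources("E-N-R"))
print(parse_data_sources("E"))
```

Every valid option starts with E, which reflects that the weather station is always part of the dataset and the other sources only enrich it.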

Usage example:

python preprocess_datasources.py -f 'RIO DE JANEIRO - FORTE DE COPACABANA_1997_2022' -d 'E-N-R' -s 5

Usage example:

python preprocessing.py -f 'RIO DE JANEIRO - FORTE DE COPACABANA_1997_2022' -d 'E-N-R' -s 5

The above command creates a dataset centered on the Forte de Copacabana station, aggregating data from the 5 nearest meteorological stations and using the weather station, numerical model, and radiosonde data sources.

Model training

The model generation script trains the model and exports the results the model obtains in testing. Run it with the command python creates_modelo.py, which takes two arguments: -f or --file, the name of one of the datasets generated by pre-processing, and -r or --reg, which selects the architecture to use. Execution examples:

python creates_modelo.py -f 'RIO DE JANEIRO - FORTE DE COPACABANA_E-N-R_EI+5NN'

The above command creates an ordinal classification model from the preprocessed dataset of the Forte de Copacabana station.

python creates_modelo.py -f 'RIO DE JANEIRO - FORTE DE COPACABANA_E-N-R_EI+5NN' -r 1

The above command creates a regression model from the preprocessed dataset of the Forte de Copacabana station.

System test example

Data retrieval: python retrieve_ws_inmet.py -s A652

Pre-processing: python pre_processing.py -f 'RIO DE JANEIRO - FORTE DE COPACABANA' -d 'E-N-R' -s 5

Model generation: python creates_modelo.py -f 'RIO DE JANEIRO - FORTE DE COPACABANA_E-N-R_EI+5NN' -r 1
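The three steps above can also be chained from a single Python script. The sketch below builds the same three commands and prints them as a dry run. The helpers pipeline_commands and run_pipeline are hypothetical and not part of the repository, and running the scripts from the ./src directory is an assumption based on where this README says they live.

```python
import shlex
import subprocess


def pipeline_commands(station_code, station_file, model_file,
                      sources="E-N-R", neighbors=5, regression=True):
    """Build the three commands of the system test example, in order."""
    retrieve = ["python", "retrieve_ws_inmet.py", "-s", station_code]
    preprocess = ["python", "pre_processing.py", "-f", station_file,
                  "-d", sources, "-s", str(neighbors)]
    train = ["python", "creates_modelo.py", "-f", model_file]
    if regression:
        train += ["-r", "1"]  # -r 1 selects the regression model
    return [retrieve, preprocess, train]


def run_pipeline(commands, cwd="src"):
    """Run each step in order, stopping at the first failure (assumed cwd)."""
    for command in commands:
        subprocess.run(command, cwd=cwd, check=True)


commands = pipeline_commands(
    "A652",
    "RIO DE JANEIRO - FORTE DE COPACABANA",
    "RIO DE JANEIRO - FORTE DE COPACABANA_E-N-R_EI+5NN",
)
for command in commands:
    print(shlex.join(command))  # dry run: print instead of executing
```

Passing each command as a list avoids shell quoting problems with the spaces in the dataset names. Call run_pipeline(commands) to execute the steps instead of printing them.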
