This project provides a pipeline to build rainfall forecast models using 1D Convolutional Neural Networks. The pipeline can be configured with different meteorological data sources.
In the root directory of this repository, run the following command (conda must be installed on your system):
./setup.sh
The project pipeline is defined as a sequence of three steps: (1) data retrieval, (2) data pre-processing, and (3) model training. These steps are implemented as Python scripts in the ./src directory.
All datasets generated by the scripts described in this section are stored in the ./data folder.
This script has four command line arguments:
- -s or --sta: selects the weather station by name. The available names are alto_da_boa_vista, guaratiba, iraja, jardim_botanico, riocentro, santa_cruz, sao_cristovao, and vidigal.
- -a or --all: when set to 1, retrieves the data of all stations.
- -b or --begin and -e or --end: define the first and last year of the retrieval period.
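The four arguments described above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the script's actual code: the function name `build_parser`, the default values, and the validation through `choices` are assumptions made for illustration.

```python
import argparse

# Station names listed in this README for this data source.
STATIONS = [
    "alto_da_boa_vista", "guaratiba", "iraja", "jardim_botanico",
    "riocentro", "santa_cruz", "sao_cristovao", "vidigal",
]


def build_parser():
    """Hypothetical parser mirroring the four documented arguments."""
    parser = argparse.ArgumentParser(description="Retrieve weather station data")
    # Restricting -s to the known names rejects misspelled stations early.
    parser.add_argument("-s", "--sta", choices=STATIONS, help="station name")
    # Assumed default of 0, meaning "do not retrieve all stations".
    parser.add_argument("-a", "--all", type=int, default=0, help="1 = all stations")
    parser.add_argument("-b", "--begin", type=int, help="first year of the period")
    parser.add_argument("-e", "--end", type=int, help="last year of the period")
    return parser


# Parse the arguments of Example 2 from a list instead of sys.argv.
args = build_parser().parse_args(["-a", "1", "-b", "2000", "-e", "2015"])
print(args.all, args.begin, args.end)
```

Passing a station name that is not in the list makes `argparse` exit with an error message, which is one way to catch accented or misspelled names before any download starts.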
Example 1:
python retrieve_ws_cor.py -s sao_cristovao
The above command retrieves the São Cristóvão station observations.
Example 2:
python retrieve_ws_cor.py -a 1 -b 2000 -e 2015
The above command retrieves the observations from all stations for the period from 2000 to 2015.
This script has five command line arguments:
- -s or --sta: selects the weather station by code. The possible codes are A652 (Forte de Copacabana), A636 (Jacarepagua), A621 (Vila Militar), and A602 (Marambaia).
- -a or --all: when set to 1, retrieves data from all stations.
- -b or --begin and -e or --end: define the first and last year of the retrieval period.
- -t: defines the INMET API token used to access the data.
Example 1:
python retrieve_ws_inmet.py -s A652 -api_token <token_string>
The above command retrieves the observations from the station with code A652 and saves the downloaded content to a file named 'A652_1997_2022.csv'.
Example 2:
python retrieve_ws_inmet.py -a 1 -b 1999 -e 2017 -api_token <token_string>
The above command retrieves the observations from all stations between 1999 and 2017.
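The file name shown in Example 1 suggests a `<station>_<begin>_<end>.csv` pattern. The sketch below reproduces that pattern; the helper name `output_filename` and the default years (taken from the file name in Example 1) are assumptions, and the actual script may build the name differently.

```python
def output_filename(station_code, begin=1997, end=2022):
    """Build a CSV file name from a station code and a year interval.

    Hypothetical helper: the defaults are inferred from the file name
    'A652_1997_2022.csv' mentioned in this README.
    """
    if begin > end:
        raise ValueError("begin year must not be greater than end year")
    return f"{station_code}_{begin}_{end}.csv"


print(output_filename("A652"))
print(output_filename("A602", 1999, 2017))
```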
This script has two command line arguments:
- -b or --begin and -e or --end: define the first and last year of the retrieval period (the default period is 1997 to 2022).
Running the script retrieves the observations dataset of the Galeão Airport sounding station.
This script generates the atmospheric instability indices for the data retrieved by the script retrieve_sounding.py. Data from the SBGL sounding station (located at the Galeão Airport, Rio de Janeiro, Brazil) is used to compute the indices, producing a new dataset in CSV format. This dataset contains one entry per sounding probe; the SBGL station produces two probes per day (at 00:00 and 12:00 UTC). Each entry contains the values of the instability indices computed for one probe. The following instability indices are computed:
- CAPE
- CIN
- Lifted index
- K index
- Total Totals
- Showalter index
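Two of the indices listed above have simple closed-form definitions in terms of temperature and dew point at fixed pressure levels. The sketch below uses the standard textbook formulas for the K index and Total Totals; the function names and the sample values are illustrative assumptions, and the project's script may compute the indices differently (for example, through a meteorological library).

```python
def k_index(t850, t700, t500, td850, td700):
    """K index from temperatures (t) and dew points (td) in degrees Celsius.

    K = (T850 - T500) + Td850 - (T700 - Td700)
    """
    return (t850 - t500) + td850 - (t700 - td700)


def total_totals(t850, t500, td850):
    """Total Totals index in degrees Celsius.

    TT = vertical totals (T850 - T500) + cross totals (Td850 - T500)
    """
    vertical_totals = t850 - t500
    cross_totals = td850 - t500
    return vertical_totals + cross_totals


# Made-up sounding values, used only to exercise the formulas.
sample = {"t850": 18.0, "t700": 8.0, "t500": -8.0, "td850": 14.0, "td700": 2.0}

print(k_index(**sample))                                            # 34.0
print(total_totals(sample["t850"], sample["t500"], sample["td850"]))  # 48.0
```

CAPE, CIN, the lifted index, and the Showalter index require lifting a parcel along the full profile, so they are not reduced to a one-line formula here.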
The preprocessing script performs several operations on the original dataset, such as creating variables or aggregating data, that can be useful for model training and its final result. To run it, use the command python pre_processing.py. The script accepts three arguments, of which only the first is required.
The arguments are:
- -f or --file: mandatory. The name of the data file used as the base for the model. It must match the name of one of the files in the ./data folder of the project.
- -d or --data: defines the data sources used to assemble the dataset. The possible options are:
  - 'E': Weather station only
  - 'E-N': Weather station and numerical model
  - 'E-R': Weather station and radiosonde
  - 'E-N-R': Weather station, numerical model, and radiosonde
- -n or --neighbors: defines how many nearby meteorological stations are used to enrich the dataset.
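The data source option described above is a hyphen-separated code. A minimal sketch of how such a value could be validated and expanded is shown below; the names `SOURCE_NAMES`, `VALID_OPTIONS`, and `parse_data_sources` are hypothetical and not taken from the project code.

```python
# Mapping from the one-letter codes documented above to the source they denote.
SOURCE_NAMES = {
    "E": "weather station",
    "N": "numerical model",
    "R": "radiosonde",
}

# The four combinations listed in this README.
VALID_OPTIONS = ("E", "E-N", "E-R", "E-N-R")


def parse_data_sources(option):
    """Return the list of data sources encoded in a -d/--data value."""
    if option not in VALID_OPTIONS:
        raise ValueError(f"unknown data source option: {option!r}")
    return [SOURCE_NAMES[code] for code in option.split("-")]


print(parse_data_sources("E-N-R"))
```

Checking against the explicit list of options, rather than only splitting on the hyphen, rejects combinations the README does not document, such as 'N-R'.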
Usage example:
python preprocess_datasources.py -f 'RIO DE JANEIRO - FORTE DE COPACABANA_1997_2022' -d 'E-N-R' -s 5
Usage example:
python preprocessing.py -f 'RIO DE JANEIRO - FORTE DE COPACABANA_1997_2022' -d 'E-N-R' -s 5
The above command creates a dataset centered on the Forte de Copacabana station, aggregating data from the 5 nearest meteorological stations and using the weather station, numerical model, and radiosonde data sources.
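Selecting the nearest stations requires some distance measure between station coordinates. The sketch below uses the haversine great-circle distance, which is one common choice; it is not necessarily what the project uses. The function names are hypothetical and the coordinates are placeholders, not real station locations.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_stations(center, stations, k):
    """Return the names of the k stations closest to `center`."""
    center_lat, center_lon = stations[center]
    distances = [
        (haversine_km(center_lat, center_lon, lat, lon), name)
        for name, (lat, lon) in stations.items()
        if name != center
    ]
    return [name for _, name in sorted(distances)[:k]]


# Placeholder coordinates (latitude, longitude), for illustration only.
stations = {
    "center": (-23.0, -43.2),
    "near": (-23.0, -43.3),
    "mid": (-22.9, -43.5),
    "far": (-22.5, -44.0),
}

print(nearest_stations("center", stations, 2))
```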
The model generation script trains the model and exports the results obtained after testing. It is run with the command python creates_modelo.py, which takes two arguments: -f or --file, which receives the name of one of the datasets generated by pre-processing, and -r or --reg, which defines the architecture to be used.
Execution Example:
python creates_modelo.py -f 'RIO DE JANEIRO - FORTE DE COPACABANA_E-N-R_EI+5NN'
An ordinal classification model will be created from the already processed dataset of the Forte de Copacabana station.
python creates_modelo.py -f 'RIO DE JANEIRO - FORTE DE COPACABANA_E-N-R_EI+5NN' -r 1
A regression model will be created from the already processed dataset of the Forte de Copacabana station.
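The two modes differ mainly in how the target is represented: a regression model predicts the rainfall amount directly, while an ordinal classifier predicts an ordered rain level. The sketch below shows one common ordinal target encoding (cumulative binary targets). The thresholds, the number of levels, and the function names are assumptions for illustration; the README does not state the values used by the project.

```python
import bisect

# Hypothetical lower bounds (in mm) of levels 1, 2 and 3; level 0 is below the first.
THRESHOLDS_MM = [5.0, 25.0, 50.0]


def rain_level(precipitation_mm):
    """Map a rainfall amount to an ordered level 0..len(THRESHOLDS_MM)."""
    return bisect.bisect_right(THRESHOLDS_MM, precipitation_mm)


def ordinal_encode(level, n_levels=len(THRESHOLDS_MM) + 1):
    """Cumulative encoding: position k is 1 when the level is greater than k."""
    return [1 if level > k else 0 for k in range(n_levels - 1)]


for amount in (0.0, 5.0, 30.0, 60.0):
    level = rain_level(amount)
    print(amount, level, ordinal_encode(level))
```

With this encoding, mistaking level 3 for level 2 changes one target bit while mistaking it for level 0 changes three, so the loss reflects the ordering of the classes.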
Retrieve: python retrieve_ws_inmet.py -s A652
Pre-processing: python pre_processing.py -f 'RIO DE JANEIRO - FORTE DE COPACABANA' -d 'E-N-R' -s 5
Model generation: python creates_modelo.py -f 'RIO DE JANEIRO - FORTE DE COPACABANA_E-N-R_EI+5NN' -r 1
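The three steps above can be assembled programmatically, for instance to run the pipeline for several stations. The sketch below only builds and prints the commands. The helper name `pipeline_commands` is hypothetical, and the dataset naming pattern `<file>_<sources>_EI+<n>NN` is inferred from the example above rather than confirmed by the project code.

```python
import shlex


def pipeline_commands(station_code, station_file, sources="E-N-R",
                      neighbors=5, regression=True):
    """Return the three pipeline commands as argument lists."""
    retrieve = ["python", "retrieve_ws_inmet.py", "-s", station_code]
    preprocess = ["python", "pre_processing.py", "-f", station_file,
                  "-d", sources, "-s", str(neighbors)]
    # Assumed naming pattern of the pre-processed dataset (inferred from the README).
    dataset = f"{station_file}_{sources}_EI+{neighbors}NN"
    train = ["python", "creates_modelo.py", "-f", dataset]
    if regression:
        train += ["-r", "1"]
    return [retrieve, preprocess, train]


commands = pipeline_commands("A652", "RIO DE JANEIRO - FORTE DE COPACABANA")
for command in commands:
    # shlex.join quotes arguments that contain spaces.
    print(shlex.join(command))
```

Each list can be passed unchanged to `subprocess.run(command, check=True)` to execute the step and stop the pipeline when a step fails.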