Machine Learning simple experimentation template

This repo implements a very simple machine learning batch prediction job. To make it more mature, tests, validation of the machine learning metrics, and containerization (Docker) could be added.

This repo contains a main package, experiment. There are also two data folders: 1) data, which contains the dataset, a sample JSON file (for testing prediction), and a pickle file with the split made for experimentation during the Exploratory Data Analysis (EDA), and 2) ml, which contains a machine learning model persisted with joblib.

The suggested order to explore this is:

1. eda.ipynb contains a simple Exploratory Data Analysis (EDA) and some experiment preparation.
2. The experiment package contains a simple train.py script, which should allow us to run a cross-validation experiment over the provided data.
3. The experiment package also contains a simple predict.py script, which should allow us to run a prediction over the sample JSON using the saved machine learning model (the best model obtained during training).
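The idea of keeping "the best model obtained during training" can be sketched as below. This is only an illustration, not the code in train.py: the function name select_best_fold is hypothetical, and it assumes that a model is saved whenever a fold's balanced accuracy beats the best value seen so far, which is what the sample logs later in this README suggest.

```python
def select_best_fold(fold_scores):
    """Return (folds that would trigger a save, index of the best fold).

    Assumption: a fold triggers a save only when its score is strictly
    higher than every score seen before it.
    """
    best_score = float("-inf")
    saved_folds = []
    for index, score in enumerate(fold_scores):
        if score > best_score:
            best_score = score
            saved_folds.append(index)
    best_fold = saved_folds[-1] if saved_folds else None
    return saved_folds, best_fold


if __name__ == "__main__":
    # Balanced accuracy values copied from the sample training logs.
    scores = [0.566, 0.669, 0.683, 0.536, 0.585]
    print(select_best_fold(scores))
```

With the scores from the sample logs, only the first three folds improve on the previous best, which matches the three "saved the model" messages.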

How to use this

Once you have this folder on your machine, you will need Poetry. Install it if you don't have it already. You also need a Python version that satisfies ">=3.10,<3.11", that is, a Python 3.10 release.
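If you are unsure whether your interpreter matches the version constraint, a quick check along these lines may help. It is a minimal sketch; the function name python_version_supported is hypothetical and not part of this repo.

```python
import sys


def python_version_supported(version_info=sys.version_info):
    """Return True when the version satisfies ">=3.10,<3.11"."""
    major_minor = tuple(version_info[:2])
    return (3, 10) <= major_minor < (3, 11)


if __name__ == "__main__":
    if python_version_supported():
        print("Python version is supported.")
    else:
        print("Unsupported Python version:", sys.version.split()[0])
```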

I suggest creating a virtual environment for this. If you use virtualenv, you can just run:

make env

and then activate the virtual environment by running source venv/bin/activate.

To install this application run:

make install

After installing Poetry and successfully running the command above, you should be able to run the application commands below:

To run an experiment:

python experiment/train.py

To run a prediction:

python -m experiment.predict

These commands should be run in the order presented here, because the first one generates the machine learning model file inside the ml folder. The final prediction output for data_sample.json is also written to the ml folder, as final_output.json.
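Because the prediction step depends on a file produced by the training step, a small pre-flight check can make the ordering explicit. The sketch below is not part of the experiment package; the helper name missing_inputs is hypothetical, and the two paths are the ones that appear in the sample logs of this README.

```python
from pathlib import Path

# Paths as they appear in the sample logs of this README.
MODEL_PATH = Path("ml/clf.joblib")
SAMPLE_PATH = Path("data/data_sample.json")


def missing_inputs(paths=(MODEL_PATH, SAMPLE_PATH)):
    """Return the files the prediction step reads that are not on disk yet."""
    return [str(path) for path in paths if not Path(path).is_file()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Run the training step first; missing:", ", ".join(missing))
    else:
        print("All inputs found; the prediction step can run.")
```

Run from the repository root, this reports a missing ml/clf.joblib until the training command has completed at least once.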

Running the commands above

When you run the commands above, you should see logs similar to the following:

» python experiment/train.py  
INFO:root:reading file: data/dataset_test_ds_v2-Atualizado.csv
INFO:root:saved the model with results below:
INFO:root:    balanced_accuracy_score: 0.566
INFO:root:    recall_score: 0.292

INFO:root:saved the model with results below:
INFO:root:    balanced_accuracy_score: 0.669
INFO:root:    recall_score: 0.500

INFO:root:saved the model with results below:
INFO:root:    balanced_accuracy_score: 0.683
INFO:root:    recall_score: 0.542

INFO:root:    balanced_accuracy_score: 0.536
INFO:root:    recall_score: 0.250

INFO:root:    balanced_accuracy_score: 0.585
INFO:root:    recall_score: 0.333

» poetry run python experiment/predict.py
INFO:root:validating payload file...
INFO:root:getting data from file: data/data_sample.json
INFO:root:getting data from file: ml/clf.joblib
INFO:root:running predictions...
INFO:root:[{0: 0}, {1: 1}]
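To help read the two metrics in the training logs: recall is the fraction of actual positives that were predicted as positive, and balanced accuracy is the mean of the recall obtained on each class. The metric names suggest scikit-learn's recall_score and balanced_accuracy_score, but that is an assumption; the pure-Python sketch below only illustrates the definitions and uses made-up labels, not the repo's data.

```python
def recall(y_true, y_pred, positive=1):
    """Fraction of samples of class `positive` that were predicted as such."""
    true_positives = sum(
        1 for t, p in zip(y_true, y_pred) if t == positive and p == positive
    )
    actual_positives = sum(1 for t in y_true if t == positive)
    return true_positives / actual_positives if actual_positives else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recall over the classes present in y_true."""
    classes = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in classes) / len(classes)


if __name__ == "__main__":
    # Toy labels invented for illustration only.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
    print("recall:", recall(y_true, y_pred))
    print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
```

In this toy case one of four positives is found (recall 0.25) and three of four negatives are found (recall 0.75 on the negative class), so the balanced accuracy is 0.5. A low recall next to a higher balanced accuracy, as in the logs above, means the model misses many positives while doing better on the negative class.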

ricoms/ml-experimentation-template
