A machine learning pipeline taking you from raw data to a fully trained machine learning model - from data to model (d2m).

ejhusom/d2m

d2m - Data to model

A machine learning pipeline for trustworthy and green models, enabling responsible AI:

  • Explainable AI, using SHAP, LIME or both.
  • Uncertainty estimation, using Bayesian dropout for neural networks.
  • Carbon emissions tracking and reporting, using CodeCarbon.
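
As a rough illustration of the uncertainty-estimation idea, the sketch below assumes that "Bayesian dropout" refers to the common Monte Carlo dropout approach: dropout stays active at prediction time, the model is run many times, and the spread of the outputs serves as an uncertainty estimate. This is not d2m's implementation. The tiny linear model, the function names and all parameters are hypothetical, and dropout is applied to the inputs only to keep the example short.

```python
import random
import statistics


def predict_with_dropout(x, weights, drop_rate, rng):
    """One stochastic forward pass of a tiny linear model with dropout on the inputs."""
    keep = 1.0 - drop_rate
    total = 0.0
    for value, weight in zip(x, weights):
        if rng.random() < keep:
            # Inverted dropout: rescale kept inputs so the expected output is unchanged.
            total += weight * value / keep
    return total


def mc_dropout(x, weights, drop_rate=0.2, n_samples=200, seed=0):
    """Repeat the stochastic pass and summarise the predictions (hypothetical helper)."""
    rng = random.Random(seed)
    samples = [predict_with_dropout(x, weights, drop_rate, rng) for _ in range(n_samples)]
    # The mean is the prediction; the standard deviation is the uncertainty estimate.
    return statistics.mean(samples), statistics.stdev(samples)
```

Calling `mc_dropout([1.0, 2.0, 3.0], [0.5, -0.2, 0.1])` returns a mean prediction together with a standard deviation; a larger standard deviation indicates that the prediction is more sensitive to which units are dropped.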

d2m lets you easily create and evaluate machine learning models for tabular and time series data, with built-in data profiling and feature engineering.

Usage

Tested on:

  • Linux
  • macOS
  • Windows with WSL 2

  1. Clone or download this repository.
  2. Place your data files (CSV) in a folder named after your dataset (DATASET) inside assets/data/raw/, so that the path to the files is assets/data/raw/[DATASET]/.
  3. Update params.yaml with the name of your dataset (DATASET), the target variable, and other configuration parameters.
  4. Build the Docker container:
     docker build -t d2m -f Dockerfile .
  5. Run the container:
     docker run -p 5000:5000 -it -v $(pwd)/assets:/usr/d2m/assets -v $(pwd)/.dvc:/usr/d2m/.dvc d2m
  6. Open the website at localhost:5000 to use the graphical user interface.
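
The data placement step above can also be scripted. The following is a minimal sketch, not part of d2m: the function name `stage_dataset` and its parameters are hypothetical, and it only assumes the folder layout assets/data/raw/[DATASET]/ described in the steps.

```python
import shutil
from pathlib import Path


def stage_dataset(csv_files, dataset, repo_root="."):
    """Copy CSV files into assets/data/raw/<dataset>/ and return the new paths."""
    target_dir = Path(repo_root) / "assets" / "data" / "raw" / dataset
    target_dir.mkdir(parents=True, exist_ok=True)

    staged = []
    for csv_file in map(Path, csv_files):
        # Reject anything that is not a CSV file before copying.
        if csv_file.suffix.lower() != ".csv":
            raise ValueError(f"Expected a .csv file, got: {csv_file}")
        staged.append(Path(shutil.copy2(csv_file, target_dir)))
    return staged
```

For example, `stage_dataset(["measurements.csv"], "my_dataset")`, run from the repository root, copies the file to assets/data/raw/my_dataset/. The dataset name passed here should match the one entered in params.yaml.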

Creating models on the command line

  1. Copy params.yaml from the host to the container (find CONTAINER_NAME by running docker ps):
     docker cp params.yaml [CONTAINER_NAME]:/usr/d2m/params.yaml
  2. Run the pipeline in the container. docker exec is issued from the host, not from inside the container's interactive session:
     docker exec [CONTAINER_NAME] dvc repro
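
The two command-line steps can be wrapped in a small script when models are rebuilt often. This is a sketch under assumptions: the helper names `build_commands` and `run_pipeline` are hypothetical, while the docker commands themselves are the ones listed above. It assumes the container is already running and that docker is available on the host.

```python
import subprocess


def build_commands(container_name, params_path="params.yaml"):
    """Return the two docker commands from the steps above as argument lists."""
    copy_params = ["docker", "cp", params_path, f"{container_name}:/usr/d2m/params.yaml"]
    repro = ["docker", "exec", container_name, "dvc", "repro"]
    return [copy_params, repro]


def run_pipeline(container_name, params_path="params.yaml", dry_run=False):
    """Run both commands in order; check=True stops at the first failure."""
    commands = build_commands(container_name, params_path)
    for command in commands:
        if dry_run:
            # Only show what would be executed.
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
    return commands
```

Calling `run_pipeline("my_container", dry_run=True)` prints the commands without executing them, which is a cheap way to check the container name taken from docker ps.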
