
MLOps Crack

A comprehensive self-learning MLOps course repository with practical implementations and resources.


Table of Contents

  1. Overview
  2. Installation
  3. Configuration
  4. Testing
  5. Run with Docker
  6. Security Notice
  7. Prerequisites
  8. Contact

Overview

This repository is a comprehensive guide to learning MLOps through hands-on practice, covering topics such as data pipelines, model training, and deployment using tools like Docker, DVC, and MLflow.


Installation

1. Clone the Repository

Clone the repository to your local machine:

git clone https://github.com/buithanhdam/mlops-crack.git
cd mlops-crack

2. Set Up Virtual Environment

For Unix/macOS:

python3 -m venv venv
source venv/bin/activate

For Windows:

python -m venv venv
.\venv\Scripts\activate

3. Install Dependencies

Install project dependencies:

pip install -r requirements.txt

4. Environment Configuration

Create a .env file from the provided template:

cp .env.example .env

Edit .env and set your environment variables, replacing the placeholder values (including the example passwords) with your own:

MYSQL_DATABASE=mlops-crack
MYSQL_USER=user
MYSQL_PASSWORD=1
MYSQL_ROOT_PASSWORD=1
MYSQL_HOST=mysql
MYSQL_PORT=3306

AWS_ACCESS_KEY_ID=<your-aws-access-key-id>
AWS_SECRET_ACCESS_KEY=<your-aws-secret-access-key>

5. Configure DVC with S3 Bucket

  1. Initialize DVC:
dvc init
  2. Add S3 as the default remote storage:
dvc remote add -d s3remote s3://your-bucket-name
  3. Configure AWS credentials:
    • Create a .aws/credentials file:
mkdir .aws
touch .aws/credentials
    • Add your AWS credentials to it:
[default]
aws_access_key_id = <your-access-key>
aws_secret_access_key = <your-secret-key>
  4. Create the data/raw directory for datasets:
mkdir -p data/raw

Place your raw datasets in the data/raw folder.
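With the remote configured, you can track the raw data and push it to S3. A minimal first sync using standard DVC commands (the bucket from step 2 is assumed):

dvc add data/raw                      # track the dataset with DVC
git add data/raw.dvc data/.gitignore  # version the pointer file, not the data itself
git commit -m "Track raw data with DVC"
dvc push                              # upload the tracked data to the S3 remote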


Configuration

Config Structure

configs/
├── config.yaml           # Main config
├── data/default.yaml     # Data settings
├── model/default.yaml    # Model parameters
└── training/default.yaml # Training parameters

Configuration Details

  • Main Config: General project settings, including paths and MLflow configurations.
  • Data Config: Controls dataset paths, column names, and split ratios.
  • Model Config: Specifies architecture and hyperparameters.
  • Training Config: Defines batch size, epochs, learning rate, etc.
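The exact keys depend on how the pipelines read these files, but as a rough sketch, rewriting the data config from the shell might look like this (all key names below are assumptions; check the existing file for the real schema before overwriting it):

# Key names are illustrative only; keep the schema the repo already uses.
cat > configs/data/default.yaml <<'EOF'
raw_data_path: data/raw/dataset.csv
label_column: target
test_size: 0.2
random_state: 42
EOF

configs/model/default.yaml follows the same pattern, e.g. a key selecting random_forest or svm (again, an assumption), and configs/training/default.yaml holds batch size, epochs, and learning rate.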

Testing

Testing Data Pipeline

  1. Update configs/data/default.yaml with your dataset file and label column.
  2. Run the data pipeline:
python3 src/pipeline/data_pipeline.py

Testing Training Pipeline

  1. Update configs/model/default.yaml with the model of your choice (e.g., random_forest, svm).
  2. Run the training pipeline:
python3 src/pipeline/training_pipeline.py

Prediction Pipeline

Test the prediction process:

python3 src/pipeline/prediction_pipeline.py

Run with Docker

Build and Start Services

Build and start the services:

docker-compose up --build

Access services:

  • FastAPI: http://localhost:8000
  • MLflow: http://localhost:5000
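You can also exercise the API from the command line. The endpoint path and payload below are purely illustrative (check the FastAPI routes defined in src for the real ones):

# Hypothetical endpoint and payload; adjust to the routes the app actually defines.
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"features": [5.1, 3.5, 1.4, 0.2]}'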

Stop Services

Stop all running containers:

docker-compose down
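To also discard persisted state such as the MySQL data (assuming it is declared as a named volume in docker-compose.yml):

docker-compose down -v   # -v removes named volumes declared in the compose file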

Security Notice

  • Never commit credentials: Use .env and .aws for sensitive information.
  • Add .env and .aws/ to .gitignore.
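For example, append both entries in one step (if they are not ignored already):

cat >> .gitignore <<'EOF'
.env
.aws/
EOF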

Prerequisites

  • Python 3.9+
  • Docker and Docker Compose
  • AWS account with S3 access
  • Basic understanding of ML concepts

Contact

For support, open an issue on the GitHub repository.
