
MLOps Crack

A comprehensive self-learning MLOps course repository with practical implementations and resources.


Table of Contents

  1. Overview
  2. Installation
  3. Configuration
  4. Testing
  5. Run with Docker
  6. Security Notice
  7. Prerequisites
  8. Contact

Overview

This repository is a comprehensive guide to learning MLOps through hands-on practice, covering topics such as data pipelines, model training, and deployment using tools like Docker, DVC, and MLflow.


Installation

1. Clone the Repository

Clone the repository to your local machine:

git clone https://github.com/buithanhdam/mlops-crack.git
cd mlops-crack

2. Set Up Virtual Environment

For Unix/macOS:

python3 -m venv venv
source venv/bin/activate

For Windows:

python -m venv venv
.\venv\Scripts\activate

3. Install Dependencies

Install project dependencies:

pip install -r requirements.txt

4. Environment Configuration

Create a .env file from the provided template:

cp .env.example .env

Edit .env and set your environment variables, replacing the placeholder values (including the example passwords) with your own:

MYSQL_DATABASE=mlops-crack
MYSQL_USER=user
MYSQL_PASSWORD=1
MYSQL_ROOT_PASSWORD=1
MYSQL_HOST=mysql
MYSQL_PORT=3306

AWS_ACCESS_KEY_ID=<your-aws-access-key-id>
AWS_SECRET_ACCESS_KEY=<your-aws-secret-access-key>

5. Configure DVC with S3 Bucket

  1. Initialize DVC:
dvc init
  2. Add S3 as the default remote storage:
dvc remote add -d s3remote s3://your-bucket-name
  3. Configure AWS credentials:
    • Create a .aws/credentials file:
mkdir .aws
touch .aws/credentials
    • Add your AWS credentials to it:
[default]
aws_access_key_id = <your-access-key>
aws_secret_access_key = <your-secret-key>
  4. Create the data/raw directory for datasets:
mkdir -p data/raw

Place your raw datasets in the data/raw folder.
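With the remote configured, you can track the raw data and push it to S3. A minimal first sync using standard DVC commands (the bucket from step 2 is assumed):

dvc add data/raw                      # track the dataset with DVC
git add data/raw.dvc data/.gitignore  # version the pointer file, not the data itself
git commit -m "Track raw data with DVC"
dvc push                              # upload the tracked data to the S3 remote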


Configuration

Config Structure

configs/
├── config.yaml           # Main config
├── data/default.yaml     # Data settings
├── model/default.yaml    # Model parameters
└── training/default.yaml # Training parameters

Configuration Details

  • Main Config: General project settings, including paths and MLflow configurations.
  • Data Config: Controls dataset paths, column names, and split ratios.
  • Model Config: Specifies architecture and hyperparameters.
  • Training Config: Defines batch size, epochs, learning rate, etc.
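The exact keys depend on how the pipelines read these files, but as a rough sketch, rewriting the data config from the shell might look like this (all key names below are assumptions; check the existing file for the real schema before overwriting it):

# Key names are illustrative only; keep the schema the repo already uses.
cat > configs/data/default.yaml <<'EOF'
raw_data_path: data/raw/dataset.csv
label_column: target
test_size: 0.2
random_state: 42
EOF

configs/model/default.yaml follows the same pattern, e.g. a key selecting random_forest or svm (again, an assumption), and configs/training/default.yaml holds batch size, epochs, and learning rate.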

Testing

Testing Data Pipeline

  1. Update configs/data/default.yaml with your dataset file and label column.
  2. Run the data pipeline:
python3 src/pipeline/data_pipeline.py

Testing Training Pipeline

  1. Update configs/model/default.yaml with the model of your choice (e.g., random_forest, svm).
  2. Run the training pipeline:
python3 src/pipeline/training_pipeline.py

Prediction Pipeline

Test the prediction process:

python3 src/pipeline/prediction_pipeline.py

Run with Docker

Build and Start Services

Build and start the services:

docker-compose up --build

Access services:

  • FastAPI: http://localhost:8000
  • MLflow: http://localhost:5000
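You can also exercise the API from the command line. The endpoint path and payload below are purely illustrative (check the FastAPI routes defined in src for the real ones):

# Hypothetical endpoint and payload; adjust to the routes the app actually defines.
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"features": [5.1, 3.5, 1.4, 0.2]}'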

Stop Services

Stop all running containers:

docker-compose down
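To also discard persisted state such as the MySQL data (assuming it is declared as a named volume in docker-compose.yml):

docker-compose down -v   # -v removes named volumes declared in the compose file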

Security Notice

  • Never commit credentials: Use .env and .aws for sensitive information.
  • Add .env and .aws/ to .gitignore.
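For example, append both entries in one step (if they are not ignored already):

cat >> .gitignore <<'EOF'
.env
.aws/
EOF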

Prerequisites

  • Python 3.9+
  • Docker and Docker Compose
  • AWS account with S3 access
  • Basic understanding of ML concepts

Contact

For support, open an issue on the GitHub repository.
