A self-learning MLOps course repository with practical implementations and resources.
This repository is a hands-on guide to learning MLOps, covering topics such as data pipelines, model training, and deployment with tools like Docker, DVC, and MLflow.
Clone the repository to your local machine:

```bash
git clone https://github.com/buithanhdam/mlops-crack.git
cd mlops-crack
```

Create and activate a virtual environment.

On Linux/macOS:

```bash
python3 -m venv venv
source venv/bin/activate
```

On Windows:

```powershell
python -m venv venv
.\venv\Scripts\activate
```
Install project dependencies:

```bash
pip install -r requirements.txt
```
Create a `.env` file from the provided template:

```bash
cp .env.example .env
```

Edit `.env` and set your environment variables:

```
MYSQL_DATABASE=mlops-crack
MYSQL_USER=user
MYSQL_PASSWORD=1
MYSQL_ROOT_PASSWORD=1
MYSQL_HOST=mysql
MYSQL_PORT=3306
AWS_ACCESS_KEY_ID=<your-aws-access-key-id>
AWS_SECRET_ACCESS_KEY=<your-aws-secret-access-key>
```
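The services and pipelines read these values at runtime. As a minimal sketch of how application code might pick them up, assuming `python-dotenv` is installed (the helper below is illustrative, not part of the repository):

```python
import os
from dotenv import load_dotenv  # assumes python-dotenv is available

# Load the variables from .env into the process environment.
load_dotenv()

def mysql_url() -> str:
    """Build a SQLAlchemy-style MySQL URL from the variables above."""
    user = os.environ["MYSQL_USER"]
    password = os.environ["MYSQL_PASSWORD"]
    host = os.environ.get("MYSQL_HOST", "localhost")
    port = os.environ.get("MYSQL_PORT", "3306")
    database = os.environ["MYSQL_DATABASE"]
    return f"mysql+pymysql://{user}:{password}@{host}:{port}/{database}"

if __name__ == "__main__":
    print(mysql_url())
```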
- Initialize DVC:

  ```bash
  dvc init
  ```

- Add S3 as the remote storage:

  ```bash
  dvc remote add -d s3remote s3://your-bucket-name
  ```
- Configure AWS credentials:
  - Create a `.aws/credentials` file:

    ```bash
    mkdir .aws
    touch .aws/credentials
    ```

  - Add your AWS credentials:

    ```ini
    [default]
    aws_access_key_id = <your-access-key>
    aws_secret_access_key = <your-secret-key>
    ```
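Before wiring the credentials into DVC, it can help to confirm they actually grant S3 access. A small `boto3` check (not part of the repository; the bucket name is a placeholder):

```python
import boto3
from botocore.exceptions import ClientError

BUCKET = "your-bucket-name"  # placeholder: the bucket used for the DVC remote

def check_s3_access(bucket: str) -> bool:
    """Return True if the configured AWS credentials can reach the bucket."""
    s3 = boto3.client("s3")  # reads credentials from the environment or shared credentials file
    try:
        s3.head_bucket(Bucket=bucket)
        return True
    except ClientError as err:
        print(f"S3 access check failed: {err}")
        return False

if __name__ == "__main__":
    print("S3 access OK" if check_s3_access(BUCKET) else "S3 access failed")
```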
- Create the `data/raw` directory for datasets:

  ```bash
  mkdir -p data/raw
  ```

Place your raw datasets in the `data/raw` folder.
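A quick way to sanity-check a raw dataset before running the pipelines, assuming it is a CSV file (`dataset.csv` is only an example name):

```python
from pathlib import Path
import pandas as pd

RAW_PATH = Path("data/raw/dataset.csv")  # example file name

# Print a short summary of the raw data: shape, column types, missing values.
df = pd.read_csv(RAW_PATH)
print(df.shape)
print(df.dtypes)
print(df.isna().sum())
```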
```
configs/
├── config.yaml              # Main config
├── data/default.yaml        # Data settings
├── model/default.yaml       # Model parameters
└── training/default.yaml    # Training parameters
```
- Main Config: General project settings, including paths and MLflow configurations.
- Data Config: Controls dataset paths, column names, and split ratios.
- Model Config: Specifies architecture and hyperparameters.
- Training Config: Defines batch size, epochs, learning rate, etc.
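How the layered configs are combined is handled by the pipeline code; as a rough sketch, they can be loaded and merged with PyYAML like this (the keys and merge strategy are illustrative, not the repository's exact schema):

```python
from pathlib import Path
import yaml

CONFIG_DIR = Path("configs")

def load_config() -> dict:
    """Load the main config and attach the data/model/training defaults."""
    config = yaml.safe_load((CONFIG_DIR / "config.yaml").read_text())
    for group in ("data", "model", "training"):
        config[group] = yaml.safe_load((CONFIG_DIR / group / "default.yaml").read_text())
    return config

if __name__ == "__main__":
    print(load_config().keys())
```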
- Update `configs/data/default.yaml` with your dataset file and label column.
- Run the data pipeline (a rough sketch of its typical steps follows below):

  ```bash
  python3 src/pipeline/data_pipeline.py
  ```
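For orientation, here is what a data pipeline like this typically does: read the configured raw file, split it into train and test sets, and write the processed splits. File names, split ratio, and output paths are assumptions, not the repository's exact implementation.

```python
from pathlib import Path
import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative values; in the repository these come from configs/data/default.yaml.
RAW_FILE = Path("data/raw/dataset.csv")
TEST_SIZE = 0.2
OUT_DIR = Path("data/processed")

df = pd.read_csv(RAW_FILE)
train_df, test_df = train_test_split(df, test_size=TEST_SIZE, random_state=42)

OUT_DIR.mkdir(parents=True, exist_ok=True)
train_df.to_csv(OUT_DIR / "train.csv", index=False)
test_df.to_csv(OUT_DIR / "test.csv", index=False)
```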
- Update `configs/model/default.yaml` with the model of your choice (e.g., `random_forest`, `svm`).
- Run the training pipeline (see the illustrative sketch below):

  ```bash
  python3 src/pipeline/training_pipeline.py
  ```
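To give a concrete picture of the training step, the sketch below fits a `random_forest`-style model on processed splits and logs parameters, metrics, and the model to MLflow. The tracking URI, experiment name, file paths, and label column are assumptions; the repository reads its own values from the configs.

```python
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Illustrative values; the repository takes these from its configs.
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("mlops-crack")
LABEL_COLUMN = "target"

train_df = pd.read_csv("data/processed/train.csv")
test_df = pd.read_csv("data/processed/test.csv")
X_train, y_train = train_df.drop(columns=[LABEL_COLUMN]), train_df[LABEL_COLUMN]
X_test, y_test = test_df.drop(columns=[LABEL_COLUMN]), test_df[LABEL_COLUMN]

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))

    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("accuracy", accuracy)
    mlflow.sklearn.log_model(model, artifact_path="model")
```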
Test the prediction process:

```bash
python3 src/pipeline/prediction_pipeline.py
```
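Under the hood, a prediction step usually reloads the logged model and scores new rows. A minimal sketch using MLflow's model URI convention (the run ID, file path, and label column are placeholders):

```python
import mlflow.pyfunc
import pandas as pd

# "runs:/<run-id>/model" refers to the model logged by a training run;
# replace <run-id> with a real run ID from the MLflow UI.
MODEL_URI = "runs:/<run-id>/model"

model = mlflow.pyfunc.load_model(MODEL_URI)
sample = pd.read_csv("data/processed/test.csv").drop(columns=["target"]).head(5)
print(model.predict(sample))
```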
Build and start the services:

```bash
docker-compose up --build
```

Access the services:

- FastAPI: http://localhost:8000
- MLflow: http://localhost:5000
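Once the containers are running, the FastAPI service can be exercised from Python. The endpoint path and payload below are placeholders; check the interactive docs at http://localhost:8000/docs for the actual routes and schema.

```python
import requests

# Placeholder endpoint and feature names; adjust to the app's real schema.
response = requests.post(
    "http://localhost:8000/predict",
    json={"feature_1": 1.0, "feature_2": 0.5},
    timeout=10,
)
response.raise_for_status()
print(response.json())
```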
Stop all running containers:

```bash
docker-compose down
```
- Never commit credentials: use `.env` and `.aws` for sensitive information.
- Add `.env` and `.aws/` to `.gitignore`.
- Python 3.9+
- Docker and Docker Compose
- AWS account with S3 access
- Basic understanding of ML concepts
For support, open an issue on the GitHub repository.