House price prediction

Project for MLOps Zoomcamp. The main goal of the project is to apply MLOps tools and best practices to a prediction task.

Dataset source: Kaggle House Prices Prediction challenge (link)

The modelling notebook is inspired by Serigne's notebook.

Problem Definition

This project automates the prediction of house prices based on different features of a house, such as location, shape, available utilities, condition, style, etc. The project intends to automate the different stages of the process, including training, deployment, and maintaining the model in production.

Project Architecture

(Architecture diagram: mlops_project)

Documentation

Installation

The project was developed on an AWS EC2 instance, and it is highly recommended to run it on an EC2 instance as well. Model artifacts are stored in an AWS S3 bucket, so it is advised to create an S3 bucket with your custom name.

Programs installed on EC2: Anaconda, Docker, docker-compose

Clone this repository to your local machine:

    git clone https://github.com/bryskulov/mlops-house-prices.git

Folder explanations:

  • notebooks: Jupyter notebooks for prototyping
  • model_training: Automated model training scripts
  • web_service: Deployment of the model as a web-service

Model training (model_training/)

First, install the pipenv package and then the other packages from the Pipfile. It is important to be in the same directory as the Pipfile when running these commands.

    pip install pipenv
    pipenv install

Activate the pipenv environment:

    pipenv shell

Set your AWS S3 bucket path as an environment variable:

    export S3_BUCKET_PATH="s3://mlflow-models-bryskulov"

Train model once with Python CLI

This script trains the model once using the data in the "model_training/data/" path. The idea is that new models are trained on whatever data is in that folder. In the future, of course, it would be better to pull the data from a relational database.

Start the MLFlow tracking server. If you need to create a new database, you can use the following command:

    mlflow ui --backend-store-uri sqlite:///mlflow.db --default-artifact-root=$S3_BUCKET_PATH

To train the model, run:

    python train.py --data_path data/train.csv

Note: I use the MLFlow model registry in the Jupyter notebook; however, in production I decided to reference models by their RUN_ID.
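
For orientation, here is a minimal sketch of what a training entry point like train.py could look like. The experiment name, preprocessing, and model choice are assumptions for illustration, not the repository's actual code.

    # Hypothetical sketch of train.py; preprocessing and model choice are assumptions.
    import argparse

    import mlflow
    import mlflow.sklearn
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    mlflow.set_tracking_uri("http://127.0.0.1:5000")  # assumed address of the local tracking server
    mlflow.set_experiment("house-prices")             # assumed experiment name


    def train(data_path: str):
        df = pd.read_csv(data_path)
        X = df.drop(columns=["SalePrice"]).select_dtypes("number").fillna(0)
        y = df["SalePrice"]

        with mlflow.start_run() as active_run:
            model = RandomForestRegressor(n_estimators=100)
            model.fit(X, y)
            # Artifacts are written under $S3_BUCKET_PATH; the run ID printed here
            # is what RUN_ID refers to later.
            mlflow.sklearn.log_model(model, artifact_path="model")
            print(f"run_id: {active_run.info.run_id}")


    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("--data_path", required=True)
        args = parser.parse_args()
        train(args.data_path)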

Scheduled training with Prefect Deployment

Here, model training is scheduled via the workflow orchestration tool Prefect.

Start the MLFlow tracking server:

    mlflow ui --backend-store-uri sqlite:///mlflow.db --default-artifact-root=$S3_BUCKET_PATH

Start Prefect UI with the following bash command:

    prefect orion start

Note: This will run the Prefect server, which can be accessed from the browser.

Create a new deployment with Prefect CLI command:

    prefect deployment create prefect_deploy.py

Note: This will create a new deployment in Prefect; however, it won't run it. To run the deployment, we need to create a work queue, which can be done in the Prefect UI.

After creating the work queue, we need to start the agent:

    prefect agent start <work queue ID>

Now, you can observe all the scheduled, completed and failed flows in the Prefect UI.
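
As a rough illustration, a prefect_deploy.py that works with `prefect deployment create` in the Prefect 2.0 beta (Orion) usually declares a flow plus a DeploymentSpec. The flow body, names, and weekly schedule below are assumptions, not the repository's actual code.

    # Hypothetical sketch of prefect_deploy.py for Prefect 2.0 beta (Orion);
    # flow/task names and the weekly schedule are assumptions.
    from datetime import timedelta

    from prefect import flow, task
    from prefect.deployments import DeploymentSpec
    from prefect.flow_runners import SubprocessFlowRunner
    from prefect.orion.schemas.schedules import IntervalSchedule


    @task
    def train(data_path: str):
        # reuse the same training logic as in train.py
        ...


    @flow
    def main_flow(data_path: str = "data/train.csv"):
        train(data_path)


    DeploymentSpec(
        flow=main_flow,
        name="scheduled-model-training",
        schedule=IntervalSchedule(interval=timedelta(days=7)),  # retrain weekly
        flow_runner=SubprocessFlowRunner(),
    )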

Choosing the model

After training the models, inspect them and choose the one you prefer. Pay attention that the chosen model has an artifact attached.

Define the chosen values as environment variables:

    export MLFLOW_EXPERIMENT_ID='1'
    export RUN_ID='be58cd18afc44f5ab13b3409613e04f9'

Deploying the model as a Flask API service with MLFlow on an EC2 instance (web-service/)

Don't forget to change the directory and activate the separate Pipenv environment for the web service:

    cd ..
    cd web-service
    pipenv shell

The web application is deployed via Flask on localhost:9696.
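
For context, here is a minimal sketch of how such a service can load the model from S3 by RUN_ID and serve predictions on port 9696. The file name, the /predict route, and the payload format are assumptions, not the actual implementation.

    # Hypothetical sketch of the web service (e.g. predict.py); route and payload
    # handling are assumptions.
    import os

    import mlflow
    import pandas as pd
    from flask import Flask, jsonify, request

    S3_BUCKET_PATH = os.environ["S3_BUCKET_PATH"]
    MLFLOW_EXPERIMENT_ID = os.environ["MLFLOW_EXPERIMENT_ID"]
    RUN_ID = os.environ["RUN_ID"]

    # Load the model straight from the S3 artifact store, bypassing the model registry.
    model_uri = f"{S3_BUCKET_PATH}/{MLFLOW_EXPERIMENT_ID}/{RUN_ID}/artifacts/model"
    model = mlflow.pyfunc.load_model(model_uri)

    app = Flask("house-price-prediction")


    @app.route("/predict", methods=["POST"])
    def predict():
        features = request.get_json()
        prediction = model.predict(pd.DataFrame([features]))
        return jsonify({"price": float(prediction[0]), "model_run_id": RUN_ID})


    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=9696)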

Deploying the service with the Makefile (all checks and tests)

You can deploy the model easily with a couple of commands; the Makefile will run all the checks and only then deploy the service.

First, change the environment variables in the ".env" file according to your setup.

Second, run the Makefile:

    make setup
    make deploy

Deploying the service manually

To build the Docker image, run:

    docker build -t house-price-prediction-service:v2 .

Run the Docker container:

    docker run -it --rm -p 9696:9696 \
        -e S3_BUCKET_PATH=$S3_BUCKET_PATH \
        -e MLFLOW_EXPERIMENT_ID=$MLFLOW_EXPERIMENT_ID \
        -e RUN_ID=$RUN_ID \
        house-price-prediction-service:v2

Testing

I run both unit tests and an integration test on the deployed application.

Unit tests

Pytest is used for unit testing. The tests can be run through an IDE or with the following command:

    pytest unit_tests
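
As an illustration of the style of test this enables, here is a hypothetical pytest case; the prepare_features helper and its behaviour are assumptions about the service code, not the repository's actual tests.

    # Hypothetical example of a unit test; the prepare_features helper is an assumption.
    from predict import prepare_features


    def test_prepare_features_keeps_known_fields():
        raw = {"LotArea": 8450, "OverallQual": 7}
        features = prepare_features(raw)
        assert features["LotArea"] == 8450
        assert features["OverallQual"] == 7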

Integration test

The integration test is automated, so you only need to run the script "run.sh" in the "integration_test" folder:

    cd integration_test
    source run.sh

Note: If you get an error, check that you have activated the pipenv environment and passed the environment variables S3_BUCKET_PATH, MLFLOW_EXPERIMENT_ID, and RUN_ID.
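
For reference, this is the kind of request such an integration test typically sends to the running container; the /predict endpoint and the payload fields are assumptions, not the contents of run.sh.

    # Hypothetical smoke test against the running service; endpoint and fields are assumptions.
    import requests

    house = {"LotArea": 8450, "OverallQual": 7, "YearBuilt": 2003}

    response = requests.post("http://localhost:9696/predict", json=house, timeout=10)
    response.raise_for_status()
    print(response.json())  # expected shape: {"price": ..., "model_run_id": ...}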
