This is an SMS/email spam classification project that integrates MLOps tools such as MLflow, Docker, and Git.

DarshanRokkad/Sms_Spam_Classification

📧 SMS/Email Spam 👤 Classification Project ‼️


Problem Statement

Given an input text message or email, we have to detect whether it is spam or not.


Solution Explanation

Research Paper -> https://ieeexplore.ieee.org/document/10390514

Click the image below to watch the video explanation of the solution.

YouTube Video


Approach to the problem

Steps:

  1. Downloading the data and loading it into a Jupyter notebook.
  2. Performing exploratory data analysis and feature engineering in the Jupyter notebook.
  3. Building models in the Jupyter notebook using different machine learning algorithms.
  4. Converting the notebook experiments into modular code with exception handling and logging.
  5. Using the training pipeline to train the model.
  6. Integrating MLflow and DagsHub to monitor the NLP lifecycle and log results.
  7. Building the prediction pipeline and a Flask API to serve the model.
  8. Testing the API using Postman.
  9. Building a UI for the Flask API using HTML and CSS.
  10. Dockerizing the application and testing it in a local environment.
  11. Deploying the complete working model on AWS using CI/CD with GitHub Actions.
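
The model-building steps compare several algorithms and keep the one that performs best. The sketch below illustrates only that selection logic, in plain Python. The two classifiers (`always_ham`, `keyword_rule`), the helper names, and the four sample messages are hypothetical stand-ins invented for illustration; they are not the models or data used in this repository.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, int]           # (message text, label): 1 = spam, 0 = not spam
Classifier = Callable[[str], int]  # maps a message to a predicted label


def always_ham(text: str) -> int:
    """Toy baseline (hypothetical): never flags a message as spam."""
    return 0


def keyword_rule(text: str) -> int:
    """Toy rule (hypothetical): flags any message containing the word 'free'."""
    return 1 if "free" in text.lower().split() else 0


def accuracy(predict: Classifier, samples: List[Sample]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for text, label in samples if predict(text) == label)
    return correct / len(samples)


def select_best_model(
    candidates: Dict[str, Classifier], samples: List[Sample]
) -> Tuple[str, Dict[str, float]]:
    """Score every candidate on the held-out samples and return the best name."""
    scores = {name: accuracy(fn, samples) for name, fn in candidates.items()}
    best_name = max(scores, key=scores.get)
    return best_name, scores


# Made-up held-out messages, for illustration only.
test_samples: List[Sample] = [
    ("win a free prize now", 1),
    ("are we still meeting at noon", 0),
    ("free entry claim your reward", 1),
    ("see you tomorrow", 0),
]

best_name, scores = select_best_model(
    {"always_ham": always_ham, "keyword_rule": keyword_rule}, test_samples
)
print(best_name, scores)
```

In a real training run the candidates would be fitted models and the metric might be precision or F1 rather than accuracy, since spam datasets are usually imbalanced.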

Project UI


API Testing Results


MLflow and DagsHub integration

MLflow is used to monitor the machine learning lifecycle, and DagsHub serves as the remote repository.


Docker image publishing

Step 1 : Built a Docker image on the local machine and tested the application.

Step 2 : Pushed the tested Docker image to the public Docker Hub.

The Docker image is publicly available on Docker Hub, so anyone can pull it and use the project. It is open source.

The command to run the pulled Docker image is "docker run -p 5000:5000 darshanrm/spam_detection_app:latest"
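
Once the container is running and port 5000 is published, the API can also be called from a script instead of Postman. The sketch below uses only the Python standard library. The route `/predict` and the JSON field `message` are assumptions made for illustration; this README does not state the actual route or payload format, so check `application.py` before relying on them.

```python
import json
import urllib.request


def build_predict_request(
    message: str, base_url: str = "http://localhost:5000"
) -> urllib.request.Request:
    """Build a JSON POST request for the assumed /predict route."""
    # Assumption: the API accepts a JSON body with a "message" field.
    payload = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(
    message: str, base_url: str = "http://localhost:5000", timeout: float = 10.0
) -> str:
    """Send the request to a running container and return the raw response body."""
    request = build_predict_request(message, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request does not need a running server; sending it does.
request = build_predict_request("Win a free prize")
print(request.get_method(), request.full_url)
```

Calling `classify("Win a free prize")` sends the request, so it only works while the container started with the command above is running.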


Application Deployment on AWS

Using ECR + EC2 + GitHub Actions (CI/CD pipeline)

Step 1 : Created an IAM role and downloaded the access key and secret key.

Step 2 : Created a private ECR registry to store the Docker image privately (the image on Docker Hub can also be pulled instead).

Step 3 : Created an EC2 instance to build the Docker image and run the application.

Step 4 : Configured the EC2 instance by installing Docker and its dependencies and setting up a GitHub self-hosted runner.

Step 5 : The GitHub self-hosted runner created in the previous step is shown in the image below.

Step 6 : Added the GitHub repository secrets used in the GitHub workflow.

Step 7 : The successfully run CI/CD pipeline is shown in the image below.

Step 8 : Accessed the spam detection application using the public IPv4 address.


Project Structure

│  
├── .github/workflow/main.yaml               <-- For continuous integration, continuous delivery, and continuous deployment.
│  
├── artificats                               <-- Contains datasets (train, test, modified, and raw) and pickle files (vectorizer and model).
│  
├── images                                   <-- Contains images used in readme file.
│  
├── notebooks                                <-- Contains a Jupyter notebook where EDA and model training are performed.
│  
├── resources                                <-- Contains some useful commands and steps used while building the project.
│   
├── src
│   │
│   ├── components
│   │   │
│   │   ├── __init__.py
│   │   │
│   │   ├── data_ingestion.py                <-- First component of the training pipeline; reads data from the source and performs the train-test split.
│   │   │
│   │   ├── data_transformation.py           <-- Second component of the training pipeline; transforms the train and test data into a form that can be used to train the model.
│   │   │
│   │   ├── model_training.py                <-- Third component of the training pipeline; trains different machine learning algorithms
│   │   │                                        on the transformed data, selects the best model, and saves it for prediction.
│   │   │
│   │   └── model_evaluation.py              <-- Fourth component of the training pipeline; evaluates the best model and saves the performance metrics.
│   │
│   ├── pipeline
│   │   │
│   │   ├── __init__.py
│   │   │
│   │   ├── training_pipeline.py             <-- Trains the model by combining all the components in the components folder.
│   │   │
│   │   └── prediction_pipeline.py           <-- Uses the vectorizer and model obtained from training to make a prediction and return it to the application.
│   │
│   ├── __init__.py
│   │
│   ├── exception.py                         <-- Exception module; contains a class that can be used to raise custom exceptions.
│   │
│   ├── logger.py                            <-- Logger module; used for logging and debugging and can log various information.
│   │
│   └── utils.py                             <-- Utils module contains the commonly used methods in the project.
│   
├── static
│   │
│   └── css                                  <-- Folder contains all css files.
│   
├── templates                                <-- Folder contains all the html files.
│   
├── .gitignore                               <-- Lists the files that should not be pushed to GitHub.
│
├── application.py                           <-- Contains flask web application to take input from user and render output.
│
├── LICENSE                                  <-- Copyright license for the github repository.
│
├── README.md                                <-- Used to display the information about the project.
│
├── requirements.txt                         <-- Text file that lists the dependencies/packages used in the project.
│
├── setup.py                                 <-- Python script used for building python package of our project.
│
└── template.py                              <-- Program used to create the project structure.
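
The `exception.py` and `logger.py` modules are described only briefly in the tree above. The sketch below shows one common way such modules are written, using only the standard library. The names `get_logger`, `PipelineError`, and `read_dataset`, along with the log format, are hypothetical; the repository's own classes and messages may differ.

```python
import logging


def get_logger(name: str = "spam_classifier") -> logging.Logger:
    """Return a logger that writes timestamped records to stderr (hypothetical helper)."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("[%(asctime)s] %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


class PipelineError(Exception):
    """Custom exception recording the file and line where the original error was raised."""

    def __init__(self, original: Exception):
        tb = original.__traceback__
        # Walk to the innermost traceback entry, where the error actually occurred.
        while tb is not None and tb.tb_next is not None:
            tb = tb.tb_next
        self.filename = tb.tb_frame.f_code.co_filename if tb else "<unknown>"
        self.lineno = tb.tb_lineno if tb else 0
        super().__init__(
            f"{type(original).__name__} at {self.filename}:{self.lineno}: {original}"
        )


def read_dataset(path: str) -> list:
    """Read a text file line by line, logging the outcome and wrapping I/O errors."""
    log = get_logger()
    try:
        with open(path, encoding="utf-8") as handle:
            rows = handle.read().splitlines()
        log.info("read %d rows from %s", len(rows), path)
        return rows
    except OSError as exc:
        log.error("could not read %s", path)
        raise PipelineError(exc) from exc
```

Wrapping low-level errors this way keeps the original exception attached through `raise ... from`, while the message carries the location needed when reading logs from a deployed container.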
