This project focuses on detecting spam messages using natural language processing (NLP) techniques and machine learning models.
The dataset includes SMS messages labeled as spam or ham (not spam).
Data Preprocessing:
- Text cleaning and normalization.
- Tokenization and stemming.
- Converting text data into numerical representations using techniques like TF-IDF.
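As a rough illustration of the preprocessing steps above, the following standard-library sketch cleans, tokenizes, and stems a message, then computes a plain TF-IDF weight (term frequency times log(N / document frequency)). It is not the notebook's code: the `SUFFIXES` list, the function names, and the example messages are all assumptions made for illustration, and the suffix-stripping stemmer is a crude stand-in for a library stemmer. Library implementations of TF-IDF typically use smoothed variants of this formula, so their numbers will differ.

```python
import math
import re
from collections import Counter

# Hypothetical suffix list for a deliberately crude stemmer; a real project
# would more likely rely on a library stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def clean(text):
    """Lowercase the text and replace anything that is not a-z, 0-9, or whitespace."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Clean, tokenize on whitespace, and stem one message."""
    return [stem(tok) for tok in clean(text).split()]


def tfidf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [preprocess(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of messages that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {
                term: (count / total) * math.log(n_docs / df[term])
                for term, count in counts.items()
            }
        )
    return vectors


# Made-up example messages, not taken from the project's dataset.
messages = ["WIN a FREE prize now!!!", "Are we meeting for lunch now?"]
vectors = tfidf(messages)
```

A term that appears in every message (here "now") gets weight zero under this formula, which is the intended effect: words shared by spam and ham alike carry no discriminating signal.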
Exploratory Data Analysis (EDA):
- Visualizing the distribution of spam and ham messages.
- Analyzing common words and phrases in spam messages.
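The two EDA steps above boil down to counting: messages per label, and words within the spam subset. A minimal sketch using only the standard library is shown below; the `samples` list is invented for illustration and does not come from the project's dataset, and the regular expression used for word splitting is an assumption.

```python
import re
from collections import Counter

# Made-up (label, message) pairs standing in for the real SMS dataset.
samples = [
    ("spam", "Free prize! Claim your free prize now"),
    ("ham", "See you at lunch"),
    ("ham", "Call me when you are free"),
    ("spam", "Claim your cash prize today"),
]

# Class distribution: how many messages carry each label.
label_counts = Counter(label for label, _ in samples)

# Word frequencies within spam messages only.
spam_words = Counter(
    word
    for label, text in samples
    if label == "spam"
    for word in re.findall(r"[a-z']+", text.lower())
)

print(label_counts)
print(spam_words.most_common(3))
```

Counts like `label_counts` and `spam_words.most_common(n)` can then be handed to whichever plotting library the notebook uses, for example as a bar chart.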
Model Building:
- Training various machine learning models like Naive Bayes, SVM, and Random Forest.
- Evaluating model performance using metrics such as accuracy, precision, recall, and F1-score.
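One possible way to train and score the three model families named above is sketched here with scikit-learn. The use of scikit-learn, the specific estimator classes (`MultinomialNB`, `LinearSVC`, `RandomForestClassifier`), their settings, and the toy data are all assumptions; the notebook may use different classes or hyperparameters.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up toy data, far too small for real conclusions; 1 = spam, 0 = ham.
train_texts = [
    "win a free prize now",
    "claim your cash reward today",
    "free entry to win cash",
    "are we still meeting for lunch",
    "call me when you get home",
    "see you at the office tomorrow",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["free cash prize", "lunch tomorrow at the office"]
test_labels = [1, 0]

# Assumed estimator choices for the three model families.
models = {
    "naive_bayes": MultinomialNB(),
    "svm": LinearSVC(),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

results = {}
for name, model in models.items():
    # Each pipeline turns raw text into TF-IDF features, then fits the model.
    pipeline = make_pipeline(TfidfVectorizer(), model)
    pipeline.fit(train_texts, train_labels)
    predicted = pipeline.predict(test_texts)
    precision, recall, f1, _ = precision_recall_fscore_support(
        test_labels, predicted, average="binary", pos_label=1, zero_division=0
    )
    results[name] = {
        "accuracy": accuracy_score(test_labels, predicted),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

for name, scores in results.items():
    print(name, scores)
```

Wrapping the vectorizer and the classifier in one pipeline means the TF-IDF vocabulary is learned from the training messages only, so no information from the test messages leaks into the features.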
Model Evaluation:
- Comparing different models.
- Selecting the best model based on evaluation metrics.
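The comparison step can be pictured as computing one metric per model and taking the maximum. The sketch below does this with F1 on the spam class; choosing F1 as the selection criterion is an assumption (the project only says "evaluation metrics"), and the function names, model names, labels, and predictions are hypothetical values used to exercise the code.

```python
def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Compute precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


def select_best(y_true, predictions_by_model):
    """Return (model name, F1) for the model with the highest spam-class F1."""
    scores = {
        name: precision_recall_f1(y_true, y_pred)[2]
        for name, y_pred in predictions_by_model.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Made-up labels and predictions, purely to exercise the functions.
y_true = ["spam", "ham", "ham", "spam"]
predictions_by_model = {
    "model_a": ["spam", "ham", "spam", "spam"],
    "model_b": ["spam", "ham", "ham", "ham"],
}
best_name, best_f1 = select_best(y_true, predictions_by_model)
print(best_name, best_f1)
```

F1 balances precision (how many flagged messages are really spam) against recall (how many spam messages are caught). If flagging a legitimate message is the costlier mistake, selecting on precision instead may be the better fit.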
To run this project, install the dependencies listed in requirements.txt (for example, with pip install -r requirements.txt), then execute the notebook.