This is a binary classification problem: the goal is to classify emails as either ham (legitimate email) or spam.
Below is an introduction to the dataset I will be using for this project.
The dataset used in this project is from Apache SpamAssassin.
Apache SpamAssassin describes itself as the #1 open source anti-spam platform. It gives system administrators a filter to classify email and block spam (unsolicited bulk email).
It uses a robust scoring framework and plug-ins to integrate a wide range of advanced heuristic and statistical analysis tests on email headers and body text, including text analysis, Bayesian filtering, DNS blocklists, and collaborative filtering databases.
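To make the idea of a "scoring framework" concrete, here is a minimal sketch of score-based classification: each test that matches a message adds its weight to a running total, and the message is labelled spam once the total reaches a threshold. This is an illustration only, not SpamAssassin's implementation. Real SpamAssassin rules are written in its own configuration format and number in the hundreds. The rule names, regexes and weights below are hypothetical. The 5.0 threshold is chosen to mirror SpamAssassin's default `required_score`.

```python
import re

# Hypothetical rules: (name, compiled regex over the raw message, score).
# These names and weights are illustrative only, NOT SpamAssassin's real rule set.
RULES = [
    ("FREE_MONEY", re.compile(r"free money", re.IGNORECASE), 2.5),
    ("ALL_CAPS_SUBJECT", re.compile(r"^Subject: [A-Z0-9 !]+$", re.MULTILINE), 1.5),
    ("CLICK_HERE", re.compile(r"click here", re.IGNORECASE), 1.8),
]

# Threshold at or above which a message counts as spam
# (5.0 mirrors SpamAssassin's default required_score).
REQUIRED_SCORE = 5.0


def score_email(raw_email: str) -> tuple[float, list[str]]:
    """Return the summed score of all matching rules and the names of the rules that fired."""
    hits = [(name, points) for name, pattern, points in RULES if pattern.search(raw_email)]
    total = sum(points for _, points in hits)
    return total, [name for name, _ in hits]


def classify(raw_email: str) -> str:
    """Label a message 'spam' if its total score reaches the threshold, otherwise 'ham'."""
    total, _ = score_email(raw_email)
    return "spam" if total >= REQUIRED_SCORE else "ham"


if __name__ == "__main__":
    spam = "Subject: WIN NOW!!!\n\nGet free money, click here."
    ham = "Subject: Meeting notes\n\nSee the attached agenda."
    print(score_email(spam), classify(spam))
    print(score_email(ham), classify(ham))
```

No single rule decides the outcome on its own. Several weak signals have to add up before the threshold is crossed, which is what lets a scoring framework combine heuristic tests with statistical ones such as Bayesian filtering.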
Apache SpamAssassin is a project of the Apache Software Foundation (ASF). More information is available at:
https://spamassassin.apache.org/
The dataset itself is hosted at the following link: