Machine Learning with Spark MLlib is one of the project titles taken up as part of the UE19CS322 Big Data course at PES University. The project simulates a real-world scenario in which an enormous amount of data is used for predictive modelling. The data arrives as a stream, and the application is constrained to handle only one batch of the stream at any given point in time.
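To make the batch constraint concrete, here is a minimal sketch of reading a stream one fixed-size batch at a time, so that only the current batch is held in memory. The stream is stood in for by a plain Python iterable, and the helper name `iter_batches` is hypothetical. The project's real streaming source is not described here.

```python
from itertools import islice


def iter_batches(records, batch_size):
    """Yield lists of at most batch_size records, pulling lazily from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(records)
    while True:
        # islice consumes at most batch_size items; nothing beyond them is read yet.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Usage: a generator stands in for an unbounded stream of emails.
emails = (f"email {i}" for i in range(7))
for batch in iter_batches(emails, 3):
    print(len(batch))  # prints 3, then 3, then 1
```

Because the source is consumed lazily, the loop body only ever sees one batch. On Python 3.12+, `itertools.batched` offers similar behaviour (yielding tuples instead of lists).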
This project uses Spark to train machine learning models that classify emails, based on their subject and body, as either spam or ham (non-spam). We used three supervised learning models: MLP (Multilayer Perceptron), MNB (Multinomial Naive Bayes), and PAC (Passive Aggressive Classifier). We also used Mini-Batch K-Means clustering, an unsupervised learning algorithm.
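As an illustration of how one of these models can learn under the batch constraint, here is a minimal sketch of an incremental Multinomial Naive Bayes spam/ham classifier in plain Python. It is not the project's implementation; the tokenizer, the class name `IncrementalMultinomialNB`, and the toy emails are all hypothetical. Because MNB only needs per-class word counts, each batch updates those counts and can then be discarded. scikit-learn's `MultinomialNB` supports the same pattern through its `partial_fit` method.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical, deliberately simple tokenizer: lower-case alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


class IncrementalMultinomialNB:
    """Multinomial Naive Bayes whose counts are updated one batch at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # additive (Laplace) smoothing
        self.class_counts = Counter()            # documents seen per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.total_words = Counter()             # total tokens per class
        self.vocab = set()

    def partial_fit(self, texts, labels):
        # Only aggregate counts are kept, so the raw batch can be dropped afterwards.
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.total_words[label] += len(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # log P(class) + sum of smoothed log P(token | class) over known tokens
            score = math.log(doc_count / n_docs)
            denom = self.total_words[label] + self.alpha * v
            for tok in tokenize(text):
                if tok in self.vocab:
                    score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Usage: each (texts, labels) pair plays the role of one batch from the stream.
stream = [
    (["win money now", "cheap pills win"], ["spam", "spam"]),
    (["meeting at noon", "lunch tomorrow at noon"], ["ham", "ham"]),
]
model = IncrementalMultinomialNB()
for texts, labels in stream:
    model.partial_fit(texts, labels)
print(model.predict("win cheap money"))  # spam
print(model.predict("noon meeting"))     # ham
```

The same batch-by-batch loop applies to models such as the Passive Aggressive Classifier or Mini-Batch K-Means, whose state can likewise be updated from each batch without revisiting earlier data.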