This project aims to identify the political ideology that a social media account represents or aligns with, based on the language it uses.
To approach this problem, tweets are first explored using a series of data visualization techniques and then used to train machine learning models that predict political affiliation. The language used by Democrats and Republicans is specific to their ideologies, so tweets written by political representatives are valuable training data for these models. Political representatives tend to use carefully worded, specific jargon that is characteristic of their party's lexicon.
Using feature extraction methods such as CountVectorizer and TF-IDF, a series of models was created to predict which party a tweet leans towards or identifies with. The models are then compared on accuracy and on their confusion matrices to determine which one is most suitable for this task.
Approach:
- Sentiment Analysis
- Stop-word removal, Stemming and Lemmatization
- TF-IDF
- CountVectorizer
Models Used:
- Naive Bayes Multinomial Classifier
- SVM Linear Kernel
- Decision Tree Classifier
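
The comparison of these models on accuracy and confusion matrix could be sketched as follows. This is a minimal example under stated assumptions, not the project's actual code: the tweets and labels are invented, TF-IDF is chosen as the feature step, `SVC(kernel="linear")` stands in for the linear-kernel SVM, and a decision tree classifier is used because accuracy and a confusion matrix require discrete class predictions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data, invented for illustration only.
texts = [
    "Expand health care and protect voting rights",
    "Climate action and clean energy jobs now",
    "Raise the minimum wage for working families",
    "Protect reproductive rights and public schools",
    "Cut taxes and reduce government regulation",
    "Secure the border and support law enforcement",
    "Defend the Second Amendment and religious liberty",
    "Lower spending and shrink the federal deficit",
]
labels = ["Democrat"] * 4 + ["Republican"] * 4
classes = ["Democrat", "Republican"]

# Hold out a stratified test split so both parties appear in it.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels
)

models = {
    "Naive Bayes (multinomial)": MultinomialNB(),
    "SVM (linear kernel)": SVC(kernel="linear"),
    "Decision tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    # Each pipeline fits its own vectorizer on the training tweets only.
    pipeline = make_pipeline(TfidfVectorizer(stop_words="english"), model)
    pipeline.fit(X_train, y_train)
    predictions = pipeline.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, predictions),
        # Rows are true classes, columns are predicted classes.
        "confusion": confusion_matrix(y_test, predictions, labels=classes),
    }
    print(name, results[name]["accuracy"])
    print(results[name]["confusion"])
```

With a dataset this small the scores carry no meaning; the point is only the shape of the comparison, where each model gets the same split and the same feature step.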
Thank you!