Product/Service Monitoring System
This project was inspired sentdex
. (https://github.com/Sentdex/socialsentiment/)
Jupyter Notebooks
product_service_monitoring_system.ipynb
: contains training and comparing of various models including BERT, TextBlob, and VADER models.
Python Files
twitter_stream.py
: connects to Twitter API and streamline tweets with various keywordstweet_analyzer.py
: cleans and analyzes streamed tweetsapp.py
: Dash dashboard
Folders
images
: contains image filesmodels
: contains (1) BERT NLP and (2) spaCy TF-IF vectorization sentiment analysis models *(BERT model excluded)data
: contains SQLite3 databases pulled from Twitter using Tweepy (only sample database is included) andbag_of_words
datasets
: contains datasets that were used to train various NLP sentiment analysis modelssrc
: contains useful codes that were used in creating modelskeys
: contains Twitter API key information *(files excluded)
How to use this project
- Add Twitter API information in
keys
directory, and make sure the PATH is correctly defined intwitter_stream.py
. - Adjust any keywords or queries in
twitter_stream.py
. - Set up SQLITE3 database path and run
twitter_stream.py
. - Download
BERT_2
model using this test and store it in a directory of your choice. - In
tweet_analyzer.py
: 1) correctly set up SQLITE3 database path that matches step #3 and 2) correctly set up pathway forBERT_2
model. - Run
tweet_analyzer.py
- Run
app.py
As of January of 2020, there are approximately 145 million users on Twitter. 22% of Americans are on Twitter and 500 million tweets are sent each day globally. This is why many companies internationally use Twitter for marketing. In fact 80% of Twitter users have mentioned a brand in a tweet, and 77% of Twitter users feel more positive when their tweets have been replied by the mentioned brand [1].
We believe that Twitter is one of the platforms that provides people to share their opinion, evaluations, attitudes, and emotions about virtually anything including certain products freely, and for any companies, this is like a gold mine waiting to be mined for opinions are central to almost all human activities and are key influencers of our behaviors
[2].
Sources:
[1] https://unsplash.com/photos/ulRlAm1ITMU
[2]. Bing Liu, https://www.morganclaypool.com/doi/abs/10.2200/s00416ed1v01y201204hlt016
The goals of this project are to
[1] stream Twitter with on various topics (ex. Microsoft, Starbucks, Google, etc.) and
[2] effectively implement various NLP models (including TextBlob, BERT, and TF-IDF models) to classify tweets
[3] to flag the user for any strongly negative tweets on a Dashboard to effectively respond to them.
BERT model has accuracy of 84%, however due to its computing time, VADER was used as an initial model to classify polarity of each tweet. The BERT model was used to confirm any strongly negative sentiment tweets classified by VADER.
Keyword You are able to select up to two keywords to monitor online. If you choose to choose two keywords, you must separate them by a comma!
*if no keyword is selected, then it streams
Graph1 This graph represents moving average value of sentiments towards the keywords you have selected along with number of tweets sent.
Graph2 This pie graph represents sentiment distribution towards your keyword.
By clicking Generate Word Cloud
, it generates frequently used words in tweets related to the keywords for each sentiment.
Table1
This table shows recent tweets sent filtered by the keywords chosen. You are able to click on link
button to access the actual tweet.
Table2 This table shows recent flagged tweets that are strongly negative.
Saving data into .csv file
You must click GENERATE CSV FILE
before you can download the files. You can either download the whole raw tweet data or just the flagged ones.
- Currently working on deploying the app through Heroku!
- Database structure
- As the database size increases, it might get too much for sqlite3 to handle. So, we want to separate database into parts so only when a large number of data is requested, we can combine or join smaller databases.
- Reply feature
- It would be nice if we could add a feature where you could reply to any of tweets shown in the dashboard without going to twitter page.
- Multiple keywords
- It would be nice if multiple keywords can be analyzed and followed at a given time.
- Table Editing Mode
- Modify flagged tweets in the dashboard so that a user can classify flagged tweets as 'resolved', 'false negative', or 'other' for further customer service / data analysis.
- Further analysis on both positive and negative tweets
- Find any correlation between tweet trends with how the company is doing to help with future direction of a company.