This repo contains code to process text, engineer features, and perform sentiment analysis on tweets using neural networks. The project trains an LSTM on the data and achieves a test accuracy of 79%.
- Install pyenv for managing Python versions
brew install pyenv
- Install Python 3.7.2 with pyenv, passing the macOS SDK include path through CFLAGS
CFLAGS="-I$(xcrun --show-sdk-path)/usr/include" pyenv install 3.7.2
- Clone the repo to your machine
git clone https://github.com/kb22/Twitter-Sentiment-Analysis-using-Neural-Networks.git
- Move into the folder
cd Twitter-Sentiment-Analysis-using-Neural-Networks
- Install all dependencies
pip install -r requirements.txt
The dataset has been taken from Kaggle.
- Download the file from Kaggle.
- Extract the zip and rename the CSV file to dataset.csv
- Create a folder data inside the Twitter-Sentiment-Analysis-using-Neural-Networks folder
- Copy the file dataset.csv into the data folder
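The dataset steps above can also be scripted. The following is a minimal sketch and not part of the repo: the helper name place_dataset and the example source path are hypothetical, and only the data folder and the dataset.csv file name come from the steps above.

```python
import shutil
from pathlib import Path


def place_dataset(extracted_csv, repo_dir):
    """Copy the extracted Kaggle CSV to <repo_dir>/data/dataset.csv."""
    data_dir = Path(repo_dir) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)  # create data/ if it is missing
    target = data_dir / "dataset.csv"
    shutil.copyfile(extracted_csv, target)  # copy and rename in one step
    return target


# Example call (hypothetical source path, adjust to wherever the zip was extracted):
# place_dataset("~/Downloads/extracted.csv", "Twitter-Sentiment-Analysis-using-Neural-Networks")
```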
The Jupyter notebook Dataset analysis.ipynb gives a basic overview of the dataset and analyzes its columns.
- Run Jupyter
jupyter notebook
- Select the file Dataset analysis.ipynb from the list to see dataset analysis.
The project is split into separate Python files that cover each stage, from splitting the dataset to running the sentiment analysis. The steps to carry out Twitter sentiment analysis are:
- Run the file train-test-split.py to split the Twitter dataset into training and testing data.
python train-test-split.py
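As a rough illustration of what a train/test split does, here is a minimal standard-library sketch. It is not the contents of train-test-split.py: the function name, the 80/20 ratio, and the fixed seed are assumptions made for the example.

```python
import random


def train_test_split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and return (train, test) lists."""
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for repeatable splits
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Ten dummy rows stand in for tweets here.
train, test = train_test_split_rows(range(10))
print(len(train), len(test))
```

Fixing the seed keeps the split identical across runs, so accuracy figures from different experiments remain comparable.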
- Run the file preprocessing.py to process the tweets. It applies the following steps:
  - Remove @user mentions
  - Remove all characters except letters, spaces, and apostrophes
  - Remove links
  - Remove single characters
  - Remove stopwords
  - Lemmatize words
  - Stem words
python preprocessing.py
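The cleaning steps above can be sketched with the standard library alone. This is an illustrative approximation and not the code in preprocessing.py: the function name clean_tweet, the regular expressions, and the tiny stopword set are assumptions. Links are stripped before the character filter here so that URL fragments do not survive as words. Lemmatization and stemming are left out because they need an NLP library such as NLTK.

```python
import re

# Tiny illustrative stopword set (an assumption); a real run would use a fuller list.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "and", "of", "in", "on", "for", "it", "this", "that"}


def clean_tweet(text):
    """Apply the mention, link, character, single-character, and stopword filters."""
    text = re.sub(r"@\w+", " ", text)  # remove @user mentions
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove links
    text = re.sub(r"[^A-Za-z' ]", " ", text)  # keep only letters, spaces, apostrophes
    words = text.lower().split()
    words = [w for w in words if len(w) > 1]  # remove single characters
    words = [w for w in words if w not in STOPWORDS]  # remove stopwords
    return " ".join(words)


print(clean_tweet("@bob I love this movie!!! http://t.co/xyz it's a 10/10"))
# prints: love movie it's
```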
- After the tweets are processed, run lstm.py to train the LSTM on the training data and measure its accuracy on the test data.
python lstm.py