In this project, we implement a basic end-to-end Natural Language Processing project: sentiment analysis on IMDB movie reviews. The goal is to predict whether a reviewer's opinion of a movie is positive or negative. The dataset used is the IMDB Dataset of 50K Movie Reviews, which you can download from this link. The dataset is not completely clean, so several text preprocessing steps were applied, such as stemming, stop word removal, and regular expressions. The cleaned dataset is available in this repository as cleaned_data.csv.
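The preprocessing steps named above can be sketched roughly as follows. This is only an illustration, not the code used to produce cleaned_data.csv: the stop-word list is a tiny made-up sample, `naive_stem` is a crude suffix stripper standing in for a real stemmer, and `clean_review` is a hypothetical helper name. It also assumes that raw reviews may contain HTML tags such as `<br />`.

```python
import re

# Assumption: a tiny illustrative stop-word list; the project's actual list is not shown.
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "it", "this", "that", "of", "to", "in", "i"}

# Assumption: crude suffix stripping as a stand-in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_review(text):
    """Remove HTML tags and non-letters, drop stop words, then stem each token."""
    text = re.sub(r"<[^>]+>", " ", text)            # regular expression: strip HTML tags
    text = re.sub(r"[^a-z\s]", " ", text.lower())   # keep lowercase letters and whitespace
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(naive_stem(t) for t in tokens)


example = clean_review("This movie was AMAZING!<br />I loved the acting.")
print(example)
```

A real pipeline would normally swap in an established stemmer and a full stop-word list; the structure of the function stays the same.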
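Two methods were used to represent features: TF-IDF and word embeddings. TF-IDF vectors are sparse and as wide as the vocabulary, whereas word embeddings are dense and low-dimensional, so word embeddings are computationally cheaper. For this reason, word embeddings are the method used in production.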
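To make the sparsity point concrete, here is a minimal pure-Python sketch of TF-IDF using the plain `tf * log(N / df)` weighting. The function name `tf_idf` and the three toy documents are assumptions for illustration. Libraries often use a smoothed variant, so their numbers would differ from these.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Toy documents, assumed already cleaned.
docs = ["great movie great cast", "boring movie", "great fun"]
vectors = tf_idf(docs)
vocabulary = sorted({term for doc in docs for term in doc.split()})

# Each document only has weights for the terms it contains; every other
# vocabulary position would be zero in a full vector.
for doc, vec in zip(docs, vectors):
    print(doc, "->", len(vec), "non-zero of", len(vocabulary), "dimensions")
```

With a realistic vocabulary of tens of thousands of terms, almost every position in each review's vector is zero, which is the sparsity that dense embeddings avoid.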
Two models were used. The first is a Naive Bayes classifier, which is based on Bayes' theorem and predicts a label from how strongly the words in a review are associated with that label. The other model is a 1D CNN. Both models reach roughly the same accuracy (~85%). The models are available in the models folder of this repo.
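The idea behind the Naive Bayes classifier can be sketched as a small multinomial model with add-one (Laplace) smoothing. This is an illustrative toy, not the trained model shipped in the models folder: the class name `NaiveBayes`, the four training sentences, and the choice to ignore unseen words are all assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # log P(label) + sum over words of log P(word | label)
            score = math.log(n_label / n_docs)
            for word in text.split():
                if word not in self.vocab:
                    continue  # assumption: words never seen in training are ignored
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, assumed already cleaned.
train_texts = ["great fun movie", "loved great cast", "boring slow plot", "awful boring movie"]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(train_texts, train_labels)

print(model.predict("great cast"))
print(model.predict("boring plot"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together for a long review.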
The code is written in Python 3.7.5. If you don't have Python installed, you can find it here. If you are using an older version of Python, upgrade it, and make sure you have the latest version of pip. To install the required packages and libraries, run this command in the project directory after cloning the repository:
git@github.com:Kasra1377/IMDB-sentiment-analysis.git
or
https://github.com/Kasra1377/IMDB-sentiment-analysis.git
To run the web app on your computer, first open the app.py Python file in your IDE. Then open Git Bash and run the following commands in order:
export FLASK_APP=app.py
export FLASK_ENV=development
FLASK_DEBUG=1 flask run
The web app is now running locally. Open the address printed by Flask (http://127.0.0.1:5000 by default) in your browser.
The trained model has been deployed in a web application, and you can see its performance and output below:
If you encounter any bugs or technical issues in this project, you can report them in the issues section of this repository, or contact me by email.
Kasra1377