Rotten_Tomato_Project

Group Name: Wonder Women 4.0

Section: 2

The Goal of the project:

Analyze different kinds of movie features and predict the box_office of future movies using machine learning models.

What we have implemented:

1. Data scraping & Data cleaning

The dataset for this project comes from the website "rottentomatoes.com". We scraped information on the top 100 movies of each year from 2000 to 2018, representing popular movies of roughly the past 20 years. Features include (movie) 'name', 'meter_score', 'audience_score', 'rating', 'genre', 'cast', 'director', 'time', 'studio', 'runtime', and 'movie_synopsis'. Our goal is to predict "box_office" based on these features. To quantify the cast and director of a movie, we created scores for each movie by adding up the scores of its cast members and director, each derived from that person's ranking on "imdb.com".
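The scoring step above can be sketched as follows. This is only an illustration under stated assumptions: the names, the ranking values, the MAX_RANK cutoff, and the rank-to-score formula are all hypothetical, since the README does not say how a ranking is converted into a score.

```python
# Hypothetical IMDb rankings (1 = highest ranked); real values would be
# looked up on imdb.com.
imdb_rank = {"Actor A": 1, "Actor B": 50, "Director X": 10}

MAX_RANK = 100  # assumption: only the top 100 names earn a non-zero score


def person_score(name, ranks=imdb_rank, max_rank=MAX_RANK):
    """Map a ranking to a score: rank 1 -> max_rank points, unranked -> 0."""
    rank = ranks.get(name)
    if rank is None or rank > max_rank:
        return 0
    return max_rank - rank + 1


def movie_scores(cast, director, ranks=imdb_rank):
    """Return (cast_score, director_score) for one movie."""
    cast_score = sum(person_score(name, ranks) for name in cast)
    return cast_score, person_score(director, ranks)


print(movie_scores(["Actor A", "Actor B", "Unknown Actor"], "Director X"))
```

People missing from the ranking contribute zero here, so a movie with an unknown cast is not penalized beyond receiving no bonus.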

We also scraped all top reviews from movie critics for each movie from 2014 to 2018.

2. Basic Analysis & Visualization

We analyzed the box office distribution, the trend in the number of movies per genre, the rating distribution, and box office and genre patterns by director.

FUN TIME! Know more about your favorite director!

This part allows the reader to input the name of a favorite director; a chart showing that director's genre and box office preferences is returned.
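A minimal sketch of such a lookup, using only the standard library. The rows, the function name director_profile, and the choice of summary statistics (genre counts and mean box office) are assumptions made for illustration; the notebook itself draws a chart from the scraped data.

```python
from collections import Counter

# Hypothetical rows; the real table is scraped from rottentomatoes.com.
movies = [
    {"director": "Director X", "genre": "Drama", "box_office": 120.0},
    {"director": "Director X", "genre": "Drama", "box_office": 80.0},
    {"director": "Director X", "genre": "Comedy", "box_office": 40.0},
    {"director": "Director Y", "genre": "Action", "box_office": 300.0},
]


def director_profile(name, rows=movies):
    """Return genre counts and mean box office for one director, or None."""
    wanted = name.strip().lower()
    films = [r for r in rows if r["director"].lower() == wanted]
    if not films:
        return None  # director not in the data set
    genres = Counter(r["genre"] for r in films)
    mean_box_office = sum(r["box_office"] for r in films) / len(films)
    return {"genres": dict(genres), "mean_box_office": mean_box_office}


print(director_profile("director x"))
```

Matching is case-insensitive and ignores surrounding whitespace, which makes typed input a little more forgiving.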

3. Text Mining

For the first part of the text mining, we built a word cloud from all reviews of the 500 movies. We used three approaches to perform sentiment analysis on those reviews: Hu and Liu's sentiment lexicon, the NRC sentiment lexicon, and VADER. For each method, we ranked the movies by the sentiment of their reviews and picked the top four movies. The review word clouds for these four movies are then presented. We conclude that the NRC sentiment analysis provides relatively more accurate results for this problem.
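The general idea of lexicon-based scoring and ranking can be sketched as below. The word lists are tiny toy examples, not the actual Hu and Liu or NRC lexicons, and the scoring formula and function names are assumptions chosen for illustration.

```python
import re

# Toy lexicons for illustration only; real sentiment lexicons are far larger.
POSITIVE = {"good", "great", "brilliant", "fun", "moving"}
NEGATIVE = {"bad", "boring", "dull", "weak", "mess"}


def review_sentiment(text):
    """Score a review as (#positive - #negative) / #tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


def rank_movies(reviews_by_movie, top_n=4):
    """Rank movies by mean review sentiment, highest first."""
    means = {
        movie: sum(map(review_sentiment, reviews)) / len(reviews)
        for movie, reviews in reviews_by_movie.items()
        if reviews  # skip movies without reviews
    }
    return sorted(means, key=means.get, reverse=True)[:top_n]


reviews = {
    "Movie A": ["A great, moving film", "good fun"],
    "Movie B": ["boring and dull"],
    "Movie C": ["It exists"],
}
print(rank_movies(reviews, top_n=2))
```

Dividing by the token count keeps long reviews from dominating the ranking simply because they contain more words.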

For the second part, we conducted text summarization on all reviews of each of the 500 movies. The program generates a short summary paragraph of the reviews for each movie.

In the 'FUN TIME!' section, you can also type in the name of any movie released between 2014 and 2018 and receive a summary of how critics evaluated it, provided that the movie is in our database.
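The README does not say which summarization method the notebook uses. As one simple possibility, here is a sketch of frequency-based extractive summarization: sentences are scored by how common their words are across all reviews, and the best ones are kept. The stopword list, the function name summarize, and the scoring rule are assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "is", "it", "to", "in", "this", "but"}


def content_words(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(reviews, max_sentences=2):
    """Keep the sentences whose words are most frequent across all reviews."""
    text = " ".join(reviews)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Emit the chosen sentences in their original order.
    return " ".join(s for s in sentences if s in top)


reviews = [
    "The acting is superb. The plot drags.",
    "Superb acting saves a thin plot.",
    "I ate popcorn.",
]
print(summarize(reviews))
```

An extractive approach only reuses sentences that critics actually wrote, so it cannot introduce claims that are absent from the reviews.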

4. Modeling & Prediction

We applied Word Embedding, Lasso, Linear Regression, Random Forest and XGBoost for box office prediction. We finally chose the XGBoost model because it performed best.

Fun Time! Let's predict box offices!

This part allows the reader to input the index of a movie in our test data set; the predicted box office and the real box office are returned. (The trained XGBoost model could also be used to predict future box office! But due to the complexity of the data preprocessing and Lasso steps, we decided to use the test data for this function.)
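Choosing among candidate models by held-out error can be sketched as follows. The README does not state which metric was used, so RMSE is an assumption here, and the model names and numbers are made up for illustration; they are not results from the project.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def best_model(y_true, predictions):
    """Return (name, rmse) of the model with the lowest test-set RMSE."""
    scores = {name: rmse(y_true, y_pred) for name, y_pred in predictions.items()}
    name = min(scores, key=scores.get)
    return name, scores[name]


# Made-up test targets and predictions from three hypothetical models.
y_test = [10.0, 20.0, 30.0]
preds = {
    "model_a": [14.0, 16.0, 34.0],
    "model_b": [12.0, 18.0, 32.0],
    "model_c": [11.0, 19.0, 31.0],
}
print(best_model(y_test, preds))
```

Comparing all models on the same held-out test set keeps the comparison fair, since each one is judged on data it did not see during training.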

Installation instructions:

See requirements.txt for the required packages.

Run instructions:

Please go to "Final_Project", then open "Part II- Analysis+Visualization+Text_mining+Modelling.ipynb" and run all the code in the file. Please enjoy the fun parts inside. We also provide an exported HTML file in "Final_Project".

