Deep Learning Notebook Finished
khanhnamle1994 committed Apr 17, 2018
1 parent 6be40de commit b8d8bff
Showing 2 changed files with 92 additions and 10 deletions.
51 changes: 46 additions & 5 deletions .ipynb_checkpoints/Deep_Learning_Model-checkpoint.ipynb
@@ -27,7 +27,9 @@
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Import libraries\n",
@@ -277,7 +279,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Root Mean Square Error"
"### Root Mean Square Error\n",
"During the training process above, I saved the model weights each time the validation loss improved. Thus, I can take the best validation loss and compute the corresponding Root Mean Square Error from it."
]
},
{
@@ -303,7 +306,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Predict the Ratings"
"The best validation loss is *0.7424*, reached at epoch 17. Taking the square root of that number gives an RMSE of *0.8616*, which is better than the RMSE of the SVD model (*0.8736*)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Predict the Ratings\n",
"The next step is to predict the ratings a given user would assign to a given movie. Below I apply the freshly trained deep learning model to all users and all movies, using 100-dimensional embeddings for each. I also load the pre-trained weights from *[weights.h5](https://github.com/khanhnamle1994/movielens/blob/master/weights.h5)* into the model."
]
},
{
@@ -320,6 +331,13 @@
"trained_model.load_weights('weights.h5')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As mentioned above, my randomly chosen test user has ID 2000."
]
},
{
"cell_type": "code",
"execution_count": 15,
@@ -381,6 +399,13 @@
"users[users['user_id'] == TEST_USER]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here I define a function to predict a user's rating of unrated items, using the *rate* method of the CFModel class in *[CFModel.py](https://github.com/khanhnamle1994/movielens/blob/master/CFModel.py)*."
]
},
{
"cell_type": "code",
"execution_count": 16,
@@ -394,6 +419,13 @@
" return trained_model.rate(user_id - 1, movie_id - 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here I show the top 20 movies that user 2000 has already rated, along with a *predictions* column listing the ratings that the newly defined *predict_rating* function estimates user 2000 would give."
]
},
{
"cell_type": "code",
"execution_count": 17,
@@ -678,7 +710,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Recommend Movies"
"No surprise that these top movies all have 5-star ratings. Some of the predicted values seem off, though (those around 3.7, 3.8, and 3.9, for example)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Recommend Movies\n",
"Here I build a recommendation list of the top 20 movies user 2000 has not yet rated, sorted by predicted rating. Let's see it."
]
},
{
@@ -923,7 +963,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion"
"## Conclusion\n",
"In this notebook, I showed how to use a simple deep learning approach to build a recommendation engine for the MovieLens 1M dataset. This model performed better than all the approaches I attempted before (content-based, user-item similarity collaborative filtering, SVD). Its performance could certainly be improved further by making the network deeper, with more linear and non-linear layers. I leave that task to you!"
]
}
],
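The RMSE figure quoted in the notebook's markdown cells follows directly from the best validation loss, assuming (as the notebook's claim implies) that the loss being reported is mean squared error. A minimal sketch of the arithmetic:

```python
import math

# The diff above reports a best validation loss of 0.7424 at epoch 17.
# Assuming that loss is MSE, the validation RMSE is its square root.
best_val_mse = 0.7424
rmse = math.sqrt(best_val_mse)
print(round(rmse, 4))  # 0.8616, better than the SVD model's 0.8736
```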
51 changes: 46 additions & 5 deletions Deep_Learning_Model.ipynb
@@ -27,7 +27,9 @@
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Import libraries\n",
@@ -277,7 +279,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Root Mean Square Error"
"### Root Mean Square Error\n",
"During the training process above, I saved the model weights each time the validation loss improved. Thus, I can take the best validation loss and compute the corresponding Root Mean Square Error from it."
]
},
{
@@ -303,7 +306,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Predict the Ratings"
"The best validation loss is *0.7424*, reached at epoch 17. Taking the square root of that number gives an RMSE of *0.8616*, which is better than the RMSE of the SVD model (*0.8736*)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Predict the Ratings\n",
"The next step is to predict the ratings a given user would assign to a given movie. Below I apply the freshly trained deep learning model to all users and all movies, using 100-dimensional embeddings for each. I also load the pre-trained weights from *[weights.h5](https://github.com/khanhnamle1994/movielens/blob/master/weights.h5)* into the model."
]
},
{
@@ -320,6 +331,13 @@
"trained_model.load_weights('weights.h5')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As mentioned above, my randomly chosen test user has ID 2000."
]
},
{
"cell_type": "code",
"execution_count": 15,
@@ -381,6 +399,13 @@
"users[users['user_id'] == TEST_USER]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here I define a function to predict a user's rating of unrated items, using the *rate* method of the CFModel class in *[CFModel.py](https://github.com/khanhnamle1994/movielens/blob/master/CFModel.py)*."
]
},
{
"cell_type": "code",
"execution_count": 16,
@@ -394,6 +419,13 @@
" return trained_model.rate(user_id - 1, movie_id - 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here I show the top 20 movies that user 2000 has already rated, along with a *predictions* column listing the ratings that the newly defined *predict_rating* function estimates user 2000 would give."
]
},
{
"cell_type": "code",
"execution_count": 17,
@@ -678,7 +710,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Recommend Movies"
"No surprise that these top movies all have 5-star ratings. Some of the predicted values seem off, though (those around 3.7, 3.8, and 3.9, for example)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Recommend Movies\n",
"Here I build a recommendation list of the top 20 movies user 2000 has not yet rated, sorted by predicted rating. Let's see it."
]
},
{
@@ -923,7 +963,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion"
"## Conclusion\n",
"In this notebook, I showed how to use a simple deep learning approach to build a recommendation engine for the MovieLens 1M dataset. This model performed better than all the approaches I attempted before (content-based, user-item similarity collaborative filtering, SVD). Its performance could certainly be improved further by making the network deeper, with more linear and non-linear layers. I leave that task to you!"
]
}
],
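The recommendation step described in the notebook's markdown cells (score every movie the test user has not rated with *predict_rating*, then keep the 20 highest predictions) can be sketched as below. This is a toy stand-in, not the notebook's actual code: the `ratings` and `movies` frames and the scoring function are dummies, whereas the real notebook uses the full MovieLens 1M tables and `trained_model.rate()`.

```python
import pandas as pd

TEST_USER = 2000  # the notebook's randomly chosen test user

# Toy stand-ins for the notebook's MovieLens data and trained model.
ratings = pd.DataFrame({'user_id': [2000, 2000], 'movie_id': [1, 2]})
movies = pd.DataFrame({'movie_id': [1, 2, 3, 4],
                       'title': ['Movie A', 'Movie B', 'Movie C', 'Movie D']})

def predict_rating(user_id, movie_id):
    # Dummy scorer; the notebook calls trained_model.rate(user_id - 1, movie_id - 1).
    return 5.0 - 0.1 * movie_id

# Keep only movies the user has not rated, score them, take the top 20.
seen = ratings.loc[ratings['user_id'] == TEST_USER, 'movie_id']
recommendations = movies[~movies['movie_id'].isin(seen)].copy()
recommendations['prediction'] = recommendations['movie_id'].apply(
    lambda m: predict_rating(TEST_USER, m))
recommendations = recommendations.sort_values('prediction', ascending=False).head(20)
print(recommendations[['title', 'prediction']])
```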
