Deep Learning Notebook Finished
khanhnamle1994 committed Apr 17, 2018
1 parent 6be40de commit b8d8bff
Showing 2 changed files with 92 additions and 10 deletions.
51 changes: 46 additions & 5 deletions .ipynb_checkpoints/Deep_Learning_Model-checkpoint.ipynb
@@ -27,7 +27,9 @@
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Import libraries\n",
@@ -277,7 +279,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Root Mean Square Error"
"### Root Mean Square Error\n",
"During the training process above, I saved the model weights each time the validation loss improved. Thus, I can take the best validation loss and compute the corresponding Root Mean Square Error from it."
]
},
{
@@ -303,7 +306,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Predict the Ratings"
"The best validation loss is *0.7424*, reached at epoch 17. Taking the square root of that number gives an RMSE of *0.8616*, which is better than the RMSE of the SVD model (*0.8736*)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Predict the Ratings\n",
"The next step is to predict the ratings a given user would assign to a given movie. Below I apply the freshly trained deep learning model to all users and all movies, using 100-dimensional embeddings for each. I also load the pre-trained weights from *[weights.h5](https://github.com/khanhnamle1994/movielens/blob/master/weights.h5)* into the model."
]
},
{
@@ -320,6 +331,13 @@
"trained_model.load_weights('weights.h5')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As mentioned above, my randomly chosen test user has ID 2000."
]
},
{
"cell_type": "code",
"execution_count": 15,
@@ -381,6 +399,13 @@
"users[users['user_id'] == TEST_USER]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here I define a function to predict a user's rating of unrated items, using the *rate* method of the CFModel class in *[CFModel.py](https://github.com/khanhnamle1994/movielens/blob/master/CFModel.py)*."
]
},
{
"cell_type": "code",
"execution_count": 16,
@@ -394,6 +419,13 @@
" return trained_model.rate(user_id - 1, movie_id - 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here I show the top 20 movies that user 2000 has already rated, along with a *predictions* column listing the ratings that the newly defined *predict_rating* function estimates user 2000 would give."
]
},
{
"cell_type": "code",
"execution_count": 17,
@@ -678,7 +710,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Recommend Movies"
"No surprise that these top movies all have 5-star ratings. Some of the predicted values seem off, though (those around 3.7, 3.8, and 3.9, for example)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Recommend Movies\n",
"Here I build a recommendation list of the top 20 movies user 2000 has not yet rated, sorted by predicted rating. Let's see it."
]
},
{
@@ -923,7 +963,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion"
"## Conclusion\n",
"In this notebook, I showed how to use a simple deep learning approach to build a recommendation engine for the MovieLens 1M dataset. This model performed better than all the approaches I attempted before (content-based, user-item similarity collaborative filtering, SVD). Its performance could certainly be improved further by making the network deeper, with more linear and non-linear layers. I leave that task to you!"
]
}
],
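The RMSE figure quoted in the notebook's markdown cells follows directly from the best validation loss, assuming (as the notebook's claim implies) that the loss being reported is mean squared error. A minimal sketch of the arithmetic:

```python
import math

# The diff above reports a best validation loss of 0.7424 at epoch 17.
# Assuming that loss is MSE, the validation RMSE is its square root.
best_val_mse = 0.7424
rmse = math.sqrt(best_val_mse)
print(round(rmse, 4))  # 0.8616, better than the SVD model's 0.8736
```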
51 changes: 46 additions & 5 deletions Deep_Learning_Model.ipynb
@@ -27,7 +27,9 @@
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Import libraries\n",
@@ -277,7 +279,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Root Mean Square Error"
"### Root Mean Square Error\n",
"During the training process above, I saved the model weights each time the validation loss improved. Thus, I can take the best validation loss and compute the corresponding Root Mean Square Error from it."
]
},
{
@@ -303,7 +306,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Predict the Ratings"
"The best validation loss is *0.7424*, reached at epoch 17. Taking the square root of that number gives an RMSE of *0.8616*, which is better than the RMSE of the SVD model (*0.8736*)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Predict the Ratings\n",
"The next step is to predict the ratings a given user would assign to a given movie. Below I apply the freshly trained deep learning model to all users and all movies, using 100-dimensional embeddings for each. I also load the pre-trained weights from *[weights.h5](https://github.com/khanhnamle1994/movielens/blob/master/weights.h5)* into the model."
]
},
{
@@ -320,6 +331,13 @@
"trained_model.load_weights('weights.h5')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As mentioned above, my randomly chosen test user has ID 2000."
]
},
{
"cell_type": "code",
"execution_count": 15,
@@ -381,6 +399,13 @@
"users[users['user_id'] == TEST_USER]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here I define a function to predict a user's rating of unrated items, using the *rate* method of the CFModel class in *[CFModel.py](https://github.com/khanhnamle1994/movielens/blob/master/CFModel.py)*."
]
},
{
"cell_type": "code",
"execution_count": 16,
@@ -394,6 +419,13 @@
" return trained_model.rate(user_id - 1, movie_id - 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here I show the top 20 movies that user 2000 has already rated, along with a *predictions* column listing the ratings that the newly defined *predict_rating* function estimates user 2000 would give."
]
},
{
"cell_type": "code",
"execution_count": 17,
@@ -678,7 +710,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Recommend Movies"
"No surprise that these top movies all have 5-star ratings. Some of the predicted values seem off, though (those around 3.7, 3.8, and 3.9, for example)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Recommend Movies\n",
"Here I build a recommendation list of the top 20 movies user 2000 has not yet rated, sorted by predicted rating. Let's see it."
]
},
{
@@ -923,7 +963,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion"
"## Conclusion\n",
"In this notebook, I showed how to use a simple deep learning approach to build a recommendation engine for the MovieLens 1M dataset. This model performed better than all the approaches I attempted before (content-based, user-item similarity collaborative filtering, SVD). Its performance could certainly be improved further by making the network deeper, with more linear and non-linear layers. I leave that task to you!"
]
}
],
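The recommendation step described in the notebook's markdown cells (score every movie the test user has not rated with *predict_rating*, then keep the 20 highest predictions) can be sketched as below. This is a toy stand-in, not the notebook's actual code: the `ratings` and `movies` frames and the scoring function are dummies, whereas the real notebook uses the full MovieLens 1M tables and `trained_model.rate()`.

```python
import pandas as pd

TEST_USER = 2000  # the notebook's randomly chosen test user

# Toy stand-ins for the notebook's MovieLens data and trained model.
ratings = pd.DataFrame({'user_id': [2000, 2000], 'movie_id': [1, 2]})
movies = pd.DataFrame({'movie_id': [1, 2, 3, 4],
                       'title': ['Movie A', 'Movie B', 'Movie C', 'Movie D']})

def predict_rating(user_id, movie_id):
    # Dummy scorer; the notebook calls trained_model.rate(user_id - 1, movie_id - 1).
    return 5.0 - 0.1 * movie_id

# Keep only movies the user has not rated, score them, take the top 20.
seen = ratings.loc[ratings['user_id'] == TEST_USER, 'movie_id']
recommendations = movies[~movies['movie_id'].isin(seen)].copy()
recommendations['prediction'] = recommendations['movie_id'].apply(
    lambda m: predict_rating(TEST_USER, m))
recommendations = recommendations.sort_values('prediction', ascending=False).head(20)
print(recommendations[['title', 'prediction']])
```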
