Commit

temp update latex
ethen8181 committed Nov 22, 2017
1 parent 8052875 commit 4f47d28
Showing 2 changed files with 10 additions and 0 deletions.
8 changes: 8 additions & 0 deletions recsys/1_ALSWR.ipynb
@@ -698,15 +698,19 @@
"\n",
"We start by denoting our $d$ feature user into math by letting a user $u$ take the form of a $1 \\times d$-dimensional vector $\\textbf{x}_{u}$. These for often times referred to as latent vectors or low-dimensional embeddings. Similarly, an item *i* can be represented by a $1 \\times d$-dimensional vector $\\textbf{y}_{i}$. And the rating that we predict user $u$ will give for item $i$ is just the dot product of the two vectors\n",
"\n",
"$$\n",
"\\begin{align}\n",
"\\hat r_{ui} &= \\textbf{x}_{u} \\textbf{y}_{i}^{T} = \\sum\\limits_{d} x_{ud}y_{di}\n",
"\\end{align}\n",
"$$\n",
"\n",
"Where $\\hat r_{ui}$ represents our prediction for the true rating $r_{ui}$. Next, we will choose a objective function to minimize the square of the difference between all ratings in our dataset ($S$) and our predictions. This produces a objective function of the form:\n",
"\n",
"$$\n",
"\\begin{align}\n",
"L &= \\sum\\limits_{u,i \\in S}( r_{ui} - \\textbf{x}_{u} \\textbf{y}_{i}^{T} )^{2} + \\lambda \\big( \\sum\\limits_{u} \\left\\Vert \\textbf{x}_{u} \\right\\Vert^{2} + \\sum\\limits_{i} \\left\\Vert \\textbf{y}_{i} \\right\\Vert^{2} \\big)\n",
"\\end{align}\n",
"$$\n",
"\n",
"Note that we've added on two $L_{2}$ regularization terms, with $\\lambda$ controlling the strength at the end to prevent overfitting of the user and item vectors. $\\lambda$, is another hyperparameter that we'll have to search for to determine the best value. The concept of regularization can be a topic of itself, and if you're confused by this, consider checking out [this documentation](http://nbviewer.jupyter.org/github/ethen8181/machine-learning/blob/master/regularization/regularization.ipynb) that covers it a bit more."
]
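To make the prediction and the regularized objective above concrete, here is a minimal NumPy sketch (not code from the notebook; the array names `ratings`, `X`, `Y` and the zero-means-missing convention are assumptions for illustration):

```python
import numpy as np

def predict(X, Y):
    """Predicted rating matrix: entry (u, i) is the dot product x_u . y_i."""
    return X @ Y.T  # X: (n_users, d), Y: (n_items, d) -> (n_users, n_items)

def als_loss(ratings, X, Y, lam):
    """Squared error over observed ratings plus L2 penalties on both factor matrices."""
    observed = ratings > 0            # entries of the dataset S (0 marks "missing")
    squared_error = ((ratings - predict(X, Y)) ** 2)[observed].sum()
    regularization = lam * (np.sum(X ** 2) + np.sum(Y ** 2))
    return squared_error + regularization
```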
@@ -717,13 +721,15 @@
"source": [
"Now that we formalize our objective function, we'll introduce the **Alternating Least Squares with Weighted Regularization (ALS-WR)** method for optimizing it. The way it works is we start by treating one set of latent vectors as constant. For this example, we'll pick the item vectors, $\\textbf{y}_{i}$. We then take the derivative of the loss function with respect to the other set of vectors, the user vectors, $\\textbf{x}_{u}$ and solve for the non-constant vectors (the user vectors).\n",
"\n",
"$$\n",
"\\begin{align}\n",
"\\frac{\\partial L}{\\partial \\textbf{x}_{u}} \n",
"&\\implies - 2 \\sum\\limits_{i}(r_{ui} - \\textbf{x}_{u} \\textbf{y}_{i}^{T} ) \\textbf{y}_{i} + 2 \\lambda \\textbf{x}_{u} = 0 \\\\\n",
"&\\implies -(\\textbf{r}_{u} - \\textbf{x}_{u} Y^{T} )Y + \\lambda \\textbf{x}_{u} = 0 \\\\\n",
"&\\implies \\textbf{x}_{u} (Y^{T}Y + \\lambda I) = \\textbf{r}_{u}Y \\\\\n",
"&\\implies \\textbf{x}_{u} = \\textbf{r}_{u}Y (Y^{T}Y + \\lambda I)^{-1}\n",
"\\end{align}\n",
"$$\n",
"\n",
"To clarify it a bit, let us assume that we have $m$ users and $n$ items, so our ratings matrix is $m \\times n$.\n",
"\n",
@@ -733,10 +739,12 @@
"\n",
"Now comes the alternating part: With these newly updated user vectors in hand, in the next round, we hold them as constant, and take the derivative of the loss function with respect to the previously constant vectors (the item vectors). As the derivation for the item vectors is quite similar, we will simply list out the end formula:\n",
"\n",
"$$\n",
"\\begin{align}\n",
"\\frac{\\partial L}{\\partial \\textbf{y}_{i}}\n",
"&\\implies \\textbf{y}_{i} = \\textbf{r}_{i}X (X^{T}X + \\lambda I)^{-1}\n",
"\\end{align}\n",
"$$\n",
"\n",
"Then we alternate back and forth and carry out this two-step process until convergence. The reason we alternate is, optimizing user latent vectors, $U$, and item latent vectors $V$ simultaneously is hard to solve. If we fix $U$ or $V$ and tackle one problem at a time, we potentially turn it into a easier sub-problem. Just to summarize it, ALS works by:\n",
"\n",
2 changes: 2 additions & 0 deletions trees/decision_tree.ipynb
@@ -532,12 +532,14 @@
"\n",
"Gini Index is defined as:\n",
"\n",
"$$\n",
"\\begin{align*}\n",
"I_G(t) &= \\sum_{i =1}^{C} p(i \\mid t) \\big(1-p(i \\mid t)\\big) \\nonumber \\\\ \n",
" &= \\sum_{i =1}^{C} p(i \\mid t) - p(i \\mid t)^2 \\nonumber \\\\ \n",
" &= \\sum_{i =1}^{C} p(i \\mid t) - \\sum_{i =1}^{C} p(i \\mid t)^2 \\nonumber \\\\ \n",
" &= 1 - \\sum_{i =1}^{C} p(i \\mid t)^2\n",
"\\end{align*}\n",
"$$\n",
"\n",
"Compared to Entropy, the maximum value of the Gini index is 0.5, which occurs when the classes are perfectly balanced in a node. On the other hand, the minimum value of the Gini index is 0 and occurs when there is only one class represented in a node (A node with a lower Gini index is said to be more \"pure\").\n",
"\n",
