Commit

temp update latex
ethen8181 committed Nov 22, 2017
1 parent 8052875 commit 4f47d28
Showing 2 changed files with 10 additions and 0 deletions.
8 changes: 8 additions & 0 deletions recsys/1_ALSWR.ipynb
@@ -698,15 +698,19 @@
"\n",
"We start by denoting our $d$ feature user into math by letting a user $u$ take the form of a $1 \\times d$-dimensional vector $\\textbf{x}_{u}$. These for often times referred to as latent vectors or low-dimensional embeddings. Similarly, an item *i* can be represented by a $1 \\times d$-dimensional vector $\\textbf{y}_{i}$. And the rating that we predict user $u$ will give for item $i$ is just the dot product of the two vectors\n",
"\n",
"$$\n",
"\\begin{align}\n",
"\\hat r_{ui} &= \\textbf{x}_{u} \\textbf{y}_{i}^{T} = \\sum\\limits_{d} x_{ud}y_{di}\n",
"\\end{align}\n",
"$$\n",
"\n",
"Where $\\hat r_{ui}$ represents our prediction for the true rating $r_{ui}$. Next, we will choose a objective function to minimize the square of the difference between all ratings in our dataset ($S$) and our predictions. This produces a objective function of the form:\n",
"\n",
"$$\n",
"\\begin{align}\n",
"L &= \\sum\\limits_{u,i \\in S}( r_{ui} - \\textbf{x}_{u} \\textbf{y}_{i}^{T} )^{2} + \\lambda \\big( \\sum\\limits_{u} \\left\\Vert \\textbf{x}_{u} \\right\\Vert^{2} + \\sum\\limits_{i} \\left\\Vert \\textbf{y}_{i} \\right\\Vert^{2} \\big)\n",
"\\end{align}\n",
"$$\n",
"\n",
"Note that we've added on two $L_{2}$ regularization terms, with $\\lambda$ controlling the strength at the end to prevent overfitting of the user and item vectors. $\\lambda$, is another hyperparameter that we'll have to search for to determine the best value. The concept of regularization can be a topic of itself, and if you're confused by this, consider checking out [this documentation](http://nbviewer.jupyter.org/github/ethen8181/machine-learning/blob/master/regularization/regularization.ipynb) that covers it a bit more."
]
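To make the prediction and the regularized objective above concrete, here is a minimal NumPy sketch (not code from the notebook; the array names `ratings`, `X`, `Y` and the zero-means-missing convention are assumptions for illustration):

```python
import numpy as np

def predict(X, Y):
    """Predicted rating matrix: entry (u, i) is the dot product x_u . y_i."""
    return X @ Y.T  # X: (n_users, d), Y: (n_items, d) -> (n_users, n_items)

def als_loss(ratings, X, Y, lam):
    """Squared error over observed ratings plus L2 penalties on both factor matrices."""
    observed = ratings > 0            # entries of the dataset S (0 marks "missing")
    squared_error = ((ratings - predict(X, Y)) ** 2)[observed].sum()
    regularization = lam * (np.sum(X ** 2) + np.sum(Y ** 2))
    return squared_error + regularization
```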
@@ -717,13 +721,15 @@
"source": [
"Now that we formalize our objective function, we'll introduce the **Alternating Least Squares with Weighted Regularization (ALS-WR)** method for optimizing it. The way it works is we start by treating one set of latent vectors as constant. For this example, we'll pick the item vectors, $\\textbf{y}_{i}$. We then take the derivative of the loss function with respect to the other set of vectors, the user vectors, $\\textbf{x}_{u}$ and solve for the non-constant vectors (the user vectors).\n",
"\n",
"$$\n",
"\\begin{align}\n",
"\\frac{\\partial L}{\\partial \\textbf{x}_{u}} \n",
"&\\implies - 2 \\sum\\limits_{i}(r_{ui} - \\textbf{x}_{u} \\textbf{y}_{i}^{T} ) \\textbf{y}_{i} + 2 \\lambda \\textbf{x}_{u} = 0 \\\\\n",
"&\\implies -(\\textbf{r}_{u} - \\textbf{x}_{u} Y^{T} )Y + \\lambda \\textbf{x}_{u} = 0 \\\\\n",
"&\\implies \\textbf{x}_{u} (Y^{T}Y + \\lambda I) = \\textbf{r}_{u}Y \\\\\n",
"&\\implies \\textbf{x}_{u} = \\textbf{r}_{u}Y (Y^{T}Y + \\lambda I)^{-1}\n",
"\\end{align}\n",
"$$\n",
"\n",
"To clarify it a bit, let us assume that we have $m$ users and $n$ items, so our ratings matrix is $m \\times n$.\n",
"\n",
@@ -733,10 +739,12 @@
"\n",
"Now comes the alternating part: With these newly updated user vectors in hand, in the next round, we hold them as constant, and take the derivative of the loss function with respect to the previously constant vectors (the item vectors). As the derivation for the item vectors is quite similar, we will simply list out the end formula:\n",
"\n",
"$$\n",
"\\begin{align}\n",
"\\frac{\\partial L}{\\partial \\textbf{y}_{i}}\n",
"&\\implies \\textbf{y}_{i} = \\textbf{r}_{i}X (X^{T}X + \\lambda I)^{-1}\n",
"\\end{align}\n",
"$$\n",
"\n",
"Then we alternate back and forth and carry out this two-step process until convergence. The reason we alternate is, optimizing user latent vectors, $U$, and item latent vectors $V$ simultaneously is hard to solve. If we fix $U$ or $V$ and tackle one problem at a time, we potentially turn it into a easier sub-problem. Just to summarize it, ALS works by:\n",
"\n",
2 changes: 2 additions & 0 deletions trees/decision_tree.ipynb
@@ -532,12 +532,14 @@
"\n",
"Gini Index is defined as:\n",
"\n",
"$$\n",
"\\begin{align*}\n",
"I_G(t) &= \\sum_{i =1}^{C} p(i \\mid t) \\big(1-p(i \\mid t)\\big) \\nonumber \\\\ \n",
" &= \\sum_{i =1}^{C} p(i \\mid t) - p(i \\mid t)^2 \\nonumber \\\\ \n",
" &= \\sum_{i =1}^{C} p(i \\mid t) - \\sum_{i =1}^{C} p(i \\mid t)^2 \\nonumber \\\\ \n",
" &= 1 - \\sum_{i =1}^{C} p(i \\mid t)^2\n",
"\\end{align*}\n",
"$$\n",
"\n",
"Compared to Entropy, the maximum value of the Gini index is 0.5, which occurs when the classes are perfectly balanced in a node. On the other hand, the minimum value of the Gini index is 0 and occurs when there is only one class represented in a node (A node with a lower Gini index is said to be more \"pure\").\n",
"\n",
