first pass at clustering notes
Kavan Sikand committed May 10, 2014

1 parent 0ae0873 commit 704d124
Showing 2 changed files with 21 additions and 1 deletion.
Binary file modified 189cheatSheet.pdf
Binary file not shown.
22 changes: 21 additions & 1 deletion 189cheatSheet.tex
@@ -257,8 +257,28 @@ \subsection{Neural Networks}
Then the derivative of the error w.r.t. any of the weights is
\[\frac{\partial e(w)}{\partial w_{ij}^{(l)}} = \delta_j^{(l)} x_i^{(l-1)},\]
giving us the gradient we wanted.
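\\A minimal NumPy sketch of the resulting gradient-descent step (illustrative; \texttt{W}, \texttt{delta}, \texttt{x\_prev}, and \texttt{eta} are assumed names, not from the notes):
\begin{verbatim}
import numpy as np

# One update for the weights of layer l:
# dE/dw_ij = delta_j * x_i, i.e. an outer product.
def update_weights(W, delta, x_prev, eta=0.1):
    grad = np.outer(x_prev, delta)  # grad[i, j] = x_i * delta_j
    return W - eta * grad
\end{verbatim}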
\newpage

\vfill
\columnbreak
\subsection{Clustering}
Unsupervised Learning (no labels). Two main types:
\begin{itemize}
\item {\bf Hierarchical}:
\begin{itemize}
\item Agglomerative: Start with $n$ singleton clusters and repeatedly merge the two closest clusters under some distance measure, such as: Single-link (closest pair), Complete-link (furthest pair), Average-link (average over all pairs), Centroid (distance between centroids). See the linkage sketch after this list.
\item Divisive: Start with a single cluster and recursively split clusters. Less popular in practice.
\end{itemize}
\item {\bf Partitioning}: Partition the data into $K$ mutually exclusive, exhaustive groups (i.e. an encoder $k=C(i)$ assigns point $i$ to cluster $k$). Iteratively reallocate points to minimize a loss function such as $W(C)=\frac{1}{2} \sum_{k=1}^K \sum_{C(i)=k} \sum_{C(i')=k} d(x_i, x_{i'})$.
\\Minimizing $W(C)$ exactly over all assignments is infeasible, so we do greedy iterative descent instead. This ends up being {\bf K-means}: choose initial clusters at random, calculate the centroid of each cluster, reallocate each object to its nearest centroid, and repeat (see the sketch after this list). Finds a local minimum of $W(C)$, not the global one.
\end{itemize}
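\\A short sketch of agglomerative clustering via SciPy's linkage routines (assumes SciPy is available; \texttt{X} is a toy $n \times d$ data matrix, not from the notes):
\begin{verbatim}
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(20, 2)  # toy data: n=20 points in 2-D
for method in ["single", "complete", "average", "centroid"]:
    Z = linkage(X, method=method)  # merge tree (n-1 merges)
    # cut the tree into (at most) 3 flat clusters
    labels = fcluster(Z, t=3, criterion="maxclust")
\end{verbatim}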
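\\And a from-scratch K-means sketch of the greedy descent on $W(C)$ (a minimal illustration under the assumptions above; \texttt{kmeans} is an assumed name):
\begin{verbatim}
def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # init: pick k distinct points as the starting centroids
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # distance of every point to every centroid, shape (n, k)
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :],
                           axis=2)
        labels = d.argmin(axis=1)  # reallocate to nearest centroid
        new = np.array([X[labels == j].mean(axis=0)
                        if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):  # converged: local min of W(C)
            break
        centroids = new
    return centroids, labels
\end{verbatim}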
{\bf Vector Quantization}: Simplify the representation of a signal by replacing each vector with a representative prototype vector. Use clustering to find the prototype vectors.
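\\As a usage sketch (building on the hypothetical \texttt{kmeans} and \texttt{X} above), quantization maps each point to its nearest prototype:
\begin{verbatim}
codebook, _ = kmeans(X, k=8)  # prototypes found via clustering
codes = np.linalg.norm(X[:, None, :] - codebook[None, :, :],
                       axis=2).argmin(axis=1)
X_quantized = codebook[codes]  # compressed representation
\end{verbatim}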
\\{\bf Parametric Discriminative Clustering (Mixture Models)}: Assume the PDF is made up of multiple Gaussians with different centers, then fit this model with EM.
\\E Step: $P(\mu_i | x_k) = \frac{P(\mu_i) P(x_k | \mu_i)}{\sum_j P(\mu_j) P(x_k|\mu_j)}$
\\M Step: $P(\mu_i) = \frac{1}{n} \sum_{k=1}^{n} P(\mu_i | x_k)$. Now update $\mu$ and $\sigma$:\\$\mu_i = \frac{\sum_k x_k P(\mu_i|x_k)}{\sum_k P(\mu_i | x_k)}$
\\$\sigma_i^2=\frac{\sum_k (x_k-\mu_i)^2 P(\mu_i|x_k)}{\sum_k P(\mu_i | x_k)}$.
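\\A minimal 1-D EM sketch following the E/M steps above (illustrative; \texttt{em\_gmm} and its arguments are assumed names, not from the notes):
\begin{verbatim}
def gauss(x, mu, var):
    return (np.exp(-(x - mu)**2 / (2 * var))
            / np.sqrt(2 * np.pi * var))

def em_gmm(x, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, k, replace=False)  # initial centers
    var = np.full(k, x.var())             # initial variances
    pi = np.full(k, 1.0 / k)              # mixing weights P(mu_i)
    for _ in range(n_iter):
        # E step: responsibilities P(mu_i | x_k), shape (k, n)
        r = pi[:, None] * gauss(x[None, :], mu[:, None],
                                var[:, None])
        r /= r.sum(axis=0, keepdims=True)
        # M step: re-estimate weights, means, variances
        pi = r.mean(axis=1)
        mu = (r * x).sum(axis=1) / r.sum(axis=1)
        var = (r * (x - mu[:, None])**2).sum(axis=1) / r.sum(axis=1)
    return pi, mu, var
\end{verbatim}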
\\{\bf Nonparametric Discriminative Clustering}: Histogram, Kernel Density Estimation.
\\Kernel: $P(x) = \frac{1}{n} \sum_i K(x-x_i)$, where $K$ is normalized, symmetric, and $\lim_{||x|| \rightarrow \infty} ||x||^d K(x) = 0$.
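\\A small sketch of the kernel estimate with a Gaussian kernel, which satisfies all three conditions; the bandwidth \texttt{h} is an added assumption, not part of the formula above:
\begin{verbatim}
def kde(x_query, samples, h=0.5):
    # P(x) = (1/n) * sum_i K((x - x_i)/h) / h, 1-D inputs
    u = (x_query[:, None] - samples[None, :]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return K.mean(axis=1) / h
\end{verbatim}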
\newpage

% You can even have references
\rule{0.3\linewidth}{0.25pt}
