first pass at clustering notes
Kavan Sikand committed May 10, 2014

1 parent 0ae0873 commit 704d124
Showing 2 changed files with 21 additions and 1 deletion.
Binary file modified 189cheatSheet.pdf
Binary file not shown.
22 changes: 21 additions & 1 deletion 189cheatSheet.tex
@@ -257,8 +257,28 @@ \subsection{Neural Networks}
Then the derivative of the error w.r.t. any of the weights is
\[\frac{\partial e(w)}{\partial w_{ij}^{(l)}} = \delta_j^{(l)} x_i^{(l-1)},\]
giving us the gradient we wanted.
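\\A minimal NumPy sketch of the resulting gradient-descent step (illustrative; \texttt{W}, \texttt{delta}, \texttt{x\_prev}, and \texttt{eta} are assumed names, not from the notes):
\begin{verbatim}
import numpy as np

# One update for the weights of layer l:
# dE/dw_ij = delta_j * x_i, i.e. an outer product.
def update_weights(W, delta, x_prev, eta=0.1):
    grad = np.outer(x_prev, delta)  # grad[i, j] = x_i * delta_j
    return W - eta * grad
\end{verbatim}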
\newpage

\vfill
\columnbreak
\subsection{Clustering}
Unsupervised Learning (no labels). Two main types:
\begin{itemize}
\item {\bf Hierarchical}:
\begin{itemize}
\item Agglomerative: Start with $n$ singleton clusters and repeatedly merge the two closest clusters under some distance measure, such as: Single-link (closest pair), Complete-link (furthest pair), Average-link (average over all pairs), Centroid (distance between centroids). See the linkage sketch after this list.
\item Divisive: Start with a single cluster and recursively split clusters. Less popular in practice.
\end{itemize}
\item {\bf Partitioning}: Partition the data into $K$ mutually exclusive, exhaustive groups (i.e. an encoder $k=C(i)$ assigns point $i$ to cluster $k$). Iteratively reallocate points to minimize a loss function such as $W(C)=\frac{1}{2} \sum_{k=1}^K \sum_{C(i)=k} \sum_{C(i')=k} d(x_i, x_{i'})$.
\\Minimizing $W(C)$ exactly over all assignments is infeasible, so we do greedy iterative descent instead. This ends up being {\bf K-means}: choose initial clusters at random, calculate the centroid of each cluster, reallocate each object to its nearest centroid, and repeat (see the sketch after this list). Finds a local minimum of $W(C)$, not the global one.
\end{itemize}
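\\A short sketch of agglomerative clustering via SciPy's linkage routines (assumes SciPy is available; \texttt{X} is a toy $n \times d$ data matrix, not from the notes):
\begin{verbatim}
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(20, 2)  # toy data: n=20 points in 2-D
for method in ["single", "complete", "average", "centroid"]:
    Z = linkage(X, method=method)  # merge tree (n-1 merges)
    # cut the tree into (at most) 3 flat clusters
    labels = fcluster(Z, t=3, criterion="maxclust")
\end{verbatim}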
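\\And a from-scratch K-means sketch of the greedy descent on $W(C)$ (a minimal illustration under the assumptions above; \texttt{kmeans} is an assumed name):
\begin{verbatim}
def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # init: pick k distinct points as the starting centroids
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # distance of every point to every centroid, shape (n, k)
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :],
                           axis=2)
        labels = d.argmin(axis=1)  # reallocate to nearest centroid
        new = np.array([X[labels == j].mean(axis=0)
                        if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):  # converged: local min of W(C)
            break
        centroids = new
    return centroids, labels
\end{verbatim}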
{\bf Vector Quantization}: Simplify the representation of a signal by replacing each vector with a representative prototype vector. Use clustering to find the prototype vectors.
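\\As a usage sketch (building on the hypothetical \texttt{kmeans} and \texttt{X} above), quantization maps each point to its nearest prototype:
\begin{verbatim}
codebook, _ = kmeans(X, k=8)  # prototypes found via clustering
codes = np.linalg.norm(X[:, None, :] - codebook[None, :, :],
                       axis=2).argmin(axis=1)
X_quantized = codebook[codes]  # compressed representation
\end{verbatim}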
\\{\bf Parametric Discriminative Clustering (Mixture Models)}: Assume the PDF is made up of multiple Gaussians with different centers, then fit this model with EM.
\\E Step: $P(\mu_i | x_k) = \frac{P(\mu_i) P(x_k | \mu_i)}{\sum_j P(\mu_j) P(x_k|\mu_j)}$
\\M Step: $P(\mu_i) = \frac{1}{n} \sum_{k=1}^{n} P(\mu_i | x_k)$. Now update $\mu$ and $\sigma$:\\$\mu_i = \frac{\sum_k x_k P(\mu_i|x_k)}{\sum_k P(\mu_i | x_k)}$
\\$\sigma_i^2=\frac{\sum_k (x_k-\mu_i)^2 P(\mu_i|x_k)}{\sum_k P(\mu_i | x_k)}$.
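\\A minimal 1-D EM sketch following the E/M steps above (illustrative; \texttt{em\_gmm} and its arguments are assumed names, not from the notes):
\begin{verbatim}
def gauss(x, mu, var):
    return (np.exp(-(x - mu)**2 / (2 * var))
            / np.sqrt(2 * np.pi * var))

def em_gmm(x, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, k, replace=False)  # initial centers
    var = np.full(k, x.var())             # initial variances
    pi = np.full(k, 1.0 / k)              # mixing weights P(mu_i)
    for _ in range(n_iter):
        # E step: responsibilities P(mu_i | x_k), shape (k, n)
        r = pi[:, None] * gauss(x[None, :], mu[:, None],
                                var[:, None])
        r /= r.sum(axis=0, keepdims=True)
        # M step: re-estimate weights, means, variances
        pi = r.mean(axis=1)
        mu = (r * x).sum(axis=1) / r.sum(axis=1)
        var = (r * (x - mu[:, None])**2).sum(axis=1) / r.sum(axis=1)
    return pi, mu, var
\end{verbatim}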
\\{\bf Nonparametric Discriminative Clustering}: Histogram, Kernel Density Estimation.
\\Kernel: $P(x) = \frac{1}{n} \sum_i K(x-x_i)$, where $K$ is normalized, symmetric, and $\lim_{||x|| \rightarrow \infty} ||x||^d K(x) = 0$.
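\\A small sketch of the kernel estimate with a Gaussian kernel, which satisfies all three conditions; the bandwidth \texttt{h} is an added assumption, not part of the formula above:
\begin{verbatim}
def kde(x_query, samples, h=0.5):
    # P(x) = (1/n) * sum_i K((x - x_i)/h) / h, 1-D inputs
    u = (x_query[:, None] - samples[None, :]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return K.mean(axis=1) / h
\end{verbatim}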
\newpage

% You can even have references
\rule{0.3\linewidth}{0.25pt}
