From d2f7447258aa0638e5eb32aa3522548a238ebdfb Mon Sep 17 00:00:00 2001
From: VladKha
Date: Thu, 24 May 2018 08:35:45 +0300
Subject: [PATCH] Edits in "Properties of word embeddings"

---
 5- Sequence Models/Readme.md | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/5- Sequence Models/Readme.md b/5- Sequence Models/Readme.md
index 48e899a9..11720086 100644
--- a/5- Sequence Models/Readme.md
+++ b/5- Sequence Models/Readme.md
@@ -442,30 +442,31 @@ Here are the course summary as its given on the course [link](https://www.course
 - In the word embeddings task, we are getting a vector say from e1 to e300 for each word in our vocabulary. We will discuss the algorithm in the next sections.
 
 #### Properties of word embeddings
-- One of the most fascinating properties of word embeddings is that they can also help with analogy reasoning. Analogy reasoning is one of the most important applications of NLP.
+- One of the most fascinating properties of word embeddings is that they can also help with analogy reasoning. While analogy reasoning may not be the most important NLP application by itself, it might help convey a sense of what these word embeddings can do.
 - Analogies example:
-  - Given this word embeddings table:
-  - ![](Images/32.png)
+  - Given this word embeddings table:
+    ![](Images/32.png)
   - Can we conclude this relation:
     - Man ==> Woman
     - King ==> ??
   - Lets subtract eMan from eWoman. This will equal the vector `[-2 0 0 0]`
   - Similar eKing - eQueen = `[-2 0 0 0]`
-  - So the difference is about the gender in both.
-  - ![](Images/33.png)
+  - So the difference is about the gender in both.
+    ![](Images/33.png)
     - This vector represents the gender.
-    - This drawing is 2D visualization of the 4D vector that has been extracted by t-SNE algorithm. It was drawing for just clarification! Don't rely on t-SNE algorithm in finding parallels.
+    - This drawing is a 2D visualization of the 4D vectors, produced by the t-SNE algorithm. It is shown just for illustration; don't rely on the t-SNE algorithm for finding such parallels, because its non-linear mapping doesn't preserve them.
   - So we can reformulate the problem to find:
     - eMan - eWoman ≈ eKing - e??
-  - It can also represented mathematically by:
-  - ![](Images/34.png)
+  - It can also be represented mathematically by:
+    ![](Images/34.png)
   - It turns out that eQueen is the best solution here that gets the the similar vector.
-- Cosine similarity:
-  - Equation:
-  - ![](Images/35.png)
+- Cosine similarity - the most commonly used similarity function:
+  - Equation:
+    ![](Images/35.png)
     - $$\text{CosineSimilarity(u, v)} = \frac {u . v} {||u||_2 ||v||_2} = cos(\theta)$$
-  - The top part represents the inner product of `u` and `v` vectors. That will be large if the vectors are so similar.
-  - We can use this equation to calculate the similarities between word embeddings and on the analogy problem where `u` = ew and `v` = eking - eman + ewoman
+  - The numerator is the inner product of `u` and `v`. It will be large if the vectors are very similar.
+- You can also use Euclidean distance as a similarity function (but it measures dissimilarity rather than similarity, so you should take it with a negative sign).
+- We can use this equation to calculate the similarity between word embeddings, and in the analogy problem where `u` = eW and `v` = eKing - eMan + eWoman.
 
 #### Embedding matrix
 - When you implement an algorithm to learn a word embedding, what you end up learning is an **embedding matrix**.
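
For reference, the analogy computation and the cosine similarity described in the edited section could be sketched roughly as below. This is a minimal illustration, assuming NumPy and a plain dict of word vectors; the function names and the toy 4-dimensional values (loosely modeled on the table in Images/32.png) are made up for the example, not taken from the course code.

```python
import numpy as np

def cosine_similarity(u, v):
    # cos(theta) = (u . v) / (||u||_2 * ||v||_2)
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def complete_analogy(a, b, c, embeddings):
    # Solve "a is to b as c is to ??" by finding the word d that maximizes
    # cosine_similarity(e_b - e_a, e_d - e_c), i.e. such that e_b - e_a ≈ e_d - e_c.
    e_a, e_b, e_c = embeddings[a], embeddings[b], embeddings[c]
    best_word, best_score = None, -np.inf
    for word, e_d in embeddings.items():
        if word in (a, b, c):   # skip the input words themselves
            continue
        score = cosine_similarity(e_b - e_a, e_d - e_c)
        if score > best_score:
            best_word, best_score = word, score
    return best_word

# Toy 4-dimensional embeddings (gender, royal, age, food) - illustrative values only.
embeddings = {
    "man":   np.array([-1.00,  0.01, 0.03, 0.09]),
    "woman": np.array([ 1.00,  0.02, 0.02, 0.01]),
    "king":  np.array([-0.95,  0.93, 0.70, 0.02]),
    "queen": np.array([ 0.97,  0.95, 0.69, 0.01]),
    "apple": np.array([ 0.00, -0.01, 0.03, 0.95]),
}

print(complete_analogy("man", "woman", "king", embeddings))  # expected: queen
```

The section's formulation - maximize the similarity of `e_w` to `eKing - eMan + eWoman` - is a closely related variant of the search above; both typically return `queen` on reasonable embeddings.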
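
The note about Euclidean distance could be illustrated the same way; again just a sketch under the same assumptions, reusing the toy values above, with the negation turning a distance (dissimilarity) into a similarity score.

```python
import numpy as np

def neg_euclidean_similarity(u, v):
    # Euclidean distance grows as vectors become less alike (a dissimilarity),
    # so negate it to get a similarity score: larger (closer to 0) = more similar.
    return -np.linalg.norm(u - v)

man   = np.array([-1.00,  0.01, 0.03, 0.09])  # same toy vectors as in the sketch above
king  = np.array([-0.95,  0.93, 0.70, 0.02])
apple = np.array([ 0.00, -0.01, 0.03, 0.95])

print(neg_euclidean_similarity(man, king))   # ≈ -1.14: "man" is relatively close to "king"
print(neg_euclidean_similarity(man, apple))  # ≈ -1.32: and further from "apple"
```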