- In the word embeddings task, we get a vector (say e<sub>1</sub> to e<sub>300</sub>) for each word in our vocabulary. We will discuss the learning algorithm in the next sections.

#### Properties of word embeddings
- One of the most fascinating properties of word embeddings is that they can also help with analogy reasoning. While analogy reasoning may not be the most important NLP application by itself, it helps convey a sense of what these word embeddings can do.
- Analogy example:
- Given this word embeddings table:
  ![](Images/32.png)
- Can we conclude this relation:
- Man ==> Woman
- King ==> ??
- Let's compute e<sub>Man</sub> - e<sub>Woman</sub>. This is approximately the vector `[-2 0 0 0]`
- Similarly, e<sub>King</sub> - e<sub>Queen</sub> ≈ `[-2 0 0 0]`
- So in both cases the difference captures the gender.
  ![](Images/33.png)
- This vector represents the gender.
- This drawing is a 2D visualization of the 4D vectors, produced by the t-SNE algorithm. It is shown just for illustration; don't rely on t-SNE for finding parallels, because its mapping is non-linear and generally does not preserve these parallelogram relationships.
- So we can reformulate the problem to find:
- e<sub>Man</sub> - e<sub>Woman</sub> ≈ e<sub>King</sub> - e<sub>??</sub>
- It can also be represented mathematically by:
  ![](Images/34.png)
- It turns out that e<sub>Queen</sub> is the best solution here: it is the word whose embedding is most similar to e<sub>King</sub> - e<sub>Man</sub> + e<sub>Woman</sub>.
- Cosine similarity - the most commonly used similarity function:
- Equation:
  ![](Images/35.png)
  $$\text{CosineSimilarity}(u, v) = \frac{u \cdot v}{\|u\|_2 \, \|v\|_2} = \cos(\theta)$$
- The numerator is the inner product of `u` and `v`; it will be large when the vectors are very similar.
- You can also use Euclidean distance as a similarity function (it actually measures dissimilarity, so you should negate it).
- We can use this equation to measure the similarity between word embeddings; for the analogy problem we set `u` = e<sub>w</sub> and `v` = e<sub>king</sub> - e<sub>man</sub> + e<sub>woman</sub> and pick the word `w` that maximizes the similarity (both ideas are sketched in code right after this list).
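
To make the similarity functions concrete, here is a minimal NumPy sketch (function names and test vectors are just for this illustration, not from the course code) of cosine similarity and the negated Euclidean distance:

```python
import numpy as np

def cosine_similarity(u, v):
    # cos(theta) = (u . v) / (||u||_2 * ||v||_2)
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def neg_euclidean(u, v):
    # Euclidean distance measures dissimilarity, so negate it to get
    # a score that increases as the vectors get closer.
    return -np.linalg.norm(u - v)

u = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(u, 2 * u))   # 1.0  (same direction)
print(cosine_similarity(u, -u))      # -1.0 (opposite direction)
print(neg_euclidean(u, u))           # -0.0 (identical vectors score highest)
```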
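
And here is a hypothetical end-to-end analogy search over a tiny made-up vocabulary. The embedding values are invented to roughly mimic the toy (gender, royal, age, food) table above; real embeddings are learned and much higher-dimensional:

```python
import numpy as np

# Made-up 4D embeddings (gender, royal, age, food) -- illustrative numbers only.
E = {
    "man":   np.array([-1.00, 0.01, 0.03, 0.09]),
    "woman": np.array([ 1.00, 0.02, 0.02, 0.01]),
    "king":  np.array([-0.95, 0.93, 0.70, 0.02]),
    "queen": np.array([ 0.97, 0.95, 0.69, 0.01]),
    "apple": np.array([ 0.00, -0.01, 0.03, 0.95]),
}

def cosine_similarity(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def complete_analogy(a, b, c, embeddings):
    """Find the word w that best completes 'a is to b as c is to w'."""
    target = embeddings[c] - embeddings[a] + embeddings[b]  # e_c - e_a + e_b
    candidates = (w for w in embeddings if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine_similarity(embeddings[w], target))

print(complete_analogy("man", "woman", "king", E))  # prints "queen" with these toy numbers
```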

#### Embedding matrix
- When you implement an algorithm to learn a word embedding, what you end up learning is an **<u>embedding matrix</u>**.