Word2Vec implementation in numpy. Tried out the Skip-Gram model on A Storm of Swords by George R.R. Martin.
Dataset Link : https://www.kaggle.com/muhammedfathi/game-of-thrones-book-files#got2.txt
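
As background, Skip-Gram training pairs each center word with the words in a small window around it. Below is a minimal sketch of how such pairs can be generated; the window size of 2 and the regex tokenizer are assumptions for illustration, not details taken from this repo.

```python
import re

def generate_pairs(text, window=2):
    """Yield (center, context) word pairs for Skip-Gram training.

    window=2 is an assumed context size; the repo may use another value.
    """
    tokens = re.findall(r"[a-z']+", text.lower())   # naive tokenizer (assumed)
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:                              # skip the center word itself
                yield center, tokens[j]

# Example: list(generate_pairs("the night is dark and full of terrors"))
```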
Dimensions of Input Layer: V X 1 (V = vocabulary size)
Dimensions of W1: V X N (N = number of embedding dimensions)
Dimensions of Hidden Layer: N X 1
Dimensions of W2: N X V
Dimensions of Output Layer: V X 1
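
A minimal numpy sketch of one forward pass consistent with the shapes above (the function name and the stability shift in the softmax are illustrative, not the repo's exact code):

```python
import numpy as np

def forward(x_onehot, W1, W2):
    """Skip-Gram forward pass using the dimensions listed above.

    x_onehot : (V, 1) one-hot vector for the center word
    W1       : (V, N) input-to-hidden weights, N = embedding dimension
    W2       : (N, V) hidden-to-output weights
    """
    h = W1.T @ x_onehot          # (N, 1) hidden layer = the word's embedding
    u = W2.T @ h                 # (V, 1) scores over the vocabulary
    y = np.exp(u - u.max())      # softmax, shifted for numerical stability
    return y / y.sum(), h, u
```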
Epochs : 5
Total vocabulary size : 6633 words
Number of Dimensions : 10
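
With these settings, one SGD training step might look like the following sketch. The learning rate and the uniform weight initialization are assumptions, since neither is stated here:

```python
import numpy as np

V, N, EPOCHS = 6633, 10, 5     # values from this README
LR = 0.01                      # assumed; the learning rate is not stated here

rng = np.random.default_rng(0)
W1 = rng.uniform(-0.8, 0.8, (V, N))   # assumed uniform initialization
W2 = rng.uniform(-0.8, 0.8, (N, V))

def train_step(center_idx, context_idxs, W1, W2, lr=LR):
    """One Skip-Gram SGD step: one center word and all its context words."""
    h = W1[center_idx].reshape(N, 1)          # (N, 1) embedding lookup
    u = W2.T @ h                              # (V, 1) scores
    y = np.exp(u - u.max())                   # softmax over the vocabulary
    y /= y.sum()
    e = y * len(context_idxs)                 # summed prediction error ...
    for c in context_idxs:
        e[c, 0] -= 1.0                        # ... minus each context one-hot
    grad_h = W2 @ e                           # (N, 1) gradient w.r.t. h
    W2 -= lr * (h @ e.T)                      # in-place (N, V) update
    W1[center_idx] -= lr * grad_h.ravel()     # update the center word's row
```

Looping train_step over every (center word, context words) pair for EPOCHS passes would reproduce the 5-epoch run described above.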
- CBOW Model
- Negative Sampling
- Try more epochs and a larger embedding dimension.
- Research paper on Word2Vec (Mikolov et al., "Distributed Representations of Words and Phrases and their Compositionality") https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
- A YouTube video by Jordan Boyd-Graber https://www.youtube.com/watch?v=QyrUentbkvw
- A Medium blog post by Derek Chia which helped with the implementation https://towardsdatascience.com/an-implementation-guide-to-word2vec-using-numpy-and-google-sheets-13445eebd281
- A PDF by Alex Minnaar explaining the math behind Word2Vec's loss function http://mccormickml.com/assets/word2vec/Alex_Minnaar_Word2Vec_Tutorial_Part_I_The_Skip-Gram_Model.pdf