Word2Vec implementation in numpy. Tried out the Skip-Gram model on A Storm of Swords by George R.R. Martin.
Dataset Link : https://www.kaggle.com/muhammedfathi/game-of-thrones-book-files#got2.txt
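
As background, Skip-Gram training pairs each center word with the words in a small window around it. Below is a minimal sketch of how such pairs can be generated; the window size of 2 and the regex tokenizer are assumptions for illustration, not details taken from this repo.

```python
import re

def generate_pairs(text, window=2):
    """Yield (center, context) word pairs for Skip-Gram training.

    window=2 is an assumed context size; the repo may use another value.
    """
    tokens = re.findall(r"[a-z']+", text.lower())   # naive tokenizer (assumed)
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:                              # skip the center word itself
                yield center, tokens[j]

# Example: list(generate_pairs("the night is dark and full of terrors"))
```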
Dimensions of Input Layer: V X 1 (V = vocabulary size)
Dimensions of W1: V X N (N = number of embedding dimensions)
Dimensions of Hidden Layer: N X 1
Dimensions of W2: N X V
Dimensions of Output Layer: V X 1
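
A minimal numpy sketch of one forward pass consistent with the shapes above (the function name and the stability shift in the softmax are illustrative, not the repo's exact code):

```python
import numpy as np

def forward(x_onehot, W1, W2):
    """Skip-Gram forward pass using the dimensions listed above.

    x_onehot : (V, 1) one-hot vector for the center word
    W1       : (V, N) input-to-hidden weights, N = embedding dimension
    W2       : (N, V) hidden-to-output weights
    """
    h = W1.T @ x_onehot          # (N, 1) hidden layer = the word's embedding
    u = W2.T @ h                 # (V, 1) scores over the vocabulary
    y = np.exp(u - u.max())      # softmax, shifted for numerical stability
    return y / y.sum(), h, u
```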
Epochs : 5
Total vocabulary size : 6633 words
Number of Dimensions : 10
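
With these settings, one SGD training step might look like the following sketch. The learning rate and the uniform weight initialization are assumptions, since neither is stated here:

```python
import numpy as np

V, N, EPOCHS = 6633, 10, 5     # values from this README
LR = 0.01                      # assumed; the learning rate is not stated here

rng = np.random.default_rng(0)
W1 = rng.uniform(-0.8, 0.8, (V, N))   # assumed uniform initialization
W2 = rng.uniform(-0.8, 0.8, (N, V))

def train_step(center_idx, context_idxs, W1, W2, lr=LR):
    """One Skip-Gram SGD step: one center word and all its context words."""
    h = W1[center_idx].reshape(N, 1)          # (N, 1) embedding lookup
    u = W2.T @ h                              # (V, 1) scores
    y = np.exp(u - u.max())                   # softmax over the vocabulary
    y /= y.sum()
    e = y * len(context_idxs)                 # summed prediction error ...
    for c in context_idxs:
        e[c, 0] -= 1.0                        # ... minus each context one-hot
    grad_h = W2 @ e                           # (N, 1) gradient w.r.t. h
    W2 -= lr * (h @ e.T)                      # in-place (N, V) update
    W1[center_idx] -= lr * grad_h.ravel()     # update the center word's row
```

Looping train_step over every (center word, context words) pair for EPOCHS passes would reproduce the 5-epoch run described above.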
- CBOW Model
- Negative Sampling
- Try more epochs and a larger embedding dimension.
- Research paper on Word2Vec (Mikolov et al., "Distributed Representations of Words and Phrases and their Compositionality") https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
- A YouTube video by Jordan Boyd-Graber https://www.youtube.com/watch?v=QyrUentbkvw
- A Medium blog post by Derek Chia which helped with the implementation https://towardsdatascience.com/an-implementation-guide-to-word2vec-using-numpy-and-google-sheets-13445eebd281
- A PDF by Alex Minnaar explaining the math behind Word2Vec's loss function http://mccormickml.com/assets/word2vec/Alex_Minnaar_Word2Vec_Tutorial_Part_I_The_Skip-Gram_Model.pdf