word2vec

Word2Vec implementation in numpy. Tried out the Skip-Gram model on A Storm of Swords by George R. R. Martin.
Dataset link: https://www.kaggle.com/muhammedfathi/game-of-thrones-book-files#got2.txt
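Before training, the corpus has to be turned into (center, context) word pairs. A minimal sketch of that step (assumptions: window size 2 and plain lowercase whitespace tokenization here; the repo itself uses nltk for tokenization):

```python
def build_training_pairs(text, window=2):
    """Return skip-gram (center, context) index pairs and a word-to-index map."""
    tokens = text.lower().split()          # simplified stand-in for nltk tokenization
    vocab = sorted(set(tokens))
    word2idx = {w: i for i, w in enumerate(vocab)}
    pairs = []
    for i in range(len(tokens)):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:                     # every word within the window is a context word
                pairs.append((word2idx[tokens[i]], word2idx[tokens[j]]))
    return pairs, word2idx
```

Each pair becomes one training example: the center word is the input and the context word is the target.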


Word2Vec Architecture

Dimensions of Input Layer: V x 1 (V = vocabulary size)
Dimensions of W1: V x N (N = embedding dimension)
Dimensions of Hidden Layer: N x 1
Dimensions of W2: N x V
Dimensions of Output Layer: V x 1
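The dimensions above can be checked with a toy forward pass (a sketch with assumed toy sizes V = 6, N = 3; the repo's run uses V = 6633, N = 10):

```python
import numpy as np

V, N = 6, 3
rng = np.random.default_rng(0)
W1 = rng.standard_normal((V, N))   # input -> hidden weights, V x N
W2 = rng.standard_normal((N, V))   # hidden -> output weights, N x V

x = np.zeros((V, 1)); x[2] = 1.0   # one-hot input for word index 2, V x 1
h = W1.T @ x                       # hidden layer, N x 1 (selects row 2 of W1)
u = W2.T @ h                       # output scores, V x 1
y = np.exp(u - u.max()) / np.exp(u - u.max()).sum()  # softmax over the vocabulary
```

Because the input is one-hot, the hidden layer is just the input word's row of W1, which is why the rows of W1 become the learned word embeddings.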

Word2vec Architecture

Built With

  • numpy
  • nltk

Results

Epochs : 5
Total vocabulary size : 6633 words
Number of Dimensions : 10

Output for a set of words
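Queries like the one shown can be reproduced by ranking words by cosine similarity against the rows of W1. A sketch (the function name and toy inputs are assumptions, not the repo's code):

```python
import numpy as np

def nearest(word, words, W1, k=3):
    """Return the k words whose embeddings (rows of W1) are most cosine-similar."""
    idx = words.index(word)
    E = W1 / np.linalg.norm(W1, axis=1, keepdims=True)  # unit-normalise each row
    sims = E @ E[idx]                                   # cosine similarity to every word
    order = np.argsort(-sims)                           # most similar first
    return [words[i] for i in order if i != idx][:k]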

To-do

  • CBOW Model
  • Negative Sampling
  • Train for more epochs and larger embedding dimensions.

Acknowledgments
