This is a collection of word embeddings repackaged for easy machine loading and human reading.
Each set of embeddings should come with the following files:
- `.tsv` is a tab-separated file where the
  - (i) first column is the word/token,
  - (ii) second column is the count (if the original pre-trained embedding didn't save any count, it will be set to -1),
  - (iii) third to last columns form the actual embedding for the word/token in the first column (a parsing sketch follows this list).
- `.txt` holds the key words, one per line
  - same as the first column in the `.tsv` file.
- `.npy` is the embedding matrix that can be loaded directly with `numpy`
  - same as the third to last columns in the `.tsv` file.
- `.pkl` is a pickled dictionary whose keys are the words/tokens and whose values are their counts
  - if the original pre-trained embedding didn't save any count, the count will be set to -1.
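Since the `.tsv` file keeps the words, counts, and vectors together, it can be parsed with nothing but `numpy` and the standard library. A minimal sketch, assuming the file naming convention of the example further below:

>>> import numpy as np
>>> token2count, token2vec = {}, {}
>>> with open('hlbl.rcv1.original.50d.tsv') as fin:
...     for line in fin:
...         cols = line.rstrip('\n').split('\t')    # word, count, then the vector components
...         token2count[cols[0]] = int(cols[1])     # -1 when the source had no counts
...         token2vec[cols[0]] = np.array(cols[2:], dtype=float)
...

For quick lookups it is easier to load the pre-split `.npy` and `.txt` pair directly: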
>>> import numpy as np
>>> embeddings = np.load('hlbl.rcv1.original.50d.npy')  # one row per token
>>> tokens = [line.strip() for line in open('hlbl.rcv1.original.50d.txt')]
>>> embeddings[tokens.index('hello')]  # row i is the vector for tokens[i]
array([-0.21167406, -0.04189226, 0.22745571, -0.09330438, 0.13239339,
0.25136262, -0.01908735, -0.02557277, 0.0029353 , -0.06194451,
-0.22384156, 0.04584747, 0.03227248, -0.13708033, 0.17901117,
-0.01664691, 0.09400477, 0.06688628, -0.09019949, -0.06918809,
0.08437972, -0.01485273, -0.12062263, 0.05024147, -0.00416972,
0.04466985, -0.05316647, 0.00998635, -0.03696947, 0.10502578,
-0.00190554, 0.03435732, -0.05715087, -0.06777468, -0.11803425,
0.17845355, 0.18688948, -0.07509124, -0.16089943, 0.0396672 ,
-0.05162677, -0.12486628, -0.03870481, 0.0928738 , 0.06197058,
-0.14603543, 0.04026282, 0.14052328, 0.1085517 , -0.15121481])
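The counts can also be read back from the `.pkl` file; a minimal sketch, again assuming the naming convention above and that the pickle holds a plain word-to-count dictionary as described:

>>> import pickle
>>> with open('hlbl.rcv1.original.50d.pkl', 'rb') as fin:
...     counts = pickle.load(fin)
...
>>> counts['hello']  # -1 if the original embedding didn't save any counts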