A curated list of pretrained sentence (and word) embedding models
- About This Repo
- General Framework
- Word Embeddings
- OOV Handling
- Contextualized Word Embeddings
- Pooling Methods
- Encoders
- Evaluation
- Misc
- Vector Mapping
- Articles
- Code Less
- well, there are some awesome lists for word embeddings and sentence embeddings, but all of them are outdated and, more importantly, incomplete
- this repo will also be incomplete, but I'll try my best to find and include all the papers with pretrained models
- this is not a typical awesome list because it has tables, but I guess that's OK and much better than just a huge list
- if you find any mistakes, another paper, or anything else, please send a pull request and help me keep this list up to date
- to be honest, I'm not 100% sure how to represent this data, and if you think there is a better way (for example, by changing the table headers) please send a pull request and let's discuss it
- enjoy!
- Almost all sentence embeddings work like this:
  - Given some sort of word embeddings and an optional encoder (for example an LSTM), they obtain contextualized word embeddings.
  - Then they define some sort of pooling (it can be as simple as last-pooling).
  - Based on that, they either use the representation directly for a supervised task (like InferSent) or use it to generate a target sequence (like skip-thought).
- So, in general, there are many sentence embeddings you have never heard of: you can simply do mean-pooling over any word embedding and you have a sentence embedding (a minimal sketch follows this list)!
- Note: don't worry about the language of the code; you can almost always (except for the subword models) just load the pretrained embedding table in the framework of your choice and ignore the training code
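As a concrete example of the framework above, the simplest member of this family is plain mean-pooling over a pretrained embedding table. A minimal sketch (the toy table, the 300-dimensional size, and the whitespace tokenization are placeholders, not tied to any particular model):

```python
import numpy as np

def sentence_embedding(tokens, table, dim=300):
    """Mean-pool the vectors of the tokens that appear in the table."""
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:                  # every token was OOV
        return np.zeros(dim)
    return np.mean(vectors, axis=0)  # mean-pooling -> sentence vector

# toy usage with a random embedding table
table = {"nice": np.random.rand(300), "movie": np.random.rand(300)}
vec = sentence_embedding("a nice movie".split(), table)
```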
- Drop OOV words!
- One OOV vector (unk vector); a minimal sketch of these two simple strategies follows this list
- ALaCarte: A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors
- Mimick: Mimicking Word Embeddings using Subword RNNs
- Note: all the unofficial models can load the official pretrained models
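To make the first two strategies concrete, here is a hedged sketch of a lookup that either drops OOV tokens or maps them all to a single unk vector (the table and the unk initialization are placeholders; ALaCarte and Mimick instead induce a vector for the unseen word from its contexts or its characters):

```python
import numpy as np

def lookup(tokens, table, unk_vector=None):
    """Return one vector per token; OOV tokens are dropped or replaced by a shared unk vector."""
    out = []
    for t in tokens:
        if t in table:
            out.append(table[t])
        elif unk_vector is not None:  # "one OOV vector" strategy
            out.append(unk_vector)
        # else: "drop OOV words" strategy -> skip the token
    return out

table = {"nice": np.random.rand(300)}
vectors = lookup("a nice movie".split(), table, unk_vector=np.zeros(300))
```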
- {Last, Mean, Max}-Pooling
- Special Token Pooling (like BERT and OpenAI's Transformer)
- SIF: A Simple but Tough-to-Beat Baseline for Sentence Embeddings (a minimal reimplementation sketch follows this list)
- TF-IDF: Unsupervised Sentence Representations as Word Information Series: Revisiting TF-IDF
- P-norm: Concatenated Power Mean Word Embeddings as Universal Cross-Lingual Sentence Representations
- DisC: A Compressed Sensing View of Unsupervised Text Embeddings, Bag-of-n-Grams, and LSTMs
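SIF in particular is easy to reimplement: frequency-weighted averaging followed by common-component removal. A minimal sketch, assuming you already have an embedding table and a dict of unigram probabilities (both are placeholder inputs here, and the reference implementation differs in details):

```python
import numpy as np

def sif_embeddings(sentences, table, unigram_prob, a=1e-3, dim=300):
    """sentences: list of token lists; unigram_prob: word -> corpus probability."""
    X = []
    for tokens in sentences:
        vecs = [table[t] * (a / (a + unigram_prob.get(t, 0.0)))
                for t in tokens if t in table]
        X.append(np.mean(vecs, axis=0) if vecs else np.zeros(dim))
    X = np.array(X)
    # remove the projection onto the first principal component of the sentence matrix
    u = np.linalg.svd(X, full_matrices=False)[2][0]
    return X - np.outer(X @ u, u)
```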
- decaNLP: The Natural Language Decathlon: Multitask Learning as Question Answering
- SentEval: SentEval: An Evaluation Toolkit for Universal Sentence Representations (a usage sketch follows this list)
- GLUE: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
- Exploring Semantic Properties of Sentence Embeddings
- Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks
- Word Embeddings Benchmarks: How to evaluate word embeddings? On importance of data efficiency and simple supervised tasks
- MLDoc: A Corpus for Multilingual Document Classification in Eight Languages
- LexNET: Olive Oil Is Made of Olives, Baby Oil Is Made for Babies: Interpreting Noun Compounds Using Paraphrases in a Neural Model
- wordvectors.org: Community Evaluation and Exchange of Word Vectors at wordvectors.org
- jiant
- Evaluation of sentence embeddings in downstream and linguistic probing tasks
- QVEC: Evaluation of Word Vector Representations by Subspace Alignment
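Most of these toolkits follow the same pattern: you hand them a function that embeds a batch of sentences and they run the downstream tasks for you. A hedged sketch using SentEval (following the usage shown in its repo; `embed` is a placeholder encoder and the data path is an assumption):

```python
import numpy as np
import senteval

def embed(sentence):
    # placeholder encoder: swap in the model you actually want to evaluate
    return np.random.rand(300)

def prepare(params, samples):
    pass  # e.g. build a vocabulary from `samples` if your encoder needs one

def batcher(params, batch):
    # batch is a list of tokenized sentences; return one vector per sentence
    return np.vstack([embed(" ".join(tokens)) for tokens in batch])

params = {"task_path": "SentEval/data", "usepytorch": False, "kfold": 5}
se = senteval.engine.SE(params, batcher, prepare)
results = se.eval(["STS16", "MRPC", "SST2"])
```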
- Cross-lingual Word Vectors Projection Using CCA: Improving Vector Space Word Representations Using Multilingual Correlation
- vecmap: A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings
- MUSE: Unsupervised Machine Translation Using Monolingual Corpora Only
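All of these learn a (mostly linear) map from one embedding space into another. As a rough illustration, here is a sketch of the closed-form orthogonal Procrustes step that vecmap and MUSE build on, assuming you already have two aligned matrices from a seed dictionary (the real tools add normalization, iterative self-learning and/or adversarial initialization, and the CCA approach uses a different projection):

```python
import numpy as np

def procrustes(X, Y):
    """X, Y: (n, d) matrices whose i-th rows are translations of each other.
    Returns the orthogonal W minimizing ||X @ W - Y||_F."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# placeholder seed dictionary; map the whole source space into the target space
X = np.random.rand(1000, 300)  # source-language seed vectors
Y = np.random.rand(1000, 300)  # target-language seed vectors
mapped = X @ procrustes(X, Y)
```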
- Comparing Sentence Similarity Methods
- The Current Best of Universal Word Embeddings and Sentence Embeddings
- On sentence representations, pt. 1: what can you fit into a single #$!%@*&% blog post?
- Deep-learning-free Text and Sentence Embedding, Part 1
- Deep-learning-free Text and Sentence Embedding, Part 2
- An Overview of Sentence Embedding Methods
- Word embeddings in 2017: Trends and future directions
- A Walkthrough of InferSent – Supervised Learning of Sentence Embeddings
- A survey of cross-lingual word embedding models
- Introducing state of the art text classification with universal language models
- the papers here are just papers; they don't have any released code or pretrained models
- are you sure? I have read the paper, googled the title, googled the title + github, and searched for the authors one by one and checked their pages, so yeah, I'm 60% sure that they don't have anything! :))
- I did this two months ago (Oct 2018), and they might have released their code since then, so if you find any of it, let me know.
- Towards Language Agnostic Universal Representations
- Is Wasserstein All You Need?
- Discourse-Based Objectives for Fast Unsupervised Sentence Representation Learning
- Unsupervised Sentence Embedding Using Document Structure-based Context
- CSE: Conceptual Sentence Embeddings based on Attention Model
- Unsupervised Document Embedding With CNNs
- Learning Generic Sentence Representations Using Convolutional Neural Networks
- Learning Cross-Lingual Sentence Representations via a Multi-task Dual-Encoder Model
- Looking for ELMo's friends: Sentence-Level Pretraining Beyond Language Modeling
- Zero-training Sentence Embedding via Orthogonal Basis
- Improving Sentence Representations with Multi-view Frameworks
- Unsupervised Learning of Sentence Representations Using Sequence Consistency
- Fake Sentence Detection as a Training Task for Sentence Encoding
- Poincaré GloVe: Hyperbolic Word Embeddings
- A Non-linear Theory for Sentence Embedding
- No Training Required: Exploring Random Encoders for Sentence Classification
- Variational Autoencoders for Text Modeling without Weakening the Decoder
- Improving Composition of Sentence Embeddings through the Lens of Statistical Relational Learning