Universal Sentence Encoder lite

The Universal Sentence Encoder (Cer et al., 2018) (USE) is a model that encodes text into 512-dimensional embeddings. These embeddings can then be used as inputs to natural language processing tasks such as sentiment classification and textual similarity analysis.

This module is a TensorFlow.js GraphModel converted from the USE lite (module on TFHub), a lightweight version of the original. The lite model is based on the Transformer (Vaswani et al., 2017) architecture, and uses an 8k word piece vocabulary.

In this demo we embed six sentences with the USE, and render their self-similarity scores in a matrix (redder means more similar):

[Image: self-similarity matrix of the six sentences]

The matrix shows that USE embeddings can be used to cluster sentences by similarity.

The sentences (taken from the TensorFlow Hub USE lite colab):

  1. I like my phone.
  2. Your cellphone looks great.
  3. How old are you?
  4. What is your age?
  5. An apple a day, keeps the doctors away.
  6. Eating strawberries is healthy.
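
Scores like those in the matrix above can be reproduced from the embeddings themselves. Below is a minimal sketch using the load/embed API described under Usage below; the cosine-similarity computation is an illustration, not part of this package's API:

const tf = require('@tensorflow/tfjs');
const use = require('@tensorflow-models/universal-sentence-encoder');

use.load().then(async model => {
  const sentences = [
    'I like my phone.',
    'Your cellphone looks great.',
    'How old are you?'
  ];
  // `embeddings` has shape [numSentences, 512].
  const embeddings = await model.embed(sentences);
  // L2-normalize each row so that a dot product equals cosine similarity.
  const normalized = tf.div(embeddings, tf.norm(embeddings, 2, 1, true));
  // [numSentences, numSentences] matrix of pairwise similarity scores.
  const similarity = tf.matMul(normalized, normalized, false, true);
  similarity.print();
});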

Installation

Using yarn:

$ yarn add @tensorflow/tfjs @tensorflow-models/universal-sentence-encoder

Using npm:

$ npm install @tensorflow/tfjs @tensorflow-models/universal-sentence-encoder

Usage

To import via npm:

const use = require('@tensorflow-models/universal-sentence-encoder');

or as a standalone script tag:

<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/universal-sentence-encoder"></script>

Then:

// Load the model.
use.load().then(model => {
  // Embed an array of sentences.
  const sentences = [
    'Hello.',
    'How are you?'
  ];
  model.embed(sentences).then(embeddings => {
    // `embeddings` is a 2D tensor consisting of the 512-dimensional embeddings for each sentence.
    // So in this example `embeddings` has the shape [2, 512].
    embeddings.print(true /* verbose */);
  });
});
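
Since `embed` returns a tf.Tensor, the values can also be read back as a plain JavaScript array and the tensor disposed when no longer needed. A usage sketch using the standard TensorFlow.js tensor methods `array()` and `dispose()`:

const use = require('@tensorflow-models/universal-sentence-encoder');

use.load().then(model => {
  model.embed(['Hello.']).then(async embeddings => {
    // Read the embeddings back as a nested array
    // (one 512-element array per sentence).
    const values = await embeddings.array();
    console.log(values[0].length); // 512
    // Free the GPU/CPU memory backing the tensor.
    embeddings.dispose();
  });
});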

To use the Tokenizer separately:

use.loadTokenizer().then(tokenizer => {
  tokenizer.encode('Hello, how are you?'); // [341, 4125, 8, 140, 31, 19, 54]
});