Semantic Embedding Template

Minimal example for doing semantic embedding, search, and clustering completely offline in Node.js.

gpt4all Node.js, for converting text to semantic vectors
Vectra, a local vector database stored as single-file JSON. Supports fast querying.
ml-kmeans, for basic clustering

This is a sandbox for exploring language-model-based projects: a collection of useful snippets that I copy into other projects. That is why it's super minimal and consists of just Node scripts.

Setup

Run pnpm install.

Simple embedding

example-simple-embedding/index.js

pnpm simple-embedding

This is the simplest example. Given an array of text, you can get semantic vectors back:

const embeddings = new Embeddings({ id: 'simple-embeddings' })
await embeddings.init()

await embeddings.insertText(['coffee shop', 'wifi', 'hard work', 'love peace & joy, relaxation'])
console.log(await embeddings.getTextMap())

You can use this for search by turning the search string into a semantic vector and using Vectra to find the stored vectors closest to it.

const results = await embeddings.search('coffee')
console.log(results.map(item => [item.item.metadata.text, item.score]))
// [
//   [ 'coffee shop', 0.8214959697396015 ],
//   [ 'wifi', 0.711907901740376 ],
//   [ 'hard work', 0.6709908415581982 ],
//   [ 'love peace & joy, relaxation', 0.6495931802131457 ]
// ]
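
To make the "closest vectors" step concrete, here is a minimal sketch of ranking stored vectors against a query vector in plain JavaScript. It does not use Vectra or the Embeddings class. Scoring by cosine similarity is an assumption about how closeness is measured, the helper names (cosineSimilarity, searchVectors) are hypothetical, and the toy 3-dimensional vectors stand in for real embeddings, which have hundreds of dimensions.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|). A value of 1 means same direction.
function cosineSimilarity(a, b) {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Score every stored item against the query vector, highest score first.
function searchVectors(queryVector, items) {
  return items
    .map(item => ({ text: item.text, score: cosineSimilarity(queryVector, item.vector) }))
    .sort((a, b) => b.score - a.score)
}

// Hypothetical toy vectors standing in for real embeddings.
const items = [
  { text: 'coffee shop', vector: [0.9, 0.1, 0.0] },
  { text: 'wifi', vector: [0.5, 0.5, 0.1] },
  { text: 'hard work', vector: [0.1, 0.9, 0.2] }
]
const queryVector = [1, 0, 0] // pretend this is the embedding of 'coffee'

const ranked = searchVectors(queryVector, items)
console.log(ranked.map(r => [r.text, r.score]))
```

With these made-up vectors, 'coffee shop' ranks first because its vector points in almost the same direction as the query.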

OpenAI embedding

example-openai-embedding/index.js

pnpm openai-embedding

Exactly the same as the example above, but hooked up to the OpenAI API. Requires setting the environment variable OPEN_API_KEY. You can do this by creating a .env file:

OPEN_API_KEY=<your_key>
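
A small sketch of failing early when the key is missing. The variable name OPEN_API_KEY is taken from this README; the readApiKey helper is hypothetical and not part of the repo. It assumes the .env file has already been loaded into process.env by whatever loader the script uses.

```javascript
// Hypothetical helper: read the key from an env object, or throw a clear error.
function readApiKey(env = process.env) {
  const key = env.OPEN_API_KEY
  if (!key || key.trim() === '') {
    throw new Error('OPEN_API_KEY is not set. Add it to your .env file.')
  }
  return key.trim()
}

// Usage with an explicit object, so the sketch runs without a real key.
console.log(readApiKey({ OPEN_API_KEY: 'sk-example' }))
```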

LM Studio embedding

example-lmstudio-embedding/index.js

Exactly the same as the example above, but hooked up to the OpenAI-compatible API provided by LM Studio, which runs entirely locally.

Install LM Studio.
Download the nomic-ai/nomic-embed-text-v1.5-GGUF model (you can do this from within the Discover tab of LM Studio).
From the Developer tab, load the model and start the local server.
Run pnpm lmstudio-embedding (or npm run lmstudio-embedding).

You can change which embedding model the script uses by editing the embed() function in lib/embeddings-lmstudio.js.
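
As a rough idea of what such an embed() function might look like, here is a sketch that calls an OpenAI-style /v1/embeddings endpoint. This is not the code in lib/embeddings-lmstudio.js. The base URL http://localhost:1234/v1, the model identifier string, and the buildEmbeddingRequest helper are assumptions; check the Developer tab in LM Studio for the actual address and model name.

```javascript
// Assumed defaults; adjust to match what LM Studio shows in the Developer tab.
const DEFAULT_BASE_URL = 'http://localhost:1234/v1'
const DEFAULT_MODEL = 'nomic-ai/nomic-embed-text-v1.5-GGUF' // assumed identifier

// Build the HTTP request for an OpenAI-style embeddings call.
function buildEmbeddingRequest(texts, { baseUrl = DEFAULT_BASE_URL, model = DEFAULT_MODEL } = {}) {
  return {
    url: `${baseUrl}/embeddings`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model, input: texts })
    }
  }
}

// Send the request and return one vector per input text, in input order.
async function embed(texts, options = {}) {
  const { url, init } = buildEmbeddingRequest(texts, options)
  const response = await fetch(url, init) // global fetch, available in Node 18+
  if (!response.ok) throw new Error(`Embedding request failed: ${response.status}`)
  const json = await response.json()
  return json.data
    .slice()
    .sort((a, b) => a.index - b.index)
    .map(entry => entry.embedding)
}
```

With the local server running, calling embed(['coffee shop', 'wifi']) should resolve to an array of two vectors. Switching models then only means passing a different model option or changing the default.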

Clustering

example-clustering/index.js

pnpm clustering

Takes vectors and clusters them using k-means. You can either tell it how many clusters to use, or run it for ~100 different cluster counts and log the error for each. This is the "elbow method": you keep increasing the number of clusters as long as the error goes down, and the point just before it starts climbing again is taken as the optimal count.

Example cluster output, given a cluster count of 3. You can see that one cluster is "happy", one is "coffee/work", and the unrelated concept is in its own cluster.

// result:
// [
//   [ '😄', '❤️', '😊' ],
//   [ 'coffee shop', 'wifi', 'hard work', '☕' ],
//   [ 'love peace & joy, relaxation' ]
// ]
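
To illustrate the loop that logs the error for each cluster count, here is a small self-contained k-means sketch. It does not use ml-kmeans. The implementation, the deterministic initialization (the first k points become the starting centroids), and the toy 2D points are all simplifications chosen so the example is reproducible; real embeddings would be much higher-dimensional.

```javascript
// Squared Euclidean distance between two vectors of equal length.
function squaredDistance(a, b) {
  let sum = 0
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2
  return sum
}

// Index of the centroid closest to the point.
function nearestCentroid(point, centroids) {
  let best = 0
  for (let c = 1; c < centroids.length; c++) {
    if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) best = c
  }
  return best
}

// Simplified k-means (Lloyd's algorithm). Uses the first k points as initial centroids.
function kmeans(points, k, maxIterations = 100) {
  let centroids = points.slice(0, k).map(p => p.slice())
  let labels = []
  for (let iter = 0; iter < maxIterations; iter++) {
    const newLabels = points.map(p => nearestCentroid(p, centroids))
    const unchanged = newLabels.every((label, i) => label === labels[i])
    labels = newLabels
    if (unchanged) break
    // Move each centroid to the mean of its assigned points.
    centroids = centroids.map((old, c) => {
      const members = points.filter((_, i) => labels[i] === c)
      if (members.length === 0) return old // keep an empty cluster's centroid in place
      return old.map((_, d) => members.reduce((s, m) => s + m[d], 0) / members.length)
    })
  }
  // Error = sum of squared distances from each point to its assigned centroid.
  const error = points.reduce((s, p, i) => s + squaredDistance(p, centroids[labels[i]]), 0)
  return { labels, centroids, error }
}

// Two tight groups of toy points, interleaved so the simple initialization works.
const points = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]

// Run k-means for several cluster counts and record the error for each.
const errors = []
for (let k = 1; k <= 4; k++) {
  errors.push({ k, error: kmeans(points, k).error })
}
console.log(errors)
```

On this toy data the error drops sharply from one cluster to two and changes comparatively little after that, so two clusters is the natural pick.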

Appendix

Running various models directly in JS: https://github.com/xenova/transformers.js?tab=readme-ov-file#installation
Open image models
- https://github.com/vikhyat/moondream
- can convert image to text, and then semantically search that, in order to implement "CTRL+F" for images
It would be neat to add an example of how to "reverse" embeddings, perhaps like Linus's notebook example.

OmarShehata/semantic-embedding-template
