OmarShehata/semantic-embedding-template

Semantic Embedding Template

Minimal example of semantic embedding, search, and clustering, running completely offline in Node.js.

  • gpt4all Node.js for converting text to semantic vectors
  • Vectra, a local vector database stored as single-file JSON, with fast querying
  • Clustering using basic ml-kmeans

This is a sandbox for exploring language-model-based projects: a collection of useful snippets that I copy into other projects. That's why it's super minimal and consists of just Node scripts.

Setup

Run pnpm install.

Simple embedding

example-simple-embedding/index.js

pnpm simple-embedding

This is the simplest example. Given an array of text, you can get semantic vectors back:

const embeddings = new Embeddings({ id: 'simple-embeddings' })
await embeddings.init()

await embeddings.insertText(['coffee shop', 'wifi', 'hard work', 'love peace & joy, relaxation'])
console.log(await embeddings.getTextMap())

You can use this for search: the search string is turned into a semantic vector, and Vectra finds the stored vectors closest to it.

const results = await embeddings.search('coffee')
console.log(results.map(item => [item.item.metadata.text, item.score]))
// [
//   [ 'coffee shop', 0.8214959697396015 ],
//   [ 'wifi', 0.711907901740376 ],
//   [ 'hard work', 0.6709908415581982 ],
//   [ 'love peace & joy, relaxation', 0.6495931802131457 ]
// ]
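
To make the ranking step concrete, here is a minimal sketch of a nearest-vector search in plain JavaScript. It assumes cosine similarity as the metric and is not how the repo's Embeddings class is written; Vectra does this work in the real example. The helper names and the toy 3-dimensional vectors are made up for illustration (real embeddings have hundreds of dimensions).

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Hypothetical helper: score every stored item against the query vector
// and return the items sorted from most to least similar.
function rankBySimilarity(queryVector, items) {
  return items
    .map(item => ({ text: item.text, score: cosineSimilarity(queryVector, item.vector) }))
    .sort((a, b) => b.score - a.score)
}

// Toy vectors, invented for this sketch.
const items = [
  { text: 'coffee shop', vector: [0.9, 0.1, 0.0] },
  { text: 'wifi', vector: [0.5, 0.5, 0.0] },
  { text: 'hard work', vector: [0.0, 0.2, 0.9] },
]

const ranked = rankBySimilarity([1, 0, 0], items)
console.log(ranked.map(r => [r.text, r.score]))
```

A brute-force scan like this is fine for a few thousand items; a vector database mainly adds persistence and metadata on top of the same idea.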

OpenAI embedding

example-openai-embedding/index.js

pnpm openai-embedding

Exactly the same as the example above, but hooked up to the OpenAI API. It requires setting the environment variable OPEN_API_KEY, which you can do by creating a .env file:

OPEN_API_KEY=<your_key>
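
As a rough sketch of how a script might use that variable, the snippet below reads OPEN_API_KEY and assembles (without sending) a request for OpenAI's embeddings endpoint. The helper name and the model name are assumptions for illustration; the repo's own script may use a client library instead. Note that Node does not read .env files on its own, so something like the dotenv package or `node --env-file=.env` (Node 20.6+) has to load it first.

```javascript
// Hypothetical helper: build, but do not send, an embeddings request.
// `env` defaults to process.env so the key comes from the environment.
function buildEmbeddingRequest(texts, env = process.env) {
  const apiKey = env.OPEN_API_KEY
  if (!apiKey) {
    throw new Error('OPEN_API_KEY is not set. Add it to your .env file or shell environment.')
  }
  return {
    url: 'https://api.openai.com/v1/embeddings',
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      // The model name is an assumption; use whichever embedding model you have access to.
      body: JSON.stringify({ model: 'text-embedding-3-small', input: texts }),
    },
  }
}

// Placeholder key, passed explicitly so the sketch runs without a real .env file.
const request = buildEmbeddingRequest(['coffee shop'], { OPEN_API_KEY: 'sk-example' })
console.log(request.url, request.options.method)
```

Passing `request.url` and `request.options` to `fetch` would perform the actual call; failing early on a missing key tends to give a clearer error than a 401 from the API.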

LM Studio embedding

example-lmstudio-embedding/index.js

Exactly the same as the example above, but hooked up to the OpenAI-compatible API provided by LM Studio, which runs entirely locally.

  1. Install LM Studio.
  2. Download the nomic-ai/nomic-embed-text-v1.5-GGUF model (you can do this from within the Discover tab of LM Studio).
  3. From the Developer tab, load the model and start the local server.
  4. Run pnpm lmstudio-embedding (or npm run lmstudio-embedding).

You can change which embedding model the script uses by editing the embed() function in lib/embeddings-lmstudio.js.
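
For orientation, an embed() function against an OpenAI-compatible server could look roughly like the sketch below. This is not the code in lib/embeddings-lmstudio.js. It assumes LM Studio's default local server address (http://localhost:1234/v1) and an OpenAI-style response shape, and the model identifier is a guess: copy the exact one LM Studio shows for your loaded model. The `fetchImpl` option is a hypothetical addition that makes the function easy to try without a running server.

```javascript
// Hypothetical embed(): POST texts to an OpenAI-compatible /embeddings endpoint
// and return one vector per input text.
async function embed(texts, {
  baseUrl = 'http://localhost:1234/v1', // assumed LM Studio default; adjust if yours differs
  model = 'nomic-ai/nomic-embed-text-v1.5-GGUF', // assumed identifier
  fetchImpl = fetch, // global fetch, available in Node 18+
} = {}) {
  const response = await fetchImpl(`${baseUrl}/embeddings`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, input: texts }),
  })
  if (!response.ok) {
    throw new Error(`Embedding request failed with status ${response.status}`)
  }
  const json = await response.json()
  // OpenAI-style responses carry one { embedding } object per input in `data`.
  return json.data.map(entry => entry.embedding)
}

// Usage, with the local server running:
// const vectors = await embed(['coffee shop', 'wifi'])
```

Switching models then comes down to changing the `model` string, which is the kind of edit the note above refers to.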

Clustering

example-clustering/index.js

pnpm clustering

Takes vectors and clusters them using k-means. You either tell it how many clusters to use, or run it for ~100 different cluster counts and log the error for each. This is the "elbow method": you keep increasing the cluster count as long as the error goes down, and the point where it levels off or starts climbing again is the optimal count.
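
The sweep over cluster counts can be sketched as follows. This is a minimal hand-written k-means (Lloyd's algorithm), used here instead of ml-kmeans so the snippet has no dependencies; it is not the repo's code. Seeding the centroids with the first k points is an assumption made to keep the result deterministic, and the 2D points are toy data.

```javascript
const squaredDistance = (a, b) => a.reduce((sum, value, i) => sum + (value - b[i]) ** 2, 0)

// Index of the centroid closest to `point`.
function nearestCentroid(point, centroids) {
  let best = 0
  for (let c = 1; c < centroids.length; c++) {
    if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) best = c
  }
  return best
}

// Minimal k-means. Centroids start at the first k points (deterministic,
// but naive; real libraries use smarter seeding such as k-means++).
function simpleKMeans(points, k, iterations = 20) {
  let centroids = points.slice(0, k).map(p => [...p])
  let labels = []
  for (let step = 0; step < iterations; step++) {
    // Assign each point to its nearest centroid, then move each centroid
    // to the mean of the points assigned to it.
    labels = points.map(p => nearestCentroid(p, centroids))
    centroids = centroids.map((old, c) => {
      const members = points.filter((_, i) => labels[i] === c)
      if (members.length === 0) return old // leave an empty cluster's centroid in place
      return old.map((_, d) => members.reduce((sum, m) => sum + m[d], 0) / members.length)
    })
  }
  // Error = sum of squared distances from each point to its cluster's centroid.
  const error = points.reduce((sum, p, i) => sum + squaredDistance(p, centroids[labels[i]]), 0)
  return { labels, centroids, error }
}

// Toy data with three obvious groups.
const points = [[0, 0], [10, 10], [20, 0], [0, 1], [10, 11], [20, 1]]
for (let k = 1; k <= 4; k++) {
  console.log(k, simpleKMeans(points, k).error)
}
```

On this toy data the error falls steeply up to k = 3 and barely changes after that, which is the "elbow" you would look for in the logged values.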

Example cluster output, given a cluster count of 3. You can see one cluster is "happy", one is "coffee/work", and the unrelated concept is in its own cluster.

// result:
// [
//   [ '😄', '❤️', '😊' ],
//   [ 'coffee shop', 'wifi', 'hard work', '☕' ],
//   [ 'love peace & joy, relaxation' ]
// ]
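
Output in this shape can be produced from per-item cluster labels with a small grouping step, sketched below. The helper name is hypothetical, and the labels are made up for illustration; it assumes the clustering step returns one cluster index per input text, in input order.

```javascript
// Hypothetical helper: turn per-item cluster labels into arrays of texts,
// one array per cluster.
function groupByCluster(texts, labels) {
  const clusterCount = Math.max(...labels) + 1
  const groups = Array.from({ length: clusterCount }, () => [])
  labels.forEach((label, i) => groups[label].push(texts[i]))
  return groups
}

const texts = ['coffee shop', 'wifi', 'hard work', 'love peace & joy, relaxation']
const labels = [0, 0, 0, 1] // invented labels, not real clustering output
console.log(groupByCluster(texts, labels))
```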

Appendix
