A locally running vector database with a GraphQL API and built-in methods to obtain embeddings from OpenAI and other services. Written in Kotlin using Spring, with Postgres as the storage. Vector similarity is provided by the ultra-fast algorithms of the Faiss library.
One day I tried to get Pinecone access and they put me on a wait list! I'm too impatient, so I just created a vector database myself.
MaxVector is a vector database created for AI applications. You can use it like any other vector db: store your embeddings, then query the closest ones by distance (Euclidean, cosine or inner product) or by metadata, using GraphQL for fine-grained field selection. Here is an example of storing OpenAI embeddings for dog, shark and parrot using plain Python code:
import requests

url = "http://localhost:8080/graphql"
# Mutation that asks MaxVector to fetch and store embeddings for three words
store_embedding = """mutation storeEmbedding {
  storeEmbedding(queries: ["dog", "shark", "parrot"]) {
    status
    error
    count
  }
}"""
payload = {"query": store_embedding}
r = requests.post(url, json=payload)
json_data = r.json()
# The result fields mirror the selection set of the mutation
data = json_data["data"]["storeEmbedding"]
status = data["status"]
error = data["error"]
count = data["count"]
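The snippet above indexes straight into `json_data["data"]`, which fails with a confusing error if the request was rejected. GraphQL servers report request-level failures in a top-level `errors` list, so a small guard can help. This is only a sketch: the helper name `unwrap` is made up for illustration, and the `"OK"` status value in the canned response is a placeholder, not something MaxVector is known to return.

```python
def unwrap(response_json, field):
    """Return response_json["data"][field], raising if the server reported errors.

    `unwrap` is a hypothetical helper, not part of MaxVector.
    """
    # GraphQL responses carry request-level failures in a top-level "errors" list.
    errors = response_json.get("errors")
    if errors:
        messages = "; ".join(e.get("message", "unknown error") for e in errors)
        raise RuntimeError(f"GraphQL request failed: {messages}")
    data = response_json.get("data") or {}
    if field not in data:
        raise KeyError(f"field {field!r} missing from response data")
    return data[field]


# Canned response shaped like the storeEmbedding selection set above.
# The "OK" status string is a placeholder value.
ok = {"data": {"storeEmbedding": {"status": "OK", "error": None, "count": 3}}}
result = unwrap(ok, "storeEmbedding")
print(result["count"])
```

With a live server you would pass `r.json()` instead of the canned dictionary.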
It can also obtain embeddings automagically for you! With the GraphQL interface, calling this mutation is enough to fetch embedding vectors for a list of words from OpenAI straight into your database:
mutation storeEmbedding {
  storeEmbedding(queries: ["dog", "shark", "parrot"]) {
    status
    error
    count
  }
}
Or it can look up embeddings in your database by prompt, first obtaining the prompt's embedding from OpenAI, like here:
query findClosest {
  findClosest(prompt: "house animal", k: 3, measure: COSINE) {
    id
    label
    metadata {
      example
    }
    embedding {
      coords
    }
  }
}
If you prefer to query the DB by supplying all 2048 coordinates, you can of course do that, too:
query closestVectors {
  findClosestByVector(vec:[1.5, 3.45, 32.3, (2045 more coords...)], k: 3, measure: EUCLIDEAN) {
    id
    label
  }
}
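Nobody wants to paste thousands of coordinates by hand, so here is a rough Python sketch that builds the `findClosestByVector` query from a list of floats. The function name `closest_by_vector_query` is invented for this example, and the three-element vector is only a stand-in for a full-length embedding.

```python
def closest_by_vector_query(vec, k=3, measure="EUCLIDEAN"):
    """Build a findClosestByVector query string from a list of numbers.

    Hypothetical helper for illustration; it only formats text and does not
    talk to the server.
    """
    allowed = {"EUCLIDEAN", "COSINE", "INNER_PRODUCT"}
    if measure not in allowed:
        raise ValueError(f"measure must be one of {sorted(allowed)}")
    # repr() of a float gives a literal that GraphQL accepts as a Float value
    # (finite numbers only; inf and nan are not valid GraphQL literals).
    coords = ", ".join(repr(float(x)) for x in vec)
    return (
        "query closestVectors {\n"
        f"  findClosestByVector(vec: [{coords}], k: {int(k)}, measure: {measure}) {{\n"
        "    id\n"
        "    label\n"
        "  }\n"
        "}"
    )


# Stand-in vector; a real call needs as many coordinates as the database holds.
query = closest_by_vector_query([1.5, 3.45, 32.3], k=3)
payload = {"query": query}
print(query)
```

The resulting `payload` can be posted to the GraphQL endpoint the same way as in the Python example at the top.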
Of course - more features will be added as necessary.
All you need is a local instance of Postgres with the pgvector extension and a JAR built from this repo.
On any Debian-ish Linux, install Postgres with:
$ sudo apt-get install postgresql
Then configure the admin user and password:
$ sudo -u postgres psql postgres
And in the Postgres shell:
postgres=# \password postgres
This sets a password for the postgres user, which is empty by default. You can then exit the Postgres shell and create a new database user. This is the user you will need to put in the application.properties file later:
$ sudo -u postgres createuser --interactive --password user12
Shall the new role be a superuser? (y/n) n
Shall the new role be allowed to create databases? (y/n) y
Shall the new role be allowed to create more new roles? (y/n) n
Password:
Create a db. Whatever name you choose, you will need to use it in the application.properties file later:
$ sudo -u postgres createdb testdb -O user12
Finally, edit your Postgres config file to trust the locally running MaxVector (the version number in the path depends on your installation):
$ sudo vi /etc/postgresql/9.5/main/pg_hba.conf
Edit the file like this:
# "local" is for Unix domain socket connections only
local all all trust
# IPv4 local connections:
host all all 127.0.0.1/32 trust
And restart the Postgres service:
$ sudo service postgresql restart
Go to the pgvector GitHub repository and follow its installation section to install the vector extension.
Edit the src\main\resources\application.properties file and add an openAIapiKey line with your OpenAI API key:
spring.datasource.url=jdbc:postgresql://localhost:5432/<the name of the db>
spring.datasource.username=<the non-admin user you added>
spring.datasource.password=<the user's password>
openAIapiKey=sk-<your-OpenAI-key>
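It is easy to leave one of the `<...>` placeholders in place by accident. The following Python sketch checks a properties file for the four keys shown above. It is a simplified parser written for this illustration (plain `key=value` lines only, no line continuations or `:` separators), and the sample values such as `secret` and `sk-example` are made up.

```python
REQUIRED_KEYS = [
    "spring.datasource.url",
    "spring.datasource.username",
    "spring.datasource.password",
    "openAIapiKey",
]


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def find_problems(props):
    """Return a list of human-readable problems for the required keys."""
    problems = []
    for key in REQUIRED_KEYS:
        value = props.get(key, "")
        if not value:
            problems.append(f"{key} is missing or empty")
        elif "<" in value and ">" in value:
            problems.append(f"{key} still contains a <placeholder>")
    return problems


# Sample content using the db and user names from the setup steps;
# the password and API key are invented placeholder values.
sample = """\
spring.datasource.url=jdbc:postgresql://localhost:5432/testdb
spring.datasource.username=user12
spring.datasource.password=secret
openAIapiKey=sk-example
"""
print(find_problems(parse_properties(sample)))
```

To check a real file, read it with `open(path).read()` and pass the text to `parse_properties`.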
Build and run the project with Gradle.
Query the db with GraphQL, either by plain HTTP requests, with any GraphQL client like Apollo, or through the built-in web interface on port 8080 of your machine.
Check src\main\resources\graphql\schema.gqls for the currently implemented queries and mutations.
Note that the first insert into the database determines the dimensionality of the vectors it holds.
You can look up the k closest vectors by supplying the coordinates as a float array, with an optional measure (EUCLIDEAN, COSINE, INNER_PRODUCT) that defaults to EUCLIDEAN. For normalized vectors (like OpenAI embeddings), choose INNER_PRODUCT for speed.
query closestVectors {
  findClosestByVector(vec: [1.5, 3.45, 32.3,...], k: 3, measure: COSINE) {
    id
    label
  }
}
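Why is the inner product enough for normalized vectors? For unit-length vectors the inner product equals the cosine similarity, and the squared Euclidean distance is just 2 minus twice the inner product, so all three measures rank neighbours in the same order. The pure-Python sketch below checks this on two arbitrary example vectors. It only illustrates the math and says nothing about how MaxVector computes or signs its distances internally.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two arbitrary example vectors, normalized to unit length.
a = normalize([1.5, 3.45, 32.3])
b = normalize([2.0, 1.0, 30.0])

ip = inner_product(a, b)
# For unit vectors: inner product == cosine similarity ...
same_as_cosine = math.isclose(ip, cosine_similarity(a, b), abs_tol=1e-12)
# ... and squared Euclidean distance == 2 - 2 * inner product.
matches_euclid = math.isclose(euclidean_distance(a, b) ** 2, 2 - 2 * ip, abs_tol=1e-12)
print(same_as_cosine, matches_euclid)
```

Since the inner product skips the norm computations that cosine similarity needs, it is the cheaper choice when the vectors are already normalized.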
Look up the k closest vectors by first obtaining the prompt's embedding from the selected embedding API:
query findClosest {
  findClosest(prompt: "house animal", k: 3, measure: INNER_PRODUCT) {
    id
    label
    metadata {
      example
    }
    embedding {
      coords
    }
  }
}
Get distances from a selected vector:
query getDistance {
  getDistance(vec: [243, 323, 23,...], measure: EUCLIDEAN)
}
You can store new vectors by first obtaining their embeddings from the selected embedding API:
mutation storeEmbedding {
  storeEmbedding(queries: ["dog", "shark", "parrot"]) {
    status
    error
    count
  }
}
A vector with a specific ID can be updated with:
mutation update {
  updateById(id: 23, vec: [2.3, 2.6,...], label: "New label")
}
To delete a vector with a particular ID:
mutation deleteById {
  deleteById(id: 1)
}
You can create an index for any of the available measures:
mutation cosineIndex {
  createIndex(lists: 500, measure: COSINE) {
    status
    error
  }
}
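How do you pick a value for `lists`? Assuming the parameter is passed through to a pgvector IVFFlat index (an assumption, check the source to be sure), the pgvector documentation suggests starting with rows / 1000 for up to 1M rows and sqrt(rows) above that. Here is a small Python sketch of that rule of thumb; the function names are invented for this example.

```python
import math


def suggested_lists(row_count):
    """Starting point for an IVFFlat `lists` value, per pgvector's guidance.

    Hypothetical helper; assumes `lists` maps to pgvector's IVFFlat parameter.
    """
    if row_count < 0:
        raise ValueError("row_count must not be negative")
    if row_count <= 1_000_000:
        # rows / 1000, but never less than one list
        return max(1, row_count // 1000)
    # sqrt(rows) for larger tables
    return math.isqrt(row_count)


def create_index_mutation(lists, measure="COSINE"):
    """Format a createIndex mutation like the one shown above."""
    return (
        "mutation createIndex {\n"
        f"  createIndex(lists: {int(lists)}, measure: {measure}) {{\n"
        "    status\n"
        "    error\n"
        "  }\n"
        "}"
    )


# Example: a table with half a million vectors.
lists = suggested_lists(500_000)
print(create_index_mutation(lists))
```

Treat the result as a first guess and tune it against your own recall and speed measurements.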