hicder/muopdb

MuopDB - A vector database for machine learning

Introduction

MuopDB is a vector database for machine learning. Currently, it supports:

  • Index types: HNSW, IVF, SPANN, all stored on disk and accessed with mmap
  • Quantization: product quantization
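
To illustrate what product quantization does, here is a minimal sketch, not MuopDB's actual code. It assumes the codebooks have already been trained (for example with k-means) and that the vector length equals the subvector dimension times the number of codebooks. The `ProductQuantizer` type and its method names are hypothetical.

```rust
/// Squared Euclidean distance between two equal-length slices.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Hypothetical product quantizer (illustration only, not MuopDB's API).
/// `codebooks[s][c]` is centroid `c` of subspace `s`; at most 256 centroids
/// per subspace so that each code fits in one byte.
struct ProductQuantizer {
    subvector_dim: usize,
    codebooks: Vec<Vec<Vec<f32>>>,
}

impl ProductQuantizer {
    /// Encode a vector as one centroid id per subspace.
    fn encode(&self, v: &[f32]) -> Vec<u8> {
        v.chunks(self.subvector_dim)
            .zip(&self.codebooks)
            .map(|(sub, book)| {
                let mut best = 0usize;
                let mut best_dist = f32::INFINITY;
                for (i, centroid) in book.iter().enumerate() {
                    let d = squared_l2(sub, centroid);
                    if d < best_dist {
                        best = i;
                        best_dist = d;
                    }
                }
                best as u8
            })
            .collect()
    }

    /// Asymmetric distance: the query stays uncompressed and is compared
    /// against the centroids that the code points to.
    fn asymmetric_distance(&self, query: &[f32], code: &[u8]) -> f32 {
        query
            .chunks(self.subvector_dim)
            .zip(&self.codebooks)
            .zip(code)
            .map(|((sub, book), &c)| squared_l2(sub, &book[c as usize]))
            .sum()
    }
}
```

With two subspaces of dimension 2, a 4-dimensional `f32` vector (16 bytes) is stored as 2 bytes. The price is that distances are computed against centroids rather than the original vectors, so they are approximate.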

Here is the roadmap for MuopDB:

Phase 0 (Done)

  • Query path
    • Vector similarity search
    • Hierarchical Navigable Small Worlds (HNSW)
    • Product Quantization (PQ)
  • Indexing path
    • Support periodic offline indexing
  • Database Management
    • Doc-sharding & query fan-out with aggregator-leaf architecture
    • In-memory & disk-based storage with mmap

Phase 1 (Done)

  • Query & Indexing
    • Inverted File (IVF)
    • Improve locality for HNSW
    • SPANN
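
For readers new to IVF, the idea can be sketched as follows: vectors are grouped under their nearest centroid, and a query scans only the few closest groups. This is an in-memory illustration under stated assumptions, not MuopDB's implementation. The centroids are assumed to be given and non-empty, and `IvfIndex`, `nearest_centroids`, and `nprobe` are hypothetical names.

```rust
/// Squared Euclidean distance between two equal-length slices.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Indices of the `n` centroids closest to `v`, nearest first.
fn nearest_centroids(centroids: &[Vec<f32>], v: &[f32], n: usize) -> Vec<usize> {
    let mut order: Vec<usize> = (0..centroids.len()).collect();
    order.sort_by(|&a, &b| {
        squared_l2(v, &centroids[a]).total_cmp(&squared_l2(v, &centroids[b]))
    });
    order.truncate(n);
    order
}

/// Hypothetical in-memory IVF index (illustration only).
struct IvfIndex {
    centroids: Vec<Vec<f32>>,
    posting_lists: Vec<Vec<usize>>, // posting_lists[c] = ids assigned to centroid c
    vectors: Vec<Vec<f32>>,
}

impl IvfIndex {
    /// Assign every vector to its nearest centroid.
    /// Assumes `centroids` is non-empty (e.g. produced by k-means beforehand).
    fn build(centroids: Vec<Vec<f32>>, vectors: Vec<Vec<f32>>) -> Self {
        let mut posting_lists = vec![Vec::new(); centroids.len()];
        for (id, v) in vectors.iter().enumerate() {
            let nearest = nearest_centroids(&centroids, v, 1)[0];
            posting_lists[nearest].push(id);
        }
        IvfIndex { centroids, posting_lists, vectors }
    }

    /// Scan only the `nprobe` closest posting lists and return the ids of
    /// the `k` nearest vectors found there, nearest first.
    fn search(&self, query: &[f32], nprobe: usize, k: usize) -> Vec<usize> {
        let mut candidates: Vec<(f32, usize)> =
            nearest_centroids(&self.centroids, query, nprobe)
                .into_iter()
                .flat_map(|c| self.posting_lists[c].iter())
                .map(|&id| (squared_l2(query, &self.vectors[id]), id))
                .collect();
        candidates.sort_by(|a, b| a.0.total_cmp(&b.0));
        candidates.into_iter().take(k).map(|(_, id)| id).collect()
    }
}
```

Raising `nprobe` scans more posting lists, which improves recall at the cost of more distance computations.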

Phase 2 (Ongoing)

  • Query
    • Multiple index segments
    • L2 distance
  • Index
    • Optimizing index build time
    • Elias-Fano encoding for IVF
    • RabitQ quantization
  • Misc
    • Configs and documentation
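
As background for the Elias-Fano item: IVF posting lists are sorted id lists, and Elias-Fano stores such a list by splitting each value into low bits, kept verbatim, and high bits, kept in unary. The sketch below only shows the scheme and is not the planned MuopDB code. It uses unpacked `Vec<u64>` and `Vec<bool>` storage for readability, where a real encoder would pack bits, and the `EliasFano` type is hypothetical.

```rust
/// Hypothetical, unpacked Elias-Fano encoding of a sorted id list
/// (illustration only; real encoders pack both parts into bit vectors).
struct EliasFano {
    low_bits: u32,
    lows: Vec<u64>,   // low `low_bits` bits of each value
    highs: Vec<bool>, // unary part: bit (high + i) is set for the i-th value
}

impl EliasFano {
    /// `values` must be sorted ascending and every value must be < `universe`.
    fn encode(values: &[u64], universe: u64) -> Self {
        let n = values.len().max(1) as u64;
        // low_bits = floor(log2(universe / n)), or 0 when universe <= n.
        let ratio = universe / n;
        let low_bits = if ratio > 1 { 63 - ratio.leading_zeros() } else { 0 };
        let mask = (1u64 << low_bits) - 1;
        // Largest set position is at most (universe >> low_bits) + n - 1.
        let high_len = (universe >> low_bits) as usize + values.len() + 1;
        let mut highs = vec![false; high_len];
        let mut lows = Vec::with_capacity(values.len());
        for (i, &v) in values.iter().enumerate() {
            lows.push(v & mask);
            highs[(v >> low_bits) as usize + i] = true;
        }
        EliasFano { low_bits, lows, highs }
    }

    /// Rebuild the original list: the i-th set bit sits at position
    /// high + i, so high = position - i.
    fn decode(&self) -> Vec<u64> {
        self.highs
            .iter()
            .enumerate()
            .filter(|(_, bit)| **bit)
            .enumerate()
            .map(|(i, (pos, _))| (((pos - i) as u64) << self.low_bits) | self.lows[i])
            .collect()
    }
}
```

Once packed, this layout takes roughly `2 + log2(universe / n)` bits per id, which is why it suits long sorted posting lists.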

Why MuopDB?

This is an educational project for me to learn Rust and vector databases.

Building

Install prerequisites:

# macOS
brew install hdf5 protobuf

export HDF5_DIR="$(brew --prefix hdf5)"

Build:

# from top-level workspace
cargo build --release

Test:

cargo test --release

Contributions

This project is done with TechCare Coaching. I mentor mentees who contribute to this project.
