SimplerVectors is a straightforward, beginner-friendly vector database project for storing and querying large collections of high-dimensional vectors. It is especially suited to retrieval-augmented generation (RAG) projects built around language models. SimplerVectors is still in its early stages; the goal is high performance and scalability on very large datasets.
- Efficient Storage: Leverages NumPy's memory-mapped files to manage large datasets efficiently.
- Simple Design: The system is designed to be easy to understand and use, even for developers new to vector databases.
- Vector Normalization and Search: Supports automatic vector normalization and cosine similarity search to find the most similar vectors.
- Scalability in Development: Ongoing development focuses on handling growing data volumes and increasing query loads.
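
To illustrate the storage idea behind these features, here is a minimal sketch of a memory-mapped vector file built with `np.memmap`. It is not SimplerVectors' own code: the file name, array sizes, and variable names are assumptions chosen for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical file location and sizes, chosen only for illustration.
path = os.path.join(tempfile.mkdtemp(), "vectors.dat")
n_vectors, dim = 1000, 64

# Create a disk-backed array; the OS pages data in and out as it is accessed.
store = np.memmap(path, dtype=np.float32, mode="w+", shape=(n_vectors, dim))

rng = np.random.default_rng(0)
vectors = rng.standard_normal((n_vectors, dim)).astype(np.float32)

# Normalize each row to unit length so a dot product equals cosine similarity.
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

store[:] = vectors
store.flush()  # write pending changes to disk

# Reopen read-only. The raw file has no header, so dtype and shape
# must be supplied again by the caller.
reopened = np.memmap(path, dtype=np.float32, mode="r", shape=(n_vectors, dim))
```

Because the raw file stores neither shape nor dtype, a real implementation would need to record them somewhere alongside the data.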
Add vectors to the database with optional metadata:
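
The sketch below shows the general idea in plain NumPy rather than through the SimplerVectors API. `ToyVectorStore` and its `add` method are hypothetical names introduced for illustration, and the store lives in memory only.

```python
import numpy as np


class ToyVectorStore:
    """Hypothetical in-memory store; not the SimplerVectors API."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.metadata = []  # metadata[i] belongs to vectors[i]

    def add(self, vector, metadata=None):
        vector = np.asarray(vector, dtype=np.float32)
        if vector.shape != (self.dim,):
            raise ValueError(f"expected shape ({self.dim},), got {vector.shape}")
        norm = np.linalg.norm(vector)
        if norm == 0:
            raise ValueError("cannot normalize a zero vector")
        # Store the unit-length version so later searches can use a dot product.
        self.vectors = np.vstack([self.vectors, vector / norm])
        self.metadata.append(metadata)
        return len(self.metadata) - 1  # row index of the new vector


store = ToyVectorStore(dim=3)
first = store.add([3.0, 0.0, 4.0], metadata={"doc": "intro.txt"})
second = store.add([0.0, 2.0, 0.0])  # metadata is optional
```

Keeping metadata in a list parallel to the vector rows means a search result's row index is enough to look up its metadata.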
Perform a cosine similarity search to find the top 5 vectors similar to a given query vector:
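
As a rough sketch of what such a search computes, the following plain-NumPy function ranks rows by cosine similarity. The helper name `top_k_cosine` and the random test data are assumptions for illustration, not part of SimplerVectors.

```python
import numpy as np


def top_k_cosine(matrix, query, k=5):
    """Return (indices, scores) of the k rows most similar to query."""
    matrix = np.asarray(matrix, dtype=np.float32)
    query = np.asarray(query, dtype=np.float32)
    # Normalize both sides so the dot product is the cosine similarity.
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    scores = matrix @ query
    k = min(k, len(scores))
    # argsort is ascending, so reverse it and keep the first k entries.
    best = np.argsort(scores)[::-1][:k]
    return best, scores[best]


rng = np.random.default_rng(42)
data = rng.standard_normal((100, 8))
query = data[17] * 2.5  # same direction as row 17, different length

indices, scores = top_k_cosine(data, query, k=5)
```

Since cosine similarity ignores vector length, row 17 ranks first here even though the query is a scaled copy of it. A full sort is fine at this size; for very large collections, `np.argpartition` avoids sorting every score.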
Contributions are welcome! This project is still in its early stages, and your input can help shape its future. Please fork the repository, create your feature branch, and submit pull requests.
This project is licensed under the MIT License - see the LICENSE.md file for details.
Stay tuned... something big is coming soon!