RAGLite is a Python package for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite.
- Only lightweight and permissive open source dependencies (e.g., no PyTorch, LangChain, or PyMuPDF)
- Choose any LLM provider with LiteLLM, including local llama-cpp-python models
- Either PostgreSQL or SQLite as a keyword & vector search database
- Acceleration with Metal on macOS, and CUDA on Linux and Windows
- PDF to Markdown conversion on top of pdftext and pypdfium2
- Multi-vector chunk embedding with late chunking and contextual chunk headings
- Optimal level 4 semantic chunking by solving a binary integer programming problem
- Optimal closed-form linear query adapter by solving an orthogonal Procrustes problem
- Hybrid search that combines the database's built-in keyword search (tsvector in PostgreSQL, FTS5 in SQLite) with its native vector search extension (pgvector in PostgreSQL, sqlite-vec in SQLite)
- Optional: conversion of any input document to Markdown with Pandoc
- Optional: evaluation of retrieval and generation performance with Ragas
To install this package (including Metal acceleration if on macOS), run:
pip install raglite
To add CUDA 12.4 support, use the cuda124 extra:
pip install raglite[cuda124]
To add support for filetypes other than PDF, use the pandoc extra:
pip install raglite[pandoc]
To add support for evaluation, use the ragas extra:
pip install raglite[ragas]
- Configuring RAGLite
- Inserting documents
- Searching and Retrieval-Augmented Generation (RAG)
- Computing and using an optimal query adapter
- Evaluation of retrieval and generation
Tip
RAGLite extends LiteLLM with support for llama.cpp models using llama-cpp-python. To select a llama.cpp model (e.g., from bartowski's collection), use a model identifier of the form "llama-cpp-python/<hugging_face_repo_id>/<filename>@<n_ctx>", where n_ctx is an optional parameter that specifies the context size of the model.
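To make the identifier format concrete, here is a minimal sketch that splits such a string into its parts. The names LlamaCppModelId and parse_model_id are hypothetical helpers written for illustration and are not part of RAGLite or LiteLLM; how RAGLite itself parses the identifier is not shown here.

```python
from dataclasses import dataclass
from typing import Optional

PREFIX = "llama-cpp-python/"


@dataclass(frozen=True)
class LlamaCppModelId:
    """Parts of a "llama-cpp-python/<repo_id>/<filename>@<n_ctx>" identifier."""

    repo_id: str
    filename: str
    n_ctx: Optional[int] = None


def parse_model_id(model: str) -> LlamaCppModelId:
    """Hypothetical helper: split the identifier into repo id, filename, and n_ctx."""
    if not model.startswith(PREFIX):
        raise ValueError(f"not a llama-cpp-python identifier: {model!r}")
    rest = model[len(PREFIX):]
    # The "@<n_ctx>" suffix is optional.
    path, sep, n_ctx_str = rest.rpartition("@")
    if sep:
        n_ctx = int(n_ctx_str)
    else:
        path, n_ctx = rest, None
    # The filename (possibly a glob pattern) is the last path segment;
    # everything before it is the Hugging Face repository id.
    repo_id, _, filename = path.rpartition("/")
    if not repo_id or not filename:
        raise ValueError(f"expected <repo_id>/<filename> in {model!r}")
    return LlamaCppModelId(repo_id=repo_id, filename=filename, n_ctx=n_ctx)
```

For example, parse_model_id("llama-cpp-python/lm-kit/bge-m3-gguf/*F16.gguf") yields the repo id lm-kit/bge-m3-gguf, the filename pattern *F16.gguf, and no explicit context size.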
Tip
You can create a PostgreSQL database for free in a few clicks at neon.tech (not sponsored).
First, configure RAGLite with your preferred PostgreSQL or SQLite database and any LLM supported by LiteLLM:
from raglite import RAGLiteConfig
# Example 'remote' config with a PostgreSQL database and an OpenAI LLM:
my_config = RAGLiteConfig(
    db_url="postgresql://my_username:my_password@my_host:5432/my_database",
    llm="gpt-4o-mini",  # Or any LLM supported by LiteLLM.
    embedder="text-embedding-3-large",  # Or any embedder supported by LiteLLM.
)
# Example 'local' config with a SQLite database and a llama.cpp LLM:
my_config = RAGLiteConfig(
    db_url="sqlite:///raglite.sqlite",
    llm="llama-cpp-python/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/*Q4_K_M.gguf@8192",
    embedder="llama-cpp-python/lm-kit/bge-m3-gguf/*F16.gguf",
)
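If your PostgreSQL username or password contains characters such as @ or /, they generally need to be percent-encoded before being placed in a URL. The following is a minimal sketch, assuming db_url is parsed as a standard URL; postgres_url is a hypothetical helper and not part of RAGLite.

```python
from urllib.parse import quote


def postgres_url(user: str, password: str, host: str, database: str, port: int = 5432) -> str:
    """Hypothetical helper: build a PostgreSQL URL with percent-encoded credentials."""
    # safe="" makes quote() also encode "/" so that it cannot be mistaken for a path separator.
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return f"postgresql://{user_enc}:{password_enc}@{host}:{port}/{database}"
```

The result can then be passed as the db_url argument shown in the configuration examples.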
Tip
To insert documents other than PDF, install the pandoc extra with pip install raglite[pandoc].
Next, insert some documents into the database. RAGLite will take care of the conversion to Markdown, optimal level 4 semantic chunking, and multi-vector embedding with late chunking:
# Insert documents:
from pathlib import Path
from raglite import insert_document
insert_document(Path("On the Measure of Intelligence.pdf"), config=my_config)
insert_document(Path("Special Relativity.pdf"), config=my_config)
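To insert a whole folder of PDFs rather than individual files, a small loop is enough. The sketch below is a hypothetical helper, not part of RAGLite: it takes the insert function as a parameter so that it does not assume anything about insert_document beyond the call shown in this section.

```python
from pathlib import Path
from typing import Callable, List


def insert_all_pdfs(directory: Path, insert: Callable[[Path], None]) -> List[Path]:
    """Hypothetical helper: call `insert` on every PDF found under `directory`."""
    # rglob walks subdirectories too; sorting makes the insertion order deterministic.
    pdfs = sorted(p for p in directory.rglob("*.pdf") if p.is_file())
    for pdf in pdfs:
        insert(pdf)
    return pdfs
```

With the configuration from this section, it could be called as insert_all_pdfs(Path("papers"), lambda p: insert_document(p, config=my_config)), where the papers directory is an assumed example.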
Now, you can search for chunks with keyword search, vector search, or a hybrid of the two. You can also answer questions with RAG and the search method of your choice (hybrid is the default):
# Search for chunks:
from raglite import hybrid_search, keyword_search, vector_search
prompt = "How is intelligence measured?"
results_vector = vector_search(prompt, num_results=5, config=my_config)
results_keyword = keyword_search(prompt, num_results=5, config=my_config)
results_hybrid = hybrid_search(prompt, num_results=5, config=my_config)
# Answer questions with RAG:
from raglite import rag
prompt = "What does it mean for two events to be simultaneous?"
stream = rag(prompt, search=hybrid_search, config=my_config)
for update in stream:
    print(update, end="")
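To give an intuition for how a keyword ranking and a vector ranking can be merged into one hybrid ranking, here is a sketch of reciprocal rank fusion, a common fusion technique. This is an illustration only: this document does not state which fusion method RAGLite's hybrid_search uses, and the function name and the constant k=60 are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[Hashable]], k: int = 60) -> List[Hashable]:
    """Illustrative sketch: fuse several best-first rankings into one."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            # Items near the top of any ranking contribute the most to the score.
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order because sorted() is stable.
    return sorted(scores, key=lambda item: -scores[item])
```

An item that appears in both input rankings accumulates two contributions, so it tends to outrank an item that only one search method found.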
RAGLite can compute and apply an optimal closed-form query adapter to the prompt embedding to improve the output quality of RAG. To benefit from this, first generate a set of evals with insert_evals and then compute and store the optimal query adapter with update_query_adapter:
# Improve RAG with an optimal query adapter:
from raglite import insert_evals, update_query_adapter
insert_evals(num_evals=100, config=my_config)
update_query_adapter(config=my_config)
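For readers curious about why the adapter has a closed form, the textbook solution to the orthogonal Procrustes problem can be sketched as follows. The symbols Q, D, and A are introduced here for illustration, and the setup is an assumption: RAGLite's exact objective may differ.

```latex
% Assumed setup: the columns of Q are query embeddings and the columns of D are
% the embeddings of their matching chunks, with Q and D both in R^{d x n}.
\begin{aligned}
A^\star &= \operatorname*{arg\,min}_{A^\top A = I} \lVert A Q - D \rVert_F^2
         = \operatorname*{arg\,max}_{A^\top A = I} \operatorname{tr}\!\left(A \, Q D^\top\right), \\
D Q^\top &= U \Sigma V^\top \quad \text{(singular value decomposition)}, \\
A^\star &= U V^\top .
\end{aligned}
```

The first line follows because the squared norms of AQ and D do not depend on an orthogonal A, so only the cross term remains. A single SVD then gives the adapter, which is why no iterative training is needed.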
If you installed the ragas extra, you can use RAGLite to answer the evals and then evaluate the quality of both the retrieval and generation steps of RAG using Ragas:
# Evaluate retrieval and generation:
from raglite import answer_evals, evaluate, insert_evals
insert_evals(num_evals=100, config=my_config)
answered_evals_df = answer_evals(num_evals=10, config=my_config)
evaluation_df = evaluate(answered_evals_df, config=my_config)
Prerequisites
1. Set up Git to use SSH
- Generate an SSH key and add the SSH key to your GitHub account.
- Configure SSH to automatically load your SSH keys:
cat << EOF >> ~/.ssh/config
Host *
  AddKeysToAgent yes
  IgnoreUnknown UseKeychain
  UseKeychain yes
  ForwardAgent yes
EOF
2. Install Docker
- Install Docker Desktop.
- Linux only:
- Export your user's user id and group id so that files created in the Dev Container are owned by your user:
cat << EOF >> ~/.bashrc
export UID=$(id --user)
export GID=$(id --group)
EOF
3. Install VS Code or PyCharm
- Install VS Code and VS Code's Dev Containers extension. Alternatively, install PyCharm.
- Optional: install a Nerd Font such as FiraCode Nerd Font and configure VS Code or configure PyCharm to use it.
Development environments
The following development environments are supported:
- GitHub Codespaces: click on Code and select Create codespace to start a Dev Container with GitHub Codespaces.
- Dev Container (with container volume): click on Open in Dev Containers to clone this repository in a container volume and create a Dev Container with VS Code.
- Dev Container: clone this repository, open it with VS Code, and run Ctrl/⌘ + ⇧ + P → Dev Containers: Reopen in Container.
- PyCharm: clone this repository, open it with PyCharm, and configure Docker Compose as a remote interpreter with the dev service.
- Terminal: clone this repository, open it with your terminal, and run docker compose up --detach dev to start a Dev Container in the background, and then run docker compose exec dev zsh to open a shell prompt in the Dev Container.
Developing
- This project follows the Conventional Commits standard to automate Semantic Versioning and Keep A Changelog with Commitizen.
- Run poe from within the development environment to print a list of Poe the Poet tasks available to run on this project.
- Run poetry add {package} from within the development environment to install a runtime dependency and add it to pyproject.toml and poetry.lock. Add --group test or --group dev to install a CI or development dependency, respectively.
- Run poetry update from within the development environment to upgrade all dependencies to the latest versions allowed by pyproject.toml.
- Run cz bump to bump the package's version, update the CHANGELOG.md, and create a git tag.
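As a rough illustration of what a Conventional Commits header looks like (type, optional scope, optional breaking-change marker, description), here is a simplified checker. It is a sketch under stated assumptions: the regular expression covers only the header line with lowercase types, and it is not the check that Commitizen performs. The name parse_commit_header is hypothetical.

```python
import re
from typing import Dict, Optional

# Simplified header pattern: <type>[(<scope>)][!]: <description>
HEADER = re.compile(
    r"^(?P<type>[a-z]+)"             # e.g. feat, fix, docs
    r"(?:\((?P<scope>[^()\s]+)\))?"  # optional scope in parentheses
    r"(?P<breaking>!)?"              # optional breaking-change marker
    r": (?P<description>\S.*)$"      # colon, space, non-empty description
)


def parse_commit_header(header: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the header's parts, or None if it does not match the simplified pattern."""
    match = HEADER.match(header)
    return match.groupdict() if match else None
```

For instance, "feat(search): add hybrid search" parses into the type feat and the scope search, while a header such as "Update readme" is rejected.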