Advanced RAG system with dynamic query classification, supporting Llama-3.2 models through Ollama integration.
SPBRAG enhances RAG pipelines using a hybrid approach (see the sketch after the feature list below):
- BERT-based Query Classification - Determines whether a query requires retrieved context
- Llama-3.2 LLM - Generates context-aware responses
- Evaluation Framework - Measures precision/recall of retrieval components

Key features:

- Multi-Model Support 🤖 - `llama3.2:1B` (fast) and `llama3.2:3B` (high-quality) variants
- Automatic Model Handling ⚙️ - One-line model downloads via Ollama
- Secure Configuration 🔐 - Environment-based API key management
- Flexible Training 🏋️ - Custom BERT fine-tuning capabilities
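
The sketch below shows how these components are intended to fit together. It is a hypothetical illustration: the `retriever` and `llm` interfaces, the classifier label string, and the prompt format are assumptions, not the repo's actual API.

```python
# Hypothetical sketch of the SPBRAG control flow; interfaces and the
# label string are illustrative, not the repo's actual API.
from transformers import pipeline

# Fine-tuned BERT classifier decides whether a query needs context
classifier = pipeline(
    "text-classification",
    model="models/bert-text-classification-model",  # path used later in this README
)

def answer(query: str, retriever, llm) -> str:
    # Step 1: classify the query (label name is an assumption)
    needs_context = classifier(query)[0]["label"] == "requires_context"
    if needs_context:
        # Step 2: retrieve top-k chunks and prepend them to the prompt
        chunks = retriever.search(query, top_k=30)
        prompt = "Context:\n" + "\n".join(chunks) + f"\n\nQuestion: {query}"
    else:
        prompt = query
    # Step 3: generate with Llama-3.2 (via Ollama) or Mistral
    return llm.generate(prompt)
```

Skipping retrieval for queries that do not need it is the main latency win of the classification step.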
For detailed explanations of our evaluation metrics and interpretation guidelines, see: Metrics Documentation
Key tracked metrics include:
- F1 and Exact Match (EM) scores
- RAG retrieval evaluation metrics (precision/recall)
- BERT classification accuracy
- Latency benchmarks
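
For intuition, EM and token-level F1 can be computed SQuAD-style, as in the sketch below (illustrative; not necessarily the exact code behind the evaluator):

```python
# SQuAD-style Exact Match and token-level F1; illustrative, not
# necessarily the evaluator's exact implementation.
from collections import Counter

def exact_match(prediction: str, reference: str) -> int:
    return int(prediction.strip().lower() == reference.strip().lower())

def token_f1(prediction: str, reference: str) -> float:
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```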
For a record of development progress and design notes, see: Workflow
"${SHELL}" <(curl -L micro.mamba.pm/install.sh)
curl -L micro.mamba.pm/install.sh | sh
Clone the repository and create the environment:

```bash
git clone https://github.com/Dnau15/SPBRAG.git
cd SPBRAG
micromamba create -n spbrag python=3.11 -y
micromamba activate spbrag
```
Make the setup script executable, then run it:

```bash
chmod +x ./setup.sh
./setup.sh
```
Install Ollama and pull a model:

```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.2:1b   # Fast version (1B params)
ollama pull llama3.2:3b   # High-quality version (3B params)
```
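
Once a model is pulled, you can smoke-test it from Python with the `ollama` client package (`pip install ollama`); this is just a connectivity check, separate from the SPBRAG pipeline:

```python
# Quick smoke test of a pulled model via the ollama Python client.
import ollama

response = ollama.chat(
    model="llama3.2:1b",
    messages=[{"role": "user", "content": "In one sentence, what is RAG?"}],
)
print(response["message"]["content"])
```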
- Create a `.env` file:

```bash
touch .env
```
- Add your Hugging Face credentials to it:

```
HUGGINGFACE_API_KEY=your_hf_api_key_here
```
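
To confirm the key is visible to Python, a minimal check (assuming the project reads `.env` files with `python-dotenv`, which is a guess) looks like:

```python
# Minimal check that the key loads; assumes python-dotenv
# (pip install python-dotenv) is used to read .env files.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory
print("HF key loaded:", bool(os.getenv("HUGGINGFACE_API_KEY")))
```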
#### Linux/macOS (bash/zsh)

```bash
export PYTHONPATH="$(pwd)/src:$PYTHONPATH"
```

#### Fish

```fish
set -x PYTHONPATH (pwd)/src $PYTHONPATH
```

#### Windows (PowerShell)

```powershell
$env:PYTHONPATH = "$(pwd)/src;$env:PYTHONPATH"
```
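
To verify the path is set correctly, the package under `src/` should now be importable (the package name `rag_system` comes from the script paths used below):

```python
# Verify that src/ is on PYTHONPATH by locating the rag_system package.
import importlib.util

spec = importlib.util.find_spec("rag_system")
print("rag_system found:", spec is not None)
```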
Create the output directory for the fine-tuned model and generate the training data:

```bash
mkdir -p models/bert-text-classification-model
python src/rag_system/data/data_creation.py
```
Fine-tune BERT (pass your own dataset path via `--file_path`, or omit it to use the default):

```bash
python src/rag_system/training/fine_tune_bert.py \
    --file_path=<your_path> \
    --num_samples_per_class=1500 \
    --num_epochs=5 \
    --learning_rate=2e-5
```
Expected data format:

| Column | Type | Description |
|---|---|---|
| `query` | text | User input |
| `requires_context` | bool | Context flag |
| `reference_text` | text | Ground truth |
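
As a concrete illustration of this schema, the snippet below builds a tiny dataset with hypothetical rows (the real data presumably comes from `data_creation.py`, and the output path is illustrative):

```python
# Hypothetical rows in the expected schema; the real dataset is
# generated by src/rag_system/data/data_creation.py.
import pandas as pd

df = pd.DataFrame([
    {"query": "What year was the treaty signed?",
     "requires_context": True,
     "reference_text": "The treaty was signed in 1648."},
    {"query": "Write a haiku about autumn.",
     "requires_context": False,
     "reference_text": ""},
])
df.to_csv("data/train.csv", index=False)  # illustrative path
```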
Run the evaluator (`--model_type` accepts `ollama` or `mistral`):

```bash
python src/rag_system/evaluation/evaluator.py \
    --collection_name=TestCollection5 \
    --model_type=ollama
```
Full evaluator configuration:

```bash
# Model Configuration
--bert_path="google-bert/bert-base-uncased"        # Path to BERT model
--tokenizer_path="google-bert/bert-base-uncased"   # Custom tokenizer
--embedding_model_path="sentence-transformers/all-mpnet-base-v2"  # Embedding model

# Vector Database Settings
--milvus_uri="./data/milvus_demo.db"   # Local Milvus instance path
--collection_name="rag_eval"           # Collection name for stored embeddings

# LLM Configuration
--llm_repo_id="mistralai/Mistral-7B-Instruct-v0.2"  # Alternative LLM
--llm_max_new_tokens=100               # Maximum response length
--model_type="mistral"                 # [ollama|mistral] LLM variant

# Retrieval Parameters
--top_k=30                             # Number of context chunks to retrieve
--context_len=1000                     # Context window size (in tokens)

# Evaluation Settings
--num_test_samples=5                   # Number of test cases to evaluate
--use_classifier=True                  # Enable/disable query classification
```