Name		Name	Last commit message	Last commit date
parent directory ..
README.md		README.md
rag_demo.ipynb		rag_demo.ipynb
requirements.txt		requirements.txt

README.md

RAG demo with all execution steps delegated to the OpenVINO Model Server {#ovms_demos_continuous_batching_rag}

Creating models repository for all the endpoints

curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2025/0/demos/common/export_models/export_model.py -o export_model.py
pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2025/0/demos/common/export_models/requirements.txt

mkdir -p models
python export_model.py text_generation --source_model meta-llama/Meta-Llama-3-8B-Instruct --weight-format int8 --kv_cache_precision u8 --config_file_path models/config_all.json --model_repository_path models 
python export_model.py embeddings --source_model Alibaba-NLP/gte-large-en-v1.5 --weight-format int8 --config_file_path models/config_all.json
python export_model.py rerank --source_model BAAI/bge-reranker-large --weight-format int8  --config_file_path models/config_all.json

Deploying the model server

With Docker

docker run -d --rm -p 8000:8000 -v $(pwd)/models:/workspace:ro openvino/model_server:latest --rest_port 8000 --config_path /workspace/config_all.json

On Baremetal

ovms --rest_port 8000 --config_path ./models/config_all.json

Using RAG

When the model server is deployed and serving all 3 endpoints, run the jupyter notebook to use RAG chain with a fully remote execution.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rag

rag

README.md

RAG demo with all execution steps delegated to the OpenVINO Model Server {#ovms_demos_continuous_batching_rag}

Creating models repository for all the endpoints

Deploying the model server

With Docker

On Baremetal

Using RAG

Files

rag

Directory actions

More options

Directory actions

More options

Latest commit

History

rag

Folders and files

parent directory

README.md

RAG demo with all execution steps delegated to the OpenVINO Model Server {#ovms_demos_continuous_batching_rag}

Creating models repository for all the endpoints

Deploying the model server

With Docker

On Baremetal

Using RAG