This repo contains instructions to deploy a full RAG application on OpenShift and OpenShift AI. It contains Jupyter Notebooks to ingest data into a vector database (Milvus) and a Streamlit application to interact with your own knowledge base and popular LLMs (e.g. Llama 3, Mistral 7B, or Granite 7B). It leverages RAG and gives you many configuration options to tune how retrieval behaves and how the model parameters are set. It supports text input. Check out this Git repo to learn more about it and how to ingest your own knowledge base, supporting PDFs, Word documents, PPTX files, or your Confluence wiki. Check out the details [here](https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/).
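As a rough illustration of what the ingestion notebooks do, here is a minimal LangChain sketch (not the repo's exact code); the file name, collection name, embedding model, and Milvus service host below are placeholder assumptions:

```python
# A minimal ingestion sketch, assuming LangChain 0.1-style packages
# (langchain, langchain-community, pypdf, sentence-transformers, pymilvus).
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Milvus

# Load a PDF and split it into overlapping chunks for retrieval.
docs = PyPDFLoader("my-knowledge.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=64).split_documents(docs)

# Embed the chunks and store them in a Milvus collection.
Milvus.from_documents(
    chunks,
    embedding=HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2"),
    collection_name="my_knowledge",  # hypothetical collection name
    connection_args={"host": "vectordb-milvus", "port": "19530"},  # assumed service name
)
```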
The following describes how to set it up on Kubernetes / OpenShift. There is also a guide on how to deploy it locally with Podman, although that requires some customization of the mount paths and of your CDI (Container Device Interface, used for NVIDIA GPUs) configuration.
1. Create a new project and the Object Bucket Claim:

```bash
oc new-project <yourname>-chatbot
oc apply -f milvus/bucket-claim.yaml
```
2. Add the Milvus Helm repo (typically `helm repo add milvus https://zilliztech.github.io/milvus-helm/`) and update milvus/openshift-values.yaml with your Object Bucket credentials, or enable MinIO instead (the chart then autogenerates credentials for you, but also spins up MinIO). Refer to the LLM-on-OpenShift repo for further instructions.
OPTIONAL - You may want to skip generating your own Milvus manifest and just use my preconfigured milvus/milvus_manifest_standalone.yaml
```bash
helm template -f openshift-values.yaml vectordb --set cluster.enabled=false --set etcd.replicaCount=1 --set pulsar.enabled=false milvus/milvus > milvus_manifest_standalone.yaml
```
yq '(select(.kind == "StatefulSet" and .metadata.name == "vectordb-etcd") | .spec.template.spec.securityContext) = {}' -i milvus_manifest_standalone.yaml
yq '(select(.kind == "StatefulSet" and .metadata.name == "vectordb-etcd") | .spec.template.spec.containers[0].securityContext) = {"capabilities": {"drop": ["ALL"]}, "runAsNonRoot": true, "allowPrivilegeEscalation": false}' -i milvus_manifest_standalone.yaml
yq '(select(.kind == "Deployment" and .metadata.name == "vectordb-minio") | .spec.template.spec.securityContext) = {"capabilities": {"drop": ["ALL"]}, "runAsNonRoot": true, "allowPrivilegeEscalation": false}' -i milvus_manifest_standalone.yaml
```bash
oc apply -f milvus/milvus_manifest_standalone.yaml
```
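Once the pods are up, a quick way to verify the deployment is a pymilvus connectivity check; the service name below assumes the Helm release name `vectordb` used above, accessed via a local port-forward:

```python
# A hedged connectivity check, assuming a port-forward is running, e.g.:
#   oc port-forward svc/vectordb-milvus 19530:19530
# and that pymilvus is installed (pip install pymilvus).
from pymilvus import connections, utility

connections.connect(alias="default", host="localhost", port="19530")
print("Milvus version:", utility.get_server_version())
print("Collections:", utility.list_collections())
```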
```bash
oc apply -f ollama/
oc patch namespace <yourname>-chatbot -p '{"metadata":{"labels":{"opendatahub.io/dashboard":"true"}}}' --type=merge
```
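Once the Ollama pod is running, you can smoke-test it from any pod in the cluster. This is a hedged sketch: it assumes a Service named `ollama` on port 11434 and that the model has already been pulled (e.g. via `ollama pull mistral`); adjust the names to your manifests.

```python
# Minimal Ollama smoke test against its REST API (POST /api/generate).
import requests

resp = requests.post(
    "http://ollama:11434/api/generate",  # assumed in-cluster service name
    json={"model": "mistral", "prompt": "Reply with one short sentence.", "stream": False},
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```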
3. Clone the repo https://github.com/maxisses/openshift-rag-testbench
```bash
oc apply -f streamlit/k8s
oc create route edge --service=rag-frontend
```
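To confirm the frontend is reachable, you can probe the route's health endpoint. This assumes a recent Streamlit release, which serves its health probe at `/_stcore/health` (older releases used `/healthz`):

```python
# Optional smoke test of the edge route; expect the body "ok".
import requests

route = "https://<your-route-host>"  # take the host from: oc get route rag-frontend
print(requests.get(f"{route}/_stcore/health", timeout=10).text)
```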
e) Alternative: Deploy vLLM via a standard Deployment. Warning: a GPU is required, and it loads Mistral 7B by default, which requires approx. 20 GB of GPU memory if not quantized; a good alternative is the quantized model "TheBloke/Mistral-7B-Instruct-v0.2-AWQ".
```bash
oc apply -f vllm/vllm-native/
```
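vLLM exposes an OpenAI-compatible API, so a quick query looks like the sketch below. The service name `vllm`, port 8000, and model ID are assumptions; swap in "TheBloke/Mistral-7B-Instruct-v0.2-AWQ" if you deployed the quantized variant.

```python
# Hedged example against vLLM's OpenAI-compatible completions endpoint.
import requests

resp = requests.post(
    "http://vllm:8000/v1/completions",  # assumed in-cluster service name/port
    json={
        "model": "mistralai/Mistral-7B-Instruct-v0.2",  # assumed default model ID
        "prompt": "What is retrieval-augmented generation?",
        "max_tokens": 128,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```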