This repo contains instructions to deploy a full RAG application on OpenShift and OpenShift AI. It contains Jupyter Notebooks to ingest data into a vector database (Milvus) and a Streamlit application to interact with your own knowledge base and popular LLMs (e.g. Llama 3, Mistral 7B, or Granite 7B). It leverages RAG and gives you many configuration options to tune how retrieval behaves and how the model parameters are set. It supports text input. Check out this Git repo to learn more about it and how to ingest your own knowledge base, supporting PDFs, Word documents, PPTX files, or your Confluence wiki. Check out the details [here](https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/).
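As a rough illustration of what the ingestion notebooks do, here is a minimal LangChain sketch (not the repo's exact code); the file name, collection name, embedding model, and Milvus service host below are placeholder assumptions:

```python
# A minimal ingestion sketch, assuming LangChain 0.1-style packages
# (langchain, langchain-community, pypdf, sentence-transformers, pymilvus).
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Milvus

# Load a PDF and split it into overlapping chunks for retrieval.
docs = PyPDFLoader("my-knowledge.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=64).split_documents(docs)

# Embed the chunks and store them in a Milvus collection.
Milvus.from_documents(
    chunks,
    embedding=HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2"),
    collection_name="my_knowledge",  # hypothetical collection name
    connection_args={"host": "vectordb-milvus", "port": "19530"},  # assumed service name
)
```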
The following describes how to set it up on Kubernetes / OpenShift. There is also a guide on how to deploy it locally with Podman, although that requires some customization of the mount paths and of your CDI (Container Device Interface, used for NVIDIA GPUs) configuration.
1. Create a new project and the Object Bucket Claim:

```bash
oc new-project <yourname>-chatbot
oc apply -f milvus/bucket-claim.yaml
```
2. Add the Milvus Helm repo (typically `helm repo add milvus https://zilliztech.github.io/milvus-helm/`) and update milvus/openshift-values.yaml with your Object Bucket credentials, or enable MinIO instead (the chart then autogenerates credentials for you, but also spins up MinIO). Refer to the LLM-on-OpenShift repo for further instructions.
OPTIONAL - You may want to skip generating your own Milvus manifest and just use my preconfigured milvus/milvus_manifest_standalone.yaml
```bash
helm template -f openshift-values.yaml vectordb --set cluster.enabled=false --set etcd.replicaCount=1 --set pulsar.enabled=false milvus/milvus > milvus_manifest_standalone.yaml
```
yq '(select(.kind == "StatefulSet" and .metadata.name == "vectordb-etcd") | .spec.template.spec.securityContext) = {}' -i milvus_manifest_standalone.yaml
yq '(select(.kind == "StatefulSet" and .metadata.name == "vectordb-etcd") | .spec.template.spec.containers[0].securityContext) = {"capabilities": {"drop": ["ALL"]}, "runAsNonRoot": true, "allowPrivilegeEscalation": false}' -i milvus_manifest_standalone.yaml
yq '(select(.kind == "Deployment" and .metadata.name == "vectordb-minio") | .spec.template.spec.securityContext) = {"capabilities": {"drop": ["ALL"]}, "runAsNonRoot": true, "allowPrivilegeEscalation": false}' -i milvus_manifest_standalone.yaml
```bash
oc apply -f milvus/milvus_manifest_standalone.yaml
```
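Once the pods are up, a quick way to verify the deployment is a pymilvus connectivity check; the service name below assumes the Helm release name `vectordb` used above, accessed via a local port-forward:

```python
# A hedged connectivity check, assuming a port-forward is running, e.g.:
#   oc port-forward svc/vectordb-milvus 19530:19530
# and that pymilvus is installed (pip install pymilvus).
from pymilvus import connections, utility

connections.connect(alias="default", host="localhost", port="19530")
print("Milvus version:", utility.get_server_version())
print("Collections:", utility.list_collections())
```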
```bash
oc apply -f ollama/
oc patch namespace <yourname>-chatbot -p '{"metadata":{"labels":{"opendatahub.io/dashboard":"true"}}}' --type=merge
```
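Once the Ollama pod is running, you can smoke-test it from any pod in the cluster. This is a hedged sketch: it assumes a Service named `ollama` on port 11434 and that the model has already been pulled (e.g. via `ollama pull mistral`); adjust the names to your manifests.

```python
# Minimal Ollama smoke test against its REST API (POST /api/generate).
import requests

resp = requests.post(
    "http://ollama:11434/api/generate",  # assumed in-cluster service name
    json={"model": "mistral", "prompt": "Reply with one short sentence.", "stream": False},
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```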
3. Clone the repo https://github.com/maxisses/openshift-rag-testbench
```bash
oc apply -f streamlit/k8s
oc create route edge --service=rag-frontend
```
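To confirm the frontend is reachable, you can probe the route's health endpoint. This assumes a recent Streamlit release, which serves its health probe at `/_stcore/health` (older releases used `/healthz`):

```python
# Optional smoke test of the edge route; expect the body "ok".
import requests

route = "https://<your-route-host>"  # take the host from: oc get route rag-frontend
print(requests.get(f"{route}/_stcore/health", timeout=10).text)
```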
e) Alternative: Deploy vLLM via a standard Deployment. Warning: a GPU is required, and it loads Mistral 7B by default, which requires approx. 20 GB of GPU memory if not quantized; a good alternative is the quantized model "TheBloke/Mistral-7B-Instruct-v0.2-AWQ".
```bash
oc apply -f vllm/vllm-native/
```
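vLLM exposes an OpenAI-compatible API, so a quick query looks like the sketch below. The service name `vllm`, port 8000, and model ID are assumptions; swap in "TheBloke/Mistral-7B-Instruct-v0.2-AWQ" if you deployed the quantized variant.

```python
# Hedged example against vLLM's OpenAI-compatible completions endpoint.
import requests

resp = requests.post(
    "http://vllm:8000/v1/completions",  # assumed in-cluster service name/port
    json={
        "model": "mistralai/Mistral-7B-Instruct-v0.2",  # assumed default model ID
        "prompt": "What is retrieval-augmented generation?",
        "max_tokens": 128,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```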