The MAX platform unifies the leading AI development frameworks (TensorFlow, PyTorch, ONNX) and hardware backends to simplify deployment for AI production teams and accelerate innovation for AI developers.
For more information about using this Helm chart, see the tutorial Deploy Llama 3 on GPU-powered Kubernetes clusters.
Homepage: https://www.modular.com/
To install this chart using Helm 3, run the following command:
helm install max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart \
--version <insert-version> \
--set huggingfaceRepoId=<insert-huggingface-model-id> \
--set maxServe.maxLength=512 \
--set maxServe.maxBatchSize=16 \
--set envSecret.HF_TOKEN=<insert-huggingface-token> \
--set env.HF_HUB_ENABLE_HF_TRANSFER=1 \
--wait
The command deploys MAX OpenAI API on the Kubernetes cluster in the default configuration. The Values reference section below lists the parameters that can be configured during installation.
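Alternatively, you can keep the same settings in a values file and pass it with --values instead of repeated --set flags. The following is a sketch that mirrors the flags from the command above (max-values.yaml is only an illustrative file name; see the Values reference below for the full parameter list):
cat > max-values.yaml <<'EOF'
huggingfaceRepoId: <insert-huggingface-model-id>
maxServe:
  maxLength: "512"
  maxBatchSize: "16"
envSecret:
  HF_TOKEN: <insert-huggingface-token>
env:
  HF_HUB_ENABLE_HF_TRANSFER: "1"
EOF
helm install max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart \
--version <insert-version> \
--values max-values.yaml \
--wait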
To upgrade the chart with the release name max-openai-api:
helm upgrade max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart
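For example, here is a minimal sketch of bumping a single parameter on an existing release; --reuse-values keeps the values set at install time, and maxBatchSize=32 is only an illustrative value:
helm upgrade max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart \
--version <insert-version> \
--reuse-values \
--set maxServe.maxBatchSize=32 \
--wait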
To uninstall/delete the max-openai-api deployment:
helm delete max-openai-api
To provision a k8s cluster via eksctl and then install MAX OpenAI API, run the following commands:
# provision a k8s cluster (takes 10-15 minutes)
eksctl create cluster \
--name max-openai-api-demo \
--region us-east-1 \
--node-type g5.4xlarge \
--nodes 1
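# (optional) verify the GPU node has joined the cluster and is Ready
kubectl get nodes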
# create a k8s namespace
kubectl create namespace max-openai-api-demo
# deploy MAX OpenAI API via helm chart (takes 10 minutes)
helm install max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart \
--version <insert-version> \
--namespace max-openai-api-demo \
--set huggingfaceRepoId=modularai/Llama-3.1-8B-Instruct-GGUF \
--set maxServe.maxLength=512 \
--set maxServe.maxBatchSize=16 \
--set envSecret.HF_TOKEN=<insert-huggingface-token> \
--set env.HF_HUB_ENABLE_HF_TRANSFER=1 \
--timeout 10m0s \
--wait
# forward the remote k8s port to the local network to access the service locally
# the port-forward command blocks the current terminal
# use another terminal for the subsequent curl, and press Ctrl-C to stop the port forwarding
POD_NAME=$(kubectl get pods --namespace max-openai-api-demo -l "app.kubernetes.io/name=max-openai-api-chart,app.kubernetes.io/instance=max-openai-api" -o jsonpath="{.items[0].metadata.name}")
CONTAINER_PORT=$(kubectl get pod --namespace max-openai-api-demo $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
kubectl port-forward $POD_NAME 8000:$CONTAINER_PORT --namespace max-openai-api-demo &
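# (optional) confirm the server is up before sending requests
# /v1/health matches the chart's default probe path; adjust if you changed it
curl http://localhost:8000/v1/health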
# test the service
curl -N http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "modularai/Llama-3.1-8B-Instruct-GGUF",
"stream": true,
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Who won the world series in 2020?"}
]
}'
# uninstall MAX OpenAI API
helm uninstall max-openai-api --namespace max-openai-api-demo
# delete the namespace
kubectl delete namespace max-openai-api-demo
# delete the k8s cluster
eksctl delete cluster \
--name max-openai-api-demo \
--region us-east-1
Key | Type | Default | Description |
---|---|---|---|
affinity | object | {} | Affinity to be added to all deployments |
env | object | {} | Environment variables that will be passed into pods |
envFromSecret | string | "{{ template \"max.fullname\" . }}-env" | The name of the secret used to populate env vars in deployed pods. This can be useful for secret keys, etc. |
envFromSecrets | list | [] | This can be a list of templated strings |
envRaw | list | [] | Environment variables in raw format that will be passed into pods |
envSecret | object | {} | Environment variables to pass as secrets |
fullnameOverride | string | nil | Provide a name to override the full names of resources |
image.pullPolicy | string | "IfNotPresent" | |
image.repository | string | "modular/max-openai-api" | |
image.tag | string | "latest" | |
imagePullSecrets | list | [] | |
inferenceServer.affinity | object | {} | Affinity to be added to inferenceServer deployment |
inferenceServer.args | list | See values.yaml | Arguments to pass to the node entrypoint. If defined, this overwrites the default args value set by .Values.max-serve |
inferenceServer.autoscaling.enabled | bool | false | |
inferenceServer.autoscaling.maxReplicas | int | 2 | |
inferenceServer.autoscaling.minReplicas | int | 1 | |
inferenceServer.autoscaling.targetCPUUtilizationPercentage | int | 80 | |
inferenceServer.containerSecurityContext | object | {} | |
inferenceServer.deploymentAnnotations | object | {} | Annotations to be added to inferenceServer deployment |
inferenceServer.deploymentLabels | object | {} | Labels to be added to inferenceServer deployment |
inferenceServer.env | object | {} | |
inferenceServer.extraContainers | list | [] | Launch additional containers into the inferenceServer pod |
inferenceServer.livenessProbe.failureThreshold | int | 3 | |
inferenceServer.livenessProbe.httpGet.path | string | "/v1/health" | |
inferenceServer.livenessProbe.httpGet.port | string | "http" | |
inferenceServer.livenessProbe.initialDelaySeconds | int | 1 | |
inferenceServer.livenessProbe.periodSeconds | int | 15 | |
inferenceServer.livenessProbe.successThreshold | int | 1 | |
inferenceServer.livenessProbe.timeoutSeconds | int | 1 | |
inferenceServer.nodeSelector | object | {} | NodeSelector to be added to inferenceServer deployment |
inferenceServer.podAnnotations | object | {} | Annotations to be added to inferenceServer pods |
inferenceServer.podLabels | object | {} | Labels to be added to inferenceServer pods |
inferenceServer.podSecurityContext | object | {} | |
inferenceServer.readinessProbe.failureThreshold | int | 3 | |
inferenceServer.readinessProbe.httpGet.path | string | "/v1/health" | |
inferenceServer.readinessProbe.httpGet.port | string | "http" | |
inferenceServer.readinessProbe.initialDelaySeconds | int | 1 | |
inferenceServer.readinessProbe.periodSeconds | int | 15 | |
inferenceServer.readinessProbe.successThreshold | int | 1 | |
inferenceServer.readinessProbe.timeoutSeconds | int | 1 | |
inferenceServer.replicaCount | int | 1 | |
inferenceServer.resources | object | {} | Resource settings for the inferenceServer pods; these settings overwrite existing values from the global resources object defined above |
inferenceServer.startupProbe.failureThreshold | int | 60 | |
inferenceServer.startupProbe.httpGet.path | string | "/v1/health" | |
inferenceServer.startupProbe.httpGet.port | string | "http" | |
inferenceServer.startupProbe.initialDelaySeconds | int | 1 | |
inferenceServer.startupProbe.periodSeconds | int | 5 | |
inferenceServer.startupProbe.successThreshold | int | 1 | |
inferenceServer.startupProbe.timeoutSeconds | int | 1 | |
inferenceServer.strategy | object | {} | |
inferenceServer.tolerations | list | [] | Tolerations to be added to inferenceServer deployment |
inferenceServer.topologySpreadConstraints | list | [] | TopologySpreadConstraints to be added to inferenceServer deployments |
inferenceServer.volumeMounts | list | [] | Volume mounts to be added to the inferenceServer pod |
inferenceServer.volumes | list | [] | Volumes to mount into the inferenceServer pod |
ingress.annotations | object | {} | |
ingress.enabled | bool | false | |
ingress.extraHostsRaw | list | [] | |
ingress.hosts | list | [] | |
ingress.ingressClassName | string | nil | |
ingress.path | string | "/" | |
ingress.pathType | string | "ImplementationSpecific" | |
ingress.tls | list | [] | |
maxServe | object | {"cacheStrategy":"continuous","huggingfaceRepoId":"modularai/Llama-3.1-8B-Instruct-GGUF","maxBatchSize":"250","maxLength":"2048","maxNumSteps":"10"} | MAX Serve arguments |
nameOverride | string | nil | Provide a name to override the name of the chart |
nodeSelector | object | {} | NodeSelector to be added to all deployments |
resources | object | {} | |
runAsUser | int | 0 | User ID directive. This user must have enough permissions to run the bootstrap script. Running containers as root is not recommended in production; change this to another UID, e.g. 1000, to be more secure |
service.annotations | object | {} | |
service.loadBalancerIP | string | nil | |
service.ports[0].name | string | "http" | |
service.ports[0].port | int | 8000 | |
service.ports[0].protocol | string | "TCP" | |
service.ports[0].targetPort | int | 8000 | |
service.type | string | "ClusterIP" | |
serviceAccount.annotations | object | {} | |
serviceAccount.create | bool | false | Create a custom service account for MAX Serving. If create: true and serviceAccountName is not provided, max.fullname will be used. |
serviceAccountName | string | nil | Specify the service account name to be used |
tolerations | list | [] | Tolerations to be added to all deployments |
topologySpreadConstraints | list | [] | TopologySpreadConstraints to be added to all deployments |
volumeMounts | list | [] | |
volumes | list | [] | |
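Any of the parameters above can be overridden at install or upgrade time. The following sketch switches the service to a LoadBalancer (assuming your cluster can provision cloud load balancers) and sets explicit resource requests for the inference server; the resource figures are illustrative only, not recommendations:
helm upgrade --install max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart \
--version <insert-version> \
--set service.type=LoadBalancer \
--set inferenceServer.resources.requests.cpu=4 \
--set inferenceServer.resources.requests.memory=16Gi \
--wait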