AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports LLMs, embeddings, and speech-to-text.
Topics: kubernetes, ai, k8s, whisper, autoscaler, openai-api, llm, vllm, faster-whisper, ollama, vllm-operator, ollama-operator, inference-operator
Updated Jan 3, 2025 - Go
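The openai-api topic suggests the operator fronts its deployed models with an OpenAI-compatible HTTP API. The Go sketch below shows what a minimal in-cluster client for such an endpoint could look like; the Service URL, URL path, and model name are placeholders assumed for illustration and are not confirmed by this listing.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Minimal request/response shapes for an OpenAI-compatible chat-completions API.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

type chatResponse struct {
	Choices []struct {
		Message chatMessage `json:"message"`
	} `json:"choices"`
}

func main() {
	// Hypothetical in-cluster Service DNS name and path; substitute whatever
	// endpoint your operator installation actually exposes.
	endpoint := "http://inference-operator.default.svc.cluster.local/v1/chat/completions"

	body, err := json.Marshal(chatRequest{
		Model: "llama-3.1-8b-instruct", // placeholder model ID
		Messages: []chatMessage{
			{Role: "user", Content: "Summarize what a Kubernetes operator does."},
		},
	})
	if err != nil {
		panic(err)
	}

	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out chatResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	if len(out.Choices) > 0 {
		fmt.Println(out.Choices[0].Message.Content)
	}
}
```

From outside the cluster, the same request could be sent by port-forwarding to the operator's Service (for example via `kubectl port-forward`) and pointing the endpoint at localhost.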