OpenVINO™ Model Server {#ovms_what_is_openvino_model_server}

---
maxdepth: 1
hidden:
---

ovms_docs_quick_start_guide
ovms_docs_llm_quickstart
ovms_docs_models_repository
ovms_docs_deploying_server
ovms_docs_server_app
ovms_docs_features
ovms_docs_performance_tuning
ovms_docs_demos
ovms_docs_troubleshooting

Model Server hosts models and makes them accessible to software components over standard network protocols: a client sends a request to the model server, which performs model inference and sends a response back to the client. Model Server offers many advantages for efficient model deployment:

Remote inference enables using lightweight clients with only the necessary functions to perform API calls to edge or cloud deployments.
Applications are independent of the model framework, hardware device, and infrastructure.
Client applications in any programming language that supports REST or gRPC calls can be used to run inference remotely on the model server.
Clients require fewer updates since client libraries change very rarely.
Model topology and weights are not exposed directly to client applications, making it easier to control access to the model.
Ideal architecture for microservices-based applications and deployments in cloud environments – including Kubernetes and OpenShift clusters.
Efficient resource utilization with horizontal and vertical inference scaling.

Serving with OpenVINO Model Server

OpenVINO™ Model Server (OVMS) is a high-performance system for serving models. Implemented in C++ for scalability and optimized for deployment on Intel architectures. It uses the same API as TensorFlow Serving and KServe while applying OpenVINO for inference execution. Inference service is provided via gRPC or REST API, making deploying new algorithms and AI experiments easy.

In addition, there are included endpoints for generative use cases compatible with OpenAI API and Cohere API.

The models used by the server need to be stored locally or hosted remotely by object storage services. For more details, refer to Preparing Model Repository documentation. Model server works inside Docker containers, on Bare Metal, and in Kubernetes environment. Start using OpenVINO Model Server with a fast-forward serving example from the QuickStart guide or LLM QuickStart guide.

Key features:

[NEW] Native Windows support. Check updated deployment guide
[NEW] Embeddings endpoint compatible with OpenAI API
[NEW] Reranking compatible with Cohere API
[NEW] Efficient Text Generation with OpenAI API
Python code execution
gRPC streaming
MediaPipe graphs serving
Model management - including model versioning and model updates in runtime
Dynamic model inputs
Directed Acyclic Graph Scheduler along with custom nodes in DAG pipelines
Metrics - metrics compatible with Prometheus standard
Support for multiple frameworks, such as TensorFlow, PaddlePaddle and ONNX
Support for AI accelerators

Additional Resources

RAG building blocks made easy and affordable with OpenVINO Model Server
Simplified Deployments with OpenVINO™ Model Server and TensorFlow Serving
Inference Scaling with OpenVINO™ Model Server in Kubernetes and OpenShift Clusters
Benchmarking results
Release Notes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

home.md

home.md

OpenVINO™ Model Server {#ovms_what_is_openvino_model_server}

Serving with OpenVINO Model Server

Key features:

Additional Resources

Files

home.md

Latest commit

History

home.md

File metadata and controls

OpenVINO™ Model Server {#ovms_what_is_openvino_model_server}

Serving with OpenVINO Model Server

Key features:

Additional Resources