
OpenVINO™ Model Server

Model Server hosts models and makes them accessible to software components over standard network protocols: a client sends a request to the model server, which performs model inference and sends a response back to the client. Model Server offers many advantages for efficient model deployment:

  • Remote inference enables using lightweight clients with only the necessary functions to perform API calls to edge or cloud deployments.
  • Applications are independent of the model framework, hardware device, and infrastructure.
  • Client applications in any programming language that supports REST or gRPC calls can be used to run inference remotely on the model server.
  • Clients require fewer updates since client libraries change very rarely.
  • Model topology and weights are not exposed directly to client applications, making it easier to control access to the model.
  • Ideal architecture for microservices-based applications and deployments in cloud environments – including Kubernetes and OpenShift clusters.
  • Efficient resource utilization with horizontal and vertical inference scaling.

(Figure: OVMS deployment diagram)

OpenVINO™ Model Server (OVMS) is a high-performance system for serving models. It is implemented in C++ for scalability and optimized for deployment on Intel architectures. It exposes the same APIs as TensorFlow Serving and KServe while using OpenVINO for inference execution. The inference service is provided over gRPC or REST, making it easy to deploy new algorithms and AI experiments.
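
As a rough illustration of what a client looks like, the sketch below sends a KServe v2 REST inference request with a plain HTTP library. The host, port, model name, input tensor name, and shape are assumptions that depend on how the server was started and which model it serves.

```python
# Minimal sketch of a KServe v2 REST inference call against a running
# model server; endpoint, model name, input name, and shape are assumed.
import requests

payload = {
    "inputs": [
        {
            "name": "input",             # input tensor name exposed by the model
            "shape": [1, 3, 224, 224],   # batch of one 224x224 RGB image
            "datatype": "FP32",
            "data": [0.0] * (3 * 224 * 224),  # placeholder pixel data
        }
    ]
}

response = requests.post(
    "http://localhost:8000/v2/models/resnet/infer",  # assumed REST port and model name
    json=payload,
    timeout=30,
)
response.raise_for_status()
print(response.json()["outputs"][0]["shape"])
```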

In addition, it includes endpoints for generative use cases compatible with the OpenAI API and the Cohere API.
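
Because these endpoints follow the OpenAI API, existing OpenAI client libraries can typically be pointed at the model server. The snippet below is a sketch under assumptions: the base URL prefix and the served model name depend on the actual deployment configuration.

```python
# Sketch of calling an OpenAI-compatible chat completions endpoint served
# by the model server; base_url and model name are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v3",  # assumed server address and API prefix
    api_key="unused",                     # a local deployment does not require a key
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # name of the served LLM (assumed)
    messages=[{"role": "user", "content": "What is OpenVINO Model Server?"}],
    max_tokens=128,
)
print(completion.choices[0].message.content)
```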

The models used by the server need to be stored locally or hosted remotely by object storage services. For more details, refer to the Preparing Model Repository documentation. Model Server runs inside Docker containers, on bare metal, and in Kubernetes environments. Start using OpenVINO Model Server with a fast-forward serving example from the QuickStart guide or the LLM QuickStart guide.
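
As a rough sketch of what the model repository looks like on disk (the authoritative layout and supported formats are described in the Preparing Model Repository documentation; the directory and model names below are made up), each model lives in its own directory with numbered version subdirectories:

```python
# Illustrative check of a local model repository layout; names are assumptions.
#
# models/
# └── resnet/            <- model name used by clients
#     └── 1/             <- numeric version directory
#         ├── model.xml  <- OpenVINO IR topology
#         └── model.bin  <- OpenVINO IR weights
from pathlib import Path

version_dir = Path("models/resnet/1")
for required in ("model.xml", "model.bin"):
    if not (version_dir / required).is_file():
        raise FileNotFoundError(f"expected {required} in {version_dir}")
print("model repository layout looks OK")
```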

Read the release notes to find out what’s new.

Key features:

Check the full list of features in the documentation.

Note: OVMS has been tested on Red Hat, Ubuntu, and Windows. Public Docker images are available in public container registries.

Run OpenVINO Model Server

A demonstration of how to use OpenVINO Model Server can be found in our quick-start guides for the vision use case and LLM text generation.

Also check these instructions:

Preparing model repository

Deployment

Writing client code

Demos

References

Contact

If you have a question, a feature request, or a bug report, feel free to submit a GitHub issue.


* Other names and brands may be claimed as the property of others.