
Commit 7cefcee

Authored on Feb 7, 2025
Update links (#3043)
1 parent c29380d commit 7cefcee

25 files changed: +54 −54 lines
 

‎README.md

+3 −3

@@ -13,17 +13,17 @@ Model Server hosts models and makes them accessible to software components over

 OpenVINO™ Model Server (OVMS) is a high-performance system for serving models. Implemented in C++ for scalability and optimized for deployment on Intel architectures. It uses the same API as [TensorFlow Serving](https://github.com/tensorflow/serving) and [KServe](https://github.com/kserve/kserve) while applying OpenVINO for inference execution. Inference service is provided via gRPC or REST API, making deploying new algorithms and AI experiments easy.

-In addition, there are included endpoints for generative use cases compatible with [OpenAI API and Cohere API](./clients_genai.md).
+In addition, there are included endpoints for generative use cases compatible with [OpenAI API and Cohere API](./docs/clients_genai.md).

 ![OVMS picture](docs/ovms_high_level.png)

 The models used by the server need to be stored locally or hosted remotely by object storage services. For more details, refer to [Preparing Model Repository](docs/models_repository.md) documentation. Model server works inside [Docker containers](docs/deploying_server.md#deploying-model-server-in-docker-container), on [Bare Metal](docs/deploying_server.md#deploying-model-server-on-baremetal-without-container), and in [Kubernetes environment](docs/deploying_server.md#deploying-model-server-in-kubernetes).
-Start using OpenVINO Model Server with a fast-forward serving example from the [QuickStart guide](docs/ovms_quickstart.md) or [LLM QuickStart guide](./llm/quickstart.md).
+Start using OpenVINO Model Server with a fast-forward serving example from the [QuickStart guide](docs/ovms_quickstart.md) or [LLM QuickStart guide](./docs/llm/quickstart.md).

 Read [release notes](https://github.com/openvinotoolkit/model_server/releases) to find out what’s new.

 ### Key features:
-- **[NEW]** Native Windows support. Check updated [deployment guide](./deploying_server.md)
+- **[NEW]** Native Windows support. Check updated [deployment guide](./docs/deploying_server.md)
 - **[NEW]** [Text Embeddings compatible with OpenAI API](demos/embeddings/README.md)
 - **[NEW]** [Reranking compatible with Cohere API](demos/rerank/README.md)
 - **[NEW]** [Efficient Text Generation via OpenAI API](demos/continuous_batching/README.md)

‎demos/continuous_batching/rag/rag_demo.ipynb

+4 −4

@@ -130,10 +130,10 @@
 }
 ],
 "source": [
-"!curl https://docs.openvino.ai/2024/openvino-workflow/model-server/ovms_what_is_openvino_model_server.html --create-dirs -o ./docs/ovms_what_is_openvino_model_server.html\n",
-"!curl https://docs.openvino.ai/2024/openvino-workflow/model-server/ovms_docs_metrics.html -o ./docs/ovms_docs_metrics.html\n",
-"!curl https://docs.openvino.ai/2024/openvino-workflow/model-server/ovms_docs_streaming_endpoints.html -o ./docs/ovms_docs_streaming_endpoints.html\n",
-"!curl https://docs.openvino.ai/2024/openvino-workflow/model-server/ovms_docs_target_devices.html -o ./docs/ovms_docs_target_devices.html\n"
+"!curl https://docs.openvino.ai/2025/openvino-workflow/model-server/ovms_what_is_openvino_model_server.html --create-dirs -o ./docs/ovms_what_is_openvino_model_server.html\n",
+"!curl https://docs.openvino.ai/2025/openvino-workflow/model-server/ovms_docs_metrics.html -o ./docs/ovms_docs_metrics.html\n",
+"!curl https://docs.openvino.ai/2025/openvino-workflow/model-server/ovms_docs_streaming_endpoints.html -o ./docs/ovms_docs_streaming_endpoints.html\n",
+"!curl https://docs.openvino.ai/2025/openvino-workflow/model-server/ovms_docs_target_devices.html -o ./docs/ovms_docs_target_devices.html\n"
 ]
 },
 {

‎demos/continuous_batching/speculative_decoding/README.md

+3 −3

@@ -1,6 +1,6 @@
 # How to serve LLM Models in Speculative Decoding Pipeline{#ovms_demos_continuous_batching_speculative_decoding}

-Following [OpenVINO GenAI docs](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/genai-guide.html#efficient-text-generation-via-speculative-decoding):
+Following [OpenVINO GenAI docs](https://docs.openvino.ai/2025/openvino-workflow-generative/inference-with-genai.html#efficient-text-generation-via-speculative-decoding):
 > Speculative decoding (or assisted-generation) enables faster token generation when an additional smaller draft model is used alongside the main model. This reduces the number of infer requests to the main model, increasing performance.
 >
 > The draft model predicts the next K tokens one by one in an autoregressive manner. The main model validates these predictions and corrects them if necessary - in case of a discrepancy, the main model prediction is used. Then, the draft model acquires this token and runs prediction of the next K tokens, thus repeating the cycle.
@@ -13,7 +13,7 @@ This demo shows how to use speculative decoding in the model serving scenario, b

 **Model preparation**: Python 3.9 or higher with pip and HuggingFace account

-**Model Server deployment**: Installed Docker Engine or OVMS binary package according to the [baremetal deployment guide](../../docs/deploying_server_baremetal.md)
+**Model Server deployment**: Installed Docker Engine or OVMS binary package according to the [baremetal deployment guide](../../../docs/deploying_server_baremetal.md)

 ## Model considerations

@@ -103,7 +103,7 @@ Assuming you have unpacked model server package, make sure to:
 - **On Windows**: run `setupvars` script
 - **On Linux**: set `LD_LIBRARY_PATH` and `PATH` environment variables

-as mentioned in [deployment guide](../../docs/deploying_server_baremetal.md), in every new shell that will start OpenVINO Model Server.
+as mentioned in [deployment guide](../../../docs/deploying_server_baremetal.md), in every new shell that will start OpenVINO Model Server.

 Depending on how you prepared models in the first step of this demo, they are deployed to either CPU or GPU (it's defined in `config.json`). If you run on GPU make sure to have appropriate drivers installed, so the device is accessible for the model server.
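
For a quick sanity check once the demo's server is running, the generation service is OpenAI-compatible and can be exercised with plain `curl`. A minimal sketch, assuming the OpenAI-compatible endpoint is exposed at `/v3/chat/completions` on REST port 8000 as in the continuous batching demos; the served model name is a placeholder:

```console
curl http://localhost:8000/v3/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "<served-model-name>", "max_tokens": 64, "messages": [{"role": "user", "content": "What is OpenVINO?"}]}'
```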

‎docs/accelerators.md

+9 −9

@@ -4,9 +4,9 @@

 Docker engine installed (on Linux and WSL), or ovms binary package installed as described in the [guide](./deploying_server_baremetal.md) (on Linux or Windows).

-Supported HW is documented in [OpenVINO system requirements](https://docs.openvino.ai/2024/about-openvino/release-notes-openvino/system-requirements.html)
+Supported HW is documented in [OpenVINO system requirements](https://docs.openvino.ai/2025/about-openvino/release-notes-openvino/system-requirements.html)

-Before staring the model server as a binary package, make sure there are installed GPU or/and NPU required drivers like described in [https://docs.openvino.ai/2024/get-started/configurations.html](https://docs.openvino.ai/2024/get-started/configurations.html)
+Before staring the model server as a binary package, make sure there are installed GPU or/and NPU required drivers like described in [https://docs.openvino.ai/2025/get-started/install-openvino/configurations.html](https://docs.openvino.ai/2025/get-started/install-openvino/configurations.html)

 Additional considerations when deploying with docker container:
 - make sure to use the image version including runtime drivers. The public image has a suffix -gpu like `openvino/model_server:latest-gpu`.
@@ -27,7 +27,7 @@ rm model/1/model.tar.gz

 ## Starting Model Server with Intel GPU

-The [GPU plugin](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html) uses the [oneDNN](https://github.com/oneapi-src/oneDNN) and [OpenCL](https://github.com/KhronosGroup/OpenCL-SDK) to infer deep neural networks. For inference execution, it employs Intel® Processor Graphics including
+The [GPU plugin](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html) uses the [oneDNN](https://github.com/oneapi-src/oneDNN) and [OpenCL](https://github.com/KhronosGroup/OpenCL-SDK) to infer deep neural networks. For inference execution, it employs Intel® Processor Graphics including
 Intel® Arc™ GPU Series, Intel® UHD Graphics, Intel® HD Graphics, Intel® Iris® Graphics, Intel® Iris® Xe Graphics, and Intel® Iris® Xe MAX graphics and Intel® Data Center GPU.

 ### Container
@@ -57,7 +57,7 @@ docker run --rm -it --device=/dev/dxg --volume /usr/lib/wsl:/usr/lib/wsl -u $(i

 ### Binary

-Starting the server with GPU acceleration requires installation of runtime drivers and ocl-icd-libopencl1 package like described on [configuration guide](https://docs.openvino.ai/2024/get-started/configurations/configurations-intel-gpu.html)
+Starting the server with GPU acceleration requires installation of runtime drivers and ocl-icd-libopencl1 package like described on [configuration guide](https://docs.openvino.ai/2025/get-started/install-openvino/configurations/configurations-intel-gpu.html)

 Start the model server with GPU accelerations using a command:
 ```console
@@ -67,7 +67,7 @@ ovms --model_path model --model_name resnet --port 9000 --target_device GPU

 ## Using NPU device Plugin

-OpenVINO Model Server supports using [NPU device](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/npu-device.html)
+OpenVINO Model Server supports using [NPU device](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/npu-device.html)

 ### Container
 Example command to run container with NPU:
@@ -82,13 +82,13 @@ Start the model server with NPU accelerations using a command:
 ovms --model_path model --model_name resnet --port 9000 --target_device NPU --batch_size 1
 ```

-Check more info about the [NPU driver configuration](https://docs.openvino.ai/2024/get-started/configurations/configurations-intel-npu.html).
+Check more info about the [NPU driver configuration](https://docs.openvino.ai/2025/get-started/install-openvino/configurations/configurations-intel-npu.html).

 > **NOTE**: NPU device execute models with static input and output shapes only. If your model has dynamic shape, it can be reset to static with parameters `--batch_size` or `--shape`.

 ## Using Heterogeneous Plugin

-The [HETERO plugin](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/hetero-execution.html) makes it possible to distribute inference load of one model
+The [HETERO plugin](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/hetero-execution.html) makes it possible to distribute inference load of one model
 among several computing devices. That way different parts of the deep learning network can be executed by devices best suited to their type of calculations.
 OpenVINO automatically divides the network to optimize the process.

@@ -115,7 +115,7 @@ ovms --model_path model --model_name resnet --port 9000 --target_device "HETERO:

 ## Using AUTO Plugin

-[Auto Device](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/auto-device-selection.html) (or AUTO in short) is a new special “virtual” or “proxy” device in the OpenVINO toolkit, it doesn’t bind to a specific type of HW device.
+[Auto Device](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/auto-device-selection.html) (or AUTO in short) is a new special “virtual” or “proxy” device in the OpenVINO toolkit, it doesn’t bind to a specific type of HW device.
 AUTO solves the complexity in application required to code a logic for the HW device selection (through HW devices) and then, on the deducing the best optimization settings on that device.
 AUTO always chooses the best device, if compiling model fails on this device, AUTO will try to compile it on next best device until one of them succeeds.

@@ -197,7 +197,7 @@ ovms --model_path model --model_name resnet --port 9000 --plugin_config "{\"PERF

 ## Using Automatic Batching Plugin

-[Auto Batching](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/automatic-batching.html) (or BATCH in short) is a new special “virtual” device
+[Auto Batching](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/automatic-batching.html) (or BATCH in short) is a new special “virtual” device
 which explicitly defines the auto batching.

 It performs automatic batching on-the-fly to improve device utilization by grouping inference requests together, without programming effort from the user.
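
As a side note on the AUTO section touched above: AUTO is selected the same way as the GPU and NPU devices in the commands quoted in this file, by pointing `--target_device` at it. A minimal sketch with the same placeholder model name and path:

```console
ovms --model_path model --model_name resnet --port 9000 --target_device AUTO
```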

‎docs/advanced_topics.md

+1 −1

@@ -18,7 +18,7 @@ Implement any CPU layer, that is not support by OpenVINO yet, as a shared librar
 [Learn more](../src/example/SampleCpuExtension/README.md)

 ## Model Cache
-Leverage the OpenVINO [model caching](https://docs.openvino.ai/2024/openvino-workflow/running-inference/optimize-inference/optimizing-latency/model-caching-overview.html) feature to speed up subsequent model loading on a target device.
+Leverage the OpenVINO [model caching](https://docs.openvino.ai/2025/openvino-workflow/running-inference/optimize-inference/optimizing-latency/model-caching-overview.html) feature to speed up subsequent model loading on a target device.

 [Learn more](model_cache.md)

‎docs/build_from_source.md

+1 −1

@@ -143,7 +143,7 @@ make release_image MEDIAPIPE_DISABLE=1 PYTHON_DISABLE=1

 ### `GPU`

-When set to `1`, OpenVINO&trade Model Server will be built with the drivers required by [GPU plugin](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html) support. Default value: `0`.
+When set to `1`, OpenVINO&trade Model Server will be built with the drivers required by [GPU plugin](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html) support. Default value: `0`.

 Example:
 ```bash

‎docs/deploying_server_baremetal.md

+1 −1

@@ -164,7 +164,7 @@ Learn more about model server [starting parameters](parameters.md).

 > **NOTE**:
 > When serving models on [AI accelerators](accelerators.md), some additional steps may be required to install device drivers and dependencies.
-> Learn more in the [Additional Configurations for Hardware](https://docs.openvino.ai/2024/get-started/configurations.html) documentation.
+> Learn more in the [Additional Configurations for Hardware](https://docs.openvino.ai/2025/get-started/install-openvino/configurations.html) documentation.


 ## Next Steps

‎docs/deploying_server_docker.md

+2 −2

@@ -7,7 +7,7 @@ This is a step-by-step guide on how to deploy OpenVINO™ Model Server on Li
 - [Docker Engine](https://docs.docker.com/engine/) installed
 - Intel® Core™ processor (6-13th gen.) or Intel® Xeon® processor (1st to 4th gen.)
 - Linux, macOS or Windows via [WSL](https://docs.microsoft.com/en-us/windows/wsl/)
-- (optional) AI accelerators [supported by OpenVINO](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes.html). Accelerators are tested only on bare-metal Linux hosts.
+- (optional) AI accelerators [supported by OpenVINO](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes.html). Accelerators are tested only on bare-metal Linux hosts.

 ### Launch Model Server Container

@@ -85,4 +85,4 @@ make release_image GPU=1
 It will create an image called `openvino/model_server:latest`.
 > **Note:** This operation might take 40min or more depending on your build host.
 > **Note:** `GPU` parameter in image build command is needed to include dependencies for GPU device.
-> **Note:** The public image from the last release might be not compatible with models exported using the the latest export script. Check the [demo version from the last release](https://github.com/openvinotoolkit/model_server/tree/releases/2024/4/demos/continuous_batching) to use the public docker image.
+> **Note:** The public image from the last release might be not compatible with models exported using the the latest export script. We recommend using export script and docker image from the same release to avoid compatibility issues.
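
For context on the "Launch Model Server Container" step referenced in the first hunk, the basic deployment is a single `docker run` that mounts a model from the host; a minimal sketch with placeholder paths and model name:

```console
docker run -d --rm -v ${PWD}/models/resnet:/models/resnet -p 9000:9000 \
  openvino/model_server:latest --model_path /models/resnet --model_name resnet --port 9000
```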

‎docs/dynamic_shape_dynamic_model.md

+1 −1

@@ -8,7 +8,7 @@ Enable dynamic shape by setting the `shape` parameter to range or undefined:
 - `--shape "(1,3,200:500,200:500)"` when model is supposed to support height and width values in a range of 200-500. Note that any dimension can support range of values, height and width are only examples here.

 > Note that some models do not support dynamic dimensions. Learn more about supported model graph layers including all limitations
-on [Shape Inference Document](https://docs.openvino.ai/2024/openvino-workflow/running-inference/changing-input-shape.html).
+on [Shape Inference Document](https://docs.openvino.ai/2025/openvino-workflow/running-inference/changing-input-shape.html).

 Another option to use dynamic shape feature is to export the model with dynamic dimension using Model Optimizer. OpenVINO Model Server will inherit the dynamic shape and no additional settings are needed.
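
Putting the `--shape` range from the hunk above into a full server start-up command (model name and path are placeholders, matching the style of the other docs in this commit):

```console
ovms --model_path model --model_name resnet --port 9000 --shape "(1,3,200:500,200:500)"
```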

‎docs/home.md

+1 −1

@@ -58,5 +58,5 @@ Start using OpenVINO Model Server with a fast-forward serving example from the [
 * [RAG building blocks made easy and affordable with OpenVINO Model Server](https://medium.com/openvino-toolkit/rag-building-blocks-made-easy-and-affordable-with-openvino-model-server-e7b03da5012b)
 * [Simplified Deployments with OpenVINO™ Model Server and TensorFlow Serving](https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/Simplified-Deployments-with-OpenVINO-Model-Server-and-TensorFlow/post/1353218)
 * [Inference Scaling with OpenVINO™ Model Server in Kubernetes and OpenShift Clusters](https://www.intel.com/content/www/us/en/developer/articles/technical/deploy-openvino-in-openshift-and-kubernetes.html)
-* [Benchmarking results](https://docs.openvino.ai/2024/about-openvino/performance-benchmarks.html)
+* [Benchmarking results](https://docs.openvino.ai/2025/about-openvino/performance-benchmarks.html)
 * [Release Notes](https://github.com/openvinotoolkit/model_server/releases)

‎docs/llm/reference.md

+1 −1

@@ -81,7 +81,7 @@ The calculator supports the following `node_options` for tuning the pipeline con
 - `optional uint64 max_num_seqs` - max number of sequences actively processed by the engine [default = 256];
 - `optional bool dynamic_split_fuse` - use Dynamic Split Fuse token scheduling [default = true];
 - `optional string device` - device to load models to. Supported values: "CPU", "GPU" [default = "CPU"]
-- `optional string plugin_config` - [OpenVINO device plugin configuration](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes.html). Should be provided in the same format for regular [models configuration](../parameters.md#model-configuration-options) [default = "{}"]
+- `optional string plugin_config` - [OpenVINO device plugin configuration](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes.html). Should be provided in the same format for regular [models configuration](../parameters.md#model-configuration-options) [default = "{}"]
 - `optional uint32 best_of_limit` - max value of best_of parameter accepted by endpoint [default = 20];
 - `optional uint32 max_tokens_limit` - max value of max_tokens parameter accepted by endpoint [default = 4096];
 - `optional bool enable_prefix_caching` - enable caching of KV-blocks [default = false];

‎docs/mediapipe.md

+1 −1

@@ -54,7 +54,7 @@ Check their [documentation](https://github.com/openvinotoolkit/mediapipe/blob/ma

 ## PyTensorOvTensorConverterCalculator

-`PyTensorOvTensorConverterCalculator` enables conversion between nodes that are run by `PythonExecutorCalculator` and nodes that receive and/or produce [OV Tensors](https://docs.openvino.ai/2024/api/c_cpp_api/classov_1_1_tensor.html)
+`PyTensorOvTensorConverterCalculator` enables conversion between nodes that are run by `PythonExecutorCalculator` and nodes that receive and/or produce [OV Tensors](https://docs.openvino.ai/2025/api/c_cpp_api/classov_1_1_tensor.html)

 ## How to create the graph for deployment in OpenVINO Model Server

‎docs/model_cache.md

+1 −1

@@ -1,7 +1,7 @@
 # Model Cache {#ovms_docs_model_cache}

 ## Overview
-The Model Server can leverage a [OpenVINO™ model cache functionality](https://docs.openvino.ai/2024/openvino-workflow/running-inference/optimize-inference/optimizing-latency/model-caching-overview.html), to speed up subsequent model loading on a target device.
+The Model Server can leverage a [OpenVINO™ model cache functionality](https://docs.openvino.ai/2025/openvino-workflow/running-inference/optimize-inference/optimizing-latency/model-caching-overview.html), to speed up subsequent model loading on a target device.
 The cached files make the Model Server initialization usually faster.
 The boost depends on a model and a target device. The most noticeable improvement will be observed with GPU devices. On other devices, like CPU, it is possible to observe no speed up effect or even slower loading process depending on used model. Test the setup before final deployment.
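
A minimal sketch of enabling the cache when starting the server, using the `cache_dir` option described in docs/parameters.md further down in this commit (model name, path and target device are placeholders):

```console
ovms --model_path model --model_name resnet --port 9000 --target_device GPU --cache_dir /opt/cache
```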

‎docs/model_server_c_api.md

+1 −1

@@ -47,7 +47,7 @@ To execute inference using C API you must follow steps described below.
 Create an inference request using `OVMS_InferenceRequestNew` specifying which servable name and optionally version to use. Then specify input tensors with `OVMS_InferenceRequestAddInput` and set the tensor data using `OVMS_InferenceRequestInputSetData`. Optionally you can also set one or all outputs with `OVMS_InferenceRequestAddOutput` and `OVMS_InferenceRequestOutputSetData`. For asynchronous inference you also have to set callback with `OVMS_InferenceRequestSetCompletionCallback`.

 #### Using OpenVINO Remote Tensor
-With OpenVINO Model Server C-API you could also leverage the OpenVINO remote tensors support. Check original documentation [here](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device/remote-tensor-api-gpu-plugin.html). In order to use OpenCL buffers you need to first create `cl::Buffer` and then use its pointer in setting input with `OVMS_InferenceRequestInputSetData` or output with `OVMS_InferenceRequestOutputSetData` and buffer type `OVMS_BUFFERTYPE_OPENCL`. In case of VA surfaces you need to create appropriate VA surfaces and then use the same calls with buffer type `OVMS_BUFFERTYPE_VASURFACE_Y` and `OVMS_BUFFERTYPE_VASURFACE_UV`.
+With OpenVINO Model Server C-API you could also leverage the OpenVINO remote tensors support. Check original documentation [here](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device/remote-tensor-api-gpu-plugin.html). In order to use OpenCL buffers you need to first create `cl::Buffer` and then use its pointer in setting input with `OVMS_InferenceRequestInputSetData` or output with `OVMS_InferenceRequestOutputSetData` and buffer type `OVMS_BUFFERTYPE_OPENCL`. In case of VA surfaces you need to create appropriate VA surfaces and then use the same calls with buffer type `OVMS_BUFFERTYPE_VASURFACE_Y` and `OVMS_BUFFERTYPE_VASURFACE_UV`.

 #### Invoke inference
 Execute inference with OpenVINO Model Server using `OVMS_Inference` synchronous call. During inference execution you must not modify `OVMS_InferenceRequest` and bound memory buffers.

‎docs/models_repository_classic.md

+2 −2

@@ -12,14 +12,14 @@ ovms_docs_cloud_storage
 Traditional AI models perform data analysis in a single inference operation. They can be used over KServe API or TensorFlow API.

 The AI models served by OpenVINO™ Model Server must be in either of the five formats:
-- [OpenVINO IR](https://docs.openvino.ai/2024/documentation/openvino-ir-format.html), where the graph is represented in .bin and .xml files
+- [OpenVINO IR](https://docs.openvino.ai/2025/documentation/openvino-ir-format.html), where the graph is represented in .bin and .xml files
 - [ONNX](https://onnx.ai/), using the .onnx file
 - [PaddlePaddle](https://www.paddlepaddle.org.cn/en), using .pdiparams and .pdmodel files
 - [TensorFlow](https://www.tensorflow.org/), using SavedModel, MetaGraph or frozen Protobuf formats.
 - [TensorFlow Lite](https://www.tensorflow.org/lite), using the .tflite file

 To use models trained in other formats you need to convert them first. To do so, use
-OpenVINO’s [conversion tool](https://docs.openvino.ai/2024/openvino-workflow/model-preparation/convert-model-to-ir.html) for IR, or different
+OpenVINO’s [conversion tool](https://docs.openvino.ai/2025/openvino-workflow/model-preparation/convert-model-to-ir.html) for IR, or different
 [converters](https://onnx.ai/supported-tools.html) for ONNX.

 The models need to be placed and mounted in a particular directory structure and according to the following rules:
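
The rules referenced in the last context line describe a layout roughly like the sketch below, with one directory per model and a numeric subdirectory per version (all names are placeholders):

```console
models/
└── resnet/
    ├── 1/
    │   ├── model.bin
    │   └── model.xml
    └── 2/
        ├── model.bin
        └── model.xml
```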

‎docs/ovms_quickstart.md

+2 −2

@@ -1,12 +1,12 @@
 # QuickStart - classic models {#ovms_docs_quick_start_guide}

-OpenVINO Model Server can perform inference using pre-trained models in either [OpenVINO IR](https://docs.openvino.ai/2024/documentation/openvino-ir-format/operation-sets.html)
+OpenVINO Model Server can perform inference using pre-trained models in either [OpenVINO IR](https://docs.openvino.ai/2025/documentation/openvino-ir-format/operation-sets.html)
 , [ONNX](https://onnx.ai/), [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) or [TensorFlow](https://www.tensorflow.org/) format. You can get them by:

 - downloading models from [Open Model Zoo](https://storage.openvinotoolkit.org/repositories/open_model_zoo/)
 - generating the model in a training framework and saving it to a supported format: TensorFlow saved_model, ONNX or PaddlePaddle.
 - downloading the models from models hubs like [Kaggle](https://www.kaggle.com/models) or [ONNX models zoo](https://github.com/onnx/models).
-- converting models from any formats using [conversion tool](https://docs.openvino.ai/2024/openvino-workflow/model-preparation/convert-model-to-ir.html)
+- converting models from any formats using [conversion tool](https://docs.openvino.ai/2025/openvino-workflow/model-preparation/convert-model-to-ir.html)

 This guide uses a [Faster R-CNN with Resnet-50 V1 Object Detection model](https://www.kaggle.com/models/tensorflow/faster-rcnn-resnet-v1/tensorFlow2/faster-rcnn-resnet50-v1-640x640/1) in TensorFlow format.

‎docs/parameters.md

+4 −4

@@ -7,17 +7,17 @@
 |---|---|---|
 | `"model_name"/"name"` | `string` | Model name exposed over gRPC and REST API.(use `model_name` in command line, `name` in json config) |
 | `"model_path"/"base_path"` | `string` | If using a Google Cloud Storage, Azure Storage or S3 path, see [cloud storage guide](./using_cloud_storage.md). The path may look as follows:<br>`"/opt/ml/models/model"`<br>`"gs://bucket/models/model"`<br>`"s3://bucket/models/model"`<br>`"azure://bucket/models/model"`<br>The path can be also relative to the config.json location<br>(use `model_path` in command line, `base_path` in json config) |
-| `"shape"` | `tuple/json/"auto"` | `shape` is optional and takes precedence over `batch_size`. The `shape` argument changes the model that is enabled in the model server to fit the parameters. `shape` accepts three forms of the values: * `auto` - The model server reloads the model with the shape that matches the input data matrix. * a tuple, such as `(1,3,224,224)` - The tuple defines the shape to use for all incoming requests for models with a single input. * A dictionary of shapes, such as `{"input1":"(1,3,224,224)","input2":"(1,3,50,50)", "input3":"auto"}` - This option defines the shape of every included input in the model.Some models don't support the reshape operation.If the model can't be reshaped, it remains in the original parameters and all requests with incompatible input format result in an error. See the logs for more information about specific errors.Learn more about supported model graph layers including all limitations at [Shape Inference Document](https://docs.openvino.ai/2024/openvino-workflow/running-inference/changing-input-shape.html). |
+| `"shape"` | `tuple/json/"auto"` | `shape` is optional and takes precedence over `batch_size`. The `shape` argument changes the model that is enabled in the model server to fit the parameters. `shape` accepts three forms of the values: * `auto` - The model server reloads the model with the shape that matches the input data matrix. * a tuple, such as `(1,3,224,224)` - The tuple defines the shape to use for all incoming requests for models with a single input. * A dictionary of shapes, such as `{"input1":"(1,3,224,224)","input2":"(1,3,50,50)", "input3":"auto"}` - This option defines the shape of every included input in the model.Some models don't support the reshape operation.If the model can't be reshaped, it remains in the original parameters and all requests with incompatible input format result in an error. See the logs for more information about specific errors.Learn more about supported model graph layers including all limitations at [Shape Inference Document](https://docs.openvino.ai/2025/openvino-workflow/running-inference/changing-input-shape.html). |
 | `"batch_size"` | `integer/"auto"` | Optional. By default, the batch size is derived from the model, defined through the OpenVINO Model Optimizer. `batch_size` is useful for sequential inference requests of the same batch size.Some models, such as object detection, don't work correctly with the `batch_size` parameter. With these models, the output's first dimension doesn't represent the batch size. You can set the batch size for these models by using network reshaping and setting the `shape` parameter appropriately.The default option of using the Model Optimizer to determine the batch size uses the size of the first dimension in the first input for the size. For example, if the input shape is `(1, 3, 225, 225)`, the batch size is set to `1`. If you set `batch_size` to a numerical value, the model batch size is changed when the service starts.`batch_size` also accepts a value of `auto`. If you use `auto`, then the served model batch size is set according to the incoming data at run time. The model is reloaded each time the input data changes the batch size. You might see a delayed response upon the first request. |
 | `"layout" `| `json/string` | `layout` is optional argument which allows to define or change the layout of model input and output tensors. To change the layout (add the transposition step), specify `<target layout>:<source layout>`. Example: `NHWC:NCHW` means that user will send input data in `NHWC` layout while the model is in `NCHW` layout.<br><br>When specified without colon separator, it doesn't add a transposition but can determine the batch dimension. E.g. `--layout CN` makes prediction service treat second dimension as batch size.<br><br>When the model has multiple inputs or the output layout has to be changed, use a json format. Set the mapping, such as: `{"input1":"NHWC:NCHW","input2":"HWN:NHW","output1":"CN:NC"}`.<br><br>If not specified, layout is inherited from model.<br><br> [Read more](shape_batch_size_and_layout.md#changing-model-input-output-layout) |
 | `"model_version_policy"` | `json/string` | Optional. The model version policy lets you decide which versions of a model that the OpenVINO Model Server is to serve. By default, the server serves the latest version. One reason to use this argument is to control the server memory consumption.The accepted format is in json or string. Examples: <br> `{"latest": { "num_versions":2 }` <br> `{"specific": { "versions":[1, 3] } }` <br> `{"all": {} }` |
-| `"plugin_config"` | `json/string` | List of device plugin parameters. For full list refer to [OpenVINO documentation](https://docs.openvino.ai/2024/about-openvino/compatibility-and-support/supported-devices.html) and [performance tuning guide](./performance_tuning.md). Example: <br> `{"PERFORMANCE_HINT": "LATENCY"}` |
+| `"plugin_config"` | `json/string` | List of device plugin parameters. For full list refer to [OpenVINO documentation](https://docs.openvino.ai/2025/documentation/compatibility-and-support/supported-devices.html) and [performance tuning guide](./performance_tuning.md). Example: <br> `{"PERFORMANCE_HINT": "LATENCY"}` |
 | `"nireq"` | `integer` | The size of internal request queue. When set to 0 or no value is set value is calculated automatically based on available resources.|
 | `"target_device"` | `string` | Device name to be used to execute inference operations. Accepted values are: `"CPU"/"GPU"/"MULTI"/"HETERO"` |
 | `"stateful"` | `bool` | If set to true, model is loaded as stateful. |
 | `"idle_sequence_cleanup"` | `bool` | If set to true, model will be subject to periodic sequence cleaner scans. See [idle sequence cleanup](stateful_models.md). |
 | `"max_sequence_number"` | `uint32` | Determines how many sequences can be handled concurrently by a model instance. |
-| `"low_latency_transformation"` | `bool` | If set to true, model server will apply [low latency transformation](https://docs.openvino.ai/2024/openvino-workflow/running-inference/stateful-models/obtaining-stateful-openvino-model.html#lowlatency2-transformation) on model load. |
+| `"low_latency_transformation"` | `bool` | If set to true, model server will apply [low latency transformation](https://docs.openvino.ai/2025/openvino-workflow/running-inference/stateful-models/obtaining-stateful-openvino-model.html#lowlatency2-transformation) on model load. |
 | `"metrics_enable"` | `bool` | Flag enabling [metrics](metrics.md) endpoint on rest_port. |
 | `"metrics_list"` | `string` | Comma separated list of [metrics](metrics.md). If unset, only default metrics will be enabled.|

@@ -44,7 +44,7 @@ Configuration options for the server are defined only via command-line options a
 | `file_system_poll_wait_seconds` | `integer` | Time interval between config and model versions changes detection in seconds. Default value is 1. Zero value disables changes monitoring. |
 | `sequence_cleaner_poll_wait_minutes` | `integer` | Time interval (in minutes) between next sequence cleaner scans. Sequences of the models that are subjects to idle sequence cleanup that have been inactive since the last scan are removed. Zero value disables sequence cleaner. See [idle sequence cleanup](stateful_models.md). It also sets the schedule for releasing free memory from the heap. |
 | `custom_node_resources_cleaner_interval_seconds` | `integer` | Time interval (in seconds) between two consecutive resources cleanup scans. Default is 1. Must be greater than 0. See [custom node development](custom_node_development.md). |
-| `cpu_extension` | `string` | Optional path to a library with [custom layers implementation](https://docs.openvino.ai/2024/documentation/openvino-extensibility.html). |
+| `cpu_extension` | `string` | Optional path to a library with [custom layers implementation](https://docs.openvino.ai/2025/documentation/openvino-extensibility.html). |
 | `log_level` | `"DEBUG"/"INFO"/"ERROR"` | Serving logging level |
 | `log_path` | `string` | Optional path to the log file. |
 | `cache_dir` | `string` | Path to the model cache storage. Caching will be enabled if this parameter is defined or the default path /opt/cache exists |
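
To illustrate how the options above combine on the command line, a minimal sketch that uses the `shape` and `plugin_config` values quoted in the table (model name and path are placeholders):

```console
ovms --model_path model --model_name resnet --port 9000 --shape auto --plugin_config "{\"PERFORMANCE_HINT\": \"LATENCY\"}"
```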

‎docs/performance_tuning.md

+4 −4

@@ -44,7 +44,7 @@ docker run --rm -d --device=/dev/dri --group-add=$(stat -c "%g" /dev/dri/render*

 #### LATENCY
 This mode prioritizes low latency, providing short response time for each inference job. It performs best for tasks where inference is required for a single input image, like a medical analysis of an ultrasound scan image. It also fits the tasks of real-time or nearly real-time applications, such as an industrial robot's response to actions in its environment or obstacle avoidance for autonomous vehicles.
-Note that currently the `PERFORMANCE_HINT` property is supported by CPU and GPU devices only. [More information](https://docs.openvino.ai/2024/openvino-workflow/running-inference/optimize-inference/high-level-performance-hints.html#performance-hints-how-it-works).
+Note that currently the `PERFORMANCE_HINT` property is supported by CPU and GPU devices only. [More information](https://docs.openvino.ai/2025/openvino-workflow/running-inference/optimize-inference/high-level-performance-hints.html#performance-hints-how-it-works).

 To enable Performance Hints for your application, use the following command:

@@ -124,7 +124,7 @@ In case of using CPU plugin to run the inference, it might be also beneficial to
 | ENABLE_CPU_PINNING | This property allows CPU threads pinning during inference. |


-> **NOTE:** For additional information about all parameters read about [OpenVINO device properties](https://docs.openvino.ai/2024/api/c_cpp_api/group__ov__runtime__cpp__prop__api.html).
+> **NOTE:** For additional information about all parameters read about [OpenVINO device properties](https://docs.openvino.ai/2025/api/c_cpp_api/group__ov__runtime__cpp__prop__api.html).

 - Example:
 Following docker command will set `NUM_STREAMS` parameter to a value `1`:
@@ -167,7 +167,7 @@ The default value is 1 second which ensures prompt response to creating new mode

 Depending on the device employed to run the inference operation, you can tune the execution behavior with a set of parameters. Each device is handled by its OpenVINO plugin.

-> **NOTE**: For additional information, read [supported configuration parameters for all plugins](https://docs.openvino.ai/2024/api/c_cpp_api/group__ov__runtime__cpp__prop__api.html).
+> **NOTE**: For additional information, read [supported configuration parameters for all plugins](https://docs.openvino.ai/2025/api/c_cpp_api/group__ov__runtime__cpp__prop__api.html).

 Model's plugin configuration is a dictionary of param:value pairs passed to OpenVINO Plugin on network load. It can be set with `plugin_config` parameter.

@@ -182,7 +182,7 @@ docker run --rm -d -v ${PWD}/models/public/resnet-50-tf:/opt/model -p 9001:9001
 ## Analyzing performance issues

 Recommended steps to investigate achievable performance and discover bottlenecks:
-1. [Launch OV benchmark app](https://docs.openvino.ai/2024/learn-openvino/openvino-samples/benchmark-tool.html)
+1. [Launch OV benchmark app](https://docs.openvino.ai/2025/get-started/learn-openvino/openvino-samples/benchmark-tool.html)

 **Note:** It is useful to drop plugin configuration from benchmark app using `-dump_config` and then use the same plugin configuration in model loaded into OVMS
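
A minimal sketch of the `NUM_STREAMS` example mentioned in the second hunk, shown with the bare-metal binary instead of the docker command from the original file (model name and path are placeholders):

```console
ovms --model_path model --model_name resnet --port 9000 --plugin_config "{\"NUM_STREAMS\": \"1\"}"
```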

‎docs/python_support/reference.md

+1 −1

@@ -947,7 +947,7 @@ That's why converter calculators exists. They work as adapters between nodes and

 #### PyTensorOvTensorConverterCalculator

-OpenVINO Model Server comes with a built-in `PyTensorOvTensorConverterCalculator` that provides conversion between [Python Tensor](#python-tensor) and [OV Tensor](https://docs.openvino.ai/2024/api/c_cpp_api/classov_1_1_tensor.html).
+OpenVINO Model Server comes with a built-in `PyTensorOvTensorConverterCalculator` that provides conversion between [Python Tensor](#python-tensor) and [OV Tensor](https://docs.openvino.ai/2025/api/c_cpp_api/classov_1_1_tensor.html).

 Currently `PyTensorOvTensorConverterCalculator` works with only one input and one output.
 - The stream that expects Python Tensor **must** have tag `OVMS_PY_TENSOR`

‎docs/security_considerations.md

+1 −1

@@ -19,7 +19,7 @@ OpenVINO Model Server currently does not provide access restrictions and traffic

 See also:
 - [Securing OVMS with NGINX](../extras/nginx-mtls-auth/README.md)
-- [Securing models with OVSA](https://docs.openvino.ai/2024/documentation/openvino-ecosystem/openvino-security-add-on.html)
+- [Securing models with OVSA](https://docs.openvino.ai/2025/about-openvino/openvino-ecosystem/openvino-project/openvino-security-add-on.html)

 ---

‎docs/shape_batch_size_and_layout.md

+1 −1

@@ -29,7 +29,7 @@ it ignores the batch_size value.
 - JSON object e.g. `{"input1":"(1,3,224,224)","input2":"(1,3,50,50)"}` - it defines a shape of every included input in the model

 *Note:* Some models do not support the reshape operation. Learn more about supported model graph layers including all limitations
-on [Shape Inference Document](https://docs.openvino.ai/2024/openvino-workflow/running-inference/changing-input-shape.html).
+on [Shape Inference Document](https://docs.openvino.ai/2025/openvino-workflow/running-inference/changing-input-shape.html).
 In case the model can't be reshaped, it will remain in the original parameters and all requests with incompatible input format
 will get an error. The model server will also report such problems in the logs.

‎docs/stateful_models.md

+2 −2

@@ -71,7 +71,7 @@ docker run -d -u $(id -u):$(id -g) -v $(pwd)/rm_lstm4f:/models/stateful_model -v
 | `stateful` | `bool` | If set to true, model is loaded as stateful. | false |
 | `idle_sequence_cleanup` | `bool` | If set to true, model will be subject to periodic sequence cleaner scans. <br> See [idle sequence cleanup](#idle-sequence-cleanup). | true |
 | `max_sequence_number` | `uint32` | Determines how many sequences can be handled concurrently by a model instance. | 500 |
-| `low_latency_transformation` | `bool` | If set to true, model server will apply [low latency transformation](https://docs.openvino.ai/2024/openvino-workflow/running-inference/stateful-models.html) on model load. | false |
+| `low_latency_transformation` | `bool` | If set to true, model server will apply [low latency transformation](https://docs.openvino.ai/2025/openvino-workflow/running-inference/stateful-models.html) on model load. | false |

 **Note:** Setting `idle_sequence_cleanup`, `max_sequence_number` and `low_latency_transformation` require setting `stateful` to true.

@@ -305,7 +305,7 @@ If set to `true` sequence cleaner will check that model. Otherwise, sequence cle
 There are limitations for using stateful models with OVMS:

 - Support inference execution only using CPU as the target device.
-- Support Kaldi models with memory layers and non-Kaldi models with Tensor Iterator. See this [docs about stateful networks](https://docs.openvino.ai/2024/openvino-workflow/running-inference/stateful-models.html) to learn about stateful networks representation in OpenVINO.
+- Support Kaldi models with memory layers and non-Kaldi models with Tensor Iterator. See this [docs about stateful networks](https://docs.openvino.ai/2025/openvino-workflow/running-inference/stateful-models.html) to learn about stateful networks representation in OpenVINO.
 - [Auto batch size and shape](shape_batch_size_and_layout.md) are **not** available in stateful models.
 - Stateful model instances **cannot** be used in [DAGs](dag_scheduler.md).
 - Requests ordering is guaranteed only when a single client sends subsequent requests in a synchronous manner. Concurrent interaction with the same sequence might negatively affect the accuracy of the results.

‎docs/tf_model_binary_input.md

+3 −3

@@ -4,7 +4,7 @@ This guide shows how to convert TensorFlow models and deploy them with the OpenV

 - In this example TensorFlow model [ResNet](https://github.com/tensorflow/models/tree/v2.2.0/official/r1/resnet) will be used.

-- TensorFlow model can be converted into Intermediate Representation format using model_optimizer tool. There are several formats for storing TensorFlow model. In this guide, we present conversion from SavedModel format. More information about conversion process can be found in the [model optimizer guide](https://docs.openvino.ai/2024/openvino-workflow/model-preparation.html).
+- TensorFlow model can be converted into Intermediate Representation format using model_optimizer tool. There are several formats for storing TensorFlow model. In this guide, we present conversion from SavedModel format. More information about conversion process can be found in the [model optimizer guide](https://docs.openvino.ai/2025/openvino-workflow/model-preparation.html).

 - Binary input format has several requirements for the model and ovms configuration. More information can be found in [binary inputs documentation](binary_input.md).
 ## Steps
@@ -29,10 +29,10 @@ docker run -u $(id -u):$(id -g) -v ${PWD}/resnet_v2/:/resnet openvino/ubuntu20_d

 *Note:* Some models might require other parameters such as `--scale` parameter.
 - `--reverse_input_channels` - required for models that are trained with images in RGB order.
-- `--mean_values` , `--scale` - should be provided if input pre-processing operations are not a part of topology- and the pre-processing relies on the application providing input data. They can be determined in several ways described in [conversion parameters guide](https://docs.openvino.ai/2024/openvino-workflow/model-preparation/convert-model-tensorflow.html). In this example [model pre-processing script](https://github.com/tensorflow/models/blob/v2.2.0/official/r1/resnet/imagenet_preprocessing.py) was used to determine them.
+- `--mean_values` , `--scale` - should be provided if input pre-processing operations are not a part of topology- and the pre-processing relies on the application providing input data. They can be determined in several ways described in [conversion parameters guide](https://docs.openvino.ai/2025/openvino-workflow/model-preparation/convert-model-tensorflow.html). In this example [model pre-processing script](https://github.com/tensorflow/models/blob/v2.2.0/official/r1/resnet/imagenet_preprocessing.py) was used to determine them.


-*Note:* You can find out more about [TensorFlow Model conversion into Intermediate Representation](https://docs.openvino.ai/2024/openvino-workflow/model-preparation/convert-model-tensorflow.html) if your model is stored in other formats.
+*Note:* You can find out more about [TensorFlow Model conversion into Intermediate Representation](https://docs.openvino.ai/2025/openvino-workflow/model-preparation/convert-model-tensorflow.html) if your model is stored in other formats.

 This operation will create model files in `${PWD}/resnet_v2/models/resnet/1/` folder.
 ```bash

‎src/custom_nodes/image_transformation/README.md

+3 −3

@@ -48,9 +48,9 @@ make BASE_OS=redhat NODES=image_transformation
 | target_image_color_order | Output image color order. If specified and differs from original_image_color_order, color order conversion will be performed | `BGR` | |
 | original_image_layout | Input image layout. This is required to determine image shape from input shape | | &check; |
 | target_image_layout | Output image layout. If specified and differs from original_image_layout, layout conversion will be performed | | |
-| scale | All values will be divided by this value. When `scale_values` is specified, this value is ignored. [read more](https://docs.openvino.ai/2024/documentation/legacy-features/transition-legacy-conversion-api/legacy-conversion-api/%5Blegacy%5D-embedding-preprocessing-computation.html#specifying-mean-and-scale-values) | | |
-| scale_values | Scale values to be used for the input image per channel. Input data will be divided by those values. Values should be provided in the same order as output image color order. [read more](https://docs.openvino.ai/2024/documentation/legacy-features/transition-legacy-conversion-api/legacy-conversion-api/%5Blegacy%5D-embedding-preprocessing-computation.html#specifying-mean-and-scale-values) | | |
-| mean_values | Mean values to be used for the input image per channel. Values will be subtracted from each input image data value. Values should be provided in the same order as output image color order. [read more](https://docs.openvino.ai/2024/documentation/legacy-features/transition-legacy-conversion-api/legacy-conversion-api/%5Blegacy%5D-embedding-preprocessing-computation.html#specifying-mean-and-scale-values) | | |
+| scale | All values will be divided by this value. When `scale_values` is specified, this value is ignored. [read more](https://docs.openvino.ai/2024/documentation/legacy-features/transition-legacy-conversion-api.html#scale-values) | | |
+| scale_values | Scale values to be used for the input image per channel. Input data will be divided by those values. Values should be provided in the same order as output image color order. [read more](https://docs.openvino.ai/2024/documentation/legacy-features/transition-legacy-conversion-api.html#scale-values) | | |
+| mean_values | Mean values to be used for the input image per channel. Values will be subtracted from each input image data value. Values should be provided in the same order as output image color order. [read more](https://docs.openvino.ai/2024/documentation/legacy-features/transition-legacy-conversion-api.html#mean-values) | | |
 | debug | Defines if debug messages should be displayed | false | |

 > **_NOTE:_** Subtracting mean values is performed before division by scale values.

‎src/example/SampleCpuExtension/README.md

+1 −1

@@ -40,4 +40,4 @@ $ docker run -it --rm -p 9000:9000 -v `pwd`/lib/${BASE_OS}:/extension:ro -v `pwd
 --port 9000 --model_name resnet --model_path /resnet --cpu_extension /extension/libcustom_relu_cpu_extension.so
 ```

-> **NOTE**: Learn more about [OpenVINO extensibility](https://docs.openvino.ai/2024/documentation/openvino-extensibility.html)
+> **NOTE**: Learn more about [OpenVINO extensibility](https://docs.openvino.ai/2025/documentation/openvino-extensibility.html)
