README.md (+3 −3)

@@ -13,17 +13,17 @@ Model Server hosts models and makes them accessible to software components over
 OpenVINO™ Model Server (OVMS) is a high-performance system for serving models. Implemented in C++ for scalability and optimized for deployment on Intel architectures. It uses the same API as [TensorFlow Serving](https://github.com/tensorflow/serving) and [KServe](https://github.com/kserve/kserve) while applying OpenVINO for inference execution. Inference service is provided via gRPC or REST API, making deploying new algorithms and AI experiments easy.
-In addition, there are included endpoints for generative use cases compatible with [OpenAI API and Cohere API](./clients_genai.md).
+In addition, there are included endpoints for generative use cases compatible with [OpenAI API and Cohere API](./docs/clients_genai.md).

 The models used by the server need to be stored locally or hosted remotely by object storage services. For more details, refer to [Preparing Model Repository](docs/models_repository.md) documentation. Model server works inside [Docker containers](docs/deploying_server.md#deploying-model-server-in-docker-container), on [Bare Metal](docs/deploying_server.md#deploying-model-server-on-baremetal-without-container), and in [Kubernetes environment](docs/deploying_server.md#deploying-model-server-in-kubernetes).
-Start using OpenVINO Model Server with a fast-forward serving example from the [QuickStart guide](docs/ovms_quickstart.md) or [LLM QuickStart guide](./llm/quickstart.md).
+Start using OpenVINO Model Server with a fast-forward serving example from the [QuickStart guide](docs/ovms_quickstart.md) or [LLM QuickStart guide](./docs/llm/quickstart.md).

 Read [release notes](https://github.com/openvinotoolkit/model_server/releases) to find out what’s new.

 ### Key features:
-- **[NEW]** Native Windows support. Check updated [deployment guide](./deploying_server.md)
+- **[NEW]** Native Windows support. Check updated [deployment guide](./docs/deploying_server.md)
 - **[NEW]** [Text Embeddings compatible with OpenAI API](demos/embeddings/README.md)
 - **[NEW]** [Reranking compatible with Cohere API](demos/rerank/README.md)
 - **[NEW]** [Efficient Text Generation via OpenAI API](demos/continuous_batching/README.md)
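
The generative endpoints listed above accept OpenAI-style requests. As a minimal sketch, assuming the server was started with `--rest_port 8000` and serves an LLM under the placeholder name `my-llm`, and that the OpenAI-compatible REST prefix is `/v3` as in recent releases, a chat completion call could look like:

```bash
# Hypothetical request: model name, port and path prefix depend on your deployment.
curl -s http://localhost:8000/v3/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "my-llm",
        "messages": [{"role": "user", "content": "What is OpenVINO Model Server?"}],
        "max_tokens": 100
      }'
```

The embeddings and rerank endpoints follow the same pattern with their respective request schemas.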

demos/continuous_batching/speculative_decoding/README.md (+3 −3)

@@ -1,6 +1,6 @@
 # How to serve LLM Models in Speculative Decoding Pipeline{#ovms_demos_continuous_batching_speculative_decoding}

-Following [OpenVINO GenAI docs](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/genai-guide.html#efficient-text-generation-via-speculative-decoding):
+Following [OpenVINO GenAI docs](https://docs.openvino.ai/2025/openvino-workflow-generative/inference-with-genai.html#efficient-text-generation-via-speculative-decoding):
 > Speculative decoding (or assisted-generation) enables faster token generation when an additional smaller draft model is used alongside the main model. This reduces the number of infer requests to the main model, increasing performance.
 >
 > The draft model predicts the next K tokens one by one in an autoregressive manner. The main model validates these predictions and corrects them if necessary - in case of a discrepancy, the main model prediction is used. Then, the draft model acquires this token and runs prediction of the next K tokens, thus repeating the cycle.

@@ -13,7 +13,7 @@ This demo shows how to use speculative decoding in the model serving scenario, b
 **Model preparation**: Python 3.9 or higher with pip and HuggingFace account
-**Model Server deployment**: Installed Docker Engine or OVMS binary package according to the [baremetal deployment guide](../../docs/deploying_server_baremetal.md)
+**Model Server deployment**: Installed Docker Engine or OVMS binary package according to the [baremetal deployment guide](../../../docs/deploying_server_baremetal.md)

 ## Model considerations

@@ -103,7 +103,7 @@ Assuming you have unpacked model server package, make sure to:
 - **On Windows**: run `setupvars` script
 - **On Linux**: set `LD_LIBRARY_PATH` and `PATH` environment variables
-as mentioned in [deployment guide](../../docs/deploying_server_baremetal.md), in every new shell that will start OpenVINO Model Server.
+as mentioned in [deployment guide](../../../docs/deploying_server_baremetal.md), in every new shell that will start OpenVINO Model Server.

 Depending on how you prepared models in the first step of this demo, they are deployed to either CPU or GPU (it's defined in `config.json`). If you run on GPU make sure to have appropriate drivers installed, so the device is accessible for the model server.
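
To make the shell setup above concrete, here is a minimal sketch for a Linux baremetal start; the package layout (`ovms/bin`, `ovms/lib`), the port and the config path are assumptions, while `--rest_port` and `--config_path` are standard server options:

```bash
# Assumed layout of the unpacked OVMS binary package; adjust paths to your environment.
export PATH=$PWD/ovms/bin:$PATH
export LD_LIBRARY_PATH=$PWD/ovms/lib:$LD_LIBRARY_PATH   # on Windows, run the setupvars script instead

# Start the server with the config.json prepared earlier in the demo (path is illustrative).
ovms --rest_port 8000 --config_path ./models/config.json
```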

docs/accelerators.md (+9 −9)

@@ -4,9 +4,9 @@
 Docker engine installed (on Linux and WSL), or ovms binary package installed as described in the [guide](./deploying_server_baremetal.md) (on Linux or Windows).
-Supported HW is documented in [OpenVINO system requirements](https://docs.openvino.ai/2024/about-openvino/release-notes-openvino/system-requirements.html)
+Supported HW is documented in [OpenVINO system requirements](https://docs.openvino.ai/2025/about-openvino/release-notes-openvino/system-requirements.html)
-Before staring the model server as a binary package, make sure there are installed GPU or/and NPU required drivers like described in [https://docs.openvino.ai/2024/get-started/configurations.html](https://docs.openvino.ai/2024/get-started/configurations.html)
+Before starting the model server as a binary package, make sure the required GPU and/or NPU drivers are installed, as described in [https://docs.openvino.ai/2025/get-started/install-openvino/configurations.html](https://docs.openvino.ai/2025/get-started/install-openvino/configurations.html)

 Additional considerations when deploying with docker container:
 - make sure to use the image version including runtime drivers. The public image has a suffix -gpu like `openvino/model_server:latest-gpu`.

@@ -27,7 +27,7 @@ rm model/1/model.tar.gz
 ## Starting Model Server with Intel GPU
-The [GPU plugin](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html) uses the [oneDNN](https://github.com/oneapi-src/oneDNN) and [OpenCL](https://github.com/KhronosGroup/OpenCL-SDK) to infer deep neural networks. For inference execution, it employs Intel® Processor Graphics including
+The [GPU plugin](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html) uses [oneDNN](https://github.com/oneapi-src/oneDNN) and [OpenCL](https://github.com/KhronosGroup/OpenCL-SDK) to infer deep neural networks. For inference execution, it employs Intel® Processor Graphics including
 Intel® Arc™ GPU Series, Intel® UHD Graphics, Intel® HD Graphics, Intel® Iris® Graphics, Intel® Iris® Xe Graphics, and Intel® Iris® Xe MAX graphics and Intel® Data Center GPU.

-Starting the server with GPU acceleration requires installation of runtime drivers and ocl-icd-libopencl1 package like described on [configuration guide](https://docs.openvino.ai/2024/get-started/configurations/configurations-intel-gpu.html)
+Starting the server with GPU acceleration requires installation of runtime drivers and the ocl-icd-libopencl1 package, as described in the [configuration guide](https://docs.openvino.ai/2025/get-started/install-openvino/configurations/configurations-intel-gpu.html)

 Start the model server with GPU acceleration using a command:

-OpenVINO Model Server supports using [NPU device](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/npu-device.html)
+OpenVINO Model Server supports using [NPU device](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/npu-device.html)

 ### Container
 Example command to run container with NPU:

@@ -82,13 +82,13 @@ Start the model server with NPU accelerations using a command:
-Check more info about the [NPU driver configuration](https://docs.openvino.ai/2024/get-started/configurations/configurations-intel-npu.html).
+Check more info about the [NPU driver configuration](https://docs.openvino.ai/2025/get-started/install-openvino/configurations/configurations-intel-npu.html).

 > **NOTE**: NPU device executes models with static input and output shapes only. If your model has dynamic shape, it can be reset to static with parameters `--batch_size` or `--shape`.

 ## Using Heterogeneous Plugin
-The [HETERO plugin](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/hetero-execution.html) makes it possible to distribute inference load of one model
+The [HETERO plugin](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/hetero-execution.html) makes it possible to distribute inference load of one model
 among several computing devices. That way different parts of the deep learning network can be executed by devices best suited to their type of calculations.
 OpenVINO automatically divides the network to optimize the process.

-[Auto Device](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/auto-device-selection.html) (or AUTO in short) is a new special “virtual” or “proxy” device in the OpenVINO toolkit, it doesn’t bind to a specific type of HW device.
+[Auto Device](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/auto-device-selection.html) (or AUTO in short) is a special “virtual” or “proxy” device in the OpenVINO toolkit; it doesn’t bind to a specific type of HW device.
 AUTO removes the need for the application to implement its own logic for HW device selection and for deducing the best optimization settings on that device.
 AUTO always chooses the best device; if compiling the model fails on that device, AUTO tries the next best device until one of them succeeds.

-[Auto Batching](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/automatic-batching.html) (or BATCH in short) is a new special “virtual” device
+[Auto Batching](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/automatic-batching.html) (or BATCH in short) is a special “virtual” device
 which explicitly defines the auto batching.

 It performs automatic batching on-the-fly to improve device utilization by grouping inference requests together, without programming effort from the user.

-Leverage the OpenVINO [model caching](https://docs.openvino.ai/2024/openvino-workflow/running-inference/optimize-inference/optimizing-latency/model-caching-overview.html) feature to speed up subsequent model loading on a target device.
+Leverage the OpenVINO [model caching](https://docs.openvino.ai/2025/openvino-workflow/running-inference/optimize-inference/optimizing-latency/model-caching-overview.html) feature to speed up subsequent model loading on a target device.
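
As a sketch of the GPU deployment this file describes (model name, repository path and port are placeholders), a container is typically started with the render device passed through and the GPU selected as the target device:

```bash
# Illustrative only: the -gpu image variant bundles the runtime drivers mentioned above.
docker run -d --rm \
  --device=/dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \
  -v $PWD/models:/models -p 9000:9000 \
  openvino/model_server:latest-gpu \
  --model_name resnet --model_path /models/resnet \
  --port 9000 --target_device GPU
```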

docs/build_from_source.md (+1 −1)

@@ -143,7 +143,7 @@ make release_image MEDIAPIPE_DISABLE=1 PYTHON_DISABLE=1
 ### `GPU`
-When set to `1`, OpenVINO&trade Model Server will be built with the drivers required by [GPU plugin](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html) support. Default value: `0`.
+When set to `1`, OpenVINO&trade; Model Server will be built with the drivers required by [GPU plugin](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html) support. Default value: `0`.

docs/deploying_server_baremetal.md (+1 −1)

@@ -164,7 +164,7 @@ Learn more about model server [starting parameters](parameters.md).
 > **NOTE**:
 > When serving models on [AI accelerators](accelerators.md), some additional steps may be required to install device drivers and dependencies.
-> Learn more in the [Additional Configurations for Hardware](https://docs.openvino.ai/2024/get-started/configurations.html) documentation.
+> Learn more in the [Additional Configurations for Hardware](https://docs.openvino.ai/2025/get-started/install-openvino/configurations.html) documentation.

 - Intel® Core™ processor (6-13th gen.) or Intel® Xeon® processor (1st to 4th gen.)
 - Linux, macOS or Windows via [WSL](https://docs.microsoft.com/en-us/windows/wsl/)
-- (optional) AI accelerators [supported by OpenVINO](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes.html). Accelerators are tested only on bare-metal Linux hosts.
+- (optional) AI accelerators [supported by OpenVINO](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes.html). Accelerators are tested only on bare-metal Linux hosts.

 ### Launch Model Server Container

@@ -85,4 +85,4 @@ make release_image GPU=1
 It will create an image called `openvino/model_server:latest`.
 > **Note:** This operation might take 40min or more depending on your build host.
 > **Note:** The `GPU` parameter in the image build command is needed to include dependencies for the GPU device.
-> **Note:** The public image from the last release might be not compatible with models exported using the the latest export script. Check the [demo version from the last release](https://github.com/openvinotoolkit/model_server/tree/releases/2024/4/demos/continuous_batching) to use the public docker image.
+> **Note:** The public image from the last release might not be compatible with models exported using the latest export script. We recommend using the export script and Docker image from the same release to avoid compatibility issues.
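
For reference, launching the container mentioned under "Launch Model Server Container" usually amounts to mounting a model repository and pointing the server at one model; a minimal sketch with placeholder names and ports:

```bash
# Placeholder model name/path; a locally built image can replace the public one.
docker run -d --rm -v $PWD/models:/models -p 9000:9000 -p 8000:8000 \
  openvino/model_server:latest \
  --model_name resnet --model_path /models/resnet \
  --port 9000 --rest_port 8000
```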

docs/dynamic_shape_dynamic_model.md (+1 −1)

@@ -8,7 +8,7 @@ Enable dynamic shape by setting the `shape` parameter to range or undefined:
 - `--shape "(1,3,200:500,200:500)"` when model is supposed to support height and width values in a range of 200-500. Note that any dimension can support range of values, height and width are only examples here.

 > Note that some models do not support dynamic dimensions. Learn more about supported model graph layers including all limitations
-on [Shape Inference Document](https://docs.openvino.ai/2024/openvino-workflow/running-inference/changing-input-shape.html).
+on [Shape Inference Document](https://docs.openvino.ai/2025/openvino-workflow/running-inference/changing-input-shape.html).

 Another option to use dynamic shape feature is to export the model with dynamic dimension using Model Optimizer. OpenVINO Model Server will inherit the dynamic shape and no additional settings are needed.
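
To illustrate the `--shape` range syntax above, a server start accepting heights and widths between 200 and 500 could look like this sketch (image, model name and port are placeholders):

```bash
# Illustrative: single-input model reloaded with a dynamic height/width range.
docker run -d --rm -v $PWD/models:/models -p 9000:9000 \
  openvino/model_server:latest \
  --model_name resnet --model_path /models/resnet \
  --shape "(1,3,200:500,200:500)" --port 9000
```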

docs/home.md (+1 −1)

@@ -58,5 +58,5 @@ Start using OpenVINO Model Server with a fast-forward serving example from the [
 * [RAG building blocks made easy and affordable with OpenVINO Model Server](https://medium.com/openvino-toolkit/rag-building-blocks-made-easy-and-affordable-with-openvino-model-server-e7b03da5012b)
 * [Simplified Deployments with OpenVINO™ Model Server and TensorFlow Serving](https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/Simplified-Deployments-with-OpenVINO-Model-Server-and-TensorFlow/post/1353218)
 * [Inference Scaling with OpenVINO™ Model Server in Kubernetes and OpenShift Clusters](https://www.intel.com/content/www/us/en/developer/articles/technical/deploy-openvino-in-openshift-and-kubernetes.html)

 - `optional string device` - device to load models to. Supported values: "CPU", "GPU" [default = "CPU"]
-- `optional string plugin_config` - [OpenVINO device plugin configuration](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes.html). Should be provided in the same format for regular [models configuration](../parameters.md#model-configuration-options) [default = "{}"]
+- `optional string plugin_config` - [OpenVINO device plugin configuration](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes.html). Should be provided in the same format for regular [models configuration](../parameters.md#model-configuration-options) [default = "{}"]
 - `optional uint32 best_of_limit` - max value of best_of parameter accepted by endpoint [default = 20];
 - `optional uint32 max_tokens_limit` - max value of max_tokens parameter accepted by endpoint [default = 4096];

docs/mediapipe.md (+1 −1)

@@ -54,7 +54,7 @@ Check their [documentation](https://github.com/openvinotoolkit/mediapipe/blob/ma
 ## PyTensorOvTensorConverterCalculator
-`PyTensorOvTensorConverterCalculator` enables conversion between nodes that are run by `PythonExecutorCalculator` and nodes that receive and/or produce [OV Tensors](https://docs.openvino.ai/2024/api/c_cpp_api/classov_1_1_tensor.html)
+`PyTensorOvTensorConverterCalculator` enables conversion between nodes that are run by `PythonExecutorCalculator` and nodes that receive and/or produce [OV Tensors](https://docs.openvino.ai/2025/api/c_cpp_api/classov_1_1_tensor.html)

 ## How to create the graph for deployment in OpenVINO Model Server

docs/model_cache.md (+1 −1)

@@ -1,7 +1,7 @@
 # Model Cache {#ovms_docs_model_cache}

 ## Overview
-The Model Server can leverage a [OpenVINO™ model cache functionality](https://docs.openvino.ai/2024/openvino-workflow/running-inference/optimize-inference/optimizing-latency/model-caching-overview.html), to speed up subsequent model loading on a target device.
+The Model Server can leverage the [OpenVINO™ model cache functionality](https://docs.openvino.ai/2025/openvino-workflow/running-inference/optimize-inference/optimizing-latency/model-caching-overview.html) to speed up subsequent model loading on a target device.
 The cached files usually make Model Server initialization faster.
 The boost depends on the model and the target device. The most noticeable improvement will be observed with GPU devices. On other devices, like CPU, you may observe no speed-up or even a slower loading process depending on the model used. Test the setup before final deployment.
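
As a sketch of how the cache is typically enabled (model name, mount paths and the `--cache_dir` location are assumptions to adapt to your deployment):

```bash
# Mount a writable directory and point the model cache at it; the first load populates the cache,
# later restarts reuse the cached blobs (most noticeable on GPU).
docker run -d --rm -v $PWD/models:/models -v $PWD/cache:/opt/cache -p 9000:9000 \
  openvino/model_server:latest \
  --model_name resnet --model_path /models/resnet \
  --cache_dir /opt/cache --port 9000
```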

docs/model_server_c_api.md (+1 −1)

@@ -47,7 +47,7 @@ To execute inference using C API you must follow steps described below.
 Create an inference request using `OVMS_InferenceRequestNew` specifying which servable name and optionally version to use. Then specify input tensors with `OVMS_InferenceRequestAddInput` and set the tensor data using `OVMS_InferenceRequestInputSetData`. Optionally you can also set one or all outputs with `OVMS_InferenceRequestAddOutput` and `OVMS_InferenceRequestOutputSetData`. For asynchronous inference you also have to set a callback with `OVMS_InferenceRequestSetCompletionCallback`.

 #### Using OpenVINO Remote Tensor
-With OpenVINO Model Server C-API you could also leverage the OpenVINO remote tensors support. Check original documentation [here](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device/remote-tensor-api-gpu-plugin.html). In order to use OpenCL buffers you need to first create `cl::Buffer` and then use its pointer in setting input with `OVMS_InferenceRequestInputSetData` or output with `OVMS_InferenceRequestOutputSetData` and buffer type `OVMS_BUFFERTYPE_OPENCL`. In case of VA surfaces you need to create appropriate VA surfaces and then use the same calls with buffer type `OVMS_BUFFERTYPE_VASURFACE_Y` and `OVMS_BUFFERTYPE_VASURFACE_UV`.
+With the OpenVINO Model Server C-API you can also leverage OpenVINO remote tensors support. Check the original documentation [here](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device/remote-tensor-api-gpu-plugin.html). In order to use OpenCL buffers you need to first create a `cl::Buffer` and then use its pointer when setting an input with `OVMS_InferenceRequestInputSetData` or an output with `OVMS_InferenceRequestOutputSetData` and buffer type `OVMS_BUFFERTYPE_OPENCL`. In case of VA surfaces you need to create appropriate VA surfaces and then use the same calls with buffer types `OVMS_BUFFERTYPE_VASURFACE_Y` and `OVMS_BUFFERTYPE_VASURFACE_UV`.

 #### Invoke inference
 Execute inference with OpenVINO Model Server using the `OVMS_Inference` synchronous call. During inference execution you must not modify `OVMS_InferenceRequest` and bound memory buffers.

-OpenVINO Model Server can perform inference using pre-trained models in either [OpenVINO IR](https://docs.openvino.ai/2024/documentation/openvino-ir-format/operation-sets.html)
+OpenVINO Model Server can perform inference using pre-trained models in either [OpenVINO IR](https://docs.openvino.ai/2025/documentation/openvino-ir-format/operation-sets.html)
 , [ONNX](https://onnx.ai/), [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) or [TensorFlow](https://www.tensorflow.org/) format. You can get them by:

 - downloading models from [Open Model Zoo](https://storage.openvinotoolkit.org/repositories/open_model_zoo/)
 - generating the model in a training framework and saving it to a supported format: TensorFlow saved_model, ONNX or PaddlePaddle.
 - downloading the models from models hubs like [Kaggle](https://www.kaggle.com/models) or [ONNX models zoo](https://github.com/onnx/models).
-- converting models from any formats using [conversion tool](https://docs.openvino.ai/2024/openvino-workflow/model-preparation/convert-model-to-ir.html)
+- converting models from any format using the [conversion tool](https://docs.openvino.ai/2025/openvino-workflow/model-preparation/convert-model-to-ir.html)

 This guide uses a [Faster R-CNN with Resnet-50 V1 Object Detection model](https://www.kaggle.com/models/tensorflow/faster-rcnn-resnet-v1/tensorFlow2/faster-rcnn-resnet50-v1-640x640/1) in TensorFlow format.
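
For context on the model repository used here, the server expects one directory per model with numbered version subdirectories; a minimal sketch with placeholder names:

```bash
# Illustrative layout: <repository>/<model_name>/<version>/<model files>
mkdir -p models/faster_rcnn/1
mv model.xml model.bin models/faster_rcnn/1/   # OpenVINO IR; a single .onnx file or a TensorFlow saved_model goes in the same version folder
```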

docs/parameters.md (+4 −4)

@@ -7,17 +7,17 @@
 |---|---|---|
 |`"model_name"/"name"`|`string`| Model name exposed over gRPC and REST API.(use `model_name` in command line, `name` in json config) |
 |`"model_path"/"base_path"`|`string`| If using a Google Cloud Storage, Azure Storage or S3 path, see [cloud storage guide](./using_cloud_storage.md). The path may look as follows:<br>`"/opt/ml/models/model"`<br>`"gs://bucket/models/model"`<br>`"s3://bucket/models/model"`<br>`"azure://bucket/models/model"`<br>The path can be also relative to the config.json location<br>(use `model_path` in command line, `base_path` in json config) |
-| `"shape"` | `tuple/json/"auto"` | `shape` is optional and takes precedence over `batch_size`. The `shape` argument changes the model that is enabled in the model server to fit the parameters. `shape` accepts three forms of the values: * `auto` - The model server reloads the model with the shape that matches the input data matrix. * a tuple, such as `(1,3,224,224)` - The tuple defines the shape to use for all incoming requests for models with a single input. * A dictionary of shapes, such as `{"input1":"(1,3,224,224)","input2":"(1,3,50,50)", "input3":"auto"}` - This option defines the shape of every included input in the model.Some models don't support the reshape operation.If the model can't be reshaped, it remains in the original parameters and all requests with incompatible input format result in an error. See the logs for more information about specific errors.Learn more about supported model graph layers including all limitations at [Shape Inference Document](https://docs.openvino.ai/2024/openvino-workflow/running-inference/changing-input-shape.html). |
+| `"shape"` | `tuple/json/"auto"` | `shape` is optional and takes precedence over `batch_size`. The `shape` argument changes the model that is enabled in the model server to fit the parameters. `shape` accepts three forms of the values: * `auto` - The model server reloads the model with the shape that matches the input data matrix. * a tuple, such as `(1,3,224,224)` - The tuple defines the shape to use for all incoming requests for models with a single input. * A dictionary of shapes, such as `{"input1":"(1,3,224,224)","input2":"(1,3,50,50)", "input3":"auto"}` - This option defines the shape of every included input in the model.Some models don't support the reshape operation.If the model can't be reshaped, it remains in the original parameters and all requests with incompatible input format result in an error. See the logs for more information about specific errors.Learn more about supported model graph layers including all limitations at [Shape Inference Document](https://docs.openvino.ai/2025/openvino-workflow/running-inference/changing-input-shape.html). |
 | `"batch_size"` | `integer/"auto"` | Optional. By default, the batch size is derived from the model, defined through the OpenVINO Model Optimizer. `batch_size` is useful for sequential inference requests of the same batch size.Some models, such as object detection, don't work correctly with the `batch_size` parameter. With these models, the output's first dimension doesn't represent the batch size. You can set the batch size for these models by using network reshaping and setting the `shape` parameter appropriately.The default option of using the Model Optimizer to determine the batch size uses the size of the first dimension in the first input for the size. For example, if the input shape is `(1, 3, 225, 225)`, the batch size is set to `1`. If you set `batch_size` to a numerical value, the model batch size is changed when the service starts.`batch_size` also accepts a value of `auto`. If you use `auto`, then the served model batch size is set according to the incoming data at run time. The model is reloaded each time the input data changes the batch size. You might see a delayed response upon the first request. |
 |`"layout" `|`json/string`|`layout` is optional argument which allows to define or change the layout of model input and output tensors. To change the layout (add the transposition step), specify `<target layout>:<source layout>`. Example: `NHWC:NCHW` means that user will send input data in `NHWC` layout while the model is in `NCHW` layout.<br><br>When specified without colon separator, it doesn't add a transposition but can determine the batch dimension. E.g. `--layout CN` makes prediction service treat second dimension as batch size.<br><br>When the model has multiple inputs or the output layout has to be changed, use a json format. Set the mapping, such as: `{"input1":"NHWC:NCHW","input2":"HWN:NHW","output1":"CN:NC"}`.<br><br>If not specified, layout is inherited from model.<br><br> [Read more](shape_batch_size_and_layout.md#changing-model-input-output-layout)|
 |`"model_version_policy"`|`json/string`| Optional. The model version policy lets you decide which versions of a model that the OpenVINO Model Server is to serve. By default, the server serves the latest version. One reason to use this argument is to control the server memory consumption.The accepted format is in json or string. Examples: <br> `{"latest": { "num_versions":2 }` <br> `{"specific": { "versions":[1, 3] } }` <br> `{"all": {} }`|
-|`"plugin_config"`|`json/string`| List of device plugin parameters. For full list refer to [OpenVINO documentation](https://docs.openvino.ai/2024/about-openvino/compatibility-and-support/supported-devices.html) and [performance tuning guide](./performance_tuning.md). Example: <br> `{"PERFORMANCE_HINT": "LATENCY"}`|
+|`"plugin_config"`|`json/string`| List of device plugin parameters. For full list refer to [OpenVINO documentation](https://docs.openvino.ai/2025/documentation/compatibility-and-support/supported-devices.html) and [performance tuning guide](./performance_tuning.md). Example: <br> `{"PERFORMANCE_HINT": "LATENCY"}`|
 |`"nireq"`|`integer`| The size of internal request queue. When set to 0 or no value is set value is calculated automatically based on available resources.|
 |`"target_device"`|`string`| Device name to be used to execute inference operations. Accepted values are: `"CPU"/"GPU"/"MULTI"/"HETERO"`|
 |`"stateful"`|`bool`| If set to true, model is loaded as stateful. |
 |`"idle_sequence_cleanup"`|`bool`| If set to true, model will be subject to periodic sequence cleaner scans. See [idle sequence cleanup](stateful_models.md). |
 |`"max_sequence_number"`|`uint32`| Determines how many sequences can be handled concurrently by a model instance. |
-|`"low_latency_transformation"`|`bool`| If set to true, model server will apply [low latency transformation](https://docs.openvino.ai/2024/openvino-workflow/running-inference/stateful-models/obtaining-stateful-openvino-model.html#lowlatency2-transformation) on model load. |
+|`"low_latency_transformation"`|`bool`| If set to true, model server will apply [low latency transformation](https://docs.openvino.ai/2025/openvino-workflow/running-inference/stateful-models/obtaining-stateful-openvino-model.html#lowlatency2-transformation) on model load. |
 |`"metrics_enable"`|`bool`| Flag enabling [metrics](metrics.md) endpoint on rest_port. |
 |`"metrics_list"`|`string`| Comma separated list of [metrics](metrics.md). If unset, only default metrics will be enabled.|

@@ -44,7 +44,7 @@ Configuration options for the server are defined only via command-line options a
 |`file_system_poll_wait_seconds`|`integer`| Time interval between config and model versions changes detection in seconds. Default value is 1. Zero value disables changes monitoring. |
 |`sequence_cleaner_poll_wait_minutes`|`integer`| Time interval (in minutes) between next sequence cleaner scans. Sequences of the models that are subjects to idle sequence cleanup that have been inactive since the last scan are removed. Zero value disables sequence cleaner. See [idle sequence cleanup](stateful_models.md). It also sets the schedule for releasing free memory from the heap. |
 |`custom_node_resources_cleaner_interval_seconds`|`integer`| Time interval (in seconds) between two consecutive resources cleanup scans. Default is 1. Must be greater than 0. See [custom node development](custom_node_development.md). |
-|`cpu_extension`|`string`| Optional path to a library with [custom layers implementation](https://docs.openvino.ai/2024/documentation/openvino-extensibility.html). |
+|`cpu_extension`|`string`| Optional path to a library with [custom layers implementation](https://docs.openvino.ai/2025/documentation/openvino-extensibility.html). |

 This mode prioritizes low latency, providing short response time for each inference job. It performs best for tasks where inference is required for a single input image, like a medical analysis of an ultrasound scan image. It also fits the tasks of real-time or nearly real-time applications, such as an industrial robot's response to actions in its environment or obstacle avoidance for autonomous vehicles.
-Note that currently the `PERFORMANCE_HINT` property is supported by CPU and GPU devices only. [More information](https://docs.openvino.ai/2024/openvino-workflow/running-inference/optimize-inference/high-level-performance-hints.html#performance-hints-how-it-works).
+Note that currently the `PERFORMANCE_HINT` property is supported by CPU and GPU devices only. [More information](https://docs.openvino.ai/2025/openvino-workflow/running-inference/optimize-inference/high-level-performance-hints.html#performance-hints-how-it-works).

 To enable Performance Hints for your application, use the following command:

@@ -124,7 +124,7 @@ In case of using CPU plugin to run the inference, it might be also beneficial to
 | ENABLE_CPU_PINNING | This property allows CPU threads pinning during inference. |

-> **NOTE:** For additional information about all parameters read about [OpenVINO device properties](https://docs.openvino.ai/2024/api/c_cpp_api/group__ov__runtime__cpp__prop__api.html).
+> **NOTE:** For additional information about all parameters read about [OpenVINO device properties](https://docs.openvino.ai/2025/api/c_cpp_api/group__ov__runtime__cpp__prop__api.html).

 - Example:
 Following docker command will set `NUM_STREAMS` parameter to a value `1`:

@@ -167,7 +167,7 @@ The default value is 1 second which ensures prompt response to creating new mode
 Depending on the device employed to run the inference operation, you can tune the execution behavior with a set of parameters. Each device is handled by its OpenVINO plugin.

-> **NOTE**: For additional information, read [supported configuration parameters for all plugins](https://docs.openvino.ai/2024/api/c_cpp_api/group__ov__runtime__cpp__prop__api.html).
+> **NOTE**: For additional information, read [supported configuration parameters for all plugins](https://docs.openvino.ai/2025/api/c_cpp_api/group__ov__runtime__cpp__prop__api.html).

 Model's plugin configuration is a dictionary of param:value pairs passed to OpenVINO Plugin on network load. It can be set with `plugin_config` parameter.

 Recommended steps to investigate achievable performance and discover bottlenecks:
-1. [Launch OV benchmark app](https://docs.openvino.ai/2024/learn-openvino/openvino-samples/benchmark-tool.html)
+1. [Launch OV benchmark app](https://docs.openvino.ai/2025/get-started/learn-openvino/openvino-samples/benchmark-tool.html)

 **Note:** It is useful to drop plugin configuration from benchmark app using `-dump_config` and then use the same plugin configuration in model loaded into OVMS
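
As a sketch of the `NUM_STREAMS` example mentioned above (image, model name and ports are placeholders), plugin parameters are passed as a JSON string through `--plugin_config`:

```bash
# Illustrative: one inference stream for latency-oriented serving; other properties
# such as PERFORMANCE_HINT can be set in the same JSON string.
docker run -d --rm -v $PWD/models:/models -p 9000:9000 \
  openvino/model_server:latest \
  --model_name resnet --model_path /models/resnet \
  --plugin_config '{"NUM_STREAMS": "1"}' --port 9000
```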

docs/python_support/reference.md (+1 −1)

@@ -947,7 +947,7 @@ That's why converter calculators exists. They work as adapters between nodes and
 #### PyTensorOvTensorConverterCalculator
-OpenVINO Model Server comes with a built-in `PyTensorOvTensorConverterCalculator` that provides conversion between [Python Tensor](#python-tensor) and [OV Tensor](https://docs.openvino.ai/2024/api/c_cpp_api/classov_1_1_tensor.html).
+OpenVINO Model Server comes with a built-in `PyTensorOvTensorConverterCalculator` that provides conversion between [Python Tensor](#python-tensor) and [OV Tensor](https://docs.openvino.ai/2025/api/c_cpp_api/classov_1_1_tensor.html).

 Currently `PyTensorOvTensorConverterCalculator` works with only one input and one output.
 - The stream that expects Python Tensor **must** have tag `OVMS_PY_TENSOR`

 |`stateful`|`bool`| If set to true, model is loaded as stateful. | false |
 |`idle_sequence_cleanup`|`bool`| If set to true, model will be subject to periodic sequence cleaner scans. <br> See [idle sequence cleanup](#idle-sequence-cleanup). | true |
 |`max_sequence_number`|`uint32`| Determines how many sequences can be handled concurrently by a model instance. | 500 |
-|`low_latency_transformation`|`bool`| If set to true, model server will apply [low latency transformation](https://docs.openvino.ai/2024/openvino-workflow/running-inference/stateful-models.html) on model load. | false |
+|`low_latency_transformation`|`bool`| If set to true, model server will apply [low latency transformation](https://docs.openvino.ai/2025/openvino-workflow/running-inference/stateful-models.html) on model load. | false |

 **Note:** Setting `idle_sequence_cleanup`, `max_sequence_number` and `low_latency_transformation` require setting `stateful` to true.

@@ -305,7 +305,7 @@ If set to `true` sequence cleaner will check that model. Otherwise, sequence cle
 There are limitations for using stateful models with OVMS:

 - Support inference execution only using CPU as the target device.
-- Support Kaldi models with memory layers and non-Kaldi models with Tensor Iterator. See this [docs about stateful networks](https://docs.openvino.ai/2024/openvino-workflow/running-inference/stateful-models.html) to learn about stateful networks representation in OpenVINO.
+- Support Kaldi models with memory layers and non-Kaldi models with Tensor Iterator. See the [docs about stateful networks](https://docs.openvino.ai/2025/openvino-workflow/running-inference/stateful-models.html) to learn about stateful networks representation in OpenVINO.
 - [Auto batch size and shape](shape_batch_size_and_layout.md) are **not** available in stateful models.
 - Stateful model instances **cannot** be used in [DAGs](dag_scheduler.md).
 - Requests ordering is guaranteed only when a single client sends subsequent requests in a synchronous manner. Concurrent interaction with the same sequence might negatively affect the accuracy of the results.
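
To tie the stateful options above together, a config.json entry could look like the sketch below; the model name and path are placeholders, and the three tuning options only take effect with `stateful` set to true:

```bash
# Illustrative stateful entry in config.json (written via a heredoc for convenience).
cat > config.json <<'EOF'
{
  "model_config_list": [
    {
      "config": {
        "name": "stateful_model",
        "base_path": "/models/stateful_model",
        "stateful": true,
        "idle_sequence_cleanup": true,
        "max_sequence_number": 500,
        "low_latency_transformation": true
      }
    }
  ]
}
EOF
```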

docs/tf_model_binary_input.md (+3 −3)

@@ -4,7 +4,7 @@ This guide shows how to convert TensorFlow models and deploy them with the OpenV
 - In this example TensorFlow model [ResNet](https://github.com/tensorflow/models/tree/v2.2.0/official/r1/resnet) will be used.

-- TensorFlow model can be converted into Intermediate Representation format using model_optimizer tool. There are several formats for storing TensorFlow model. In this guide, we present conversion from SavedModel format. More information about conversion process can be found in the [model optimizer guide](https://docs.openvino.ai/2024/openvino-workflow/model-preparation.html).
+- A TensorFlow model can be converted into Intermediate Representation format using the model_optimizer tool. There are several formats for storing a TensorFlow model. In this guide, we present conversion from the SavedModel format. More information about the conversion process can be found in the [model optimizer guide](https://docs.openvino.ai/2025/openvino-workflow/model-preparation.html).

 - Binary input format has several requirements for the model and ovms configuration. More information can be found in [binary inputs documentation](binary_input.md).

 *Note:* Some models might require other parameters such as `--scale` parameter.
 - `--reverse_input_channels` - required for models that are trained with images in RGB order.
-- `--mean_values` , `--scale` - should be provided if input pre-processing operations are not a part of topology- and the pre-processing relies on the application providing input data. They can be determined in several ways described in [conversion parameters guide](https://docs.openvino.ai/2024/openvino-workflow/model-preparation/convert-model-tensorflow.html). In this example [model pre-processing script](https://github.com/tensorflow/models/blob/v2.2.0/official/r1/resnet/imagenet_preprocessing.py) was used to determine them.
+- `--mean_values`, `--scale` - should be provided if input pre-processing operations are not a part of the topology and the pre-processing relies on the application providing input data. They can be determined in several ways described in the [conversion parameters guide](https://docs.openvino.ai/2025/openvino-workflow/model-preparation/convert-model-tensorflow.html). In this example the [model pre-processing script](https://github.com/tensorflow/models/blob/v2.2.0/official/r1/resnet/imagenet_preprocessing.py) was used to determine them.

-*Note:* You can find out more about [TensorFlow Model conversion into Intermediate Representation](https://docs.openvino.ai/2024/openvino-workflow/model-preparation/convert-model-tensorflow.html) if your model is stored in other formats.
+*Note:* You can find out more about [TensorFlow Model conversion into Intermediate Representation](https://docs.openvino.ai/2025/openvino-workflow/model-preparation/convert-model-tensorflow.html) if your model is stored in other formats.

 This operation will create model files in `${PWD}/resnet_v2/models/resnet/1/` folder.
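
As a sketch of the conversion step described above (flags should be verified against the Model Optimizer version in use; the mean values are the ImageNet means applied by the referenced pre-processing script):

```bash
# Illustrative conversion of the ResNet SavedModel into OpenVINO IR with embedded pre-processing.
mo --saved_model_dir resnet_v2 \
   --reverse_input_channels \
   --mean_values "[123.68,116.78,103.94]" \
   --output_dir ${PWD}/resnet_v2/models/resnet/1/
```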

src/custom_nodes/image_transformation/README.md (+3 −3)

@@ -48,9 +48,9 @@ make BASE_OS=redhat NODES=image_transformation
 | target_image_color_order | Output image color order. If specified and differs from original_image_color_order, color order conversion will be performed |`BGR`||
 | original_image_layout | Input image layout. This is required to determine image shape from input shape ||✓|
 | target_image_layout | Output image layout. If specified and differs from original_image_layout, layout conversion will be performed |||
-| scale | All values will be divided by this value. When `scale_values` is specified, this value is ignored. [read more](https://docs.openvino.ai/2024/documentation/legacy-features/transition-legacy-conversion-api/legacy-conversion-api/%5Blegacy%5D-embedding-preprocessing-computation.html#specifying-mean-and-scale-values)|||
-| scale_values | Scale values to be used for the input image per channel. Input data will be divided by those values. Values should be provided in the same order as output image color order. [read more](https://docs.openvino.ai/2024/documentation/legacy-features/transition-legacy-conversion-api/legacy-conversion-api/%5Blegacy%5D-embedding-preprocessing-computation.html#specifying-mean-and-scale-values)|||
-| mean_values | Mean values to be used for the input image per channel. Values will be subtracted from each input image data value. Values should be provided in the same order as output image color order. [read more](https://docs.openvino.ai/2024/documentation/legacy-features/transition-legacy-conversion-api/legacy-conversion-api/%5Blegacy%5D-embedding-preprocessing-computation.html#specifying-mean-and-scale-values)|||
+| scale | All values will be divided by this value. When `scale_values` is specified, this value is ignored. [read more](https://docs.openvino.ai/2024/documentation/legacy-features/transition-legacy-conversion-api.html#scale-values)|||
+| scale_values | Scale values to be used for the input image per channel. Input data will be divided by those values. Values should be provided in the same order as output image color order. [read more](https://docs.openvino.ai/2024/documentation/legacy-features/transition-legacy-conversion-api.html#scale-values)|||
+| mean_values | Mean values to be used for the input image per channel. Values will be subtracted from each input image data value. Values should be provided in the same order as output image color order. [read more](https://docs.openvino.ai/2024/documentation/legacy-features/transition-legacy-conversion-api.html#mean-values)|||
 | debug | Defines if debug messages should be displayed | false ||

 > **_NOTE:_** Subtracting mean values is performed before division by scale values.