The Inference Engine can be extended by creating custom kernels for network layers. This is useful when the graph includes layers or operations that are not supported by default by the device plugin.
The implementation of such custom layers can be included in OpenVINO™ Model Server to handle inference requests.
The process of creating the extension is documented on docs.openvinotoolkit.org.
The extension should be compiled as a separate library and copied to the host running OpenVINO Model Server.
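As an illustration only, the build step might look like the sketch below on a host with the OpenVINO™ toolkit installed; the source file name, installation paths, and output location are assumptions and should be adjusted to your environment and to the build instructions referenced above:

```bash
# Illustrative build of a CPU extension library; paths and file names are assumptions
source /opt/intel/openvino/bin/setupvars.sh
g++ -std=c++11 -fPIC -shared ext_custom_layer.cpp \
    -I"${INTEL_OPENVINO_DIR}/deployment_tools/inference_engine/include" \
    -L"${INTEL_OPENVINO_DIR}/deployment_tools/inference_engine/lib/intel64" \
    -linference_engine \
    -o libcpu_extension.so
# Place the library in a directory that will be mounted into the OVMS container, e.g. /models/
cp libcpu_extension.so /models/
```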
OVMS will look for the extension library in the path defined by the environment variable CPU_EXTENSION. Without this variable, only the standard set of layers will be supported.
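For example, when the server is started directly on the host (outside Docker), the variable can be exported before launching the service; the library path, model name, and model path below are illustrative:

```bash
# Illustrative example; adjust the library path, model name and model path to your setup
export CPU_EXTENSION=/opt/intel/cpu_extensions/libcpu_extension.so
ie_serving model --model_path /models/my_model --model_name my_model --port 9001
```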
Note: The Docker image with OpenVINO Model Server does not include all the tools and sub-components needed to compile the extension library, so you might need to execute this process on a separate host.
Once the CPU extension is compiled, you can attach it to the Docker container running OpenVINO Model Server and reference it by setting its path, like in the example below:
```bash
docker run --rm -d -v /models/:/opt/ml:ro -p 9001:9001 \
  --env CPU_EXTENSION=/opt/ml/libcpu_extension.so \
  ie-serving-py:latest /ie-serving-py/start_server.sh \
  ie_serving config --config_path /opt/ml/config.json --port 9001
```
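After the container starts, a quick sanity check (purely illustrative, using standard Docker commands) is to list the running container and inspect its logs to confirm the server started with the configured extension path:

```bash
# Illustrative check: find the running container and inspect its logs
docker ps --filter "ancestor=ie-serving-py:latest"
docker logs $(docker ps -q --filter "ancestor=ie-serving-py:latest")
```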