title	titleSuffix	description	services	ms.service	ms.subservice	ms.topic	ms.author	author	ms.reviewer	ms.date	ms.custom
ONNX models: Optimize inference	Azure Machine Learning	Learn how using the Open Neural Network Exchange (ONNX) can help optimize the inference of your machine learning model.	machine-learning	machine-learning	core	conceptual	osomorog	abeomor	mopeakande	11/04/2022	seodec18

ONNX and Azure Machine Learning: Create and accelerate ML models

Learn how using the Open Neural Network Exchange (ONNX) can help optimize the inference of your machine learning model. Inference, or model scoring, is the phase where the deployed model is used for prediction, most commonly on production data.

Optimizing machine learning models for inference is difficult because you need to tune both the model and the inference library to make the most of the hardware capabilities. The problem becomes much harder if you want optimal performance on different kinds of platforms (cloud or edge, CPU or GPU, and so on), since each one has different capabilities and characteristics. The complexity increases further if you have models from a variety of frameworks that need to run on a variety of platforms, and optimizing every combination of framework and hardware is very time-consuming. What's needed is a way to train once in your preferred framework and run anywhere on the cloud or edge. This is where ONNX comes in.

Microsoft and a community of partners created ONNX as an open standard for representing machine learning models. Models from many frameworks, including TensorFlow, PyTorch, scikit-learn, Keras, Chainer, MXNet, MATLAB, and SparkML, can be exported or converted to the standard ONNX format. Once the models are in the ONNX format, they can be run on a variety of platforms and devices.

ONNX Runtime is a high-performance inference engine for deploying ONNX models to production. It's optimized for both cloud and edge and works on Linux, Windows, and Mac. Written in C++, it also has C, Python, C#, Java, and JavaScript (Node.js) APIs for use in a variety of environments. ONNX Runtime supports both deep neural network (DNN) and traditional ML models, and it integrates with accelerators on different hardware, such as TensorRT on NVIDIA GPUs, OpenVINO on Intel processors, DirectML on Windows, and more. By using ONNX Runtime, you benefit from its extensive production-grade optimizations, testing, and ongoing improvements.

ONNX Runtime is used in high-scale Microsoft services such as Bing, Office, and Azure Cognitive Services. Performance gains depend on a number of factors, but these Microsoft services have seen an average 2x performance gain on CPU. In addition to Azure Machine Learning services, ONNX Runtime also runs in other products that support machine learning workloads, including:

Windows: The runtime is built into Windows as part of Windows Machine Learning and runs on hundreds of millions of devices.
Azure SQL product family: Run native scoring on data in Azure SQL Edge and Azure SQL Managed Instance.
ML.NET: Run ONNX models in ML.NET.

Get ONNX models

You can obtain ONNX models in several ways:

Train a new ONNX model in Azure Machine Learning (see the examples at the bottom of this article) or by using automated machine learning capabilities.
Convert an existing model from another format to ONNX (see the tutorials).
Get a pre-trained ONNX model from the ONNX Model Zoo.
Generate a customized ONNX model from the Azure Custom Vision service.

Many models, including image classification, object detection, and text processing models, can be represented as ONNX models. If a model can't be converted successfully, file an issue in the GitHub repository of the converter that you used. You can continue using the model in its existing format until the issue is addressed.

Deploy ONNX models in Azure

With Azure Machine Learning, you can deploy, manage, and monitor your ONNX models. Using the standard deployment workflow and ONNX Runtime, you can create a REST endpoint hosted in the cloud. See example Jupyter notebooks at the end of this article to try it out for yourself.

Install and use ONNX Runtime with Python

Python packages for ONNX Runtime are available on PyPI (CPU, GPU). Read the system requirements before installation.

To install ONNX Runtime for Python, use one of the following commands:

pip install onnxruntime       # CPU build
pip install onnxruntime-gpu   # GPU build

To call ONNX Runtime in your Python script, use:

import onnxruntime
session = onnxruntime.InferenceSession("path to model")
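Because the CPU and GPU builds expose different hardware backends, it can help to choose the backends explicitly when creating the session. The following is a minimal sketch, not the only way to do this: `pick_providers` and `PREFERRED` are hypothetical names introduced here for illustration, and the preference order is an assumption. The provider strings are the names ONNX Runtime uses for its CUDA and CPU execution providers.

```python
from typing import Iterable, List, Sequence

# Assumed preference order: try the GPU (CUDA) backend first, then fall back to CPU.
PREFERRED = ("CUDAExecutionProvider", "CPUExecutionProvider")


def pick_providers(available: Iterable[str],
                   preferred: Sequence[str] = PREFERRED) -> List[str]:
    """Return the preferred providers that are actually available, in order."""
    available_set = set(available)
    chosen = [name for name in preferred if name in available_set]
    if not chosen:
        raise RuntimeError("None of the preferred execution providers is available")
    return chosen


# Usage, assuming onnxruntime is installed and "model.onnx" is a placeholder path:
#   import onnxruntime
#   providers = pick_providers(onnxruntime.get_available_providers())
#   session = onnxruntime.InferenceSession("model.onnx", providers=providers)
```

With the CPU build installed, this sketch would select only the CPU provider; with the GPU build on a machine where CUDA is usable, it would list the CUDA provider first and keep CPU as a fallback.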

The documentation accompanying the model usually tells you the inputs and outputs for using the model. You can also use a visualization tool such as Netron to view the model. ONNX Runtime also lets you query the model metadata, inputs, and outputs:

session.get_modelmeta()
first_input_name = session.get_inputs()[0].name
first_output_name = session.get_outputs()[0].name
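To inspect every input and output rather than only the first one, the same calls can be wrapped in a small helper. This is a sketch under the assumption that `session` is an `onnxruntime.InferenceSession`; `describe_io` is a hypothetical helper name, and it relies only on the `name`, `type`, and `shape` attributes of the entries returned by `get_inputs()` and `get_outputs()`.

```python
from typing import Any, Dict, List


def describe_io(session: Any) -> Dict[str, List[dict]]:
    """Summarize the name, element type, and shape of each model input and output."""
    def summarize(nodes):
        # Dynamic dimensions may appear as strings or None inside the shape list.
        return [{"name": n.name, "type": n.type, "shape": list(n.shape)} for n in nodes]

    return {
        "inputs": summarize(session.get_inputs()),
        "outputs": summarize(session.get_outputs()),
    }


# Usage, assuming onnxruntime is installed and "model.onnx" is a placeholder path:
#   import onnxruntime
#   session = onnxruntime.InferenceSession("model.onnx")
#   print(describe_io(session))
```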

To run inference on your model, call run and pass in the list of outputs you want returned (leave it empty if you want all of them) and a map of the input values. The result is a list of the outputs.

results = session.run(["output1", "output2"], {
                      "input1": indata1, "input2": indata2})
results = session.run([], {"input1": indata1, "input2": indata2})
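Because run returns a plain list, it can be convenient to pair each result with its output name. The following is a minimal sketch, assuming `session` is an `onnxruntime.InferenceSession` and that the input values are already in the form the model expects (typically NumPy arrays); `run_named` is a hypothetical helper name introduced for illustration.

```python
from typing import Any, Dict, Mapping


def run_named(session: Any, input_feed: Mapping[str, Any]) -> Dict[str, Any]:
    """Run inference on all outputs and return the results keyed by output name."""
    # Request every output explicitly so the result order matches the names.
    output_names = [output.name for output in session.get_outputs()]
    results = session.run(output_names, dict(input_feed))
    return dict(zip(output_names, results))


# Usage, with indata1 and indata2 prepared as in the snippet above:
#   named = run_named(session, {"input1": indata1, "input2": indata2})
#   first = named["output1"]
```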

For the complete Python API reference, see the ONNX Runtime reference docs.

Examples

See how-to-use-azureml/deployment/onnx for example Python notebooks that create and deploy ONNX models.
