
Example inference script for TensorRT (Python)

This is an example inference script for TensorRT.
I verified it in the following environment.

Preparation

create ONNX model

I created model/model_bn.onnx. The model was generated by following the steps in
https://github.com/NVIDIA-AI-IOT/jetson_dla_tutorial
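
For reference, the export step looks roughly like the sketch below. This is only a minimal sketch: the placeholder network is hypothetical and stands in for the trained model from the tutorial, but the input shape (1x3x32x32) and the tensor names "input"/"output" match what the inference script expects.

import torch
import torch.nn as nn

# Hypothetical stand-in for the trained model from jetson_dla_tutorial;
# the real model_bn architecture comes from that tutorial.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
).eval()

dummy_input = torch.zeros(1, 3, 32, 32)  # CIFAR-10-sized input (assumed)
torch.onnx.export(
    model,
    dummy_input,
    "model/model_bn.onnx",
    input_names=["input"],    # names referenced later by set_tensor_address()
    output_names=["output"],
)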

install cuda-python

I installed cuda-python to use the CUDA Toolkit from Python.

pip install cuda-python==12.2.0
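
A quick way to confirm the installation works (optional, not part of this repository) is to query the device count; each cudart call returns its error code as the first element of the result tuple.

from cuda import cudart

# Query the number of visible CUDA devices as a smoke test.
err, count = cudart.cudaGetDeviceCount()
assert err == cudart.cudaError_t.cudaSuccess
print(f"CUDA devices: {count}")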

Build TensorRT Engine

Build the engine with trtexec.

trtexec --verbose --profilingVerbosity=detailed --buildOnly --memPoolSize=workspace:8192MiB --onnx=model/model_bn.onnx --saveEngine=model/model_bn.onnx.engine > model_bn.onnx.engine.build.log

If you use the DLA (Deep Learning Accelerator), add the --useDLACore option (together with --allowGPUFallback so that layers unsupported on the DLA run on the GPU).

trtexec --verbose --profilingVerbosity=detailed --buildOnly --memPoolSize=workspace:8192MiB --onnx=model/model_bn.onnx --saveEngine=model/model_bn.onnx.engine --useDLACore=0 --allowGPUFallback > model_bn.onnx.engine.build.log
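
trtexec is the route used in this repository. Roughly the same build can also be done from Python with the TensorRT builder API; the following is only a minimal sketch of that alternative (GPU build, no DLA), not code from this repository.

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse the ONNX model into the TensorRT network definition.
with open("model/model_bn.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse ONNX model")

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 8192 << 20)  # 8192 MiB workspace

# Build and save the serialized engine.
engine_bytes = builder.build_serialized_network(network, config)
with open("model/model_bn.onnx.engine", "wb") as f:
    f.write(engine_bytes)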

Inference

I created trt_infer.py to run inference with the TensorRT engine.

import TensorRT and cuda-python

import tensorrt as trt
from cuda import cudart

deserialize TensorRT Engine

logger = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(logger)
trt_engine_file = 'model/model_bn.onnx.engine'
with open(trt_engine_file, 'rb') as f:
    engine_bytes = f.read()
    engine = runtime.deserialize_cuda_engine(engine_bytes)

create context

context = engine.create_execution_context()

inference

context.set_tensor_address("input", d_input_npa_ptr)
context.set_tensor_address("output", d_output_npa_ptr)
context.execute_async_v3(stream)
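
The three lines above assume that device buffers and a CUDA stream already exist. A minimal sketch of that setup with cuda-python follows; it reuses the engine and context objects from the previous steps and assumes float32 tensors, so details may differ from the repository's trt_infer.py.

import numpy as np
from cuda import cudart

# Create a CUDA stream; every cudart call returns (error_code, result...).
err, stream = cudart.cudaStreamCreate()

# Host buffers sized from the engine's tensor shapes (float32 assumed).
h_input = np.zeros(tuple(engine.get_tensor_shape("input")), dtype=np.float32)
h_output = np.zeros(tuple(engine.get_tensor_shape("output")), dtype=np.float32)

# Matching device buffers.
err, d_input_npa_ptr = cudart.cudaMalloc(h_input.nbytes)
err, d_output_npa_ptr = cudart.cudaMalloc(h_output.nbytes)

# Copy input host -> device, run inference, copy output device -> host.
cudart.cudaMemcpyAsync(d_input_npa_ptr, h_input.ctypes.data, h_input.nbytes,
                       cudart.cudaMemcpyKind.cudaMemcpyHostToDevice, stream)
context.set_tensor_address("input", d_input_npa_ptr)
context.set_tensor_address("output", d_output_npa_ptr)
context.execute_async_v3(stream)
cudart.cudaMemcpyAsync(h_output.ctypes.data, d_output_npa_ptr, h_output.nbytes,
                       cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost, stream)
cudart.cudaStreamSynchronize(stream)
print(h_output)

# Release device resources.
cudart.cudaFree(d_input_npa_ptr)
cudart.cudaFree(d_output_npa_ptr)
cudart.cudaStreamDestroy(stream)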

Result

All three backends produce nearly identical outputs for the same input:

ONNX Runtime (CPUExecutionProvider)

$ python3 ort_infer.py
[[-0.00903677 -0.01994101 -0.00086907  0.00596721  0.01973673  0.00928676
  -0.03634664  0.02087523  0.02591487  0.00518102]]
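
The ONNX Runtime numbers above come from ort_infer.py in this repository. A minimal equivalent is sketched below (not the actual script; the dummy input and its shape are assumptions):

import numpy as np
import onnxruntime as ort

# Run the ONNX model on the CPU execution provider.
session = ort.InferenceSession("model/model_bn.onnx",
                               providers=["CPUExecutionProvider"])
x = np.zeros((1, 3, 32, 32), dtype=np.float32)  # dummy input (shape assumed)
outputs = session.run(None, {"input": x})
print(outputs[0])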

TensorRT (without DLA)

$ python3 trt_infer.py
[08/28/2023-14:46:48] [TRT] [I] Loaded engine size: 6 MiB
[08/28/2023-14:46:49] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +5, now: CPU 0, GPU 5 (MiB)
[08/28/2023-14:46:49] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +1, now: CPU 0, GPU 6 (MiB)
[[-0.0090362  -0.01994125 -0.00086519  0.0059704   0.01974285  0.00928105
  -0.03634467  0.02087708  0.0259134   0.00517914]]

TensorRT (with DLA)

$ python3 trt_infer.py
[08/28/2023-14:49:32] [TRT] [I] Loaded engine size: 3 MiB
[08/28/2023-14:49:32] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +3, GPU +0, now: CPU 3, GPU 0 (MiB)
[08/28/2023-14:49:32] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 3, GPU 0 (MiB)
[[-0.0090332  -0.01992798 -0.00086403  0.00596619  0.01971436  0.00928497
  -0.03634644  0.02087402  0.02590942  0.00518417]]

