How to use TensorRT C++ API for high performance GPU machine-learning inference.
Supports models with single / multiple inputs and single / multiple outputs with batching.
Project Overview Video
Code Deep-Dive Video
I read all the NVIDIA TensorRT docs so that you don't have to!
This project demonstrates how to use the TensorRT C++ API for high performance GPU inference on image data. It covers how to do the following:
- How to install TensorRT 8 on Ubuntu 20.04.
- How to generate a TRT engine file optimized for your GPU.
- How to specify a simple optimization profile.
- How to read / write data from / into GPU memory and work with GPU images.
- How to use a CUDA stream to run asynchronous inference and synchronize later.
- How to work with models with static and dynamic batch sizes.
- New: Supports models with multiple output tensors (and even works with batching).
- New: Supports models with multiple inputs.
- New: New video walkthrough where I explain every line of code.
- The code can be used as a base for many models, including Insightface ArcFace, YoloV7, SCRFD face detection, and many other single / multiple input - single / multiple output models. You will just need to implement the appropriate post-processing code.
- TODO: Add support for models with dynamic input shapes.
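As an illustration of the kind of post-processing mentioned above, here is a minimal sketch for a face-recognition model such as ArcFace, where two output feature vectors are typically compared using cosine similarity. The helper names `l2Normalize` and `cosineSimilarity` are hypothetical and are not part of this project; the sketch assumes the feature vectors have already been copied back to host memory as `std::vector<float>`.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of this project): scale a vector to unit L2 norm.
std::vector<float> l2Normalize(const std::vector<float>& v) {
    double sumSq = 0.0;
    for (float x : v) {
        sumSq += static_cast<double>(x) * x;
    }
    const double norm = std::sqrt(sumSq);
    if (norm == 0.0) {
        throw std::invalid_argument("cannot normalize a zero vector");
    }
    std::vector<float> out;
    out.reserve(v.size());
    for (float x : v) {
        out.push_back(static_cast<float>(x / norm));
    }
    return out;
}

// Hypothetical helper: cosine similarity of two embeddings of equal length.
// Returns a value in [-1, 1]; values near 1 mean the embeddings point the same way.
float cosineSimilarity(const std::vector<float>& a, const std::vector<float>& b) {
    if (a.empty() || a.size() != b.size()) {
        throw std::invalid_argument("embeddings must be non-empty and the same length");
    }
    const std::vector<float> na = l2Normalize(a);
    const std::vector<float> nb = l2Normalize(b);
    double dot = 0.0;
    for (std::size_t i = 0; i < na.size(); ++i) {
        dot += static_cast<double>(na[i]) * nb[i];
    }
    return static_cast<float>(dot);
}
```

What similarity threshold counts as a match depends on the model and the application, so it is left out of the sketch.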
The following instructions assume you are using Ubuntu 20.04. You will need to supply your own onnx model for this sample code, or you can download the sample model (see Sanity Check section below).
- Tested and working on Ubuntu 20.04
- Install CUDA, instructions here.
- Recommended >= 11.8
- Install cuDNN, instructions here.
- Recommended >= 8
sudo apt install build-essential
sudo snap install cmake --classic
- Install OpenCV with CUDA support. To compile OpenCV from source, run the `build_opencv.sh` script provided in `./scripts/`.
- If you use the provided script and you have installed cuDNN to a non-standard location, you must modify the `CUDNN_INCLUDE_DIR` and `CUDNN_LIBRARY` variables in the script.
- Recommended >= 4.8
- Download TensorRT 8 from here.
- Recommended >= 8.6
- Required >= 8.6
- Navigate to the `CMakeLists.txt` file and replace the `TODO` with the path to your TensorRT installation.
mkdir build
cd build
cmake ..
make -j$(nproc)
- Navigate to the build directory
- Run the executable and provide the path to your onnx model.
- ex. `./run_inference_benchmark ../models/arcfaceresnet100-8.onnx`
- Note: See sanity check section below for instructions on how to obtain the arcface model.
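Since the executable expects the model path as its first argument, it can help to validate that argument before attempting an engine build. The following is only a sketch of such a check using the standard library; `hasOnnxExtension`, `canOpenForReading`, and `checkModelArgument` are hypothetical names and do not reflect how the project itself parses its arguments.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: true when `path` ends in ".onnx" and has a non-empty stem.
bool hasOnnxExtension(const std::string& path) {
    const std::string ext = ".onnx";
    return path.size() > ext.size() &&
           path.compare(path.size() - ext.size(), ext.size(), ext) == 0;
}

// Hypothetical helper: true when the path can be opened for reading.
bool canOpenForReading(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    return file.good();
}

// Returns an empty string when the first argument looks usable,
// otherwise a message describing the problem.
std::string checkModelArgument(int argc, const char* const* argv) {
    if (argc < 2) {
        return "usage: run_inference_benchmark <path to onnx model>";
    }
    const std::string path = argv[1];
    if (!hasOnnxExtension(path)) {
        return "expected a file ending in .onnx: " + path;
    }
    if (!canOpenForReading(path)) {
        return "could not open model file: " + path;
    }
    return "";
}
```

A `main` function could call `checkModelArgument(argc, argv)` first and print the returned message before exiting if it is not empty.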
- To perform a sanity check, download the following ArcFace model from here and place it in the `./models/` directory.
- Running inference using said model and the image located in `./inputs/face_chip.jpg` should produce the following feature vector:
-0.0548096 -0.0994873 0.176514 0.161377 0.226807 0.215942 -0.296143 -0.0601807 0.240112 -0.18457 ...
- Note: The feature vector will not be identical (but very similar) as TensorRT is not deterministic.
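Because the output will not match the reference values exactly, an automated sanity check should compare with a tolerance rather than for equality. Below is a minimal sketch of such a comparison against the first ten reference values listed above. The helper name `approxEqual` and the default tolerance of `1e-2` are assumptions chosen for illustration; pick a tolerance that suits your GPU and precision mode.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: true when both vectors have the same length and every
// pair of elements differs by at most `tolerance` (the default is an assumption).
bool approxEqual(const std::vector<float>& a, const std::vector<float>& b,
                 float tolerance = 1e-2f) {
    if (a.size() != b.size()) {
        return false;
    }
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::fabs(a[i] - b[i]) > tolerance) {
            return false;
        }
    }
    return true;
}

// First ten values of the reference ArcFace feature vector from this README.
const std::vector<float> kReferencePrefix = {
    -0.0548096f, -0.0994873f, 0.176514f, 0.161377f,   0.226807f,
    0.215942f,   -0.296143f,  -0.0601807f, 0.240112f, -0.18457f};
```

To use it, copy the first ten elements of your own output into a `std::vector<float>` and pass it to `approxEqual` together with `kReferencePrefix`.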
Wondering how to integrate this library into your project? Or perhaps how to read the outputs to extract meaningful information? If so, check out my newest project, YOLOv8-TensorRT-CPP, which demonstrates how to use the TensorRT C++ API to run YoloV8 inference (supports segmentation). It makes use of this project in the backend!
- The bulk of the implementation is in `src/engine.cpp`. I have written lots of comments all throughout the code which should make it easy to understand what is going on.
- You can also check out my deep-dive video in which I explain every line of code.
- If you have issues creating the TensorRT engine file from the onnx model, navigate to `src/engine.cpp` and change the log level by changing the severity level to `kVERBOSE`, then rebuild and rerun. This should give you more information on where exactly the build process is failing.
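To show why raising the level to `kVERBOSE` produces more output, here is a small standalone sketch of severity filtering. The `Severity` enum below is a stand-in that mirrors the ordering of `nvinfer1::ILogger::Severity` (lower values are more severe) so that the example compiles without TensorRT. `SketchLogger` is a hypothetical class, and the actual logger in `src/engine.cpp` may be written differently.

```cpp
#include <iostream>
#include <string>

// Stand-in for nvinfer1::ILogger::Severity so the sketch builds without TensorRT.
// Lower values are more severe, matching TensorRT's ordering.
enum class Severity : int {
    kINTERNAL_ERROR = 0,
    kERROR = 1,
    kWARNING = 2,
    kINFO = 3,
    kVERBOSE = 4
};

// Hypothetical logger: prints a message only when it is at least as severe
// as the configured reportable level.
class SketchLogger {
public:
    explicit SketchLogger(Severity reportable) : reportable_(reportable) {}

    bool shouldLog(Severity severity) const {
        return static_cast<int>(severity) <= static_cast<int>(reportable_);
    }

    void log(Severity severity, const std::string& message) const {
        if (shouldLog(severity)) {
            std::cerr << message << '\n';
        }
    }

private:
    Severity reportable_;
};
```

With the reportable level set to `Severity::kWARNING`, informational and verbose messages are dropped; with `Severity::kVERBOSE`, every message passes the filter.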
If this project was helpful to you, I would appreciate it if you could give it a star. That will encourage me to keep it up to date and solve issues quickly. I also do consulting work if you require more specific help. Connect with me on LinkedIn.
V3.0
- Implementation has been updated to use the TensorRT 8.6 API (ex. `IExecutionContext::enqueueV3()`).
- Executable has been renamed from `driver` to `run_inference_benchmark` and must now be passed the path to the onnx model as a command line argument.
- Removed `Options.doesSupportDynamicBatchSize`. Implementation now auto-detects supported batch sizes.
- Removed `Options.maxWorkspaceSize`. Implementation no longer limits GPU memory during model construction, allowing it to use as much of the memory pool as is available for intermediate layers.
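One way batch size support can be detected is from the input tensor's dimensions: TensorRT reports a dimension that is only known at runtime as `-1`, so a first dimension of `-1` indicates a dynamic batch size, while any other value means the batch size is fixed. The sketch below illustrates that idea using a plain `std::vector` in place of TensorRT's dimension type. The helper names are hypothetical, and the project's own detection logic may differ.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: true when the first (batch) dimension is dynamic,
// which TensorRT reports as -1.
bool hasDynamicBatchSize(const std::vector<std::int32_t>& inputDims) {
    return !inputDims.empty() && inputDims[0] == -1;
}

// Hypothetical helper: the largest batch size that can be used.
// A dynamic batch dimension allows up to `requestedMax`; a fixed one
// allows only the size baked into the model.
std::int32_t maxUsableBatchSize(const std::vector<std::int32_t>& inputDims,
                                std::int32_t requestedMax) {
    if (inputDims.empty()) {
        return 0;
    }
    return hasDynamicBatchSize(inputDims) ? requestedMax : inputDims[0];
}
```

For example, input dimensions of `{-1, 3, 112, 112}` with a requested maximum of 16 allow batches of up to 16, while `{1, 3, 112, 112}` allows only a batch of 1.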
V2.2
- Serialize model name as part of engine file.
V2.1
- Added support for models with multiple inputs. Implementation now supports models with single inputs, multiple inputs, single outputs, multiple outputs, and batching.
V2.0
- Requires OpenCV cuda to be installed. To install, follow instructions here.
- `Options.optBatchSizes` has been removed, replaced by `Options.optBatchSize`.
- Support models with more than a single output (ex. SCRFD).
- Added support for models which do not support batch inference (first input dimension is fixed).
- More error checking.
- Fixed a bunch of common issues people were running into with the original V1.0 version.
- Remove whitespace from GPU device name
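Stripping whitespace from the GPU device name matters when the name is used as part of a file name, since spaces in paths are awkward to handle on the command line. The sketch below shows one way to do this with the standard library. The `makeEngineName` helper and its `<model>.<device>.engine` naming scheme are purely hypothetical and are not the format the project actually writes.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Remove every whitespace character from a string (for example a GPU device name).
std::string removeWhitespace(std::string s) {
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char c) { return std::isspace(c) != 0; }),
            s.end());
    return s;
}

// Hypothetical naming scheme for illustration only: <model>.<device>.engine
std::string makeEngineName(const std::string& modelName, const std::string& deviceName) {
    return modelName + "." + removeWhitespace(deviceName) + ".engine";
}
```

For instance, a device name such as "NVIDIA GeForce RTX 3080" becomes "NVIDIAGeForceRTX3080" before being embedded in the file name.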