This repository contains the source code and models for BEVFusion inference using CUDA & TensorRT.
- For all models, we used the BEVFusion-Base configuration.
- The camera resolution is 256x704.
- For the camera backbone, we chose SwinTiny and ResNet50.
Model | Framework | Precision | mAP | NDS | FPS |
---|---|---|---|---|---|
Swin-Tiny | PyTorch | FP32+FP16 | 68.52 | 71.38 | 8.4 (on RTX 3090) |
ResNet50 | PyTorch | FP32+FP16 | 67.93 | 70.97 | - |
ResNet50 | TensorRT | FP16 | 67.89 | 70.98 | 18 (on Orin) |
ResNet50-PTQ | TensorRT | FP16+INT8 | 67.66 | 70.81 | 25 (on Orin) |
- Note: the FPS reported on Orin is the average over the 6019 nuScenes validation samples, since the number of lidar points is the main factor affecting FPS. Please refer to the README of 3DSparseConvolution for more details.
- For quick practice, we provide example data from nuScenes. You can download it from ( Google Drive ) or ( Baidu Drive ). It contains the following:
  - Camera images in 6 directions.
  - Transformation matrices of camera/lidar/ego.
  - example-data.pth, used by bevfusion-pytorch so that ONNX export works without depending on the full dataset (a small loading sketch follows these download notes).
- All models (model.zip) can be downloaded from ( Google Drive ) or ( Baidu Drive ). It contains the following:
  - swin-tiny ONNX models.
  - resnet50 ONNX and PyTorch models.
  - resnet50 INT8 ONNX and PTQ models.
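If you want to inspect the exported example tensors, here is a minimal sketch, assuming example-data.pth is an ordinary PyTorch checkpoint (the exact keys and structure are not documented here):

```python
import torch

# Load the example data used for ONNX export; the structure of the saved
# object is an assumption, so we simply print what we find.
data = torch.load("example-data/example-data.pth", map_location="cpu")

if isinstance(data, dict):
    for key, value in data.items():
        shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
        print(key, shape)
else:
    print(type(data))
```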
To build BEVFusion, the following dependencies are required (a quick version check is sketched after the list):
- CUDA >= 11.0
- CUDNN >= 8.2
- TensorRT >= 8.5.0
- libprotobuf-dev == 3.6.1
- Compute Capability >= sm_80
- Python >= 3.6
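A minimal sanity check of the toolchain versions, assuming the tools are already on your PATH (commands and paths may differ on your system):

```bash
# Verify the core dependencies before building.
nvcc --version                                              # CUDA >= 11.0
python3 -c "import tensorrt; print(tensorrt.__version__)"   # TensorRT >= 8.5.0
python3 -c "import onnx; print(onnx.__version__)"           # onnx python package
protoc --version                                            # libprotobuf 3.6.1
```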
The performance data in the table above was measured on the NVIDIA Orin platform with TensorRT 8.6, CUDA 11.4, and cuDNN 8.6.
- Note: please pull this repository with `git clone --recursive` to ensure the integrity of the dependencies.
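For example, assuming this project lives inside NVIDIA's Lidar_AI_Solution repository (adjust the URL to wherever you obtained the code):

```bash
# Clone with submodules so that all dependencies are pulled in.
git clone --recursive https://github.com/NVIDIA-AI-IOT/Lidar_AI_Solution.git
cd Lidar_AI_Solution/CUDA-BEVFusion
```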
- download model.zip from ( Google Drive ) or ( Baidu Drive )
- download nuScenes-example-data.zip from ( Google Drive ) or ( Baidu Drive )
```bash
# download the models and example data into CUDA-BEVFusion
cd CUDA-BEVFusion

# unzip the models and example data
unzip model.zip
unzip nuScenes-example-data.zip
```

The directory structure after unzipping:
```
CUDA-BEVFusion
|-- example-data
|   |-- 0-FRONT.jpg
|   |-- 1-FRONT_RIGHT.jpg
|   |-- ...
|   |-- camera_intrinsics.tensor
|   |-- ...
|   |-- example-data.pth
|   `-- points.tensor
|-- src
|-- qat
|-- model
|   |-- resnet50int8
|   |   |-- bevfusion_ptq.pth
|   |   |-- camera.backbone.onnx
|   |   |-- camera.vtransform.onnx
|   |   |-- default.yaml
|   |   |-- fuser.onnx
|   |   |-- head.bbox.onnx
|   |   `-- lidar.backbone.xyz.onnx
|   |-- resnet50
|   `-- swint
|-- bevfusion
`-- tool
```
- Install the Python dependency libraries:

```bash
apt install libprotobuf-dev
pip install onnx
```
- Modify the TensorRT/CUDA/CUDNN/BEVFusion variable values in the tool/environment.sh file (an example layout for Jetson Orin follows the listing).
```bash
# change the path to the directory you are currently using
export TensorRT_Lib=/path/to/TensorRT/lib
export TensorRT_Inc=/path/to/TensorRT/include
export TensorRT_Bin=/path/to/TensorRT/bin

export CUDA_Lib=/path/to/cuda/lib64
export CUDA_Inc=/path/to/cuda/include
export CUDA_Bin=/path/to/cuda/bin
export CUDA_HOME=/path/to/cuda

export CUDNN_Lib=/path/to/cudnn/lib

# resnet50/resnet50int8/swint
export DEBUG_MODEL=resnet50int8

# fp16/int8
export DEBUG_PRECISION=int8
export DEBUG_DATA=example-data
export USE_Python=OFF
```
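For example, on a Jetson Orin with a default JetPack install, the values typically look like the sketch below (an assumption; verify the paths against your own system):

```bash
# Typical JetPack layout: TensorRT/cuDNN under /usr/lib/aarch64-linux-gnu, CUDA under /usr/local/cuda.
export TensorRT_Lib=/usr/lib/aarch64-linux-gnu
export TensorRT_Inc=/usr/include/aarch64-linux-gnu
export TensorRT_Bin=/usr/src/tensorrt/bin

export CUDA_Lib=/usr/local/cuda/lib64
export CUDA_Inc=/usr/local/cuda/include
export CUDA_Bin=/usr/local/cuda/bin
export CUDA_HOME=/usr/local/cuda

export CUDNN_Lib=/usr/lib/aarch64-linux-gnu
```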
- Apply the environment to the current terminal:

```bash
. tool/environment.sh
```
- Build the TensorRT engines:

```bash
bash tool/build_trt_engine.sh
```
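The script converts the ONNX files of the selected DEBUG_MODEL into TensorRT engines. Conceptually this is similar to invoking trtexec by hand, as in the hedged sketch below (the actual flags and output paths are chosen by the script and may differ):

```bash
# Illustration only: build one engine from one of the exported ONNX files.
trtexec --onnx=model/resnet50int8/camera.backbone.onnx \
        --saveEngine=model/resnet50int8/camera.backbone.plan \
        --fp16 --int8
```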
- Compile and run the program:

```bash
bash tool/run.sh
```
- For more details, please refer here.
- Modify `USE_Python=ON` in environment.sh to enable compilation of the Python interface.
- Run `bash tool/run.sh` to build libpybev.so.
- Run `python tool/pybev.py` to test the Python interface.
- Use the following commands to check out a specific commit, to avoid failures caused by upstream changes:

```bash
git clone https://github.com/mit-han-lab/bevfusion
cd bevfusion
git checkout db75150717a9462cb60241e36ba28d65f6908607
```
- The number of lidar points fluctuates considerably from frame to frame, which has a significant impact on FPS.
- Consider using the ground-removal or range-filter algorithms provided in cuPCL to reduce the lidar inference time (see the range-filter sketch after these notes).
- We implemented only the recommended partial quantization. Users can further reduce inference latency with sparse pruning and 2:4 structured sparsity.
- For the resnet50 model at large resolutions, the `--sparsity=force` option can significantly improve inference performance. For more details, please refer to ASP (automatic sparsity tools).
- In general, the camera backbone has less impact on accuracy and more impact on latency.
- A lighter camera backbone (such as resnet34) will achieve lower latency.
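As a rough illustration of what a range filter does (a plain NumPy sketch, not the cuPCL API), points far from the ego vehicle are dropped before inference to reduce the lidar workload:

```python
import numpy as np

def range_filter(points: np.ndarray, min_r: float = 1.0, max_r: float = 60.0) -> np.ndarray:
    """Keep points whose horizontal distance from the ego vehicle lies in [min_r, max_r]."""
    r = np.linalg.norm(points[:, :2], axis=1)
    return points[(r >= min_r) & (r <= max_r)]

# Stand-in point cloud: N x 4 array of (x, y, z, intensity). In practice,
# points.tensor is loaded by the repository's own tooling.
points = (np.random.rand(100000, 4).astype(np.float32) - 0.5) * 150.0
filtered = range_filter(points)
print(f"{len(points)} -> {len(filtered)} points after range filtering")
```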