V2XFusion

V2XFusion is a multi-sensor fusion 3D object detection solution for roadside scenarios. By exploiting the relatively concentrated distribution of object heights in roadside scenes and the richer information provided by multi-sensor fusion, V2XFusion achieves strong performance on the DAIR-V2X-I dataset. This repository contains the training, quantization, and ONNX export code for V2XFusion. The pictures below show the camera and LiDAR installation locations and the characteristics of roadside scenarios.

Architecture

The architecture diagram uses a batch size of 4 as an example. Each batch consists of one camera image and the corresponding LiDAR data.

Git clone and apply patch

Clone the specified version of BEVFusion, copy the required BEVHeight files into it, and then apply our patch package:

# Clone BEVFusion at the pinned commit
git clone https://github.com/mit-han-lab/bevfusion
cd bevfusion
git checkout db75150717a9462cb60241e36ba28d65f6908607
cd ..

# Clone BEVHeight and copy its evaluators and scripts into BEVFusion
git clone https://github.com/ADLab-AutoDrive/BEVHeight.git
cd BEVHeight
cp -r evaluators ../bevfusion/
cp -r scripts ../bevfusion/
cd ..

# Copy the V2XFusion patch package over the BEVFusion tree
cp -r Lidar_AI_Solution/CUDA-V2XFusion/* bevfusion/

Prerequisites

Quick Start

1. Prepare Dataset

Convert the dataset format from DAIR-V2X-I to KITTI. Please refer to this document provided by BEVHeight.

2. Configure dataset paths

Set dataset_root and dataset_kitti_root in configs/V2X-I/default.yaml to the paths where the DAIR-V2X-I and DAIR-V2X-I-KITTI datasets are located.
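For example, the relevant entries in default.yaml might look like the following (the paths are placeholders; substitute your local dataset locations):

dataset_root: /path/to/DAIR-V2X-I
dataset_kitti_root: /path/to/DAIR-V2X-I-KITTI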

3. Training and PTQ

We provide two training mechanisms: sparsity training (2:4 structured sparsity) and dense training. 2:4 structured sparsity usually yields a significant performance improvement on GPUs that support sparsity; it is a hardware-accelerated technique and the one we recommend.
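To illustrate what 2:4 structured sparsity does to the weights, here is a minimal NumPy sketch (an illustration only, not the pruning code used by this repo): within every contiguous group of four weights, the two smallest-magnitude entries are zeroed, a pattern that sparsity-capable GPUs can skip at inference time.

import numpy as np

def prune_2_of_4(weights):
    """Zero the 2 smallest-magnitude values in every group of 4 weights."""
    groups = weights.reshape(-1, 4).copy()
    # Indices of the two smallest |w| within each group of four.
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    np.put_along_axis(groups, drop, 0.0, axis=1)
    return groups.reshape(weights.shape)

w = np.random.randn(8, 8).astype(np.float32)
w_24 = prune_2_of_4(w)
# Every group of 4 now has at most 2 non-zero entries.
assert ((w_24.reshape(-1, 4) != 0).sum(axis=1) <= 2).all()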

For sparsity training and PTQ:

  • Training
    $ torchpack dist-run -np 8 python scripts/train.py configs/V2X-I/det/centerhead/lssfpn/camera+pointpillar/resnet34/default.yaml --mode sparsity
  • PTQ
    $ python scripts/ptq_v2xfusion.py configs/V2X-I/det/centerhead/lssfpn/camera+pointpillar/resnet34/default.yaml sparsity_epoch_100.pth --mode sparsity

For dense training and PTQ:

  • Training
    $ torchpack dist-run -np 8 python scripts/train.py configs/V2X-I/det/centerhead/lssfpn/camera+pointpillar/resnet34/default.yaml --mode dense
  • PTQ
    $ python scripts/ptq_v2xfusion.py configs/V2X-I/det/centerhead/lssfpn/camera+pointpillar/resnet34/default.yaml dense_epoch_100.pth --mode dense

4. Export ONNX

We provide two export configurations: FP16 (--precision fp16) and INT8 (--precision int8). The FP16 configuration gives sub-optimal TensorRT performance but the best prediction accuracy, and the exported ONNX file contains no Q/DQ nodes. The INT8 configuration gives optimal TensorRT performance but sub-optimal prediction accuracy compared to FP16; its quantization parameters are embedded as Q/DQ nodes in the exported ONNX file. A quick way to verify which variant you exported is sketched after the commands below.

  • FP16
    $ python scripts/export_v2xfusion.py configs/V2X-I/det/centerhead/lssfpn/camera+pointpillar/resnet34/default.yaml ptq.pth --precision fp16
  • INT8
    $ python scripts/export_v2xfusion.py configs/V2X-I/det/centerhead/lssfpn/camera+pointpillar/resnet34/default.yaml ptq.pth --precision int8
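To double-check which variant a given file is, you can count Q/DQ nodes with the onnx Python package; a minimal sketch (the file name is a placeholder):

import onnx

model = onnx.load("v2xfusion.onnx")  # placeholder: path to your exported file
# QuantizeLinear/DequantizeLinear nodes only appear in the INT8 export.
qdq = sum(node.op_type in ("QuantizeLinear", "DequantizeLinear")
          for node in model.graph.node)
print(f"Q/DQ nodes found: {qdq}")  # 0 for FP16, > 0 for INT8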

5. Deployment on DeepStream

Please refer to the V2XFusion inference sample provided by DeepStream. The configuration parameters can be adjusted during training to adapt to different datasets, especially the dbound parameter. Before running inference, confirm that the parameters in the DeepStream precomputation script match the training setup.

image_size = [864, 1536]      # input resolution (H, W)
downsample_factor = 16        # backbone output stride
C = 80                        # feature channels
xbound = [0, 102.4, 0.8]      # BEV x range: [min, max, step] in meters
ybound = [-51.2, 51.2, 0.8]   # BEV y range: [min, max, step] in meters
zbound = [-5, 3, 8]           # BEV z range: [min, max, step] in meters (one bin)
dbound = [-2, 0, 90]          # height-bin bounds (V2XFusion discretizes height, not depth)
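As a quick sanity check, the grid sizes implied by the bounds above follow from the usual [min, max, step] convention of LSS-style view transformers; a minimal sketch (dbound is omitted here because it follows this repo's height-bin convention rather than a step size):

def num_bins(bound):
    lo, hi, step = bound
    return round((hi - lo) / step)

print(num_bins([0, 102.4, 0.8]),     # xbound  -> 128
      num_bins([-51.2, 51.2, 0.8]),  # ybound  -> 128
      num_bins([-5, 3, 8]))          # zbound  -> 1
print(864 // 16, 1536 // 16)         # image feature map H, W -> 54, 96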

Experimental Results

  • DAIR-V2X-I Dataset

| Method | Sparsity/Dense | FP16/INT8 | Car (Easy / Mod. / Hard) | Pedestrian (Easy / Mod. / Hard) | Cyclist (Easy / Mod. / Hard) | model pth |
| --- | --- | --- | --- | --- | --- | --- |
| V2XFusion | Sparsity | FP16 | 82.08 / 69.70 / 69.76 | 51.51 / 49.15 / 49.54 | 61.21 / 58.07 / 58.65 | model |
| V2XFusion | Sparsity | INT8-PTQ | 82.06 / 69.70 / 69.75 | 51.13 / 48.86 / 49.22 | 60.95 / 57.81 / 58.43 | ptq.pth |
| V2XFusion | Dense | FP16 | 82.30 / 69.84 / 69.90 | 49.47 / 47.31 / 47.60 | 59.09 / 58.29 / 58.70 | model |
| V2XFusion | Dense | INT8-PTQ | 82.33 / 69.88 / 69.94 | 48.60 / 46.55 / 46.82 | 59.28 / 58.12 / 58.52 | ptq.pth |

    V2XFusion Backbone: ResNet34 + PointPillars

    Note:
To make the model more robust, the sequence dataset V2X-Seq-SPD can also be added in the training phase. We provide a pre-trained model (dbound = [-1.5, 3.0, 180]) that is used in the DeepStream demo for reference. Please refer to the dataset_merge script for merging the data.

    Performance on Jetson Orin

| FP16 Dense (ms) | FP16 Sparsity (ms) | INT8 Dense (ms) | INT8 Sparsity (ms) |
| --- | --- | --- | --- |
| 49.3 | 42.2 | 31.6 | 25.3 |
    • Device: NVIDIA Jetson AGX Orin Developer Kit (MAXN power mode)
• Version: JetPack 6.0 DP
    • Modality: Camera + LiDAR
    • Image Resolution: 864 x 1536
    • Batch size: 4
