
Lidar AI Solution

This repository provides highly optimized solutions for self-driving 3D lidar perception. It significantly speeds up sparse convolution, CenterPoint, BEVFusion, OSD, and conversion workloads.


Pipeline overview

(pipeline overview diagram)

Getting Started

$ git clone --recursive https://github.com/NVIDIA-AI-IOT/Lidar_AI_Solution
$ cd Lidar_AI_Solution
  • For each specific task, please refer to the README in the corresponding sub-folder.

3D Sparse Convolution

A tiny inference engine for 3D sparse convolutional networks using INT8/FP16 (a conceptual sketch of the sparse-convolution data flow follows the feature list).

  • Tiny Engine: Tiny Lidar-Backbone inference engine independent of TensorRT.
  • Flexible: Build execution graph from ONNX.
  • Easy To Use: Simple interface and ONNX export solution.
  • High Fidelity: Low accuracy drop on nuScenes validation.
  • Low Memory: 422MB@SCN FP16, 426MB@SCN INT8.
  • Compact: Based on CUDA kernels and independent of CUTLASS.
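
Sparse convolution engines of this kind typically use a gather-GEMM-scatter formulation: a rule book derived from the voxel coordinates records, for each kernel offset, which input voxel contributes to which output voxel. The scalar sketch below only shows that data flow; the actual kernels in this repository run batched FP16/INT8 GEMMs, and all names here are illustrative.

#include <cuda_runtime.h>

// Conceptual sketch of the gather-GEMM-scatter pattern behind sparse 3D
// convolution. A precomputed "rule book" lists, for each kernel offset, the
// (input voxel, output voxel) pairs it connects. Illustrative only.
__global__ void sparse_conv_apply_rules(const float* in_feats,   // [N_in,  C_in]
                                        float* out_feats,        // [N_out, C_out]
                                        const float* weights,    // [K, C_in, C_out]
                                        const int2* rules,       // (input idx, output idx) pairs
                                        const int* rule_kernel,  // kernel offset of each pair
                                        int num_rules, int c_in, int c_out) {
    int r  = blockIdx.x;      // one block per rule (voxel pair)
    int co = threadIdx.x;     // one thread per output channel
    if (r >= num_rules || co >= c_out) return;

    int2 pair = rules[r];
    const float* w = weights + rule_kernel[r] * c_in * c_out;

    float acc = 0.f;
    for (int ci = 0; ci < c_in; ++ci)
        acc += in_feats[pair.x * c_in + ci] * w[ci * c_out + co];

    // Several rules may write to the same output voxel, hence the atomic add.
    atomicAdd(&out_feats[pair.y * c_out + co], acc);
}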

CUDA BEVFusion

CUDA & TensorRT solution for BEVFusion inference, including:

  • Camera Encoder: ResNet50 and fine-tuned BEV pooling with TensorRT and ONNX export solution.
  • Lidar Encoder: Tiny Lidar-Backbone inference independent of TensorRT, with ONNX export solution.
  • Feature Fusion: Camera & Lidar feature fuser with TensorRT and ONNX export solution (a conceptual fusion sketch follows this list).
  • Pre/Postprocess: Interval precomputing, lidar voxelization, and feature decoding with CUDA kernels.
  • Easy To Use: Preparation, inference, and evaluation in one place to reproduce the accuracy of the PyTorch implementation.
  • PTQ: Quantization solutions for mmdet3d/spconv, easy to understand.
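
As a very rough illustration of what the fusion stage consumes, the sketch below concatenates a camera BEV feature map with a lidar BEV feature map along the channel dimension in CHW layout. This is a conceptual example only; the actual fuser in this repository is a fusion network executed by TensorRT, and all names below are illustrative.

#include <cuda_runtime.h>

// Conceptual sketch: channel-wise concatenation of camera and lidar BEV
// features (CHW layout) before a fusion network. Illustrative only.
__global__ void concat_bev_features(const float* cam, int cam_ch,
                                    const float* lidar, int lidar_ch,
                                    float* fused, int height, int width) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;   // column
    int y = blockIdx.y * blockDim.y + threadIdx.y;   // row
    if (x >= width || y >= height) return;

    int hw  = height * width;
    int pix = y * width + x;

    // Camera channels first, then lidar channels, into the fused tensor.
    for (int c = 0; c < cam_ch; ++c)
        fused[c * hw + pix] = cam[c * hw + pix];
    for (int c = 0; c < lidar_ch; ++c)
        fused[(cam_ch + c) * hw + pix] = lidar[c * hw + pix];
}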

CUDA CenterPoint

CUDA & TensorRT solution for CenterPoint inference, including:

  • Preprocess: Voxelization with CUDA kernels (a minimal voxelization sketch follows this list).
  • Encoder: 3D backbone with NV spconv-scn and ONNX export solution.
  • Neck & Header: RPN & CenterHead with TensorRT and ONNX export solution.
  • Postprocess: Decoding & NMS with CUDA kernels.
  • Easy To Use: Preparation, inference, and evaluation in one place to reproduce the accuracy of the PyTorch implementation.
  • QAT: Quantization solutions for traveller59/spconv, easy to understand.
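
A minimal sketch of the voxelization idea, assuming a dense output grid: one thread takes one point, computes its voxel coordinate, and accumulates the point into that voxel with atomics so a later pass can average. The real preprocessing kernels in this repository are more elaborate (voxel count limits, feature layouts, hash-based indexing), and all names below are illustrative.

#include <cuda_runtime.h>

// Conceptual voxelization sketch: one thread per point, scatter into a dense
// voxel grid with atomics. Illustrative only.
struct VoxelGrid {
    float x_min, y_min, z_min;   // lower bound of the grid
    float vx, vy, vz;            // voxel sizes
    int   nx, ny, nz;            // grid resolution
};

__global__ void voxelize(const float4* points, int num_points, VoxelGrid g,
                         float4* voxel_sum, int* voxel_count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= num_points) return;

    float4 p = points[i];        // x, y, z, intensity
    int ix = (int)floorf((p.x - g.x_min) / g.vx);
    int iy = (int)floorf((p.y - g.y_min) / g.vy);
    int iz = (int)floorf((p.z - g.z_min) / g.vz);
    if (ix < 0 || ix >= g.nx || iy < 0 || iy >= g.ny || iz < 0 || iz >= g.nz) return;

    int v = (iz * g.ny + iy) * g.nx + ix;
    // Accumulate point features; a later pass divides by the count to average.
    atomicAdd(&voxel_sum[v].x, p.x);
    atomicAdd(&voxel_sum[v].y, p.y);
    atomicAdd(&voxel_sum[v].z, p.z);
    atomicAdd(&voxel_sum[v].w, p.w);
    atomicAdd(&voxel_count[v], 1);
}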

CUDA PointPillars

CUDA & TensorRT solution for PointPillars inference, including:

  • Preprocess: Voxelization & feature extending with CUDA kernels (a feature-extension sketch follows this list).
  • Detector: 2.5D backbone with TensorRT and ONNX export solution.
  • Postprocess: Parse bounding boxes, class types, and directions.
  • Easy To Use: Preparation, inference, and evaluation in one place to reproduce the accuracy of the PyTorch implementation.
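
The feature-extending step follows the PointPillars recipe: each raw point (x, y, z, reflectance) is augmented with its offset from the mean of the points in its pillar and from the pillar's x/y center. The sketch below shows that arithmetic for one point, assuming pillar assignments and per-pillar means are already computed; the exact feature layout used by this repository may differ, and all names here are illustrative.

#include <cuda_runtime.h>

// Sketch of PointPillars-style feature extension (9-D layout from the paper):
// each point (x, y, z, r) gains offsets to its pillar's point mean and to the
// pillar's x/y center. Pillar assignment and per-pillar means are precomputed.
__global__ void extend_point_features(const float4* points,        // x, y, z, reflectance
                                      const int*    pillar_of_point,
                                      const float3* pillar_mean,   // mean of points per pillar
                                      const float2* pillar_center, // x/y center per pillar
                                      int num_points,
                                      float* out) {                // num_points x 9
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= num_points) return;

    float4 p      = points[i];
    int    pillar = pillar_of_point[i];
    float3 mean   = pillar_mean[pillar];
    float2 center = pillar_center[pillar];

    float* f = out + i * 9;
    f[0] = p.x;  f[1] = p.y;  f[2] = p.z;  f[3] = p.w;  // raw point
    f[4] = p.x - mean.x;                                // offsets to pillar point mean
    f[5] = p.y - mean.y;
    f[6] = p.z - mean.z;
    f[7] = p.x - center.x;                              // offsets to pillar x/y center
    f[8] = p.y - center.y;
}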

CUDA-V2XFusion

Training and inference solutions for V2XFusion.

  • Easy To Use: Provides easily reproducible solutions for training, quantization, and ONNX export.
  • Quantization friendly: PointPillars-based backbone with pre-normalization, which reduces quantization error.
  • Feature Fusion: Camera & Lidar feature fuser and ONNX export solution.
  • PTQ: Quantization solutions for V2XFusion, easy to understand.
  • Sparsity: 4:2 structural sparsity support.
  • DeepStream sample: Sample inference using CUDA and TensorRT/Triton in NVIDIA DeepStream SDK 7.0.

cuOSD (CUDA On-Screen Display Library)

Draws all elements using a single CUDA kernel (a per-pixel drawing sketch follows the feature list).

  • Line: Plots lines by interpolation (Nearest or Linear).
  • RotateBox: Can be drawn with different border and fill colors.
  • Circle: Can be drawn with different border and fill colors.
  • Rectangle: Can be drawn with different border and fill colors.
  • Text: Supports stb_truetype and pango-cairo backends, allowing fonts to be loaded from a TTF file or selected by font-family.
  • Arrow: Arrows composed of three lines.
  • Point: Plots points by interpolation (Nearest or Linear).
  • Clock: Time display based on the text element.
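
To illustrate the single-kernel approach, the sketch below launches one thread per output pixel and loops over a queue of draw commands (circles only, for brevity), alpha-blending the command color wherever the pixel falls inside the shape. cuOSD's real command set, anti-aliasing, and blending are richer; the names below are illustrative.

#include <cuda_runtime.h>

// Conceptual sketch of single-kernel drawing: each thread owns one pixel and
// tests it against every queued command (circles only). Illustrative only.
struct CircleCmd {
    float  cx, cy, radius;
    uchar4 color;               // RGBA fill color
};

__global__ void draw_circles(uchar4* image, int width, int height,
                             const CircleCmd* cmds, int num_cmds) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    uchar4 pixel = image[y * width + x];
    for (int i = 0; i < num_cmds; ++i) {
        float dx = x - cmds[i].cx, dy = y - cmds[i].cy;
        if (dx * dx + dy * dy <= cmds[i].radius * cmds[i].radius) {
            // Simple alpha blend of the command color over the current pixel.
            float a = cmds[i].color.w / 255.f;
            pixel.x = (unsigned char)(cmds[i].color.x * a + pixel.x * (1.f - a));
            pixel.y = (unsigned char)(cmds[i].color.y * a + pixel.y * (1.f - a));
            pixel.z = (unsigned char)(cmds[i].color.z * a + pixel.z * (1.f - a));
        }
    }
    image[y * width + x] = pixel;
}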

cuPCL (CUDA Point Cloud Library)

Provides several GPU-accelerated point cloud operations that offer both high accuracy and high performance: cuICP, cuFilter, cuSegmentation, cuOctree, cuCluster, cuNDT, and Voxelization (upcoming).

  • cuICP: CUDA-accelerated iterative closest point (point-to-point) registration implementation.
  • cuFilter: CUDA-accelerated PassThrough and VoxelGrid filters (a PassThrough sketch follows this list).
  • cuSegmentation: CUDA-accelerated RandomSampleConsensus segmentation with a plane model.
  • cuOctree: CUDA-accelerated approximate nearest search and radius search.
  • cuCluster: CUDA-accelerated clustering based on the distance between points.
  • cuNDT: CUDA-accelerated 3D Normal Distribution Transform registration implementation for point cloud data.
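
As a flavor of how these modules map onto the GPU, the sketch below marks the points that survive a PassThrough filter on the z axis, one thread per point; a compaction pass would follow. It is a conceptual example, not cuFilter's actual interface, and all names here are illustrative.

#include <cuda_runtime.h>

// Conceptual PassThrough filter sketch: flag points whose z coordinate lies in
// [z_min, z_max]; a compaction pass (e.g. thrust::copy_if) would follow.
__global__ void pass_through_z(const float4* points, int num_points,
                               float z_min, float z_max, unsigned char* keep) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= num_points) return;
    float z = points[i].z;
    keep[i] = (z >= z_min && z <= z_max) ? 1 : 0;
}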

YUVToRGB (CUDA Conversion)

YUV to RGB conversion that combines resize, padding, conversion, and normalization into a single kernel function (a conversion sketch follows the feature list).

  • Most of the time, it can be bit-aligned with OpenCV.
    • It will give an exact result when the scaling factor is a rational number.
    • Better performance is usually achieved when the stride is divisible by 4.
  • Supported Input Format:
    • NV12BlockLinear
    • NV12PitchLinear
    • YUV422Packed_YUYV
  • Supported Interpolation methods:
    • Nearest
    • Bilinear
  • Supported Output Data Type:
    • Uint8
    • Float32
    • Float16
  • Supported Output Layout:
    • CHW_RGB/BGR
    • HWC_RGB/BGR
    • CHW16/32/4/RGB/BGR for DLA input
  • Supported Features:
    • Resize
    • Padding
    • Conversion
    • Normalization
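
The conversion math itself is standard: sample the luma of each pixel and the chroma shared by its 2x2 block, then apply a YUV-to-RGB matrix. The sketch below does this for NV12 pitch-linear input with BT.601 video-range coefficients and interleaved uint8 RGB output, without the resize, padding, layout, and normalization that the fused kernel in this repository folds into the same launch; all names here are illustrative.

#include <cuda_runtime.h>

// Conceptual NV12 (pitch-linear) -> interleaved RGB sketch using BT.601
// video-range coefficients. Illustrative only; the repo's fused kernel also
// handles resize, padding, output layout, and normalization in one launch.
__device__ unsigned char clamp_u8(float v) {
    return (unsigned char)fminf(fmaxf(v, 0.f), 255.f);
}

__global__ void nv12_to_rgb(const unsigned char* y_plane, const unsigned char* uv_plane,
                            int width, int height, int stride, unsigned char* rgb) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    // Luma per pixel; chroma is shared by each 2x2 block (interleaved U, V).
    float Y = (float)y_plane[y * stride + x] - 16.f;
    int   c = (y / 2) * stride + (x / 2) * 2;
    float U = (float)uv_plane[c + 0] - 128.f;
    float V = (float)uv_plane[c + 1] - 128.f;

    float r = 1.164f * Y + 1.596f * V;
    float g = 1.164f * Y - 0.391f * U - 0.813f * V;
    float b = 1.164f * Y + 2.018f * U;

    unsigned char* out = rgb + (y * width + x) * 3;
    out[0] = clamp_u8(r);
    out[1] = clamp_u8(g);
    out[2] = clamp_u8(b);
}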

Thanks

This project makes use of a number of awesome open source libraries, including:

  • stb_image for PNG and JPEG support
  • pybind11 for seamless C++ / Python interop
  • and others! See the dependencies folder.

Many thanks to the authors of these brilliant projects!
