
NMS

0x00 Description

This directory contains:

  • nms_kernel (CPU/GPU)
  • PyTorch bindings
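Below is a plain-Python sketch of standard greedy NMS, the computation an nms_kernel is generally expected to perform. It assumes boxes are (x1, y1, x2, y2) tuples and that a box is discarded when its IoU with an already-kept, higher-scoring box is greater than `iou_threshold`, the same convention torchvision.ops.nms uses. The names `iou` and `nms_cpu` are illustrative and are not taken from this repo.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed box format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_cpu(boxes: Sequence[Box], scores: Sequence[float],
            iou_threshold: float) -> List[int]:
    """Greedy NMS (hypothetical helper): kept indices in decreasing score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    suppressed = [False] * len(boxes)
    keep: List[int] = []
    for pos, i in enumerate(order):
        if suppressed[i]:
            continue
        keep.append(i)
        # Suppress every lower-scoring box that overlaps box i too much.
        for j in order[pos + 1:]:
            if not suppressed[j] and iou(boxes[i], boxes[j]) > iou_threshold:
                suppressed[j] = True
    return keep
```

For example, with boxes `(0, 0, 10, 10)`, `(1, 1, 11, 11)`, `(20, 20, 30, 30)` and scores `0.9, 0.8, 0.7`, the first two boxes have IoU 81/119 ≈ 0.68. With a threshold of 0.5, `nms_cpu` therefore keeps `[0, 2]`. The outer loop is inherently sequential, which is why GPU versions split the work into a parallel IoU phase and a short sequential pass.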

The CUDA NMS implementation here is the most basic version; it can be optimized further by following the official source code.
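One well-known optimization is the bitmask scheme used by, for example, torchvision's CUDA NMS kernel. The boxes are sorted by score first. Each CUDA block then compares a tile of 64 boxes against another tile of 64 boxes, and every thread writes one 64-bit suppression word (an `unsigned long long`) for its box. A final sequential pass walks the boxes in score order and ORs in the masks of the boxes it keeps. The sketch below reproduces only that data layout in plain Python, with Python ints standing in for 64-bit words. It is a simplified illustration rather than torchvision's code, and `BLOCK`, `build_masks`, and `gather_keep` are hypothetical names.

```python
from typing import List, Sequence, Tuple

BLOCK = 64  # bits per mask word, like sizeof(unsigned long long) * 8 on the GPU
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed box format


def iou(a: Box, b: Box) -> float:
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def build_masks(boxes: Sequence[Box], thr: float) -> List[List[int]]:
    """masks[i][c] bit k set => box i suppresses box c*BLOCK + k.

    Boxes must already be sorted by descending score. On the GPU, each
    (row tile, column tile) pair is one CUDA block and each thread handles
    one row i; here the same work is done with plain loops.
    """
    n = len(boxes)
    col_blocks = (n + BLOCK - 1) // BLOCK
    masks = [[0] * col_blocks for _ in range(n)]
    for i in range(n):
        for c in range(col_blocks):
            word = 0
            for k in range(BLOCK):
                j = c * BLOCK + k
                # Only lower-scoring boxes (j > i) can be suppressed by box i.
                if i < j < n and iou(boxes[i], boxes[j]) > thr:
                    word |= 1 << k
            masks[i][c] = word
    return masks


def gather_keep(masks: List[List[int]], n: int) -> List[int]:
    """Sequential pass: keep box i unless a kept, higher-scoring box suppressed it."""
    col_blocks = (n + BLOCK - 1) // BLOCK
    removed = [0] * col_blocks
    keep: List[int] = []
    for i in range(n):
        c, k = divmod(i, BLOCK)
        if not (removed[c] >> k) & 1:
            keep.append(i)
            for b in range(col_blocks):
                removed[b] |= masks[i][b]
    return keep
```

The expensive O(n²) IoU work is fully parallel. The sequential pass touches only n × ⌈n/64⌉ machine words, so it stays cheap even when it runs on a single thread.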

Testing

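# Build only for the Ada architecture. If TORCH_CUDA_ARCH_LIST is not set, all architectures are compiled by default (Volta, Ampere, Ada, Hopper, ...), which takes much longer.
export TORCH_CUDA_ARCH_LIST=Ada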
python3 nms.py

Output:

-------------------------------------------------------------------------------------
                                        nboxes=1024
       out_nms: ['1021 ', '1022 ', '1023 '], len of keep: 950, time:0.26456594ms
    out_nms_th: ['1021 ', '1022 ', '1023 '], len of keep: 950, time:0.19218683ms
-------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------
                                        nboxes=2048
       out_nms: ['2045 ', '2046 ', '2047 '], len of keep: 1838, time:0.47256470ms
    out_nms_th: ['2044 ', '2045 ', '2047 '], len of keep: 1838, time:0.39437532ms
-------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------
                                        nboxes=4096
       out_nms: ['4092 ', '4093 ', '4095 '], len of keep: 3598, time:0.89909315ms
    out_nms_th: ['4093 ', '4094 ', '4095 '], len of keep: 3598, time:1.03515625ms
-------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------
                                        nboxes=8192
       out_nms: ['8189 ', '8190 ', '8191 '], len of keep: 7023, time:1.49935722ms
    out_nms_th: ['8189 ', '8190 ', '8191 '], len of keep: 7023, time:3.39094877ms
-------------------------------------------------------------------------------------
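Each line above lists the last three kept indices, the number of kept boxes, and a per-call latency in milliseconds. This README does not show how nms.py measures that latency. A common pattern is to run a few untimed warm-up calls and then average over many timed iterations, synchronizing the GPU before reading the clock (for example with `torch.cuda.synchronize()`), because CUDA kernel launches are asynchronous. The sketch below shows only that timing and printing logic in plain Python. `run_benchmark` and its `sync` hook are hypothetical, and the printed line only loosely mimics the format above.

```python
import time
from typing import Callable, List, Sequence, Tuple


def run_benchmark(fn: Callable[[], Sequence[int]], tag: str,
                  warmup: int = 10, iters: int = 100,
                  sync: Callable[[], None] = lambda: None) -> Tuple[List[int], float]:
    """Time fn() and print a summary line (hypothetical helper).

    `sync` should be torch.cuda.synchronize when fn launches CUDA work, so
    that queued kernels finish before the clock is read.
    """
    if iters < 1:
        raise ValueError("iters must be >= 1")
    for _ in range(warmup):           # warm-up calls (caching, lazy init), not timed
        fn()
    keep = list(fn())                 # result used for the printed summary, not timed
    sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    sync()                            # wait for outstanding GPU work before stopping the clock
    mean_ms = (time.perf_counter() - start) * 1000.0 / iters
    tail = [f"{i} " for i in keep[-3:]]
    print(f"{tag:>14}: {tail}, len of keep: {len(keep)}, time:{mean_ms:.8f}ms")
    return keep, mean_ms
```

Without the synchronization, a GPU timing would mostly measure launch overhead rather than kernel execution.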