Table Of Contents
NOTE:
nmsPlugin
is deprecated since TensorRT 9.0. Its functionality has been superseded by theINMSLayer
andEfficientNMS
plugin.
The nmsPlugin
, similar to the batchedNMSPlugin
, implements a non_max_suppression
(NMS) operation over bounding boxes for object detection networks. This plugin is included in TensorRT.
Additionally, the nmsPlugin
has a bounding box decoding step prior to the non_max_suppression
step. The nmsPlugin
takes the predicted encoded bounding box data as input, decodes them, followed by the non maximum suppression step in a GPU-accelerated fashion in TensorRT.
This plugin takes three inputs, loc_data
, conf_data
and prior_data
and generates one output containing the following information:
image id
bounding box label
confidence score
bounding box labels
and another output containing the number of valid detections for each batch item after non maximum suppression.
Where:
loc_data
is the predicted bounding box data subject to decoding. It has a shape of[batchSize, numPriors * numLocClasses * 4, 1, 1]
. Where:numPriors
is the total number of prior boxes for one samplenumLocClasses
is 1 if each bounding box predicts the probability for all candidate classes, else the number of candidate classes if the bounding box only does binary prediction for one candidate class. For example, say you have 81 candidate classes, if your bounding box could predict the probability for all the 81 candidate classes,numLocClasses
will be 1 in this case. This is also the original implementation of the SSD model. However, if your bounding box was designed to do binary classification, you would need 81 bounding boxes for one anchor box,numLocClasses
will be 81 in this case.
conf_data
is the object classification confidence probability distribution of the bounding box. It has a shape of[batchSize, numPriors * numClasses, 1, 1]
.prior_data
is the anchor box data with additional scaling factor or variance used for bounding box encoding and decoding generated from the custom plugingridAnchorPlugin
. It has a shape of[batchSize, 2, numPriors * 4, 1]
. Theprior_data
input has two channels.- The first channel is the anchor box data.
- The second channel is the scaling factor or variance used for bounding box encoding and decoding.
After decoding, the decoded boxes will proceed to the non maximum suppression step, which performs the same action as batchedNMSPlugin
. The only difference is that instead of generating four outputs:
nmsed box count
(1 value)nmsed box locations
(4 values)nmsed box scores
(1 value)nmsed box class IDs
(1 value)
The nmsPlugin
generates an output of shape [batchSize, 1, keepTopK, 7]
which contains the same information as the outputs nmsed box locations
, nmsed box scores
, and nmsed box class IDs
from batchedNMSPlugin
, and an another output of shape [batchSize, 1, 1, 1]
which contains the same information as the output nmsed box count
from batchedNMSPlugin
.
The plugin has the plugin creator class NMSPluginCreator
and the plugin class DetectionOutput
.
The DetectionOutput
plugin instance is created using an array of DetectionOutputParameters
type parameters. DetectionOutputParameters
consists of the following parameters:
Type | Parameter | Description |
---|---|---|
bool |
shareLocation |
If true , the bounding boxes are shared among different classes. |
bool |
varianceEncodedInTarget |
If true , variance is encoded in target, you will not use the variance values to adjust the predicted bounding box. Otherwise, you will use the variance values to adjust the predicted bounding box accordingly. |
int |
backgroundLabelId |
The background label ID. If there is no background class, set it to -1 . |
int |
numClasses |
Number of classes to be predicted. |
int |
topK |
Number of boxes per image with top confidence scores that are fed into the NMS algorithm. |
int |
keepTopK |
Number of total bounding boxes to be kept per image after the NMS step. |
float |
confidenceThreshold |
Considers detections whose confidences are larger than a threshold. |
float |
nmsThreshold |
Intersection over union (IoU) threshold to be used in NMS. |
codeTypeSSD |
codeType |
Type of coding method for bbox . |
int |
inputOrder |
Specifies the order of inputs {loc_data, conf_data, priorbox_data} , in other words, inputOrder[0] is for loc_data , inputOrder[1] is for conf_data and inputOrder[2] is for priorbox_data . For example, if your inputs in the memory are in the order of loc_data , priorbox_data , conf_data , then inputOrder should be [0, 2, 1] . |
bool |
confSigmoid |
Set to true to calculate sigmoid of confidence scores. |
bool |
isNormalized |
Set to true if bounding box data is normalized by the network, in other words, the bounding box coordinates used in the model are not pixel coordinates. |
int |
scoreBits |
The number of bits to represent the score values during radix sort. The number of bits to represent score values(confidences) during radix sort. This valid range is 0 < scoreBits <= 10. The default value is 16(which means to use full bits in radix sort). Setting this parameter to any invalid value will result in the same effect as setting it to 16. This parameter can be tuned to strike for a best trade-off between performance and accuracy. Lowering scoreBits will improve performance but with some minor degradation to the accuracy. This parameter is only valid for FP16 data type for now. |
The bounding boxes are used in an encoded format in the model. In order to get the exact bounding box coordinates on the original input image, we need to know the encoding method and decode them.
The bounding box is decoded using the decodeBBoxes
CUDA kernel defined in the decodeBBoxes.cu
file based on the encoding and decoding method used during model training. Currently, we support the following encoding methods:
CodeTypeSSD
is defined in NvInferPlugin.h and has a brief description on NvInferPlugin.h File Reference. The mathematical formulation of the coding methods are listed below.
Without using or having variance encoded, the encoded bounding box representation is:
[x_{min, gt} - x_{min, anchor}, y_{min, gt} - y_{min, anchor}, x_{max, gt} - x_{max, anchor}, y_{max, gt} - y_{max, anchor}]
Using or having variance encoded, the encoded bounding box representation is:
[(x_{min, gt} - x_{min, anchor}) / variance_0, (y_{min, gt} - y_{min, anchor}) / variance_1, (x_{max, gt} - x_{max, anchor}) / variance_2, (y_{max, gt} - y_{max, anchor}) / variance_3]
Without using or having variance encoded, the encoded bounding box representation is:
[(x_{center, gt} - x_{center, anchor}) / w_{anchor}, (y_{center, gt} - y_{center, anchor}) / h_{anchor}, ln(w_{gt} / w_{anchor}), ln(h_{gt} / h_{anchor})]
Using or having variance encoded, the encoded bounding box representation is:
[(x_{center, gt} - x_{center, anchor}) / w_{anchor} / variance_0, (y_{center, gt} - y_{center, anchor}) / h_{anchor} / variance_1, ln(w_{gt} / w_{anchor}) / variance_2, ln(h_{gt} / h_{anchor}) / variance_3]
Without using or having variance encoded, the encoded bounding box representation is:
[(x_{min, gt} - x_{min, anchor}) / w_{anchor}, (y_{min, gt} - y_{min, anchor}) / h_{anchor}, (x_{max, gt} - x_{max, anchor}) / w_{anchor}, (y_{max, gt} - y_{max, anchor}) / h_{anchor}]
Using or having variance encoded, the encoded bounding box representation is:
[(x_{min, gt} - x_{min, anchor}) / w_{anchor} / variance_0, (y_{min, gt} - y_{min, anchor}) / h_{anchor} / variance_1, (x_{max, gt} - x_{max, anchor}) / w_{anchor} / variance_2, (y_{max, gt} - y_{max, anchor}) / h_{anchor} / variance_3]
Using or having variance encoded, the encoded bounding box representation is:
[(y_{center, gt} - y_{center, anchor}) / h_{anchor} / variance_0, (x_{center, gt} - x_{center, anchor}) / w_{anchor} / variance_1, ln(h_{gt} / h_{anchor}) / variance_2, ln(w_{gt} / w_{anchor}) / variance_3]
Note: This code is almost the same to CodeTypeSSD::CENTER_SIZE
using variance encoded except that the order of coordinates were different.
When converting the frozen graph pb
file to the unified framework format uff
file using convert-to-uff
, make sure to generate a human readable graph file with argument -t
. The order of the tensor inputs to the plugin will be exactly the same to the order of tensor inputs in the corresponding node shown in the human readable graph file.
The following resources provide a deeper understanding of the nmsPlugin
plugin:
Networks:
- SSD: Single Shot MultiBox Detector
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
For terms and conditions for use, reproduction, and distribution, see the TensorRT Software License Agreement documentation.
June 2023 Add deprecation note.
May 2019
This is the first release of this README.md
file.
There are no known issues in this plugin.