
Commit

Docs: Revise technical details
maudzung committed Aug 30, 2020
1 parent f638f72 commit f8a2812
Showing 2 changed files with 37 additions and 17 deletions.
README.md (13 changes: 10 additions & 3 deletions)
@@ -13,9 +13,9 @@
- [x] Support [distributed data parallel training](https://github.com/pytorch/examples/tree/master/distributed/ddp)
- [x] Release pre-trained models

**Technical details could be found [here](./Technical_details.md)**
**The technical details are described [here](./Technical_details.md)**

## Demonstration (on GTX 1080Ti)
## Demonstration (on a single GTX 1080Ti)

[![demo](http://img.youtube.com/vi/FI8mJIXkgX4/0.jpg)](http://www.youtube.com/watch?v=FI8mJIXkgX4)

@@ -131,7 +131,14 @@ Thank you!
## References

[1] CenterNet: [Objects as Points paper](https://arxiv.org/abs/1904.07850), [PyTorch Implementation](https://github.com/xingyizhou/CenterNet) <br>
[2] RTM3D: [PyTorch Implementation](https://github.com/maudzung/RTM3D)
[2] RTM3D: [PyTorch Implementation](https://github.com/maudzung/RTM3D) <br>
[3] Libra_R-CNN: [PyTorch Implementation](https://github.com/OceanPang/Libra_R-CNN)

_YOLO-based models that take the same BEV map input:_ <br>
[4] Complex-YOLO: [v4](https://github.com/maudzung/Complex-YOLOv4-Pytorch), [v3](https://github.com/ghimiredhikura/Complex-YOLOv3), [v2](https://github.com/AI-liu/Complex-YOLO)

*3D LiDAR point-cloud pre-processing:* <br>
[5] VoxelNet: [PyTorch Implementation](https://github.com/skyhehe123/VoxelNet-pytorch)

## Folder structure

Technical_details.md (41 changes: 27 additions & 14 deletions)
@@ -5,38 +5,51 @@
Technical details of the implementation


## 1. Input/Output & Model

- I used the ResNet-based Keypoint Feature Pyramid Network (KFPN) that was proposed in [RTM3D paper](https://arxiv.org/pdf/2001.03343.pdf).
- The model takes a birds-eye-view RGB-map as input. The RGB-map is encoded by height, intensity, and density of 3D LiDAR point clouds.
- **Outputs**: **7 degrees of freedom** _(7-DOF)_ of objects: `(cx, cy, cz, l, w, h, θ)`
## 1. Network architecture

- The model uses the **ResNet-based Keypoint Feature Pyramid Network** (KFPN) proposed in the [RTM3D paper](https://arxiv.org/pdf/2001.03343.pdf).
An unofficial PyTorch implementation of the RTM3D paper is available [here](https://github.com/maudzung/RTM3D).
- **Input**:
  - The model takes a bird's-eye-view (BEV) map as input.
  - The BEV map encodes the height, intensity, and density of the 3D LiDAR point cloud. Assume that the size of the BEV input is `(H, W, 3)` _(a minimal encoding sketch is shown below)_.
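
A minimal, illustrative sketch of how such a three-channel BEV map could be built from a raw LiDAR scan. The grid size, value ranges, and normalization constants below are assumptions for the example, not the repository's actual settings:

```python
import numpy as np

def make_bev_map(points, H=608, W=608,
                 x_range=(0.0, 50.0), y_range=(-25.0, 25.0), z_range=(-2.7, 1.3)):
    """Encode a LiDAR scan of shape (N, 4) = (x, y, z, intensity) into a
    3-channel BEV map (height, intensity, density). Illustrative parameters only."""
    x, y, z, r = points[:, 0], points[:, 1], points[:, 2], points[:, 3]

    # keep only the points inside the BEV detection area
    m = ((x >= x_range[0]) & (x < x_range[1]) &
         (y >= y_range[0]) & (y < y_range[1]) &
         (z >= z_range[0]) & (z < z_range[1]))
    x, y, z, r = x[m], y[m], z[m], r[m]

    # discretize metric coordinates to pixel indices
    rows = ((x - x_range[0]) / (x_range[1] - x_range[0]) * (H - 1)).astype(np.int64)
    cols = ((y - y_range[0]) / (y_range[1] - y_range[0]) * (W - 1)).astype(np.int64)

    bev = np.zeros((3, H, W), dtype=np.float32)
    counts = np.zeros((H, W), dtype=np.float32)

    # height channel: highest (normalized) point per cell
    np.maximum.at(bev[0], (rows, cols), (z - z_range[0]) / (z_range[1] - z_range[0]))
    # intensity channel: strongest return per cell
    np.maximum.at(bev[1], (rows, cols), r)
    # density channel: log-normalized number of points per cell
    np.add.at(counts, (rows, cols), 1.0)
    bev[2] = np.minimum(1.0, np.log(counts + 1.0) / np.log(64.0))
    return bev  # (3, H, W); a channel-first version of the (H, W, 3) input described above
```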

- **Outputs** _(a sketch of these output heads follows the list)_:
  - Heatmap for the main center with a size of `(H/S, W/S, C)`, where `S=4` _(the down-sample ratio)_ and `C=3` _(the number of classes)_
  - Center offset: `(H/S, W/S, 2)`
  - The heading angle _(yaw)_: `(H/S, W/S, 2)`. The model estimates the **im**aginary and the **re**al fractions (`sin(yaw)` and `cos(yaw)` values).
  - Dimension _(h, w, l)_: `(H/S, W/S, 3)`
  - `z` coordinate: `(H/S, W/S, 1)`
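
A rough sketch of what such per-pixel output heads could look like on top of the stride-4 KFPN feature map. The head structure, channel width `in_ch`, and layer choices here are assumptions, not the repository's exact architecture:

```python
import torch
import torch.nn as nn

class DetectionHeads(nn.Module):
    """Hypothetical per-pixel output heads on a stride-4 feature map of width `in_ch`."""
    def __init__(self, in_ch=64, num_classes=3):
        super().__init__()
        def head(out_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(in_ch, out_ch, 1))
        self.hm_cen = head(num_classes)   # main-center heatmap: (B, C, H/S, W/S)
        self.cen_offset = head(2)         # sub-pixel center offset
        self.direction = head(2)          # (im, re) = (sin(yaw), cos(yaw))
        self.dim = head(3)                # object dimensions (h, w, l)
        self.z_coor = head(1)             # z coordinate

    def forward(self, feat):
        return {
            'hm_cen': torch.sigmoid(self.hm_cen(feat)),  # center scores in (0, 1)
            'cen_offset': self.cen_offset(feat),
            'direction': self.direction(feat),
            'dim': self.dim(feat),
            'z_coor': self.z_coor(feat),
        }
```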

- **Targets**: **7 degrees of freedom** _(7-DOF)_ of objects: `(cx, cy, cz, l, w, h, θ)`
  - `cx, cy, cz`: the center coordinates.
  - `l, w, h`: the length, width, and height of the bounding box.
  - `θ`: the heading angle of the bounding box, in radians.

- **Objects**: Cars, Pedestrians, Cyclists.

## 2. Loss functions

- For the main center heatmap: `focal loss` is used

- For heading angle _(direction)_: The model predicts 2 components (`imaginary value` and `real value`).
The `im` and `re` are directly regressed by using `l1_loss`
- For the heading angle _(yaw)_: the `im` and `re` fractions are directly regressed with `l1_loss`

- For the `z` coordinate and the `3 dimensions` (height, width, length), the `balanced l1 loss` proposed in the paper
[Libra R-CNN: Towards Balanced Learning for Object Detection](https://arxiv.org/pdf/1904.02701.pdf) is used _(minimal sketches of these losses follow)_
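
For reference, minimal sketches of the two loss terms above that are less standard than a plain L1: the penalty-reduced focal loss from CenterNet and the balanced L1 loss from Libra R-CNN. These follow the papers' formulations with their default hyper-parameters and are not necessarily identical to the repository's code:

```python
import math
import torch

def centernet_focal_loss(pred, gt, alpha=2, beta=4):
    """Penalty-reduced pixel-wise focal loss (CenterNet). `pred`, `gt`: heatmaps in [0, 1]."""
    pred = pred.clamp(1e-4, 1.0 - 1e-4)
    pos = gt.eq(1).float()                        # ground-truth peak pixels
    neg_weights = torch.pow(1.0 - gt, beta)       # down-weight pixels near a peak
    pos_loss = torch.log(pred) * torch.pow(1.0 - pred, alpha) * pos
    neg_loss = torch.log(1.0 - pred) * torch.pow(pred, alpha) * neg_weights * (1.0 - pos)
    num_pos = pos.sum().clamp(min=1.0)
    return -(pos_loss.sum() + neg_loss.sum()) / num_pos

def balanced_l1_loss(pred, target, alpha=0.5, gamma=1.5, beta=1.0):
    """Balanced L1 loss (Libra R-CNN) with the paper's default hyper-parameters."""
    diff = torch.abs(pred - target)
    b = math.exp(gamma / alpha) - 1.0             # chosen so both branches meet at diff == beta
    loss = torch.where(
        diff < beta,
        alpha / b * (b * diff + 1.0) * torch.log(b * diff / beta + 1.0) - alpha * diff,
        gamma * diff + gamma / b - alpha * beta,
    )
    return loss.mean()
```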

## 3. Training details

- Set weights for the above losses are uniform (`=1.0` for all)
- Number of epochs: 300
- Learning rate scheduler: [`cosine`](https://arxiv.org/pdf/1812.01187.pdf), initial learning rate: 0.001
- Batch size: `16` (on a single GTX 1080Ti)
- Uniform weights are used for all of the loss components above (`=1.0` for each).
- Number of epochs: 300.
- Learning rate scheduler: [`cosine`](https://arxiv.org/pdf/1812.01187.pdf), initial learning rate: 0.001 _(a minimal setup sketch follows the list)_.
- Batch size: `16` (on a single GTX 1080Ti).
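
A minimal sketch of wiring up these hyper-parameters with PyTorch's built-in cosine annealing scheduler. The linked paper also uses warm-up and the repository may implement its own schedule, so this is only an illustration; the optimizer choice below is an assumption:

```python
import torch

model = torch.nn.Conv2d(3, 64, 3, padding=1)   # placeholder module; the real model is the KFPN
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # initial learning rate 0.001 (optimizer assumed)
num_epochs = 300
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

for epoch in range(num_epochs):
    # ... run one training epoch with batch size 16 here ...
    scheduler.step()   # decay the learning rate along a cosine curve toward ~0
```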

## 4. Inference

During the inference, a `3 × 3` max-pooling operation is applied on the center heat map, then I keep `50` predictions whose
center confidences are larger than 0.2.
- A `3 × 3` max-pooling operation is applied on the center heatmap; only the top `50` predictions whose
center confidences are larger than 0.2 are kept.
- The heading angle _(yaw)_ = `arctan2`(_imaginary fraction_, _real fraction_), which recovers the angle over the full `(-π, π]` range _(a decoding sketch follows the list)_.
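
A minimal sketch of this decoding step. The tensor layouts and names are assumptions; the intent is only to show the `3 × 3` max-pool peak extraction, the top-50 / 0.2 filtering, and the `arctan2` yaw recovery:

```python
import torch
import torch.nn.functional as F

def decode_centers(heatmap, direction, K=50, conf_thresh=0.2):
    """Sketch of the peak extraction described above.
    heatmap:   (B, C, H/S, W/S) sigmoid scores
    direction: (B, 2, H/S, W/S) with channels (im, re) = (sin(yaw), cos(yaw))"""
    # 3x3 max-pooling keeps only local maxima -- a cheap substitute for NMS
    hmax = F.max_pool2d(heatmap, kernel_size=3, stride=1, padding=1)
    heatmap = heatmap * (hmax == heatmap).float()

    B, C, H, W = heatmap.shape
    scores, inds = torch.topk(heatmap.view(B, -1), K)    # top-K candidate centers
    keep = scores > conf_thresh                           # confidence filter (> 0.2)
    classes = inds // (H * W)                             # class index of each peak
    spatial = inds % (H * W)
    rows, cols = spatial // W, spatial % W                # grid coordinates of each peak

    # recover the heading angle from its (im, re) regression with atan2
    im = direction[:, 0].reshape(B, -1).gather(1, spatial)
    re = direction[:, 1].reshape(B, -1).gather(1, spatial)
    yaw = torch.atan2(im, re)
    return scores, classes, rows, cols, yaw, keep
```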

## 5. How to expand the work

You can train the model with more classes and expand the detected area by modifying configurations in the [config/kitti_dataset.py](https://github.com/maudzung/Super-Fast-Accurate-3D-Object-Detection/blob/master/src/config/kitti_config.py) file.
- The model could be trained with more classes and with a larger detection area by modifying the configurations in
the [config/kitti_config.py](https://github.com/maudzung/Super-Fast-Accurate-3D-Object-Detection/blob/master/src/config/kitti_config.py) file _(an illustrative sketch of such settings follows)_.
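
Purely for illustration, the kind of settings one would edit there might look like the snippet below; the variable names are hypothetical and are not necessarily those used in `kitti_config.py`:

```python
# Illustrative only: hypothetical names for the kind of settings edited in src/config/kitti_config.py.
CLASS_NAMES = ['Pedestrian', 'Car', 'Cyclist']   # add more KITTI classes here
NUM_CLASSES = len(CLASS_NAMES)

# detection area in the LiDAR frame (metres); enlarging it expands the detected region
BEV_BOUNDARY = {
    'minX': 0.0, 'maxX': 50.0,
    'minY': -25.0, 'maxY': 25.0,
    'minZ': -2.73, 'maxZ': 1.27,
}

BEV_HEIGHT, BEV_WIDTH = 608, 608   # BEV map resolution; the heatmap size is this divided by S=4
```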
