
Commit

Docs: Revise technical details
maudzung committed Aug 30, 2020
1 parent f638f72 commit f8a2812
Showing 2 changed files with 37 additions and 17 deletions.
README.md (13 changes: 10 additions & 3 deletions)
@@ -13,9 +13,9 @@
- [x] Support [distributed data parallel training](https://github.com/pytorch/examples/tree/master/distributed/ddp)
- [x] Release pre-trained models

**Technical details could be found [here](./Technical_details.md)**
**The technical details are described [here](./Technical_details.md)**

## Demonstration (on GTX 1080Ti)
## Demonstration (on a single GTX 1080Ti)

[![demo](http://img.youtube.com/vi/FI8mJIXkgX4/0.jpg)](http://www.youtube.com/watch?v=FI8mJIXkgX4)

@@ -131,7 +131,14 @@ Thank you!
## References

[1] CenterNet: [Objects as Points paper](https://arxiv.org/abs/1904.07850), [PyTorch Implementation](https://github.com/xingyizhou/CenterNet) <br>
[2] RTM3D: [PyTorch Implementation](https://github.com/maudzung/RTM3D)
[2] RTM3D: [PyTorch Implementation](https://github.com/maudzung/RTM3D) <br>
[3] Libra_R-CNN: [PyTorch Implementation](https://github.com/OceanPang/Libra_R-CNN)

_YOLO-based models that take the same BEV map input:_ <br>
[4] Complex-YOLO: [v4](https://github.com/maudzung/Complex-YOLOv4-Pytorch), [v3](https://github.com/ghimiredhikura/Complex-YOLOv3), [v2](https://github.com/AI-liu/Complex-YOLO)

*3D LiDAR point-cloud pre-processing:* <br>
[5] VoxelNet: [PyTorch Implementation](https://github.com/skyhehe123/VoxelNet-pytorch)

## Folder structure

Technical_details.md (41 changes: 27 additions & 14 deletions)
@@ -5,38 +5,51 @@
Technical details of the implementation


## 1. Input/Output & Model

- I used the ResNet-based Keypoint Feature Pyramid Network (KFPN) that was proposed in [RTM3D paper](https://arxiv.org/pdf/2001.03343.pdf).
- The model takes a birds-eye-view RGB-map as input. The RGB-map is encoded by height, intensity, and density of 3D LiDAR point clouds.
- **Outputs**: **7 degrees of freedom** _(7-DOF)_ of objects: `(cx, cy, cz, l, w, h, θ)`
## 1. Network architecture

- The model uses the **ResNet-based Keypoint Feature Pyramid Network** (KFPN) proposed in the [RTM3D paper](https://arxiv.org/pdf/2001.03343.pdf).
An unofficial PyTorch implementation of the RTM3D paper is available [here](https://github.com/maudzung/RTM3D).
- **Input**:
  - The model takes a bird's-eye-view (BEV) map as input.
  - The BEV map encodes the height, intensity, and density of the 3D LiDAR point cloud. Assume that the size of the BEV input is `(H, W, 3)` _(a minimal encoding sketch is shown below)_.
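
A minimal, illustrative sketch of how such a three-channel BEV map could be built from a raw LiDAR scan. The grid size, value ranges, and normalization constants below are assumptions for the example, not the repository's actual settings:

```python
import numpy as np

def make_bev_map(points, H=608, W=608,
                 x_range=(0.0, 50.0), y_range=(-25.0, 25.0), z_range=(-2.7, 1.3)):
    """Encode a LiDAR scan of shape (N, 4) = (x, y, z, intensity) into a
    3-channel BEV map (height, intensity, density). Illustrative parameters only."""
    x, y, z, r = points[:, 0], points[:, 1], points[:, 2], points[:, 3]

    # keep only the points inside the BEV detection area
    m = ((x >= x_range[0]) & (x < x_range[1]) &
         (y >= y_range[0]) & (y < y_range[1]) &
         (z >= z_range[0]) & (z < z_range[1]))
    x, y, z, r = x[m], y[m], z[m], r[m]

    # discretize metric coordinates to pixel indices
    rows = ((x - x_range[0]) / (x_range[1] - x_range[0]) * (H - 1)).astype(np.int64)
    cols = ((y - y_range[0]) / (y_range[1] - y_range[0]) * (W - 1)).astype(np.int64)

    bev = np.zeros((3, H, W), dtype=np.float32)
    counts = np.zeros((H, W), dtype=np.float32)

    # height channel: highest (normalized) point per cell
    np.maximum.at(bev[0], (rows, cols), (z - z_range[0]) / (z_range[1] - z_range[0]))
    # intensity channel: strongest return per cell
    np.maximum.at(bev[1], (rows, cols), r)
    # density channel: log-normalized number of points per cell
    np.add.at(counts, (rows, cols), 1.0)
    bev[2] = np.minimum(1.0, np.log(counts + 1.0) / np.log(64.0))
    return bev  # (3, H, W); a channel-first version of the (H, W, 3) input described above
```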

- **Outputs** _(a sketch of these output heads follows the list)_:
  - Heatmap for the main center with a size of `(H/S, W/S, C)`, where `S=4` _(the down-sample ratio)_ and `C=3` _(the number of classes)_
  - Center offset: `(H/S, W/S, 2)`
  - The heading angle _(yaw)_: `(H/S, W/S, 2)`. The model estimates the **im**aginary and the **re**al fractions (`sin(yaw)` and `cos(yaw)` values).
  - Dimension _(h, w, l)_: `(H/S, W/S, 3)`
  - `z` coordinate: `(H/S, W/S, 1)`
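
A rough sketch of what such per-pixel output heads could look like on top of the stride-4 KFPN feature map. The head structure, channel width `in_ch`, and layer choices here are assumptions, not the repository's exact architecture:

```python
import torch
import torch.nn as nn

class DetectionHeads(nn.Module):
    """Hypothetical per-pixel output heads on a stride-4 feature map of width `in_ch`."""
    def __init__(self, in_ch=64, num_classes=3):
        super().__init__()
        def head(out_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(in_ch, out_ch, 1))
        self.hm_cen = head(num_classes)   # main-center heatmap: (B, C, H/S, W/S)
        self.cen_offset = head(2)         # sub-pixel center offset
        self.direction = head(2)          # (im, re) = (sin(yaw), cos(yaw))
        self.dim = head(3)                # object dimensions (h, w, l)
        self.z_coor = head(1)             # z coordinate

    def forward(self, feat):
        return {
            'hm_cen': torch.sigmoid(self.hm_cen(feat)),  # center scores in (0, 1)
            'cen_offset': self.cen_offset(feat),
            'direction': self.direction(feat),
            'dim': self.dim(feat),
            'z_coor': self.z_coor(feat),
        }
```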

- **Targets**: **7 degrees of freedom** _(7-DOF)_ of objects: `(cx, cy, cz, l, w, h, θ)`
  - `cx, cy, cz`: the center coordinates.
  - `l, w, h`: the length, width, and height of the bounding box.
  - `θ`: the heading angle of the bounding box, in radians.

- **Objects**: Cars, Pedestrians, Cyclists.

## 2. Loss functions

- For the main center heatmap: `focal loss` is used

- For heading angle _(direction)_: The model predicts 2 components (`imaginary value` and `real value`).
The `im` and `re` are directly regressed by using `l1_loss`
- For the heading angle _(yaw)_: the `im` and `re` fractions are directly regressed with `l1_loss`

- For the `z` coordinate and the `3 dimensions` (height, width, length), the `balanced l1 loss` proposed in the paper
[Libra R-CNN: Towards Balanced Learning for Object Detection](https://arxiv.org/pdf/1904.02701.pdf) is used _(minimal sketches of these losses follow)_
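
For reference, minimal sketches of the two loss terms above that are less standard than a plain L1: the penalty-reduced focal loss from CenterNet and the balanced L1 loss from Libra R-CNN. These follow the papers' formulations with their default hyper-parameters and are not necessarily identical to the repository's code:

```python
import math
import torch

def centernet_focal_loss(pred, gt, alpha=2, beta=4):
    """Penalty-reduced pixel-wise focal loss (CenterNet). `pred`, `gt`: heatmaps in [0, 1]."""
    pred = pred.clamp(1e-4, 1.0 - 1e-4)
    pos = gt.eq(1).float()                        # ground-truth peak pixels
    neg_weights = torch.pow(1.0 - gt, beta)       # down-weight pixels near a peak
    pos_loss = torch.log(pred) * torch.pow(1.0 - pred, alpha) * pos
    neg_loss = torch.log(1.0 - pred) * torch.pow(pred, alpha) * neg_weights * (1.0 - pos)
    num_pos = pos.sum().clamp(min=1.0)
    return -(pos_loss.sum() + neg_loss.sum()) / num_pos

def balanced_l1_loss(pred, target, alpha=0.5, gamma=1.5, beta=1.0):
    """Balanced L1 loss (Libra R-CNN) with the paper's default hyper-parameters."""
    diff = torch.abs(pred - target)
    b = math.exp(gamma / alpha) - 1.0             # chosen so both branches meet at diff == beta
    loss = torch.where(
        diff < beta,
        alpha / b * (b * diff + 1.0) * torch.log(b * diff / beta + 1.0) - alpha * diff,
        gamma * diff + gamma / b - alpha * beta,
    )
    return loss.mean()
```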

## 3. Training details

- Set weights for the above losses are uniform (`=1.0` for all)
- Number of epochs: 300
- Learning rate scheduler: [`cosine`](https://arxiv.org/pdf/1812.01187.pdf), initial learning rate: 0.001
- Batch size: `16` (on a single GTX 1080Ti)
- Uniform weights are used for all of the loss components above (`=1.0` for each).
- Number of epochs: 300.
- Learning rate scheduler: [`cosine`](https://arxiv.org/pdf/1812.01187.pdf), initial learning rate: 0.001 _(a minimal setup sketch follows the list)_.
- Batch size: `16` (on a single GTX 1080Ti).
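
A minimal sketch of wiring up these hyper-parameters with PyTorch's built-in cosine annealing scheduler. The linked paper also uses warm-up and the repository may implement its own schedule, so this is only an illustration; the optimizer choice below is an assumption:

```python
import torch

model = torch.nn.Conv2d(3, 64, 3, padding=1)   # placeholder module; the real model is the KFPN
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # initial learning rate 0.001 (optimizer assumed)
num_epochs = 300
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

for epoch in range(num_epochs):
    # ... run one training epoch with batch size 16 here ...
    scheduler.step()   # decay the learning rate along a cosine curve toward ~0
```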

## 4. Inference

During the inference, a `3 × 3` max-pooling operation is applied on the center heat map, then I keep `50` predictions whose
center confidences are larger than 0.2.
- A `3 × 3` max-pooling operation is applied on the center heatmap; only the top `50` predictions whose
center confidences are larger than 0.2 are kept.
- The heading angle _(yaw)_ = `arctan2`(_imaginary fraction_, _real fraction_), which recovers the angle over the full `(-π, π]` range _(a decoding sketch follows the list)_.
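
A minimal sketch of this decoding step. The tensor layouts and names are assumptions; the intent is only to show the `3 × 3` max-pool peak extraction, the top-50 / 0.2 filtering, and the `arctan2` yaw recovery:

```python
import torch
import torch.nn.functional as F

def decode_centers(heatmap, direction, K=50, conf_thresh=0.2):
    """Sketch of the peak extraction described above.
    heatmap:   (B, C, H/S, W/S) sigmoid scores
    direction: (B, 2, H/S, W/S) with channels (im, re) = (sin(yaw), cos(yaw))"""
    # 3x3 max-pooling keeps only local maxima -- a cheap substitute for NMS
    hmax = F.max_pool2d(heatmap, kernel_size=3, stride=1, padding=1)
    heatmap = heatmap * (hmax == heatmap).float()

    B, C, H, W = heatmap.shape
    scores, inds = torch.topk(heatmap.view(B, -1), K)    # top-K candidate centers
    keep = scores > conf_thresh                           # confidence filter (> 0.2)
    classes = inds // (H * W)                             # class index of each peak
    spatial = inds % (H * W)
    rows, cols = spatial // W, spatial % W                # grid coordinates of each peak

    # recover the heading angle from its (im, re) regression with atan2
    im = direction[:, 0].reshape(B, -1).gather(1, spatial)
    re = direction[:, 1].reshape(B, -1).gather(1, spatial)
    yaw = torch.atan2(im, re)
    return scores, classes, rows, cols, yaw, keep
```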

## 5. How to expand the work

You can train the model with more classes and expand the detected area by modifying configurations in the [config/kitti_dataset.py](https://github.com/maudzung/Super-Fast-Accurate-3D-Object-Detection/blob/master/src/config/kitti_config.py) file.
- The model could be trained with more classes and with a larger detection area by modifying the configurations in
the [config/kitti_config.py](https://github.com/maudzung/Super-Fast-Accurate-3D-Object-Detection/blob/master/src/config/kitti_config.py) file _(an illustrative sketch of such settings follows)_.
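
Purely for illustration, the kind of settings one would edit there might look like the snippet below; the variable names are hypothetical and are not necessarily those used in `kitti_config.py`:

```python
# Illustrative only: hypothetical names for the kind of settings edited in src/config/kitti_config.py.
CLASS_NAMES = ['Pedestrian', 'Car', 'Cyclist']   # add more KITTI classes here
NUM_CLASSES = len(CLASS_NAMES)

# detection area in the LiDAR frame (metres); enlarging it expands the detected region
BEV_BOUNDARY = {
    'minX': 0.0, 'maxX': 50.0,
    'minY': -25.0, 'maxY': 25.0,
    'minZ': -2.73, 'maxZ': 1.27,
}

BEV_HEIGHT, BEV_WIDTH = 608, 608   # BEV map resolution; the heatmap size is this divided by S=4
```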
