[ICCV 2023] Cross Modal Transformer: Towards Fast and Robust 3D Object Detection

junjie18/CMT


Cross Modal Transformer via Coordinates Encoding for 3D Object Detection


This repository is an official implementation of CMT.


CMT is a robust 3D detector for end-to-end multi-modal 3D detection. It uses a DETR-like framework for both multi-modal detection (CMT) and lidar-only detection (CMT-L), which reach 73.5% and 70.1% NDS, respectively, on the nuScenes benchmark. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. CMT can serve as a strong baseline for further research.

Preparation

Main Results

We provide some results on the nuScenes val set. The default batch size is 2 per GPU.

| config | mAP | NDS | GPU | schedule | time |
| --- | --- | --- | --- | --- | --- |
| CMT-pillar0200-r50-704x256 | 53.8% | 58.5% | 8 x 2080ti | 20 epoch | 13 hours |
| CMT-voxel0100-r50-800x320 | 60.1% | 63.4% | 8 x 2080ti | 20 epoch | 14 hours |
| CMT-voxel0075-vov-1600x640 | 69.4% | 71.9% | 8 x A100 | 15e+5e (with cbgs) | 45 hours |
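
For readers unfamiliar with the NDS column, the sketch below shows how the nuScenes Detection Score combines mAP with the five mean true-positive error metrics (mATE, mASE, mAOE, mAVE, mAAE) as defined by the nuScenes benchmark. The function name `nuscenes_detection_score` and the input numbers are hypothetical and chosen only for illustration; they are not taken from this repository or from the results above.

```python
def nuscenes_detection_score(mean_ap, tp_errors):
    """Combine mAP and the five mean TP errors into a single NDS value.

    NDS = (5 * mAP + sum(1 - min(1, err) for each TP error)) / 10
    """
    expected = {"mATE", "mASE", "mAOE", "mAVE", "mAAE"}
    if set(tp_errors) != expected:
        raise ValueError(f"tp_errors must have exactly these keys: {sorted(expected)}")
    # Each error is clipped at 1 and turned into a score in [0, 1].
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0


# Hypothetical inputs, for illustration only.
example_errors = {"mATE": 0.30, "mASE": 0.25, "mAOE": 0.40, "mAVE": 0.30, "mAAE": 0.20}
example_nds = nuscenes_detection_score(0.60, example_errors)
print(f"NDS = {example_nds:.3f}")
```

Because mAP carries half of the total weight, two configs with similar mAP can still differ in NDS when their translation, scale, orientation, velocity, or attribute errors differ.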

Citation

If you find CMT helpful in your research, please consider citing:

@article{yan2023cross,
  title={Cross Modal Transformer via Coordinates Encoding for 3D Object Detection},
  author={Yan, Junjie and Liu, Yingfei and Sun, Jianjian and Jia, Fan and Li, Shuailin and Wang, Tiancai and Zhang, Xiangyu},
  journal={arXiv preprint arXiv:2301.01283},
  year={2023}
}

Contact

If you have any questions, feel free to open an issue or contact us at [email protected], [email protected], [email protected] or [email protected].
