[ICCV 2023] Cross Modal Transformer: Towards Fast and Robust 3D Object Detection

Demo video: CMT_nuScenes_testset.mp4

This repository is an official implementation of CMT.

Figure: Performance comparison between CMT and existing methods. All speed statistics are measured on a single Tesla A100 GPU using the best model from each official repository.

CMT is a robust 3D detector for end-to-end multi-modal 3D detection. It uses a DETR-like framework for both multi-modal detection (CMT) and lidar-only detection (CMT-L), which obtain 74.1% NDS (SoTA without TTA or model ensembling) and 70.1% NDS, respectively, on the nuScenes benchmark. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. CMT can serve as a strong baseline for further research.
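To make the token-based formulation concrete, here is a toy NumPy sketch of object queries attending to a concatenated set of image and point cloud tokens. It is not the CMT implementation: the tokens are random arrays, the attention is single-head with no learned projections, position encodings, or stacked decoder layers, and the 7-parameter box head is an assumption made only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, tokens):
    """Single-head scaled dot-product attention without learned projections."""
    scale = 1.0 / np.sqrt(queries.shape[-1])
    weights = softmax(queries @ tokens.T * scale)  # (num_queries, num_tokens)
    return weights @ tokens, weights


rng = np.random.default_rng(0)
dim = 16

# Random stand-ins for flattened backbone features (illustrative sizes).
image_tokens = rng.normal(size=(12, dim))
point_tokens = rng.normal(size=(8, dim))

# Both modalities share one token sequence; no view transformation is applied.
tokens = np.concatenate([image_tokens, point_tokens], axis=0)  # (20, dim)

# Random stand-ins for object queries.
queries = rng.normal(size=(4, dim))
updated, weights = cross_attention(queries, tokens)

# Hypothetical linear head: (x, y, z, w, l, h, yaw) per query.
box_head = rng.normal(size=(dim, 7))
boxes = updated @ box_head  # (4, 7), one box per query

print(tokens.shape, weights.shape, boxes.shape)
```

Each query produces one box directly from the mixed token sequence, which is the sense in which the detector is end-to-end.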

Preparation

Environments
Python == 3.8
CUDA == 11.1
pytorch == 1.9.0
mmcv-full == 1.6.0
mmdet == 2.24.0
mmsegmentation == 0.29.1
mmdet3d == 1.0.0rc5
spconv-cu111 == 2.1.21
flash-attn == 0.2.2
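
As a convenience, the pinned versions above can be checked against the current environment with a short standard-library script. This is a sketch, not part of the repository: the helper names are hypothetical, the keys are assumed to be the PyPI distribution names (with "torch" standing in for the pytorch entry), and the Python and CUDA versions are not checked.

```python
from importlib import metadata

# Pins copied from the list above. Keys are assumed PyPI distribution names;
# "torch" is the assumed distribution name for the "pytorch" entry.
REQUIRED = {
    "torch": "1.9.0",
    "mmcv-full": "1.6.0",
    "mmdet": "2.24.0",
    "mmsegmentation": "0.29.1",
    "mmdet3d": "1.0.0rc5",
    "spconv-cu111": "2.1.21",
    "flash-attn": "0.2.2",
}


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(required, lookup=installed_version):
    """Map each package name to (required, installed, matches)."""
    report = {}
    for name, wanted in required.items():
        found = lookup(name)
        # Drop a local build tag such as "+cu111" before comparing.
        base = found.split("+")[0] if found else None
        report[name] = (wanted, found, base == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in check_versions(REQUIRED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:16s} required={wanted:10s} installed={found} [{status}]")
```

The `lookup` parameter exists only so the comparison logic can be exercised without the packages installed.
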
Data
Follow the mmdet3d data preparation instructions to process the nuScenes dataset.

The info PKL files and image-pretrained weights are available on Google Drive.

Train & inference

# train on 8 GPUs
bash tools/dist_train.sh /path_to_your_config 8
# inference on 8 GPUs, evaluating bounding boxes
bash tools/dist_test.sh /path_to_your_config /path_to_your_pth 8 --eval bbox
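
When the same commands are launched from a Python script (for example, to loop over several configs), they can be assembled as argument lists. The helper names below are hypothetical and the paths are the placeholders from the commands above; this is a sketch of one way to wrap the scripts, not something the repository provides.

```python
import shlex


def train_command(config, num_gpus=8):
    """Argument list equivalent to the training command above."""
    return ["bash", "tools/dist_train.sh", config, str(num_gpus)]


def inference_command(config, checkpoint, num_gpus=8, eval_metric="bbox"):
    """Argument list equivalent to the inference command above."""
    return [
        "bash", "tools/dist_test.sh", config, checkpoint,
        str(num_gpus), "--eval", eval_metric,
    ]


if __name__ == "__main__":
    # Print the commands; to run one from the repository root, pass the
    # list to subprocess.run(cmd, check=True).
    print(shlex.join(train_command("/path_to_your_config")))
    print(shlex.join(inference_command("/path_to_your_config", "/path_to_your_pth")))
```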

Main Results

Results on nuScenes val set. The default batch size is 2 per GPU. All FPS values are measured on a single Tesla A100 GPU.

Config	Modality	mAP	NDS	Schedule	Inference FPS
vov_1600x640	C	40.6%	46.0%	20e	8.4
voxel0075	L	62.14%	68.6%	15e+5e	18.1
voxel0100_r50_800x320	C+L	67.9%	70.8%	15e+5e	14.2
voxel0075_vov_1600x640	C+L	70.3%	72.9%	15e+5e	6.0
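
For readers unfamiliar with NDS: in the nuScenes detection benchmark it combines mAP with five true-positive error metrics (mATE, mASE, mAOE, mAVE, mAAE) as NDS = (1/10) * [5 * mAP + sum over the five errors of (1 - min(1, error))]. The sketch below implements that formula; the function name is hypothetical and the input values are made up for illustration, not taken from CMT.

```python
def nds(mean_ap, tp_errors):
    """nuScenes detection score from mAP and the five mean TP errors.

    mean_ap: mAP as a fraction in [0, 1].
    tp_errors: dict with the five mean TP errors (mATE, mASE, mAOE, mAVE, mAAE).
    """
    if len(tp_errors) != 5:
        raise ValueError("expected exactly five TP error metrics")
    # Each error is clipped at 1 and converted to a score in [0, 1].
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0


if __name__ == "__main__":
    # Made-up inputs for illustration only; these are not CMT results.
    example_errors = {"mATE": 0.30, "mASE": 0.25, "mAOE": 0.40, "mAVE": 0.30, "mAAE": 0.20}
    print(f"NDS = {nds(0.60, example_errors):.3f}")
```

Because mAP carries half of the weight, NDS can exceed mAP when the localization, scale, orientation, velocity, and attribute errors are low.
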

Results on nuScenes test set are listed below. To reproduce our results, train on both the train and val splits: in the training config, replace ann_file=data_root + '/nuscenes_infos_train.pkl' with ann_file=[data_root + '/nuscenes_infos_train.pkl', data_root + '/nuscenes_infos_val.pkl'].

Config	Modality	mAP	NDS	Schedule	Inference FPS
vov_1600x640	C	42.9%	48.1%	20e	8.4
voxel0075	L	65.3%	70.1%	15e+5e	18.1
voxel0075_vov_1600x640	C+L	72.0%	74.1%	15e+5e	6.0
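
The ann_file replacement described for the test-set results can also be applied programmatically to a config that has been loaded as nested dicts. The helper below is a hypothetical sketch: it walks the structure and swaps any ann_file equal to the train PKL path for the train + val list, so it does not need to know where the training dataset sits in the config. The data_root value and the dataset type names in the example are assumptions.

```python
def use_trainval(cfg, data_root):
    """Return a copy of cfg whose train-only ann_file entries list train + val."""
    train_pkl = data_root + '/nuscenes_infos_train.pkl'
    val_pkl = data_root + '/nuscenes_infos_val.pkl'
    if isinstance(cfg, dict):
        return {
            key: [train_pkl, val_pkl]
            if key == 'ann_file' and value == train_pkl
            else use_trainval(value, data_root)
            for key, value in cfg.items()
        }
    if isinstance(cfg, list):
        return [use_trainval(item, data_root) for item in cfg]
    return cfg


if __name__ == "__main__":
    data_root = 'data/nuscenes'  # assumption: adjust to your dataset location
    # Hypothetical config layout; the type names are placeholders.
    data = dict(
        train=dict(
            type='WrapperDataset',
            dataset=dict(type='SomeNuScenesDataset',
                         ann_file=data_root + '/nuscenes_infos_train.pkl'),
        ),
        val=dict(type='SomeNuScenesDataset',
                 ann_file=data_root + '/nuscenes_infos_val.pkl'),
    )
    print(use_trainval(data, data_root)['train']['dataset']['ann_file'])
```

Only entries that match the train PKL path are changed, so the validation and test dataset settings are left as they were.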

Citation

If you find CMT helpful in your research, please consider citing:

@article{yan2023cross,
  title={Cross Modal Transformer via Coordinates Encoding for 3D Object Detection},
  author={Yan, Junjie and Liu, Yingfei and Sun, Jianjian and Jia, Fan and Li, Shuailin and Wang, Tiancai and Zhang, Xiangyu},
  journal={arXiv preprint arXiv:2301.01283},
  year={2023}
}

Contact

If you have any questions, feel free to open an issue or contact us at [email protected], [email protected], [email protected] or [email protected].
