Project Page | Paper | Video | Data
SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving
Yi Wei*, Linqing Zhao*, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu
- [2023/3/17]: Initial code and paper release.
- [2023/2/27]: Demo release.
The demos are fairly large, so please allow a moment for them to load. If they fail to load or look blurry, click the hyperlink of each demo for the full-resolution raw video. See the project page for more demos and a detailed introduction.
Towards a more comprehensive and consistent scene reconstruction, in this paper we propose SurroundOcc, a method that predicts volumetric occupancy from multi-camera images. We first extract multi-scale features for each image and adopt spatial cross attention to lift them to the 3D volume space. Then we apply 3D convolutions to progressively upsample the volume features and impose supervision on multiple levels. To train the multi-camera 3D scene reconstruction model, we design a pipeline that generates dense occupancy ground truth from sparse LiDAR points. The generation pipeline needs only existing 3D detection and 3D semantic segmentation labels, with no extra human annotation. Specifically, we fuse multi-frame LiDAR points of dynamic objects and static scenes separately. Then we adopt Poisson Reconstruction to fill the holes and voxelize the resulting mesh to obtain dense volumetric occupancy.
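The fuse-then-voxelize idea in the ground truth pipeline can be illustrated with a minimal NumPy sketch. It is not the repository's implementation: the function names `fuse_frames` and `voxelize`, the pose convention (a 4x4 matrix mapping each frame into a shared frame), and the grid range and voxel size in the usage lines are all assumptions made for illustration. The Poisson Reconstruction step and the separate handling of dynamic objects are omitted.

```python
import numpy as np


def fuse_frames(frames, poses):
    """Map each (N_i, 3) point array into a shared frame and concatenate.

    `poses[i]` is assumed to be a 4x4 matrix taking frame i to the shared frame.
    """
    fused = []
    for pts, pose in zip(frames, poses):
        homo = np.hstack([pts, np.ones((len(pts), 1))])  # (N_i, 4) homogeneous
        fused.append((homo @ pose.T)[:, :3])
    return np.vstack(fused)


def voxelize(points, pc_range, voxel_size):
    """Mark every voxel that contains at least one point as occupied.

    `pc_range` is [x_min, y_min, z_min, x_max, y_max, z_max] (hypothetical layout).
    """
    lo = np.asarray(pc_range[:3], dtype=float)
    hi = np.asarray(pc_range[3:], dtype=float)
    dims = np.round((hi - lo) / voxel_size).astype(int)

    # Drop points outside the grid, then convert coordinates to voxel indices.
    inside = np.all((points >= lo) & (points < hi), axis=1)
    idx = np.floor((points[inside] - lo) / voxel_size).astype(int)
    idx = np.minimum(idx, dims - 1)  # guard against floating-point edge cases

    occ = np.zeros(dims, dtype=bool)
    occ[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return occ


# Usage with two toy frames; the second is shifted by 1 m along x.
shift = np.eye(4)
shift[0, 3] = 1.0
cloud = fuse_frames([np.zeros((1, 3)), np.zeros((1, 3))], [np.eye(4), shift])
grid = voxelize(cloud, [-2, -2, -2, 2, 2, 2], voxel_size=1.0)
```

In the pipeline described above, the voxelization would be applied to the reconstructed mesh rather than directly to the fused points, which is what makes the resulting occupancy dense.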
Method Pipeline:
Occupancy Ground Truth Generation Pipeline:
You can download our pretrained models for the 3D semantic occupancy prediction and 3D scene reconstruction tasks. The difference between them is whether semantic labels are used during training. The models were trained on 8 RTX 3090 GPUs for about 2.5 days.
You can try our nuScenes pretrained model on your own data! We provide a template pickle file here. Place it in ./data and update its entries: set 'lidar2img', 'intrinsic' and 'data_path' to the extrinsic matrix, intrinsic matrix and path of your multi-camera images, respectively. Note that the frames must be ordered by timestamp. 'occ_path' in the pickle file is the save path; inference writes raw results (.npy) and point clouds (.ply) there for further visualization:
./tools/ ./projects/configs/surroundocc/ ./path/to/ckpts.pth 8
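When preparing the camera matrices for your own data, it can help to check them by projecting a few LiDAR points into an image. The sketch below follows one convention common in multi-camera codebases, where the projection matrix is the 4x4-padded intrinsic multiplied by the LiDAR-to-camera extrinsic. This is an assumption, not something confirmed by the template: verify against the template pickle file whether 'lidar2img' should hold the plain extrinsic or the composed matrix. The helper names `make_lidar2img` and `project` are hypothetical.

```python
import numpy as np


def make_lidar2img(intrinsic, lidar2cam):
    """Compose a 4x4 LiDAR-to-image matrix from a 3x3 intrinsic and a 4x4 extrinsic."""
    viewpad = np.eye(4)
    viewpad[:3, :3] = intrinsic  # pad the intrinsic to 4x4
    return viewpad @ lidar2cam


def project(points, lidar2img):
    """Project (N, 3) LiDAR points to pixel coordinates; also return their depth."""
    homo = np.hstack([points, np.ones((len(points), 1))])
    cam = homo @ lidar2img.T
    depth = cam[:, 2]
    uv = cam[:, :2] / depth[:, None]  # only meaningful where depth > 0
    return uv, depth


# Usage with made-up calibration values: identity extrinsic, simple pinhole camera.
K = np.array([[1000.0, 0.0, 800.0],
              [0.0, 1000.0, 450.0],
              [0.0, 0.0, 1.0]])
uv, depth = project(np.array([[0.0, 0.0, 10.0]]), make_lidar2img(K, np.eye(4)))
```

A point on the optical axis should land on the principal point, which makes a quick sanity check before running inference.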
Many thanks to these excellent projects:
Related Projects:
If this work is helpful for your research, please consider citing the following BibTeX entry.
title={SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving},
author={Yi Wei and Linqing Zhao and Wenzhao Zheng and Zheng Zhu and Jie Zhou and Jiwen Lu},
journal={arXiv preprint arXiv:2303.09551},