🚨 This repository contains the code and trained models of our work "GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction", ICCV 2023
by Youmin Zhang, Fabio Tosi, Stefano Mattoccia and Matteo Poggi
Department of Computer Science and Engineering (DISI), University of Bologna
🚧 Note: this repository is currently under development.
3D Reconstruction and Trajectory Error. From left to right: RGB-D methods (iMAP, NICE-SLAM, DROID-SLAM, and ours), ground truth scan, and monocular methods (DROID-SLAM and ours).
We introduce GO-SLAM, a deep-learning-based dense visual SLAM framework that achieves real-time global optimization of poses and 3D reconstruction. By integrating robust pose estimation, efficient loop closing, and continuous surface representation updates, GO-SLAM effectively addresses the error accumulation and distortion challenges associated with neural implicit representations. By exploiting global geometry learned from the input history, GO-SLAM sets new benchmarks in tracking robustness and reconstruction accuracy across synthetic and real-world datasets. Notably, it supports monocular, stereo, and RGB-D inputs.
Contributions:
- A novel deep-learning-based, real-time global pose optimization system that considers the complete history of input frames and continuously aligns all poses.
- An efficient alignment strategy that enables instantaneous loop closures and correction of global structure, being both memory and time efficient.
- An instant 3D implicit reconstruction approach, enabling on-the-fly and continuous 3D model update with the latest global pose estimates. This strategy facilitates real-time 3D reconstructions.
- The first deep-learning architecture for joint robust pose estimation and dense 3D reconstruction suited for any setup: monocular, stereo, or RGB-D cameras.
Architecture Overview
GO-SLAM consists of three parallel threads: front-end tracking, back-end tracking, and instant mapping. It can run with monocular, stereo, and RGB-D input.
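As a rough illustration of the three-thread layout described above, the following minimal sketch wires a front end, a back end, and a mapping worker together. It is not the GO-SLAM implementation: the queue-based hand-off, the fixed keyframe interval, and every name (`run_pipeline`, `keyframe_every`, `frontend`, `backend`, `mapping`) are assumptions made for illustration, and the actual tracking, optimization, and mapping work is replaced by placeholders.

```python
import queue
import threading


def run_pipeline(frames, keyframe_every=2):
    """Toy three-thread layout: front-end tracking, back-end tracking, mapping.

    All logic here is a placeholder; only the thread structure is illustrated.
    """
    keyframes = queue.Queue()  # front end -> back end
    refined = queue.Queue()    # back end -> mapping
    mapped = []                # keyframe indices consumed by the mapping thread
    stop = object()            # sentinel that tells a consumer to shut down

    def frontend():
        # Hypothetical keyframe selection: keep every `keyframe_every`-th frame.
        for index, frame in enumerate(frames):
            if index % keyframe_every == 0:
                keyframes.put((index, frame))
        keyframes.put(stop)

    def backend():
        # Placeholder for global pose refinement: forwards keyframes unchanged.
        while True:
            item = keyframes.get()
            if item is stop:
                refined.put(stop)
                return
            refined.put(item)

    def mapping():
        # Placeholder for the map update: records which keyframes arrived.
        while True:
            item = refined.get()
            if item is stop:
                return
            mapped.append(item[0])

    threads = [threading.Thread(target=fn) for fn in (frontend, backend, mapping)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return mapped
```

Calling `run_pipeline(list(range(6)))` passes frames 0, 2, and 4 through all three stages in order. In a real system the stages would share pose and map state rather than simply forwarding items.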
🖋️ If you find this code useful in your research, please cite:
@inproceedings{zhang2023goslam,
author = {Zhang, Youmin and Tosi, Fabio and Mattoccia, Stefano and Poggi, Matteo},
title = {GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023},
}
The code will be available soon!
In this section, we present illustrative examples that demonstrate the effectiveness of our proposal.
Qualitative results on the ScanNet dataset. We evaluate GO-SLAM in RGB-D mode on the ScanNet dataset and benchmark it against state-of-the-art techniques. Our method produces more globally consistent reconstructions.
Qualitative results on the Replica dataset. GO-SLAM supports both monocular and RGB-D modes and is evaluated on the Replica dataset in each. It achieves real-time, high-quality 3D reconstruction from monocular or RGB-D input. This stands in contrast to NICE-SLAM, designed solely for depth input, which runs at less than 1 frame per second and requires hours to achieve comparable results.
Qualitative examples of loop closing (LC) and full bundle adjustment (BA) on scene0054_00 (ScanNet), which has a total of 6629 frames. In (a), a significant error accumulates when no global optimization is available. With loop closing (b), the system is able to eliminate the trajectory error using global geometry. Additionally, online full BA (c) optimizes the poses of all existing keyframes. The final model (d), which integrates both loop closing and full BA, achieves a more complete and accurate 3D model prediction.
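To make the notion of accumulated trajectory error concrete, the sketch below computes a root-mean-square position error between an estimated and a ground-truth trajectory. It is a simplified illustration, not the evaluation protocol used for the results here: it assumes the two trajectories are already expressed in the same coordinate frame and are frame-by-frame associated, and it skips the trajectory alignment step that SLAM benchmarks commonly apply first. The function name `ate_rmse` and the synthetic trajectories are hypothetical.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of per-frame position error between two equally long trajectories.

    Each trajectory is a sequence of (x, y, z) positions. No alignment is
    performed, so both must already share one coordinate frame.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and equally long")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est, gt))
        for est, gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Synthetic data: a straight-line ground truth and an estimate whose
# sideways drift grows with every frame, mimicking uncorrected tracking.
ground_truth = [(float(i), 0.0, 0.0) for i in range(5)]
drifting = [(float(i), 0.1 * i, 0.0) for i in range(5)]

drift_error = ate_rmse(drifting, ground_truth)
perfect_error = ate_rmse(ground_truth, ground_truth)
```

Here `perfect_error` is zero, while `drift_error` is positive and would keep growing with longer sequences, which is the behaviour that loop closing and full BA are meant to counteract.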
For questions, please send an email to [email protected], [email protected] or [email protected]
We sincerely thank the China Scholarship Council (CSC) for its scholarship support.