[NeurIPS 2024] GeoLRM

Project Page | arXiv | Paper | Checkpoint

GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation

Chubin Zhang, Hongliang Song, Yi Wei, Yu Chen, Jiwen Lu, Yansong Tang

Updates:

🔔 2024/6/21 Code release.
🎉 2024/9/29 GeoLRM has been accepted at NeurIPS 2024!
🔔 2024/9/30 Training code release.

🕹 Demos

3D assets generated by GeoLRM:

demo.mp4

📝 Introduction

In this work, we introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach that can predict high-quality assets with 512k Gaussians from 21 input images using only 11 GB of GPU memory. Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D structure and 2D images. This limits these methods to low-resolution representations and makes it difficult to scale up to dense views for better quality. GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention to effectively integrate image features into 3D representations. We implement this solution through a two-stage pipeline: first, a lightweight proposal network generates a sparse set of 3D anchor points from the posed input images; then, a specialized reconstruction transformer refines the geometry and retrieves textural details.
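To make the sparsity argument concrete, here is a minimal, framework-free sketch of how occupancy predictions could be turned into sparse anchor points. It assumes a cubic grid of occupancy probabilities, a 0.5 threshold, and a [-1, 1]^3 bounding box; these values and the function names are hypothetical and not taken from the GeoLRM code.

```python
from itertools import product
from typing import List, Sequence, Tuple

Voxel = Tuple[int, int, int]


def occupied_voxels(occ: Sequence[Sequence[Sequence[float]]],
                    threshold: float = 0.5) -> List[Voxel]:
    """Return (x, y, z) indices of voxels whose occupancy probability exceeds threshold."""
    res_x, res_y, res_z = len(occ), len(occ[0]), len(occ[0][0])
    return [
        (x, y, z)
        for x, y, z in product(range(res_x), range(res_y), range(res_z))
        if occ[x][y][z] > threshold
    ]


def voxel_centers(voxels: List[Voxel], resolution: int,
                  bound: float = 1.0) -> List[Tuple[float, ...]]:
    """Map voxel indices to anchor-point coordinates (voxel centers) in [-bound, bound]^3."""
    size = 2.0 * bound / resolution  # edge length of one voxel
    return [tuple(-bound + (i + 0.5) * size for i in v) for v in voxels]
```

On a 4×4×4 grid with two occupied cells, only 2 of the 64 voxels become tokens for the reconstruction stage; the saving grows with resolution because object surfaces fill only a small fraction of the volume.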

💡 Method

Method Pipeline:

The process begins by transforming dense tokens into an occupancy grid with a Proposal Transformer, which captures spatial occupancy from hierarchical image features extracted by a combination of a convolutional layer and DINOv2. Sparse tokens representing occupied voxels are then processed by a Reconstruction Transformer that employs self-attention and deformable cross-attention to refine geometry and retrieve texture details via 3D-to-2D projection. Finally, the refined 3D tokens are converted into 3D Gaussians for real-time rendering.
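The 3D-to-2D projection and deformable sampling steps can be sketched for a single view and a single scalar feature channel. This is a simplified illustration under our own assumptions (a pinhole camera with intrinsics `K` and extrinsics `R`, `t`; sampling offsets and attention logits passed in rather than predicted), not the repository's `deform_attn_3d` CUDA kernel: an anchor point is projected into the image, features are bilinearly sampled at a few offsets around the projected location, and the samples are blended with softmax weights.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]
FeatureMap = Sequence[Sequence[float]]  # H x W, one channel (hypothetical simplification)


def project(point: Vec3, K: Mat3, R: Mat3, t: Vec3) -> Tuple[float, float]:
    """Pinhole projection of a world-space point to pixel coordinates (u, v)."""
    # World -> camera: X_c = R @ X_w + t
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    # Camera -> image: x = K @ X_c, then divide by depth
    x = [sum(K[i][j] * xc[j] for j in range(3)) for i in range(3)]
    return x[0] / x[2], x[1] / x[2]


def bilinear(feat: FeatureMap, u: float, v: float) -> float:
    """Bilinearly sample feat at continuous pixel (u, v); out-of-bounds taps count as zero."""
    h, w = len(feat), len(feat[0])
    u0, v0 = math.floor(u), math.floor(v)
    du, dv = u - u0, v - v0
    total = 0.0
    for dy, wy in ((0, 1 - dv), (1, dv)):
        for dx, wx in ((0, 1 - du), (1, du)):
            y, x = v0 + dy, u0 + dx
            if 0 <= y < h and 0 <= x < w:
                total += wy * wx * feat[y][x]
    return total


def deformable_sample(feat: FeatureMap, ref: Tuple[float, float],
                      offsets: List[Tuple[float, float]],
                      logits: List[float]) -> float:
    """Sum features sampled at ref + offset, weighted by softmax(logits)."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in logits]
    z = sum(exps)
    return sum(
        (e / z) * bilinear(feat, ref[0] + du, ref[1] + dv)
        for e, (du, dv) in zip(exps, offsets)
    )
```

In deformable attention as introduced in Deformable DETR and used in BEVFormer, the offsets and weights are predicted from the query token, and aggregation runs over several heads, feature levels, and (here) input views.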

🔧 Installation

Clone this repo, then create a new conda environment and install the dependencies:

```
conda create -n geolrm python=3.10
conda activate geolrm
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install flash-attn --no-build-isolation
pip install -r requirements.txt
```
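A quick way to confirm the environment is complete before building the CUDA extensions is a small import check. This is a hypothetical helper, not part of the repository; it only tests whether each package can be located (note that `flash-attn` is imported as `flash_attn`), not whether it was built against the right CUDA version.

```python
import importlib.util
from typing import Dict, Iterable

# Import names of the packages installed above (assumed list; extend as needed).
REQUIRED = ("torch", "torchvision", "torchaudio", "flash_attn")


def package_status(names: Iterable[str] = REQUIRED) -> Dict[str, bool]:
    """Map each import name to True if it can be found in the current environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in package_status().items():
        print(f"{name:12s} {'ok' if ok else 'MISSING'}")
```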

Follow the instructions in generative-models to install the sgm package (required for SV3D inference).
Build the curope3d and deform_attn_3d CUDA extensions:
```
cd src/models/decoder/curope3d
python setup.py build_ext --inplace
cd ../deform_attn_3d
python setup.py build_ext --inplace
```
If you encounter any issues, make sure that the CUDA version used to build your PyTorch package matches the version of your NVCC compiler. You can check both by running the following commands:
```
nvcc --version
python -c "import torch; print(torch.version.cuda)"
```
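The same comparison can be automated. The sketch below (a hypothetical helper, not part of the repository) parses the `release X.Y` field printed by `nvcc --version` and compares it with the major.minor part of `torch.version.cuda`, which is `None` on CPU-only PyTorch builds.

```python
import re
from typing import Optional


def nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract the 'major.minor' release from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def cuda_versions_match(nvcc_output: str, torch_cuda: Optional[str]) -> bool:
    """True if nvcc's release equals the major.minor CUDA version PyTorch was built with."""
    nvcc = nvcc_release(nvcc_output)
    if nvcc is None or torch_cuda is None:  # no nvcc output, or CPU-only PyTorch
        return False
    return nvcc == ".".join(torch_cuda.split(".")[:2])
```

For example, pass it the stdout of `subprocess.run(["nvcc", "--version"], capture_output=True, text=True)` together with `torch.version.cuda`.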

🚀 Quick Start

Download checkpoints

Download the GeoLRM checkpoint:

```
wget https://huggingface.co/LinShan/GeoLRM/resolve/main/geolrm.ckpt -P ckpts
```

Manually download sv3d_p.safetensors from Hugging Face and place it under ckpts.
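If you prefer to script the downloads, the `huggingface_hub` package provides `hf_hub_download`. This sketch assumes `huggingface_hub` is installed and that `sv3d_p.safetensors` lives in the gated `stabilityai/sv3d` repository, which requires accepting its license and logging in; verify the repository ID before relying on it.

```python
from pathlib import Path


def download_checkpoints(ckpt_dir: str = "ckpts") -> None:
    """Fetch the GeoLRM and SV3D checkpoints into ckpt_dir (requires network access)."""
    # Imported lazily so this file can be loaded without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    Path(ckpt_dir).mkdir(parents=True, exist_ok=True)
    # Same file as the wget command above.
    hf_hub_download(repo_id="LinShan/GeoLRM", filename="geolrm.ckpt", local_dir=ckpt_dir)
    # Assumption: sv3d_p.safetensors is hosted in the gated stabilityai/sv3d repository;
    # accept its license on the Hub and run `huggingface-cli login` first.
    hf_hub_download(repo_id="stabilityai/sv3d", filename="sv3d_p.safetensors",
                    local_dir=ckpt_dir)
```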

Gradio App

```
python app.py
```

Then open the browser and visit http://127.0.0.1:42339/.
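When scripting against the Gradio app (for example, in a remote session), it can help to wait until the server accepts connections before opening the page. This is a hypothetical standard-library helper; the default port is the one given above and may differ if `app.py` is configured otherwise.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 42339,
                  timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP server accepts connections on host:port; False after timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused or timed out: server not up yet
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```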

Inference

```
python run_geolrm_sv3d.py configs/geolrm.yaml examples --output_path outputs
```

Tips for better results:

- Use high-resolution input images.
- Orthographic, front-facing images lead to good reconstructions.
- Avoid white objects and overexposed images.
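The last tip can be checked roughly before running inference. This hypothetical heuristic (not part of GeoLRM) measures the fraction of foreground pixels that are nearly white; the 245 threshold is an arbitrary assumption, and it expects RGBA pixels so that a transparent background is ignored.

```python
from typing import Iterable, Tuple

RGBA = Tuple[int, int, int, int]


def near_white_fraction(pixels: Iterable[RGBA], threshold: int = 245) -> float:
    """Fraction of foreground (alpha > 0) pixels whose R, G and B are all >= threshold."""
    fg = bright = 0
    for r, g, b, a in pixels:
        if a == 0:
            continue  # skip fully transparent background pixels
        fg += 1
        if min(r, g, b) >= threshold:
            bright += 1
    return bright / fg if fg else 0.0
```

With Pillow, `near_white_fraction(Image.open(path).convert("RGBA").getdata())` applies it to an image file. For RGB inputs on a white background, mask out the background first, otherwise it dominates the count.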

📑 Training

Download the GObjaverse dataset (gobjaverse_280k split) from here. For now, we only use the xxxxx.png, xxxxx.json, and xxxxx_nd.exr files; you can modify download_gobjaverse_280k.py to skip the other files and save disk space. The resulting dataset is around 2.6 TB. Organize it as follows:

```
data/
├── objaverse/
│   ├── gobjaverse_280k.json
│   ├── text_captions_cap3d.json
│   ├── gobjaverse_280k/
│   │   ├── 0/
│   │   │   ├── 10010/
│   │   │   │   ├── 00000/
│   │   │   │   │   ├── 00000.png
│   │   │   │   │   ├── 00000.json
│   │   │   │   │   ├── 00000_nd.exr
...
```
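Because a partial download is easy to miss at this size, a small check can list views that lack one of the three files used for training. This sketch assumes the layout above, with each view stored as `<view>/<view>.png`, `<view>.json`, and `<view>_nd.exr`; the function name is hypothetical, and views with no files at all will not appear in the report.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Set

# Per-view files used for training, per the instructions above.
SUFFIXES = (".png", ".json", "_nd.exr")


def incomplete_views(files: Iterable[str]) -> Dict[str, List[str]]:
    """Map each view directory to the expected files it is missing.

    `files` are paths relative to gobjaverse_280k/, e.g. "0/10010/00000/00000.png".
    """
    present: Dict[str, Set[str]] = defaultdict(set)
    for f in files:
        p = PurePosixPath(f)
        present[str(p.parent)].add(p.name)
    report = {}
    for view_dir, names in sorted(present.items()):
        stem = PurePosixPath(view_dir).name  # view id, e.g. "00000"
        missing = [stem + s for s in SUFFIXES if stem + s not in names]
        if missing:
            report[view_dir] = missing
    return report
```

To run it on disk (with `from pathlib import Path`): `incomplete_views(p.relative_to(root).as_posix() for p in Path(root).rglob("*") if p.is_file())`, where `root` points at `data/objaverse/gobjaverse_280k`.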

Generate the occupancy ground truth:
```
python tools/create_occ_gts.py
```
We recommend manually parallelizing this process to speed up the generation of occupancy ground truth:
```
CUDA_VISIBLE_DEVICES=0 python tools/create_occ_gts.py --start 0 --end 140000 &
CUDA_VISIBLE_DEVICES=1 python tools/create_occ_gts.py --start 140000
```
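The two-GPU example generalizes to any number of GPUs by splitting the object index range into contiguous shards. This sketch assumes, as the example suggests, that `--end` is exclusive and that there are roughly 280,000 objects; since the last command omits `--end`, the assumed total only affects load balance, not coverage.

```python
from typing import List


def shard_commands(total: int, num_gpus: int,
                   script: str = "tools/create_occ_gts.py") -> List[str]:
    """Build one occupancy-generation command per GPU over contiguous index shards.

    All but the last command run in the background (&); the last omits --end so it
    processes everything from its start index to the end of the object list.
    """
    step = -(-total // num_gpus)  # ceiling division
    cmds = []
    for gpu in range(num_gpus):
        start = gpu * step
        cmd = f"CUDA_VISIBLE_DEVICES={gpu} python {script} --start {start}"
        if gpu < num_gpus - 1:
            cmd += f" --end {start + step} &"
        cmds.append(cmd)
    return cmds
```

With `total=280000` and `num_gpus=2` this reproduces the two commands above; for eight GPUs, print `"\n".join(shard_commands(280000, 8))` and paste the output into a shell.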
The occupancy ground truth generation process will take around 6 hours on 8 GPUs.
Train the proposal network:
```
python train.py --base configs/srl-bf16.yaml --num_nodes 1 --gpus 0,1,2,3,4,5,6,7
```
The training process will take around 1 day on 8 A100 GPUs.
Train the reconstruction network:
```
python train.py --base configs/geolrm-train.yaml --num_nodes 1 --gpus 0,1,2,3,4,5,6,7
```
We provide a basic script for multi-node training in the scripts folder. The training process will take around 2 days on 32 A100 GPUs.

🙏 Acknowledgement

Many thanks to these excellent projects:

InstantMesh, RichDreamer, LGM, Zero123++, 3DGS, diff-gaussian-rasterization (with depth), generative-models, BEVFormer

📃 Bibtex

If this work is helpful for your research, please consider citing the following BibTeX entry.

```
@article{zhang2024geolrm,
  title={GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation},
  author={Chubin Zhang and Hongliang Song and Yi Wei and Yu Chen and Jiwen Lu and Yansong Tang},
  journal={arXiv preprint arXiv:2406.15333},
  year={2024}
}
```
