[Paper] [Project Page]
We provide a simple installation script that, by default, creates a conda environment with Python 3.9, PyTorch 1.13, and CUDA 11.6.
source install.sh
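If you prefer to set up the environment by hand, the commands below are a rough sketch of what such a script typically does; the environment name and the use of a requirements.txt file are assumptions, so treat install.sh as the authoritative reference.
# Hypothetical manual setup mirroring install.sh (env name and requirements file are assumptions)
conda create -n posediffusion python=3.9 -y
conda activate posediffusion
conda install pytorch==1.13.0 torchvision==0.14.0 pytorch-cuda=11.6 -c pytorch -c nvidia -y
pip install -r requirements.txt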
Download the model checkpoint trained on Co3D from Dropbox. The predicted camera poses and focal lengths are defined in NDC coordinates.
Here's an example of how to use it:
python demo.py image_folder="samples/apple" ckpt="/PATH/TO/DOWNLOADED/CKPT"
Feel free to test with your own data by specifying a different image_folder, as in the example below.
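For instance, to run the demo on your own images (the folder path is a placeholder):
python demo.py image_folder="/PATH/TO/YOUR/IMAGES" ckpt="/PATH/TO/DOWNLOADED/CKPT"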
Using a Quadro GP100 GPU, the inference time for a 20-frame sequence without GGS is approximately 0.8 seconds, and with GGS it’s around 80 seconds (including 20 seconds for matching extraction).
You can enable or disable GGS in ./cfgs/default.yaml.
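If the config supports the same key=value override style as the demo command above, you may also be able to toggle GGS from the command line; the exact key name (GGS.enable below) is an assumption, so check ./cfgs/default.yaml for the actual field.
# Hypothetical override; verify the key name in ./cfgs/default.yaml
python demo.py image_folder="samples/apple" ckpt="/PATH/TO/DOWNLOADED/CKPT" GGS.enable=False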
We use Visdom by default for visualization. Please make sure your Visdom settings are configured correctly so the results are visualized properly; however, Visdom is not required to run the model.
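If you want the visualizations, start a Visdom server before running the demo; the port below is Visdom's default, so adjust it to match your settings.
# Start a local Visdom server (default port 8097), then open http://localhost:8097
python -m visdom.server -port 8097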
Start by following the instructions here to preprocess the annotations of the Co3D V2 dataset. This significantly reduces data processing time during training.
Next, specify the paths CO3D_DIR and CO3D_ANNOTATION_DIR in ./cfgs/default_train.yaml.
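Alternatively, if the training script accepts the same key=value override style as the demo command, you may be able to set the paths from the command line instead of editing the YAML; this is a sketch under that assumption, with placeholder paths.
# Hypothetical command-line overrides; the paths are placeholders
python train.py CO3D_DIR="/PATH/TO/CO3D_V2" CO3D_ANNOTATION_DIR="/PATH/TO/CO3D_ANNOTATIONS"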
Now, you can start 1-GPU training with:
python train.py
All configurations are specified inside ./cfgs/default_train.yaml.
For multi-GPU training, launch the training script with accelerate, e.g., to train on 8 GPUs (processes) on 1 node (machine):
accelerate launch --num_processes=8 --multi_gpu --num_machines=1 train.py
Note that accelerate's own flags must come before the script name; anything after train.py is passed to the script itself.
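Alternatively, you can describe your hardware once with accelerate config and then launch without repeating the flags:
# Answer the interactive prompts once (number of GPUs, machines, mixed precision, ...)
accelerate config
# Subsequent launches pick up the saved configuration
accelerate launch train.py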
Please note that we use Visdom to record logs.
Thanks to the authors of denoising-diffusion-pytorch, guided-diffusion, hloc, and relpose for their great implementations.
See the LICENSE file for details about the license under which this code is made available.