A GAN-based approach for generating 2D renderings of a world that are consistent over time and across viewpoints. The method colors the 3D point cloud of the world as the camera moves through it, coloring newly revealed regions consistently with the already colored world. It learns to render images from the 2D projections of the point cloud onto the camera in a semantically consistent manner, while robustly handling incorrect and incomplete point clouds.
Project | YouTube | arXiv | Paper(full) | Two Minute Papers Video
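The projection step described above can be illustrated with a minimal sketch. This is not the Imaginaire implementation; the function name, array layout, and pinhole-camera assumption are purely illustrative.

```python
import numpy as np

def project_guidance(points, colors, K, R, t, height, width):
    """Splat colored world points into a target view; unfilled pixels stay zero.

    points: (N, 3) world coordinates, colors: (N, 3), K: (3, 3) intrinsics,
    R, t: world-to-camera rotation and translation.
    """
    cam = points @ R.T + t                      # world -> camera frame
    front = cam[:, 2] > 1e-6                    # keep points in front of the camera
    cam, colors = cam[front], colors[front]

    pix = cam @ K.T                             # pinhole projection
    u = np.round(pix[:, 0] / pix[:, 2]).astype(int)
    v = np.round(pix[:, 1] / pix[:, 2]).astype(int)
    keep = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z, c = u[keep], v[keep], cam[keep, 2], colors[keep]

    order = np.argsort(-z)                      # draw far-to-near, so near points win
    guidance = np.zeros((height, width, 3), dtype=colors.dtype)
    guidance[v[order], u[order]] = c[order]     # simple z-buffered splat
    return guidance
```

Pixels that no point reaches stay empty; the generator then has to fill them and correct errors introduced by noisy geometry.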
Imaginaire is released under the NVIDIA Software license. For commercial use, please contact [email protected].
For installation, please check out INSTALL.md.
We trained our models using an NVIDIA DGX1 with 8 V100 32GB GPUs. You can try using fewer GPUs or a smaller batch size if the model does not fit in your GPU memory, but training stability and image quality cannot be guaranteed.
We use the Cityscapes dataset as an example. To train a model on the full dataset, please download it from the official website (registration required). We apply a pre-trained segmentation algorithm to get the corresponding segmentation maps.
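For illustration only, the recipe below shows how per-frame label maps can be produced with an off-the-shelf segmentation network from torchvision. The specific model here is an assumption, not the segmenter used for our results, and it is not trained on Cityscapes classes; in practice a Cityscapes-trained model should be used.

```python
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet101

# Off-the-shelf model for illustration only; NOT trained on Cityscapes classes.
model = deeplabv3_resnet101(weights='DEFAULT').eval()
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def save_label_map(frame_path, out_path):
    image = Image.open(frame_path).convert('RGB')
    with torch.no_grad():
        logits = model(preprocess(image).unsqueeze(0))['out']   # (1, C, H, W)
    labels = logits.argmax(1).squeeze(0).to(torch.uint8).numpy()
    Image.fromarray(labels).save(out_path)                      # one label map per frame

save_label_map('cityscapes/images/seq0001/000001.png',
               'cityscapes/seg_maps/seq0001/000001.png')
```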
The following shows the example commands to train wc_vid2vid on the Cityscapes dataset.
- Download the dataset and arrange it in the following format (a small layout sanity check is sketched after the note below).
```
cityscapes
└───images
    └───seq0001
        └───000001.png
        └───000002.png
        ...
    └───seq0002
        └───000001.png
        └───000002.png
        ...
    ...
└───seg_maps
    └───seq0001
        └───000001.png
        └───000002.png
        ...
    └───seq0002
        └───000001.png
        └───000002.png
        ...
    ...
└───unprojections
    └───seq0001
        └───000001.pkl
        └───000002.pkl
        ...
    └───seq0002
        └───000001.pkl
        └───000002.pkl
        ...
    ...
```
We will provide the scripts to perform SfM and generate the unprojection files required by wc_vid2vid in a future update.
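As referenced above, the following small sanity check (not part of Imaginaire; it only assumes the folder layout shown above) verifies that the three folders contain matching sequences and frame names before you build the LMDBs.

```python
from pathlib import Path

def check_layout(root):
    """Warn about sequences whose frames differ across the three folders."""
    root = Path(root)
    for seq in sorted(p.name for p in (root / 'images').iterdir() if p.is_dir()):
        images = {p.stem for p in (root / 'images' / seq).glob('*.png')}
        segs = {p.stem for p in (root / 'seg_maps' / seq).glob('*.png')}
        unps = {p.stem for p in (root / 'unprojections' / seq).glob('*.pkl')}
        if not (images == segs == unps):
            print(f'{seq}: frame mismatch across images/seg_maps/unprojections')

check_layout('cityscapes')
```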
- Preprocess the data into LMDB format
```bash
python scripts/build_lmdb.py --paired \
  --config configs/projects/wc_vid2vid/cityscapes/seg_ampO1.yaml \
  --data_root [PATH_TO_DATA train|val] \
  --output_root datasets/cityscapes/lmdb/[train|val]
```
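Optionally, you can verify that the LMDBs were written. This sketch assumes build_lmdb.py produces standard LMDB environments somewhere under the output root; the exact sub-directory layout may differ.

```python
from pathlib import Path
import lmdb

# Look for LMDB environments (data.mdb files) under the output root and
# report how many entries each one holds.
for db_file in Path('datasets/cityscapes/lmdb/train').rglob('data.mdb'):
    env = lmdb.open(str(db_file.parent), readonly=True, lock=False)
    print(db_file.parent, env.stat()['entries'], 'entries')
    env.close()
```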
- Train on 8 GPUs with AMP O1 (automatic mixed precision)
```bash
python -m torch.distributed.launch --nproc_per_node=8 train.py \
  --config configs/projects/wc_vid2vid/cityscapes/seg_ampO1.yaml
```
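If training does not fit in GPU memory (see the note above), one generic way to find the batch-size settings to reduce is to scan the config for matching keys. This is a sketch only: it assumes the config is plain YAML, and the key names printed depend on the actual contents of seg_ampO1.yaml.

```python
import yaml

def find_keys(node, needle, prefix=''):
    """Recursively print every config entry whose key contains `needle`."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f'{prefix}.{key}' if prefix else str(key)
            if needle in str(key).lower():
                print(path, '=', value)
            find_keys(value, needle, path)
    elif isinstance(node, list):
        for i, value in enumerate(node):
            find_keys(value, needle, f'{prefix}[{i}]')

with open('configs/projects/wc_vid2vid/cityscapes/seg_ampO1.yaml') as f:
    find_keys(yaml.safe_load(f), 'batch')
```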
- Download some test data by running
```bash
python ./scripts/download_test_data.py --model_name wc_vid2vid
```
- Or arrange your own data into the same format as the training data described above.
- Translate segmentation masks to images with the inference command
```bash
python inference.py --single_gpu \
  --config configs/projects/wc_vid2vid/cityscapes/seg_ampO1.yaml \
  --output_dir projects/wc_vid2vid/output/cityscapes
```
The results are stored in `projects/wc_vid2vid/output/cityscapes`. Below, we show the expected output video.
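If you want to pack the generated frames into a video for viewing, a small helper such as the one below works. It assumes the frames are written as PNG files under the output directory (the exact sub-folder layout may differ) and requires imageio with its ffmpeg plugin.

```python
from pathlib import Path
import imageio.v2 as imageio

# Collect all rendered frames and write them to an mp4 at 15 fps.
frames = sorted(Path('projects/wc_vid2vid/output/cityscapes').rglob('*.png'))
with imageio.get_writer('wc_vid2vid_result.mp4', fps=15) as writer:
    for frame in frames:
        writer.append_data(imageio.imread(frame))
```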
If you use this code for your research, please cite our paper.
```
@inproceedings{mallya2020world,
  title={World-Consistent Video-to-Video Synthesis},
  author={Arun Mallya and Ting-Chun Wang and Karan Sapra and Ming-Yu Liu},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2020}
}
```