David Junhao Zhang*, Jay Zhangjie Wu*, Jia-Wei Liu*, Rui Zhao, Lingmin Ran, Yuchao Gu, Difei Gao, Mike Zheng Shou✉
* Equal Contribution ✉ Corresponding Author
Project Page | arXiv | PDF
- [10/12/2023] Code and weights released!
```
pip install -r requirements.txt
```
PyTorch 2.0+ is highly recommended for better efficiency and speed on GPUs.
All weights are available in the Show Lab Hugging Face organization! Please check the key frames generation, interpolation, super-resolution stage 1, and super-resolution stage 2 modules. We also use the deep-floyd-if super-resolution stage 1 model for first-frame super-resolution. To download the deep-floyd-if models, you need to follow their official instructions.
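The modules above form a cascade. As a minimal sketch of that order (placeholder callables, not the actual Show-1 API):

```python
# Sketch of the cascaded inference order implied by the modules above:
# key frames -> interpolation -> super-resolution stage 1 -> stage 2.
# The four stage functions are placeholders, not the repo's real classes.
def generate_video(prompt, keyframes_fn, interp_fn, sr1_fn, sr2_fn):
    frames = keyframes_fn(prompt)   # low-resolution, low-fps key frames
    frames = interp_fn(frames)      # interpolate to a higher frame rate
    frames = sr1_fn(frames)         # super-resolution stage 1
    frames = sr2_fn(frames)         # super-resolution stage 2
    return frames
```

In the actual code, `run_inference.py` wires these stages together; the sketch only shows the data flow between them.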
To run diffusion models for text-to-video generation, run this command:
```
python run_inference.py
```
The output videos from the different modules will be stored in the "outputs" folder in GIF format. The code will automatically download module weights from Hugging Face. Alternatively, you can download the weights manually with git lfs and then change "pretrained_model_path" to your local path. Take the key frames generation module as an example:
```
git lfs install
git clone https://huggingface.co/showlab/show-1-base
```
Demo video: Show-1.Demo.Video.mp4
If you make use of our work, please cite our paper.
```bibtex
@misc{zhang2023show1,
    title={Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation},
    author={David Junhao Zhang and Jay Zhangjie Wu and Jia-Wei Liu and Rui Zhao and Lingmin Ran and Yuchao Gu and Difei Gao and Mike Zheng Shou},
    year={2023},
    eprint={2309.15818},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```
- This code builds heavily on diffusers, deep-floyd-if, modelscope, and zeroscope. Thanks for open-sourcing!