Open-Sora is an open-source initiative dedicated to efficiently reproducing OpenAI's Sora. The project aims to cover the full pipeline, including video data preprocessing, accelerated training, efficient inference, and more. Operating on a limited budget, we prioritize the vibrant open-source community, providing access to text-to-image, image captioning, and language models. We hope to contribute to the community and make the project more accessible to everyone.
- [2024.03.18] 🔥 We release Open-Sora 1.0, an open-source project to reproduce OpenAI Sora. Open-Sora 1.0 supports a full pipeline of video data preprocessing, training with acceleration, inference, and more. Our provided checkpoint can produce 2s 512x512 videos.
2s 512x512 | 2s 512x512 |
---|---|
- 📍 Open-Sora-v1 is trained on xxx. We train the model in three stages. Model weights are available here. Training details can be found here. [WIP]
- ✅ Support training acceleration including flash-attention, accelerated T5, mixed precision, gradient checkpointing, split VAE, sequence parallelism, etc. XXX times. Details can be found in acceleration.md. [WIP]
- ✅ We provide video cutting and captioning tools for data preprocessing. Instructions can be found here and our data collection plan can be found at datasets.md.
- ✅ We find that the VQ-VAE from VideoGPT has low quality and thus adopt a better VAE from Stability-AI. We also find that patching in the time dimension degrades quality. See our report for more discussion.
- ✅ We investigate different architectures including DiT, Latte, and our proposed STDiT. Our STDiT achieves a better trade-off between quality and speed. See our report for more discussions.
- ✅ Support CLIP and T5 text conditioning.
- ✅ By viewing images as one-frame videos, our project supports training DiT on both images and videos (e.g., ImageNet & UCF101). See command.md for more instructions.
- ✅ Support inference with official weights from DiT, Latte, and PixArt.
- ✅ Refactor the codebase. See structure.md to learn the project structure and how to use the config files.
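
The "images as one-frame videos" idea from the list above can be illustrated with a minimal NumPy sketch. The `(C, T, H, W)` layout and the helper name `image_to_video` are assumptions made for illustration; they are not taken from the Open-Sora code.

```python
import numpy as np


def image_to_video(image: np.ndarray) -> np.ndarray:
    """Turn a (C, H, W) image into a (C, 1, H, W) one-frame video.

    Hypothetical helper: the (C, T, H, W) layout is an assumption.
    """
    if image.ndim != 3:
        raise ValueError(f"expected (C, H, W), got shape {image.shape}")
    # Insert a time axis of length 1 right after the channel axis.
    return np.expand_dims(image, axis=1)


image = np.zeros((3, 256, 256), dtype=np.float32)       # one RGB image
video = np.zeros((3, 16, 256, 256), dtype=np.float32)   # a 16-frame clip

# Both inputs now have the same rank, so one model can consume either.
print(image_to_video(image).shape)  # (3, 1, 256, 256)
print(video.shape)                  # (3, 16, 256, 256)
```

With this convention an image dataset simply yields clips whose time dimension is 1, so the same training loop can serve both data types.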
- Complete the data processing pipeline (including dense optical flow, aesthetics scores, text-image similarity, deduplication, etc.). See datasets.md for more information. [WIP]
- Training Video-VAE. [WIP]
- Support image and video conditioning.
- Evaluation pipeline.
- Incorporate a better scheduler, e.g., rectified flow in SD3.
- Support variable aspect ratios, resolutions, durations.
- Support SD3 when released.
- Open-Sora: Towards Open Reproduction of Sora
- 📰 News
- 🎥 Latest Demo
- 🔆 New Features/Updates
- Contents
- Installation
- Model Weights
- Inference
- Data Processing
- Training
- Acknowledgement
- Citation
- Star History
- TODO
git clone https://github.com/hpcaitech/Open-Sora
cd Open-Sora
pip install xxx
After installation, we suggest reading structure.md to learn the project structure and how to use the config files.
Model | #Params | url |
---|---|---|
16x256x256 |
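
As a rough guide to what a name like `16x256x256` implies, the sketch below computes the latent tensor shape a diffusion model would see. It assumes the name means frames x height x width, that the image VAE is applied frame by frame (so the time dimension is unchanged), and that the VAE uses 8x spatial downsampling with 4 latent channels, the usual values for Stability-AI's image VAE. The function name `latent_shape` is hypothetical.

```python
def latent_shape(frames, height, width, latent_channels=4, spatial_downsample=8):
    """Return the assumed (C, T, H, W) latent shape for a video.

    Assumptions: a per-frame image VAE with 8x spatial downsampling and
    4 latent channels; no compression along the time axis.
    """
    if height % spatial_downsample or width % spatial_downsample:
        raise ValueError("height and width must be divisible by the downsample factor")
    return (
        latent_channels,
        frames,
        height // spatial_downsample,
        width // spatial_downsample,
    )


print(latent_shape(16, 256, 256))  # (4, 16, 32, 32)
```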
python scripts/inference.py configs/opensora/inference/16x256x256.py
We provide code to split a long video into separate clips efficiently using multiprocessing. See tools/data/scene_detect.py.
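
The general pattern of one task per video spread over a process pool can be sketched as follows. This is not the code in tools/data/scene_detect.py: scene detection is replaced here by a plain fixed-length split so the example stays standard-library only, and the names `plan_clips` and `plan_many` are hypothetical.

```python
from multiprocessing import Pool


def plan_clips(duration_s: float, clip_len_s: float) -> list[tuple[float, float]]:
    """Return (start, end) times covering a video in fixed-length clips.

    Stand-in for real scene detection; the final clip may be shorter.
    """
    if duration_s < 0 or clip_len_s <= 0:
        raise ValueError("duration must be >= 0 and clip length must be > 0")
    clips = []
    start = 0.0
    while start < duration_s:
        end = min(start + clip_len_s, duration_s)
        clips.append((start, end))
        start = end
    return clips


def plan_many(durations, clip_len_s, processes=4):
    """Plan clips for several videos in parallel, one task per video."""
    with Pool(processes=processes) as pool:
        return pool.starmap(plan_clips, [(d, clip_len_s) for d in durations])


print(plan_clips(5.0, 2.0))  # [(0.0, 2.0), (2.0, 4.0), (4.0, 5.0)]
```

When running this as a script, call `plan_many` under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Since each video is independent, the work parallelizes without any shared state.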
- DiT: Scalable Diffusion Models with Transformers.
- OpenDiT: An acceleration for DiT training. OpenDiT's team provides valuable suggestions on acceleration of our training process.
- PixArt: An open-source DiT-based text-to-image model.
- Latte: An attempt to efficiently train DiT for video.
- StabilityAI VAE: A powerful image VAE model.
- CLIP: A powerful text-image embedding model.
- T5: The powerful text encoder.
- LLaVA: A powerful image captioning model based on LLaMA and Yi-34B.
- PySceneDetect: A powerful tool to split video into clips.
We are grateful for their exceptional work and generous contribution to open source.
@software{opensora,
author = {Zangwei Zheng and Xiangyu Peng and Shenggui Li and Yang You},
title = {Open-Sora: Towards Open Reproduction of Sora},
month = {March},
year = {2024},
url = {https://github.com/hpcaitech/Open-Sora}
}
Zangwei Zheng and Xiangyu Peng contributed equally to this work during their internship at HPC-AI Tech.
Modules for releasing:
configs
opensora
assets
scripts
tools
packages for data processing
put all outputs under ./checkpoints/, including pretrained_models, checkpoints, samples
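
One possible reading of the output layout noted above, sketched with `pathlib`. Treating `pretrained_models`, `checkpoints`, and `samples` as subdirectory names is an assumption, and `make_output_dirs` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def make_output_dirs(root="checkpoints"):
    """Create the assumed output subdirectories under `root` and return them."""
    root = Path(root)
    names = ("pretrained_models", "checkpoints", "samples")
    dirs = {name: root / name for name in names}
    for path in dirs.values():
        # exist_ok makes the call safe to repeat on every run.
        path.mkdir(parents=True, exist_ok=True)
    return dirs


# Example: dirs = make_output_dirs("./checkpoints"); dirs["samples"] is a Path.
```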