Open-Sora: Democratizing Efficient Video Production for All

Open-Sora: Towards Open Reproduction of Sora

Open-Sora is an open-source initiative dedicated to efficiently reproducing OpenAI's Sora. The project aims to cover the full pipeline, including video data preprocessing, training with acceleration, efficient inference, and more. Operating on a limited budget, we rely on the vibrant open-source community, which provides access to text-to-image, image captioning, and language models. We hope to contribute to the community and make the project accessible to everyone.

📰 News

  • [2024.03.18] 🔥 We release Open-Sora 1.0, an open-source project to reproduce OpenAI Sora. Open-Sora 1.0 supports a full pipeline of video data preprocessing, training with acceleration, inference, and more. Our provided checkpoint can produce 2s 512x512 videos.

🎥 Latest Demo

[Demo videos: two samples, each 2s at 512x512]

🔆 New Features/Updates

  • 📍 Open-Sora-v1 is trained on xxx. We train the model in three stages. Model weights are available here. Training details can be found here. [WIP]
  • ✅ Support training acceleration including flash-attention, accelerated T5, mixed precision, gradient checkpointing, split VAE, sequence parallelism, etc. XXX times. Details are in acceleration.md. [WIP]
  • ✅ We provide video cutting and captioning tools for data preprocessing. Instructions can be found here and our data collection plan can be found at datasets.md.
  • ✅ We find that the VQ-VAE from VideoGPT produces low-quality results and thus adopt a better VAE from Stability-AI. We also find that patching in the time dimension degrades quality. See our report for more discussion.
  • ✅ We investigate different architectures including DiT, Latte, and our proposed STDiT. Our STDiT achieves a better trade-off between quality and speed. See our report for more discussions.
  • ✅ Support CLIP and T5 text conditioning.
  • ✅ By viewing images as one-frame videos, our project supports training DiT on both images and videos (e.g., ImageNet & UCF101). See command.md for more instructions.
  • ✅ Support inference with official weights from DiT, Latte, and PixArt.
  • ✅ Refactor the codebase. See structure.md to learn the project structure and how to use the config files.
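
As a rough illustration of the "images as one-frame videos" idea mentioned above, the sketch below normalizes tensor shapes so that an image gains a time dimension of length 1. The function name `to_video_shape` and the `(C, T, H, W)` layout are assumptions made for this example; they are not taken from the Open-Sora codebase.

```python
from typing import Tuple


def to_video_shape(shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Return a (C, T, H, W) shape for either an image or a video.

    Hypothetical helper: the (C, T, H, W) layout is an assumption.
    An image shaped (C, H, W) is treated as a video with T = 1;
    a shape that already has four dimensions is returned unchanged.
    """
    if len(shape) == 3:
        channels, height, width = shape
        return (channels, 1, height, width)
    if len(shape) == 4:
        return tuple(shape)
    raise ValueError(f"expected 3 or 4 dimensions, got {len(shape)}")
```

With shapes unified this way, a single model and data loader can in principle consume both kinds of samples, since an image batch is just a video batch whose time dimension has length 1.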

TODO list sorted by priority

  • Complete the data processing pipeline (including dense optical flow, aesthetics scores, text-image similarity, deduplication, etc.). See datasets.md for more information. [WIP]
  • Training Video-VAE. [WIP]
  • Support image and video conditioning.
  • Evaluation pipeline.
  • Incorporate a better scheduler, e.g., rectified flow in SD3.
  • Support variable aspect ratios, resolutions, durations.
  • Support SD3 when released.

Installation

git clone https://github.com/hpcaitech/Open-Sora
cd Open-Sora
pip install xxx

After installation, we suggest reading structure.md to learn the project structure and how to use the config files.

Model Weights

Model | #Params | URL
16x256x256 | |

Inference

python scripts/inference.py configs/opensora/inference/16x256x256.py

Data Processing

Split video into clips

We provide code to split a long video into separate clips efficiently using multiprocessing. See tools/data/scene_detect.py.
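
The following is a minimal sketch of how per-video clip splitting could be parallelized with multiprocessing. It is not the content of tools/data/scene_detect.py. It assumes scene boundaries (start and end times in seconds) have already been detected, for example with PySceneDetect, and that the `ffmpeg` binary is on the PATH. The names `build_ffmpeg_cmd`, `split_video`, and `split_all` are hypothetical.

```python
import subprocess
from multiprocessing import Pool
from pathlib import Path
from typing import List, Tuple

Scene = Tuple[float, float]  # (start_seconds, end_seconds), assumed precomputed


def build_ffmpeg_cmd(video_path: str, scene: Scene, out_path: str) -> List[str]:
    """Build an ffmpeg command that copies one scene into its own file."""
    start, end = scene
    return [
        "ffmpeg", "-y",             # overwrite existing output
        "-ss", f"{start:.3f}",      # seek to the scene start
        "-i", video_path,
        "-t", f"{end - start:.3f}",  # clip duration
        "-c", "copy",               # stream copy: fast, but cuts snap to keyframes
        out_path,
    ]


def split_video(job: Tuple[str, List[Scene]]) -> List[str]:
    """Split one video into clips; runs inside a worker process."""
    video_path, scenes = job
    src = Path(video_path)
    outputs = []
    for i, scene in enumerate(scenes):
        out_path = str(src.with_name(f"{src.stem}_clip{i:03d}.mp4"))
        subprocess.run(build_ffmpeg_cmd(video_path, scene, out_path), check=True)
        outputs.append(out_path)
    return outputs


def split_all(jobs: List[Tuple[str, List[Scene]]], workers: int = 4) -> List[List[str]]:
    """Process several videos in parallel, one video per task."""
    with Pool(processes=workers) as pool:
        return pool.map(split_video, jobs)
```

Calling `split_all([("a.mp4", [(0.0, 2.0), (2.0, 5.5)])])` from under an `if __name__ == "__main__":` guard would write `a_clip000.mp4` and `a_clip001.mp4` next to the source file. Parallelizing across videos rather than across clips keeps each worker's disk access sequential. Stream copy avoids re-encoding, at the cost of frame-exact cut points.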

Generate video caption

Training

Acknowledgement

  • DiT: Scalable Diffusion Models with Transformers.
  • OpenDiT: An acceleration system for DiT training. The OpenDiT team provided valuable suggestions on accelerating our training process.
  • PixArt: An open-source DiT-based text-to-image model.
  • Latte: An attempt to efficiently train DiT for video.
  • StabilityAI VAE: A powerful image VAE model.
  • CLIP: A powerful text-image embedding model.
  • T5: The powerful text encoder.
  • LLaVA: A powerful image captioning model based on LLaMA and Yi-34B.
  • PySceneDetect: A powerful tool to split video into clips.

We are grateful for their exceptional work and generous contribution to open source.

Citation

@software{opensora,
  author = {Zangwei Zheng and Xiangyu Peng and Shenggui Li and Yang You},
  title = {Open-Sora: Towards Open Reproduction of Sora},
  month = {March},
  year = {2024},
  url = {https://github.com/hpcaitech/Open-Sora}
}

Zangwei Zheng and Xiangyu Peng contributed equally to this work during their internship at HPC-AI Tech.

TODO

Modules for releasing:

  • configs
  • opensora
  • assets
  • scripts
  • tools

Packages for data processing.

Put all outputs under ./checkpoints/, including pretrained_models, checkpoints, and samples.
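
One way this output layout could be set up is sketched below. The helper `make_output_dirs` is hypothetical and not part of the repository; only the three subdirectory names come from the note above.

```python
from pathlib import Path
from typing import Dict

# Subdirectory names taken from the note above.
SUBDIRS = ("pretrained_models", "checkpoints", "samples")


def make_output_dirs(root: str = "./checkpoints") -> Dict[str, Path]:
    """Create the output subdirectories under `root` and return their paths."""
    root_path = Path(root)
    dirs = {}
    for name in SUBDIRS:
        path = root_path / name
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        dirs[name] = path
    return dirs
```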
