GitHub - HusterRC/Open-Sora at 356986c644319b0872cdb2db97f086f54b31bdca


Open-Sora: Towards Open Reproduction of Sora

Open-Sora is an open-source initiative dedicated to efficiently reproducing OpenAI's Sora. Our project aims to cover the full pipeline, including video data preprocessing, accelerated training, efficient inference, and more. Operating on a limited budget, we prioritize the vibrant open-source community and its openly available text-to-image, image captioning, and language models. We hope to contribute to the community and make the project more accessible to everyone.

📰 News

[2024.03.18] 🔥 We release Open-Sora 1.0, an open-source project to reproduce OpenAI's Sora. Open-Sora 1.0 supports a full pipeline of video data preprocessing, accelerated training, inference, and more. Our provided checkpoint can produce 2s 512x512 videos.

🎥 Latest Demo

[Demo videos: two samples, 2s, 512x512]

🔆 New Features/Updates

📍 Open-Sora-v1 is trained on xxx. We train the model in three stages. Model weights are available here. Training details can be found here. [WIP]
✅ Support training acceleration, including flash-attention, accelerated T5, mixed precision, gradient checkpointing, split VAE, sequence parallelism, etc., for a speedup of XXX times. Details are in acceleration.md. [WIP]
✅ We provide video cutting and captioning tools for data preprocessing. Instructions can be found here and our data collection plan can be found at datasets.md.
✅ We find that the VQ-VAE from VideoGPT produces low-quality results, so we adopt a better VAE from Stability AI. We also find that patching in the time dimension degrades quality. See our report for more discussion.
✅ We investigate different architectures, including DiT, Latte, and our proposed STDiT. Our STDiT achieves a better trade-off between quality and speed. See our report for more discussion.
✅ Support CLIP and T5 text conditioning.
✅ By viewing images as one-frame videos, our project supports training DiT on both images and videos (e.g., ImageNet & UCF101). See command.md for more instructions.
✅ Support inference with official weights from DiT, Latte, and PixArt.

✅ Refactor the codebase. See structure.md to learn the project structure and how to use the config files.
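Sequence parallelism, listed among the acceleration features above, splits one long token sequence across devices so that each device holds only a slice of it. The sketch below shows only the partitioning step, assuming contiguous, near-equal shards; `shard_sequence` is a hypothetical helper, not Open-Sora's implementation, which must also exchange activations between devices inside attention.

```python
def shard_sequence(seq_len: int, world_size: int) -> list[tuple[int, int]]:
    """Return [start, end) token ranges, one per rank, covering seq_len tokens.

    Hypothetical helper: the first (seq_len % world_size) ranks receive one
    extra token, so shard sizes differ by at most one.
    """
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    base, extra = divmod(seq_len, world_size)
    ranges = []
    start = 0
    for rank in range(world_size):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # 4096 tokens on 4 ranks: each rank holds 1024 tokens.
    print(shard_sequence(4096, 4))
    # 10 tokens on 3 ranks: shard sizes 4, 3, 3.
    print(shard_sequence(10, 3))
```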
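The findings on temporal patching and on STDiT both depend on how many tokens the transformer sees and how attention cost grows with them. The arithmetic below is a rough sketch: the latent shape, the patch sizes, and the assumption that STDiT factorizes attention into a spatial pass within each frame plus a temporal pass at each spatial location are illustrative, and the counts cover only the quadratic query-key term of attention.

```python
def num_tokens(latent_thw, patch_thw):
    """Number of transformer tokens after patchifying a (T, H, W) latent."""
    t, h, w = latent_thw
    pt, ph, pw = patch_thw
    if t % pt or h % ph or w % pw:
        raise ValueError("latent dims must be divisible by patch dims")
    return (t // pt) * (h // ph) * (w // pw)


def attention_pairs(frames, tokens_per_frame):
    """Query-key pair counts: (full attention, spatial + temporal attention)."""
    full = (frames * tokens_per_frame) ** 2
    spatial = frames * tokens_per_frame ** 2   # attend within each frame
    temporal = tokens_per_frame * frames ** 2  # attend across frames per location
    return full, spatial + temporal


if __name__ == "__main__":
    # Illustrative latent: 16 frames of 32x32, e.g. 256x256 pixels after an
    # 8x spatial VAE downsampling with no temporal compression (assumed).
    latent = (16, 32, 32)
    print(num_tokens(latent, (1, 2, 2)))  # 4096 tokens, no temporal patching
    print(num_tokens(latent, (2, 2, 2)))  # 2048 tokens, time patched by 2
    print(attention_pairs(frames=16, tokens_per_frame=16 * 16))  # (16777216, 1114112)
```

In this example the factorized layout visits about 15 times fewer query-key pairs than full attention while keeping every token, whereas temporal patching halves the token count, the option reported above to degrade quality.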
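Viewing an image as a one-frame video, as described in the feature list above, amounts to inserting a time axis of length 1. A minimal stdlib sketch, assuming a channel-first layout of (C, H, W) for images and (C, T, H, W) for videos; `image_to_video` and `shape` are hypothetical helpers, and with a tensor library the same step is a single `tensor.unsqueeze(1)` in PyTorch.

```python
def image_to_video(image):
    """Wrap a (C, H, W) nested-list image as a (C, 1, H, W) one-frame video."""
    # Each channel's H x W plane becomes the only frame along the new time axis.
    return [[channel] for channel in image]


def shape(x):
    """Shape of a regular nested list, e.g. (C, T, H, W)."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0] if x else None
    return tuple(dims)


if __name__ == "__main__":
    # A 3-channel 2x2 image whose pixels hold their channel index.
    image = [[[c, c], [c, c]] for c in range(3)]
    video = image_to_video(image)
    print(shape(image), shape(video))  # (3, 2, 2) (3, 1, 2, 2)
```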

TODO list sorted by priority

Complete the data processing pipeline (including dense optical flow, aesthetics scores, text-image similarity, deduplication, etc.). See datasets.md for more information. [WIP]
Training Video-VAE. [WIP]

Support image and video conditioning.
Evaluation pipeline.
Incorporate a better scheduler, e.g., rectified flow in SD3.
Support variable aspect ratios, resolutions, durations.
Support SD3 when released.
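Rectified flow, named in the TODO list as a candidate scheduler, trains a model to predict a constant velocity along a straight path between data and noise, then samples by integrating that velocity. A minimal scalar-list sketch, assuming the convention x_t = (1 - t) * x0 + t * noise (t = 0 is data, t = 1 is noise); the oracle velocity stands in for a trained network, and none of these function names come from Open-Sora or SD3.

```python
def interpolate(x0, noise, t):
    """Point on the straight path between data (t=0) and noise (t=1)."""
    return [(1 - t) * a + t * b for a, b in zip(x0, noise)]


def velocity_target(x0, noise):
    """Regression target dx_t/dt = noise - x0, constant along the path."""
    return [b - a for a, b in zip(x0, noise)]


def euler_sample(noise, velocity_fn, steps=10):
    """Integrate dx/dt = v from t=1 back to t=0 with fixed Euler steps."""
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        v = velocity_fn(x, t)
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    data, noise = [1.0, -2.0], [0.5, 0.3]

    def oracle(x, t):
        # True velocity for this single pair; a trained model would instead
        # predict it from (x_t, t) and the text condition.
        return velocity_target(data, noise)

    print(euler_sample(noise, oracle, steps=4))  # approximately [1.0, -2.0]
```

Because the path is straight, even a few Euler steps recover the data exactly when the velocity is correct; with a learned velocity, the step count trades speed for accuracy.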

Installation

git clone https://github.com/hpcaitech/Open-Sora
cd Open-Sora
pip install xxx

After installation, we suggest reading structure.md to learn the project structure and how to use the config files.

Model Weights

Model	#Params	url
16x256x256

Inference

python scripts/inference.py configs/opensora/inference/16x256x256.py
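The inference config's file name, 16x256x256, appears to follow a frames x height x width convention. A small sketch for reading such a name; the convention is inferred from the file names rather than documented here, and `parse_config_name` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def parse_config_name(path):
    """Split a name like '16x256x256.py' into (frames, height, width).

    Assumes the frames x height x width convention inferred from file names.
    """
    parts = Path(path).stem.split("x")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unexpected config name: {path!r}")
    frames, height, width = (int(p) for p in parts)
    return frames, height, width


if __name__ == "__main__":
    print(parse_config_name("configs/opensora/inference/16x256x256.py"))  # (16, 256, 256)
```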

Data Processing

Split video into clips

We provide code to split a long video into separate clips efficiently using multiprocessing. See tools/data/scene_detect.py.
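Scene detection yields cut points, which then have to be turned into clips. A stdlib sketch of that step, assuming cuts arrive as sorted frame indices and that clips shorter than a minimum length are dropped; `cuts_to_clips` and `min_frames` are hypothetical and not taken from tools/data/scene_detect.py. In practice the cut points would come from a detector such as PySceneDetect, and one call per video could be distributed with `multiprocessing.Pool.map`.

```python
def cuts_to_clips(num_frames, cuts, min_frames=1):
    """Turn sorted cut frame indices into [start, end) clips of a video.

    Cuts outside (0, num_frames) are ignored; clips shorter than
    min_frames are dropped.
    """
    bounds = [0] + [c for c in cuts if 0 < c < num_frames] + [num_frames]
    clips = []
    for start, end in zip(bounds, bounds[1:]):
        if end - start >= min_frames:
            clips.append((start, end))
    return clips


if __name__ == "__main__":
    # A 300-frame video with scene changes at frames 90, 95 and 200;
    # the 5-frame clip between 90 and 95 is too short and is dropped.
    print(cuts_to_clips(300, [90, 95, 200], min_frames=30))
    # [(0, 90), (95, 200), (200, 300)]
```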

Generate video caption

Training

Acknowledgement

DiT: Scalable Diffusion Models with Transformers.
OpenDiT: An acceleration framework for DiT training. The OpenDiT team provided valuable suggestions on accelerating our training process.
PixArt: An open-source DiT-based text-to-image model.
Latte: An attempt to efficiently train DiT for video.
StabilityAI VAE: A powerful image VAE model.
CLIP: A powerful text-image embedding model.
T5: A powerful text encoder.
LLaVA: A powerful image captioning model based on LLaMA and Yi-34B.
PySceneDetect: A powerful tool to split video into clips.

We are grateful for their exceptional work and generous contribution to open source.

Citation

@software{opensora,
  author = {Zangwei Zheng and Xiangyu Peng and Shenggui Li and Yang You},
  title = {Open-Sora: Towards Open Reproduction of Sora},
  month = {March},
  year = {2024},
  url = {https://github.com/hpcaitech/Open-Sora}
}

Zangwei Zheng and Xiangyu Peng contributed equally to this work during their internships at HPC-AI Tech.

Star History

TODO

Modules for releasing:

configs
opensora
assets
scripts
tools

Packages for data processing

Put all outputs under ./checkpoints/, including pretrained_models, checkpoints, and samples.
