Open-Sora is an open-source initiative dedicated to efficiently reproducing OpenAI's Sora. Our project aims to cover the full pipeline, including video data preprocessing, training with acceleration, efficient inference and more. Operating on a limited budget, we prioritize the vibrant open-source community, providing access to text-to-image, image captioning, and language models. We hope to make a contribution to the community and make the project more accessible to everyone.
- [2024.03.18] 🔥 We release Open-Sora 1.0, an open-source project to reproduce OpenAI's Sora. Open-Sora 1.0 supports a full pipeline including video data preprocessing, training with acceleration, inference, and more. Our provided checkpoint can produce 2~5s 512×512 videos with only 3 days of training.
[Three 2s 512×512 sample videos. Videos are downsampled to GIFs for display; click a video for the original.]
- 📍 Open-Sora-v1 released. Model weights are available here. With only 400K video clips and 200 H800 days (compared with 152M samples in Stable Video Diffusion), we are able to generate 2s 512×512 videos.
- ✅ Three-stage training from an image diffusion model to a video diffusion model. We provide the weights for each stage.
- ✅ Support training acceleration, including an accelerated transformer, faster T5 and VAE, and sequence parallelism. Open-Sora improves training speed by 55% when training on 64×512×512 videos. Details can be found in acceleration.md.
- ✅ We provide video cutting and captioning tools for data preprocessing. Instructions can be found here and our data collection plan can be found at datasets.md.
- ✅ We find that the VQ-VAE from VideoGPT is of low quality, so we adopt a better VAE from Stability-AI. We also find that patching in the time dimension deteriorates quality. See our report for more discussion.
- ✅ We investigate different architectures, including DiT, Latte, and our proposed STDiT. Our STDiT achieves a better trade-off between quality and speed. See our report for more discussion.
- ✅ Support CLIP and T5 text conditioning.
- ✅ By viewing images as one-frame videos, our project supports training DiT on both images and videos (e.g., ImageNet & UCF101); see the sketch after the to-do list below. See command.md for more instructions.
- ✅ Support inference with official weights from DiT, Latte, and PixArt.
- ✅ Refactor the codebase. See structure.md to learn the project structure and how to use the config files.
- Complete the data processing pipeline (including dense optical flow, aesthetic scores, text-image similarity, deduplication, etc.). See datasets.md for more information. [WIP]
- Train Video-VAE. [WIP]
- Support image and video conditioning.
- Evaluation pipeline.
- Incorporate a better scheduler, e.g., the rectified flow used in SD3.
- Support variable aspect ratios, resolutions, and durations.
- Support SD3 when released.
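As mentioned in the checklist above, images can be trained alongside videos by treating each image as a one-frame video. Below is a minimal PyTorch sketch of the tensor bookkeeping; the shapes and the helper name are illustrative, not the project's actual code.

```python
import torch

def as_video(x: torch.Tensor) -> torch.Tensor:
    """Treat an image batch as a batch of one-frame videos.

    Images arrive as [B, C, H, W] and videos as [B, C, T, H, W];
    inserting a singleton time axis lets one model consume both.
    """
    if x.dim() == 4:        # image batch: add T = 1
        x = x.unsqueeze(2)  # -> [B, C, 1, H, W]
    return x

images = torch.randn(8, 3, 256, 256)      # 8 RGB images
videos = torch.randn(8, 3, 16, 256, 256)  # 8 sixteen-frame clips
assert as_video(images).shape == (8, 3, 1, 256, 256)
assert as_video(videos).shape == (8, 3, 16, 256, 256)
```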
# create a virtual env
conda create -n opensora python=3.10
# install torch
# the command below is for CUDA 12.1, choose install commands from
# https://pytorch.org/get-started/locally/ based on your own CUDA version
pip3 install torch torchvision
# install flash attention (optional)
pip install packaging ninja
pip install flash-attn --no-build-isolation
# install apex (optional)
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" git+https://github.com/NVIDIA/apex.git
# install xformers
pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu121
# install this project
git clone https://github.com/hpcaitech/Open-Sora
cd Open-Sora
pip install -v -e .
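Once everything is installed, a quick sanity check can confirm that PyTorch sees the GPU and report which optional accelerators were actually built. This is a minimal sketch; it only probes imports and does not exercise the kernels.

```python
import torch

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())

# The packages below are optional accelerators; report availability without failing.
for pkg in ("flash_attn", "apex", "xformers"):
    try:
        __import__(pkg)
        print(f"{pkg}: installed")
    except ImportError:
        print(f"{pkg}: not installed (optional)")
```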
After installation, we suggest reading structure.md to learn the project structure and how to use the config files.
Resolution | Data | #iterations | Batch Size | GPU days (H800) | URL |
---|---|---|---|---|---|
16×256×256 | 366K | 80k | 8×64 | 117 | |
16×256×256 | 20K HQ | 24k | 8×64 | 45 | |
16×512×512 | 20K HQ | 20k | 2×64 | 35 | |
64×512×512 | 50K HQ | | 4×64 | | |
Our model's weights are partially initialized from PixArt-α. The model has 724M parameters. More information about training can be found in our report, and more about the dataset in datasets.md.
LIMITATION: Our model is trained on a limited budget. Quality and text alignment are relatively poor; the model performs especially badly at generating humans and cannot follow detailed instructions. We are working on improving both.
To run inference with our provided weights, first download the T5 weights into pretrained_models/t5_ckpts/t5-v1_1-xxl. Then run the commands below to generate samples. See here to customize the configuration.
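If you prefer to fetch the T5 checkpoint programmatically, here is a minimal sketch using huggingface_hub; the repo id "DeepFloyd/t5-v1_1-xxl" is an assumption, so verify the exact checkpoint against the project docs before relying on it.

```python
# Minimal sketch: download T5-v1.1-XXL into the directory expected above.
# The repo id "DeepFloyd/t5-v1_1-xxl" is an assumption; verify the exact
# checkpoint in the project docs.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="DeepFloyd/t5-v1_1-xxl",
    local_dir="pretrained_models/t5_ckpts/t5-v1_1-xxl",
)
```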
# Sample 16x256x256 (may take less than 1 min)
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path ./path/to/your/ckpt.pth
# Sample 16x512x512 (may take less than 1 min)
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x512x512.py
# Sample 64x512x512 (may take 1 min or more)
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/64x512x512.py
# Sample 64x512x512 with sequence parallelism (may take 1 min or more)
# sequence parallelism is enabled automatically when nproc_per_node is larger than 1
torchrun --standalone --nproc_per_node 2 scripts/inference.py configs/opensora/inference/64x512x512.py
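The configuration files passed to scripts/inference.py are plain Python. As a rough illustration of what can be customized, here is a heavily simplified sketch; the field names and values are assumptions, so consult the actual files under configs/opensora/inference/.

```python
# Illustrative inference config sketch (NOT an actual project file; the
# field names below are assumptions; see configs/opensora/inference/).
num_frames = 16            # frames per generated clip
image_size = (256, 256)    # spatial resolution (H, W)
fps = 8                    # frame rate of the saved video

model = dict(type="STDiT-XL/2")         # spatial-temporal DiT backbone
vae = dict(type="VideoAutoencoderKL")   # Stability-AI image VAE
text_encoder = dict(
    type="t5",
    from_pretrained="pretrained_models/t5_ckpts/t5-v1_1-xxl",
)

batch_size = 1
seed = 42
save_dir = "./outputs"
```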
For inference with other models, see here for more instructions.
We provide code to split a long video into separate clips efficiently using multiprocessing. See tools/data/scene_detect.py.
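As a rough sketch of the idea (the project's actual implementation lives in tools/data/scene_detect.py), one can detect scene cuts with PySceneDetect and split videos in parallel with a process pool. ffmpeg is assumed to be on PATH, and the input list is hypothetical.

```python
# Sketch: split long videos at detected scene cuts, one worker per video.
# Requires `pip install scenedetect[opencv]` and ffmpeg on PATH; this only
# illustrates the idea. See tools/data/scene_detect.py for the real script.
from multiprocessing import Pool

from scenedetect import ContentDetector, detect
from scenedetect.video_splitter import split_video_ffmpeg

def split_one(video_path: str) -> int:
    scenes = detect(video_path, ContentDetector())  # find cut points
    split_video_ffmpeg(video_path, scenes)          # write one clip per scene
    return len(scenes)

if __name__ == "__main__":
    videos = ["a.mp4", "b.mp4"]  # hypothetical inputs
    with Pool(processes=4) as pool:
        for path, n in zip(videos, pool.map(split_one, videos)):
            print(f"{path}: {n} scenes")
```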
To launch training, first download the T5 weights into pretrained_models/t5_ckpts/t5-v1_1-xxl. Then run the following commands to launch training on a single node.
# 1 GPU, 16x256x256
torchrun --nnodes=1 --nproc_per_node=1 scripts/train.py configs/opensora/train/16x256x256.py --data-path YOUR_CSV_PATH
# 8 GPUs, 64x512x512
torchrun --nnodes=1 --nproc_per_node=8 scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT
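YOUR_CSV_PATH points to a CSV listing the training clips and their captions. Below is a minimal sketch of producing such a file; the column layout is an assumption, so check the data preprocessing docs for the exact schema.

```python
# Sketch: build a training CSV of (video path, caption) pairs. The header
# below is an assumption; verify the schema in the data preprocessing docs.
import csv

rows = [
    ("/data/clips/beach_0001.mp4", "waves rolling onto a sandy beach at sunset"),
    ("/data/clips/city_0042.mp4", "timelapse of traffic in a rainy city at night"),
]

with open("train_data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["path", "text"])  # assumed column names
    writer.writerows(rows)
```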
To launch training on multiple nodes, prepare a hostfile according to ColossalAI (one host per line), and run the following command.
colossalai run --nproc_per_node 8 --hostfile hostfile scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT
For training other models and advanced usage, see here for more instructions.
- DiT: Scalable Diffusion Models with Transformers.
- OpenDiT: An acceleration framework for DiT training. The OpenDiT team provided valuable suggestions on accelerating our training process.
- PixArt: An open-source DiT-based text-to-image model.
- Latte: An attempt to efficiently train DiT for video.
- StabilityAI VAE: A powerful image VAE model.
- CLIP: A powerful text-image embedding model.
- T5: A powerful text encoder.
- LLaVA: A powerful image captioning model based on Yi-34B.
We are grateful for their exceptional work and generous contribution to open source.
@software{opensora,
author = {Zangwei Zheng and Xiangyu Peng and Yang You},
title = {Open-Sora: Towards Open Reproduction of Sora},
month = {March},
year = {2024},
url = {https://github.com/hpcaitech/Open-Sora}
}
Zangwei Zheng and Xiangyu Peng equally contributed to this work during their internship at HPC-AI Tech.