This repository contains the official implementation of our paper:
VideoTetris: Towards Compositional Text-To-Video Generation
Ye Tian, Ling Yang*, Haotian Yang, Yuan Gao, Yufan Deng, Jingmin Chen, Xintao Wang, Zhaochen Yu, Xin Tao, Pengfei Wan, Di Zhang, Bin Cui
(* Equal Contribution and Corresponding Author)
Peking University, Kuaishou Technology
VideoTetris is a novel framework that enables compositional text-to-video (T2V) generation. Specifically, we propose spatio-temporal compositional diffusion, which precisely follows complex textual semantics by manipulating and composing the attention maps of the denoising networks spatially and temporally. Moreover, we introduce an enhanced video data preprocessing pipeline that improves the training data in terms of motion dynamics and prompt understanding, together with a new reference frame attention mechanism that improves the consistency of auto-regressive video generation. Our demonstrations include videos of 10 seconds, 30 seconds, and 2 minutes, and the framework can be extended to even longer durations.
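The spatial composition step can be pictured as blending per-sub-prompt denoising outputs according to their region masks. Below is a minimal sketch of that idea, assuming per-sub-prompt noise predictions and soft spatial masks; all function names, tensor shapes, and the background-fill strategy are illustrative assumptions, not this repository's actual API.

```python
import torch

def compose_noise_predictions(noise_preds, region_masks, background_pred):
    """Blend per-sub-prompt noise predictions into a single prediction (illustrative sketch).

    noise_preds:     list of tensors, each (B, C, H, W), one per sub-prompt
    region_masks:    list of tensors, each (1, 1, H, W) with values in [0, 1]
    background_pred: (B, C, H, W) prediction conditioned on the full/background prompt
    """
    coverage = torch.zeros_like(region_masks[0])
    composed = torch.zeros_like(background_pred)
    # Weight each sub-prompt's prediction by its spatial region mask.
    for pred, mask in zip(noise_preds, region_masks):
        composed = composed + mask * pred
        coverage = coverage + mask
    # Fill any region not covered by a sub-prompt with the background prediction.
    composed = composed + (1.0 - coverage).clamp(min=0.0) * background_pred
    return composed
```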
- [2024.6.7] Paper VideoTetris released

TODO:
- Release the inference code of VideoTetris with VideoCrafter2
- Release the checkpoint of our long compositional video generation
- Release VideoTetris with KLing/FIFO-Diffusion
We only provide some example results here; more detailed results can be found on the project page.
- "A cute brown dog on the left and a sleepy cat on the right are napping in the sun." (16 frames)
- "A cheerful farmer and a hardworking blacksmith are building a barn." (16 frames)
@article{tian2024videotetris,
  title={VideoTetris: Towards Compositional Text-to-Video Generation},
  author={Tian, Ye and Yang, Ling and Yang, Haotian and Gao, Yuan and Deng, Yufan and Chen, Jingmin and Wang, Xintao and Yu, Zhaochen and Tao, Xin and Wan, Pengfei and Zhang, Di and Cui, Bin},
  journal={arXiv preprint arXiv:2406.04277},
  year={2024}
}