
UniVTG (ICCV'23)


TL;DR: UniVTG is the first pretraining model for video temporal grounding. It unifies diverse temporal annotations to power moment retrieval (interval), highlight detection (curve), and video summarization (point).


πŸ“’ News

  • [2023.10.15] Upload the CLIP teacher scripts for creating scalable pseudo annotations.
  • [2023.8.22] Code cleaning; add training/inference instructions; upload all downstream checkpoints.
  • [2023.8.6] Create the Hugging Face Space demo!
  • [2023.7.31] Release the arXiv paper, code, checkpoints, and Gradio demo.

πŸ“ Todo

  • Connect UniVTG with LLMs, e.g., ChatGPT.
  • Upload all downstream checkpoints.
  • Upload all pretraining checkpoints.

🌟 Run on your own video

To support practical usage, we release the following checkpoints. They run on a single GPU with less than 4 GB of memory and are highly efficient: temporal grounding takes under one second, even on a 10-minute video.

| Video Enc. | Text Enc. | Pretraining | Fine-tuning | Checkpoints |
|---|---|---|---|---|
| CLIP-B/16 | CLIP-B/16 | 4M | - | Google Drive |
| CLIP-B/16 | CLIP-B/16 | 4M | QVHL + Charades + NLQ + TACoS + ActivityNet + DiDeMo | Google Drive |

1. Download the checkpoint and put it under results/omni/.

2. Download the example videos from here and put them under examples/.

3. Launch the Gradio demo: python3 main_gradio.py --resume ./results/omni/model_best.ckpt
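
Putting the steps together as shell commands (the files come from the download links above):

```bash
# Create the expected directories, then drop the downloaded files into them:
#   results/omni/model_best.ckpt  <- checkpoint from the table above
#   examples/                     <- example videos
mkdir -p results/omni examples
python3 main_gradio.py --resume ./results/omni/model_best.ckpt
```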

[ YouTube video ] · [ Egocentric video ] · [ Charades video ]

βš™οΈ Preparation

Please follow the instructions in install.md to set up the environment and datasets.

πŸ“¦ Model Zoo

Download the checkpoints listed in model.md to reproduce the benchmark results.

πŸš€ Training & Inference

We use Slurm to launch jobs; if you are not on a Slurm cluster, you may need to slightly modify the scripts to fit your environment (a minimal adaptation is sketched after the commands below).

Pretraining (multi-gpu)

Large-scale pretraining: bash scripts/pretrain.sh

Multi-dataset co-training: bash scripts/cotrain.sh
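
If you are not using Slurm, one common adaptation is to launch the multi-GPU run with torchrun instead. This is a hedged sketch, not the repo's actual launcher; main.py and the GPU count are placeholders for whatever scripts/pretrain.sh really invokes:

```bash
# Hedged non-Slurm sketch: copy the real entry point and arguments out of
# scripts/pretrain.sh, then launch them with torchrun for multi-GPU training.
# "main.py" and --nproc_per_node=4 are assumptions, not the repo's defaults.
torchrun --nproc_per_node=4 main.py  # append the arguments from the script
```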

Downstream (single-gpu)

Pass --resume to initialize the model with pretrained weights. Refer to the Model Zoo for detailed parameter settings.

Training: bash scripts/qvhl_pretrain.sh

Pass --eval_init and --n_epoch=0 to evaluate the checkpoint selected by --resume without any training (see the sketch below).

Inference: bash scripts/qvhl_inference.sh
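
To make the flag combination concrete, here is a hedged sketch of an evaluation-only invocation; main.py is a placeholder for whatever entry point scripts/qvhl_inference.sh actually calls:

```bash
# Hedged sketch: check scripts/qvhl_inference.sh for the real entry point and
# full flag set. --resume loads the chosen checkpoint, --eval_init evaluates
# it immediately, and --n_epoch=0 skips the training loop.
python3 main.py --resume ./results/omni/model_best.ckpt --eval_init --n_epoch=0
```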

CLIP teacher to create scalable pseudo labels

  1. Download the Open Images V6 class list from https://storage.googleapis.com/openimages/v6/oidv6-class-descriptions.csv.

  2. Convert it to json with python3 teacher/csv2json.py, then extract the textual class features with python3 teacher/label2feature.py.

  3. Make sure the video features have already been extracted, then run python3 teacher/clip2labels.py to generate the pseudo labels. The whole pipeline is chained below.
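
Chained together, the pipeline looks like this (any arguments beyond those shown above are assumptions; check each script before running):

```bash
# Prerequisite: video features for your corpus are already extracted.
# 1. Fetch the Open Images V6 class list.
wget https://storage.googleapis.com/openimages/v6/oidv6-class-descriptions.csv
# 2. Convert the class list to json, then embed the class names with CLIP.
python3 teacher/csv2json.py
python3 teacher/label2feature.py
# 3. Match video features against class features to produce pseudo labels.
python3 teacher/clip2labels.py
```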

🎨 Visualization

If you want to draw visualizations like those in our paper, simply run python3 plot/qvhl.py to generate the figures from prediction jsons (downloadable from the Model Zoo).
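
For example (assuming the prediction jsons have been placed where plot/qvhl.py expects them):

```bash
# Download the prediction jsons from the Model Zoo (model.md) first,
# then render the paper-style figures.
python3 plot/qvhl.py
```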


πŸŽ“ Citation

If you find our work helpful, please cite our paper.

@misc{lin2023univtg,
      title={UniVTG: Towards Unified Video-Language Temporal Grounding}, 
      author={Kevin Qinghong Lin and Pengchuan Zhang and Joya Chen and Shraman Pramanick and Difei Gao and Alex Jinpeng Wang and Rui Yan and Mike Zheng Shou},
      year={2023},
      eprint={2307.16715},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

βœ‰οΈ Contact

This repo is maintained by Kevin. Questions and discussion are welcome via [email protected] or by opening an issue.

😊 Acknowledgement

This codebase builds on moment_detr, HERO_Video_Feature_Extractor, and UMT.

We thank the authors for their open-source contributions.