# Awesome-Video-Understanding-with-LLM

TBD: Taxonomy

## Video Understanding

### LLM as a Controller

| Title | Date | Code | Data | Venue |
| --- | --- | --- | --- | --- |
| Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | 06/2023 | code | - | |
| VALLEY: Video Assistant with Large Language model Enhanced abilitY | 06/2023 | code | - | |
| VLog: Video as a Long Document | - | demo | - | |
| Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions | 04/2023 | code | - | |
| ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System | 04/2023 | project page | - | |
| VideoChat: Chat-Centric Video Understanding | 05/2023 | code | demo | |
| VideoLLM: Modeling Video Sequence with Large Language Models | 05/2023 | code | - | |
| Self-Chained Image-Language Model for Video Localization and Question Answering | 05/2023 | code | - | |
| [Learning Video Representations from Large Language Models](https://arxiv.org/abs/2212.04501) | 12/2022 | [code](https://github.com/facebookresearch/lavila) | - | |
| Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language | 04/2022 | project page | - | |
| CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos | 03/2023 | code | - | |
| Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration | 06/2023 | code | - | |
| mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | 02/2023 | code | - | |
| Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering | 03/2023 | code | - | |
| MIMIC-IT: Multi-Modal In-Context Instruction Tuning | 06/2023 | code | - | |
| Garbage in, garbage out: Zero-shot detection of crime using Large Language Models | 07/2023 | code | - | |
| A Video Is Worth 4096 Tokens: Verbalize Story Videos To Understand Them In Zero Shot | 05/2023 | - | - | |
| Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering | 04/2023 | - | - | |
| Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models | 06/2023 | - | - | |
| Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners | 05/2022 | code | - | |
| Let's Think Frame by Frame: Evaluating Video Chain of Thought with Video Infilling and Prediction | 05/2023 | - | - | |
| InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation | 07/2023 | code | - | - |
| Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks | 06/2023 | code | - | - |
| VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | 05/2023 | code | - | - |
| SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension | 07/2023 | code | - | - |
| FunQA: Towards Surprising Video Comprehension | 06/2023 | code | - | - |

### End-to-end Models

| Title | Date | Code | Data | Venue |
| --- | --- | --- | --- | --- |
| Video-LLaMA: An Instruction-Finetuned Visual Language Model for Video Understanding | 06/2023 | code | - | |
| LLMVA-GEBC: Large Language Model with Video Adapter for Generic Event Boundary Captioning | 06/2023 | code | - | |
| VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | 04/2023 | code | - | |

## Video Generation

| Title | Date | Code | Data | Venue |
| --- | --- | --- | --- | --- |
| Generative Pretraining in Multimodality | 07/2023 | - | - | |
| NExT-GPT: Any-to-Any Multimodal LLM | 09/2023 | - | - | |