Skip to content

"Video-ChatGPT" is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

License

Notifications You must be signed in to change notification settings

chakrabortyrajatsubhra/Video-ChatGPT

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

41 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Video-ChatGPT 🎥 💬

Oryx Video-ChatGPT

Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models

Installation 🔧

We recommend setting up a conda environment for the project:

conda create --name=video_chatgpt python=3.10
conda activate video_chatgpt

git clone https://github.com/mbzuai-oryx/Video-ChatGPT.git
cd Video-ChatGPT
pip install -r requirements.txt

export PYTHONPATH="./:$PYTHONPATH"

Additionally, install FlashAttention for training,

pip install ninja

git clone https://github.com/HazyResearch/flash-attention.git
cd flash-attention
git checkout v1.0.7
python setup.py install

Running Demo Offline 💿

To run the demo offline, please refer to the instructions in offline_demo.md.


Training 🚋

For training instructions, check out train_video_chatgpt.md.


Video Instruction Dataset for ADL:

If you want the dataset and features let me know.

Qualitative Analysis 🔍

A Comprehensive Evaluation of Video-ChatGPT's Performance across Multiple Tasks.

Video Reasoning Tasks 🎥

sample1


Creative and Generative Tasks 🖌️

sample5


Spatial Understanding 🌐

sample8


Video Understanding and Conversational Tasks 💬

sample10


Action Recognition 🏃

sample22


Question Answering Tasks ❓

sample14


Temporal Understanding ⏳

sample18


License 📜

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

About

"Video-ChatGPT" is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 99.4%
  • Shell 0.6%