
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.


thanhphat-19/Video-ChatGPT-demo

Running Video-ChatGPT Demo

Please follow the instructions below to run the Video-ChatGPT demo on your local GPU machine.

Note: Our demo requires approximately 18 GB of GPU memory.
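Before installing anything, it can help to confirm that the machine actually has that much free GPU memory. The snippet below is a minimal sketch, not part of this repository: the function names `max_free_mib`, `meets_requirement` and `check_gpu_memory` are hypothetical, and it assumes "18 GB" means 18 × 1024 = 18432 MiB. It relies only on the standard `nvidia-smi --query-gpu` interface.

```shell
# Hypothetical pre-flight check (not part of this repository).
# Assumption: "18 GB" is treated as 18 * 1024 = 18432 MiB.
REQUIRED_MIB=18432

# Read one free-memory value (in MiB) per line on stdin and print the largest one.
max_free_mib() {
  sort -n | tail -n 1 | tr -d ' '
}

# Succeed only if the given MiB value is non-empty and at least REQUIRED_MIB.
meets_requirement() {
  [ -n "$1" ] && [ "$1" -ge "$REQUIRED_MIB" ]
}

# Query every GPU via nvidia-smi and report whether the best one has enough free memory.
check_gpu_memory() {
  local best
  best=$(nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits | max_free_mib)
  if meets_requirement "$best"; then
    echo "OK: ${best} MiB free on the best GPU"
  else
    echo "Not enough free GPU memory: ${best:-unknown} MiB, need ${REQUIRED_MIB} MiB" >&2
    return 1
  fi
}
```

Source the snippet and call `check_gpu_memory`. It compares free rather than total memory, since other processes may already occupy part of the card. On a multi-GPU machine, make sure the demo runs on the card this reports (for example via `CUDA_VISIBLE_DEVICES`).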

Clone the repository

We recommend setting up a conda environment for the project:

conda create --name=video_chatgpt python=3.10
conda activate video_chatgpt
git clone https://github.com/thanhphat-19/Video-ChatGPT-demo.git
cd Video-ChatGPT-demo
pip install -r requirements.txt
export PYTHONPATH="./:$PYTHONPATH"
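The `export` adds `./`, a path relative to the current directory, presumably so that the `video_chatgpt` package can be imported by the demo script. Python resolves relative `sys.path` entries against the working directory, so later commands should be run from the repository root. A minimal sketch of a check for that entry, using a hypothetical function name:

```shell
# Hypothetical helper (not part of this repository).
# Succeeds if the PYTHONPATH-style string in $1 (default: the current $PYTHONPATH)
# contains the "./" entry added by the export above (a bare "." also counts).
has_repo_root_on_path() {
  local path_var="${1-${PYTHONPATH:-}}"
  case ":${path_var}:" in
    *":./:"* | *":.:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `has_repo_root_on_path || echo 'Run: export PYTHONPATH="./:$PYTHONPATH"'` reminds you to repeat the export in a new shell.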

Install Git LFS

Git LFS is required to fetch the large weight files in the next steps:

sudo apt-get install git-lfs
git lfs install

Log in to Hugging Face with an access token to download the model weights

huggingface-cli login
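On a remote or headless machine it may be easier to log in non-interactively. The sketch below is a hypothetical wrapper, not part of this repository; it assumes the token is passed as an argument or stored in an `HF_TOKEN` environment variable. `--token` and `--add-to-git-credential` are existing `huggingface-cli login` options; the latter stores the token for git so that the HTTPS clones below can use it (git needs a credential helper configured for this).

```shell
# Hypothetical helper (not part of this repository).
# Logs in non-interactively using a token passed as $1 or read from $HF_TOKEN.
hf_login() {
  local token="${1:-${HF_TOKEN:-}}"
  if [ -z "$token" ]; then
    echo "No token given and HF_TOKEN is unset; run 'huggingface-cli login' interactively." >&2
    return 1
  fi
  # Store the token for the Hugging Face libraries and for git HTTPS operations.
  huggingface-cli login --token "$token" --add-to-git-credential
}
```

Usage, with a placeholder token: `HF_TOKEN=hf_xxx hf_login`, then `huggingface-cli whoami` to confirm which account is logged in.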

Download LLaVA weights

git clone https://huggingface.co/mmaaz60/LLaVA-7B-Lightening-v1-1

Download Video-ChatGPT weights

git clone https://huggingface.co/MBZUAI/Video-ChatGPT-7B
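If Git LFS was not active when these repositories were cloned, the weight files are small text pointer stubs instead of real weights, and the demo fails later with a less obvious error. A Git LFS pointer file starts with the line `version https://git-lfs.github.com/spec/v1`. The sketch below (hypothetical function names, not part of this repository) checks for missing files and unresolved pointers:

```shell
# Hypothetical helpers (not part of this repository).
# A Git LFS pointer stub is a small text file whose first line is
# "version https://git-lfs.github.com/spec/v1"; the real weights are binary.
is_lfs_pointer() {
  local prefix
  # Read only the first bytes; drop NUL bytes so binary files do not trigger shell warnings.
  prefix=$(head -c 64 "$1" 2>/dev/null | tr -d '\000')
  case "$prefix" in
    "version https://git-lfs.github.com/spec/v1"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Check each file argument: it must exist and must not be an unresolved LFS pointer.
check_weights() {
  local f status=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      status=1
    elif is_lfs_pointer "$f"; then
      echo "LFS pointer only: $f (run 'git lfs pull' inside that clone)" >&2
      status=1
    fi
  done
  return "$status"
}
```

Run from the directory containing both clones, for example `check_weights Video-ChatGPT-7B/video_chatgpt-7B.bin LLaVA-7B-Lightening-v1-1/*.bin`. The `*.bin` glob assumes the LLaVA checkpoint is stored as `.bin` shards; adjust it if the repository uses another format.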

Run the Gradio Application

python video_chatgpt/demo/video_demo.py \
    --model-name "../../Video-ChatGPT-demo/LLaVA-7B-Lightening-v1-1" \
    --projection_path "../../Video-ChatGPT-demo/Video-ChatGPT-7B/video_chatgpt-7B.bin"

Follow the instructions printed in the terminal (Gradio prints a local URL) to open the demo dashboard in your browser.
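The two paths in the launch command depend on where the weights were cloned. A small wrapper can make them configurable and fail early with a clear message if either is wrong. This is a minimal sketch, not part of this repository: `run_demo`, `LLAVA_DIR` and `PROJECTION` are hypothetical names, and the defaults simply mirror the command above.

```shell
# Hypothetical launcher (not part of this repository). Run it from the repository root.
# The default paths mirror the command above; override them via environment variables
# if you cloned the weights elsewhere.
LLAVA_DIR="${LLAVA_DIR:-../../Video-ChatGPT-demo/LLaVA-7B-Lightening-v1-1}"
PROJECTION="${PROJECTION:-../../Video-ChatGPT-demo/Video-ChatGPT-7B/video_chatgpt-7B.bin}"

run_demo() {
  # Fail early with a clear message instead of letting the model loader error out.
  if [ ! -d "$LLAVA_DIR" ]; then
    echo "LLaVA weights directory not found: $LLAVA_DIR" >&2
    return 1
  fi
  if [ ! -f "$PROJECTION" ]; then
    echo "Projection weights not found: $PROJECTION" >&2
    return 1
  fi
  # Same PYTHONPATH setup as in the installation step.
  export PYTHONPATH="./:${PYTHONPATH:-}"
  python video_chatgpt/demo/video_demo.py \
    --model-name "$LLAVA_DIR" \
    --projection_path "$PROJECTION"
}
```

Set `LLAVA_DIR` and `PROJECTION` before sourcing the snippet if your layout differs, then call `run_demo`.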

