Official Implementation of Zero-Shot Video Captioning with Evolving Pseudo-Tokens
conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.8 -c pytorch -c nvidia
pip3 install clip-by-openai
pip install chardet
bash get_captions.sh
python run.py --token_wise --randomized_prompt --run_type caption_videos --data_path ../data/ActivityNet_200/validation/Mixing_drinks/yjazHd6a5SQ.mp4 --start_sec 0.0 --end_sec 17.87
python run.py --token_wise --randomized_prompt --run_type caption_videos --data_path examples/example_video.mp4
python run.py --token_wise --randomized_prompt --run_type caption_images --data_path examples/example_image.jpg
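The commands above differ only in `--run_type`, `--data_path`, and the optional `--start_sec` / `--end_sec` pair, so they can be assembled programmatically when captioning several files. The following is a minimal sketch, not part of the repository: the helper name `build_caption_command` is hypothetical, and it uses only the flags that appear in the commands above.

```python
import shlex
from typing import List, Optional

# Run types that appear in the commands above.
KNOWN_RUN_TYPES = ("caption_videos", "caption_images")


def build_caption_command(
    data_path: str,
    run_type: str = "caption_videos",
    start_sec: Optional[float] = None,
    end_sec: Optional[float] = None,
) -> List[str]:
    """Hypothetical helper: build the argv list for one run.py invocation."""
    if run_type not in KNOWN_RUN_TYPES:
        raise ValueError(f"unexpected run_type: {run_type!r}")
    cmd = [
        "python", "run.py",
        "--token_wise",
        "--randomized_prompt",
        "--run_type", run_type,
        "--data_path", data_path,
    ]
    # The segment bounds are optional; add them only when given.
    if start_sec is not None:
        cmd += ["--start_sec", str(start_sec)]
    if end_sec is not None:
        cmd += ["--end_sec", str(end_sec)]
    return cmd


# Print the command lines for the two bundled examples.
for path, run_type in [
    ("examples/example_video.mp4", "caption_videos"),
    ("examples/example_image.jpg", "caption_images"),
]:
    print(shlex.join(build_caption_command(path, run_type)))
```

To execute a command rather than print it, the returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.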
Please cite our work if you use it in your research:
@article{tewel2022videocap,
title={Zero-Shot Video Captioning with Evolving Pseudo-Tokens},
author={Tewel, Yoad and Shalev, Yoav and Nadler, Roy and Schwartz, Idan and Wolf, Lior},
journal={arXiv preprint arXiv:2207.11100},
year={2022}
}