Official repository of Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos (ICCV 2021)
This code is based on the following libraries:
python=3.8
pytorch=1.7.0 (with CUDA 10.2)
To create a virtual environment with all necessary libraries:
conda env create -f environment.yml
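To confirm that the active environment matches the versions listed above, a small check like the following may help. This is a minimal sketch and not part of the repository: the helper name `version_mismatches` is hypothetical, and it compares only version strings (PyTorch builds may append a local suffix such as `+cu102`, so only the part before `+` is compared).

```python
import sys

# Versions listed in this README.
EXPECTED = {"python": "3.8", "torch": "1.7.0", "cuda": "10.2"}


def version_mismatches(python_version, torch_version, cuda_version):
    """Return a list of human-readable mismatches (empty if all match).

    Hypothetical helper: a PyTorch build string such as "1.7.0+cu102"
    is treated as "1.7.0" by dropping everything after "+".
    """
    problems = []
    if python_version != EXPECTED["python"]:
        problems.append(f"python {python_version} != {EXPECTED['python']}")
    if torch_version is None:
        problems.append("torch is not installed")
    elif str(torch_version).split("+")[0] != EXPECTED["torch"]:
        problems.append(f"torch {torch_version} != {EXPECTED['torch']}")
    if cuda_version != EXPECTED["cuda"]:
        problems.append(f"cuda {cuda_version} != {EXPECTED['cuda']}")
    return problems


def main():
    python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    try:
        import torch  # imported lazily so the check still runs without torch
        torch_version = torch.__version__
        cuda_version = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        torch_version, cuda_version = None, None
    for problem in version_mismatches(python_version, torch_version, cuda_version):
        print("mismatch:", problem)


if __name__ == "__main__":
    main()
```

Running the script inside the activated conda environment prints nothing when every version matches.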
By default, data should be saved under the data/feat/{audio,label,visual} directories, and logs (including cache and checkpoints) are saved under the data/{cache,ckpt,log} directories. Using a symbolic link is recommended:
ln -s {path_to_your_data_directory} data
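The expected layout can also be checked before training. The sketch below is an illustration rather than repository code: `prepare_data_dir` is a hypothetical helper that reports which of the data/feat/{audio,label,visual} directories are missing and creates the data/{cache,ckpt,log} output directories if they do not exist yet.

```python
from pathlib import Path

# Directory names taken from the layout described above.
FEATURE_DIRS = ("audio", "label", "visual")
OUTPUT_DIRS = ("cache", "ckpt", "log")


def prepare_data_dir(root="data"):
    """Check feature directories and create output directories.

    Hypothetical helper. Returns the missing feature directories
    (relative to root) in FEATURE_DIRS order. Output directories are
    created in place, which also works when root is a symbolic link
    to the real data directory.
    """
    root = Path(root)
    missing = [
        f"feat/{name}"
        for name in FEATURE_DIRS
        if not (root / "feat" / name).is_dir()
    ]
    for name in OUTPUT_DIRS:
        # exist_ok avoids an error when the directory is already there.
        (root / name).mkdir(parents=True, exist_ok=True)
    return missing
```

For example, `prepare_data_dir("data")` returns `["feat/visual"]` when only the audio and label features are in place.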
We use a single TITAN RTX for training, but GPUs with less memory should also work with a smaller batch size (using the provided precomputed features).
We plan to make the Pano-AVQA dataset public within this year, including Q&A annotations, precomputed features, etc. Please stay tuned!
The default configuration is provided in code/config.py. To run with this configuration:
python cli.py
To run with a custom configuration, either modify code/config.py or execute:
python cli.py with {flags_at_your_disposal}
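The `with key=value` syntax resembles the command-line interface of the Sacred experiment library, where each override replaces one entry of the default configuration. The sketch below only illustrates that idea and is not the repository's code: `parse_override`, `apply_overrides`, and the keys in `DEFAULTS` are hypothetical (the real keys live in code/config.py). Values are parsed as Python literals, and anything that is not a valid literal (such as an unquoted path) is kept as a plain string.

```python
import ast


def parse_override(arg):
    """Split a "key=value" string and parse the value as a Python literal.

    Values that are not valid literals (e.g. an unquoted path) are kept
    as plain strings.
    """
    key, sep, raw = arg.partition("=")
    if not sep or not key:
        raise ValueError(f"expected key=value, got {arg!r}")
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError, TypeError):
        value = raw
    return key, value


def apply_overrides(defaults, args):
    """Return a copy of defaults updated with "key=value" overrides.

    Unknown keys raise KeyError so that typos are not silently ignored;
    the defaults dict itself is left unchanged.
    """
    config = dict(defaults)
    for arg in args:
        key, value = parse_override(arg)
        if key not in config:
            raise KeyError(f"unknown config key: {key}")
        config[key] = value
    return config


# Hypothetical defaults for illustration only.
DEFAULTS = {"batch_size": 32, "learning_rate": 1e-4, "ckpt_file": None}

if __name__ == "__main__":
    print(apply_overrides(DEFAULTS, ["batch_size=8"]))
```

If the CLI is indeed built on Sacred, its built-in `print_config` command (`python cli.py print_config`) may be a quicker way to see which keys are available.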
Model weights are saved under the ./data/log directory. To run inference only:
python cli.py eval with ckpt_file=../data/log/{experiment}/{ckpt}.pth
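When an experiment directory holds several checkpoints, the newest one can be picked automatically before running inference. This is a minimal sketch under stated assumptions: `latest_checkpoint` and `eval_command` are hypothetical helpers, checkpoints are assumed to be stored as `{experiment}/*.pth` under the log directory (as in the `ckpt_file` pattern above), and "newest" means most recently modified.

```python
from pathlib import Path


def latest_checkpoint(log_root, experiment):
    """Return the most recently modified .pth file for an experiment.

    Hypothetical helper: assumes checkpoints live in
    {log_root}/{experiment}/*.pth. Returns None if none exist
    (including when the experiment directory is missing).
    """
    ckpts = list(Path(log_root, experiment).glob("*.pth"))
    if not ckpts:
        return None
    return max(ckpts, key=lambda p: p.stat().st_mtime)


def eval_command(ckpt_path):
    """Build the inference command line for a given checkpoint."""
    return ["python", "cli.py", "eval", "with", f"ckpt_file={ckpt_path}"]


if __name__ == "__main__":
    # "my_experiment" is a placeholder experiment name.
    ckpt = latest_checkpoint("../data/log", "my_experiment")
    if ckpt is None:
        print("no checkpoint found")
    else:
        print(" ".join(eval_command(ckpt)))
```

The returned argument list can be passed directly to `subprocess.run(..., check=True)`, which avoids shell quoting issues.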
If you find our work useful in your research, please consider citing:
@InProceedings{Yun2021PanoAVQA,
author = {Yun, Heeseung and Yu, Youngjae and Yang, Wonsuk and Lee, Kangil and Kim, Gunhee},
title = {Pano-AVQA: Grounded Audio-Visual Question Answering on 360$^\circ$ Videos},
booktitle = {ICCV},
year = {2021}
}
If you have any inquiries, please don't hesitate to contact us via heeseung.yun at vision.snu.ac.kr.