Paper: https://arxiv.org/abs/2412.06014
Project page: https://aaltoml.github.io/BayesVLM/
- Ensure you have Python version >= 3.11 installed.
- Install the required packages by running:
  ```bash
  pip install -r requirements.txt
  ```
- Set `DATA_BASE_DIR` in your `.env` file. You can use the structure from the `.env.example` file:
  ```
  DATA_BASE_DIR=/path/to/datasets
  ```
- Add the project root directory to the `PYTHONPATH` environment variable:
  ```bash
  export PYTHONPATH=$PYTHONPATH:/path/to/project/root
  ```
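Since the scripts read `DATA_BASE_DIR` from the `.env` file, it can help to see what that entails. Below is a minimal sketch of a `.env` reader, assuming plain `KEY=VALUE` lines with `#` comments and no quoting or variable interpolation; in practice a library such as python-dotenv handles the full syntax.

```python
from pathlib import Path

def load_env(path=".env"):
    """Minimal .env reader (sketch): parses KEY=VALUE lines,
    skipping blank lines and '#' comments. Returns a dict."""
    env = {}
    p = Path(path)
    if not p.exists():
        return env
    for line in p.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env
```

For example, with the `.env.example` contents shown above, `load_env(".env")` would return `{"DATA_BASE_DIR": "/path/to/datasets"}`.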
To run the Hessian estimation code, use the following command:

```bash
python scripts/hessian_estimation.py
```

To run the zero-shot experiments, use the following command:

```bash
python scripts/zeroshot.py
```

To run the active-learning experiments, use the following command:

```bash
python scripts/activelearning.py
```

Each of these commands accepts additional arguments for adjusting the Hessian estimation and the zero-shot/active-learning experiments.
The precomputed Hessians for the models used in the paper are available in the `hessians/` folder. You can select a specific Hessian by setting `--hessian_dir` in the provided scripts.

A notebook stepping through the zero-shot code is available in `notebooks/zeroshot.ipynb`.
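To illustrate how the scripts above might be invoked programmatically, here is a sketch that assembles a zero-shot command line with a chosen Hessian directory. The script path and `--hessian_dir` flag come from this README; the `hessians/<model-name>` value is a placeholder, so check the `hessians/` folder for the directories actually shipped.

```python
import sys

def build_zeroshot_cmd(hessian_dir, extra_args=()):
    """Assemble the zero-shot command with a chosen Hessian directory.
    Pass the result to subprocess.run(...) from the project root."""
    return [sys.executable, "scripts/zeroshot.py",
            "--hessian_dir", str(hessian_dir), *extra_args]
```

For example, `build_zeroshot_cmd("hessians/<model-name>")` yields the argument list for `subprocess.run(..., check=True)`.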
The data is stored in the `DATA_BASE_DIR` folder and is structured as follows:

```
DATA_BASE_DIR/
├── cifar10/
├── cifar100/
├── eurosat/
├── flowers102/
├── food101/
├── homeoffice/
├── imagenet1k/
├── imagenet_r/
├── imagenet_val_wds/
├── laion400m/
├── sun397/
└── ucf101/
```

Please set the `DATA_BASE_DIR` environment variable accordingly.
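Before running the experiments, it can be worth checking that the expected dataset folders exist under `DATA_BASE_DIR`. A small sketch, using the folder names from the tree above:

```python
from pathlib import Path

# Folder names taken from the DATA_BASE_DIR tree in this README.
EXPECTED_DATASETS = [
    "cifar10", "cifar100", "eurosat", "flowers102", "food101",
    "homeoffice", "imagenet1k", "imagenet_r", "imagenet_val_wds",
    "laion400m", "sun397", "ucf101",
]

def missing_datasets(base_dir):
    """Return the names of expected dataset folders absent from base_dir."""
    base = Path(base_dir)
    return [name for name in EXPECTED_DATASETS if not (base / name).is_dir()]
```

Note that several datasets are downloaded automatically on first use, so a missing folder is not necessarily an error.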
The CIFAR-10 and CIFAR-100 datasets are automatically downloaded by the Hugging Face `datasets` library.
The EuroSAT setup follows the instructions from https://github.com/vishaal27/SuS-X/blob/main/data/DATA.md:

- Create a folder named `eurosat/` under `DATA_BASE_DIR`.
- Download the dataset from http://madm.dfki.de/files/sentinel/EuroSAT.zip and extract it to `DATA_BASE_DIR/eurosat/`.
- Download `split_zhou_EuroSAT.json` from here and put it under `DATA_BASE_DIR/eurosat`.

The directory structure should look like:

```
eurosat/
|–– 2750/
|–– split_zhou_EuroSAT.json
```
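Several datasets here (EuroSAT, SUN397, UCF101) rely on `split_zhou_*.json` files. A minimal loader sketch follows; it assumes the CoOp-style layout, where the file maps each split name (`train`/`val`/`test`) to a list of `[image_path, label, classname]` entries — verify this against the actual file before relying on it.

```python
import json
from pathlib import Path

def load_zhou_split(json_path):
    """Load a split_zhou_*.json file (assumed CoOp-style layout:
    {"train": [[image_path, label, classname], ...], "val": ..., "test": ...}).
    Returns a dict mapping split names to lists of (Path, int, str) tuples."""
    with open(json_path) as f:
        splits = json.load(f)
    return {
        name: [(Path(p), int(label), classname) for p, label, classname in items]
        for name, items in splits.items()
    }
```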
The Flowers102 and Food101 datasets are automatically downloaded by the `torchvision` library.
Download the dataset from https://www.hemanthdv.org/officeHomeDataset.html and extract it to `DATA_BASE_DIR/homeoffice/`.

The directory structure should look like:

```
homeoffice/
|–– Art/
|–– Clipart/
|–– Product/
|–– Real World/
|–– ImageInfo.csv
|–– imagelist.txt
```
Follow the instructions in pytorch/vision#7545 (comment) to download the dataset and extract it to `DATA_BASE_DIR/stanford_cars/`.
The DTD dataset is automatically downloaded by the `torchvision` library.
We supply the script `scripts/download_imagenet.py` to download all validation tar files for the ImageNet dataset from the Hugging Face Datasets Hub. After running the script, the directory structure should look like:

```
imagenet_val_wds/
|–– imagenet1k-validation-00.tar
|–– imagenet1k-validation-01.tar
|–– ...
|–– imagenet1k-validation-63.tar
```
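Since the 64 shard names follow a fixed numbering pattern, the expected file list can be generated and checked for completeness after the download. A small sketch (the shard count of 64 is taken from the tree above):

```python
def validation_shards(num_shards=64):
    """Expected WebDataset shard names under imagenet_val_wds/
    (zero-padded two-digit indices 00 .. num_shards-1)."""
    return [f"imagenet1k-validation-{i:02d}.tar" for i in range(num_shards)]
```

Comparing `set(validation_shards())` against the files actually present reveals any shards the download missed.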
The `laion400m` dataset can be downloaded using the `img2dataset` tool; the instructions for the `laion400m` dataset are available here. Before running the `img2dataset` script, we removed all data points marked as NSFW in the metadata.
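The NSFW filtering step can be sketched as a simple row filter over the metadata. The column name `"NSFW"` and the flag value `"NSFW"` are assumptions about the LAION-400M metadata schema; check the metadata files you downloaded before applying this.

```python
def drop_nsfw(rows, column="NSFW", flagged="NSFW"):
    """Sketch: keep only metadata rows not marked as NSFW.
    `rows` is an iterable of dict-like records; column name and
    flag value are assumed, not verified against the real schema."""
    return [row for row in rows if row.get(column) != flagged]
```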
- Create a folder named `sun397/` under `./data`.
- Download the images from http://vision.princeton.edu/projects/2010/SUN/SUN397.tar.gz.
- Download the partitions from https://vision.princeton.edu/projects/2010/SUN/download/Partitions.zip.
- Extract these files under `./data/sun397/`.
- Download `split_zhou_SUN397.json` from this link and put it under `./data/sun397`.

The directory structure should look like:

```
sun397/
|–– SUN397/
|–– split_zhou_SUN397.json
|–– ... # a bunch of .txt files
```
- Create a folder named `ucf101/` under `./data`.
- Download the zip file `UCF-101-midframes.zip` from here and extract it to `./data/ucf101/`. This zip file contains the extracted middle video frames.
- Download `split_zhou_UCF101.json` from this link and put it under `./data/ucf101`.

The directory structure should look like:

```
ucf101/
|–– UCF-101-midframes/
|–– split_zhou_UCF101.json
```
```bibtex
@article{baumann2024bayesvlm,
  title   = {Post-hoc Probabilistic Vision-Language Models},
  author  = {Anton Baumann and Rui Li and Marcus Klasson and Santeri Mentu and Shyamgopal Karthik and Zeynep Akata and Arno Solin and Martin Trapp},
  year    = {2024},
  journal = {arXiv preprint arXiv:2412.06014}
}
```
This software is provided under the MIT license.