Please check `amlt_configs/` for the experiment configs.
The major results can be found in `docs/MODEL_ZOO.md` and on our Project Page.
We also provide evaluation code for our baseline (Promptable-GRiT) and for benchmarking referring VLLMs.
The evaluation suite is built on a customized vdtk, which supports CLIP score computation with images encoded in base64 (a TSV sketch follows the links below):
- Code: https://github.com/xk-huang/vdtk/tree/dev
- Data (e.g., jar files): https://huggingface.co/xk-huang/vdtk-data
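For reference, one way to pack images into a base64 TSV is sketched below; the exact schema expected for `IMAGE_B64_TSV_PATH` (used later by `eval_suite.sh`) is an assumption here, so adapt it to your data:

```bash
# Minimal sketch; assumed schema: <image_id>\t<base64-encoded image bytes>.
image_id=266240  # hypothetical image id
printf '%s\t%s\n' "$image_id" "$(base64 -w0 example.jpg)" >> images_b64.tsv
```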
Install with external data:
```bash
alias=`whoami | cut -d'.' -f2`
docker run -itd --runtime=nvidia --ipc=host --privileged -v /home/${alias}:/home/${alias} -w `pwd` --name sca nvcr.io/nvidia/pytorch:22.10-py3 bash
docker exec -it sca bash

# In the docker container, cd to the code dir first
. amlt_configs/setup.sh
source ~/.bashrc
pip install pydantic==1.10.8  # https://github.com/pydantic/pydantic/issues/545#issuecomment-1573776471
. amlt_configs/setup_eval_suite.sh
```
`amlt_configs/setup_eval_suite.sh` does roughly the following:

```bash
# Install the env first:
# conda create -n sca -y python=3.9
# conda activate sca
# conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia

ORIGINAL_DIR="$(pwd)"
REPO_DIR=/tmp/vdtk

# Clone our vdtk fork (dev branch) with submodules
git clone --recursive https://github.com/xk-huang/vdtk.git $REPO_DIR -b dev
cd $REPO_DIR
git submodule update --init --recursive

# Install git-lfs (prefix with sudo when outside the container)
apt-get update
apt-get install git-lfs
git lfs install

# Fetch the external data and merge it into the repo
git clone https://huggingface.co/xk-huang/vdtk-data
rsync -avP ./vdtk-data/vdtk .
rm -rf vdtk-data

pip install --upgrade pip
pip install -e . POT==0.9.0  # POT==0.9.1 will take up all the memory with the tf backend
pip install tensorflow==2.12.1  # Pin one version of tf
pip install levenshtein==0.21.1
pip install openpyxl==3.1.2

# Verify that TF sees the GPU
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

cd "$ORIGINAL_DIR"
```
Potential Problems:
- About TensorFlow: TF does not support CUDA 12 yet (as of 08/15/23), so we use `nvcr.io/nvidia/pytorch:22.12-py3`, which contains CUDA 11.8.
- Encoding in the docker image: `import locale; locale.getpreferredencoding()` returns `ANSI_X3.4-1968` rather than `UTF-8`, which causes errors in file writing. Change `vdtk/metrics/tokenizer/ptbtokenizer.py:73` to `tmp_file = tempfile.NamedTemporaryFile(mode="w", delete=False, encoding="utf-8")`.
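Alternatively, forcing a UTF-8 locale in the container may avoid patching the file; this is a generic workaround suggestion, not something from the original setup scripts:

```bash
# Assumed workaround: force a UTF-8 locale for the session.
export LC_ALL=C.UTF-8 LANG=C.UTF-8
python3 -c "import locale; print(locale.getpreferredencoding())"  # expect 'UTF-8'
```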
The inference results are saved as JSON in the following structure:
```json
[
  {
    "_id": 0,
    "split": "inference",
    "references": [
      "a man wearing a red and white shirt"
    ],
    "candidates": [
      "red and yellow",
      "red shirt guy",
      "red and yellow uniform"
    ],
    "metadata": {
      "metadata_input_boxes": [
        0,
        95,
        113,
        419
      ],
      "metadata_image_id": 266240,
      "metadata_region_id": 27287
    },
    "logits": {
      "iou_scores": [
        0.89990234375,
        0.994140625,
        0.99365234375
      ]
    }
  }
]
```
The result file is saved at `$OUTPUT_DIR/infer/infer-visual_genome-densecap-local-densecap-test.json`, following the naming pattern `infer-{data_script_identifier}-{name}-{split}.json`.
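To sanity-check a result file, jq can be used (assumed to be available; it is not installed by the setup script):

```bash
# Count records and list the top-level keys of the first one.
INFER_JSON="$OUTPUT_DIR/infer/infer-visual_genome-densecap-local-densecap-test.json"
jq 'length' "$INFER_JSON"
jq '.[0] | keys' "$INFER_JSON"
```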
Usage:

```bash
bash scripts/tools/eval_suite.sh

# Env args: DRY_RUN, ONLY_GATHER, ONLY_EVAL, SKIP_CLIP_RECALL, DEBUG, NO_POST_PROCESS
# Usage: [DRY_RUN=1] [ONLY_GATHER=1] [ONLY_EVAL=1] ./eval_suite.sh <INFERENCE_JSON_DIR> <JSON_FILE_NAME> <SPLIT> [<IMAGE_B64_TSV_PATH>] [<MERGE_TSV_INTO_JSON_FOR_VDTK_SCRIPT>] [<POST_PROCESS_MULTI_CANDIDATES_SCRIPT>]
# JSON_FILE_NAME is not used; pass any string like 'xxx' for it.

# e.g.,
DRY_RUN=1 NO_POST_PROCESS=1 ONLY_EVAL=1 SKIP_CLIP_RECALL=1 bash scripts/tools/eval_suite.sh exp/ xxx inference
```
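A fuller invocation with the optional base64 image TSV might look like the following; the paths are placeholders:

```bash
# Hypothetical paths; 'inference' is the <SPLIT> as in the example above.
bash scripts/tools/eval_suite.sh "$OUTPUT_DIR/infer" xxx inference /path/to/images_b64.tsv
```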
Details about the script:
- Replace the GT captions (the tokenizer-processed ones) with the real GT (`scripts/tools/replace_references_in_json_for_vdtk.py`). Please prepare the folder structure as described in the "The structure of files" section; this step requires the `.hydra` config.
- Remove multiple predictions but keep one based on the IoU score (`scripts/tools/post_process_multi_candidates_for_vdtk.py`).
If there are multiple candidate predictions, we only keep the candidate with the highest IoU score for METEOR, CIDEr-D, ROUGE, etc.:

```bash
python scripts/tools/post_process_multi_candidates_for_vdtk.py -i $INFERENCE_JSON_PATH
```
To process multiple inference JSON files under a directory:

```bash
INFERENCE_JSON_DIR=  # set this to your directory
find $INFERENCE_JSON_DIR -name 'infer.json' -exec python scripts/tools/post_process_multi_candidates_for_vdtk.py -i {} \;
```
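The selection logic amounts to an arg-max over `logits.iou_scores`; the jq one-liner below illustrates the same idea on a single file (jq assumed available; this is an illustration, not the actual script):

```bash
# For each record, print the candidate whose IoU score is the highest.
jq -r '.[] | .logits.iou_scores as $s | .candidates[$s | index(max)]' "$INFERENCE_JSON_PATH"
```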
- Evaluate with vdtk and save the results in a `.log` file. You need to change `PRED_JSONS_BASE_DIR`, `JSON_FILE_NAME`, `SPLIT`, and `IMAGE_B64_TSV_PATH`. If the `infer.json` file is too large to open in VS Code, you can use vim to open it and change the above variables accordingly. Currently, `JSON_FILE_NAME` is deprecated, as we find the `*.json` files in `PRED_JSONS_BASE_DIR` automatically.
- Parse the results from each `*.log` file and gather them into one xlsx, organized by sheets. Change `PRED_JSONS_BASE_DIR` accordingly (a quick grep for eyeballing scores is sketched after this list).
- Merge each metric into one table with `scripts/tools/merge_sheets_xlsx.py`.
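To eyeball scores before (or instead of) the xlsx step, a quick grep over the logs can help; the metric-name pattern below is an assumption about what the `.log` files contain:

```bash
# Print lines mentioning the main metrics from every .log under the base dir.
find "$PRED_JSONS_BASE_DIR" -name '*.log' -print -exec grep -iE 'cider|meteor|rouge' {} \;
```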