Refactored utility files
apoorvkh committed May 29, 2022
1 parent 63a705e commit 0438d33
Showing 21 changed files with 200 additions and 175 deletions.
2 changes: 1 addition & 1 deletion ClipCap
Submodule ClipCap updated 2 files
+1 −1 data.py
+2 −1 predict.py
57 changes: 25 additions & 32 deletions README.md
@@ -42,10 +42,10 @@ conda activate aokvqa
 export AOKVQA_DIR=./datasets/aokvqa/
 mkdir -p ${AOKVQA_DIR}
 
-curl -L -s https://prior-datasets.s3.us-east-2.amazonaws.com/aokvqa/aokvqa_v1p0.tar.gz | tar xvz -C ${AOKVQA_DIR}
+curl -fsSL https://prior-datasets.s3.us-east-2.amazonaws.com/aokvqa/aokvqa_v1p0.tar.gz | tar xvz -C ${AOKVQA_DIR}
 ```
 
-<details> <summary><b>Downloading images/annotations from COCO 2017</b></summary>
+<details> <summary><b>Downloading COCO 2017</b></summary>
 
 ```bash
 export COCO_DIR=./datasets/coco/
@@ -62,17 +62,14 @@ unzip annotations_trainval2017.zip -d ${COCO_DIR}; rm annotations_trainval2017.zip

 </details>
 
-Loading our dataset is easy! Just grab our [aokvqa_utils.py](https://github.com/allenai/aokvqa/blob/main/aokvqa_utils.py) file and refer to the following code.
+Loading our dataset is easy! Just grab our [load_aokvqa.py](https://github.com/allenai/aokvqa/blob/main/load_aokvqa.py) file and refer to the following code.
 
 ```python
 import os
-import aokvqa_utils
 
 aokvqa_dir = os.getenv('AOKVQA_DIR')
 
-train_dataset = aokvqa_utils.load_aokvqa(aokvqa_dir, 'train')
-val_dataset = aokvqa_utils.load_aokvqa(aokvqa_dir, 'val')
-test_dataset = aokvqa_utils.load_aokvqa(aokvqa_dir, 'test')
+from load_aokvqa import load_aokvqa, get_coco_path
+train_dataset = load_aokvqa(aokvqa_dir, 'train') # also 'val' or 'test'
 ```
 
 <details> <summary><b>Example dataset entry</b></summary>
@@ -84,7 +81,7 @@ print(dataset_example['question_id'])
 # 22MexNkBPpdZGX6sxbxVBH
 
 coco_dir = os.getenv('COCO_DIR')
-image_path = aokvqa_utils.get_coco_path('train', dataset_example['image_id'], coco_dir)
+image_path = get_coco_path('train', dataset_example['image_id'], coco_dir)
 print(image_path)
 # ./datasets/coco/train2017/000000299207.jpg
 
@@ -104,30 +101,24 @@ print(dataset_example['rationales'][0])

 ## Evaluation
 
-Please prepare a `predictions_{split}-{setting}.json` file for each evaluation set (val and test splits, for both MC and DA settings) with the format: `{ question_id (str) : prediction (str) }`. Be sure this includes a prediction for **every** question in the evaluation set. You won't be able to run evaluation locally on test set predictions, since the ground-truth answers are hidden.
-
-```python
-import os
-import json
-import aokvqa_utils
-
-aokvqa_dir = os.getenv('AOKVQA_DIR')
-split = 'val'
-multiple_choice = True # Set False for DA
-predictions_file = './path/to/predictions_val-mc.json'
-
-eval_dataset = aokvqa_utils.load_aokvqa(aokvqa_dir, split)
-predictions = json.load(open(predictions_file, 'r'))
-
-acc = aokvqa_utils.eval_aokvqa(eval_dataset, predictions, multiple_choice=multiple_choice)
-print(acc) # float
-```
-
-To compute metrics over a batch of predictions files (e.g. `./predictions/{model-name}_val-da.json`), you can instead run `python evaluate_predictions.py --aokvqa-dir ${AOKVQA_DIR} --split val --preds "./predictions/*_val-da.json"`. Add the `--multiple-choice` flag to run MC evaluation over (e.g. `*_val-mc.json`) files that have instead been generated for the multiple-choice setting.
+Please prepare `predictions_{split}-{setting}.json` files (for `split: {val,test}` and `setting: {mc,da}`) in the format `{ question_id (str) : prediction (str) }`.
+
+See the following example command for evaluation. Exclude `--multiple-choice` for the DA setting. You won't be able to run evaluation for the (private) test set locally.
+
+```bash
+python evaluation/eval_predictions.py --aokvqa-dir ${AOKVQA_DIR} --split val --preds ./predictions_val-mc.json --multiple-choice
+```
 
 ### Leaderboard
 
-You can submit predictions from your model to our leaderboard! Simply produce predictions files for each split and setting and [submit here](https://leaderboard.allenai.org/aokvqa). Remember that your model is not allowed to compare "choices" when predicting for the DA setting.
+First, unify predictions for each split as follows. You can omit either `--mc` or `--da` prediction file if you only want to evaluate one setting.
+
+```bash
+python evaluation/prepare_predictions.py --aokvqa-dir ${AOKVQA_DIR} --split val --mc ./predictions_val-mc.json --da ./predictions_val-da.json --out ./predictions_val.json
+# repeat for test split ...
+```
+
+Then, submit `predictions_val.json` and/or `predictions_test.json` to the [leaderboard](https://leaderboard.allenai.org/aokvqa).
 
 ## Codebase
 
@@ -187,19 +178,21 @@ mkdir -p ${LOG_DIR} ${PREDS_DIR} ${PT_MODEL_DIR}

 ```bash
 # Checkpoints for transfer learning experiments
-curl -L -s https://prior-model-weights.s3.us-east-2.amazonaws.com/aokvqa/transfer_exp_checkpoints.tar.gz | tar xvz -C ${PT_MODEL_DIR}/aokvqa_models
+curl -fsSL https://prior-model-weights.s3.us-east-2.amazonaws.com/aokvqa/transfer_exp_checkpoints.tar.gz | tar xvz -C ${PT_MODEL_DIR}/aokvqa_models
 
 # Checkpoints for ClipCap models (generating answers and rationales)
-curl -L -s https://prior-model-weights.s3.us-east-2.amazonaws.com/aokvqa/clipcap_checkpoints.tar.gz | tar xvz -C ${PT_MODEL_DIR}/aokvqa_models
+curl -fsSL https://prior-model-weights.s3.us-east-2.amazonaws.com/aokvqa/clipcap_checkpoints.tar.gz | tar xvz -C ${PT_MODEL_DIR}/aokvqa_models
 ```
 
 </details>
 
-We have included instructions for replicating each of our experiments. Please refer to the README.md files for:
+We have included instructions for replicating each of our experiments (see README.md files below).
+
+All Python scripts should be run from the root of this repository. Please be sure to first run the installation and data preparation as directed above.
+For each experiment, we follow this prediction file naming scheme: `{model-name}_{split}-{setting}.json` (e.g. `random-weighted_val-mc.json` or `random-weighted_test-da.json`). As examples in these Readme files, we produce predictions on the validation set.
 
 - [Heuristics](./heuristics/README.md)
 - [Transfer Learning Experiments](./transfer_experiments/README.md)
 - [Querying GPT-3](./gpt3/README.md)
-- [ClipCap](./ClipCap/README.md)
+- [Generating Captions & Rationales](./ClipCap/README.md)
-
-All Python scripts should be run from the root of this repository. Please be sure to first run the installation and data preparation as directed above. For each, we follow this prediction file naming scheme: `{model-name}_{split}-{setting}.json` (e.g. `random-weighted_val-mc.json` or `random-weighted_test-da.json`). As examples in these Readme files, we produce predictions on the validation set.
82 changes: 0 additions & 82 deletions aokvqa_utils.py

This file was deleted.
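
The loaders from the deleted `aokvqa_utils.py` now live in the new `load_aokvqa.py`, which is not rendered in this view. Based on how it is imported and called throughout this commit, here is a minimal sketch of what it plausibly contains; the annotation filename pattern and the `version` default are assumptions, not confirmed by the diff:

```python
# Hypothetical sketch of load_aokvqa.py, inferred from call sites in this commit.
# The filename pattern aokvqa_{version}_{split}.json is an assumption.
import json
import os


def load_aokvqa(aokvqa_dir, split, version='v1p0'):
    """Load one split of the A-OKVQA annotations as a list of question dicts."""
    assert split in ['train', 'val', 'test', 'test_w_ans']
    dataset = json.load(open(
        os.path.join(aokvqa_dir, f"aokvqa_{version}_{split}.json")
    ))
    return dataset


def get_coco_path(split, image_id, coco_dir):
    """Map a COCO image id to its on-disk path, e.g. train2017/000000299207.jpg."""
    return os.path.join(coco_dir, f"{split}2017", f"{image_id:012}.jpg")
```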

2 changes: 1 addition & 1 deletion data_scripts/build_vocab.py
@@ -3,7 +3,7 @@
 from collections import Counter
 import pathlib
 
-from aokvqa_utils import load_aokvqa
+from load_aokvqa import load_aokvqa
 
 
 parser = argparse.ArgumentParser()
2 changes: 1 addition & 1 deletion data_scripts/extract_bert_features.py
@@ -6,7 +6,7 @@
 import torch
 from transformers import AutoTokenizer, AutoModel
 
-from aokvqa_utils import load_aokvqa
+from load_aokvqa import load_aokvqa
 
 
 parser = argparse.ArgumentParser()
2 changes: 1 addition & 1 deletion data_scripts/extract_clip_features.py
@@ -7,7 +7,7 @@
 import torch
 import clip
 
-from aokvqa_utils import load_aokvqa, get_coco_path
+from load_aokvqa import load_aokvqa, get_coco_path
 
 
 parser = argparse.ArgumentParser()
2 changes: 1 addition & 1 deletion data_scripts/extract_resnet_features.py
@@ -9,7 +9,7 @@
 from torchvision import models
 from torchvision import transforms as T
 
-from aokvqa_utils import load_aokvqa, get_coco_path
+from load_aokvqa import load_aokvqa, get_coco_path
 
 
 parser = argparse.ArgumentParser()
28 changes: 0 additions & 28 deletions evaluate_predictions.py

This file was deleted.

73 changes: 73 additions & 0 deletions evaluation/eval_predictions.py
@@ -0,0 +1,73 @@
import argparse
import pathlib
import json
import glob

from load_aokvqa import load_aokvqa


def eval_aokvqa(dataset, preds, multiple_choice=False, strict=True):

    if isinstance(dataset, list):
        dataset = { dataset[i]['question_id'] : dataset[i] for i in range(len(dataset)) }

    # In the DA setting, skip questions flagged difficult_direct_answer
    if multiple_choice is False:
        dataset = {k:v for k,v in dataset.items() if v['difficult_direct_answer'] is False}

    if strict:
        dataset_qids = set(dataset.keys())
        preds_qids = set(preds.keys())
        assert dataset_qids.issubset(preds_qids)

    # dataset = q_id (str) : dataset element (dict)
    # preds = q_id (str) : prediction (str)

    acc = []

    for q in dataset.keys():
        if q not in preds.keys():
            acc.append(0.0)
            continue

        pred = preds[q]
        choices = dataset[q]['choices']
        direct_answers = dataset[q]['direct_answers']

        ## Multiple Choice setting
        if multiple_choice:
            if strict:
                assert pred in choices, 'Prediction must be a valid choice'
            correct_choice_idx = dataset[q]['correct_choice_idx']
            acc.append( float(pred == choices[correct_choice_idx]) )
        ## Direct Answer setting
        else:
            # VQA-style accuracy: full credit if the prediction matches >= 3 direct answers
            num_match = sum([pred == da for da in direct_answers])
            vqa_acc = min(1.0, num_match / 3.0)
            acc.append(vqa_acc)

    acc = sum(acc) / len(acc) * 100

    return acc
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--aokvqa-dir', type=pathlib.Path, required=True, dest='aokvqa_dir')
    parser.add_argument('--split', type=str, choices=['train', 'val', 'test_w_ans'], required=True)
    parser.add_argument('--preds', type=str, required=True, dest='prediction_files')
    parser.add_argument('--multiple-choice', action='store_true', dest='multiple_choice')
    args = parser.parse_args()

    dataset = load_aokvqa(args.aokvqa_dir, args.split)

    for prediction_file in glob.glob(args.prediction_files):
        predictions = json.load(open(prediction_file, 'r'))

        acc = eval_aokvqa(
            dataset,
            predictions,
            multiple_choice=args.multiple_choice,
            strict=False  # eval_aokvqa has no ensure_valid_choice kwarg; tolerate missing/invalid predictions
        )

        print(prediction_file, acc)
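
Since `--preds` is expanded with `glob.glob`, a single invocation can score a whole batch of prediction files; quote the pattern so the shell does not expand it first. For example (file layout hypothetical):

```bash
# Evaluate every DA-setting validation predictions file under ./predictions/
python evaluation/eval_predictions.py --aokvqa-dir ${AOKVQA_DIR} --split val --preds "./predictions/*_val-da.json"
```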
31 changes: 31 additions & 0 deletions evaluation/prepare_predictions.py
@@ -0,0 +1,31 @@
import argparse
import pathlib
import json

from load_aokvqa import load_aokvqa


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--aokvqa-dir', type=pathlib.Path, required=True, dest='aokvqa_dir')
    parser.add_argument('--split', type=str, choices=['train', 'val', 'test'], required=True)
    parser.add_argument('--mc', type=argparse.FileType('r'), dest='mc_pred_file')
    parser.add_argument('--da', type=argparse.FileType('r'), dest='da_pred_file')
    parser.add_argument('--out', type=argparse.FileType('w'), dest='output_file')
    args = parser.parse_args()
    assert args.mc_pred_file or args.da_pred_file

    dataset = load_aokvqa(args.aokvqa_dir, args.split)
    mc_preds = json.load(args.mc_pred_file) if args.mc_pred_file else None
    da_preds = json.load(args.da_pred_file) if args.da_pred_file else None
    predictions = {}

    for d in dataset:
        q = d['question_id']
        predictions[q] = {}
        if mc_preds and q in mc_preds.keys():
            predictions[q]['multiple_choice'] = mc_preds[q]
        if da_preds and q in da_preds.keys():
            predictions[q]['direct_answer'] = da_preds[q]

    json.dump(predictions, args.output_file)
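
The unified file maps every question id in the split to a dict with one key per provided setting (questions absent from an input file get an empty dict). A sketch of the expected output, with illustrative answer strings:

```json
{
  "22MexNkBPpdZGX6sxbxVBH": {
    "multiple_choice": "garbage can",
    "direct_answer": "garbage can"
  }
}
```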
(The remaining changed files, presumably including the new load_aokvqa.py, are not rendered in this view.)
