Merge pull request OpenBMB#156 from lihytotoro/main
Add evaluation and inference
Cuiunbo authored May 28, 2024
2 parents f592fed + 65f5567 commit d58dcde
Showing 49 changed files with 5,610 additions and 0 deletions.
177 changes: 177 additions & 0 deletions eval_mm/README.md
@@ -0,0 +1,177 @@
# Evaluation

## opencompass
First, enter the `vlmevalkit` directory and install all dependencies:
```bash
cd vlmevalkit
pip install -r requirements.txt
```
<br />

Then, run `script/run_inference.sh`, which takes three positional arguments in order: `MODELNAME`, `DATALIST`, and `MODE`. `MODELNAME` is the name of the model, `DATALIST` is the list of datasets used for inference, and `MODE` is the evaluation mode:
```bash
chmod +x ./script/run_inference.sh
./script/run_inference.sh "$MODELNAME" "$DATALIST" "$MODE"
```
<br />

The three available choices for `MODELNAME` are listed in `vlmeval/config.py`:
```python
ungrouped = {
'MiniCPM-V':partial(MiniCPM_V, model_path='openbmb/MiniCPM-V'),
'MiniCPM-V-2':partial(MiniCPM_V, model_path='openbmb/MiniCPM-V-2'),
'MiniCPM-Llama3-V-2_5':partial(MiniCPM_Llama3_V, model_path='openbmb/MiniCPM-Llama3-V-2_5'),
}
```
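
The registry above maps each model name to a `functools.partial` that pre-fills `model_path`, so a name can be turned into a model object with a single call. The sketch below illustrates that lookup pattern only; the `MiniCPM_V` class here is a hypothetical stand-in that merely records the path, and `build_model` is an illustrative helper, not a function from the repository.

```python
from functools import partial


class MiniCPM_V:
    """Hypothetical stand-in for the real wrapper class; it only stores the path."""

    def __init__(self, model_path: str):
        self.model_path = model_path


# Same shape as the `ungrouped` dict: name -> constructor with model_path pre-filled.
supported_models = {
    "MiniCPM-V": partial(MiniCPM_V, model_path="openbmb/MiniCPM-V"),
    "MiniCPM-V-2": partial(MiniCPM_V, model_path="openbmb/MiniCPM-V-2"),
}


def build_model(name: str):
    """Look up a model name and call its factory; fail clearly on unknown names."""
    if name not in supported_models:
        raise KeyError(f"Unknown MODELNAME {name!r}; choices: {sorted(supported_models)}")
    return supported_models[name]()


model = build_model("MiniCPM-V-2")
print(model.model_path)
```

Because the path is bound inside the `partial`, callers only need the short name that is passed as `MODELNAME`.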
<br />

All available choices for `DATALIST` are listed in `vlmeval/utils/dataset_config.py`. When evaluating on a single dataset, pass the dataset name directly, without quotation marks. When evaluating on multiple datasets, separate the dataset names with spaces and enclose the whole list in quotation marks:
```bash
DATALIST="POPE ScienceQA_TEST ChartQA_TEST"
```
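
The quotation marks matter because the script reads its arguments by position. A minimal sketch of the effect, using Python's `shlex.split` to mimic POSIX shell word splitting (the helper name `positional_args` is illustrative, not part of the repository):

```python
import shlex


def positional_args(command_line: str) -> list[str]:
    """Split a command line the way a POSIX shell would and drop the script name."""
    return shlex.split(command_line)[1:]


# Unquoted: the dataset list is split, so the script sees four positional arguments.
unquoted = positional_args("./script/run_inference.sh MiniCPM-V-2 POPE ScienceQA_TEST all")

# Quoted: the whole dataset list arrives as one argument, in the DATALIST position.
quoted = positional_args('./script/run_inference.sh MiniCPM-V-2 "POPE ScienceQA_TEST" all')

print(len(unquoted), len(quoted))
print(quoted[1].split())
```

In the unquoted case the second dataset name lands in the position where `MODE` is expected, which is why multi-dataset lists must be quoted.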
<br />

To score each benchmark directly, set `MODE=all`. If only inference results are required, set `MODE=infer`. To reproduce the results in the table displayed on the homepage (the columns from MME to RealWorldQA), run the script with the following settings:
```bash
# run on all 7 datasets
./script/run_inference.sh MiniCPM-Llama3-V-2_5 "MME MMBench_TEST_EN MMBench_TEST_CN MMMU_DEV_VAL MathVista_MINI LLaVABench RealWorldQA" all

# Commands for running on one dataset at a time
# MME
./script/run_inference.sh MiniCPM-Llama3-V-2_5 MME all
# MMBench_TEST_EN
./script/run_inference.sh MiniCPM-Llama3-V-2_5 MMBench_TEST_EN all
# MMBench_TEST_CN
./script/run_inference.sh MiniCPM-Llama3-V-2_5 MMBench_TEST_CN all
# MMMU_DEV_VAL
./script/run_inference.sh MiniCPM-Llama3-V-2_5 MMMU_DEV_VAL all
# MathVista_MINI
./script/run_inference.sh MiniCPM-Llama3-V-2_5 MathVista_MINI all
# LLaVABench
./script/run_inference.sh MiniCPM-Llama3-V-2_5 LLaVABench all
# RealWorldQA
./script/run_inference.sh MiniCPM-Llama3-V-2_5 RealWorldQA all
```
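
The commands above all follow one pattern: script, model name, dataset list as a single argument, then mode. As a rough sketch, they could be generated programmatically as shown below. `build_command` is a hypothetical helper, and the mode check covers only the two modes named in this README (`all` and `infer`).

```python
import shlex

SCRIPT = "./script/run_inference.sh"
MODEL = "MiniCPM-Llama3-V-2_5"
DATASETS = [
    "MME", "MMBench_TEST_EN", "MMBench_TEST_CN", "MMMU_DEV_VAL",
    "MathVista_MINI", "LLaVABench", "RealWorldQA",
]


def build_command(model: str, datasets: list[str], mode: str = "all") -> list[str]:
    """Return the argv list: script, model, one space-joined dataset argument, mode."""
    if mode not in ("all", "infer"):
        raise ValueError(f"mode must be 'all' or 'infer', got {mode!r}")
    return [SCRIPT, model, " ".join(datasets), mode]


# One run over all seven datasets; shlex.join adds the quotes the shell needs.
print(shlex.join(build_command(MODEL, DATASETS)))

# One run per dataset.
for name in DATASETS:
    print(shlex.join(build_command(MODEL, [name])))
```

Each returned list can also be passed as-is to `subprocess.run(cmd, check=True)` from inside the `vlmevalkit` directory, which avoids shell quoting altogether.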
<br />

## vqadataset
First, enter the `vqaeval` directory and install all dependencies. Then, create a `downloads` subdirectory to store the downloaded datasets for all tasks:
```bash
cd vqaeval
pip install -r requirements.txt
mkdir downloads
```
<br />

Download the datasets from the following links and place them in the specified directories:
###### TextVQA
```bash
cd downloads
mkdir TextVQA && cd TextVQA
wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
unzip train_val_images.zip && rm train_val_images.zip
mv train_val_images/train_images . && rm -rf train_val_images
wget https://dl.fbaipublicfiles.com/textvqa/data/TextVQA_0.5.1_val.json
cd ../..
```

###### DocVQA / DocVQATest

```bash
cd downloads
mkdir DocVQA && cd DocVQA && mkdir spdocvqa_images
# Download the Images and Annotations for Task 1 - Single Page Document Visual Question Answering from https://rrc.cvc.uab.es/?ch=17&com=downloads
# Move spdocvqa_images.tar.gz and spdocvqa_qas.zip into the DocVQA directory
tar -zxvf spdocvqa_images.tar.gz -C spdocvqa_images && rm spdocvqa_images.tar.gz
unzip spdocvqa_qas.zip && rm spdocvqa_qas.zip
cp spdocvqa_qas/val_v1.0_withQT.json . && cp spdocvqa_qas/test_v1.0.json . && rm -rf spdocvqa_qas
cd ../..
```
<br />

The `downloads` directory should be organized according to the following structure:
```bash
downloads
├── TextVQA
│ ├── train_images
│ │ ├── ...
│ ├── TextVQA_0.5.1_val.json
├── DocVQA
│ ├── spdocvqa_images
│ │ ├── ...
│ ├── val_v1.0_withQT.json
│ ├── test_v1.0.json
```
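
Before running inference, it can help to confirm that the layout matches the tree above. The following is a minimal sketch of such a check; `missing_entries` is an illustrative helper that is not part of the repository, and it assumes it is run from the `vqaeval` directory.

```python
from pathlib import Path

# Paths relative to `downloads`, taken from the directory tree above.
EXPECTED = [
    "TextVQA/train_images",
    "TextVQA/TextVQA_0.5.1_val.json",
    "DocVQA/spdocvqa_images",
    "DocVQA/val_v1.0_withQT.json",
    "DocVQA/test_v1.0.json",
]


def missing_entries(root) -> list[str]:
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


missing = missing_entries("downloads")
if missing:
    print("Missing under downloads/:", ", ".join(missing))
else:
    print("downloads/ layout looks complete")
```

The check only tests for existence; it does not verify that the image folders are fully extracted.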
<br />

Modify the parameters in `shell/run_inference.sh` and run inference:

```bash
chmod +x ./shell/run_inference.sh
./shell/run_inference.sh
```
<br />

All optional parameters are listed in `eval_utils/getargs.py`. The meanings of some major parameters are listed as follows:
```bash
# path to images and their corresponding questions
# TextVQA
--textVQA_image_dir
--textVQA_ann_path
# DocVQA
--docVQA_image_dir
--docVQA_ann_path
# DocVQATest
--docVQATest_image_dir
--docVQATest_ann_path

# whether to evaluate each task; --eval_all evaluates all tasks
--eval_textVQA
--eval_docVQA
--eval_docVQATest
--eval_all

# model name and model path
--model_name
--model_path
# load model from ckpt
--ckpt
# the way the model processes input data, "interleave" represents interleaved image-text form, while "old" represents non-interleaved.
--generate_method

# batch size for inference; 1 is recommended
--batchsize

# path to save the outputs
--answer_path
```
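
To make the role of these options concrete, here is a rough `argparse` sketch that accepts the same flag names. It is not the contents of `eval_utils/getargs.py`: the types, defaults, and the choice to treat the `--eval_*` options as on/off switches are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser using the flag names above; types and defaults are assumed."""
    parser = argparse.ArgumentParser(description="Sketch of the VQA evaluation options")
    for task in ("textVQA", "docVQA", "docVQATest"):
        parser.add_argument(f"--{task}_image_dir", type=str, default="")
        parser.add_argument(f"--{task}_ann_path", type=str, default="")
        # Assumption: task selection is a boolean switch.
        parser.add_argument(f"--eval_{task}", action="store_true")
    parser.add_argument("--eval_all", action="store_true")
    parser.add_argument("--model_name", type=str, default="")
    parser.add_argument("--model_path", type=str, default="")
    parser.add_argument("--ckpt", type=str, default="")
    parser.add_argument("--generate_method", type=str, choices=["interleave", "old"])
    parser.add_argument("--batchsize", type=int, default=1)
    parser.add_argument("--answer_path", type=str, default="")
    return parser


args = build_parser().parse_args([
    "--eval_textVQA",
    "--textVQA_image_dir", "./downloads/TextVQA/train_images",
    "--textVQA_ann_path", "./downloads/TextVQA/TextVQA_0.5.1_val.json",
    "--generate_method", "interleave",
])
print(args.eval_textVQA, args.eval_docVQA, args.batchsize)
```

Check `eval_utils/getargs.py` for the actual definitions before relying on any default shown here.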
<br />

Set the following parameters for each task:
###### TextVQA
```bash
--eval_textVQA
--textVQA_image_dir ./downloads/TextVQA/train_images
--textVQA_ann_path ./downloads/TextVQA/TextVQA_0.5.1_val.json
```

###### DocVQA
```bash
--eval_docVQA
--docVQA_image_dir ./downloads/DocVQA/spdocvqa_images
--docVQA_ann_path ./downloads/DocVQA/val_v1.0_withQT.json
```

###### DocVQATest
```bash
--eval_docVQATest
--docVQATest_image_dir ./downloads/DocVQA/spdocvqa_images
--docVQATest_ann_path ./downloads/DocVQA/test_v1.0.json
```
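
The three parameter groups above can be kept in one table and combined as needed. This is a small sketch of that idea; `TASK_ARGS` and `args_for` are hypothetical names, and the resulting list still has to be placed into `shell/run_inference.sh` by hand.

```python
# Flag groups copied from the three task sections above.
TASK_ARGS = {
    "TextVQA": [
        "--eval_textVQA",
        "--textVQA_image_dir", "./downloads/TextVQA/train_images",
        "--textVQA_ann_path", "./downloads/TextVQA/TextVQA_0.5.1_val.json",
    ],
    "DocVQA": [
        "--eval_docVQA",
        "--docVQA_image_dir", "./downloads/DocVQA/spdocvqa_images",
        "--docVQA_ann_path", "./downloads/DocVQA/val_v1.0_withQT.json",
    ],
    "DocVQATest": [
        "--eval_docVQATest",
        "--docVQATest_image_dir", "./downloads/DocVQA/spdocvqa_images",
        "--docVQATest_ann_path", "./downloads/DocVQA/test_v1.0.json",
    ],
}


def args_for(tasks: list[str]) -> list[str]:
    """Concatenate the flag groups for the requested tasks, in the order given."""
    args: list[str] = []
    for task in tasks:
        args.extend(TASK_ARGS[task])
    return args


print(" ".join(args_for(["TextVQA", "DocVQA"])))
```

Note that DocVQA and DocVQATest share the same image directory and differ only in the annotation file.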

<br />

For the DocVQATest task, in order to upload the inference results to the [official website](https://rrc.cvc.uab.es/?ch=17) for evaluation, run `shell/run_transform.sh` for format transformation after inference. `input_file_path` represents the path to the original output json, `output_file_path` represents the path to the transformed json:
```bash
chmod +x ./shell/run_transform.sh
./shell/run_transform.sh
```
175 changes: 175 additions & 0 deletions eval_mm/README_zh.md
@@ -0,0 +1,175 @@
# Evaluation

## opencompass
First, enter the `vlmevalkit` directory and install the required dependencies:
```bash
cd vlmevalkit
pip install -r requirements.txt
```
<br />

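Then, run `script/run_inference.sh`, which takes three positional arguments in order: `MODELNAME`, `DATALIST`, and `MODE`. `MODELNAME` is the model name, `DATALIST` is the list of target datasets, and `MODE` is the evaluation mode: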
```bash
chmod +x ./script/run_inference.sh
./script/run_inference.sh "$MODELNAME" "$DATALIST" "$MODE"
```
<br />

`MODELNAME` has three choices, listed in `vlmeval/config.py`:
```python
ungrouped = {
'MiniCPM-V':partial(MiniCPM_V, model_path='openbmb/MiniCPM-V'),
'MiniCPM-V-2':partial(MiniCPM_V, model_path='openbmb/MiniCPM-V-2'),
'MiniCPM-Llama3-V-2_5':partial(MiniCPM_Llama3_V, model_path='openbmb/MiniCPM-Llama3-V-2_5'),
}
```
<br />

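All available choices for `DATALIST` are listed in `vlmeval/utils/dataset_config.py`. When evaluating on a single dataset, pass the dataset name directly, without quotation marks. When evaluating on multiple datasets, separate the dataset names with spaces and enclose the whole list in quotation marks: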
```bash
DATALIST="POPE ScienceQA_TEST ChartQA_TEST"
```
<br />

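To score each benchmark directly, set `MODE=all`. If only inference results are required, set `MODE=infer`.
To reproduce the results in the table displayed on the homepage (the columns from MME to RealWorldQA), run the script with the following settings: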
```bash
# run on all 7 datasets at once
./script/run_inference.sh MiniCPM-Llama3-V-2_5 "MME MMBench_TEST_EN MMBench_TEST_CN MMMU_DEV_VAL MathVista_MINI LLaVABench RealWorldQA" all

# Commands for running on one dataset at a time
# MME
./script/run_inference.sh MiniCPM-Llama3-V-2_5 MME all
# MMBench_TEST_EN
./script/run_inference.sh MiniCPM-Llama3-V-2_5 MMBench_TEST_EN all
# MMBench_TEST_CN
./script/run_inference.sh MiniCPM-Llama3-V-2_5 MMBench_TEST_CN all
# MMMU_DEV_VAL
./script/run_inference.sh MiniCPM-Llama3-V-2_5 MMMU_DEV_VAL all
# MathVista_MINI
./script/run_inference.sh MiniCPM-Llama3-V-2_5 MathVista_MINI all
# LLaVABench
./script/run_inference.sh MiniCPM-Llama3-V-2_5 LLaVABench all
# RealWorldQA
./script/run_inference.sh MiniCPM-Llama3-V-2_5 RealWorldQA all
```
<br />

## vqadataset
First, enter the `vqaeval` directory, install the required dependencies, and create a `downloads` subdirectory to store the downloaded datasets:
```bash
cd vqaeval
pip install -r requirements.txt
mkdir downloads
```
<br />

Then, download the datasets from the following links and place them in the specified directories:
###### TextVQA
```bash
cd downloads
mkdir TextVQA && cd TextVQA
wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
unzip train_val_images.zip && rm train_val_images.zip
mv train_val_images/train_images . && rm -rf train_val_images
wget https://dl.fbaipublicfiles.com/textvqa/data/TextVQA_0.5.1_val.json
cd ../..
```

###### DocVQA / DocVQATest
```bash
cd downloads
mkdir DocVQA && cd DocVQA && mkdir spdocvqa_images
# Download the Images and Annotations for Task 1 - Single Page Document Visual Question Answering from https://rrc.cvc.uab.es/?ch=17&com=downloads
# Move the downloaded spdocvqa_images.tar.gz and spdocvqa_qas.zip into the DocVQA directory
tar -zxvf spdocvqa_images.tar.gz -C spdocvqa_images && rm spdocvqa_images.tar.gz
unzip spdocvqa_qas.zip && rm spdocvqa_qas.zip
cp spdocvqa_qas/val_v1.0_withQT.json . && cp spdocvqa_qas/test_v1.0.json . && rm -rf spdocvqa_qas
cd ../..
```
<br />

The `downloads` directory should be organized according to the following structure:
```bash
downloads
├── TextVQA
│ ├── train_images
│ │ ├── ...
│ ├── TextVQA_0.5.1_val.json
├── DocVQA
│ ├── spdocvqa_images
│ │ ├── ...
│ ├── val_v1.0_withQT.json
│ ├── test_v1.0.json
```
<br />

Once the datasets are prepared, modify the parameters in `shell/run_inference.sh` and run inference:

```bash
chmod +x ./shell/run_inference.sh
./shell/run_inference.sh
```
<br />

All accepted parameters are listed in `eval_utils/getargs.py`. The main parameters have the following meanings:
```bash
# paths to all images and questions for the TextVQA evaluation
--textVQA_image_dir
--textVQA_ann_path
# paths to all images and questions for the DocVQA evaluation
--docVQA_image_dir
--docVQA_ann_path
# paths to all images and questions for the DocVQATest evaluation
--docVQATest_image_dir
--docVQATest_ann_path

# whether to evaluate each task; setting eval_all to True evaluates all tasks
--eval_textVQA
--eval_docVQA
--eval_docVQATest
--eval_all

# model name and model path (the model is loaded from the given path)
--model_name
--model_path
# load the model from a checkpoint
--ckpt
# how the model processes input data: "interleave" is the interleaved image-text form, "old" is the non-interleaved form
--generate_method
# batch size for inference; 1 is recommended
--batchsize

# path to save the outputs
--answer_path
```
<br />

Set the following parameters for each of the three tasks:
###### TextVQA
```bash
--eval_textVQA
--textVQA_image_dir ./downloads/TextVQA/train_images
--textVQA_ann_path ./downloads/TextVQA/TextVQA_0.5.1_val.json
```

###### DocVQA
```bash
--eval_docVQA
--docVQA_image_dir ./downloads/DocVQA/spdocvqa_images
--docVQA_ann_path ./downloads/DocVQA/val_v1.0_withQT.json
```

###### DocVQATest
```bash
--eval_docVQATest
--docVQATest_image_dir ./downloads/DocVQA/spdocvqa_images
--docVQATest_ann_path ./downloads/DocVQA/test_v1.0.json
```
<br />

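For the DocVQATest task, to upload the inference results to the [official website](https://rrc.cvc.uab.es/?ch=17) for evaluation, you also need to run `shell/run_transform.sh` to convert the format. Here, `input_file_path` is the path to the original output json, and `output_file_path` is a path of your choice for the converted json: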
```bash
chmod +x ./shell/run_transform.sh
./shell/run_transform.sh
```
33 changes: 33 additions & 0 deletions eval_mm/vlmevalkit/requirements.txt
@@ -0,0 +1,33 @@
einops
gradio==4.15.0
huggingface_hub
matplotlib
numpy>=1.23.4
omegaconf
openai==1.3.5
opencv-python>=4.4.0.46
openpyxl
pandas>=1.5.3
pillow
portalocker
protobuf
pycocoevalcap
python-dotenv
requests
rich
seaborn
sentencepiece
sty
tabulate
tiktoken
timeout-decorator
tqdm
typing_extensions==4.7.1
validators
visual_genome
xlsxwriter
Pillow==10.1.0
sentencepiece==0.1.99
transformers==4.40.0
torch==1.13.1
torchvision