forked from OpenBMB/MiniCPM-o
Merge pull request OpenBMB#156 from lihytotoro/main
Add evaluation and inference
Showing 49 changed files with 5,610 additions and 0 deletions.
@@ -0,0 +1,177 @@
# Evaluation

## opencompass
First, enter the `vlmevalkit` directory and install all dependencies:
```bash
cd vlmevalkit
pip install -r requirements.txt
```
<br />

Then, run `script/run_inference.sh`, which takes three positional arguments: `MODELNAME`, `DATALIST`, and `MODE`. `MODELNAME` is the name of the model to evaluate, `DATALIST` specifies the dataset(s) to run inference on, and `MODE` selects the evaluation mode:
```bash
chmod +x ./script/run_inference.sh
./script/run_inference.sh $MODELNAME $DATALIST $MODE
```
<br />

The three available choices for `MODELNAME` are listed in `vlmeval/config.py`:
```python
ungrouped = {
    'MiniCPM-V':partial(MiniCPM_V, model_path='openbmb/MiniCPM-V'),
    'MiniCPM-V-2':partial(MiniCPM_V, model_path='openbmb/MiniCPM-V-2'),
    'MiniCPM-Llama3-V-2_5':partial(MiniCPM_Llama3_V, model_path='openbmb/MiniCPM-Llama3-V-2_5'),
}
```
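To evaluate a locally saved checkpoint instead of a Hugging Face hub ID, one option (a sketch, not an officially documented workflow) is to add an entry whose `model_path` points at a local directory; the new key can then be passed as `MODELNAME`:
```python
# Hypothetical entry: reuses the existing MiniCPM_Llama3_V wrapper but loads
# weights from a local directory (placeholder path) instead of 'openbmb/...'.
ungrouped['MiniCPM-Llama3-V-2_5-local'] = partial(
    MiniCPM_Llama3_V, model_path='/path/to/local/checkpoint'
)
```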
<br />

All available choices for `DATALIST` are listed in `vlmeval/utils/dataset_config.py`. When evaluating a single dataset, pass the dataset name directly without quotation marks; when evaluating multiple datasets, separate the names with spaces and wrap the whole list in quotation marks:
```bash
DATALIST="POPE ScienceQA_TEST ChartQA_TEST"
```
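For example, the quoted list can then be passed as the second argument so that the script receives it as a single parameter:
```bash
# Run three datasets in one call; the quotes keep the list as one argument.
DATALIST="POPE ScienceQA_TEST ChartQA_TEST"
./script/run_inference.sh MiniCPM-Llama3-V-2_5 "$DATALIST" all
```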
<br />

To score each benchmark directly, set `MODE=all`. If only inference results are required, set `MODE=infer`. To reproduce the results in the table on the homepage (the columns from MME to RealWorldQA), run the script with the following settings:
```bash
# run on all 7 datasets
./script/run_inference.sh MiniCPM-Llama3-V-2_5 "MME MMBench_TEST_EN MMBench_TEST_CN MMMU_DEV_VAL MathVista_MINI LLaVABench RealWorldQA" all

# The following commands each run a single dataset
# MME
./script/run_inference.sh MiniCPM-Llama3-V-2_5 MME all
# MMBench_TEST_EN
./script/run_inference.sh MiniCPM-Llama3-V-2_5 MMBench_TEST_EN all
# MMBench_TEST_CN
./script/run_inference.sh MiniCPM-Llama3-V-2_5 MMBench_TEST_CN all
# MMMU_DEV_VAL
./script/run_inference.sh MiniCPM-Llama3-V-2_5 MMMU_DEV_VAL all
# MathVista_MINI
./script/run_inference.sh MiniCPM-Llama3-V-2_5 MathVista_MINI all
# LLaVABench
./script/run_inference.sh MiniCPM-Llama3-V-2_5 LLaVABench all
# RealWorldQA
./script/run_inference.sh MiniCPM-Llama3-V-2_5 RealWorldQA all
```
<br />

## vqadataset
First, enter the `vqaeval` directory and install all dependencies. Then, create a `downloads` subdirectory to store the downloaded datasets for all tasks:
```bash
cd vqaeval
pip install -r requirements.txt
mkdir downloads
```
<br />

Download the datasets from the following links and place them in the specified directories:
###### TextVQA
```bash
cd downloads
mkdir TextVQA && cd TextVQA
wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
unzip train_val_images.zip && rm train_val_images.zip
mv train_val_images/train_images . && rm -rf train_val_images
wget https://dl.fbaipublicfiles.com/textvqa/data/TextVQA_0.5.1_val.json
cd ../..
```

###### DocVQA / DocVQATest

```bash
cd downloads
mkdir DocVQA && cd DocVQA && mkdir spdocvqa_images
# Download the Images and Annotations for "Task 1 - Single Page Document Visual Question Answering" from https://rrc.cvc.uab.es/?ch=17&com=downloads
# Move spdocvqa_images.tar.gz and spdocvqa_qas.zip into the DocVQA directory
tar -zxvf spdocvqa_images.tar.gz -C spdocvqa_images && rm spdocvqa_images.tar.gz
unzip spdocvqa_qas.zip && rm spdocvqa_qas.zip
cp spdocvqa_qas/val_v1.0_withQT.json . && cp spdocvqa_qas/test_v1.0.json . && rm -rf spdocvqa_qas
cd ../..
```
<br />

The `downloads` directory should be organized according to the following structure:
```bash
downloads
├── TextVQA
│   ├── train_images
│   │   ├── ...
│   ├── TextVQA_0.5.1_val.json
├── DocVQA
│   ├── spdocvqa_images
│   │   ├── ...
│   ├── val_v1.0_withQT.json
│   ├── test_v1.0.json
```
<br />

Modify the parameters in `shell/run_inference.sh` and run inference:

```bash
chmod +x ./shell/run_inference.sh
./shell/run_inference.sh
```
<br />

All optional parameters are listed in `eval_utils/getargs.py`. The meanings of the main parameters are as follows:
```bash
# paths to the images and their corresponding questions
# TextVQA
--textVQA_image_dir
--textVQA_ann_path
# DocVQA
--docVQA_image_dir
--docVQA_ann_path
# DocVQATest
--docVQATest_image_dir
--docVQATest_ann_path

# whether to evaluate a given task; --eval_all evaluates all tasks
--eval_textVQA
--eval_docVQA
--eval_docVQATest
--eval_all

# model name and model path
--model_name
--model_path
# load the model from a checkpoint
--ckpt
# how the model processes input data: "interleave" means interleaved image-text form, "old" means non-interleaved
--generate_method
# batch size for inference (1 is recommended)
--batchsize

# path to save the outputs
--answer_path
```
<br />

When evaluating the different tasks, set the parameters as follows:
###### TextVQA
```bash
--eval_textVQA
--textVQA_image_dir ./downloads/TextVQA/train_images
--textVQA_ann_path ./downloads/TextVQA/TextVQA_0.5.1_val.json
```

###### DocVQA
```bash
--eval_docVQA
--docVQA_image_dir ./downloads/DocVQA/spdocvqa_images
--docVQA_ann_path ./downloads/DocVQA/val_v1.0_withQT.json
```

###### DocVQATest
```bash
--eval_docVQATest
--docVQATest_image_dir ./downloads/DocVQA/spdocvqa_images
--docVQATest_ann_path ./downloads/DocVQA/test_v1.0.json
```
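Putting these together, the parameter block inside `shell/run_inference.sh` might look roughly like the sketch below. This is an assumption for illustration: the Python entry-point name, model path, and answer path are placeholders, and only the flags themselves come from `eval_utils/getargs.py`; check the actual script and adjust.
```bash
# Hypothetical sketch only -- the entry-point script name and all paths are
# placeholders; the flags are the ones documented in eval_utils/getargs.py.
python eval.py \
    --model_name MiniCPM-Llama3-V-2_5 \
    --model_path openbmb/MiniCPM-Llama3-V-2_5 \
    --generate_method interleave \
    --batchsize 1 \
    --eval_textVQA \
    --textVQA_image_dir ./downloads/TextVQA/train_images \
    --textVQA_ann_path ./downloads/TextVQA/TextVQA_0.5.1_val.json \
    --answer_path ./answers
```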

<br />

For the DocVQATest task, in order to upload the inference results to the [official website](https://rrc.cvc.uab.es/?ch=17) for evaluation, run `shell/run_transform.sh` after inference to convert the output format. `input_file_path` is the path to the original output json, and `output_file_path` is the path to the transformed json:
```bash
chmod +x ./shell/run_transform.sh
./shell/run_transform.sh
```
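The two paths are set inside `shell/run_transform.sh`; a hypothetical configuration (the file names below are placeholders, not the repository's defaults) might look like:
```bash
# Placeholder paths for illustration only; point input_file_path at your
# DocVQATest inference output and output_file_path at the converted file.
input_file_path=./answers/docVQATest/raw_predictions.json
output_file_path=./answers/docVQATest/submission.json
```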
@@ -0,0 +1,175 @@
# Evaluation

## opencompass
First, enter the `vlmevalkit` directory and install the required dependencies:
```bash
cd vlmevalkit
pip install -r requirements.txt
```
<br />

Then, run `script/run_inference.sh`, which takes three positional arguments in order: `MODELNAME`, `DATALIST`, and `MODE`. `MODELNAME` is the model name, `DATALIST` is the target dataset(s), and `MODE` is the evaluation mode.
```bash
chmod +x ./script/run_inference.sh
./script/run_inference.sh $MODELNAME $DATALIST $MODE
```
<br />

There are three choices for `MODELNAME`, listed in `vlmeval/config.py`:
```python
ungrouped = {
    'MiniCPM-V':partial(MiniCPM_V, model_path='openbmb/MiniCPM-V'),
    'MiniCPM-V-2':partial(MiniCPM_V, model_path='openbmb/MiniCPM-V-2'),
    'MiniCPM-Llama3-V-2_5':partial(MiniCPM_Llama3_V, model_path='openbmb/MiniCPM-Llama3-V-2_5'),
}
```
<br />

All available choices for `DATALIST` are listed in `vlmeval/utils/dataset_config.py`. When evaluating a single dataset, pass the dataset name directly without quotation marks; when evaluating multiple datasets, separate the names with spaces and wrap the whole list in quotation marks:
```bash
DATALIST="POPE ScienceQA_TEST ChartQA_TEST"
```
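For example (assuming the assignment above), the quoted list is passed as the second argument so that the script receives it as a single parameter:
```bash
# Run three datasets in one call; the quotes keep the list as one argument.
DATALIST="POPE ScienceQA_TEST ChartQA_TEST"
./script/run_inference.sh MiniCPM-Llama3-V-2_5 "$DATALIST" all
```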
<br />

To score each benchmark directly, set `MODE=all`. If only inference results are needed, set `MODE=infer`. To reproduce the results in the table on the homepage (the columns from MME to RealWorldQA), run with the following settings:
```bash
# run all 7 datasets at once
./script/run_inference.sh MiniCPM-Llama3-V-2_5 "MME MMBench_TEST_EN MMBench_TEST_CN MMMU_DEV_VAL MathVista_MINI LLaVABench RealWorldQA" all

# The following commands each run a single dataset
# MME
./script/run_inference.sh MiniCPM-Llama3-V-2_5 MME all
# MMBench_TEST_EN
./script/run_inference.sh MiniCPM-Llama3-V-2_5 MMBench_TEST_EN all
# MMBench_TEST_CN
./script/run_inference.sh MiniCPM-Llama3-V-2_5 MMBench_TEST_CN all
# MMMU_DEV_VAL
./script/run_inference.sh MiniCPM-Llama3-V-2_5 MMMU_DEV_VAL all
# MathVista_MINI
./script/run_inference.sh MiniCPM-Llama3-V-2_5 MathVista_MINI all
# LLaVABench
./script/run_inference.sh MiniCPM-Llama3-V-2_5 LLaVABench all
# RealWorldQA
./script/run_inference.sh MiniCPM-Llama3-V-2_5 RealWorldQA all
```
<br />

## vqadataset
First, enter the `vqaeval` directory, install the required dependencies, and create a `downloads` subdirectory to store the downloaded datasets:
```bash
cd vqaeval
pip install -r requirements.txt
mkdir downloads
```
<br />

Then, download the datasets from the following links and place them in the specified directories:
###### TextVQA
```bash
cd downloads
mkdir TextVQA && cd TextVQA
wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
unzip train_val_images.zip && rm train_val_images.zip
mv train_val_images/train_images . && rm -rf train_val_images
wget https://dl.fbaipublicfiles.com/textvqa/data/TextVQA_0.5.1_val.json
cd ../..
```

###### DocVQA / DocVQATest
```bash
cd downloads
mkdir DocVQA && cd DocVQA && mkdir spdocvqa_images
# Download the Images and Annotations for "Task 1 - Single Page Document Visual Question Answering" from https://rrc.cvc.uab.es/?ch=17&com=downloads
# Place the downloaded spdocvqa_images.tar.gz and spdocvqa_qas.zip in the DocVQA directory
tar -zxvf spdocvqa_images.tar.gz -C spdocvqa_images && rm spdocvqa_images.tar.gz
unzip spdocvqa_qas.zip && rm spdocvqa_qas.zip
cp spdocvqa_qas/val_v1.0_withQT.json . && cp spdocvqa_qas/test_v1.0.json . && rm -rf spdocvqa_qas
cd ../..
```
<br />

The `downloads` directory should be organized according to the following structure:
```bash
downloads
├── TextVQA
│   ├── train_images
│   │   ├── ...
│   ├── TextVQA_0.5.1_val.json
├── DocVQA
│   ├── spdocvqa_images
│   │   ├── ...
│   ├── val_v1.0_withQT.json
│   ├── test_v1.0.json
```
<br />

After preparing the datasets, modify the parameters in `shell/run_inference.sh` and run inference:

```bash
chmod +x ./shell/run_inference.sh
./shell/run_inference.sh
```
<br />

All parameters that can be passed are listed in `eval_utils/getargs.py`. The meanings of the main parameters are as follows:
```bash
# paths to the images and questions for TextVQA evaluation
--textVQA_image_dir
--textVQA_ann_path
# paths to the images and questions for DocVQA evaluation
--docVQA_image_dir
--docVQA_ann_path
# paths to the images and questions for DocVQATest evaluation
--docVQATest_image_dir
--docVQATest_ann_path

# whether to evaluate a given task; setting eval_all to True evaluates all tasks
--eval_textVQA
--eval_docVQA
--eval_docVQATest
--eval_all

# model name and model path (load the model from the given path)
--model_name
--model_path
# load the model from a checkpoint
--ckpt
# how the model processes input data: "interleave" means interleaved image-text form, "old" means non-interleaved
--generate_method
# batch size for inference; 1 is recommended
--batchsize

# path to save the outputs
--answer_path
```
<br />

The parameters that need to be set for the three tasks are as follows:
###### TextVQA
```bash
--eval_textVQA
--textVQA_image_dir ./downloads/TextVQA/train_images
--textVQA_ann_path ./downloads/TextVQA/TextVQA_0.5.1_val.json
```

###### DocVQA
```bash
--eval_docVQA
--docVQA_image_dir ./downloads/DocVQA/spdocvqa_images
--docVQA_ann_path ./downloads/DocVQA/val_v1.0_withQT.json
```

###### DocVQATest
```bash
--eval_docVQATest
--docVQATest_image_dir ./downloads/DocVQA/spdocvqa_images
--docVQATest_ann_path ./downloads/DocVQA/test_v1.0.json
```
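Putting these together, a hypothetical parameter block inside `shell/run_inference.sh` for the DocVQA task might look like the following sketch; the entry-point script name and the paths are placeholders, only the flags come from `eval_utils/getargs.py`:
```bash
# Hypothetical sketch only -- the entry-point name and all paths are placeholders.
python eval.py \
    --model_name MiniCPM-Llama3-V-2_5 \
    --model_path openbmb/MiniCPM-Llama3-V-2_5 \
    --generate_method interleave \
    --batchsize 1 \
    --eval_docVQA \
    --docVQA_image_dir ./downloads/DocVQA/spdocvqa_images \
    --docVQA_ann_path ./downloads/DocVQA/val_v1.0_withQT.json \
    --answer_path ./answers
```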
<br />

For the DocVQATest task, in order to upload the inference results to the [official website](https://rrc.cvc.uab.es/?ch=17) for evaluation, you also need to run `shell/run_transform.sh` after inference to convert the output format. `input_file_path` is the path to the original output json, and `output_file_path` is a path of your choosing for the converted json:
```bash
chmod +x ./shell/run_transform.sh
./shell/run_transform.sh
```
@@ -0,0 +1,33 @@
einops
gradio==4.15.0
huggingface_hub
matplotlib
numpy>=1.23.4
omegaconf
openai==1.3.5
opencv-python>=4.4.0.46
openpyxl
pandas>=1.5.3
portalocker
protobuf
pycocoevalcap
python-dotenv
requests
rich
seaborn
sty
tabulate
tiktoken
timeout-decorator
tqdm
typing_extensions==4.7.1
validators
visual_genome
xlsxwriter
Pillow==10.1.0
sentencepiece==0.1.99
transformers==4.40.0
torch==1.13.1
torchvision