- 🎉 [2023.11.18] We release our paper on arXiv.
The leaderboard can be found on Papers with Code or on our project page.
Images and Questions can be downloaded here.
To evaluate on our CORE-MM Benchmark, please follow the steps below:
Step 0: Download Images and Questions
Step 1: Generate Responses
Generate responses for your model on the CORE-MM dataset. The responses should be collected in a JSON file with the following format:
{
"1": "the answer of question 1",
"2": "the answer of question 2",
...
"idx": "the answer of question idx"
}
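A minimal sketch of how such a file could be assembled is shown below. Note that run_inference, the question file name core_mm_questions.json, and its key layout are hypothetical placeholders standing in for your own pipeline and the downloaded question file, not part of an official toolkit.

import json

# Hypothetical stand-in for your model's inference pipeline; replace with your real generation code.
def run_inference(image_path: str, question: str) -> str:
    return "your model's answer"

# Assumed layout of the downloaded question file (idx -> {"image": ..., "question": ...});
# adjust the file name and keys to match the actual release.
with open("core_mm_questions.json", "r", encoding="utf-8") as f:
    questions = json.load(f)

responses = {idx: run_inference(item["image"], item["question"])
             for idx, item in questions.items()}

# Save using the naming convention described in the next step: model_name_model_size.json.
model_name, model_size = "YourModel-Chat", "7B"
with open(f"{model_name}_{model_size}.json", "w", encoding="utf-8") as f:
    json.dump(responses, f, ensure_ascii=False, indent=2)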
Step 2: Submit for Evaluation
After generating the responses, please name the JSON file model_name_model_size.json, e.g. CogVLM-Chat_17B.json, and send it to us via email for evaluation. We will evaluate your model and send the results back to you.
If you find CORE-MM useful in your research, please cite:
@misc{han2023coremm,
title={CORE-MM: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models},
author={Xiaotian Han and Quanzeng You and Yongfei Liu and Wentao Chen and Huangjie Zheng and Khalil Mrini and Xudong Lin and Yiqi Wang and Bohan Zhai and Jianbo Yuan and Heng Wang and Hongxia Yang},
year={2023},
eprint={2311.11567},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
This project is licensed under the CC BY-NC 4.0 license.
The copyright of the images belongs to the original authors.
See LICENSE for more information.
Please feel free to contact us via email at [email protected] if you have any questions.