MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning

Ziliang Gan · Yu Lu · Dong Zhang · Haohan Li · Che Liu · Jian Liu · Ji Liu · Haipang Wu · Chaoyou Fu · Zenglin Xu · Rongjunchen Zhang · Yong Dai

📖 Paper | 🏠 Homepage | 🤗 Huggingface

In recent years, multimodal benchmarks for general domains have guided the rapid development of multimodal models on general tasks. However, the financial domain has its own peculiarities: it features unique graphical images (e.g., candlestick charts, technical indicator charts) and a wealth of specialized financial knowledge (e.g., futures, turnover rate).

Benchmarks from general fields often fail to measure the performance of multimodal models in the financial domain and thus cannot effectively guide the development of large financial models. To promote the development of large financial multimodal models, we propose MME-Finance, a bilingual, open-ended, and practical-usage-oriented Visual Question Answering (VQA) benchmark.

📢 News

  • 🚀 [11/05/2024] We released the MME-Finance benchmark, a bilingual multimodal benchmark for the financial domain.

💡 Highlights

  • 🔥 Bilingual multimodal financial benchmark: MME-Finance is the first bilingual multimodal financial benchmark, comprising 1,171 English and 1,103 Chinese open-ended questions that cover diverse financial image types and a wide range of multimodal capabilities.
  • 🔥 Evaluation strategy: MME-Finance proposes an elaborate evaluation strategy that takes the image into consideration and shows high consistency with human judgment. It can serve as a reference for other works evaluating MLLMs.
  • 🔥 Valuable insights: We conduct an extensive evaluation of 19 MLLMs on MME-Finance, revealing critical insights into the strengths and shortcomings of current MLLMs in financial applications.

🛠️ Usage

We have integrated MMfin into the VLMEvalKit framework. For environment configuration and API usage, please refer to VLMEvalKit. For the data, first download the MMfin.tsv and MMfin_CN.tsv files, as well as the corresponding financial images. The folder structure is as follows:

├─ datasets
│  ├─ images
│  │  ├─ MMfin
│  │  │   ...
│  │  └─ MMfin_CN
│  │      ...
│  ├─ MMfin.tsv
│  └─ MMfin_CN.tsv

The following shows the inference and evaluation process (using Qwen2-VL-2B-Instruct as an example):

export LMUData="/path/to/datasets"
python run.py --data MMfin --model Qwen2-VL-2B-Instruct --verbose
python run.py --data MMfin_CN --model Qwen2-VL-2B-Instruct --verbose
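
Before launching the runs, it can help to sanity-check that the TSV files load from the LMUData path. Below is a minimal Python sketch; it assumes pandas is installed and that the TSVs use the usual tab-separated VLMEvalKit layout. The exact column names (e.g., a category field) are an assumption here and may differ in the released files.

import os
import pandas as pd

# LMUData should point to the datasets folder described above.
data_root = os.environ.get("LMUData", "./datasets")

# Load the English split; MMfin_CN.tsv can be checked the same way.
df = pd.read_csv(os.path.join(data_root, "MMfin.tsv"), sep="\t")
print(f"{len(df)} samples loaded; columns: {df.columns.tolist()}")

# If a 'category' column exists (an assumption), show per-category sample counts.
if "category" in df.columns:
    print(df["category"].value_counts())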

✨ Todo

Currently, we have released 110 samples each in English and Chinese.

Here is the performance of Qwen2-VL-72B on MMfin.

"Category","tot","acc"
"Accurate Numerical Calculation","10","100.0"
"Entity Recognition","10","68.0"
"Explain Reason","10","82.0"
"Financial Knowledge","10","80.0"
"Image Caption","10","78.0"
"Investment Advice","10","64.0"
"Not Applicable","10","90.0"
"Numerical Calculation","10","48.0"
"OCR","10","66.0"
"Risk Warning","10","88.0"
"Spatial Awareness","10","52.0"
"Overall","110","74.18181818181819"

Here is the performance of Qwen2-VL-72B on MMfin_CN.

"Category","tot","acc"
"Accurate Numerical Calculation","10","80.0"
"Entity Recognition","10","66.0"
"Explain Reason","10","78.0"
"Financial Knowledge","10","86.0"
"Image Caption","10","100.0"
"Investment Advice","10","76.0"
"Not Applicable","10","46.0"
"Numerical Calculation","10","60.0"
"OCR","10","82.0"
"Risk Warning","10","84.0"
"Spatial Awareness","10","58.0"
"Overall","110","74.18181818181819"

We will release all the data within approximately a month.

✒️Citation

@article{gan2024mmefinance,
  title={MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning},
  author={Gan, Ziliang and Lu, Yu and Zhang, Dong and Li, Haohan and Liu, Che and Liu, Jian and Liu, Ji and Wu, Haipang and Fu, Chaoyou and Xu, Zenglin and Zhang, Rongjunchen and Dai, Yong},
  journal={arXiv preprint arXiv:2411.03314},
  year={2024}
}

📄 License

Usage and License Notices: The data and code are intended and licensed for research use only. The license is Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). Usage should also abide by the OpenAI terms of use: https://openai.com/policies/terms-of-use

💖 Acknowledgement
