This is the official repository for the paper MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline. We release our code and data.
The MARIO REACT corpus is coming soon. The 🤗🤖Gaokao-2023-ME dataset is released here.
Base Model: Llemma | SFT Model | Outcome Value Model |
---|---|---|
7B | 🤗🤖MARIO-7B | 🤗🤖MARIO-OVM-7B |
34B | 🤗🤖MARIO-34B | - |
We report the results of our MARIO-7B and MARIO-34B models below:
Model | Decoding | GSM | MATH | OCWCourse | Gaokao-2023-ME |
---|---|---|---|---|---|
MARIO-OVM-7B + OVM@20 | Hybrid | 83.6 | 60.6 | 25.4 | 42.9 |
MARIO-7B + OVM@20 | Hybrid | 82.9 | 59.1 | 28.3 | 45.2 |
MARIO-OVM-7B | Hybrid | 74.5 | 47.7 | 19.1 | 32.5 |
MARIO-7B | Hybrid | 70.1 | 46.3 | 19.9 | 35.6 |
ToRA-Code-7B | Hybrid | 72.6 | 44.6 | 4.8 | 23.9 |
MAmmoTH-Coder-7B | Hybrid | 59.4 | 33.4 | 11.0 | 15.3 |
MathCoder-7B | Hybrid | 67.8 | 30.2 | - | - |
MetaMath-7B-Mistral | CoT | 77.7 | 28.2 | - | - |
OpenChat-3.5-7B | CoT | 77.3 | 28.6 | - | - |
ChatGLM-3-6B | CoT | 72.3 | 25.7 | - | - |
Model | Decoding | GSM | MATH | OCWCourse | Gaokao-2023-ME |
---|---|---|---|---|---|
MARIO-34B | Hybrid | 78.7 | 53.1 | 25.4 | 41.3 |
ToRA-Code-34B | Hybrid | 80.7 | 50.8 | 5.5 | 31.7 |
MAmmoTH-Coder-34B | Hybrid | 72.7 | 43.6 | 14.0 | 25.2 |
MathCoder-34B | Hybrid | 81.7 | 45.2 | - | - |
DeepSeek-Coder-33B | PoT | 60.7 | 29.1 | - | - |
QWen-72B | CoT | 78.9 | 35.2 | - | - |
Clone this repository and install the required packages:
```shell
git clone https://github.com/MARIO-Math-Reasoning/MARIO.git
cd MARIO
pip install -r requirements.txt
pip install -e ./math_evaluation
```
```shell
python gpt_react.py --verbose -g "gpt-4-1106-preview" -q "Given complex number $(a+i)(1-ai)=2,\;a \in \mathbb{R}$, find $a$."
```
Our training is mostly performed on the LLaMA-Factory codebase. Please refer to that repository for more details.
Single-question inference with screen output:

```shell
python react.py -c /path/to/checkpoint_dir -q "Compute tan(45)." --verbose
```
Batch inference:

```shell
python batch_react.py -c /path/to/checkpoint_dir -q /path/to/question_file
```
The question file should be in JSONL format, where each line is a JSON object containing at least a `"question"` key.
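For reference, a minimal question file could be produced like this (the example questions and the file name are illustrative):

```python
import json

# Each line of the question file is a standalone JSON object
# that must include at least a "question" key.
questions = [
    {"question": "Compute tan(45)."},
    {"question": "What is the sum of the first 100 positive integers?"},
]

with open("questions.jsonl", "w") as f:
    for q in questions:
        f.write(json.dumps(q) + "\n")
```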
Evaluation:

```shell
python eval.py -q /path/to/question_file
```
The question file should be in JSONL format, where each line is a JSON object containing at least `"pred"` and `"answer"` keys for the prediction and ground truth, respectively.
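As a sketch, a prediction file in this format can be written and scored with exact-match accuracy as shown below (the field values are illustrative, and `eval.py` may apply more sophisticated answer matching for math expressions):

```python
import json

# Each line pairs a model prediction with the ground-truth answer.
records = [
    {"question": "Compute tan(45).", "pred": "1", "answer": "1"},
    {"question": "What is 2+2?", "pred": "5", "answer": "4"},
]

with open("predictions.jsonl", "w") as f:
    for r in records:
        f.write(json.dumps(r) + "\n")

# Simple exact-match accuracy over the file.
with open("predictions.jsonl") as f:
    rows = [json.loads(line) for line in f]
correct = sum(r["pred"] == r["answer"] for r in rows)
print(correct / len(rows))  # 0.5
```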
- hiyouga's LLaMA-Factory
Please cite our paper if you use our data, models, or code. Please also kindly cite the original dataset papers.
```
@misc{liao2024mario,
      title={MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline},
      author={Minpeng Liao and Wei Luo and Chengxi Li and Jing Wu and Kai Fan},
      year={2024},
      eprint={2401.08190},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```