## Introduction
This is the evaluation script used to reproduce the math benchmark scores of the AceMath-1.5B/7B/72B-Instruct models from their outputs. The benchmark data can be downloaded from [Qwen2.5-Math](https://github.com/QwenLM/Qwen2.5-Math/tree/main/evaluation/data).
## Calculate Scores
```console
python calculate_scores.py
```
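
The internals of `calculate_scores.py` are not described here. As a rough illustration of what scoring model outputs against a benchmark can look like, the following is a minimal sketch, not the script's actual logic. It assumes a hypothetical JSONL layout in which each record has a `prediction` field and an `answer` field, and it uses plain exact match after light normalization; the field names and the helpers `normalize`, `accuracy`, and `load_jsonl` are all assumptions made for this example.

```python
import json


def normalize(answer: str) -> str:
    """Light normalization: trim whitespace, a trailing period, and surrounding '$'."""
    return answer.strip().rstrip(".").strip("$").strip()


def accuracy(records) -> float:
    """Fraction of records whose normalized prediction equals the normalized reference."""
    if not records:
        return 0.0
    correct = sum(
        normalize(r["prediction"]) == normalize(r["answer"]) for r in records
    )
    return correct / len(records)


def load_jsonl(path):
    """Read one JSON object per non-empty line (hypothetical output format)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical in-memory records standing in for a model's output file.
sample = [
    {"prediction": "42", "answer": "42"},
    {"prediction": " $3/4$ ", "answer": "3/4"},
    {"prediction": "7", "answer": "8"},
]
print(f"accuracy = {accuracy(sample):.4f}")
```

Exact string match is the simplest possible criterion. Math benchmarks often need equivalence checks that treat forms such as `0.5` and `1/2` as the same answer, which this sketch does not attempt.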