Update README.md
The AceMath-1.5B/7B/72B-Instruct models are developed from the Qwen2.5-Math-1.5B/7B/72B-Base models, leveraging a multi-stage supervised fine-tuning (SFT) process: first with general-purpose SFT data, followed by math-specific SFT data. We are releasing all training data to support further research in this field.
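
The multi-stage recipe above can be sketched as a loop over stages, where each stage resumes from the previous checkpoint and trains on a different data mixture. This is a schematic only: the `finetune` helper and stage names below are hypothetical placeholders, not the actual training code.

```python
# Schematic of multi-stage SFT: each stage continues from the previous
# checkpoint with a new data mixture. `finetune` is a hypothetical
# stand-in for a real SFT loop; here it just records training history.

def finetune(model_state, dataset_name):
    return model_state + [dataset_name]

stages = ["general_sft_data", "math_sft_data"]  # order matters: general first

model_state = ["qwen2.5-math-base"]  # start from the base checkpoint
for stage in stages:
    model_state = finetune(model_state, stage)

print(model_state)
# → ['qwen2.5-math-base', 'general_sft_data', 'math_sft_data']
```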
We recommend using the AceMath models only for solving math problems. For other tasks, we also release AceInstruct-1.5B/7B/72B, a series of general-purpose SFT models designed to handle code, math, and general-knowledge tasks. These models are built upon the Qwen2.5-1.5B/7B/72B-Base models.
For more information about AceMath, check our [website](https://research.nvidia.com/labs/adlr/acemath/) and [paper](https://arxiv.org/abs/2412.15084).
## All Resources
[AceMath-1.5B-Instruct](https://huggingface.co/nvidia/AceMath-1.5B-Instruct)   [AceMath-7B-Instruct](https://huggingface.co/nvidia/AceMath-7B-Instruct)   [AceMath-72B-Instruct](https://huggingface.co/nvidia/AceMath-72B-Instruct)
[AceMath-7B-RM](https://huggingface.co/nvidia/AceMath-7B-RM)   [AceMath-72B-RM](https://huggingface.co/nvidia/AceMath-72B-RM)
[AceMath-Instruct Training Data](https://huggingface.co/datasets/nvidia/AceMath-Instruct-Training-Data)   [AceMath-RM Training Data](https://huggingface.co/datasets/nvidia/AceMath-RM-Training-Data)
[AceMath-RewardBench](https://huggingface.co/datasets/nvidia/AceMath-RewardBench)   [AceMath-Instruct Evaluation Script](https://huggingface.co/datasets/nvidia/AceMath-Evaluation-Script)
[AceInstruct-1.5B](https://huggingface.co/nvidia/AceInstruct-1.5B)   [AceInstruct-7B](https://huggingface.co/nvidia/AceInstruct-7B)   [AceInstruct-72B](https://huggingface.co/nvidia/AceInstruct-72B)
## Benchmark Results
<p align="center">
<img src="https://research.nvidia.com/labs/adlr/images/acemath/acemath.png" alt="AceMath Benchmark Results" width="800">
</p>
We compare AceMath to leading proprietary and open-access math models in the table above. Our AceMath-7B-Instruct largely outperforms the previous best-in-class Qwen2.5-Math-7B-Instruct (average pass@1: 67.2 vs. 62.9) on a variety of math reasoning benchmarks, while coming close to the performance of the 10× larger Qwen2.5-Math-72B-Instruct (67.2 vs. 68.2). Notably, our AceMath-72B-Instruct outperforms the state-of-the-art Qwen2.5-Math-72B-Instruct (71.8 vs. 68.2), GPT-4o (67.4), and Claude 3.5 Sonnet (65.6) by a clear margin. We also report the rm@8 accuracy (best of 8) achieved by our reward model, AceMath-72B-RM, which sets a new record on these reasoning benchmarks. This comparison excludes OpenAI's o1 model, which relies on scaled inference-time computation.
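
The rm@8 metric can be read as best-of-8 selection: sample eight candidate solutions per problem, score each with the reward model, and keep the highest-scoring one; accuracy is then measured on the kept answer. A minimal sketch, with a toy `score` callable standing in for AceMath-72B-RM:

```python
# Best-of-n (rm@n) selection: the reward model picks one answer out of
# n sampled candidates.

def best_of_n(candidates, score):
    """Return the candidate with the highest reward-model score."""
    return max(candidates, key=score)

# Toy example: a dict lookup stands in for a real reward-model call.
candidates = ["answer: 40", "answer: 42", "answer: 41"]
toy_scores = {"answer: 40": 0.1, "answer: 42": 0.9, "answer: 41": 0.4}

picked = best_of_n(candidates, toy_scores.get)
print(picked)  # → answer: 42
```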
## How to use
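
The body of this section is not shown in this excerpt. As an illustration, a typical chat-template inference sketch for these models with Hugging Face `transformers` might look as follows; the model name is taken from the links above, while the prompt and generation parameters are illustrative.

```python
def build_messages(problem: str) -> list[dict]:
    # AceMath-Instruct models follow the standard chat format:
    # a single user turn containing the math problem.
    return [{"role": "user", "content": problem}]

def generate_solution(problem: str) -> str:
    # Heavy dependencies are imported here so the prompt helper can be
    # used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "nvidia/AceMath-7B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

    prompt = tokenizer.apply_chat_template(
        build_messages(problem), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=2048)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
    )

# generate_solution("Compute 1 + 2 + ... + 100.")  # downloads the checkpoint
```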
## License
All models in the AceMath family are for non-commercial use only, subject to the [Terms of Use](https://openai.com/policies/row-terms-of-use/) for data generated by OpenAI. We release the AceMath models under the [Creative Commons Attribution-NonCommercial 4.0 International](https://spdx.org/licenses/CC-BY-NC-4.0) license.