Upload README.md
README.md
CHANGED
@@ -15,28 +15,37 @@ tags:
|
|
15 |
We introduce AceMath, a family of frontier models designed for mathematical reasoning. The models in the AceMath family, including AceMath-1.5B/7B/72B-Instruct and AceMath-7B/72B-RM, are <b>Improved using Qwen</b>.
|
16 |
The AceMath-1.5B/7B/72B-Instruct models excel at solving English mathematical problems using Chain-of-Thought (CoT) reasoning, while the AceMath-7B/72B-RM models, as outcome reward models, specialize in evaluating and scoring mathematical solutions.
|
17 |
|
18 |
-
The AceMath-
|
19 |
|
20 |
For more information about AceMath, check our [website](https://research.nvidia.com/labs/adlr/acemath/) and [paper](https://arxiv.org/abs/2412.15084).
|
21 |
|
22 |
## All Resources
|
23 |
-
[AceMath-1.5B-Instruct](https://huggingface.co/nvidia/AceMath-1.5B-Instruct)   [AceMath-7B-Instruct](https://huggingface.co/nvidia/AceMath-7B-Instruct)   [AceMath-72B-Instruct](https://huggingface.co/nvidia/AceMath-72B-Instruct)
|
24 |
|
25 |
-
|
|
|
26 |
|
27 |
-
|
|
|
28 |
|
29 |
-
|
|
|
30 |
|
31 |
-
|
|
|
32 |
|
33 |
-
<p align="center">
|
34 |
-
<img src="https://research.nvidia.com/labs/adlr/images/acemath/acemath.png" alt="AceMath Benchmark Results" width="800">
|
35 |
-
</p>
|
36 |
|
37 |
|
38 |
-
|
39 |
|
40 |
|
41 |
## How to use
|
42 |
```python
|
@@ -82,7 +91,7 @@ input_ids = tokenizer.encode(
|
|
82 |
).to(model.device)
|
83 |
|
84 |
outputs = model(input_ids=input_ids)
|
85 |
-
print(outputs[0][0])
|
86 |
```
|
87 |
|
88 |
|
|
|
15 |
We introduce AceMath, a family of frontier models designed for mathematical reasoning. The models in the AceMath family, including AceMath-1.5B/7B/72B-Instruct and AceMath-7B/72B-RM, are <b>Improved using Qwen</b>.
|
16 |
The AceMath-1.5B/7B/72B-Instruct models excel at solving English mathematical problems using Chain-of-Thought (CoT) reasoning, while the AceMath-7B/72B-RM models, as outcome reward models, specialize in evaluating and scoring mathematical solutions.
|
17 |
|
18 |
+
The AceMath-7B/72B-RM models are developed from the corresponding AceMath-7B/72B-Instruct models and trained on AceMath-RM-Training-Data using Bradley-Terry loss. The architecture employs standard sequence classification with a linear layer on top of the language model, using the final token to output a scalar score.
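
As a rough illustration of the Bradley-Terry objective mentioned above, the sketch below computes the pairwise loss `-log(sigmoid(r_chosen - r_rejected))` from scalar reward scores. It uses only the Python standard library; the function name `bradley_terry_loss` and the list-of-floats inputs are assumptions made for illustration, and the README does not specify how pairs are built or batched in the actual AceMath training setup.

```python
import math


def bradley_terry_loss(chosen_scores, rejected_scores):
    """Mean pairwise Bradley-Terry loss over (chosen, rejected) score pairs.

    Hypothetical helper for illustration; inputs are plain lists of floats,
    one scalar reward per response.
    """
    if len(chosen_scores) != len(rejected_scores) or not chosen_scores:
        raise ValueError("need equal-length, non-empty score lists")
    total = 0.0
    for r_chosen, r_rejected in zip(chosen_scores, rejected_scores):
        margin = r_chosen - r_rejected
        # -log(sigmoid(m)) equals log(1 + exp(-m)); this form avoids overflow
        # for large negative margins.
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(chosen_scores)


# The loss shrinks as the chosen response outscores the rejected one.
print(bradley_terry_loss([2.0, 1.5], [0.5, 1.0]))
```

With equal scores the loss is `log(2)`, and it approaches zero as the margin between the chosen and rejected scores grows.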
|
19 |
|
20 |
For more information about AceMath, check our [website](https://research.nvidia.com/labs/adlr/acemath/) and [paper](https://arxiv.org/abs/2412.15084).
|
21 |
|
22 |
## All Resources
|
|
|
23 |
|
24 |
+
### AceMath Instruction Models
|
25 |
+
- [AceMath-1.5B-Instruct](https://huggingface.co/nvidia/AceMath-1.5B-Instruct), [AceMath-7B-Instruct](https://huggingface.co/nvidia/AceMath-7B-Instruct), [AceMath-72B-Instruct](https://huggingface.co/nvidia/AceMath-72B-Instruct)
|
26 |
|
27 |
+
### AceMath Reward Models
|
28 |
+
- [AceMath-7B-RM](https://huggingface.co/nvidia/AceMath-7B-RM), [AceMath-72B-RM](https://huggingface.co/nvidia/AceMath-72B-RM)
|
29 |
|
30 |
+
### Evaluation & Training Data
|
31 |
+
- [AceMath-RewardBench](https://huggingface.co/datasets/nvidia/AceMath-RewardBench), [AceMath-Instruct Training Data](https://huggingface.co/datasets/nvidia/AceMath-Instruct-Training-Data), [AceMath-RM Training Data](https://huggingface.co/datasets/nvidia/AceMath-RM-Training-Data)
|
32 |
|
33 |
+
### Base Models
|
34 |
+
- [AceInstruct-1.5B](https://huggingface.co/nvidia/AceInstruct-1.5B), [AceInstruct-7B](https://huggingface.co/nvidia/AceInstruct-7B), [AceInstruct-72B](https://huggingface.co/nvidia/AceInstruct-72B)
|
35 |
|
36 |
|
37 |
|
38 |
+
## Reward Model Benchmark Results
|
39 |
|
40 |
+
| Model | GSM8K | MATH500 | Minerva Math | GaoKao 2023 En | Olympiad Bench | College Math | MMLU STEM | Avg. |
|---------------------------------|-------|---------|--------------|----------------|----------------|--------------|-----------|-------|
| majority@8 | 96.22 | 83.11 | 41.20 | 68.21 | 42.69 | 45.01 | 78.21 | 64.95 |
| Skywork-o1-Open-PRM-Qwen-2.5-7B | 96.92 | 86.64 | 41.00 | 72.34 | 46.50 | 46.30 | 74.01 | 66.24 |
| Qwen2.5-Math-RM-72B | 96.61 | 86.63 | 43.60 | 73.62 | 47.21 | 47.29 | 84.24 | 68.46 |
| AceMath-7B-RM (Ours) | 96.66 | 85.47 | 41.96 | 73.82 | 46.81 | 46.37 | 80.78 | 67.41 |
| AceMath-72B-RM (Ours) | 97.23 | 86.72 | 45.06 | 74.69 | 49.23 | 46.79 | 87.01 | 69.53 |
|
47 |
+
|
48 |
+
*Reward model evaluation on AceMath-RewardBench: average rm@8 results of reward models on math benchmarks. For each problem, 8 responses are randomly sampled from 64 candidates, and results are averaged over 100 random seeds. Response candidates are generated from a pool of 8 LLMs.*
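
The rm@8 protocol described in the caption can be sketched roughly as follows. This is a minimal illustration, not the official evaluation script: the function name `rm_at_k`, the `(reward_score, is_correct)` tuple layout, and sampling without replacement are all assumptions.

```python
import random


def rm_at_k(problems, k=8, num_seeds=100):
    """Average best-of-k accuracy over several random seeds.

    `problems` is a hypothetical structure: one list per problem, holding
    (reward_score, is_correct) tuples for that problem's candidate responses.
    """
    accuracies = []
    for seed in range(num_seeds):
        rng = random.Random(seed)
        hits = 0
        for candidates in problems:
            # Draw k candidates without replacement (an assumption).
            sampled = rng.sample(candidates, k)
            # The response with the highest reward score is selected.
            best = max(sampled, key=lambda c: c[0])
            hits += int(best[1])
        accuracies.append(hits / len(problems))
    return sum(accuracies) / len(accuracies)


# Toy usage: 2 problems with 64 scored candidates each (synthetic data).
toy_rng = random.Random(0)
toy = [[(toy_rng.random(), toy_rng.random() < 0.5) for _ in range(64)] for _ in range(2)]
print(rm_at_k(toy, k=8, num_seeds=100))
```

A majority@8 baseline would use the same sampling loop but pick the most frequent final answer among the 8 sampled responses instead of the highest-scored one.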
|
49 |
|
50 |
## How to use
|
51 |
```python
|
|
|
91 |
).to(model.device)
|
92 |
|
93 |
outputs = model(input_ids=input_ids)
|
94 |
+
print(outputs[0][0])
|
95 |
```
|
96 |
|
97 |
|