---
license: cc-by-nc-4.0
language:
- en
pipeline_tag: text-generation
tags:
- nvidia
- AceMath
- math
- pytorch
---


## Introduction
We introduce AceMath, a family of frontier models designed for mathematical reasoning. The models in the AceMath family, including AceMath-1.5B/7B/72B-Instruct and AceMath-7B/72B-RM, are <b>improved using Qwen</b>.
The AceMath-1.5B/7B/72B-Instruct models excel at solving English mathematical problems using Chain-of-Thought (CoT) reasoning, while the AceMath-7B/72B-RM models, as outcome reward models, specialize in evaluating and scoring mathematical solutions.

The AceMath-1.5B/7B/72B-Instruct models are developed from the Qwen2.5-Math-1.5B/7B/72B-Base models, leveraging a multi-stage supervised fine-tuning (SFT) process: first with general-purpose SFT data, followed by math-specific SFT data. We are releasing all training data to support further research in this field.
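
To experiment with the released data, you can pull the dataset repositories from the Hugging Face Hub. The snippet below is a minimal sketch using `huggingface_hub.snapshot_download`; the repository IDs come from the resource links in this card, but the file layout inside each repository is not assumed here, so check the dataset cards before parsing the files.

```python
# Minimal sketch: download the released AceMath training data locally.
# Repository IDs are taken from the links in this card; file names and formats
# inside the repos are not assumed here -- inspect them after download.
from huggingface_hub import snapshot_download

sft_dir = snapshot_download(
    repo_id="nvidia/AceMath-Instruct-Training-Data",
    repo_type="dataset",
)
rm_dir = snapshot_download(
    repo_id="nvidia/AceMath-RM-Training-Data",
    repo_type="dataset",
)
print(sft_dir, rm_dir)
```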

For more information about AceMath, check our [website](https://research.nvidia.com/labs/adlr/acemath/) and [paper](https://arxiv.org/abs/2412.15084).


## All Resources

[AceMath-1.5B-Instruct](https://huggingface.co/nvidia/AceMath-1.5B-Instruct) &ensp; [AceMath-7B-Instruct](https://huggingface.co/nvidia/AceMath-7B-Instruct) &ensp; [AceMath-72B-Instruct](https://huggingface.co/nvidia/AceMath-72B-Instruct) &ensp; [AceMath-7B-RM](https://huggingface.co/nvidia/AceMath-7B-RM) &ensp; [AceMath-72B-RM](https://huggingface.co/nvidia/AceMath-72B-RM)

[AceMath-Instruct Training Data](https://huggingface.co/datasets/nvidia/AceMath-Instruct-Training-Data) &ensp; [AceMath-RM Training Data](https://huggingface.co/datasets/nvidia/AceMath-RM-Training-Data) &ensp; [AceMath Evaluation Script](https://huggingface.co/nvidia/AceMath-Evaluation-Script)


## Benchmark Results

| Model | GSM8K | MATH | Minerva Math | GaoKao 2023En | Olympiad Bench | College Math | MMLU STEM | Average |
| -- |:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| GPT-4o (2024-0806) | 92.90 | 81.10 | 50.74 | 67.50 | 43.30 | 48.50 | 87.99 | 67.43 |
| Claude-3.5 Sonnet (2024-1022) | 96.40 | 75.90 | 48.16 | 64.94 | 37.93 | 48.47 | 85.06 | 65.27 |
| Llama3.1-70B-Instruct | 94.10 | 65.70 | 34.20 | 54.00 | 27.70 | 42.50 | 80.40 | 56.94 |
| Llama3.1-405B-Instruct | 96.80 | 73.80 | 54.04 | 62.08 | 34.81 | 49.25 | 83.10 | 64.84 |
| Qwen2.5-Math-1.5B-Instruct | 84.80 | 75.80 | 29.40 | 65.50 | 38.10 | 47.70 | 57.50 | 56.97 |
| Qwen2.5-Math-7B-Instruct | 95.20 | 83.60 | 37.10 | 66.80 | 41.60 | 46.80 | 71.90 | 63.29 |
| Qwen2.5-Math-72B-Instruct | 95.90 | 85.90 | 44.10 | 71.90 | 49.00 | 49.50 | 80.80 | 68.16 |
| AceMath-1.5B-Instruct (Ours) | 86.95 | 76.84 | 41.54 | 64.42 | 33.78 | 54.36 | 62.04 | 59.99 |
| AceMath-7B-Instruct (Ours) | 93.71 | 83.14 | 51.11 | 68.05 | 42.22 | 56.64 | 75.32 | 67.17 |
| AceMath-72B-Instruct (Ours) | 96.44 | 86.10 | 56.99 | 72.21 | 48.44 | 57.24 | 85.44 | 71.84 |

Greedy decoding (pass@1) results on a variety of math reasoning benchmarks. AceMath-7B-Instruct significantly outperforms the previous best-in-class Qwen2.5-Math-7B-Instruct (67.2 vs. 62.9 average) and comes close to the performance of the 10× larger Qwen2.5-Math-72B-Instruct (67.2 vs. 68.2). Notably, AceMath-72B-Instruct outperforms the state-of-the-art Qwen2.5-Math-72B-Instruct (71.8 vs. 68.2), GPT-4o (67.4), and Claude 3.5 Sonnet (65.6) by a clear margin.


## How to use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "nvidia/AceMath-72B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

prompt = "Jen enters a lottery by picking $4$ distinct numbers from $S=\\{1,2,3,\\cdots,9,10\\}.$ $4$ numbers are randomly chosen from $S.$ She wins a prize if at least two of her numbers were $2$ of the randomly chosen numbers, and wins the grand prize if all four of her numbers were the randomly chosen numbers. The probability of her winning the grand prize given that she won a prize is $\\tfrac{m}{n}$ where $m$ and $n$ are relatively prime positive integers. Find $m+n$."
messages = [{"role": "user", "content": prompt}]

# Format the conversation with the model's chat template and append the
# assistant generation prompt.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to("cuda")

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048
)
# Keep only the newly generated tokens (drop the echoed prompt).
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
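
The AceMath-7B/72B-RM reward models can then be used to score candidate solutions, for example for best-of-n selection. The snippet below is a minimal sketch, not the official RM interface: it assumes the reward model loads as a single-score sequence-classification model and that its tokenizer ships a chat template; consult the AceMath-7B-RM and AceMath-72B-RM model cards for the exact usage.

```python
# Hedged sketch (continuing from the example above): score candidate solutions
# with an AceMath reward model and keep the highest-scoring one.
# Assumptions: the RM loads via AutoModelForSequenceClassification and returns
# a single scalar score per conversation -- verify against the RM model cards.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

rm_name = "nvidia/AceMath-7B-RM"
rm_tokenizer = AutoTokenizer.from_pretrained(rm_name)
rm_model = AutoModelForSequenceClassification.from_pretrained(
    rm_name, torch_dtype="auto", device_map="auto"
).eval()

candidates = [response]  # in practice: several sampled solutions to `prompt`
scores = []
for solution in candidates:
    chat = [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": solution},
    ]
    rm_text = rm_tokenizer.apply_chat_template(chat, tokenize=False)
    rm_inputs = rm_tokenizer(rm_text, return_tensors="pt").to(rm_model.device)
    with torch.no_grad():
        scores.append(rm_model(**rm_inputs).logits[0][0].item())

best_solution = candidates[scores.index(max(scores))]
```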


## Correspondence to
Zihan Liu ([email protected]), Yang Chen ([email protected]), Wei Ping ([email protected])


## Citation
If you find our work helpful, we'd appreciate it if you could cite us:
<pre>
@article{acemath2024,
  title={AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling},
  author={Liu, Zihan and Chen, Yang and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  journal={arXiv preprint arXiv:2412.15084},
  year={2024}
}
</pre>


## License
All models in the AceMath family are for non-commercial use only, subject to the [Terms of Use](https://openai.com/policies/row-terms-of-use/) for the data generated by OpenAI. We release the AceMath models under the [Creative Commons Attribution-NonCommercial 4.0 International](https://spdx.org/licenses/CC-BY-NC-4.0) license.