Safetensors
qwen2
Wanfq commited on
Commit
4261b10
·
verified ·
1 Parent(s): 988907c

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +24 -10
README.md CHANGED
@@ -3,12 +3,13 @@
3
 
4
  <div id="top" align="center">
5
 
6
- FuseO1-Preview: System-II Reasoning Fusion of LLMs
7
  -----------------------------
8
 
9
  <h4> |<a href="https://arxiv.org/abs/2408.07990"> 📑 Paper </a> |
10
  <a href="https://github.com/fanqiwan/FuseAI"> 🐱 GitHub Repo </a> |
11
  <a href="https://huggingface.co/FuseAI"> 🤗 Hugging Face </a> |
 
12
  </h4>
13
 
14
  <!-- **Authors:** -->
@@ -22,6 +23,11 @@ _Sun Yat-sen University_
22
 
23
  </div>
24
 
 
 
 
 
 
25
  ## Overview
26
 
27
  [FuseO1-Preview](https://huggingface.co/collections/FuseAI/fuseo1-preview-678eb56093649b2688bc9977) is our initial endeavor to enhance the System-II reasoning capabilities of large language models (LLMs) through innovative model fusion techniques. By employing advanced [SCE](https://arxiv.org/abs/2408.07990) merging methodologies, we integrate multiple open-source o1-like LLMs into a unified model. Our goal is to incorporate the distinct knowledge and strengths from different reasoning LLMs into a single, unified model with strong System-II reasoning abilities, particularly in mathematics, coding, and science domains.
@@ -52,7 +58,7 @@ To reproduce the merged [FuseAI/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview](https:/
52
  cd FuseAI/FuseO1-Preview/mergekit
53
  pip3 install -e .
54
  model_save_dir=xx # your path to save the merged models
55
- mergekit-yaml fuseo1_configs/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview.yaml ${model_save_dir}/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview
56
  ```
57
 
58
  To reproduce the merged [FuseAI/FuseO1-DeekSeekR1-QwQ-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeekSeekR1-QwQ-32B-Preview) model, using the script below.
@@ -61,7 +67,7 @@ To reproduce the merged [FuseAI/FuseO1-DeekSeekR1-QwQ-32B-Preview](https://huggi
61
  cd FuseAI/FuseO1-Preview/mergekit
62
  pip3 install -e .
63
  model_save_dir=xxx # your path to save the merged models
64
- mergekit-yaml fuseo1_configs/FuseO1-DeekSeekR1-QwQ-32B-Preview.yaml ${model_save_dir}/FuseO1-DeekSeekR1-QwQ-32B-Preview
65
  ```
66
 
67
  We provide the example code to use FuseAI/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview.
@@ -98,7 +104,7 @@ To reproduce the merged [FuseAI/FuseO1-DeekSeekR1-Qwen2.5-Instruct-32B-Preview](
98
  cd FuseAI/FuseO1-Preview/mergekit
99
  pip3 install -e .
100
  model_save_dir=xxx # your path to save the merged models
101
- mergekit-yaml fuseo1_configs/FuseO1-DeekSeekR1-Qwen2.5-Instruct-32B-Preview.yaml ${model_save_dir}/FuseO1-DeekSeekR1-Qwen2.5-Instruct-32B-Preview
102
  ```
103
 
104
  We provide the code to use FuseAI/FuseO1-DeekSeekR1-Qwen2.5-Instruct-32B-Preview.
@@ -141,21 +147,29 @@ Scientific Reasoning
141
  Code Reasoning
142
  - LiveCodeBench
143
 
144
- The [evaluation code](https://github.com/fanqiwan/FuseAI/tree/main/FuseO1-Preview/evaluation) is modified from [SkyThought](https://github.com/NovaSky-AI/SkyThought). In our evaluation, we set the temperature to 0.7 (sampling) and the max_tokens to 32768.
 
 
 
 
 
 
 
 
145
 
146
  The evaluation results are shown in the table below:
147
 
148
  | Models | AIME24 | MATH500 | GSM8K | GPQA-Diamond | ARC-Challenge | MMLU-Pro | MMLU | LiveCodeBench |
149
  |:-| ------ | ------- | ----- | ------------ | ------------- | -------- | ---- | ------------- |
150
- | o1-preview | 44.60 | 85.50 | - | 73.30 | - | - | 90.80 | - |
151
  | o1-mini | 63.60 | 90.00 | - | 60.00 | - | 80.30 | 85.20| 53.80 |
152
- | [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | 46.67 | 88.20 | - | 57.58 | - | - | - | - |
153
  | [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview) |43.33 | 87.80 | 95.45 | 49.49 | 95.73 | 63.49 | 85.19 | 51.86 |
154
  | [NovaSky-AI/Sky-T1-32B-Preview](https://huggingface.co/NovaSky-AI/Sky-T1-32B-Preview) | 43.33 | 86.80 | 95.15 | 50.51 | 95.56 | 65.80 | 82.71 | 51.66 |
155
  | [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | 20.00 | 81.60 | 93.63 | 46.46 | 95.22 | 56.27 | 79.63 | 48.53 |
156
- | [FuseAI/FuseO1-DeekSeekR1-Qwen2.5-Instruct-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeekSeekR1-Qwen2.5-Instruct-32B-Preview) | 46.67 | 87.20 | - | 55.05 | - | - | - | - |
157
- | [FuseAI/FuseO1-DeekSeekR1-QwQ-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeekSeekR1-QwQ-32B-Preview) | 56.67 | 85.60 | - | 62.12 | - | - | - | - |
158
- | [FuseAI/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview) | 60.00 | 90.00 | - | 62.12 | - | - | - | - |
159
 
160
  ## Future Works
161
 
 
3
 
4
  <div id="top" align="center">
5
 
6
+ # FuseO1-Preview: System-II Reasoning Fusion of LLMs
7
  -----------------------------
8
 
9
  <h4> |<a href="https://arxiv.org/abs/2408.07990"> 📑 Paper </a> |
10
  <a href="https://github.com/fanqiwan/FuseAI"> 🐱 GitHub Repo </a> |
11
  <a href="https://huggingface.co/FuseAI"> 🤗 Hugging Face </a> |
12
+ <a href="https://huggingface.co/blog/Wanfq/fuseo1-preview"> 🌐 Blog </a> |
13
  </h4>
14
 
15
  <!-- **Authors:** -->
 
23
 
24
  </div>
25
 
26
+ <p align="center">
27
+ <img src="./assets/fuseo1-preview.jpg" width="100%"> <br>
28
+ </p>
29
+
30
+
31
  ## Overview
32
 
33
  [FuseO1-Preview](https://huggingface.co/collections/FuseAI/fuseo1-preview-678eb56093649b2688bc9977) is our initial endeavor to enhance the System-II reasoning capabilities of large language models (LLMs) through innovative model fusion techniques. By employing advanced [SCE](https://arxiv.org/abs/2408.07990) merging methodologies, we integrate multiple open-source o1-like LLMs into a unified model. Our goal is to incorporate the distinct knowledge and strengths from different reasoning LLMs into a single, unified model with strong System-II reasoning abilities, particularly in mathematics, coding, and science domains.
 
58
  cd FuseAI/FuseO1-Preview/mergekit
59
  pip3 install -e .
60
  model_save_dir=xx # your path to save the merged models
61
+ mergekit-yaml fuseo1_configs/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview.yaml ${model_save_dir}/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview --cudas
62
  ```
63
 
64
  To reproduce the merged [FuseAI/FuseO1-DeekSeekR1-QwQ-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeekSeekR1-QwQ-32B-Preview) model, using the script below.
 
67
  cd FuseAI/FuseO1-Preview/mergekit
68
  pip3 install -e .
69
  model_save_dir=xxx # your path to save the merged models
70
+ mergekit-yaml fuseo1_configs/FuseO1-DeekSeekR1-QwQ-32B-Preview.yaml ${model_save_dir}/FuseO1-DeekSeekR1-QwQ-32B-Preview --cuda
71
  ```
72
 
73
  We provide the example code to use FuseAI/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview.
 
104
  cd FuseAI/FuseO1-Preview/mergekit
105
  pip3 install -e .
106
  model_save_dir=xxx # your path to save the merged models
107
+ mergekit-yaml fuseo1_configs/FuseO1-DeekSeekR1-Qwen2.5-Instruct-32B-Preview.yaml ${model_save_dir}/FuseO1-DeekSeekR1-Qwen2. 5-Instruct-32B-Preview --cuda
108
  ```
109
 
110
  We provide the code to use FuseAI/FuseO1-DeekSeekR1-Qwen2.5-Instruct-32B-Preview.
 
147
  Code Reasoning
148
  - LiveCodeBench
149
 
150
+ The evaluation code is modified from [SkyThought](https://github.com/NovaSky-AI/SkyThought). In our evaluation, we set the temperature to 0.7 and the max_tokens to 32768. We provide the example to reproduce our results in [evaluation](https://github.com/fanqiwan/FuseAI/tree/main/FuseO1-Preview/evaluation).
151
+
152
+ The system prompt for evaluation is set to:
153
+
154
+ ```sh
155
+ You are a helpful and harmless assistant. You should think step-by-step.
156
+ ```
157
+
158
+ We are currently attempting to reproduce the results reported in the DeepSeek-R1 paper by experimenting with different system prompts. We will update our findings once we have acquired the original system prompt used in their study.
159
 
160
  The evaluation results are shown in the table below:
161
 
162
  | Models | AIME24 | MATH500 | GSM8K | GPQA-Diamond | ARC-Challenge | MMLU-Pro | MMLU | LiveCodeBench |
163
  |:-| ------ | ------- | ----- | ------------ | ------------- | -------- | ---- | ------------- |
164
+ | o1-preview | 44.60 | 85.50 | - | 73.30 | - | - | 90.80 | 44.60 |
165
  | o1-mini | 63.60 | 90.00 | - | 60.00 | - | 80.30 | 85.20| 53.80 |
166
+ | [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | 46.67 | 88.20 | 93.71 | 57.58 | 95.90 | 68.70 | 82.17 | 59.69 |
167
  | [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview) |43.33 | 87.80 | 95.45 | 49.49 | 95.73 | 63.49 | 85.19 | 51.86 |
168
  | [NovaSky-AI/Sky-T1-32B-Preview](https://huggingface.co/NovaSky-AI/Sky-T1-32B-Preview) | 43.33 | 86.80 | 95.15 | 50.51 | 95.56 | 65.80 | 82.71 | 51.66 |
169
  | [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | 20.00 | 81.60 | 93.63 | 46.46 | 95.22 | 56.27 | 79.63 | 48.53 |
170
+ | [FuseAI/FuseO1-DeekSeekR1-Qwen2.5-Instruct-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeekSeekR1-Qwen2.5-Instruct-32B-Preview) | 46.67 | 87.20 | 93.33 | 55.05 | 96.33 | 68.61 | - | 60.67 |
171
+ | [FuseAI/FuseO1-DeekSeekR1-QwQ-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeekSeekR1-QwQ-32B-Preview) | 56.67 | 85.60 | 93.78 | 62.12 | 96.08 | 68.85 | - | 59.49 |
172
+ | [FuseAI/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview) | 60.00 | 90.00 | 93.33 | 62.12 | 95.90 | 70.79 | - | 58.90 |
173
 
174
  ## Future Works
175