Update README.md
Browse files
README.md
CHANGED
@@ -3,12 +3,13 @@
|
|
3 |
|
4 |
<div id="top" align="center">
|
5 |
|
6 |
-
FuseO1-Preview: System-II Reasoning Fusion of LLMs
|
7 |
-----------------------------
|
8 |
|
9 |
<h4> |<a href="https://arxiv.org/abs/2408.07990"> 📑 Paper </a> |
|
10 |
<a href="https://github.com/fanqiwan/FuseAI"> 🐱 GitHub Repo </a> |
|
11 |
<a href="https://huggingface.co/FuseAI"> 🤗 Hugging Face </a> |
|
|
|
12 |
</h4>
|
13 |
|
14 |
<!-- **Authors:** -->
|
@@ -22,6 +23,11 @@ _Sun Yat-sen University_
|
|
22 |
|
23 |
</div>
|
24 |
|
|
|
|
|
|
|
|
|
|
|
25 |
## Overview
|
26 |
|
27 |
[FuseO1-Preview](https://huggingface.co/collections/FuseAI/fuseo1-preview-678eb56093649b2688bc9977) is our initial endeavor to enhance the System-II reasoning capabilities of large language models (LLMs) through innovative model fusion techniques. By employing advanced [SCE](https://arxiv.org/abs/2408.07990) merging methodologies, we integrate multiple open-source o1-like LLMs into a unified model. Our goal is to incorporate the distinct knowledge and strengths from different reasoning LLMs into a single, unified model with strong System-II reasoning abilities, particularly in mathematics, coding, and science domains.
|
@@ -52,7 +58,7 @@ To reproduce the merged [FuseAI/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview](https:/
|
|
52 |
cd FuseAI/FuseO1-Preview/mergekit
|
53 |
pip3 install -e .
|
54 |
model_save_dir=xx # your path to save the merged models
|
55 |
-
mergekit-yaml fuseo1_configs/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview.yaml ${model_save_dir}/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview
|
56 |
```
|
57 |
|
58 |
To reproduce the merged [FuseAI/FuseO1-DeekSeekR1-QwQ-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeekSeekR1-QwQ-32B-Preview) model, using the script below.
|
@@ -61,7 +67,7 @@ To reproduce the merged [FuseAI/FuseO1-DeekSeekR1-QwQ-32B-Preview](https://huggi
|
|
61 |
cd FuseAI/FuseO1-Preview/mergekit
|
62 |
pip3 install -e .
|
63 |
model_save_dir=xxx # your path to save the merged models
|
64 |
-
mergekit-yaml fuseo1_configs/FuseO1-DeekSeekR1-QwQ-32B-Preview.yaml ${model_save_dir}/FuseO1-DeekSeekR1-QwQ-32B-Preview
|
65 |
```
|
66 |
|
67 |
We provide the example code to use FuseAI/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview.
|
@@ -98,7 +104,7 @@ To reproduce the merged [FuseAI/FuseO1-DeekSeekR1-Qwen2.5-Instruct-32B-Preview](
|
|
98 |
cd FuseAI/FuseO1-Preview/mergekit
|
99 |
pip3 install -e .
|
100 |
model_save_dir=xxx # your path to save the merged models
|
101 |
-
mergekit-yaml fuseo1_configs/FuseO1-DeekSeekR1-Qwen2.5-Instruct-32B-Preview.yaml ${model_save_dir}/FuseO1-DeekSeekR1-Qwen2.5-Instruct-32B-Preview
|
102 |
```
|
103 |
|
104 |
We provide the code to use FuseAI/FuseO1-DeekSeekR1-Qwen2.5-Instruct-32B-Preview.
|
@@ -141,21 +147,29 @@ Scientific Reasoning
|
|
141 |
Code Reasoning
|
142 |
- LiveCodeBench
|
143 |
|
144 |
-
The
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
145 |
|
146 |
The evaluation results are shown in the table below:
|
147 |
|
148 |
| Models | AIME24 | MATH500 | GSM8K | GPQA-Diamond | ARC-Challenge | MMLU-Pro | MMLU | LiveCodeBench |
|
149 |
|:-| ------ | ------- | ----- | ------------ | ------------- | -------- | ---- | ------------- |
|
150 |
-
| o1-preview | 44.60 | 85.50 | - | 73.30 | - | - | 90.80 |
|
151 |
| o1-mini | 63.60 | 90.00 | - | 60.00 | - | 80.30 | 85.20| 53.80 |
|
152 |
-
| [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | 46.67 | 88.20 |
|
153 |
| [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview) |43.33 | 87.80 | 95.45 | 49.49 | 95.73 | 63.49 | 85.19 | 51.86 |
|
154 |
| [NovaSky-AI/Sky-T1-32B-Preview](https://huggingface.co/NovaSky-AI/Sky-T1-32B-Preview) | 43.33 | 86.80 | 95.15 | 50.51 | 95.56 | 65.80 | 82.71 | 51.66 |
|
155 |
| [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | 20.00 | 81.60 | 93.63 | 46.46 | 95.22 | 56.27 | 79.63 | 48.53 |
|
156 |
-
| [FuseAI/FuseO1-DeekSeekR1-Qwen2.5-Instruct-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeekSeekR1-Qwen2.5-Instruct-32B-Preview) | 46.67 | 87.20 |
|
157 |
-
| [FuseAI/FuseO1-DeekSeekR1-QwQ-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeekSeekR1-QwQ-32B-Preview) | 56.67 | 85.60 |
|
158 |
-
| [FuseAI/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview) | 60.00 | 90.00 |
|
159 |
|
160 |
## Future Works
|
161 |
|
|
|
3 |
|
4 |
<div id="top" align="center">
|
5 |
|
6 |
+
# FuseO1-Preview: System-II Reasoning Fusion of LLMs
|
7 |
-----------------------------
|
8 |
|
9 |
<h4> |<a href="https://arxiv.org/abs/2408.07990"> 📑 Paper </a> |
|
10 |
<a href="https://github.com/fanqiwan/FuseAI"> 🐱 GitHub Repo </a> |
|
11 |
<a href="https://huggingface.co/FuseAI"> 🤗 Hugging Face </a> |
|
12 |
+
<a href="https://huggingface.co/blog/Wanfq/fuseo1-preview"> 🌐 Blog </a> |
|
13 |
</h4>
|
14 |
|
15 |
<!-- **Authors:** -->
|
|
|
23 |
|
24 |
</div>
|
25 |
|
26 |
+
<p align="center">
|
27 |
+
<img src="./assets/fuseo1-preview.jpg" width="100%"> <br>
|
28 |
+
</p>
|
29 |
+
|
30 |
+
|
31 |
## Overview
|
32 |
|
33 |
[FuseO1-Preview](https://huggingface.co/collections/FuseAI/fuseo1-preview-678eb56093649b2688bc9977) is our initial endeavor to enhance the System-II reasoning capabilities of large language models (LLMs) through innovative model fusion techniques. By employing advanced [SCE](https://arxiv.org/abs/2408.07990) merging methodologies, we integrate multiple open-source o1-like LLMs into a unified model. Our goal is to incorporate the distinct knowledge and strengths from different reasoning LLMs into a single, unified model with strong System-II reasoning abilities, particularly in mathematics, coding, and science domains.
|
|
|
58 |
cd FuseAI/FuseO1-Preview/mergekit
|
59 |
pip3 install -e .
|
60 |
model_save_dir=xx # your path to save the merged models
|
61 |
+
mergekit-yaml fuseo1_configs/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview.yaml ${model_save_dir}/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview --cudas
|
62 |
```
|
63 |
|
64 |
To reproduce the merged [FuseAI/FuseO1-DeekSeekR1-QwQ-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeekSeekR1-QwQ-32B-Preview) model, using the script below.
|
|
|
67 |
cd FuseAI/FuseO1-Preview/mergekit
|
68 |
pip3 install -e .
|
69 |
model_save_dir=xxx # your path to save the merged models
|
70 |
+
mergekit-yaml fuseo1_configs/FuseO1-DeekSeekR1-QwQ-32B-Preview.yaml ${model_save_dir}/FuseO1-DeekSeekR1-QwQ-32B-Preview --cuda
|
71 |
```
|
72 |
|
73 |
We provide the example code to use FuseAI/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview.
|
|
|
104 |
cd FuseAI/FuseO1-Preview/mergekit
|
105 |
pip3 install -e .
|
106 |
model_save_dir=xxx # your path to save the merged models
|
107 |
+
mergekit-yaml fuseo1_configs/FuseO1-DeekSeekR1-Qwen2.5-Instruct-32B-Preview.yaml ${model_save_dir}/FuseO1-DeekSeekR1-Qwen2. 5-Instruct-32B-Preview --cuda
|
108 |
```
|
109 |
|
110 |
We provide the code to use FuseAI/FuseO1-DeekSeekR1-Qwen2.5-Instruct-32B-Preview.
|
|
|
147 |
Code Reasoning
|
148 |
- LiveCodeBench
|
149 |
|
150 |
+
The evaluation code is modified from [SkyThought](https://github.com/NovaSky-AI/SkyThought). In our evaluation, we set the temperature to 0.7 and the max_tokens to 32768. We provide the example to reproduce our results in [evaluation](https://github.com/fanqiwan/FuseAI/tree/main/FuseO1-Preview/evaluation).
|
151 |
+
|
152 |
+
The system prompt for evaluation is set to:
|
153 |
+
|
154 |
+
```sh
|
155 |
+
You are a helpful and harmless assistant. You should think step-by-step.
|
156 |
+
```
|
157 |
+
|
158 |
+
We are currently attempting to reproduce the results reported in the DeepSeek-R1 paper by experimenting with different system prompts. We will update our findings once we have acquired the original system prompt used in their study.
|
159 |
|
160 |
The evaluation results are shown in the table below:
|
161 |
|
162 |
| Models | AIME24 | MATH500 | GSM8K | GPQA-Diamond | ARC-Challenge | MMLU-Pro | MMLU | LiveCodeBench |
|
163 |
|:-| ------ | ------- | ----- | ------------ | ------------- | -------- | ---- | ------------- |
|
164 |
+
| o1-preview | 44.60 | 85.50 | - | 73.30 | - | - | 90.80 | 44.60 |
|
165 |
| o1-mini | 63.60 | 90.00 | - | 60.00 | - | 80.30 | 85.20| 53.80 |
|
166 |
+
| [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | 46.67 | 88.20 | 93.71 | 57.58 | 95.90 | 68.70 | 82.17 | 59.69 |
|
167 |
| [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview) |43.33 | 87.80 | 95.45 | 49.49 | 95.73 | 63.49 | 85.19 | 51.86 |
|
168 |
| [NovaSky-AI/Sky-T1-32B-Preview](https://huggingface.co/NovaSky-AI/Sky-T1-32B-Preview) | 43.33 | 86.80 | 95.15 | 50.51 | 95.56 | 65.80 | 82.71 | 51.66 |
|
169 |
| [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | 20.00 | 81.60 | 93.63 | 46.46 | 95.22 | 56.27 | 79.63 | 48.53 |
|
170 |
+
| [FuseAI/FuseO1-DeekSeekR1-Qwen2.5-Instruct-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeekSeekR1-Qwen2.5-Instruct-32B-Preview) | 46.67 | 87.20 | 93.33 | 55.05 | 96.33 | 68.61 | - | 60.67 |
|
171 |
+
| [FuseAI/FuseO1-DeekSeekR1-QwQ-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeekSeekR1-QwQ-32B-Preview) | 56.67 | 85.60 | 93.78 | 62.12 | 96.08 | 68.85 | - | 59.49 |
|
172 |
+
| [FuseAI/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview) | 60.00 | 90.00 | 93.33 | 62.12 | 95.90 | 70.79 | - | 58.90 |
|
173 |
|
174 |
## Future Works
|
175 |
|