Update README.md
README.md
CHANGED
@@ -133,8 +133,11 @@ Please check the examples we provided: https://huggingface.co/Pinkstack/SuperTho
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/QDHJhI0EVT_L9AHY_g3Br.png)
 Beats qwen/qwq at MATH & MuSR (MuSR being a reasoning benchmark)
 Evaluation:
-
-
+
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/csbdGKzGcDVMPRqMCoH8D.png)
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/HR9WtjBhE4h6wrq88FLAf.png)
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/GLt4ct4yAVMvYEpoYO5o6.png)
 
 Unlike previous models we've uploaded, this one is the best one we've published! Answers in two steps: Reasoning -> Final answer like o1 mini and other similar reasoning ai models.
 # 🧀 Which quant is right for you? (all tested!)
@@ -145,6 +148,7 @@ Unlike previous models we've uploaded, this one is the best one we've published!
 # [Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
 Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/Pinkstack__SuperThoughts-CoT-14B-16k-o1-QwQ-details)!
 Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=Pinkstack%2FSuperThoughts-CoT-14B-16k-o1-QwQ&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
+Please note, the low IFEVAL results is probably due to it always reasoning, it does have issues with instruction following.
 
 | Metric |Value (%)|
 |-------------------|--------:|