Update README.md
README.md CHANGED
@@ -22,6 +22,13 @@ Benchmark on reasoning tasks using lighteval:
 |aime24 | 1|extractive_match|0.1333|± |0.0631|
 |math_500| 1|extractive_match|0.7420|± |0.0196|
 
+In comparison, Qwen2.5-7B-Instruct:
+
+| Task |Version| Metric |Value | |Stderr|
+|-----------------|------:|----------------|-----:|---|-----:|
+|aime24 | 1|extractive_match|0.1667|± |0.0692|
+|math_500| 1|extractive_match|0.8220|± |0.0171|
+
 ## 🧩 Configuration
 
 ```yaml