Update README.md
Evaluation Results
The model was evaluated across a range of tasks. Below are the final evaluation results (after removing GSM8k):

| Parameters | Model     | MMLU  | ARC-C | HellaSwag | PIQA  | Winogrande | Average |
|------------|-----------|-------|-------|-----------|-------|------------|---------|
| 500M       | qwen 2    | 44.13 | 28.92 | 49.05     | 69.31 | 56.99      | 49.68   |
| 500M       | qwen 2.5  | 47.29 | 31.83 | 52.17     | 70.29 | 57.06      | 51.72   |
| 1.24B      | llama 3.2 | 36.75 | 36.18 | 63.70     | 74.54 | 60.54      | 54.34   |
| 514M       | archeon   | NA    | 32.34 | 47.80     | 74.37 | 62.12      | 54.16   |

- **ARC Challenge:** The model performs decently on general-knowledge questions.
- **HellaSwag:** The model is strong in commonsense reasoning, performing well at predicting the next sequence of events in a given scenario.
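The Average column appears to be the mean of each row's available task scores, with NA entries (archeon's missing MMLU score) simply skipped rather than counted as zero. A minimal sketch of that convention, assuming the `row_average` helper name for illustration:

```python
def row_average(scores):
    """Mean of the available benchmark scores in a row, skipping NA entries."""
    vals = [s for s in scores if s != "NA"]
    return round(sum(vals) / len(vals), 2)

# archeon row: MMLU is NA, so the average covers the remaining four tasks
print(row_average(["NA", 32.34, 47.80, 74.37, 62.12]))  # → 54.16

# qwen 2 row: all five task scores are present
print(row_average([44.13, 28.92, 49.05, 69.31, 56.99]))  # → 49.68
```

Under this convention archeon's 54.16 average is computed over four tasks while the other rows average five, so the columns are not strictly comparable where MMLU is missing.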