bfuzzy1 committed
Commit 1b39890 · verified · 1 Parent(s): a703860

Update README.md

Files changed (1): README.md +6 -5
README.md CHANGED
@@ -47,11 +47,12 @@ Evaluation Results
 
 The model was evaluated across a range of tasks. Below are the final evaluation results (after removing GSM8k):
 
-Task Accuracy Normalized Accuracy
-ARC Challenge 32.42% 37.29%
-HellaSwag 47.83% 63.02%
-PIQA 74.37% N/A
-Winogrande 62.12% N/A
+| Parameters | Model     | MMLU  | ARC-C | HellaSwag | PIQA  | Winogrande | Average |
+|------------|-----------|-------|-------|-----------|-------|------------|---------|
+| 500M       | qwen 2    | 44.13 | 28.92 | 49.05     | 69.31 | 56.99      | 49.68   |
+| 500M       | qwen 2.5  | 47.29 | 31.83 | 52.17     | 70.29 | 57.06      | 51.72   |
+| 1.24B      | llama 3.2 | 36.75 | 36.18 | 63.70     | 74.54 | 60.54      | 54.34   |
+| 514M       | archeon   | NA    | 32.34 | 47.80     | 74.37 | 62.12      | 54.16   |
 
 • ARC Challenge: The model performs decently in answering general knowledge questions.
 • HellaSwag: The model is strong in commonsense reasoning, performing well in predicting the next sequence of events in a given scenario.
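The Average column in the new table appears to be the arithmetic mean of the available task scores, with NA entries (archeon's missing MMLU score) excluded from the denominator. A minimal sketch to sanity-check that reading, using only the numbers from the table above:

```python
# Sanity-check the Average column of the new results table:
# it appears to be the mean of the available task scores, with
# NA entries excluded from the denominator. Scores copied from the table.
SCORES = {
    "qwen 2":    [44.13, 28.92, 49.05, 69.31, 56.99],  # listed average: 49.68
    "qwen 2.5":  [47.29, 31.83, 52.17, 70.29, 57.06],  # listed average: 51.72
    "llama 3.2": [36.75, 36.18, 63.70, 74.54, 60.54],  # listed average: 54.34
    "archeon":   [None, 32.34, 47.80, 74.37, 62.12],   # None = NA (no MMLU); listed: 54.16
}

def average(vals):
    """Mean over the scores that are present, skipping NA (None) entries."""
    present = [v for v in vals if v is not None]
    return sum(present) / len(present)

for model, vals in SCORES.items():
    print(f"{model}: {average(vals):.2f}")
# archeon: (32.34 + 47.80 + 74.37 + 62.12) / 4 ≈ 54.16, matching the table
```

Three of the four rows match the listed averages to two decimals; qwen 2.5 computes to about 51.73 against the listed 51.72, so that entry may have been truncated rather than rounded.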