Commit ea50d7f (verified) by inarikami · Parent(s): 4c1f901

add mmlu_pro benchmark results

Files changed (1): README.md (+47 −1)
# DeepSeek-R1-Distill-Qwen-32B-AWQ wint4

A distillation of DeepSeek-R1 into Qwen 32B, quantized with AWQ to 4-bit integer weights (wint4).

## Benchmarks

## MMLU-PRO

MMLU-PRO evaluates models across 14 distinct subject fields using 5-shot accuracy. Each task follows the methodology of the original MMLU, but every question has ten possible answer choices instead of four.

### Measure

- **Accuracy**: evaluated as `exact_match`

### Shots

- **Shots**: 5-shot

### Results Table

| Tasks            | Version | Filter         | n-shot | Metric      | Direction | Value  | Stderr |
|------------------|---------|----------------|--------|-------------|-----------|--------|--------|
| mmlu_pro         | 2       | custom-extract |        | exact_match | ↑         | 0.5875 | 0.0044 |
| biology          | 1       | custom-extract | 5      | exact_match | ↑         | 0.7978 | 0.0150 |
| business         | 1       | custom-extract | 5      | exact_match | ↑         | 0.5982 | 0.0175 |
| chemistry        | 1       | custom-extract | 5      | exact_match | ↑         | 0.4691 | 0.0148 |
| computer_science | 1       | custom-extract | 5      | exact_match | ↑         | 0.6122 | 0.0241 |
| economics        | 1       | custom-extract | 5      | exact_match | ↑         | 0.7346 | 0.0152 |
| engineering      | 1       | custom-extract | 5      | exact_match | ↑         | 0.3891 | 0.0157 |
| health           | 1       | custom-extract | 5      | exact_match | ↑         | 0.6345 | 0.0168 |
| history          | 1       | custom-extract | 5      | exact_match | ↑         | 0.6168 | 0.0249 |
| law              | 1       | custom-extract | 5      | exact_match | ↑         | 0.4596 | 0.0150 |
| math             | 1       | custom-extract | 5      | exact_match | ↑         | 0.6425 | 0.0130 |
| other            | 1       | custom-extract | 5      | exact_match | ↑         | 0.6223 | 0.0160 |
| philosophy       | 1       | custom-extract | 5      | exact_match | ↑         | 0.5731 | 0.0222 |
| physics          | 1       | custom-extract | 5      | exact_match | ↑         | 0.5073 | 0.0139 |
| psychology       | 1       | custom-extract | 5      | exact_match | ↑         | 0.7494 | 0.0154 |

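As a sanity check, the unweighted macro average of the 14 per-subject scores can be recomputed from the table. It lands slightly above the 0.5875 aggregate, which suggests the headline number is weighted by per-subject question counts (an assumption here, since the counts are not shown on this card):

```python
# Per-subject exact_match scores, copied from the results table above.
scores = {
    "biology": 0.7978, "business": 0.5982, "chemistry": 0.4691,
    "computer_science": 0.6122, "economics": 0.7346, "engineering": 0.3891,
    "health": 0.6345, "history": 0.6168, "law": 0.4596, "math": 0.6425,
    "other": 0.6223, "philosophy": 0.5731, "physics": 0.5073,
    "psychology": 0.7494,
}

# Unweighted (macro) average: every subject counts equally.
macro_avg = sum(scores.values()) / len(scores)
print(f"unweighted macro average: {macro_avg:.4f}")  # 0.6005
```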
## Groups

| Groups   | Version | Filter         | n-shot | Metric      | Direction | Value  | Stderr |
|----------|---------|----------------|--------|-------------|-----------|--------|--------|
| mmlu_pro | 2       | custom-extract |        | exact_match | ↑         | 0.5875 | 0.0044 |