add mmlu_pro benchmark results
README.md (CHANGED)
# DeepSeek-R1-Distill-Qwen-32B-AWQ wint4

A distillation of DeepSeek-R1 into Qwen 32B, quantized with AWQ to wint4 (4-bit integer weights).
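As a quick usage sketch (one option among several), an AWQ wint4 checkpoint like this can be served with vLLM, which supports AWQ quantization; the model path, parallelism, and sampling settings below are placeholders to adapt to your own setup:

```python
# Minimal sketch: serving the AWQ wint4 checkpoint with vLLM.
# The model path, tensor_parallel_size, and sampling settings are placeholders;
# vLLM normally detects AWQ from the checkpoint's quantization config, so the
# explicit quantization argument is optional.
from vllm import LLM, SamplingParams

llm = LLM(
    model="./DeepSeek-R1-Distill-Qwen-32B-AWQ",  # placeholder local path or repo id
    quantization="awq",
    dtype="half",
    tensor_parallel_size=2,  # adjust to the number of GPUs available
)

params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=2048)
outputs = llm.generate(["Prove that the square root of 2 is irrational."], params)
print(outputs[0].outputs[0].text)
```
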
## Benchmarks

## MMLU-PRO

MMLU-PRO evaluates models across 14 subject areas using 5-shot prompting. Each task follows the methodology of the original MMLU implementation, but every question has ten answer choices instead of four.
### Measure

- **Accuracy**: evaluated as `exact_match`

### Shots

- **Shots**: 5-shot
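The table layout below matches the output of EleutherAI's lm-evaluation-harness; assuming that harness produced these numbers, a run along the following lines should give a comparable evaluation (the model path and batch size are placeholders, not taken from this repo):

```python
# Hypothetical reproduction sketch, assuming lm-evaluation-harness was used
# (the results table below follows its output format).
# The model path and batch size are placeholders; a 32B model needs one or
# more large GPUs even at wint4.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=./DeepSeek-R1-Distill-Qwen-32B-AWQ,dtype=float16",
    tasks=["mmlu_pro"],
    num_fewshot=5,
    batch_size="auto",
)
print(results["results"])
```
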
### Results Table
| Tasks            | Version | Filter         | n-shot | Metric      | Direction | Value  | Stderr |
|------------------|---------|----------------|--------|-------------|-----------|--------|--------|
| mmlu_pro         | 2       | custom-extract |        | exact_match | ↑         | 0.5875 | 0.0044 |
| biology          | 1       | custom-extract | 5      | exact_match | ↑         | 0.7978 | 0.0150 |
| business         | 1       | custom-extract | 5      | exact_match | ↑         | 0.5982 | 0.0175 |
| chemistry        | 1       | custom-extract | 5      | exact_match | ↑         | 0.4691 | 0.0148 |
| computer_science | 1       | custom-extract | 5      | exact_match | ↑         | 0.6122 | 0.0241 |
| economics        | 1       | custom-extract | 5      | exact_match | ↑         | 0.7346 | 0.0152 |
| engineering      | 1       | custom-extract | 5      | exact_match | ↑         | 0.3891 | 0.0157 |
| health           | 1       | custom-extract | 5      | exact_match | ↑         | 0.6345 | 0.0168 |
| history          | 1       | custom-extract | 5      | exact_match | ↑         | 0.6168 | 0.0249 |
| law              | 1       | custom-extract | 5      | exact_match | ↑         | 0.4596 | 0.0150 |
| math             | 1       | custom-extract | 5      | exact_match | ↑         | 0.6425 | 0.0130 |
| other            | 1       | custom-extract | 5      | exact_match | ↑         | 0.6223 | 0.0160 |
| philosophy       | 1       | custom-extract | 5      | exact_match | ↑         | 0.5731 | 0.0222 |
| physics          | 1       | custom-extract | 5      | exact_match | ↑         | 0.5073 | 0.0139 |
| psychology       | 1       | custom-extract | 5      | exact_match | ↑         | 0.7494 | 0.0154 |

## Groups

| Groups   | Version | Filter         | n-shot | Metric      | Direction | Value  | Stderr |
|----------|---------|----------------|--------|-------------|-----------|--------|--------|
| mmlu_pro | 2       | custom-extract |        | exact_match | ↑         | 0.5875 | 0.0044 |