---
tags:
- sparse
- fp8
- vllm
---

# Meta-Llama-3-8B-pruned_50.2of4-FP8

This repo contains model files for a 2:4 (N:M) sparse [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) model, pruned in one shot with [SparseGPT](https://arxiv.org/abs/2301.00774) and then retrained with [SquareHead](https://arxiv.org/abs/2310.06927) knowledge distillation while maintaining the 2:4 sparsity mask.
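To illustrate what the 2:4 pattern means (a toy magnitude-based sketch only — SparseGPT selects the mask using second-order information, not plain magnitude): in every contiguous group of four weights, two are zeroed and two are kept.

```python
def prune_2of4(weights):
    """Zero the 2 smallest-magnitude values in each group of 4.

    Toy illustration of the 2:4 sparsity pattern; not the SparseGPT
    algorithm, which picks the mask to minimize layer-wise error.
    """
    pruned = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # indices of the two largest-magnitude entries in this group
        keep = sorted(range(len(group)), key=lambda j: abs(group[j]))[-2:]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned
```

This structured pattern is what lets the sparsity be accelerated on hardware with 2:4 sparse tensor-core support, unlike unstructured 50% sparsity.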

It was then quantized to FP8 weights and activations with per-tensor scales using AutoFP8, with UltraChat2k as the calibration set.
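Per-tensor scaling can be sketched as follows (a simplified illustration assuming the FP8 E4M3 format, whose largest finite value is 448; a real quantizer such as AutoFP8 also rounds each value to the nearest representable E4M3 number, which is omitted here):

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def per_tensor_scale(tensor):
    """One scale for the whole tensor: map the largest magnitude to FP8 max."""
    return max(abs(v) for v in tensor) / FP8_E4M3_MAX

def fake_quantize(tensor, scale):
    """Divide by the scale, clamp to the FP8 range, then rescale.

    Rounding to the E4M3 grid is skipped for brevity, so values inside
    the representable range pass through unchanged.
    """
    return [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) * scale
            for v in tensor]
```

Because a single scale covers the whole tensor, per-tensor quantization is cheap at inference time but more sensitive to outliers than per-channel schemes — hence the calibration pass over sample data.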

## Evaluation Benchmark Results

Model evaluation results were obtained via [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness), following the configuration of the [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard).

| Benchmark | Meta-Llama-3-8B | Meta-Llama-3-8B-pruned_50.2of4 | Meta-Llama-3-8B-pruned_50.2of4-FP8<br>(this model) |
|:---------------------------------------------------:|:---------------:|:------------------------------:|:--------------------------------------------------:|
| [ARC-c](https://arxiv.org/abs/1803.05457)<br>25-shot | 59.47% | 57.76% | xxxxxx |
| [MMLU](https://arxiv.org/abs/2009.03300)<br>5-shot | 65.29% | 60.44% | xxxxxx |
| [HellaSwag](https://arxiv.org/abs/1905.07830)<br>10-shot | 82.14% | 79.97% | 79.61% |
| [WinoGrande](https://arxiv.org/abs/1907.10641)<br>5-shot | 77.27% | 77.19% | 76.32% |
| [GSM8K](https://arxiv.org/abs/2110.14168)<br>5-shot | 44.81% | 47.92% | 49.36% |
| [TruthfulQA](https://arxiv.org/abs/2109.07958)<br>0-shot | 43.96% | 41.02% | 40.82% |
| **Average<br>Accuracy** | **62.16%** | **60.72%** | xxxxxx |
| **Recovery** | **100%** | **97.68%** | xxxxxx |
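The **Recovery** row expresses a compressed model's average accuracy as a fraction of the dense baseline's average:

```python
def recovery(model_avg_pct, baseline_avg_pct):
    """Percentage of the dense baseline's average accuracy that is retained."""
    return 100.0 * model_avg_pct / baseline_avg_pct

# 2:4-pruned model average (60.72%) vs. dense Meta-Llama-3-8B (62.16%)
print(round(recovery(60.72, 62.16), 2))  # → 97.68
```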

## Help

For further support, and for discussions on these models and AI in general, join [Neural Magic's Slack Community](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ).