---
tags:
- sparse
- fp8
- vllm
---

# Meta-Llama-3-8B-pruned_50.2of4-FP8

This repo contains model files for a 2:4 (N:M) sparse [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) model, pruned in one shot with [SparseGPT](https://arxiv.org/abs/2301.00774) and then retrained with [SquareHead](https://arxiv.org/abs/2310.06927) knowledge distillation while maintaining the 2:4 sparsity mask.
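To illustrate what the 2:4 pattern means (a toy magnitude-based sketch only — SparseGPT selects the mask using second-order information, not plain magnitude): in every contiguous group of four weights, two are zeroed and two are kept.

```python
def prune_2of4(weights):
    """Zero the 2 smallest-magnitude values in each group of 4.

    Toy illustration of the 2:4 sparsity pattern; not the SparseGPT
    algorithm, which picks the mask to minimize layer-wise error.
    """
    pruned = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # indices of the two largest-magnitude entries in this group
        keep = sorted(range(len(group)), key=lambda j: abs(group[j]))[-2:]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned
```

This structured pattern is what lets the sparsity be accelerated on hardware with 2:4 sparse tensor-core support, unlike unstructured 50% sparsity.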

It was then quantized to FP8 weights and activations with per-tensor scales using AutoFP8, with UltraChat2k as the calibration set.
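Per-tensor scaling can be sketched as follows (a simplified illustration assuming the FP8 E4M3 format, whose largest finite value is 448; a real quantizer such as AutoFP8 also rounds each value to the nearest representable E4M3 number, which is omitted here):

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def per_tensor_scale(tensor):
    """One scale for the whole tensor: map the largest magnitude to FP8 max."""
    return max(abs(v) for v in tensor) / FP8_E4M3_MAX

def fake_quantize(tensor, scale):
    """Divide by the scale, clamp to the FP8 range, then rescale.

    Rounding to the E4M3 grid is skipped for brevity, so values inside
    the representable range pass through unchanged.
    """
    return [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) * scale
            for v in tensor]
```

Because a single scale covers the whole tensor, per-tensor quantization is cheap at inference time but more sensitive to outliers than per-channel schemes — hence the calibration pass over sample data.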

## Evaluation Benchmark Results

Model evaluation results were obtained via [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness), following the configuration of the [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard).

| Benchmark | Meta-Llama-3-8B | Meta-Llama-3-8B-pruned_50.2of4 | Meta-Llama-3-8B-pruned_50.2of4-FP8<br>(this model) |
|:---------------------------------------------------:|:---------------:|:------------------------------:|:--------------------------------------------------:|
| [ARC-c](https://arxiv.org/abs/1803.05457)<br>25-shot | 59.47% | 57.76% | xxxxxx |
| [MMLU](https://arxiv.org/abs/2009.03300)<br>5-shot | 65.29% | 60.44% | xxxxxx |
| [HellaSwag](https://arxiv.org/abs/1905.07830)<br>10-shot | 82.14% | 79.97% | 79.61% |
| [WinoGrande](https://arxiv.org/abs/1907.10641)<br>5-shot | 77.27% | 77.19% | 76.32% |
| [GSM8K](https://arxiv.org/abs/2110.14168)<br>5-shot | 44.81% | 47.92% | 49.36% |
| [TruthfulQA](https://arxiv.org/abs/2109.07958)<br>0-shot | 43.96% | 41.02% | 40.82% |
| **Average<br>Accuracy** | **62.16%** | **60.72%** | xxxxxx |
| **Recovery** | **100%** | **97.68%** | xxxxxx |
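The **Recovery** row expresses a compressed model's average accuracy as a fraction of the dense baseline's average:

```python
def recovery(model_avg_pct, baseline_avg_pct):
    """Percentage of the dense baseline's average accuracy that is retained."""
    return 100.0 * model_avg_pct / baseline_avg_pct

# 2:4-pruned model average (60.72%) vs. dense Meta-Llama-3-8B (62.16%)
print(round(recovery(60.72, 62.16), 2))  # → 97.68
```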

## Help

For further support, and for discussions on these models and AI in general, join [Neural Magic's Slack Community](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ).