---
tags:
- fp8
---


Meta-Llama-3-8B-Instruct quantized to FP8 weights and activations using per-tensor scales, ready for inference with vLLM >= 0.4.3.
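
For example, the checkpoint can be loaded with vLLM's Python API; a minimal sketch (the prompt and sampling parameters are illustrative):

```python
from vllm import LLM, SamplingParams

# Load the FP8 checkpoint with vLLM (>= 0.4.3); the quantization
# argument matches the one used in the evaluation below.
llm = LLM(model="nm-testing/Meta-Llama-3-8B-Instruct-FP8", quantization="fp8")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["What is FP8 quantization?"], params)
print(outputs[0].outputs[0].text)
```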

Produced using [AutoFP8's `quantize.py`](https://github.com/neuralmagic/AutoFP8/blob/b0c1f789c51659bb023c06521ecbd04cea4a26f6/quantize.py):

```bash
python quantize.py --model-id meta-llama/Meta-Llama-3-8B-Instruct --save-dir Meta-Llama-3-8B-Instruct-FP8
```
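
For reference, per-tensor FP8 (E4M3) quantization uses a single scale per tensor, chosen so the largest magnitude maps to the FP8 maximum representable value. A minimal PyTorch sketch of the idea (not the AutoFP8 implementation):

```python
import torch

FP8_E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn


def quantize_per_tensor_fp8(weight: torch.Tensor):
    # One scale for the whole tensor: max |w| maps to the FP8 max.
    scale = weight.abs().max().clamp(min=1e-12) / FP8_E4M3_MAX
    qweight = (weight / scale).clamp(-FP8_E4M3_MAX, FP8_E4M3_MAX)
    return qweight.to(torch.float8_e4m3fn), scale


def dequantize(qweight: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return qweight.to(torch.float32) * scale
```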

Accuracy on MMLU (5-shot), comparing the unquantized baseline to the FP8 model:
```
vllm (pretrained=meta-llama/Meta-Llama-3-8B-Instruct,gpu_memory_utilization=0.4), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 16
|      Groups      |Version|Filter|n-shot|Metric|Value |   |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu              |N/A    |none  |     0|acc   |0.6569|±  |0.0038|
| - humanities     |N/A    |none  |     5|acc   |0.6049|±  |0.0068|
| - other          |N/A    |none  |     5|acc   |0.7203|±  |0.0078|
| - social_sciences|N/A    |none  |     5|acc   |0.7663|±  |0.0075|
| - stem           |N/A    |none  |     5|acc   |0.5652|±  |0.0085|

vllm (pretrained=nm-testing/Meta-Llama-3-8B-Instruct-FP8,quantization=fp8,gpu_memory_utilization=0.4), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 16
|      Groups      |Version|Filter|n-shot|Metric|Value |   |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu              |N/A    |none  |     0|acc   |0.6567|±  |0.0038|
| - humanities     |N/A    |none  |     5|acc   |0.6072|±  |0.0068|
| - other          |N/A    |none  |     5|acc   |0.7206|±  |0.0078|
| - social_sciences|N/A    |none  |     5|acc   |0.7618|±  |0.0075|
| - stem           |N/A    |none  |     5|acc   |0.5649|±  |0.0085|
```
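
A hedged sketch of reproducing the FP8 numbers above with lm-evaluation-harness' Python API; the arguments mirror the config line printed in the results (5-shot, batch size 16):

```python
import lm_eval

# Evaluate the FP8 checkpoint through the vLLM backend; model_args
# mirror the configuration shown in the results block above.
results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=nm-testing/Meta-Llama-3-8B-Instruct-FP8,"
        "quantization=fp8,gpu_memory_utilization=0.4"
    ),
    tasks=["mmlu"],
    num_fewshot=5,
    batch_size=16,
)
print(results["results"]["mmlu"])
```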