Commit by JonahYixMAD: Update README.md
This repository contains [`mistralai/Mistral-Small-Instruct-2409`](https://huggi
1. **Memory-efficiency:** The full-precision model is around 44 GB, while this xMADified model is only 12 GB, making it feasible to run on a 16 GB GPU.

2. **Accuracy:** This xMADified model preserves the quality of the full-precision model. In the table below, we present the zero-shot accuracy of this xMADified model on popular benchmarks against the [GPTQ](https://github.com/AutoGPTQ/AutoGPTQ)-quantized model (both w4g128 for a fair comparison). GPTQ falls short on the difficult **MMLU** task, while the xMADai model offers significantly higher accuracy.

| Model | MMLU | Arc Challenge | Arc Easy | LAMBADA | WinoGrande | PIQA |
|---|---|---|---|---|---|---|
| GPTQ Mistral-Small-Instruct-2409 | 49.45 | 56.14 | 80.64 | 75.10 | 77.74 | 77.48 |
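The 44 GB vs. 12 GB figures above can be sanity-checked with a back-of-the-envelope estimate. The sketch below assumes ~22B parameters for Mistral-Small-Instruct-2409, 16-bit full-precision weights, and roughly 4.25 effective bits per weight for w4g128 (4-bit weights plus per-group scale/zero-point overhead); the exact on-disk size also depends on layers typically left unquantized, such as embeddings.

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-storage size in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 22e9  # Mistral-Small-Instruct-2409 has ~22B parameters

# Full precision (fp16/bf16): 16 bits per weight.
print(f"full precision: ~{model_size_gb(N_PARAMS, 16):.0f} GB")

# w4g128: 4-bit weights, plus a per-group (group size 128) scale/zero-point,
# giving roughly 4.25 effective bits per weight (an assumed overhead figure).
print(f"w4g128 quantized: ~{model_size_gb(N_PARAMS, 4.25):.0f} GB")
```

These estimates line up with the stated sizes, which is why the quantized model fits comfortably on a 16 GB GPU with room left for activations and the KV cache.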