This repository contains an xMADified version of [`mistralai/Mistral-Small-Instruct-2409`](https://huggingface.co/mistralai/Mistral-Small-Instruct-2409).
1. **Memory-efficiency:** The full-precision model is around 44 GB, while this xMADified model is only 12 GB, making it feasible to run on a 16 GB GPU.
2. **Accuracy:** This xMADified model preserves the quality of the full-precision model. The table below reports zero-shot accuracy on popular benchmarks for this xMADified model, the [GPTQ](https://github.com/AutoGPTQ/AutoGPTQ)-quantized model, and the full-precision model. The GPTQ model fails on the difficult **MMLU** task, while the xMADified model offers significantly higher accuracy.
| Model | Size | MMLU | Arc Challenge | Arc Easy | LAMBADA | WinoGrande | PIQA |
|---|---|---|---|---|---|---|---|
| mistralai/Mistral-Small-Instruct-2409 | 44.5 GB | 69.48 | 58.79 | 84.72 | 79.06 | 79.08 | 82.43 |
| GPTQ Mistral-Small-Instruct-2409 | 12.2 GB | 49.45 | 56.14 | 80.64 | 75.10 | 77.74 | 77.48 |
| xMADified Mistral-Small-Instruct-2409 (this model) | 12.2 GB | **68.59** | **57.51** | **82.83** | **77.74** | **79.56** | **81.34** |
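The sizes in the table are consistent with 4-bit weight quantization of a roughly 22B-parameter model. A minimal back-of-the-envelope sketch; the ~22.2B parameter count and the ~4.4 effective bits per weight are assumptions for illustration, not figures stated in this card:

```python
# Rough weight-storage estimate for weight-only quantization.
# Assumptions (not from this model card): ~22.2e9 parameters,
# 16-bit (2-byte) weights in the full-precision checkpoint, and
# ~4.4 effective bits per weight after 4-bit quantization
# (group-wise scales and zero-points add overhead beyond the raw 4 bits).

GB = 1e9  # decimal gigabytes, matching the units in the table


def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-storage size in GB."""
    return n_params * bits_per_weight / 8 / GB


full = weight_memory_gb(22.2e9, 16)    # close to the 44.5 GB above
quant = weight_memory_gb(22.2e9, 4.4)  # close to the 12.2 GB above

print(f"full-precision ~ {full:.1f} GB, 4-bit ~ {quant:.1f} GB")
```

The extra ~0.4 bits per weight in the quantized estimate stands in for the per-group quantization metadata (scales and zero-points) that 4-bit checkpoints typically store alongside the packed weights.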
# How to Run Model