Oscar Wu committed
Commit 97f1188
Parent(s): 78dc976
Updated README

README.md CHANGED
@@ -14,13 +14,14 @@ This repository contains [`mistralai/Mistral-Small-Instruct-2409`](https://huggi
1. **Memory-efficiency:** The full-precision model is around 44 GB, while this xMADified model is only 12 GB, making it feasible to run on a 16 GB GPU.

2. **Accuracy:** This xMADified model preserves the quality of the full-precision model. The table below presents the zero-shot accuracy of this xMADified model, the [GPTQ](https://github.com/AutoGPTQ/AutoGPTQ)-quantized model, and the full-precision model on popular benchmarks. The GPTQ model fails on the difficult **MMLU** task, while the xMADified model offers significantly higher accuracy.

-| Model | Size | MMLU | Arc Challenge | Arc Easy | LAMBADA | WinoGrande | PIQA |
-|---|---|---|---|---|---|---|---|
-| mistralai/Mistral-Small-Instruct-2409 | 44.5 GB | 69.48 | 58.79 | 84.72 | 79.06 | 79.08 | 82.43 |
-| GPTQ Mistral-Small-Instruct-2409 | 12.2 GB | 49.45 | 56.14 | 80.64 | 75.1 | 77.74 | 77.48 |
-| xMADified Mistral-Small-Instruct-2409 (this model) | 12.2 GB | **68.59** | **57.51** | **82.83** | **77.74** | **79.56** | **81.34** |
-
+| Model | Size | MMLU | Arc Challenge | Arc Easy | LAMBADA | WinoGrande | PIQA |
+| -------------------------------------------------- | ------- | --------- | ------------- | --------- | --------- | ---------- | --------- |
+| xMADified Mistral-Small-Instruct-2409 (this model) | 12.2 GB | **68.59** | **57.51** | **82.83** | **77.74** | **79.56** | **81.34** |
+| mistralai/Mistral-Small-Instruct-2409 | 44.5 GB | 69.48 | 58.79 | 84.72 | 79.06 | 79.08 | 82.43 |
+| GPTQ Mistral-Small-Instruct-2409 | 12.2 GB | 49.45 | 56.14 | 80.64 | 75.1 | 77.74 | 77.48 |
+
+3. **Fine-tuning:** These models can be fine-tuned on the same reduced (12 GB) hardware in just three clicks. Watch our product demo [here](https://www.youtube.com/watch?v=S0wX32kT90s&list=TLGGL9fvmJ-d4xsxODEwMjAyNA).
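The zero-shot numbers above are the kind reported by EleutherAI's lm-evaluation-harness. As a rough guide only (none of this is part of the commit), here is a minimal sketch of such an evaluation via the harness's Python API; `<xmadified-repo-id>` is a placeholder, and mapping the table's LAMBADA column to the `lambada_openai` task is an assumption:

```python
# Hedged sketch: zero-shot evaluation with lm-evaluation-harness
# (pip install lm-eval). "<xmadified-repo-id>" is a placeholder.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=<xmadified-repo-id>,device_map=auto",
    tasks=["mmlu", "arc_challenge", "arc_easy",
           "lambada_openai", "winogrande", "piqa"],
    num_fewshot=0,  # zero-shot, matching the table
)

# Print per-task metrics as reported by the harness.
for task, metrics in results["results"].items():
    print(task, metrics)
```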
# How to Run Model
@@ -28,8 +29,7 @@ Loading the model checkpoint of this xMADified model requires less than 12 GiB o
**Package prerequisites**: Run the following commands to install the required packages.
```bash
-pip install
-pip install -q --no-build-isolation auto-gptq
+pip install torch==2.4.0 transformers accelerate optimum && pip install -vvv --no-build-isolation "git+https://github.com/PanQiWei/[email protected]"
```
**Sample Inference Code**
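The sample inference code itself is unchanged by this commit, so the diff elides it. For orientation, a minimal sketch of the usual `transformers` loading path for a GPTQ-style checkpoint; the repo id and prompt are placeholders, not content from the README:

```python
# Hedged sketch: inference with a GPTQ-quantized checkpoint via transformers.
# Assumes the packages above are installed; "<xmadified-repo-id>" is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<xmadified-repo-id>"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# With optimum + auto-gptq installed, the quantized weights load directly;
# device_map="auto" places them on the available GPU.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
print(f"Checkpoint footprint: {model.get_memory_footprint() / 1e9:.1f} GB")  # ~12 GB expected

messages = [{"role": "user", "content": "Explain GPTQ quantization in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Generation needs headroom beyond the checkpoint itself for activations and KV cache, which is consistent with the 16 GB GPU target stated above.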
@@ -70,4 +70,4 @@ Here's a sample output of the model, using the code above:
# Contact Us

-For additional xMADified models, access to fine-tuning, and general questions, please contact us at [email protected] and join our waiting list.
+For additional xMADified models, access to fine-tuning, and general questions, please contact us at [email protected] and join our waiting list.