markoarnauto committed
Commit 16f9394
1 Parent(s): 033dfe3
Upload README.md with huggingface_hub

README.md CHANGED
@@ -25,24 +25,24 @@ curl http://localhost:8000/v1/completions -H "Content-Type: application/json
 ```
 
 ## Evaluations
-| __English__         | __SKLM Llama-3 70B Instruct__ | __SKLM Llama-3 70B Instruct GPTQ__ |
-|:--------------------|:------------------------------|:-----------------------------------|
-| Avg.                | 78.17                         | 76.72                              |
-| ARC                 | 74.5                          | 73.0                               |
-| Hellaswag           | 79.2                          | 78.0                               |
-| MMLU                | 80.8                          | 79.15                              |
-|                     |                               |                                    |
-| __German__          | __SKLM Llama-3 70B Instruct__ | __SKLM Llama-3 70B Instruct GPTQ__ |
-| Avg.                | 70.83                         | 69.13                              |
-| ARC_de              | 66.7                          | 65.9                               |
-| Hellaswag_de        | 70.8                          | 68.8                               |
-| MMLU_de             | 75.0                          | 72.7                               |
-|                     |                               |                                    |
-| __Safety__          | __SKLM Llama-3 70B Instruct__ | __SKLM Llama-3 70B Instruct GPTQ__ |
-| Avg.                | 65.86                         | 65.94                              |
-| RealToxicityPrompts | 97.6                          | 98.4                               |
-| TruthfulQA          | 67.07                         | 65.56                              |
-| CrowS               | 32.92                         | 33.87                              |
+| __English__         | __SKLM Llama-3 70B Instruct__ | __SKLM Llama-3 70B Instruct GPTQ__ | __Llama-3 70B Instruct__ |
+|:--------------------|:------------------------------|:-----------------------------------|:-------------------------|
+| Avg.                | 78.17                         | 76.72                              | 76.19                    |
+| ARC                 | 74.5                          | 73.0                               | 71.6                     |
+| Hellaswag           | 79.2                          | 78.0                               | 77.3                     |
+| MMLU                | 80.8                          | 79.15                              | 79.66                    |
+|                     |                               |                                    |                          |
+| __German__          | __SKLM Llama-3 70B Instruct__ | __SKLM Llama-3 70B Instruct GPTQ__ | __Llama-3 70B Instruct__ |
+| Avg.                | 70.83                         | 69.13                              | 68.43                    |
+| ARC_de              | 66.7                          | 65.9                               | 64.2                     |
+| Hellaswag_de        | 70.8                          | 68.8                               | 67.8                     |
+| MMLU_de             | 75.0                          | 72.7                               | 73.3                     |
+|                     |                               |                                    |                          |
+| __Safety__          | __SKLM Llama-3 70B Instruct__ | __SKLM Llama-3 70B Instruct GPTQ__ | __Llama-3 70B Instruct__ |
+| Avg.                | 65.86                         | 65.94                              | 64.28                    |
+| RealToxicityPrompts | 97.6                          | 98.4                               | 97.9                     |
+| TruthfulQA          | 67.07                         | 65.56                              | 61.91                    |
+| CrowS               | 32.92                         | 33.87                              | 33.04                    |
 
 Take with caution. We did not check for data contamination.
 Evaluation was done using [Eval. Harness](https://github.com/EleutherAI/lm-evaluation-harness) using `limit=1000` for big datasets.
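For context, a run of this shape can be reproduced with the lm-evaluation-harness CLI. This is a minimal sketch: the model path, task selection, and batch size are illustrative assumptions, not values recorded in this commit; only the `limit=1000` setting comes from the README.

```bash
# Hypothetical lm-evaluation-harness invocation; model path and task list
# are assumptions. Only --limit 1000 is taken from the README note.
lm_eval --model hf \
  --model_args pretrained=/path/to/sklm-llama-3-70b-instruct-gptq \
  --tasks arc_challenge,hellaswag,mmlu \
  --limit 1000 \
  --batch_size auto
```

`--limit 1000` caps each task at its first 1000 examples, which matches the README's note about subsampling big datasets.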