markoarnauto committed
Commit 16f9394
1 Parent(s): 033dfe3

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +18 -18
README.md CHANGED
@@ -25,24 +25,24 @@ curl http://localhost:8000/v1/completions -H "Content-Type: application/json
 ```
 
 ## Evaluations
- | __English__ | __SKLM Llama-3 70B Instruct__ | __SKLM Llama-3 70B Instruct GPTQ__ | __SKLM Mixtral Instruct__ |
- |:--------------|:--------------------------------|:-------------------------------------|:----------------------------|
- | Avg. | 78.17 | 76.72 | 73.47 |
- | ARC | 74.5 | 73.0 | 71.7 |
- | Hellaswag | 79.2 | 78.0 | 77.4 |
- | MMLU | 80.8 | 79.15 | 71.31 |
- | | | | |
- | __German__ | __SKLM Llama-3 70B Instruct__ | __SKLM Llama-3 70B Instruct GPTQ__ | __SKLM Mixtral Instruct__ |
- | Avg. | 70.83 | 69.13 | 66.43 |
- | ARC_de | 66.7 | 65.9 | 62.7 |
- | Hellaswag_de | 70.8 | 68.8 | 72.9 |
- | MMLU_de | 75.0 | 72.7 | 63.7 |
- | | | | |
- | __Safety__ | __SKLM Llama-3 70B Instruct__ | __SKLM Llama-3 70B Instruct GPTQ__ | __SKLM Mixtral Instruct__ |
- | Avg. | 65.86 | 65.94 | 64.18 |
- | RealToxicityPrompts | 97.6 | 98.4 | 93.2 |
- | TruthfulQA | 67.07 | 65.56 | 65.84 |
- | CrowS | 32.92 | 33.87 | 33.51 |
+ | __English__ | __SKLM Llama-3 70B Instruct__ | __SKLM Llama-3 70B Instruct GPTQ__ | __Llama-3 70B Instruct__ |
+ |:--------------|:--------------------------------|:-------------------------------------|:---------------------------|
+ | Avg. | 78.17 | 76.72 | 76.19 |
+ | ARC | 74.5 | 73.0 | 71.6 |
+ | Hellaswag | 79.2 | 78.0 | 77.3 |
+ | MMLU | 80.8 | 79.15 | 79.66 |
+ | | | | |
+ | __German__ | __SKLM Llama-3 70B Instruct__ | __SKLM Llama-3 70B Instruct GPTQ__ | __Llama-3 70B Instruct__ |
+ | Avg. | 70.83 | 69.13 | 68.43 |
+ | ARC_de | 66.7 | 65.9 | 64.2 |
+ | Hellaswag_de | 70.8 | 68.8 | 67.8 |
+ | MMLU_de | 75.0 | 72.7 | 73.3 |
+ | | | | |
+ | __Safety__ | __SKLM Llama-3 70B Instruct__ | __SKLM Llama-3 70B Instruct GPTQ__ | __Llama-3 70B Instruct__ |
+ | Avg. | 65.86 | 65.94 | 64.28 |
+ | RealToxicityPrompts | 97.6 | 98.4 | 97.9 |
+ | TruthfulQA | 67.07 | 65.56 | 61.91 |
+ | CrowS | 32.92 | 33.87 | 33.04 |
 
 Take with caution. We did not check for data contamination.
 Evaluation was done using [Eval. Harness](https://github.com/EleutherAI/lm-evaluation-harness) using `limit=1000` for big datasets.
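The `limit=1000` note corresponds to the harness's `--limit` flag. A minimal reproduction sketch is shown below; only `--limit 1000` is taken from the README, while the model id placeholder, task selection, dtype, and batch size are illustrative assumptions.

```
# Hypothetical sketch of an lm-evaluation-harness run.
# Only --limit 1000 comes from the README note above; the model id,
# tasks, dtype, and batch size are assumptions for illustration.
pip install lm_eval

lm_eval \
  --model hf \
  --model_args pretrained=<your-model-repo-id>,dtype=bfloat16 \
  --tasks arc_challenge,hellaswag,mmlu \
  --limit 1000 \
  --batch_size auto \
  --output_path results/
```

`--limit 1000` caps each task at its first 1000 examples, which is the caveat flagged for the larger datasets above.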