---
datasets: LeoLM/wikitext-en-de
license: other
license_link: https://llama.meta.com/llama3/license/
---
This is a quantized version of [Llama-3-SauerkrautLM-70b-Instruct](https://huggingface.co/VAGOsolutions/Llama-3-SauerkrautLM-70b-Instruct), produced with GPTQ, the quantization method developed at [IST Austria](https://ist.ac.at/en/research/alistarh-group/), using the following configuration:
- Bits: 4
- Act order: True
- Group size: 128
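
For intuition, a group size of 128 means each run of 128 weights shares one scale, and every weight is stored as a 4-bit integer. Below is a minimal illustrative sketch of that storage scheme only — not GPTQ itself, which additionally uses second-order error correction and (with act order enabled) quantizes weights in order of activation importance:

```python
def quantize_group(weights, bits=4):
    # One shared scale per group; 4 bits give integer codes in [-8, 7]
    # for symmetric quantization.
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_group(codes, scale):
    # Reconstruct approximate weights from the integer codes.
    return [c * scale for c in codes]

weights = [0.5, -1.2, 0.03, 0.77]
codes, scale = quantize_group(weights)
recon = dequantize_group(codes, scale)
```

The reconstruction error per weight is bounded by half the group's scale, which is why smaller groups (more scales) trade model size for accuracy.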

Access the model:
```
curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{
    "model": "cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ",
    "prompt": "San Francisco is a"
}'
```
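
The same request can be issued from Python using only the standard library; a small sketch (the helper name is ours, and it assumes the completions server above is running on localhost:8000):

```python
import json
import urllib.request

def build_request(model: str, prompt: str,
                  url: str = "http://localhost:8000/v1/completions") -> urllib.request.Request:
    # Same JSON body and Content-Type header as the curl command above.
    payload = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(url, data=payload,
                                  headers={"Content-Type": "application/json"})

req = build_request("cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ",
                    "San Francisco is a")
# With the server running: response = urllib.request.urlopen(req).read()
```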

## Evaluations
| __English__ | __[Llama-3-SauerkrautLM-70b-Instruct](https://huggingface.co/VAGOsolutions/Llama-3-SauerkrautLM-70b-Instruct)__ | __[Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b)__ | __[Llama-3-SauerkrautLM-70b-Instruct-GPTQ](https://huggingface.co/cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ)__ |
|:--------------------|:---------|:---------|:---------|
| Avg. | 78.17 | 78.1 | 76.72 |
| ARC | 74.5 | 74.4 | 73.0 |
| Hellaswag | 79.2 | 79.2 | 78.0 |
| MMLU | 80.8 | 80.7 | 79.15 |
| | | | |
| __German__ | __[Llama-3-SauerkrautLM-70b-Instruct](https://huggingface.co/VAGOsolutions/Llama-3-SauerkrautLM-70b-Instruct)__ | __[Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b)__ | __[Llama-3-SauerkrautLM-70b-Instruct-GPTQ](https://huggingface.co/cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ)__ |
| Avg. | 70.83 | 70.47 | 69.13 |
| ARC_de | 66.7 | 66.2 | 65.9 |
| Hellaswag_de | 70.8 | 71.0 | 68.8 |
| MMLU_de | 75.0 | 74.2 | 72.7 |
| | | | |
| __Safety__ | __[Llama-3-SauerkrautLM-70b-Instruct](https://huggingface.co/VAGOsolutions/Llama-3-SauerkrautLM-70b-Instruct)__ | __[Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b)__ | __[Llama-3-SauerkrautLM-70b-Instruct-GPTQ](https://huggingface.co/cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ)__ |
| Avg. | 65.86 | 65.94 | 65.94 |
| RealToxicityPrompts | 97.6 | 97.8 | 98.4 |
| TruthfulQA | 67.07 | 66.92 | 65.56 |
| CrowS | 32.92 | 33.09 | 33.87 |

Take these scores with caution: we did not check for data contamination.
Evaluation was done using [Eval. Harness](https://github.com/EleutherAI/lm-evaluation-harness) with `limit=1000` for large datasets.
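
As a sanity check, each Avg. row is the arithmetic mean of the three task scores in its block, rounded to two decimals. For the English block (values copied from the table above):

```python
def avg(scores):
    # Mean of the per-task scores, rounded as in the table.
    return round(sum(scores) / len(scores), 2)

# ARC, Hellaswag, MMLU for each English column.
english = {
    "Llama-3-SauerkrautLM-70b-Instruct":         [74.5, 79.2, 80.8],
    "Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b": [74.4, 79.2, 80.7],
    "Llama-3-SauerkrautLM-70b-Instruct-GPTQ":    [73.0, 78.0, 79.15],
}
averages = {name: avg(s) for name, s in english.items()}
```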

## Performance
| | requests/s | tokens/s |
|:--------------|-------------:|-----------:|
| NVIDIA L40Sx2 | 2.19 | 1044.76 |

Performance measured on [cortecs inference](https://cortecs.ai).
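
Dividing the two throughput figures gives the average completion length implied by this benchmark (the exact request mix and sampling settings are not stated here):

```python
requests_per_s = 2.19     # from the performance table above
tokens_per_s = 1044.76
tokens_per_request = tokens_per_s / requests_per_s   # roughly 477 tokens per request
```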