---
datasets: wikitext
license: other
license_link: https://llama.meta.com/llama3/license/
---
This is a quantized version of [SKLM Llama-3 70B Instruct](https://huggingface.co/VAGOsolutions/Llama-3-SauerkrautLM-70b-Instruct), created with GPTQ as developed by [IST Austria](https://ist.ac.at/en/research/alistarh-group/), using the following configuration:
- Bits: 4 (an 8-bit version will follow)
- Act order: True
- Group size: 128
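
For reference, this configuration corresponds roughly to the following Hugging Face Transformers `GPTQConfig` (a sketch, not the exact recipe used; the calibration dataset is an assumption based on the `datasets: wikitext` card metadata):
```
from transformers import AutoTokenizer, GPTQConfig

# Hypothetical reconstruction of the settings listed above. The calibration
# dataset is assumed to be wikitext2, based on the card's `datasets` metadata.
tokenizer = AutoTokenizer.from_pretrained(
    "VAGOsolutions/Llama-3-SauerkrautLM-70b-Instruct"
)
gptq_config = GPTQConfig(
    bits=4,          # 4-bit weights; an 8-bit variant will follow
    group_size=128,  # one scale/zero-point per group of 128 weights
    desc_act=True,   # act order: quantize columns by activation magnitude
    dataset="wikitext2",
    tokenizer=tokenizer,
)
```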

## Usage
Install **vLLM** and run the [server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#openai-compatible-server):
```
python -m vllm.entrypoints.openai.api_server --model cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ
```
Access the model:
```
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ",
        "prompt": "Berlin ist eine"
    }'
```
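Since the server implements the OpenAI API, the official `openai` Python client can be used as well (a minimal sketch; the API key is a placeholder, as vLLM does not check it by default):
```
from openai import OpenAI

# Point the client at the local vLLM server. The API key is a dummy value;
# vLLM does not verify it unless explicitly configured to.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
completion = client.completions.create(
    model="cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ",
    prompt="Berlin ist eine",
    max_tokens=64,
)
print(completion.choices[0].text)
```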

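For offline batch inference without a server, vLLM's Python API can load the model directly (a sketch; `tensor_parallel_size=2` assumes a two-GPU setup such as the L40S pair benchmarked below):
```
from vllm import LLM, SamplingParams

# Load the GPTQ checkpoint across two GPUs and generate offline.
llm = LLM(
    model="cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ",
    tensor_parallel_size=2,
)
params = SamplingParams(max_tokens=64)
outputs = llm.generate(["Berlin ist eine"], params)
print(outputs[0].outputs[0].text)
```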
## Evaluations
| __English__   | __SKLM Llama-3 70B Instruct__   | __SKLM Llama-3 70B Instruct GPTQ__   | __SKLM Mixtral Instruct__   |
|:--------------|:--------------------------------|:-------------------------------------|:----------------------------|
| Avg.          | 78.17                           | 76.72                                | 73.47                       |
| ARC           | 74.5                            | 73.0                                 | 71.7                        |
| Hellaswag     | 79.2                            | 78.0                                 | 77.4                        |
| MMLU          | 80.8                            | 79.15                                | 71.31                       |

| __German__    | __SKLM Llama-3 70B Instruct__   | __SKLM Llama-3 70B Instruct GPTQ__   | __SKLM Mixtral Instruct__   |
|:--------------|:--------------------------------|:-------------------------------------|:----------------------------|
| Avg.          | 70.83                           | 69.13                                | 66.43                       |
| ARC_de        | 66.7                            | 65.9                                 | 62.7                        |
| Hellaswag_de  | 70.8                            | 68.8                                 | 72.9                        |
| MMLU_de       | 75.0                            | 72.7                                 | 63.7                        |

| __Safety__          | __SKLM Llama-3 70B Instruct__   | __SKLM Llama-3 70B Instruct GPTQ__   | __SKLM Mixtral Instruct__   |
|:--------------------|:--------------------------------|:-------------------------------------|:----------------------------|
| Avg.                | 65.86                           | 65.94                                | 64.18                       |
| RealToxicityPrompts | 97.6                            | 98.4                                 | 93.2                        |
| TruthfulQA          | 67.07                           | 65.56                                | 65.84                       |
| CrowS               | 32.92                           | 33.87                                | 33.51                       |

Take these results with caution; we did not check for data contamination. Evaluation was done with the [Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness), using `limit=1000` for large datasets.
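A run along these lines could be reproduced roughly as follows (a sketch assuming lm-evaluation-harness v0.4+; the exact task names and backend used here are not published):
```
import lm_eval

# Hypothetical reconstruction of the evaluation setup; the task list is an
# assumption based on the tables above. `limit=1000` matches the note above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ",
    tasks=["arc_challenge", "hellaswag", "mmlu"],
    limit=1000,
)
print(results["results"])
```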
## Performance
| GPU           |   requests/s |   tokens/s |
|:--------------|-------------:|-----------:|
| NVIDIA L40Sx2 |         2.19 |    1044.76 |