---
language:
- de
pipeline_tag: text-generation
tags:
- awq
- autoawq
license: apache-2.0
---
# ***WIP***
(Please bear with me; this model will keep improving and will get a license soon.)


_Hermes + Leo + German AWQ = Germeo_

# Germeo-7B-AWQ

A model that understands German and English but replies only in German, AWQ-quantized from [Hermeo-7B](https://huggingface.co/malteos/hermeo-7b).

### Model details

- **Merged from:** [leo-mistral-hessianai-7b-chat](https://huggingface.co/LeoLM/leo-mistral-hessianai-7b-chat) and [DPOpenHermes-7B-v2](https://huggingface.co/openaccess-ai-collective/DPOpenHermes-7B-v2)
- **Model type:** Causal decoder-only transformer language model
- **Languages:** Replies in German; understands both German and English
- **Calibration Data:** [LeoLM/OpenSchnabeltier](https://huggingface.co/datasets/LeoLM/OpenSchnabeltier)

### Quantization Procedure and Use Case:

The speciality of this model is that it replies solely in German, independently of the system message or prompt.
During the AWQ quantization I introduced OpenSchnabeltier as calibration data to stress the importance of German tokens.
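
For reference, a minimal sketch of how such a calibration run could look with AutoAWQ's `quantize()` API. The quantization config and dataset handling below are assumptions (AutoAWQ's common defaults), not the documented settings for this model:

```python
# Hypothetical reproduction sketch; config values are assumed defaults,
# not the exact settings used for germeo-7b-awq.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

base_path = "malteos/hermeo-7b"   # merged base model
quant_path = "germeo-7b-awq"      # output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(base_path)
tokenizer = AutoTokenizer.from_pretrained(base_path, trust_remote_code=True)

# Calibrating on German data biases AWQ's activation statistics toward
# German tokens, so their salient weight channels are preserved.
# The dataset's split/text column may need adjusting to its actual schema.
model.quantize(tokenizer, quant_config=quant_config,
               calib_data="LeoLM/OpenSchnabeltier")

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```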


### Usage

Setup with [AutoAWQ](https://github.com/casper-hansen/AutoAWQ):
```python
# pip install autoawq
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, TextStreamer

quant_path = "aari1995/germeo-7b-awq"

# Load model
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)
```

Setup with transformers (works in Colab):
```python
# pip install autoawq (https://github.com/casper-hansen/AutoAWQ) and pip install --upgrade transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

quant_path = "aari1995/germeo-7b-awq"

# Load model
model = AutoModelForCausalLM.from_pretrained(quant_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)
```

### Inference:
```python
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Convert prompt to tokens
prompt_template = """<|im_start|>system
Du bist ein hilfreicher Assistent.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant"""

prompt = "Schreibe eine Stellenanzeige für Data Scientist bei AXA!"

tokens = tokenizer(
    prompt_template.format(prompt=prompt), 
    return_tensors='pt'
).input_ids.cuda()

# Generate output
generation_output = model.generate(
    tokens, 
    streamer=streamer,
    max_new_tokens=1012
)
# tokenizer.decode(generation_output.flatten())
```
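
If the tokenizer ships a ChatML chat template (an assumption, not verified for this checkpoint), the same prompt can also be built with `apply_chat_template` instead of manual string formatting:

```python
# Assumes the tokenizer defines a ChatML chat template; otherwise use the
# manual prompt_template above.
messages = [
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
    {"role": "user", "content": prompt},
]
tokens = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).cuda()
```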

### FAQ
#### The model keeps generating user turns after its reply:
  To solve this, implement a custom stopping criterion:

```python
from transformers import StoppingCriteria

class GermeoStoppingCriteria(StoppingCriteria):
    def __init__(self, target_sequence, prompt):
        self.target_sequence = target_sequence
        self.prompt = prompt

    def __call__(self, input_ids, scores, **kwargs):
        # Decode the generated ids (uses the tokenizer from the enclosing scope)
        generated_text = tokenizer.decode(input_ids[0])
        # Strip the prompt so only newly generated text is checked
        generated_text = generated_text.replace(self.prompt, '')
        # Stop as soon as the target sequence appears
        if self.target_sequence in generated_text:
            return True  # Stop generation
        return False  # Continue generation

    # __len__ and __iter__ let a single instance be passed where
    # generate() expects a list-like StoppingCriteriaList
    def __len__(self):
        return 1

    def __iter__(self):
        yield self
```
The criterion expects the target sequence at which to stop (here the `<|im_end|>` token) and your input prompt, formatted exactly as it is passed to the model. Simply add it to the generation call:

```python
generation_output = model.generate(
    tokens, 
    streamer=streamer,
    max_new_tokens=1012,
    stopping_criteria=GermeoStoppingCriteria("<|im_end|>", prompt_template.format(prompt=prompt))
)
```
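
Alternatively, if `<|im_end|>` maps to a single token id in this tokenizer (an assumption, not verified here), generation can be stopped via `eos_token_id`, which avoids decoding the full sequence on every step:

```python
# Assumption: "<|im_end|>" is a single token in this tokenizer's vocabulary.
im_end_id = tokenizer.convert_tokens_to_ids("<|im_end|>")

generation_output = model.generate(
    tokens,
    streamer=streamer,
    max_new_tokens=1012,
    eos_token_id=im_end_id,
)
```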
### Acknowledgements and Special Thanks

- Thank you [malteos](https://huggingface.co/malteos/) for hermeo, without which this model would not be possible (and for all your other contributions)!
- Thanks to the authors of the base models: [Mistral](https://mistral.ai/), [LAION](https://laion.ai/), [HessianAI](https://hessian.ai/), [Open Access AI Collective](https://huggingface.co/openaccess-ai-collective), [@teknium](https://huggingface.co/teknium), [@bjoernp](https://huggingface.co/bjoernp)
- Thanks also to [@bjoernp](https://huggingface.co/bjoernp) for your contributions, and to LeoLM for OpenSchnabeltier.

## Evaluation and Benchmarks (German only)


### German benchmarks

| **German tasks:**             | **MMLU-DE**    | **Hellaswag-DE** | **ARC-DE**      |**Average**      |
|-------------------------------|-------------|---------------|--------------|--------------|
| **Models / Few-shots:**       | _(5 shots)_ | _(10 shots)_  | _(24 shots)_ | |
| _7B parameters_      |  | |  | |
| llama-2-7b                    | 0.400       | 0.513         | 0.381        | 0.431  |
| leo-hessianai-7b              | 0.400       | 0.609         | 0.429        | 0.479 |
| bloom-6b4-clp-german          | 0.274       | 0.550         | 0.351        | 0.392 |
| mistral-7b                    | **0.524**       | 0.588         | 0.473        | 0.528 |
| leo-mistral-hessianai-7b      | 0.481       | 0.663         | 0.485        | 0.543 |
| leo-mistral-hessianai-7b-chat | 0.458       | 0.617         | 0.465        | 0.513 |
| DPOpenHermes-7B-v2            | 0.517         | 0.603         | 0.515        | 0.545 |
| hermeo-7b                     | 0.511       | **0.668**         | **0.528**        | **0.569** |
| **germeo-7b-awq (this model)**| 0.522       | 0.651         | 0.514        | 0.563 |
| _13B parameters_      |  | |  | |
| llama-2-13b                    | 0.469       | 0.581        | 0.468        | 0.506 |
| leo-hessianai-13b              | **0.486**       | **0.658**         | **0.509**       | **0.551** |
| _70B parameters_      |  | |  | |
| llama-2-70b                    | 0.597       | 0.674       | 0.561       | 0.611 |
| leo-hessianai-70b              | **0.653**       | **0.721**         | **0.600**       | **0.658** |


### German reply rate benchmark
The fraction of German replies according to [this benchmark](https://huggingface.co/spaces/floleuerer/german_llm_outputs):

| **Models:**             | **German Response Rate**    |
|-------------------------|-------------------------|
| hermeo-7b                     | tba      |
| **germeo-7b-awq (this model)**| tba       |

### Additional Benchmarks:

TruthfulQA-DE: 0.508