---
language:
- de
pipeline_tag: text-generation
tags:
- awq
- autoawq
license: apache-2.0
---
# ***WIP***
(Please bear with me, this model will improve and receive a proper license soon)
_Hermes + Leo + German AWQ = Germeo_
# Germeo-7B-AWQ
A model that understands both German and English but replies only in German, merged from [Hermeo-7B](https://huggingface.co/malteos/hermeo-7b).
### Model details
- **Merged from:** [leo-mistral-hessianai-7b-chat](https://huggingface.co/LeoLM/leo-mistral-hessianai-7b-chat) and [DPOpenHermes-7B-v2](https://huggingface.co/openaccess-ai-collective/DPOpenHermes-7B-v2)
- **Model type:** Causal decoder-only transformer language model
- **Languages:** Replies in German, with both German and English understanding capabilities
- **Calibration Data:** [LeoLM/OpenSchnabeltier](https://huggingface.co/datasets/LeoLM/OpenSchnabeltier)
### Quantization Procedure and Use Case:
The speciality of this model is that it replies solely in German, independent of the system message or prompt.
During the AWQ quantization process, OpenSchnabeltier was introduced as calibration data to stress the importance of German tokens.
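For illustration, here is a minimal sketch of how such a quantization run could look with AutoAWQ. This is an assumption about the procedure, not the exact script used: the source checkpoint (`malteos/hermeo-7b`), the quantization config, the sample count, and the dataset column name are all placeholders.

```python
# Hypothetical quantization sketch; the actual script and parameters may differ.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
from datasets import load_dataset

base_path = "malteos/hermeo-7b"  # assumed full-precision source model
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(base_path)
tokenizer = AutoTokenizer.from_pretrained(base_path, trust_remote_code=True)

# Use German instructions as calibration data instead of the default English corpus
calib = load_dataset("LeoLM/OpenSchnabeltier", split="train")
# NOTE: the text column name below is an assumption; check the dataset schema.
calib_texts = [row["instruction_de"] for row in calib.select(range(512))]

model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_texts)
model.save_quantized("germeo-7b-awq")
tokenizer.save_pretrained("germeo-7b-awq")
```

Passing German text as `calib_data` biases the activation statistics that AWQ uses to pick salient weight channels toward German tokens, which is the stated goal of this model.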
### Usage
Setup in autoawq
```python
# Setup: pip install autoawq (https://github.com/casper-hansen/AutoAWQ)
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, TextStreamer
quant_path = "aari1995/germeo-7b-awq"
# Load model
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)
```
Setup in transformers (works in colab)
```python
# Setup: pip install autoawq (https://github.com/casper-hansen/AutoAWQ) and pip install --upgrade transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
quant_path = "aari1995/germeo-7b-awq"
# Load model
model = AutoModelForCausalLM.from_pretrained(quant_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)
```
### Inference:
```python
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
# Convert prompt to tokens
prompt_template = """<|im_start|>system
Du bist ein hilfreicher Assistent.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant"""
prompt = "Schreibe eine Stellenanzeige für Data Scientist bei AXA!"
tokens = tokenizer(
prompt_template.format(prompt=prompt),
return_tensors='pt'
).input_ids.cuda()
# Generate output
generation_output = model.generate(
tokens,
streamer=streamer,
max_new_tokens=1012
)
# tokenizer.decode(generation_output.flatten())
```
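Alternatively, with a recent transformers version, the ChatML formatting can be delegated to the tokenizer. This is a sketch that assumes the tokenizer on the Hub ships a chat template; if it does not, use the manual `prompt_template` above.

```python
# Sketch: relies on the tokenizer shipping a ChatML chat template (an assumption).
messages = [
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
    {"role": "user", "content": "Schreibe eine Stellenanzeige für Data Scientist bei AXA!"},
]
tokens = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).cuda()
generation_output = model.generate(tokens, streamer=streamer, max_new_tokens=1012)
```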
### FAQ
#### The model continues generating user inputs after its reply:
To solve this, implement a custom stopping criterion:
```python
from transformers import StoppingCriteria

class GermeoStoppingCriteria(StoppingCriteria):
    def __init__(self, target_sequence, prompt):
        self.target_sequence = target_sequence
        self.prompt = prompt

    def __call__(self, input_ids, scores, **kwargs):
        # Decode the full sequence and strip the prompt so that only
        # the newly generated text is checked
        generated_text = tokenizer.decode(input_ids[0])
        generated_text = generated_text.replace(self.prompt, '')
        # Check if the target sequence appears in the generated text
        if self.target_sequence in generated_text:
            return True  # Stop generation
        return False  # Continue generation

    # generate() expects a StoppingCriteriaList; defining __len__ and
    # __iter__ lets a single criterion instance be passed directly.
    def __len__(self):
        return 1

    def __iter__(self):
        yield self
```
The criterion expects your input prompt (formatted exactly as it was given to the model) and the target sequence to stop at, in this case the `<|im_end|>` token. Simply add it to the generation call:
```python
generation_output = model.generate(
tokens,
streamer=streamer,
max_new_tokens=1012,
stopping_criteria=GermeoStoppingCriteria("<|im_end|>", prompt_template.format(prompt=prompt))
)
```
### Acknowledgements and Special Thanks
- Thank you [malteos](https://huggingface.co/malteos/) for Hermeo, without which this model would not be possible (and for all your other contributions)!
- Thanks to the authors of the base models: [Mistral](https://mistral.ai/), [LAION](https://laion.ai/), [HessianAI](https://hessian.ai/), [Open Access AI Collective](https://huggingface.co/openaccess-ai-collective), [@teknium](https://huggingface.co/teknium), [@bjoernp](https://huggingface.co/bjoernp)
- Thanks also to [@bjoernp](https://huggingface.co/bjoernp) for your contributions and to LeoLM for OpenSchnabeltier.
## Evaluation and Benchmarks (German only)
### German benchmarks
| **German tasks:** | **MMLU-DE** | **Hellaswag-DE** | **ARC-DE** |**Average** |
|-------------------------------|-------------|---------------|--------------|--------------|
| **Models / Few-shots:** | _(5 shots)_ | _(10 shots)_ | _(24 shots)_ | |
| _7B parameters_ | | | | |
| llama-2-7b | 0.400 | 0.513 | 0.381 | 0.431 |
| leo-hessianai-7b | 0.400 | 0.609 | 0.429 | 0.479 |
| bloom-6b4-clp-german | 0.274 | 0.550 | 0.351 | 0.392 |
| mistral-7b | **0.524** | 0.588 | 0.473 | 0.528 |
| leo-mistral-hessianai-7b | 0.481 | 0.663 | 0.485 | 0.543 |
| leo-mistral-hessianai-7b-chat | 0.458 | 0.617 | 0.465 | 0.513 |
| DPOpenHermes-7B-v2 | 0.517 | 0.603 | 0.515 | 0.545 |
| hermeo-7b | 0.511 | **0.668** | **0.528** | **0.569** |
| **germeo-7b-awq (this model)**| 0.522 | 0.651 | 0.514 | 0.563 |
| _13B parameters_ | | | | |
| llama-2-13b | 0.469 | 0.581 | 0.468 | 0.506 |
| leo-hessianai-13b | **0.486** | **0.658** | **0.509** | **0.551** |
| _70B parameters_ | | | | |
| llama-2-70b | 0.597 | 0.674 | 0.561 | 0.611 |
| leo-hessianai-70b | **0.653** | **0.721** | **0.600** | **0.658** |
### German reply rate benchmark
The fraction of replies that are in German, according to [this benchmark](https://huggingface.co/spaces/floleuerer/german_llm_outputs); a sketch of how such a rate can be measured follows the table.
| **Models:** | **German Response Rate** |
|-------------------------|-------------------------|
| hermeo-7b | tba |
| **germeo-7b-awq (this model)**| tba |
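For illustration, a German reply rate could be computed over a set of model outputs roughly as follows. This is a sketch using the `langdetect` package and is an assumption about the method; the linked benchmark may measure differently.

```python
# Hypothetical sketch; the linked benchmark may use a different detection method.
from langdetect import detect

def german_reply_rate(replies):
    """Fraction of replies whose detected language is German."""
    hits = sum(1 for reply in replies if detect(reply) == "de")
    return hits / len(replies)

print(german_reply_rate(["Hallo, wie kann ich helfen?", "Hello there!"]))  # 0.5
```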
### Additional Benchmarks:
TruthfulQA-DE: 0.508