---
language:
- de
license: apache-2.0
tags:
- dpo
- alignment-handbook
---
<div align="center">
<img src=https://cdn-uploads.huggingface.co/production/uploads/6474c16e7d131daf633db8ad/-mL8PSG00X2lEw1lb8E1Q.png>
</div>
# Model Card for Phoenix
**Phoenix** is a model trained with Direct Preference Optimization (DPO) for the German language. Its training procedure follows the alignment-handbook from Hugging Face.
In contrast to Zephyr and Notus, this model was trained on German instruction and DPO data. In detail, German translations of HuggingFaceH4/ultrachat_200k
and HuggingFaceH4/ultrafeedback_binarized were created in addition to a series of already available instruction datasets. The LLM haoranxu/ALMA-13B was used for the translation.
While the Mistral model performs really well, it is not well suited to the German language. Therefore, we used the fantastic LeoLM/leo-mistral-hessianai-7b as the base model.
Thanks to the new type of training, Phoenix not only competes with the Mistral model from LeoLM but also **beats the Llama-70b-chat model in 2 MT-Bench categories**.
This model **wouldn't have been possible without the amazing work of Hugging Face, LeoLM, OpenBMB, Argilla, the ALMA team and many others in the AI community**.
I would like to personally thank all AI researchers who make the training of such models possible.
## MT-Bench-DE Scores
Phoenix beats the LeoLM-Mistral model in all categories except coding and humanities.
It also beats LeoLM/Llama-2-70b-chat in roleplay and reasoning, which shows the power of DPO.
```
{
"first_turn": 6.39375,
"second_turn": 5.1625,
"categories": {
"writing": 7.45,
"roleplay": 7.9,
"reasoning": 4.3,
"math": 3.25,
"coding": 2.5,
"extraction": 5.9,
"stem": 7.125,
"humanities": 7.8
},
"average": 5.778124999999999
}
```
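As a quick sanity check, the reported `average` can be reproduced from the numbers above: it matches both the mean of the eight category scores and the mean of the two turn scores. A small sketch (the variable names are illustrative and not part of any MT-Bench tooling):

```python
from statistics import mean

# Scores copied from the MT-Bench-DE result above
scores = {
    "first_turn": 6.39375,
    "second_turn": 5.1625,
    "categories": {
        "writing": 7.45,
        "roleplay": 7.9,
        "reasoning": 4.3,
        "math": 3.25,
        "coding": 2.5,
        "extraction": 5.9,
        "stem": 7.125,
        "humanities": 7.8,
    },
    "average": 5.778124999999999,
}

# Mean over the eight categories and mean over the two turns
category_avg = mean(scores["categories"].values())
turn_avg = mean([scores["first_turn"], scores["second_turn"]])

# Both agree with the reported average up to floating-point rounding
print(category_avg, turn_avg, scores["average"])
```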
## Other Evaluations
Florian Leuerer compared Phoenix to other LLMs. Check it out here:
['Evaluation of German LLMs'](https://www.linkedin.com/posts/florian-leuerer-927479194_vermutlich-relativ-unbeobachtet-ist-gestern-activity-7151475428019388418-sAKR?utm_source=share&utm_medium=member_desktop)
## Model Details
### Model Description
- **Developed by:** Matthias Uhlig (based on the previous efforts and amazing work of HuggingFace H4, Argilla and MistralAI)
- **Shared by:** Matthias Uhlig
- **Model type:** GPT-like 7B model, fine-tuned with DPO
- **Language(s) (NLP):** German
- **License:** Apache 2.0 (same as alignment-handbook/zephyr-7b-dpo-full)
- **Finetuned from model:** [`LeoLM/leo-mistral-hessianai-7b`](https://huggingface.co/LeoLM/leo-mistral-hessianai-7b)
### Model Sources
- **Repository:** -
- **Paper:** [`PHOENIX: Open-Source Language Adaption for Direct Preference Optimization`](https://arxiv.org/abs/2401.10580)
- **Demo:** -
## Training Details
### Training Hardware
We used a VM with 8 x A100 80GB GPUs hosted on Runpods.io.
### Training Data
We used newly translated versions of [`HuggingFaceH4/ultrachat_200k`](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) and [argilla/ultrafeedback-binarized-preferences](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences).
The data used for training will be made public after additional quality inspection.
## Prompt template
We use the same prompt template as [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta):
```
<|system|>
</s>
<|user|>
{prompt}</s>
<|assistant|>
```
It is also possible to use the model in a multi-turn setup:
```
<|system|>
</s>
<|user|>
{prompt_1}</s>
<|assistant|>
{answer_1}</s>
<|user|>
{prompt_2}</s>
<|assistant|>
```
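If you want to assemble such a prompt by hand instead of relying on the tokenizer's chat template, the layout above can be sketched as a plain string builder. `build_prompt` is a hypothetical helper, and the exact newline placement (including the trailing newline after the final `<|assistant|>` tag) is an assumption read off the template as displayed here, not taken from the tokenizer configuration:

```python
EOS = "</s>"  # end-of-sequence marker as shown in the template above

def build_prompt(turns, system=""):
    """Render (role, content) pairs into the Zephyr-style layout shown above.

    `turns` alternates "user" and "assistant" entries and should end with a
    user turn; the final open assistant tag asks the model to answer.
    """
    parts = [f"<|system|>\n{system}{EOS}"]
    for role, content in turns:
        if role not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<|{role}|>\n{content}{EOS}")
    parts.append("<|assistant|>")
    # Assumption: the answer starts on the line after the assistant tag
    return "\n".join(parts) + "\n"

# Two-turn conversation: one finished exchange plus a follow-up question
print(build_prompt([
    ("user", "Was ist KI?"),
    ("assistant", "KI steht für Künstliche Intelligenz."),
    ("user", "Nenne ein Beispiel."),
]))
```

In practice `tokenizer.apply_chat_template` is the safer route, since it uses the template shipped with the model.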
## Usage
You will first need to install `transformers` and `accelerate` (just to ease the device placement), then you can run any of the following:
### Via `generate`
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model in bfloat16; device_map="auto" (via accelerate) handles device placement
model = AutoModelForCausalLM.from_pretrained("DRXD1000/Phoenix", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("DRXD1000/Phoenix")

prompt = [
    {
        "role": "system",
        "content": "",  # Keep empty: system prompts are not recommended, Phoenix does not react well to them
    },
    {"role": "user", "content": "Erkläre mir was KI ist"},  # "Explain to me what AI is"
]

# Render the chat template, append the assistant tag, and tokenize in one step
inputs = tokenizer.apply_chat_template(prompt, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, num_return_sequences=1, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
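Note that `outputs[0]` holds the prompt tokens followed by the generated ones, so the decoded `response` contains the whole conversation rather than only the new answer. A minimal sketch of pulling out the last assistant turn, assuming the `<|assistant|>` marker survives decoding as plain text (this has not been verified against the Phoenix tokenizer); `extract_reply` is a hypothetical helper:

```python
ASSISTANT_TAG = "<|assistant|>"

def extract_reply(decoded: str) -> str:
    """Return the text after the last assistant tag, or the whole string if no tag is present."""
    _, tag, tail = decoded.rpartition(ASSISTANT_TAG)
    return tail.strip() if tag else decoded.strip()

# Illustrative decoded output (made up for this example, not real model output)
example = "<|system|>\n\n<|user|>\nErkläre mir was KI ist\n<|assistant|>\nKI steht für Künstliche Intelligenz."
print(extract_reply(example))
```

An alternative that does not depend on the marker text is to decode only the newly generated tokens, i.e. the part of `outputs[0]` after the first `inputs.shape[-1]` positions.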
## Ethical Considerations and Limitations
As with all LLMs, the potential outputs of `DRXD1000/Phoenix` cannot be predicted
in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses
to user prompts. Therefore, before deploying any applications of `DRXD1000/Phoenix`, developers should
perform safety testing and tuning tailored to their specific applications of the model.
Please see Meta's [Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/).
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
#### SFT Training
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 512
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 1
#### DPO Training
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
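
The total batch sizes listed above follow from the per-device batch size, the number of devices, and the gradient accumulation steps. A small sketch of that arithmetic, assuming gradient accumulation is 1 wherever it is not listed (the DPO run and both evaluation settings); `effective_batch_size` is an illustrative helper, not part of the training code:

```python
def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Samples contributing to one optimizer step across all devices."""
    return per_device * num_devices * grad_accum_steps

# SFT: 32 per device x 8 GPUs x 2 accumulation steps
sft_train = effective_batch_size(32, 8, grad_accum_steps=2)
sft_eval = effective_batch_size(16, 8)

# DPO: 8 per device x 8 GPUs, no accumulation listed (assumed 1)
dpo_train = effective_batch_size(8, 8)
dpo_eval = effective_batch_size(4, 8)

print(sft_train, sft_eval, dpo_train, dpo_eval)  # 512 128 64 32
```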
### Citation
```
@misc{uhlig2024phoenix,
title={PHOENIX: Open-Source Language Adaption for Direct Preference Optimization},
author={Matthias Uhlig and Sigurd Schacht and Sudarshan Kamath Barkur},
year={2024},
eprint={2401.10580},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
### Framework versions
- Transformers 4.35.0
- PyTorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1