|
--- |
|
license: apache-2.0 |
|
language: |
|
- de |
|
tags: |
|
- dpo |
|
- alignment-handbook |
|
- awq |
|
- quantization |
|
--- |
|
<div align="center"> |
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/6474c16e7d131daf633db8ad/-mL8PSG00X2lEw1lb8E1Q.png">
|
</div> |
|
|
|
# AWQ Version of Phoenix
|
|
|
| Bits | Group Size | Calibration Dataset | Seq Len |
| ---- | ---------- | ------------------- | ------- |
| 4 | 128 | c4 | 4096 |
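
For reference, the sketch below shows how a 4-bit, group-size-128 AWQ export with these settings could be produced using the AutoAWQ library. The source checkpoint path and calibration arguments are assumptions, not the exact command used for this model:

```python
# Hypothetical sketch of an AWQ export with the settings listed above
# (4-bit weights, group size 128). Paths and calibration details are assumptions.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

src = "DRXD1000/Phoenix"  # assumed full-precision source checkpoint
quant_config = {"w_bit": 4, "q_group_size": 128, "zero_point": True, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(src)
tokenizer = AutoTokenizer.from_pretrained(src)

# Calibration: the table above lists c4 with a sequence length of 4096;
# AutoAWQ takes its calibration data via quantize()'s calibration arguments.
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized("Phoenix-AWQ")
tokenizer.save_pretrained("Phoenix-AWQ")
```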
|
|
|
# Model Card for Phoenix |
|
|
|
|
|
**Phoenix** is a model trained with Direct Preference Optimization (DPO) for the German language. Its training procedure follows the process of the [alignment-handbook](https://github.com/huggingface/alignment-handbook) from Hugging Face.
|
In contrast to Zephyr and Notus, this model has been trained on German instruction and DPO data. Specifically, German translations of HuggingFaceH4/ultrachat_200k and HuggingFaceH4/ultrafeedback_binarized were created, in addition to a number of already available instruction datasets. The LLM haoranxu/ALMA-13B was used for the translation.
|
While the original Mistral model performs really well, it is not well suited to the German language. We therefore built on the fantastic LeoLM/leo-mistral-hessianai-7b.
|
Thanks to this type of training, Phoenix is not only able to compete with the Mistral model from LeoLM, but also **beats the Llama-2-70b-chat model in 2 MT-Bench categories**.
|
This model **would not have been possible without the amazing work of Hugging Face, LeoLM, OpenBMB, Argilla, the ALMA team, and many others in the AI community**.

I would like to personally thank all the AI researchers who make the training of such models possible.
|
|
|
## MT-Bench-DE Scores |
|
Phoenix beats the LeoLM-Mistral model in all categories except coding and humanities.

Additionally, it also beats LeoLM/Llama-2-70b-chat in roleplay and reasoning, which shows the power of DPO.
|
|
|
```json
|
{ |
|
"first_turn": 6.39375, |
|
"second_turn": 5.1625, |
|
"categories": { |
|
"writing": 7.45, |
|
"roleplay": 7.9, |
|
"reasoning": 4.3, |
|
"math": 3.25, |
|
"coding": 2.5, |
|
"extraction": 5.9, |
|
"stem": 7.125, |
|
"humanities": 7.8 |
|
}, |
|
"average": 5.778124999999999 |
|
} |
|
``` |
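
As a sanity check, the reported average is simply the mean of the two turn scores: (6.39375 + 5.1625) / 2 = 5.778125.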
|
|
|
## Other Evaluations |
|
|
|
Florian Leuerer compared Phoenix to other LLMs. Check it out here:
|
|
|
[Evaluation of German LLMs](https://www.linkedin.com/posts/florian-leuerer-927479194_vermutlich-relativ-unbeobachtet-ist-gestern-activity-7151475428019388418-sAKR?utm_source=share&utm_medium=member_desktop)
|
|
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
- **Developed by:** Matthias Uhlig (building on the previous efforts and amazing work of Hugging Face H4, Argilla, and MistralAI)
|
- **Shared by:** Matthias Uhlig |
|
- **Model type:** A 7B GPT-like model, DPO fine-tuned
|
- **Language(s) (NLP):** German |
|
- **License:** Apache 2.0 (same as alignment-handbook/zephyr-7b-dpo-full) |
|
- **Finetuned from model:** [`LeoLM/leo-mistral-hessianai-7b`](https://huggingface.co/LeoLM/leo-mistral-hessianai-7b) |
|
|
|
### Model Sources |
|
|
|
- **Repository:** - |
|
- **Paper:** in progress |
|
- **Demo:** - |
|
|
|
## Training Details |
|
|
|
### Training Hardware |
|
|
|
We used a VM with 8 x A100 80GB GPUs hosted on Runpod.io.
|
|
|
### Training Data |
|
|
|
We used newly translated versions of [`HuggingFaceH4/ultrachat_200k`](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) and [`argilla/ultrafeedback-binarized-preferences`](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences).
|
|
|
The data used for training will be made public after additional quality inspection. |
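
As described above, the German data was produced by machine translation with [haoranxu/ALMA-13B](https://huggingface.co/haoranxu/ALMA-13B). Below is a minimal sketch of how a single example can be translated, following the prompt format from the ALMA model card; the generation settings are illustrative, not necessarily those used for Phoenix:

```python
# Hedged sketch: translating one instruction to German with ALMA-13B.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("haoranxu/ALMA-13B", torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("haoranxu/ALMA-13B")

text = "What is the capital of France?"
# Prompt format from the ALMA model card.
prompt = f"Translate this from English to German:\nEnglish: {text}\nGerman:"

inputs = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, i.e. the translation.
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```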
|
|
|
## Prompt template |
|
We use the same prompt template as [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta): |
|
``` |
|
<|system|> |
|
</s> |
|
<|user|> |
|
{prompt}</s> |
|
<|assistant|> |
|
``` |
|
|
|
It is also possible to use the model in a multi-turn setup:
|
``` |
|
<|system|> |
|
</s> |
|
<|user|> |
|
{prompt_1}</s> |
|
<|assistant|> |
|
{answer_1}</s> |
|
<|user|> |
|
{prompt_2}</s> |
|
<|assistant|> |
|
``` |
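
Rather than assembling these strings by hand, the tokenizer's chat template can produce the same format. A minimal sketch, assuming the tokenizer ships the Zephyr-style chat template shown above:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("DRXD1000/Phoenix-AWQ")

# Multi-turn conversation; the system turn is left empty, as in the template above.
messages = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "Was ist Deep Learning?"},
    {"role": "assistant", "content": "Deep Learning ist ein Teilgebiet des maschinellen Lernens."},
    {"role": "user", "content": "Nenne mir ein Anwendungsbeispiel."},
]

# tokenize=False returns the formatted prompt string; add_generation_prompt=True
# appends the final <|assistant|> tag so the model continues from there.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```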
|
|
|
## Usage |
|
You will first need to install `transformers` and `accelerate` (the latter just to ease device placement); since this is an AWQ checkpoint, the `autoawq` package is also required. Then you can run the following:
|
### Via `generate` |
|
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# AWQ kernels run in float16; loading this checkpoint requires the autoawq package.
model = AutoModelForCausalLM.from_pretrained("DRXD1000/Phoenix-AWQ", torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("DRXD1000/Phoenix-AWQ")

# The prompt is already written in the chat format shown above,
# so it is tokenized directly instead of going through apply_chat_template.
prompt = """<|system|>
</s>
<|user|>
Erkläre mir was KI ist.</s>
<|assistant|>
"""
inputs = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
outputs = model.generate(inputs, num_return_sequences=1, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```
|
|
|
## Ethical Considerations and Limitations |
|
|
|
As with all LLMs, the potential outputs of `DRXD1000/Phoenix-AWQ` cannot be predicted |
|
in advance, and the model may in some instances produce inaccurate, biased, or otherwise objectionable responses
|
to user prompts. Therefore, before deploying any applications of `DRXD1000/Phoenix-AWQ`, developers should |
|
perform safety testing and tuning tailored to their specific applications of the model. |
|
Please see Meta's [Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/). |
|
|
|
|
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training (see the sketch after this list):
|
- learning_rate: 5e-07 |
|
- train_batch_size: 8 |
|
- eval_batch_size: 4 |
|
- seed: 42 |
|
- distributed_type: multi-GPU |
|
- num_devices: 8 |
|
- total_train_batch_size: 64 |
|
- total_eval_batch_size: 32 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: linear |
|
- lr_scheduler_warmup_ratio: 0.1 |
|
- num_epochs: 1 |
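
For orientation, here is a minimal sketch of how these hyperparameters might map onto a DPO run with TRL's `DPOTrainer`, the trainer behind the alignment-handbook recipes. The base model is a placeholder for the actual SFT checkpoint, the dataset path is hypothetical (the German DPO data is not yet public), and `beta` is an assumption taken from the alignment-handbook Zephyr recipe; this is not the exact training script:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "LeoLM/leo-mistral-hessianai-7b"  # placeholder: the real run starts from an SFT checkpoint
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
ref_model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base)

# Hypothetical dataset path; DPOTrainer expects "prompt", "chosen" and "rejected" columns.
dataset = load_dataset("my-org/german-ultrafeedback-binarized", split="train")

# Values below mirror the hyperparameters listed above. The Adam settings
# (betas=(0.9, 0.999), eps=1e-8) are already the Transformers defaults.
args = TrainingArguments(
    output_dir="phoenix-dpo",
    learning_rate=5e-7,
    per_device_train_batch_size=8,   # x 8 GPUs = total train batch size 64
    per_device_eval_batch_size=4,    # x 8 GPUs = total eval batch size 32
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=1,
    bf16=True,  # assumption: mixed precision on the A100s mentioned above
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,  # frozen reference copy for the DPO loss
    args=args,
    beta=0.01,            # assumption: alignment-handbook Zephyr recipe default
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```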
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.35.0 |
|
- Pytorch 2.1.2+cu121 |
|
- Datasets 2.14.6 |
|
- Tokenizers 0.14.1 |