---
license: llama3
base_model: meta-llama/Meta-Llama-3-8B
tags:
- ORPO
- llama 3 8B
- conversational
datasets:
- BramVanroy/ultra_feedback_dutch
model-index:
- name: ReBatch/Llama-3-8B-dutch
  results: []
language:
- nl
pipeline_tag: text-generation
---

<p align="center" style="margin:0;padding:0">
  <img src="llama3-8b-dutch-banner.jpeg" alt="Llama 3 dutch banner" width="400" height="400"/>
</p>

<div style="margin:auto; text-align:center">
<h1 style="margin-bottom: 0">Llama 3 8B - Dutch</h1>
<em>A conversational model for Dutch, based on Llama 3 8B</em>
<p><em><a href="https://huggingface.co/spaces/ReBatch/Llama-3-Dutch">Try chatting with the model!</a></em></p>
</div>

This model is a [QLoRA](https://huggingface.co/blog/4bit-transformers-bitsandbytes)- and [ORPO](https://huggingface.co/docs/trl/main/en/orpo_trainer)-fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B), trained on the synthetic feedback dataset [BramVanroy/ultra_feedback_dutch](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch).


## Model description
This is a Dutch chat model: Llama 3 8B aligned with [ORPO](https://huggingface.co/docs/trl/main/en/orpo_trainer) on the synthetic preference dataset [BramVanroy/ultra_feedback_dutch](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch).
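
A minimal inference sketch (not taken from the original card), assuming the tokenizer in this repository defines a chat template; sampling settings are illustrative only:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ReBatch/Llama-3-8B-dutch"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# "What are the three largest cities in the Netherlands?"
messages = [
    {"role": "user", "content": "Wat zijn de drie grootste steden van Nederland?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```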



## Intended uses & limitations
Although the model has been aligned with gpt-4-turbo outputs, which are subject to strong content filters, it can still generate incorrect, misleading, and potentially even offensive content. Use at your own risk.


## Training procedure

The model was trained in bfloat16 with QLoRA and Flash Attention 2 on a single H100 80GB SXM5 GPU for around 24 hours on RunPod.
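
A rough sketch of what such a 4-bit QLoRA model load with Flash Attention 2 could look like; this is not the exact training script, and the NF4/double-quantization settings are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",                 # assumption: typical QLoRA default
    bnb_4bit_compute_dtype=torch.bfloat16,     # compute in bfloat16, as described above
    bnb_4bit_use_double_quant=True,            # assumption
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",   # requires flash-attn to be installed
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```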

## Evaluation Results

The model was evaluated with [ScandEval](https://scandeval.com/dutch-nlg/) on the Dutch NLG benchmarks.

Results are mixed: the model gains a few points on some benchmarks and loses points on others, after training on only 200,000 samples for a single epoch. We are curious whether its performance would improve with more data or additional epochs.

| Model | conll_nl | dutch_social | scala_nl | squad_nl | wiki_lingua_nl | mmlu_nl | hellaswag_nl |
|:------|:--------:|:------------:|:--------:|:--------:|:--------------:|:-------:|:------------:|
| meta-llama/Meta-Llama-3-8B-Instruct | 68.72 | 14.67 | 32.91 | 45.36 | 67.62 | 36.18 | 33.91 |
| ReBatch/Llama-3-8B-dutch | 58.85 | 11.14 | 15.58 | 59.96 | 64.51 | 36.27 | 28.34 |
| meta-llama/Meta-Llama-3-8B | 62.26 | 10.45 | 30.30 | 62.99 | 65.17 | 36.38 | 28.33 |

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 2
- eval_batch_size: 2
- num_devices: 1
- gradient_accumulation_steps: 4
- optimizer: paged_adamw_8bit
- lr_scheduler_type: linear
- warmup_steps: 10
- num_epochs: 1.0
- r: 16
- lora_alpha: 32
- lora_dropout: 0.05
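
As a hedged illustration, these values could map onto TRL's `ORPOTrainer` and a PEFT LoRA config roughly as follows. Target modules, the ORPO beta, sequence lengths, and the output directory are assumptions, and exact argument names may differ between TRL versions:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import ORPOConfig, ORPOTrainer

model_id = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama 3 ships no pad token by default

# 4-bit base model for QLoRA, as in the training procedure above.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, torch_dtype=torch.bfloat16, device_map="auto"
)

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
)

args = ORPOConfig(
    output_dir="llama-3-8b-dutch-orpo",  # hypothetical
    learning_rate=8e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    lr_scheduler_type="linear",
    warmup_steps=10,
    num_train_epochs=1.0,
    bf16=True,
)

# The dataset is assumed to expose prompt/chosen/rejected columns; remap if needed.
dataset = load_dataset("BramVanroy/ultra_feedback_dutch", split="train")

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```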