---
library_name: transformers
tags:
- chocolatine
- dpo
license: apache-2.0
datasets:
- jpacifico/french-orca-dpo-pairs-revised
language:
- fr
- en
---
### Chocolatine-2-14B-Instruct-v2.0.3
DPO fine-tuning of the merged model [jpacifico/Chocolatine-2-merged-qwen25arch](https://huggingface.co/jpacifico/Chocolatine-2-merged-qwen25arch) (Qwen-2.5-14B architecture)
using the [jpacifico/french-orca-dpo-pairs-revised](https://huggingface.co/datasets/jpacifico/french-orca-dpo-pairs-revised) RLHF dataset.
Training in French also improves the model's overall capabilities.
> [!TIP]
> Context window: up to 128K tokens
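For context, a DPO run on this preference dataset can be sketched with Hugging Face TRL. This is a minimal illustration only, not the actual recipe used for this checkpoint; the hyperparameters (beta, learning rate, batch sizes, epochs) are assumptions, and the dataset columns may need remapping to TRL's expected `prompt`/`chosen`/`rejected` format.

```python
# Minimal DPO sketch with TRL -- illustrative only, not the actual training recipe.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model = "jpacifico/Chocolatine-2-merged-qwen25arch"   # merged Qwen-2.5-14B base
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# French preference pairs; remap columns to "prompt"/"chosen"/"rejected" if needed.
dataset = load_dataset("jpacifico/french-orca-dpo-pairs-revised", split="train")

args = DPOConfig(
    output_dir="chocolatine-2-dpo",
    beta=0.1,                          # assumed DPO temperature
    learning_rate=5e-6,                # assumed
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,        # older TRL versions use tokenizer=...
)
trainer.train()
```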
### OpenLLM Leaderboard
Chocolatine-2 is the best-performing fine-tuned 14B model (tied for first, with an average score of 41.08) on the [OpenLLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
[Updated 2025-02-12]
| Metric |Value|
|-------------------|----:|
|**Avg.** |**41.08**|
|IFEval |70.37|
|BBH |50.63|
|MATH Lvl 5 |40.56|
|GPQA |17.23|
|MuSR |19.07|
|MMLU-PRO |48.60|
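The reported average is the mean of the six benchmark scores above, which you can verify directly:

```python
# IFEval, BBH, MATH Lvl 5, GPQA, MuSR, MMLU-PRO
scores = [70.37, 50.63, 40.56, 17.23, 19.07, 48.60]
print(round(sum(scores) / len(scores), 2))  # 41.08
```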
### LLM Leaderboard FR
Top 3 all categories on the French Government [Leaderboard LLM FR](https://huggingface.co/spaces/fr-gouv-coordination-ia/llm_leaderboard_fr#/)

[Updated 2025-02-15]
### MT-Bench-French
Chocolatine-2 outperforms its previous versions and the Qwen-2.5 base model it is built on, evaluated on [MT-Bench-French](https://huggingface.co/datasets/bofenghuang/mt-bench-french) with the [multilingual-mt-bench](https://github.com/Peter-Devine/multilingual_mt_bench) harness and GPT-4-Turbo as the LLM judge.
My goal was to match GPT-4o-mini's performance in French; according to this benchmark, this version reaches the OpenAI model's level.
```
########## First turn ##########
score
model turn
gpt-4o-mini 1 9.287500
Chocolatine-2-14B-Instruct-v2.0.3 1 9.112500
Qwen2.5-14B-Instruct 1 8.887500
Chocolatine-14B-Instruct-DPO-v1.2 1 8.612500
Phi-3.5-mini-instruct 1 8.525000
Chocolatine-3B-Instruct-DPO-v1.2 1 8.375000
DeepSeek-R1-Distill-Qwen-14B 1 8.375000
phi-4 1 8.300000
Phi-3-medium-4k-instruct 1 8.225000
gpt-3.5-turbo 1 8.137500
Chocolatine-3B-Instruct-DPO-Revised 1 7.987500
Meta-Llama-3.1-8B-Instruct 1 7.050000
vigostral-7b-chat 1 6.787500
Mistral-7B-Instruct-v0.3 1 6.750000
gemma-2-2b-it 1 6.450000
########## Second turn ##########
score
model turn
Chocolatine-2-14B-Instruct-v2.0.3 2 9.050000
gpt-4o-mini 2 8.912500
Qwen2.5-14B-Instruct 2 8.912500
Chocolatine-14B-Instruct-DPO-v1.2 2 8.337500
DeepSeek-R1-Distill-Qwen-14B 2 8.200000
phi-4 2 8.131250
Chocolatine-3B-Instruct-DPO-Revised 2 7.937500
Chocolatine-3B-Instruct-DPO-v1.2 2 7.862500
Phi-3-medium-4k-instruct 2 7.750000
gpt-3.5-turbo 2 7.679167
Phi-3.5-mini-instruct 2 7.575000
Meta-Llama-3.1-8B-Instruct 2 6.787500
Mistral-7B-Instruct-v0.3 2 6.500000
vigostral-7b-chat 2 6.162500
gemma-2-2b-it 2 6.100000
########## Average ##########
score
model
gpt-4o-mini 9.100000
Chocolatine-2-14B-Instruct-v2.0.3 9.081250
Qwen2.5-14B-Instruct 8.900000
Chocolatine-14B-Instruct-DPO-v1.2 8.475000
DeepSeek-R1-Distill-Qwen-14B 8.287500
phi-4 8.215625
Chocolatine-3B-Instruct-DPO-v1.2 8.118750
Phi-3.5-mini-instruct 8.050000
Phi-3-medium-4k-instruct 7.987500
Chocolatine-3B-Instruct-DPO-Revised 7.962500
gpt-3.5-turbo 7.908333
Meta-Llama-3.1-8B-Instruct 6.918750
Mistral-7B-Instruct-v0.3 6.625000
vigostral-7b-chat 6.475000
gemma-2-2b-it 6.275000
```
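The scores in the "Average" block are simply the mean of the two per-turn scores, e.g. for this model:

```python
turn_1, turn_2 = 9.1125, 9.0500   # first- and second-turn scores above
print((turn_1 + turn_2) / 2)      # 9.08125
```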
### Usage
You can run this model using my [Colab notebook](https://github.com/jpacifico/Chocolatine-LLM/blob/main/Chocolatine_14B_inference_test_colab.ipynb).
You can also run Chocolatine-2 with the following code:
```python
import transformers
from transformers import AutoTokenizer

model_name = "jpacifico/Chocolatine-2-14B-Instruct-v2.0.3"

# Format the prompt with the model's chat template
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(model_name)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create the text-generation pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=model_name,
    tokenizer=tokenizer
)

# Generate text
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]['generated_text'])
```
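For the 14B weights you will usually want reduced precision and automatic device placement. The sketch below assumes bfloat16 and `device_map="auto"`; the dtype, generation settings, and the French prompt are illustrative choices, not part of this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jpacifico/Chocolatine-2-14B-Instruct-v2.0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed dtype; use float16 if bf16 is unavailable
    device_map="auto",           # spread the 14B weights across available devices
)

chat = [
    {"role": "system", "content": "Tu es un assistant utile."},
    {"role": "user", "content": "Explique ce qu'est un grand modèle de langage."},
]
prompt = tokenizer.apply_chat_template(chat, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```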
### Limitations
The Chocolatine-2 model series is a quick demonstration that a base model can be easily fine-tuned to achieve compelling performance.
It does not have any moderation mechanism.
- **Developed by:** Jonathan Pacifico, 2025
- **Model type:** LLM
- **Language(s) (NLP):** French, English
- **License:** Apache-2.0
Made with ❤️ in France