---
license: mit
datasets:
  - R0k1e/UltraLink
  - stingning/ultrachat
  - ise-uiuc/Magicoder-Evol-Instruct-110K
  - ise-uiuc/Magicoder-OSS-Instruct-75K
language:
  - eng
  - fra
  - rus
  - spa
  - zho
metrics:
  - accuracy
---

<div align="center">

<img src="title.png" alt="UltraLink" width="200">

**multi-lingual, knowledge-grounded, multi-round dialogue dataset and model**

<p align="center">
 <a href="#Introduction"> Introduction </a> | <a href="#Construction-of-UltraLink">Construction Process</a> | <a href="https://arxiv.org/abs/2402.04588">Paper</a> | <a href="https://huggingface.co/datasets/R0k1e/UltraLink"> UltraLink</a> | <a href="https://github.com/OpenBMB/UltraLink"> GitHub</a>
</p>
</div>

# Model Card for UltraLink-LM

## Model Summary
> UltraLink-LM is a multilingual generative language model that follows instructions in five languages (English, French, Russian, Spanish, and Chinese) and generates high-quality, diverse text in all of them.
> UltraLink-LM outperforms [PolyLM-Chat-13b](https://huggingface.co/DAMO-NLP-MT/polylm-chat-13b), [Guanaco](https://huggingface.co/JosephusCheung/Guanaco), and [Bloomz-7b1-mt](https://huggingface.co/bigscience/bloomz-7b1-mt) in code, math, and chat abilities in four languages.
> UltraLink-LM is trained on [UltraLink](https://huggingface.co/datasets/R0k1e/UltraLink), [UltraChat](https://huggingface.co/datasets/stingning/ultrachat), [Magicoder-Evol](https://huggingface.co/datasets/ise-uiuc/Magicoder-Evol-Instruct-110K), [Magicoder-OSS](https://huggingface.co/datasets/ise-uiuc/Magicoder-OSS-Instruct-75K), [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA), and [ShareGPT](https://huggingface.co/datasets/openchat/openchat_sharegpt4_dataset/).
> We release the checkpoints under the MIT license to further our mission of multilingual technologies empowering a multilingual world.

- **Developed by:** [OpenBMB](https://www.openbmb.cn/home)
- **Model type:** A Transformer-style autoregressive multilingual language model.
- **Paper**: [UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset](https://arxiv.org/abs/2402.04588)
- **Languages**: Refer to the list of languages in the `language` section of this model card.
- **License**: MIT
- **Model**: [UltraLink-LM](https://huggingface.co/R0k1e/UltraLink-LM)
- **Model Size**: 13 billion parameters
- **Datasets**: [UltraLink](https://huggingface.co/datasets/R0k1e/UltraLink), [UltraChat](https://huggingface.co/datasets/stingning/ultrachat) (10k randomly sampled examples), [Magicoder-Evol](https://huggingface.co/datasets/ise-uiuc/Magicoder-Evol-Instruct-110K), [Magicoder-OSS](https://huggingface.co/datasets/ise-uiuc/Magicoder-OSS-Instruct-75K), [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA), and [ShareGPT](https://huggingface.co/datasets/openchat/openchat_sharegpt4_dataset/) (English samples longer than 4k only).

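As a rough sizing aid for a 13-billion-parameter checkpoint, the sketch below estimates the memory needed just to hold the weights at a given precision. It is plain arithmetic (parameters × bytes per parameter) and ignores activations, the KV cache, and framework overhead; the helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights only, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# 13B parameters: float32 stores 4 bytes per parameter, float16/bfloat16 store 2.
for dtype, nbytes in [("float32", 4), ("float16", 2)]:
    print(f"{dtype}: ~{weight_memory_gb(13e9, nbytes):.0f} GB")
```

This prints about 52 GB for float32 and 26 GB for float16; real usage during generation is higher than these weight-only figures.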
## Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "R0k1e/UltraLink-LM"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
ultralink_lm = AutoModelForCausalLM.from_pretrained(checkpoint)

# Chat abilities in Chinese
# What is heavy cavalry?
first_question = "<s>[INST] 什么是重骑兵? [/INST]"
chat_inputs = tokenizer.encode(first_question, add_special_tokens=False, return_tensors="pt")
chat_outputs = ultralink_lm.generate(chat_inputs, max_new_tokens=512)
first_answer = tokenizer.decode(chat_outputs[0])
print(first_answer)
# Expected output: 
"""
<s> [INST] 什么是重骑兵? [/INST] 重骑兵是一种历史上的战斗单位,通常由骑兵组成,他们在战斗中使用重型装甲和长矛。他们以在战场上的强大攻击能力而闻名,并且通常被用于突破敌军阵线或攻击敌方骑兵。重骑兵通常被认为是中世纪战争中最强大和最具威慑力的单位之一。</s>
"""
# English translation:
"""
<s> [INST] What is heavy cavalry? [/INST] Heavy cavalry is a historical combat unit, usually consisting of cavalrymen who use heavy armour and spears in battle. They were known for their powerful attacks on the battlefield and were often used to break through enemy lines or attack enemy cavalry. Heavy cavalry is often considered one of the most powerful and intimidating units in medieval warfare.</s>
"""

# Second turn:
second_question = "<s>[INST] 重骑兵对中世纪的战场有哪些影响? [/INST]"
second_question = first_answer + second_question 
chat_inputs = tokenizer.encode(second_question, add_special_tokens=False, return_tensors="pt")
chat_outputs = ultralink_lm.generate(chat_inputs, max_new_tokens=512)
second_answer = tokenizer.decode(chat_outputs[0])
print(second_answer)
# Expected output: 
"""
<s> [INST] 什么是重骑兵? [/INST] 重骑兵是一种历史上的战斗单位,通常由骑兵组成,他们在战斗中使用重型装甲和长矛。他们以在战场上的强大攻击能力而闻名,并且通常被用于突破敌军阵线或攻击敌方骑兵。重骑兵通常被认为是中世纪战争中最强大和最具威慑力的单位之一。</s><s> [INST] 重骑兵对中世纪的战场有哪些影响? [/INST] 重骑兵在中世纪的战场上起到了重要的作用。他们的强大攻击能力使他们成为战斗中的强大力量,并且他们的存在常常能够改变战斗的结果。重骑兵通常被用于突破敌军阵线,并在战斗中创造突破口,这使得其他部队能够进入敌方阵地。他们还被用于攻击敌方骑兵,并且他们的重型装甲和长矛使他们在这种情况下具有优势。总的来说,重骑兵的存在使得中世纪的战场更加复杂和不可预测,他们的存在对战斗的结果产生了重大影响。</s>
"""
# English translation:
"""
<s> [INST] What is heavy cavalry? [/INST] Heavy cavalry is a historical combat unit, usually consisting of cavalrymen who use heavy armour and spears in battle. They were known for their powerful attacks on the battlefield and were often used to break through enemy lines or attack enemy cavalry. Heavy cavalry is often considered one of the most powerful and intimidating units in medieval warfare.</s><s> [INST] What impact did heavy cavalry have on the medieval battlefield? [/INST] Heavy cavalry played an important role on the battlefields of the Middle Ages. Their powerful attacks made them a formidable force in battle, and their presence could often change the outcome of a battle. Heavy cavalry were often used to break through enemy lines and create breaches in battle, which allowed other troops to move into enemy positions. They were also used to attack enemy cavalry, and their heavy armour and spears gave them an advantage in such situations. Overall, the presence of heavy cavalry made the medieval battlefield more complex and unpredictable, and had a major impact on the outcome of battles.</s>
"""

# Code abilities in Russian
# Please implement a bubble sort algorithm in Python.
code_inputs = tokenizer.encode("<s>[INST] Реализуйте алгоритм пузырьковой сортировки на Python. [/INST]", add_special_tokens=False, return_tensors="pt")
code_outputs = ultralink_lm.generate(code_inputs, max_new_tokens=512)
print(tokenizer.decode(code_outputs[0]))
# Expected output: 
r"""
<s> [INST] Реализуйте алгоритм пузырьковой сортировки на Python. [/INST] Конечно, вот простая реализация алгоритма пузырьковой сортировки на Python:

```python
def bubbleSort(arr):
    n = len(arr)

    # Проходимся по всем элементам массива
    for i in range(n):
        # Последние i элементов уже отсортированы
        for j in range(0, n-i-1):
            # Проходимся по массиву, пока не найдем элемент, больший, чем следующий
            if arr[j] > arr[j+1]:
                # Обмениваем элементы местами
                arr[j], arr[j+1] = arr[j+1], arr[j]

# Проверяем функцию
arr = [64, 34, 25, 12, 22, 11, 90]
bubbleSort(arr)
print("Отсортированный массив:", arr)
\```

Эта реализация работает путем повторного прохождения по массиву и сравнения соседних элементов. Если элемент сравнивается больше, чем следующий элемент, они меняются местами. Этот процесс повторяется до тех пор, пока массив не будет отсортирован.

Вы можете изменить размер массива и его начальное состояние, чтобы увидеть, как функция работает с разными входными данными.

Примечание: Алгоритм пузырьковой сортировки не является эффективным для больших наборов данных, так как его средняя и худшая временная сложность составляют O(n^2), где n - количество элементов в массиве.</s>
"""
# English translation:
r"""
<s> [INST] Implement the bubble sort algorithm in Python. [/INST] Sure, here's a simple implementation of the bubble sort algorithm in Python:

```python
def bubbleSort(arr):
    n = len(arr)

    # Go through all elements of the array
    for i in range(n):
        # The last i elements are already sorted
        for j in range(0, n-i-1):
            # We traverse the array until we find an element greater than the next one
            if arr[j] > arr[j+1]:
                # Swapping elements
                arr[j], arr[j+1] = arr[j+1], arr[j]

# Check the function
arr = [64, 34, 25, 12, 22, 11, 90]
bubbleSort(arr)
print("Sorted array:", arr)
\```

This implementation works by repeatedly traversing the array and comparing adjacent elements. If an element is greater than the next element, they are swapped. This process is repeated until the array is sorted.

You can change the array size and initial state to see how the function works with different input data.

Note: The bubble sort algorithm is not efficient for large datasets because its average and worst-case time complexity are O(n^2), where n is the number of elements in the array.</s>
"""

# Math abilities in French
# When the length of a rectangle is twice its width, calculate the area of the rectangle if its perimeter is 18 units?
math_inputs = tokenizer.encode("<s>[INST] Lorsque la longueur d'un rectangle est le double de sa largeur, calculer l'aire du rectangle si son périmètre est de 18 unités? [/INST]", add_special_tokens=False, return_tensors="pt")
math_outputs = ultralink_lm.generate(math_inputs, max_new_tokens=512)
print(tokenizer.decode(math_outputs[0]))
# Expected output: 
r"""
<s> [INST] Lorsque la longueur d'un rectangle est le double de sa largeur, calculer l'aire du rectangle si son périmètre est de 18 unités? [/INST] 
Soit la largeur du rectangle $w$. Alors la longueur du rectangle est $2w$.
Le périmètre du rectangle est $2(w+2w)=18$.
En simplifiant, nous avons $6w=18$, donc $w=3$.
L'aire du rectangle est $w \cdot (2w) = 3 \cdot 6 = \boxed{18}$ unités carrées.
La réponse est : 18</s>
"""
# English translation:
r"""
<s> [INST] When the length of a rectangle is twice its width, calculate the area of the rectangle if its perimeter is 18 units? [/INST]
Let $w$ be the width of the rectangle. Then the length of the rectangle is $2w$.
The perimeter of the rectangle is $2(w+2w)=18$.
Simplifying, we have $6w=18$, so $w=3$.
The area of the rectangle is $w \cdot (2w) = 3 \cdot 6 = \boxed{18}$ square units.
The answer is: 18</s>
"""
```

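The multi-turn example above builds the second prompt by concatenating decoded model output with the next question. A minimal sketch of building the same layout from plain-text turns instead, assuming the `<s>[INST] ... [/INST] ...</s>` pattern visible in the example strings is the intended template (the exact whitespace used during training is not documented here). `build_prompt` is a hypothetical helper, not part of `transformers`.

```python
from typing import List, Tuple


def build_prompt(history: List[Tuple[str, str]], question: str) -> str:
    """Join completed (question, answer) turns, then open a new [INST] turn."""
    # Each finished turn is wrapped as <s>[INST] question [/INST] answer</s>
    parts = [f"<s>[INST] {q} [/INST] {a}</s>" for q, a in history]
    # The new question is left open so the model generates the answer
    parts.append(f"<s>[INST] {question} [/INST]")
    return "".join(parts)


prompt = build_prompt([("什么是重骑兵?", "重骑兵是一种历史上的战斗单位。")], "重骑兵对中世纪的战场有哪些影响?")
print(prompt)
```

The returned string can be encoded with `tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")` as in the example. Building from text rather than from `tokenizer.decode` output avoids carrying decoding artifacts, such as the space in `<s> [INST]`, into the next prompt.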
## Model Details

### Finetuning

- Architecture: Same as [Llama-2-13b](https://huggingface.co/meta-llama/Llama-2-13b-hf)
- Number of samples seen during fine-tuning: 1023K
- Batch size: 128
- Hardware: NVIDIA A100 80GB PCIe
- Software: [BMTrain](https://github.com/OpenBMB/BMTrain)

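Assuming "1023K" means 1,023,000 samples and that each batch of 128 corresponds to one optimizer step (gradient accumulation and epoch count are not stated), the fine-tuning run amounts to roughly:

```latex
\text{optimizer steps} \approx \frac{1{,}023{,}000\ \text{samples}}{128\ \text{samples/step}} \approx 7{,}992
```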
### Data Sources

UltraLink-LM is trained on the following datasets:

- [UltraLink](https://huggingface.co/datasets/R0k1e/UltraLink)
- [UltraChat](https://huggingface.co/datasets/stingning/ultrachat)
- [Magicoder-Evol](https://huggingface.co/datasets/ise-uiuc/Magicoder-Evol-Instruct-110K)
- [Magicoder-OSS](https://huggingface.co/datasets/ise-uiuc/Magicoder-OSS-Instruct-75K)
- [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA)
- [ShareGPT](https://huggingface.co/datasets/openchat/openchat_sharegpt4_dataset/)

We randomly sample 10k examples from UltraChat, and we filter ShareGPT to keep only English samples longer than 4k. The remaining datasets are used as auxiliary training data.
All of these datasets are integrated into the UltraLink dataset.

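The two selection rules above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the record fields `lang` and `length` are hypothetical, the unit of "4k" (characters or tokens) is not specified so `min_length=4_000` is a placeholder, and the final shuffle is an assumed detail.

```python
import random
from typing import Any, Dict, List

# Hypothetical record layout, e.g. {"lang": "en", "length": 5123}
Record = Dict[str, Any]


def sample_ultrachat(records: List[Record], k: int = 10_000, seed: int = 0) -> List[Record]:
    """Pick k records uniformly at random without replacement (all records if fewer than k)."""
    if len(records) <= k:
        return list(records)
    return random.Random(seed).sample(records, k)


def filter_sharegpt(records: List[Record], min_length: int = 4_000) -> List[Record]:
    """Keep only English records whose length exceeds min_length (unit assumed)."""
    return [r for r in records if r["lang"] == "en" and r["length"] > min_length]


def build_mixture(
    ultrachat: List[Record],
    sharegpt: List[Record],
    auxiliary: List[List[Record]],
    seed: int = 0,
) -> List[Record]:
    """Combine the sampled and filtered sets with the auxiliary datasets, then shuffle."""
    mixture = sample_ultrachat(ultrachat, seed=seed) + filter_sharegpt(sharegpt)
    for dataset in auxiliary:
        mixture.extend(dataset)
    random.Random(seed).shuffle(mixture)
    return mixture
```

Fixing the seed keeps the 10k UltraChat subset reproducible across runs.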
## Evaluation

We report three evaluations in this section: multilingual HumanEval, MGSM, and OMGEval.
Evaluations of modern LLMs can be biased and are affected by many factors; we are actively working on more comprehensive evaluation methods.

### Multilingual HumanEval 

[HumanEval](https://github.com/openai/human-eval) is a well-known benchmark for evaluating the coding ability of LLMs. It executes the code generated by the model and checks its correctness against unit tests. Since no multilingual test set for code generation existed, we used GPT-3.5 with carefully designed prompts to translate HumanEval into the other languages.

|Model|En|Zh|Es|Ru|Fr|Avg|
|-----|---|---|---|---|---|---|
|Bloomz-7b1-mt | 8.5 | 7.3 | 6.1 | 8.5 | 6.1 | 7.3 |
|Phoenix-inst-chat-7b | 11.0 | 10.4 | 8.5 | 1.2 | 13.4 | 12.2 |
|PolyLM-Multialpaca-13b | 8.5 | 7.3 | 6.1 | 6.1 | 6.1 | 6.8 |
|PolyLM-Chat-13b | 10.4 | 7.9 | 6.1 | 7.3 | 8.5 | 8.1 |
|Chimera-inst-chat-13b| 14.6 | 13.4 | 14.6 | 12.8 | 14.0 | 13.9 |
|Okapi-7b | 12.2 | 11.0 | 8.5 | 8.5 | 8.5 | 9.8 |
|Guanaco-7b | 9.2 | 6.7 | 11.0 | 9.8 | 12.8 | 9.9 |
|Guanaco-13b| 18.3 | 15.9 | 9.8 | 8.5 | 14.6 | 12.2 |
|UltraLink-LM  | __60.4__ | __43.9__ | __40.9__ | __49.4__ | __39.6__ | __46.8__|

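To make "executes the code and checks its correctness" concrete, here is a minimal sketch of functional-correctness scoring. It assumes problems in the original HumanEval layout (a `test` string defining `check(candidate)` plus an `entry_point` function name) and one sample per problem; the card does not state the sampling settings or whether the numbers are pass@1. The official harness also isolates each run in a separate process with a timeout. `passes_tests` and `pass_rate` are hypothetical names.

```python
from typing import Any, Dict, List


def passes_tests(program: str, test_src: str, entry_point: str) -> bool:
    """Execute one candidate program, then call its HumanEval-style check(candidate).

    Returns True only if nothing raises. WARNING: this runs untrusted model output
    in the current process with no sandbox and no timeout.
    """
    namespace: Dict[str, Any] = {}
    try:
        exec(program, namespace)   # defines the candidate function
        exec(test_src, namespace)  # defines check(candidate)
        namespace["check"](namespace[entry_point])
    except Exception:              # failed assertion, runtime error, or syntax error
        return False
    return True


def pass_rate(results: List[bool]) -> float:
    """Percentage of problems whose single sample passed."""
    return 100.0 * sum(results) / len(results)
```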

### MGSM

We use [MGSM](https://github.com/google-research/url-nlp/tree/main/mgsm), a multilingual math benchmark, to evaluate mathematical reasoning. It compares the model's final answers with the reference answers.

|Model|En|Zh|Es|Ru|Fr|Avg (Zh/Es/Ru/Fr)|
|-----|---|---|---|---|---|---|
|Bloomz-7b1-mt | 2.8 | 1.6 | 2.0 | 0.4 | 2.8 | 1.7 |
|Phoenix-inst-chat-7b | 3.2 | 3.2 | 2.8 | 3.2 | 3.2 | 3.1 |
|PolyLM-Multialpaca-13b | 1.2 | 2.8 | 1.6 | 2.8 | 2.4 | 2.4 |
|PolyLM-Chat-13b | 10.8 | 6.4 | 4.8 | 4.4 | 5.6 | 5.3 |
|Chimera-inst-chat-13b  | 14.0 | 11.6 | 10.0 | 12.0 | 12.8 | 11.6 |
|Okapi-7b | 4.0 | 2.4 | 3.6 | 4.4 | 4.8 | 3.8 |
|Guanaco-7b | 4.0 | 1.6 | 3.2 | 2.8 | 4.4 | 3.0 |
|Guanaco-13b | 13.6 | 10.8 | 11.2 | 6.4 | 5.2 | 8.4 |
|UltraLink-LM| __70.4__ | __56.0__ | __70.4__ | __64.8__ | __63.6__ | __63.7__ |

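MGSM scoring reduces to extracting a final number from each response and comparing it with the reference. A minimal sketch, assuming the common heuristic of taking the last number in the response (the card does not specify its extraction rule); it reads `18` from both `The answer is: 18` and `La réponse est : 18` in the French example above. It strips thousands separators such as `1,000` but does not handle decimal commas. `extract_last_number` and `accuracy` are hypothetical helpers.

```python
import re
from typing import List, Optional

# Matches integers or decimals such as "18", "-4", "2.5"
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_last_number(text: str) -> Optional[float]:
    """Return the last number in the text, or None if there is none."""
    # Drop thousands separators: "1,000" -> "1000"
    cleaned = re.sub(r"(?<=\d),(?=\d{3}(?!\d))", "", text)
    matches = _NUMBER.findall(cleaned)
    return float(matches[-1]) if matches else None


def accuracy(outputs: List[str], golds: List[float]) -> float:
    """Percentage of responses whose extracted answer equals the reference."""
    correct = 0
    for out, gold in zip(outputs, golds):
        pred = extract_last_number(out)
        if pred is not None and abs(pred - gold) < 1e-6:
            correct += 1
    return 100.0 * correct / len(golds)
```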
### OMGEval
We use [OMGEval](https://github.com/blcuicall/OMGEval), a multilingual version of the widely used English benchmark AlpacaEval, to evaluate chat ability.

|Model|En|Zh|Es|Ru|Fr|Avg|
|-----|---|---|---|---|---|---|
|Bloomz-7b1-mt | 0.0 | 0.9 | 0.1 | 0.5 | 0.3 | 0.4 |
|Phoenix-inst-chat-7b  | 6.9 | 13.3 | 7.4 | 2.9 | 8.1 | 7.7 |
|PolyLM-Multialpaca-13b  | 3.4 | 5.0 | 2.1 | 5.1 | 2.2 | 3.6 |
|PolyLM-Chat-13b | 7.7 | 14.0 | 6.1 | 5.5 | 4.8 | 7.6 |
|Chimera-inst-chat-13b | 15.5 | 9.7 | 11.8 | 13.7 | 13.8 | 12.9 |
|Okapi-7b | 8.8 | 6.2 | 5.0 | 12.1 | 8.7 | 8.2 |
|Guanaco-7b  | 4.6 | 3.8 | 0.4 | 1.8 | 1.2 | 2.4 |
|Guanaco-13b  |  __29.0__ | 8.6 | 16.9 | 15.4 | 17.3 | 17.5 |
|UltraLink-LM |  28.8 |  __21.9__ |  __23.5__ | __37.6__ | __29.0__ |  __28.2__  |

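Assuming the OMGEval numbers are win rates as in AlpacaEval (a judge model compares each response with a reference response, and the score is the share of prompts where the model's response is preferred), the final aggregation step can be sketched as below. The verdict labels `"win"`, `"tie"`, and `"loss"` are hypothetical, and counting a tie as half a win is one common convention rather than necessarily OMGEval's exact rule.

```python
from typing import Dict, List

# Hypothetical per-prompt verdicts; a tie counts as half a win
_SCORES: Dict[str, float] = {"win": 1.0, "tie": 0.5, "loss": 0.0}


def win_rate(verdicts: List[str]) -> float:
    """Percentage score of the model against the reference responses."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    return 100.0 * sum(_SCORES[v] for v in verdicts) / len(verdicts)
```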
## Citation

If you find UltraLink useful, please cite our paper:

```bibtex
@misc{wang2024ultralink,
      title={UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset}, 
      author={Haoyu Wang and Shuo Wang and Yukun Yan and Xujia Wang and Zhiyu Yang and Yuzhuang Xu and Zhenghao Liu and Ning Ding and Xu Han and Zhiyuan Liu and Maosong Sun},
      year={2024},
      eprint={2402.04588},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```