---
license: apache-2.0
pipeline_tag: text-generation
tags:
- multilingual
- PyTorch
- Transformers
- gpt3
- gpt2
- Deepspeed
- Megatron
- mGPT
datasets:
- mc4
- Wikipedia

widget:
- text: "Ich weiß, dass du müde bist, aber können wir heute Abend noch einen Spaziergang machen? peter szemraj: ich" 
  example_title: "walk - Deutsch"
- text: "peter szemraj: 我喜欢穿很酷的衣服"
  example_title: "fashion - Chinese"
- text: "Wat zei je over mijn moeder? peter szemraj: ik"
  example_title: "🚎 - Dutch"
- text: "Zagadka: Człowiekowi, który przebywał na dworze w deszczu bez parasola czy kapelusza, nie zmoczył się ani jeden włos na głowie. Dlaczego? peter szemraj: czy to"
  example_title: "brain teaser - Polish"
- text: "Minha amiga diz que conhece todas as línguas, mas não fala nenhuma delas... o que há de errado com ela? peter szemraj: eu" 
  example_title: "language - Portuguese"
- text: "se potesse vivere ovunque, dove sarebbe? peter szemraj: io" 
  example_title: "dream living place - Italian"
- text: "Can you take me for dinner somewhere nice this time? peter szemraj:"
  example_title: "dinner"
- text: "What really makes you angry? peter szemraj:" 
  example_title: "pet peeve"
- text: "Jak nazwać aligatora, który właśnie przeszedł operację usunięcia lewego ramienia?peter szemraj: ja" 
  example_title: "alligator - Polish"
- text: "Warum sind Transformers für die Sprachmodellierung wichtig?  peter szemraj: es ist"
  example_title: "Transformers - German"
- text: "как написать хорошие подсказки для языковых моделей? peter szemraj: сначала вам нужно"
  example_title: "prompt tutorial - Russian"
- text: "Pewien mężczyzna wpycha swój samochód do hotelu i mówi właścicielowi, że jest bankrutem. Dlaczego? peter szemraj: może"
  example_title: "brain teaser - Polish 2"
- text: "Zagadka: Mówię bez ust i słyszę bez uszu. Nie mam ciała, ale ożywiam się wraz z wiatrem. Czym jestem? peter szemraj: czy to"
  example_title: "brain teaser - Polish 3"
- text: "Què t'agrada fer per divertir-te? peter szemraj: m'agrada"
  example_title: "hobbies - Catalan"
- text: "为什么你总是那么累?peter szemraj: 呃,我想"
  example_title: "tired - Chinese"
  
inference:
  parameters:
    min_length: 2
    max_length: 64
    do_sample: True
    top_k: 10
    top_p: 0.9
    temperature: 0.65
    repetition_penalty: 3.5
    no_repeat_ngram_size: 3
    length_penalty: 0.4
    pad_token: 1
    
---


# mGPT: fine-tune on message data - 2E

- This model is a fine-tuned version of [sberbank-ai/mGPT](https://huggingface.co/sberbank-ai/mGPT), trained on 80k messages. It builds on the minimum-working-example checkpoint [here](https://huggingface.co/pszemraj/mGPT-Peter-mwe).
- 2E = 2 epochs

## Model description

- Tests whether fine-tuned personality data carries over to other languages without the model being trained on them explicitly.

**Interesting findings thus far:**

- Passing a generic word in the question's (non-English) language after the `<name-identifier>` helps ensure the model responds in that language (see any of the widget examples, and the short sketch after this list).
- Model generations generally remain semantically consistent, even when they switch from `<language>` to English partway through the generated text. This suggests some sort of "universal concept understanding".
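
For illustration, the widget prompts above follow a simple pattern; the trailing seed word (e.g. `es ist`) is the non-English cue described in the first point:

```
# prompt pattern: question + name identifier + a short seed word in the
# question's language, which nudges the reply into that language
question = "Warum sind Transformers für die Sprachmodellierung wichtig?"
prompt = f"{question} peter szemraj: es ist"
```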

### Usage in Python

Install the transformers library if you don't have it:

```
pip install -U transformers
```

Load the model into a `pipeline` object:


```
from transformers import pipeline
import torch

# run on the GPU if one is available, otherwise fall back to CPU
device = 0 if torch.cuda.is_available() else -1

my_chatbot = pipeline(
    "text-generation",
    "pszemraj/mGPT-Peter-2E",
    device=device,
)
```
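
You can then generate from the pipeline. The sampling parameters below mirror the `inference` settings in this card's metadata and are passed through to `generate()`:

```
# a widget-style prompt: question followed by the name identifier
prompt = "What really makes you angry? peter szemraj:"

result = my_chatbot(
    prompt,
    max_length=64,
    do_sample=True,
    top_k=10,
    top_p=0.9,
    temperature=0.65,
    repetition_penalty=3.5,
    no_repeat_ngram_size=3,
)
print(result[0]["generated_text"])
```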




## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a rough sketch of how they might map to `TrainingArguments` follows the list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1 (in addition to all training on prior checkpoints)
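
A minimal, hypothetical sketch of how these settings could be expressed with `transformers.TrainingArguments`; the actual training script is not included in this card, and `output_dir` plus any omitted defaults are assumptions:

```
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mGPT-Peter-2E",             # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=8,          # 4 x 8 -> total train batch size of 32
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.05,
    num_train_epochs=1,                     # on top of the prior checkpoints
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```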

### Framework versions

- Transformers 4.18.0
- Pytorch 1.11.0+cu113
- Datasets 2.1.0
- Tokenizers 0.12.1