---
license: apache-2.0
pipeline_tag: text-generation
tags:
- multilingual
- PyTorch
- Transformers
- gpt3
- gpt2
- Deepspeed
- Megatron
datasets:
- mc4
- Wikipedia
widget:
- text: "Ich weiß, dass du müde bist, aber können wir heute Abend noch einen Spaziergang machen? peter szemraj: ich"
example_title: "walk - Deutsch"
- text: "peter szemraj: 我喜欢穿很酷的衣服"
example_title: "fashion - Chinese"
- text: "Wat zei je over mijn moeder? peter szemraj: Ik"
example_title: "🚎 - Dutch"
- text: "Zagadka: Najpierw mnie zjadasz, a potem sam zostajesz zjedzony. Czym ja jestem? peter szemraj: Czy to"
example_title: "brain teaser - Polish"
- text: "Minha amiga diz que conhece todas as línguas, mas não fala nenhuma delas... o que há de errado com ela? peter szemraj: Eu"
example_title: "language - Portuguese"
- text: "se potesse vivere ovunque, dove sarebbe? peter szemraj: Io"
example_title: "dream living place - Italian"
- text: "Can you take me for dinner somewhere nice this time?\npeter szemraj:\n\n"
example_title: "dinner"
- text: "What really makes you angry?\npeter szemraj:\n\n"
example_title: "pet peeve"
- text: "Jak nazwać aligatora, który właśnie przeszedł operację usunięcia lewego ramienia?peter szemraj: Ja"
example_title: "alligator - Polish"
- text: "Warum sind Transformers für die Sprachmodellierung wichtig? peter szemraj: Es ist"
example_title: "Transformers - German"
- text: "как написать хорошие подсказки для языковых моделей? peter szemraj: сначала вам нужно"
example_title: "prompt tutorial - Russian"
- text: "Pewien mężczyzna wpycha swój samochód do hotelu i mówi właścicielowi, że jest bankrutem. Dlaczego? peter szemraj: może"
example_title: "brain teaser - Polish 2"
- text: "Zagadka: Mówię bez ust i słyszę bez uszu. Nie mam ciała, ale ożywiam się wraz z wiatrem. Czym jestem? peter szemraj: Czy to"
example_title: "brain teaser - Polish 3"
inference:
parameters:
min_length: 2
max_length: 64
no_repeat_ngram_size: 3
do_sample: True
top_p: 0.95
top_k: 50
temperature: 0.65
repetition_penalty: 3.5
---
# mGPT: fine-tune on message data - 2E
- This model is a fine-tuned version of [sberbank-ai/mGPT](https://huggingface.co/sberbank-ai/mGPT) on 80k messages. This builds on the minimum-working-example checkpoint [here](https://huggingface.co/pszemraj/mGPT-Peter-mwe).
- 2E = 2 epochs
## Model description
- Tests whether fine-tuned personality data bleeds over into other languages without the model being explicitly trained in them.
**Interesting findings thus far:**
- Passing a generic word in a non-English language after the `<name-identifier>` helps ensure the model responds in the language of the question (see any widget example, or the sketch after this list).
- Model generations generally remain semantically consistent, even when they switch from `<language>` to English partway through the generated text, which suggests some degree of "universal concept understanding".
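For example, one of the German widget prompts above follows this pattern:
```python
# prompt pattern from the widget examples: question + name identifier + a generic
# seed phrase in the question's language ("Es ist") to keep the reply in German
prompt = (
    "Warum sind Transformers für die Sprachmodellierung wichtig? "
    "peter szemraj: Es ist"
)
```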
### Usage in Python
Install the transformers library if you don't have it:
```bash
pip install -U transformers
```
Load the model into a `pipeline` object:
```python
from transformers import pipeline
import torch

# run on GPU if one is available, otherwise fall back to CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'

my_chatbot = pipeline(
    'text-generation',
    'pszemraj/mGPT-Peter-2E',
    device=0 if device == 'cuda' else -1,
)
```
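Then generate a reply by passing a prompt that ends with the name identifier. The settings below simply mirror the widget's inference parameters above and are only a starting point:
```python
prompt = "Wat zei je over mijn moeder? peter szemraj: Ik"

response = my_chatbot(
    prompt,
    min_length=2,
    max_length=64,
    no_repeat_ngram_size=3,
    do_sample=True,
    top_p=0.95,
    top_k=50,
    temperature=0.65,
    repetition_penalty=3.5,
)
print(response[0]['generated_text'])
```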
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a rough `TrainingArguments` sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1 (in addition to all training on prior checkpoints)
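These values roughly correspond to a `transformers` `TrainingArguments` configuration along the following lines (an illustrative sketch only, not the exact training script, which is not included here):
```python
from transformers import TrainingArguments

# illustrative sketch approximating the hyperparameters listed above;
# not the exact configuration used to produce this checkpoint
training_args = TrainingArguments(
    output_dir="mGPT-Peter-2E",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=8,  # 4 per device x 8 steps (x GPUs) -> total batch of 32
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```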
### Framework versions
- Transformers 4.18.0
- Pytorch 1.11.0+cu113
- Datasets 2.1.0
- Tokenizers 0.12.1