metadata

language:
  - de
  - en
license: apache-2.0
base_model: google/mt5-small
tags:
  - generated_from_trainer
datasets:
  - lilferrit/wmt14-short
metrics:
  - bleu
model-index:
  - name: ft-wmt14-5
    results:
      - task:
          name: Translation
          type: translation
        dataset:
          name: lilferrit/wmt14-short
          type: lilferrit/wmt14-short
        metrics:
          - name: Bleu
            type: bleu
            value: 20.7584

ft-wmt14-5

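This model is a fine-tuned version of google/mt5-small, trained for translation on the lilferrit/wmt14-short dataset (languages: de, en). It achieves the following results on the evaluation set, which match the step 90000 row of the training results: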

Loss: 2.0604
Bleu: 20.7584
Gen Len: 30.499
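
The card gives no usage instructions, so the following is only a minimal inference sketch using the standard Transformers seq2seq classes. The repository id `your-namespace/ft-wmt14-5` is a hypothetical placeholder, and the card does not say which translation direction the model was trained on or whether a task prefix is expected; input sentences should be in whatever source language was used during fine-tuning.

```python
from typing import List

# Hypothetical repository id: the card does not state where the checkpoint is hosted.
MODEL_ID = "your-namespace/ft-wmt14-5"


def translate(texts: List[str], model_id: str = MODEL_ID, max_new_tokens: int = 64) -> List[str]:
    """Translate a batch of sentences with a fine-tuned seq2seq checkpoint."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Pad to the longest sentence in the batch and truncate overly long inputs.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # Assumption: 64 new tokens is a comfortable cap given the reported Gen Len of about 30.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

Calling `translate(["..."])` downloads the checkpoint on first use, so it needs network access and a valid repository id.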

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: Adafactor
lr_scheduler_type: constant
training_steps: 100000
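
As a rough illustration, the hyperparameters above could map onto `Seq2SeqTrainingArguments` as sketched below. The training script itself is not part of this card, so the mapping is an assumption: in particular `output_dir` is a made-up value, and Adafactor may have been configured differently (for example by passing an optimizer object to the trainer) rather than through `optim="adafactor"`. The helper also shows how the total train batch size of 16 follows from the per-device batch size and gradient accumulation on a single device.

```python
# Keyword arguments for transformers.Seq2SeqTrainingArguments mirroring the list above.
TRAINING_KWARGS = {
    "output_dir": "ft-wmt14-5",        # assumption: not stated in the card
    "learning_rate": 5e-4,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "gradient_accumulation_steps": 2,
    "optim": "adafactor",              # assumption: how Adafactor was selected
    "lr_scheduler_type": "constant",
    "max_steps": 100_000,
}


def effective_batch_size(kwargs: dict, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step across all devices."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


def build_training_args():
    """Instantiate the arguments object; requires transformers and torch."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**TRAINING_KWARGS)
```

With one device, `effective_batch_size(TRAINING_KWARGS)` gives 8 × 2 = 16, which agrees with the reported total_train_batch_size.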

Training results

Training Loss	Epoch	Step	Bleu	Gen Len	Validation Loss
1.9166	0.2778	10000	15.8119	32.097	2.3105
1.7184	0.5556	20000	17.5903	31.1153	2.1993
1.6061	0.8333	30000	18.9604	30.327	2.1380
1.516	1.1111	40000	19.1444	30.2727	2.1366
1.4675	1.3889	50000	19.7588	30.1127	2.1208
1.4416	1.6667	60000	19.9263	30.4463	2.0889
1.4111	1.9444	70000	20.3323	30.1207	2.0795
1.3603	2.2222	80000	20.5373	30.5943	2.0850
1.3378	2.5	90000	20.7584	30.499	2.0604
1.3381	2.7778	100000	20.6113	30.701	2.0597
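
The Epoch and Step columns can be used to estimate how large one pass over the training data is. The sketch below is a back-of-the-envelope calculation, not a figure reported by the authors: it assumes the logged epoch is simply step divided by steps per epoch, and that every optimizer step consumes the full total_train_batch_size of 16.

```python
from statistics import median

# (step, epoch) pairs copied from the training results table.
LOG = [
    (10_000, 0.2778), (20_000, 0.5556), (30_000, 0.8333), (40_000, 1.1111),
    (50_000, 1.3889), (60_000, 1.6667), (70_000, 1.9444), (80_000, 2.2222),
    (90_000, 2.5), (100_000, 2.7778),
]
TOTAL_TRAIN_BATCH_SIZE = 16  # from the hyperparameter list


def steps_per_epoch(log) -> float:
    """Median of step / epoch, which smooths out rounding in the logged epochs."""
    return median(step / epoch for step, epoch in log)


def examples_per_epoch(log, batch_size: int) -> float:
    """Approximate number of training examples seen in one epoch."""
    return steps_per_epoch(log) * batch_size


if __name__ == "__main__":
    print(f"steps per epoch:    {steps_per_epoch(LOG):.0f}")
    print(f"examples per epoch: {examples_per_epoch(LOG, TOTAL_TRAIN_BATCH_SIZE):.0f}")
```

Under these assumptions one epoch is roughly 36,000 optimizer steps, or about 576,000 training examples, so the 100,000 training steps cover a little under three epochs.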

Framework versions

Transformers 4.40.0
PyTorch 2.2.2+cu121
Datasets 2.19.0
Tokenizers 0.19.1