zh-kr_mid

This model is a fine-tuned version of facebook/mbart-large-cc25; the training dataset is not specified. It achieves the following results on the evaluation set (these values match the step 12000 row of the training results table):

  • Loss: 2.5557
  • Bleu: 16.6036
  • Gen Len: 15.4901

Model description

More information needed

Intended uses & limitations

More information needed
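Until the intended uses are documented, the following is only a minimal sketch of how one might load the checkpoint with the Transformers library. It assumes the repository id `yesj1234/mbart-mmt_mid1_zh-ko` under which this card is published, assumes the repository ships tokenizer files, and assumes a Chinese-to-Korean direction (mBART-cc25 language codes `zh_CN` and `ko_KR`) based on the model name alone; none of this is confirmed by the card. The function name `translate` is hypothetical.

```python
MODEL_ID = "yesj1234/mbart-mmt_mid1_zh-ko"  # repository id of this card
SRC_LANG = "zh_CN"  # ASSUMPTION: source language is Chinese
TGT_LANG = "ko_KR"  # ASSUMPTION: target language is Korean


def translate(text: str, max_new_tokens: int = 64) -> str:
    """Translate one sentence; downloads the checkpoint on first use."""
    # Imported lazily so that defining this function does not require the library.
    from transformers import AutoTokenizer, MBartForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(
        MODEL_ID, src_lang=SRC_LANG, tgt_lang=TGT_LANG
    )
    model = MBartForConditionalGeneration.from_pretrained(MODEL_ID)

    inputs = tokenizer(text, return_tensors="pt")
    # mBART-cc25 style checkpoints begin decoding from the target language code.
    output_ids = model.generate(
        **inputs,
        decoder_start_token_id=tokenizer.convert_tokens_to_ids(TGT_LANG),
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]


# Example call (requires network access and the transformers + torch packages):
# print(translate("你好，世界"))
```

Given the evaluation BLEU of about 16.6, outputs should be treated as rough drafts rather than publishable translations.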

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 40
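
The values above imply a few derived quantities, sketched here in plain Python. The total batch size follows directly from the list (4 per device on 4 devices). The total number of optimizer steps is not stated in this card, so the sketch estimates it from the last logged row of the training results (step 16000 at epoch 11.98); that estimate, and the helper name `linear_lr`, are assumptions made for illustration.

```python
# Hyperparameters copied from the list above.
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 4   # per device
NUM_DEVICES = 4
WARMUP_STEPS = 500
NUM_EPOCHS = 40

# Effective batch size per optimizer step (no gradient accumulation is listed).
total_train_batch_size = TRAIN_BATCH_SIZE * NUM_DEVICES

# ASSUMPTION: steps per epoch is estimated from the training log
# (step 16000 at epoch 11.98); the card does not give the dataset size.
steps_per_epoch = round(16000 / 11.98)
planned_total_steps = steps_per_epoch * NUM_EPOCHS


def linear_lr(step: int, total_steps: int = planned_total_steps) -> float:
    """Linear warmup to LEARNING_RATE, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / (total_steps - WARMUP_STEPS)


for step in (0, 250, 500, 16000, planned_total_steps):
    print(f"step {step:>6}: lr = {linear_lr(step):.3e}")
```

Under this estimate, a 40-epoch run would take roughly 53,000 steps, so the last logged step (16000) falls well before the planned end of the schedule.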

Training results

Training Loss | Epoch | Step  | Validation Loss | Bleu    | Gen Len
------------- | ----- | ----- | --------------- | ------- | -------
2.7248        | 0.75  | 1000  | 1.9410          | 3.2381  | 48.6095
1.5683        | 1.5   | 2000  | 1.6889          | 10.2345 | 20.4433
1.1916        | 2.25  | 3000  | 1.6843          | 13.4571 | 18.8854
1.068         | 2.99  | 4000  | 1.6390          | 15.6862 | 15.5054
0.7313        | 3.74  | 5000  | 1.7003          | 15.2014 | 16.5938
0.4832        | 4.49  | 6000  | 1.8982          | 15.0381 | 16.9068
0.3862        | 5.24  | 7000  | 2.1426          | 15.5397 | 15.6451
0.3675        | 5.99  | 8000  | 2.1168          | 15.8847 | 15.6926
0.2627        | 6.74  | 9000  | 2.2603          | 16.3603 | 15.9671
0.1955        | 7.49  | 10000 | 2.4114          | 15.7447 | 15.979
0.171         | 8.23  | 11000 | 2.5141          | 15.7852 | 15.9244
0.1702        | 8.98  | 12000 | 2.5557          | 16.6036 | 15.4901
0.1298        | 9.73  | 13000 | 2.6536          | 16.1319 | 15.5492
0.1052        | 10.48 | 14000 | 2.7586          | 16.1807 | 15.8884
0.2268        | 11.23 | 15000 | 2.7258          | 15.1752 | 15.5346
0.1327        | 11.98 | 16000 | 2.7193          | 15.8563 | 15.7971
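
The table shows validation loss and BLEU pointing at different checkpoints. The short sketch below, with rows copied from the table, finds the best step under each criterion. The names `Row`, `best_by_bleu`, and `best_by_val_loss` are illustrative only; the card does not say how the reported checkpoint was chosen.

```python
from collections import namedtuple

Row = namedtuple("Row", "train_loss epoch step val_loss bleu gen_len")

# Rows copied verbatim from the training results table.
ROWS = [
    Row(2.7248, 0.75, 1000, 1.9410, 3.2381, 48.6095),
    Row(1.5683, 1.5, 2000, 1.6889, 10.2345, 20.4433),
    Row(1.1916, 2.25, 3000, 1.6843, 13.4571, 18.8854),
    Row(1.068, 2.99, 4000, 1.6390, 15.6862, 15.5054),
    Row(0.7313, 3.74, 5000, 1.7003, 15.2014, 16.5938),
    Row(0.4832, 4.49, 6000, 1.8982, 15.0381, 16.9068),
    Row(0.3862, 5.24, 7000, 2.1426, 15.5397, 15.6451),
    Row(0.3675, 5.99, 8000, 2.1168, 15.8847, 15.6926),
    Row(0.2627, 6.74, 9000, 2.2603, 16.3603, 15.9671),
    Row(0.1955, 7.49, 10000, 2.4114, 15.7447, 15.979),
    Row(0.171, 8.23, 11000, 2.5141, 15.7852, 15.9244),
    Row(0.1702, 8.98, 12000, 2.5557, 16.6036, 15.4901),
    Row(0.1298, 9.73, 13000, 2.6536, 16.1319, 15.5492),
    Row(0.1052, 10.48, 14000, 2.7586, 16.1807, 15.8884),
    Row(0.2268, 11.23, 15000, 2.7258, 15.1752, 15.5346),
    Row(0.1327, 11.98, 16000, 2.7193, 15.8563, 15.7971),
]

# Highest BLEU versus lowest validation loss.
best_by_bleu = max(ROWS, key=lambda r: r.bleu)
best_by_val_loss = min(ROWS, key=lambda r: r.val_loss)

print("best BLEU:    ", best_by_bleu.step, best_by_bleu.bleu)
print("best val loss:", best_by_val_loss.step, best_by_val_loss.val_loss)
```

BLEU peaks at step 12000 (16.6036), which matches the headline evaluation results, while validation loss is lowest at step 4000 (1.6390) and rises afterwards as training loss keeps falling. This is consistent with a BLEU-based checkpoint selection, and it suggests the later epochs overfit in terms of loss while BLEU stays roughly flat between 15 and 16.6.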

Framework versions

  • Transformers 4.34.0
  • PyTorch 2.1.0+cu121
  • Datasets 2.14.5
  • Tokenizers 0.14.1

Model ID: yesj1234/mbart-mmt_mid1_zh-ko