base_model: gpt2
library_name: distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_gpt2_optim_extended2
    results: []

distily_bench_gpt2_optim_extended2

This student model was distilled from the teacher model gpt2. The training dataset is unspecified.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set (these values match the step 9500 row of the Eval-Phase Metrics table, not the final step):

  • eval_enwikippl: 603.2673
  • eval_frwikippl: 3866.3679
  • eval_zhwikippl: 9060.9883
  • eval_loss: 6355.0508
  • eval_runtime: 64.6366
  • eval_samples_per_second: 46.413
  • eval_steps_per_second: 11.603
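The three throughput figures above are related to one another, and a quick check can recover the approximate size of the evaluation set. The sketch below assumes eval_runtime is reported in seconds; the helper name `eval_set_size` is hypothetical and not part of Distily or Transformers.

```python
# Values copied from the evaluation results listed above.
eval_runtime = 64.6366             # assumed to be in seconds
eval_samples_per_second = 46.413
eval_steps_per_second = 11.603


def eval_set_size(runtime, samples_per_second, steps_per_second):
    """Estimate (samples, steps, samples per step) from throughput metrics."""
    n_samples = runtime * samples_per_second
    n_steps = runtime * steps_per_second
    return round(n_samples), round(n_steps), n_samples / n_steps


n_samples, n_steps, per_step = eval_set_size(
    eval_runtime, eval_samples_per_second, eval_steps_per_second
)
print(n_samples, n_steps, round(per_step, 2))
```

Under that assumption the figures point to roughly 3000 evaluation samples processed in about 750 steps, which is consistent with the eval_batch_size of 4 listed in the training hyperparameters.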

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: 'legacy'
  • loss_fn: kl
  • train_embeddings: True
  • learning_rate: 4e-05
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • num_epochs: 1.0
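Two of the hyperparameters above can be illustrated with a few lines of arithmetic. The sketch below first reproduces total_train_batch_size from train_batch_size and gradient_accumulation_steps, then shows a plain-Python KL divergence between a teacher and a student distribution. The KL part is an illustration only: the direction KL(teacher || student), the helper names, and the example logits are all assumptions, and this is not Distily's implementation of the 'legacy' objective.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 8
gradient_accumulation_steps = 2
effective_batch_size = train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # matches total_train_batch_size: 16


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student) for one token's distribution (assumed direction)."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits over a three-token vocabulary.
teacher_logits = [2.0, 1.0, 0.1]
student_logits = [0.1, 1.0, 2.0]
print(kl_divergence(teacher_logits, teacher_logits))  # identical distributions give 0.0
print(kl_divergence(teacher_logits, student_logits))  # positive when they differ
```

The divergence is zero only when the student reproduces the teacher's distribution exactly, which is why it can serve as a distillation loss.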

Resource Usage

Peak GPU Memory: 8.3344 GB

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 30.2385 | 57.2728 | | | | | 18.1772 |
| 0 | 0 | 55332.9297 | 57511.9648 | 333834.9375 | 63.8516 | 46.984 | 11.746 | 57797.4375 |
| 500 | 0.0269 | 2446.0188 | 10865.3799 | 11817.8984 | 64.1124 | 46.793 | 11.698 | 39870.7812 |
| 1000 | 0.0539 | 1804.3785 | 6767.0361 | 9836.2031 | 64.095 | 46.806 | 11.701 | 19923.8262 |
| 1500 | 0.0808 | 1456.1499 | 5625.2583 | 9170.7520 | 65.271 | 45.962 | 11.491 | 18979.7988 |
| 2000 | 0.1077 | 1255.2349 | 5859.0298 | 8742.0908 | 64.4753 | 46.529 | 11.632 | 17829.9570 |
| 2500 | 0.1347 | 1123.1558 | 5142.7266 | 8474.3467 | 64.6172 | 46.427 | 11.607 | 18204.0723 |
| 3000 | 0.1616 | 1041.0769 | 5179.4790 | 8192.3838 | 64.0965 | 46.804 | 11.701 | 16922.875 |
| 3500 | 0.1886 | 948.0062 | 4929.7056 | 7911.0400 | 64.5488 | 46.476 | 11.619 | 22088.8789 |
| 4000 | 0.2155 | 899.1066 | 4752.2407 | 7641.6426 | 65.5993 | 45.732 | 11.433 | 16942.1074 |
| 4500 | 0.2424 | 843.3125 | 4732.0117 | 7480.7788 | 64.5158 | 46.5 | 11.625 | 13217.6758 |
| 5000 | 0.2694 | 796.5746 | 4456.2817 | 7343.6479 | 65.0161 | 46.142 | 11.536 | 12772.6074 |
| 5500 | 0.2963 | 772.2271 | 4386.7627 | 7222.3145 | 65.0008 | 46.153 | 11.538 | 11082.3330 |
| 6000 | 0.3232 | 723.9974 | 4267.7817 | 7016.9600 | 64.7743 | 46.315 | 11.579 | 9581.7812 |
| 6500 | 0.3502 | 696.7773 | 4287.5391 | 6892.1387 | 64.6727 | 46.387 | 11.597 | 8422.7246 |
| 7000 | 0.3771 | 679.4652 | 4046.8250 | 6773.9629 | 64.6977 | 46.369 | 11.592 | 7275.9604 |
| 7500 | 0.4040 | 667.8522 | 4138.4370 | 6713.6533 | 65.028 | 46.134 | 11.533 | 8175.5986 |
| 8000 | 0.4310 | 647.4772 | 3977.0999 | 6626.9331 | 64.3886 | 46.592 | 11.648 | 5914.0166 |
| 8500 | 0.4579 | 627.8210 | 3850.3174 | 6548.4160 | 64.3532 | 46.618 | 11.654 | 7728.6548 |
| 9000 | 0.4848 | 608.0646 | 3773.8511 | 6449.4614 | 64.3549 | 46.616 | 11.654 | 7419.7065 |
| 9500 | 0.5118 | 603.2673 | 3866.3679 | 6355.0508 | 64.6366 | 46.413 | 11.603 | 9060.9883 |
| 10000 | 0.5387 | 588.2559 | 3563.7371 | 6282.0479 | 65.1489 | 46.048 | 11.512 | 7187.1206 |
| 10500 | 0.5657 | 569.4130 | 3654.1926 | 6309.9839 | 64.8852 | 46.235 | 11.559 | 7732.7837 |
| 11000 | 0.5926 | 572.8280 | 3728.8887 | 6206.9868 | 65.1196 | 46.069 | 11.517 | 6973.9194 |
| 11500 | 0.6195 | 551.1736 | 3640.4358 | 6146.9331 | 65.3439 | 45.911 | 11.478 | 5983.9292 |
| 12000 | 0.6465 | 544.3150 | 3507.0312 | 6073.0454 | 65.3717 | 45.891 | 11.473 | 5726.3408 |
| 12500 | 0.6734 | 538.8688 | 3312.2402 | 6032.6079 | 65.1968 | 46.015 | 11.504 | 5642.0854 |
| 13000 | 0.7003 | 525.2048 | 3317.0325 | 6042.6240 | 65.216 | 46.001 | 11.5 | 11299.7695 |
| 13500 | 0.7273 | 516.2283 | 3381.7358 | 5946.3682 | 67.4205 | 44.497 | 11.124 | 7501.9004 |
| 14000 | 0.7542 | 508.5393 | 3201.6807 | 5921.8345 | 65.0932 | 46.088 | 11.522 | 7485.8843 |
| 14500 | 0.7811 | 499.8382 | 3091.7612 | 5887.8721 | 65.2716 | 45.962 | 11.49 | 5927.4609 |
| 15000 | 0.8081 | 491.9155 | 3132.6841 | 5930.3252 | 65.4781 | 45.817 | 11.454 | 7431.6040 |
| 15500 | 0.8350 | 485.9736 | 3050.2964 | 5844.8960 | 65.1349 | 46.058 | 11.515 | 6106.6260 |
| 16000 | 0.8620 | 483.0016 | 2964.3213 | 5828.2241 | 65.6241 | 45.715 | 11.429 | 5001.1572 |
| 16500 | 0.8889 | 480.1220 | 2957.3284 | 5789.1626 | 65.5498 | 45.767 | 11.442 | 4932.5088 |
| 17000 | 0.9158 | 470.7449 | 2851.3689 | 5783.3174 | 65.2632 | 45.968 | 11.492 | 4651.6655 |
| 17500 | 0.9428 | 471.4951 | 2821.2729 | 5762.3945 | 65.89 | 45.53 | 11.383 | 4335.2710 |
| 18000 | 0.9697 | 467.4575 | 2898.3936 | 5772.5654 | 65.0307 | 46.132 | 11.533 | 3703.9866 |
| 18500 | 0.9966 | 465.3025 | 2792.5769 | 5640.9438 | 65.1725 | 46.032 | 11.508 | 4174.7715 |
| 18562 | 1.0000 | 459.9143 | 2775.4995 | 5699.1255 | 65.9141 | 45.514 | 11.378 | 4052.5564 |
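To put the final row in context, it can help to express the student's perplexities as multiples of the teacher's. The sketch below copies the teacher eval values and the step 18562 values from the table above, reading the three teacher numbers as enwikippl, frwikippl, and zhwikippl in that order; the helper name `ppl_ratio` is hypothetical.

```python
# Perplexities copied from the table above.
teacher = {"enwikippl": 30.2385, "frwikippl": 57.2728, "zhwikippl": 18.1772}
student_final = {  # step 18562, epoch 1.0
    "enwikippl": 459.9143,
    "frwikippl": 2775.4995,
    "zhwikippl": 4052.5564,
}


def ppl_ratio(student, reference):
    """Return student perplexity divided by reference perplexity, per metric."""
    return {name: student[name] / reference[name] for name in reference}


ratios = ppl_ratio(student_final, teacher)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the teacher's perplexity")
```

By this measure the student remains furthest from the teacher on the Chinese Wikipedia set and closest on the English one after one epoch.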

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • PyTorch 2.3.0
  • Datasets 2.20.0