---
base_model: gpt2
library_name: distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_gpt2_optim
    results: []
---

# distily_bench_gpt2_optim

This student model was distilled from the teacher model gpt2; the training dataset is unspecified.

The Distily library was used for this distillation.
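As a minimal sketch, the student can be loaded with the standard `transformers` API. The repository id below is an assumption derived from the model name and is not confirmed by this card:

```python
# Minimal sketch of loading the distilled student with the standard
# transformers API. The repo id is an assumption based on the model name;
# substitute the actual repository path.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/distily_bench_gpt2_optim"  # assumed, not confirmed
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Knowledge distillation is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```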

The student model achieves the following results on the evaluation set (the `*wikippl` perplexity metrics are sketched after this list):

- eval_enwikippl: 854.1211
- eval_frwikippl: 4984.0498
- eval_zhwikippl: 8071.4624
- eval_loss: 7592.3198
- eval_runtime: 22.093
- eval_samples_per_second: 45.263
- eval_steps_per_second: 11.316
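`eval_enwikippl`, `eval_frwikippl`, and `eval_zhwikippl` appear to be perplexities measured on English, French, and Chinese Wikipedia text, respectively. A minimal sketch of how such a perplexity can be computed for a causal LM (Distily's actual evaluation loop may batch and mask differently):

```python
# Sketch of perplexity as exp of the mean next-token cross-entropy.
# Illustrative only; Distily's evaluation code may differ.
import torch

def perplexity(model, tokenizer, text: str) -> float:
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the model return the mean cross-entropy
        # over shifted next-token predictions.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

# Reusing `model` and `tokenizer` from the loading sketch above:
# print(perplexity(model, tokenizer, "Paris is the capital of France."))
```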

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: 'legacy'
- loss_fn: kl (see the sketch after this list)
- train_embeddings: True
- learning_rate: 4e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0
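Note that total_train_batch_size = train_batch_size × gradient_accumulation_steps = 4 × 4 = 16. The `loss_fn: kl` setting suggests the student is trained to match the teacher's next-token distribution with a KL-divergence objective. A minimal sketch of such a loss in PyTorch, assuming plain forward KL at temperature 1 (Distily's 'legacy' objective may differ in details such as temperature, masking, or auxiliary terms):

```python
# Sketch of a forward-KL distillation loss over next-token logits.
# Assumes teacher (gpt2) and student share a vocabulary, as they do here.
import torch
import torch.nn.functional as F

def kl_distillation_loss(student_logits: torch.Tensor,
                         teacher_logits: torch.Tensor,
                         temperature: float = 1.0) -> torch.Tensor:
    """KL(teacher || student), averaged over the batch."""
    student_logp = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_logp = F.log_softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(student_logp, teacher_logp,
                    log_target=True, reduction="batchmean") * temperature**2

# Dummy example: batch of 2 sequences, 8 positions, GPT-2 vocab size 50257.
student = torch.randn(2, 8, 50257, requires_grad=True)
teacher = torch.randn(2, 8, 50257)
loss = kl_distillation_loss(student, teacher)
loss.backward()  # gradients flow only into the student logits
```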

## Resource Usage

Peak GPU Memory: 4.6175 GB

## Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 30.2385 | 57.2728 | | | | | 18.1772 |
| 0 | 0 | 55339.3672 | 57682.5742 | 331776.0 | 21.5321 | 46.442 | 11.611 | 57080.2930 |
| 500 | 0.0808 | 2252.7444 | 10031.8955 | 12590.8477 | 21.6698 | 46.147 | 11.537 | 54057.1758 |
| 1000 | 0.1616 | 1772.2999 | 6312.1328 | 10713.8564 | 21.7782 | 45.917 | 11.479 | 20088.1348 |
| 1500 | 0.2424 | 1526.4294 | 5867.9150 | 9839.7441 | 21.9107 | 45.64 | 11.41 | 12735.1455 |
| 2000 | 0.3232 | 1386.3311 | 5905.2700 | 9335.0400 | 21.6766 | 46.133 | 11.533 | 12883.1006 |
| 2500 | 0.4040 | 1285.9930 | 5870.4004 | 9057.4082 | 21.7814 | 45.911 | 11.478 | 12968.5391 |
| 3000 | 0.4848 | 1184.0109 | 5485.2373 | 8730.4961 | 21.825 | 45.819 | 11.455 | 11484.6025 |
| 3500 | 0.5657 | 1126.2782 | 5563.9180 | 8546.9443 | 21.8317 | 45.805 | 11.451 | 12904.6191 |
| 4000 | 0.6465 | 1054.3176 | 5538.6753 | 8247.4883 | 21.7217 | 46.037 | 11.509 | 12877.9414 |
| 4500 | 0.7273 | 994.4172 | 5374.4102 | 8100.9922 | 21.713 | 46.055 | 11.514 | 11938.875 |
| 5000 | 0.8081 | 946.4249 | 5192.8228 | 7962.6880 | 21.8256 | 45.818 | 11.454 | 9305.625 |
| 5500 | 0.8889 | 910.4888 | 5210.4282 | 7757.3120 | 22.1477 | 45.152 | 11.288 | 9479.9629 |
| 6000 | 0.9697 | 871.6422 | 5126.9775 | 7617.0239 | 22.8323 | 43.798 | 10.949 | 8336.0049 |
| 6187 | 0.9999 | 854.1211 | 4984.0498 | 7592.3198 | 22.093 | 45.263 | 11.316 | 8071.4624 |

### Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.20.0
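A quick runtime check against these pinned versions, as a sketch: the `__version__` attribute on distily is an assumption, and Distily may need to be installed from its source repository rather than PyPI.

```python
# Verify the installed environment against the versions listed above.
import datasets
import torch
import transformers

print("Transformers:", transformers.__version__)  # expected 4.44.0
print("PyTorch:", torch.__version__)              # expected 2.3.0
print("Datasets:", datasets.__version__)          # expected 2.20.0

try:
    import distily
    # __version__ on distily is assumed, not confirmed.
    print("Distily:", getattr(distily, "__version__", "unknown"))  # expected 0.2.0
except ImportError:
    print("Distily: not installed")
```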