---
base_model: gpt2
library_name: distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: distily_bench_gpt2_optim_extended2
  results: []
---
# distily_bench_gpt2_optim_extended2
This student model was distilled from the teacher model gpt2 on the dataset (unspecified), using the Distily library.
It achieves the following results on the evaluation set:
- eval_enwikippl: 1466.9598
- eval_frwikippl: 6589.9976
- eval_zhwikippl: 19049.6328
- eval_loss: 8530.3359
- eval_runtime: 64.7254
- eval_samples_per_second: 46.35
- eval_steps_per_second: 11.587
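The `*wikippl` metrics above are per-dataset perplexities (English, French, and Chinese Wikipedia). For reference, perplexity is the exponential of the mean per-token negative log-likelihood; the sketch below illustrates that relationship only — the helper name and token-level averaging are illustrative, not Distily's exact evaluation code:

```python
import math

def perplexity(nll_per_token):
    """exp of the mean per-token negative log-likelihood (natural log)."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# A model uniformly uncertain over gpt2's 50257-token vocabulary assigns
# each token NLL = ln(50257), so its perplexity equals the vocab size.
print(perplexity([math.log(50257)] * 4))  # ≈ 50257.0
```

Lower is better: the teacher's enwikippl of ~30 versus the student's ~1467 shows how much headroom this run leaves.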
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- distillation_objective: 'legacy'
- loss_fn: kl
- train_embeddings: True
- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0
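The `loss_fn: kl` setting indicates a Kullback-Leibler objective between the teacher's and student's next-token distributions. A minimal pure-Python sketch of a forward KL for a single token position follows; the function names and any temperature handling are illustrative assumptions, and Distily's `'legacy'` objective may aggregate terms differently:

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_distill_loss(teacher_logits, student_logits, temperature=1.0):
    """Forward KL(teacher || student) for one token's vocabulary distribution."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))

# Matching distributions give zero loss; divergent ones give a positive loss.
print(kl_distill_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
print(kl_distill_loss([3.0, 1.0, 0.0], [0.0, 0.0, 0.0]) > 0)  # True
```

In practice this loss is summed or averaged over every token position in the batch, which is why the reported `eval_loss` values are large.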
### Resource Usage
- Peak GPU Memory: 8.3354 GB
### Eval-Phase Metrics
step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
---|---|---|---|---|---|---|---|---|
teacher eval | | 30.2385 | 57.2728 | | | | | 18.1772 |
0 | 0 | 55332.9297 | 57511.9648 | 333834.9375 | 64.4894 | 46.519 | 11.63 | 57797.4375 |
500 | 0.0269 | 3397.8057 | 14195.7314 | 11200.1709 | 64.3161 | 46.645 | 11.661 | 46176.3906 |
1000 | 0.0539 | 2565.4185 | 11100.7803 | 10401.7070 | 64.9732 | 46.173 | 11.543 | 40786.25 |
1500 | 0.0808 | 2280.1555 | 9752.9180 | 10029.2695 | 65.1147 | 46.073 | 11.518 | 34300.0664 |
2000 | 0.1077 | 2111.7202 | 8617.1777 | 9861.6855 | 65.0861 | 46.093 | 11.523 | 27128.5918 |
2500 | 0.1347 | 1990.7386 | 8209.1553 | 9601.2373 | 64.8934 | 46.23 | 11.557 | 25209.2168 |
3000 | 0.1616 | 1918.3867 | 7799.5220 | 9467.9785 | 64.886 | 46.235 | 11.559 | 22736.8027 |
3500 | 0.1886 | 1818.1265 | 7551.1548 | 9349.7920 | 64.7154 | 46.357 | 11.589 | 22582.4883 |
4000 | 0.2155 | 1769.4467 | 7458.5562 | 9246.7197 | 64.7466 | 46.334 | 11.584 | 21114.0508 |
4500 | 0.2424 | 1728.6010 | 7363.9741 | 9099.1787 | 65.1202 | 46.069 | 11.517 | 20729.8926 |
5000 | 0.2694 | 1704.3433 | 7453.2944 | 9068.9062 | 64.69 | 46.375 | 11.594 | 21740.6367 |
5500 | 0.2963 | 1664.6129 | 7184.9824 | 8969.5039 | 64.2668 | 46.68 | 11.67 | 20534.2910 |
6000 | 0.3232 | 1631.8164 | 7198.6724 | 8898.6348 | 65.558 | 45.761 | 11.44 | 22204.2188 |
6500 | 0.3502 | 1589.2347 | 6884.9448 | 8812.0322 | 64.8035 | 46.294 | 11.573 | 19131.2129 |
7000 | 0.3771 | 1553.9370 | 6727.0781 | 8747.2002 | 65.3644 | 45.897 | 11.474 | 18709.2949 |
7500 | 0.4040 | 1540.8395 | 6779.4512 | 8707.7334 | 64.9958 | 46.157 | 11.539 | 18515.4297 |
8000 | 0.4310 | 1519.5702 | 6720.9155 | 8684.7471 | 65.1941 | 46.016 | 11.504 | 19323.7656 |
8500 | 0.4579 | 1499.4967 | 6702.9292 | 8618.3145 | 64.6164 | 46.428 | 11.607 | 20303.8691 |
9000 | 0.4848 | 1468.8694 | 6597.9023 | 8579.7764 | 65.1809 | 46.026 | 11.506 | 19187.4902 |
9500 | 0.5118 | 1466.9598 | 6589.9976 | 8530.3359 | 64.7254 | 46.35 | 11.587 | 19049.6328 |
10000 | 0.5387 | 1450.3381 | 6594.1782 | 8527.4131 | 65.1904 | 46.019 | 11.505 | 20619.4590 |
10500 | 0.5657 | 1422.2881 | 6539.0815 | 8491.7549 | 64.9945 | 46.158 | 11.539 | 20106.9180 |
11000 | 0.5926 | 1413.1234 | 6447.0659 | 8481.6855 | 65.107 | 46.078 | 11.52 | 18302.7910 |
11500 | 0.6195 | 1399.7990 | 6463.4536 | 8433.2803 | 64.732 | 46.345 | 11.586 | 18501.8398 |
12000 | 0.6465 | 1386.2769 | 6439.3423 | 8387.9043 | 64.7399 | 46.339 | 11.585 | 18306.4570 |
12500 | 0.6734 | 1381.0126 | 6380.1401 | 8346.6777 | 64.7944 | 46.3 | 11.575 | 19072.5371 |
13000 | 0.7003 | 1360.2582 | 6364.1938 | 8351.8828 | 64.608 | 46.434 | 11.608 | 18941.8262 |
13500 | 0.7273 | 1355.2496 | 6337.5508 | 8364.6289 | 64.4743 | 46.53 | 11.633 | 18354.1797 |
14000 | 0.7542 | 1342.7577 | 6132.9243 | 8351.3281 | 64.4281 | 46.564 | 11.641 | 18108.3027 |
14500 | 0.7811 | 1324.4287 | 6172.4019 | 8299.2109 | 64.0768 | 46.819 | 11.705 | 17864.5078 |
15000 | 0.8081 | 1311.8136 | 6250.3555 | 8288.9170 | 63.9884 | 46.883 | 11.721 | 18093.8008 |
15500 | 0.8350 | 1300.1758 | 6161.9678 | 8240.8105 | 65.0003 | 46.154 | 11.538 | 18435.2441 |
16000 | 0.8620 | 1294.5092 | 6087.9023 | 8225.1836 | 65.3075 | 45.937 | 11.484 | 18195.5664 |
16500 | 0.8889 | 1272.7550 | 6124.9282 | 8187.4561 | 64.7644 | 46.322 | 11.58 | 18905.1719 |
17000 | 0.9158 | 1271.9396 | 6117.1646 | 8179.8828 | 66.1093 | 45.379 | 11.345 | 17912.2910 |
17500 | 0.9428 | 1263.8173 | 5966.3726 | 8165.7280 | 64.1579 | 46.76 | 11.69 | 16779.9922 |
18000 | 0.9697 | 1245.9607 | 6065.6255 | 8219.2422 | 64.3092 | 46.65 | 11.662 | 17666.4180 |
18500 | 0.9966 | 1240.7706 | 6013.2476 | 8146.3145 | 64.5002 | 46.511 | 11.628 | 16597.2520 |
18562 | 1.0000 | 1242.8444 | 5899.8604 | 8136.0962 | 64.3726 | 46.604 | 11.651 | 16160.9238 |
### Framework versions
- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.20.0