---
base_model: gpt2
library_name: distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: distily_bench_gpt2_optim_extended2
  results: []
---

# distily_bench_gpt2_optim_extended2

This student model is distilled from the teacher model [gpt2](https://huggingface.co/gpt2) using an unspecified dataset.

The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
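
A minimal usage sketch with the Transformers library is shown below. The repository id and prompt are placeholders, and loading the checkpoint as a standard GPT-2-style causal LM is an assumption.

```python
# Minimal usage sketch (assumption: this checkpoint loads as a standard
# GPT-2-style causal LM; replace MODEL_ID with the actual repository id).
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "distily_bench_gpt2_optim_extended2"  # placeholder repository id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

inputs = tokenizer("Knowledge distillation is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```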

It achieves the following results on the evaluation set:
- eval_enwikippl: 1466.9598
- eval_frwikippl: 6589.9976
- eval_zhwikippl: 19049.6328
- eval_loss: 8530.3359
- eval_runtime: 64.7254
- eval_samples_per_second: 46.35
- eval_steps_per_second: 11.587
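
The eval_*ppl values are perplexities on English, French, and Chinese Wikipedia evaluation text. A rough sketch of how such a perplexity can be computed for a causal LM is given below; the tokenization, truncation, and averaging details are assumptions rather than Distily's exact evaluation recipe.

```python
# Sketch of a standard causal-LM perplexity computation; an assumption about
# how metrics like eval_enwikippl are derived, not Distily's exact code.
import math
import torch

def perplexity(model, tokenizer, texts, device="cpu"):
    model.to(device).eval()
    total_nll, total_tokens = 0.0, 0
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, return_tensors="pt", truncation=True).to(device)
            n_predicted = enc["input_ids"].shape[1] - 1
            if n_predicted < 1:
                continue  # need at least two tokens to score next-token predictions
            out = model(**enc, labels=enc["input_ids"])
            # out.loss is the mean negative log-likelihood per predicted token
            total_nll += out.loss.item() * n_predicted
            total_tokens += n_predicted
    return math.exp(total_nll / total_tokens)
```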

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
-->

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- distillation_objective: 'legacy'
- loss_fn: kl
- train_embeddings: True
- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0
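
The loss_fn: kl setting corresponds to a KL-divergence distillation objective, i.e. the student's next-token distribution is pushed toward the teacher's. A minimal sketch of such a loss follows; the temperature, the reduction, and whatever additional terms the 'legacy' distillation_objective combines are assumptions rather than Distily's exact implementation.

```python
# Minimal sketch of a KL-divergence distillation loss (loss_fn: kl).
# Temperature and reduction choices are assumptions; Distily's 'legacy'
# distillation_objective may combine additional terms.
import torch
import torch.nn.functional as F

def kl_distillation_loss(student_logits: torch.Tensor,
                         teacher_logits: torch.Tensor,
                         temperature: float = 1.0) -> torch.Tensor:
    # Flatten (batch, seq, vocab) logits to (tokens, vocab) so that
    # reduction="batchmean" averages the KL divergence over tokens.
    student_logp = F.log_softmax(student_logits / temperature, dim=-1).flatten(0, -2)
    teacher_logp = F.log_softmax(teacher_logits / temperature, dim=-1).flatten(0, -2)
    # KL(teacher || student); the temperature**2 factor keeps the gradient
    # scale comparable across temperatures.
    kl = F.kl_div(student_logp, teacher_logp, log_target=True, reduction="batchmean")
    return kl * temperature ** 2
```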

### Resource Usage
Peak GPU Memory: 8.3354 GB
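
A peak-memory figure of this kind can be read from PyTorch's CUDA allocator statistics; whether Distily reports exactly this counter (or GB vs. GiB) is an assumption.

```python
# Sketch: reading peak GPU memory from PyTorch's CUDA allocator statistics.
import torch

torch.cuda.reset_peak_memory_stats()
# ... run the training / evaluation workload here ...
peak_gb = torch.cuda.max_memory_allocated() / (1024 ** 3)  # GiB; the card labels this "GB"
print(f"Peak GPU Memory: {peak_gb:.4f} GB")
```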

### Eval-Phase Metrics
| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **teacher eval** | | 30.2385 | 57.2728 | | | | | 18.1772 |
| 0 | 0 | 55332.9297 | 57511.9648 | 333834.9375 | 64.4894 | 46.519 | 11.63 | 57797.4375 |
| 500 | 0.0269 | 3397.8057 | 14195.7314 | 11200.1709 | 64.3161 | 46.645 | 11.661 | 46176.3906 |
| 1000 | 0.0539 | 2565.4185 | 11100.7803 | 10401.7070 | 64.9732 | 46.173 | 11.543 | 40786.25 |
| 1500 | 0.0808 | 2280.1555 | 9752.9180 | 10029.2695 | 65.1147 | 46.073 | 11.518 | 34300.0664 |
| 2000 | 0.1077 | 2111.7202 | 8617.1777 | 9861.6855 | 65.0861 | 46.093 | 11.523 | 27128.5918 |
| 2500 | 0.1347 | 1990.7386 | 8209.1553 | 9601.2373 | 64.8934 | 46.23 | 11.557 | 25209.2168 |
| 3000 | 0.1616 | 1918.3867 | 7799.5220 | 9467.9785 | 64.886 | 46.235 | 11.559 | 22736.8027 |
| 3500 | 0.1886 | 1818.1265 | 7551.1548 | 9349.7920 | 64.7154 | 46.357 | 11.589 | 22582.4883 |
| 4000 | 0.2155 | 1769.4467 | 7458.5562 | 9246.7197 | 64.7466 | 46.334 | 11.584 | 21114.0508 |
| 4500 | 0.2424 | 1728.6010 | 7363.9741 | 9099.1787 | 65.1202 | 46.069 | 11.517 | 20729.8926 |
| 5000 | 0.2694 | 1704.3433 | 7453.2944 | 9068.9062 | 64.69 | 46.375 | 11.594 | 21740.6367 |
| 5500 | 0.2963 | 1664.6129 | 7184.9824 | 8969.5039 | 64.2668 | 46.68 | 11.67 | 20534.2910 |
| 6000 | 0.3232 | 1631.8164 | 7198.6724 | 8898.6348 | 65.558 | 45.761 | 11.44 | 22204.2188 |
| 6500 | 0.3502 | 1589.2347 | 6884.9448 | 8812.0322 | 64.8035 | 46.294 | 11.573 | 19131.2129 |
| 7000 | 0.3771 | 1553.9370 | 6727.0781 | 8747.2002 | 65.3644 | 45.897 | 11.474 | 18709.2949 |
| 7500 | 0.4040 | 1540.8395 | 6779.4512 | 8707.7334 | 64.9958 | 46.157 | 11.539 | 18515.4297 |
| 8000 | 0.4310 | 1519.5702 | 6720.9155 | 8684.7471 | 65.1941 | 46.016 | 11.504 | 19323.7656 |
| 8500 | 0.4579 | 1499.4967 | 6702.9292 | 8618.3145 | 64.6164 | 46.428 | 11.607 | 20303.8691 |
| 9000 | 0.4848 | 1468.8694 | 6597.9023 | 8579.7764 | 65.1809 | 46.026 | 11.506 | 19187.4902 |
| 9500 | 0.5118 | 1466.9598 | 6589.9976 | 8530.3359 | 64.7254 | 46.35 | 11.587 | 19049.6328 |
| 10000 | 0.5387 | 1450.3381 | 6594.1782 | 8527.4131 | 65.1904 | 46.019 | 11.505 | 20619.4590 |
| 10500 | 0.5657 | 1422.2881 | 6539.0815 | 8491.7549 | 64.9945 | 46.158 | 11.539 | 20106.9180 |
| 11000 | 0.5926 | 1413.1234 | 6447.0659 | 8481.6855 | 65.107 | 46.078 | 11.52 | 18302.7910 |
| 11500 | 0.6195 | 1399.7990 | 6463.4536 | 8433.2803 | 64.732 | 46.345 | 11.586 | 18501.8398 |
| 12000 | 0.6465 | 1386.2769 | 6439.3423 | 8387.9043 | 64.7399 | 46.339 | 11.585 | 18306.4570 |
| 12500 | 0.6734 | 1381.0126 | 6380.1401 | 8346.6777 | 64.7944 | 46.3 | 11.575 | 19072.5371 |
| 13000 | 0.7003 | 1360.2582 | 6364.1938 | 8351.8828 | 64.608 | 46.434 | 11.608 | 18941.8262 |
| 13500 | 0.7273 | 1355.2496 | 6337.5508 | 8364.6289 | 64.4743 | 46.53 | 11.633 | 18354.1797 |
| 14000 | 0.7542 | 1342.7577 | 6132.9243 | 8351.3281 | 64.4281 | 46.564 | 11.641 | 18108.3027 |
| 14500 | 0.7811 | 1324.4287 | 6172.4019 | 8299.2109 | 64.0768 | 46.819 | 11.705 | 17864.5078 |
| 15000 | 0.8081 | 1311.8136 | 6250.3555 | 8288.9170 | 63.9884 | 46.883 | 11.721 | 18093.8008 |
| 15500 | 0.8350 | 1300.1758 | 6161.9678 | 8240.8105 | 65.0003 | 46.154 | 11.538 | 18435.2441 |
| 16000 | 0.8620 | 1294.5092 | 6087.9023 | 8225.1836 | 65.3075 | 45.937 | 11.484 | 18195.5664 |
| 16500 | 0.8889 | 1272.7550 | 6124.9282 | 8187.4561 | 64.7644 | 46.322 | 11.58 | 18905.1719 |
| 17000 | 0.9158 | 1271.9396 | 6117.1646 | 8179.8828 | 66.1093 | 45.379 | 11.345 | 17912.2910 |
| 17500 | 0.9428 | 1263.8173 | 5966.3726 | 8165.7280 | 64.1579 | 46.76 | 11.69 | 16779.9922 |
| 18000 | 0.9697 | 1245.9607 | 6065.6255 | 8219.2422 | 64.3092 | 46.65 | 11.662 | 17666.4180 |
| 18500 | 0.9966 | 1240.7706 | 6013.2476 | 8146.3145 | 64.5002 | 46.511 | 11.628 | 16597.2520 |
| 18562 | 1.0000 | 1242.8444 | 5899.8604 | 8136.0962 | 64.3726 | 46.604 | 11.651 | 16160.9238 |

### Framework versions
- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.20.0