End of training
README.md
CHANGED
@@ -16,13 +16,13 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
 It achieves the following results on the evaluation set:
-- eval_enwikippl:
-- eval_frwikippl:
-- eval_zhwikippl:
-- eval_loss:
-- eval_runtime: 64.
-- eval_samples_per_second: 46.
-- eval_steps_per_second: 11.
+- eval_enwikippl: 1466.9598
+- eval_frwikippl: 6589.9976
+- eval_zhwikippl: 19049.6328
+- eval_loss: 8530.3359
+- eval_runtime: 64.7254
+- eval_samples_per_second: 46.35
+- eval_steps_per_second: 11.587
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
@@ -65,45 +65,45 @@ Peak GPU Memory: 8.3354 GB
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 30.2385 | 57.2728 | | | | | 18.1772 |
-| 0 | 0 |
-| 500 | 0.0269 |
-| 1000 | 0.0539 |
-| 1500 | 0.0808 |
-| 2000 | 0.1077 |
-| 2500 | 0.1347 |
-| 3000 | 0.1616 |
-| 3500 | 0.1886 |
-| 4000 | 0.2155 |
-| 4500 | 0.2424 |
-| 5000 | 0.2694 |
-| 5500 | 0.2963 |
-| 6000 | 0.3232 |
-| 6500 | 0.3502 |
-| 7000 | 0.3771 |
-| 7500 | 0.4040 |
-| 8000 | 0.4310 |
-| 8500 | 0.4579 |
-| 9000 | 0.4848 |
-| 9500 | 0.5118 |
-| 10000 | 0.5387 |
-| 10500 | 0.5657 |
-| 11000 | 0.5926 |
-| 11500 | 0.6195 |
-| 12000 | 0.6465 |
-| 12500 | 0.6734 |
-| 13000 | 0.7003 |
-| 13500 | 0.7273 |
-| 14000 | 0.7542 |
-| 14500 | 0.7811 |
-| 15000 | 0.8081 |
-| 15500 | 0.8350 |
-| 16000 | 0.8620 |
-| 16500 | 0.8889 |
-| 17000 | 0.9158 |
-| 17500 | 0.9428 |
-| 18000 | 0.9697 |
-| 18500 | 0.9966 |
-| 18562 | 1.0000 |
+| 0 | 0 | 55332.9297 | 57511.9648 | 333834.9375 | 64.4894 | 46.519 | 11.63 | 57797.4375 |
+| 500 | 0.0269 | 3397.8057 | 14195.7314 | 11200.1709 | 64.3161 | 46.645 | 11.661 | 46176.3906 |
+| 1000 | 0.0539 | 2565.4185 | 11100.7803 | 10401.7070 | 64.9732 | 46.173 | 11.543 | 40786.25 |
+| 1500 | 0.0808 | 2280.1555 | 9752.9180 | 10029.2695 | 65.1147 | 46.073 | 11.518 | 34300.0664 |
+| 2000 | 0.1077 | 2111.7202 | 8617.1777 | 9861.6855 | 65.0861 | 46.093 | 11.523 | 27128.5918 |
+| 2500 | 0.1347 | 1990.7386 | 8209.1553 | 9601.2373 | 64.8934 | 46.23 | 11.557 | 25209.2168 |
+| 3000 | 0.1616 | 1918.3867 | 7799.5220 | 9467.9785 | 64.886 | 46.235 | 11.559 | 22736.8027 |
+| 3500 | 0.1886 | 1818.1265 | 7551.1548 | 9349.7920 | 64.7154 | 46.357 | 11.589 | 22582.4883 |
+| 4000 | 0.2155 | 1769.4467 | 7458.5562 | 9246.7197 | 64.7466 | 46.334 | 11.584 | 21114.0508 |
+| 4500 | 0.2424 | 1728.6010 | 7363.9741 | 9099.1787 | 65.1202 | 46.069 | 11.517 | 20729.8926 |
+| 5000 | 0.2694 | 1704.3433 | 7453.2944 | 9068.9062 | 64.69 | 46.375 | 11.594 | 21740.6367 |
+| 5500 | 0.2963 | 1664.6129 | 7184.9824 | 8969.5039 | 64.2668 | 46.68 | 11.67 | 20534.2910 |
+| 6000 | 0.3232 | 1631.8164 | 7198.6724 | 8898.6348 | 65.558 | 45.761 | 11.44 | 22204.2188 |
+| 6500 | 0.3502 | 1589.2347 | 6884.9448 | 8812.0322 | 64.8035 | 46.294 | 11.573 | 19131.2129 |
+| 7000 | 0.3771 | 1553.9370 | 6727.0781 | 8747.2002 | 65.3644 | 45.897 | 11.474 | 18709.2949 |
+| 7500 | 0.4040 | 1540.8395 | 6779.4512 | 8707.7334 | 64.9958 | 46.157 | 11.539 | 18515.4297 |
+| 8000 | 0.4310 | 1519.5702 | 6720.9155 | 8684.7471 | 65.1941 | 46.016 | 11.504 | 19323.7656 |
+| 8500 | 0.4579 | 1499.4967 | 6702.9292 | 8618.3145 | 64.6164 | 46.428 | 11.607 | 20303.8691 |
+| 9000 | 0.4848 | 1468.8694 | 6597.9023 | 8579.7764 | 65.1809 | 46.026 | 11.506 | 19187.4902 |
+| 9500 | 0.5118 | 1466.9598 | 6589.9976 | 8530.3359 | 64.7254 | 46.35 | 11.587 | 19049.6328 |
+| 10000 | 0.5387 | 1450.3381 | 6594.1782 | 8527.4131 | 65.1904 | 46.019 | 11.505 | 20619.4590 |
+| 10500 | 0.5657 | 1422.2881 | 6539.0815 | 8491.7549 | 64.9945 | 46.158 | 11.539 | 20106.9180 |
+| 11000 | 0.5926 | 1413.1234 | 6447.0659 | 8481.6855 | 65.107 | 46.078 | 11.52 | 18302.7910 |
+| 11500 | 0.6195 | 1399.7990 | 6463.4536 | 8433.2803 | 64.732 | 46.345 | 11.586 | 18501.8398 |
+| 12000 | 0.6465 | 1386.2769 | 6439.3423 | 8387.9043 | 64.7399 | 46.339 | 11.585 | 18306.4570 |
+| 12500 | 0.6734 | 1381.0126 | 6380.1401 | 8346.6777 | 64.7944 | 46.3 | 11.575 | 19072.5371 |
+| 13000 | 0.7003 | 1360.2582 | 6364.1938 | 8351.8828 | 64.608 | 46.434 | 11.608 | 18941.8262 |
+| 13500 | 0.7273 | 1355.2496 | 6337.5508 | 8364.6289 | 64.4743 | 46.53 | 11.633 | 18354.1797 |
+| 14000 | 0.7542 | 1342.7577 | 6132.9243 | 8351.3281 | 64.4281 | 46.564 | 11.641 | 18108.3027 |
+| 14500 | 0.7811 | 1324.4287 | 6172.4019 | 8299.2109 | 64.0768 | 46.819 | 11.705 | 17864.5078 |
+| 15000 | 0.8081 | 1311.8136 | 6250.3555 | 8288.9170 | 63.9884 | 46.883 | 11.721 | 18093.8008 |
+| 15500 | 0.8350 | 1300.1758 | 6161.9678 | 8240.8105 | 65.0003 | 46.154 | 11.538 | 18435.2441 |
+| 16000 | 0.8620 | 1294.5092 | 6087.9023 | 8225.1836 | 65.3075 | 45.937 | 11.484 | 18195.5664 |
+| 16500 | 0.8889 | 1272.7550 | 6124.9282 | 8187.4561 | 64.7644 | 46.322 | 11.58 | 18905.1719 |
+| 17000 | 0.9158 | 1271.9396 | 6117.1646 | 8179.8828 | 66.1093 | 45.379 | 11.345 | 17912.2910 |
+| 17500 | 0.9428 | 1263.8173 | 5966.3726 | 8165.7280 | 64.1579 | 46.76 | 11.69 | 16779.9922 |
+| 18000 | 0.9697 | 1245.9607 | 6065.6255 | 8219.2422 | 64.3092 | 46.65 | 11.662 | 17666.4180 |
+| 18500 | 0.9966 | 1240.7706 | 6013.2476 | 8146.3145 | 64.5002 | 46.511 | 11.628 | 16597.2520 |
+| 18562 | 1.0000 | 1242.8444 | 5899.8604 | 8136.0962 | 64.3726 | 46.604 | 11.651 | 16160.9238 |
 
 ### Framework versions
 - Distily 0.2.0
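The enwikippl, frwikippl, and zhwikippl columns appear to be student perplexities on English, French, and Chinese Wikipedia slices; the teacher-eval row gives the gpt2 reference values (30.2385, 57.2728, 18.1772). For orientation, here is a minimal sketch of the kind of teacher-student logit distillation this card records. It is illustrative only, not Distily's actual API: the plain KL loss, the single backward pass, and reusing gpt2 as a stand-in student are all assumptions.

```python
# Illustrative sketch of teacher-student logit distillation (NOT Distily's API).
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
teacher = AutoModelForCausalLM.from_pretrained("gpt2").eval()
# Stand-in student: a real distillation run would use a smaller architecture.
student = AutoModelForCausalLM.from_pretrained("gpt2")

batch = tokenizer("Distillation transfers the teacher's distribution.", return_tensors="pt")

with torch.no_grad():
    teacher_logits = teacher(**batch).logits
student_logits = student(**batch).logits

# KL divergence pushes the student's next-token distribution toward the teacher's.
loss = F.kl_div(
    F.log_softmax(student_logits, dim=-1),
    F.softmax(teacher_logits, dim=-1),
    reduction="batchmean",
)
loss.backward()  # the log directory name suggests this run stepped paged 32-bit AdamW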
logs/optim=paged_adamw_32bit/events.out.tfevents.1723354205.93d6cbb3ad53
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:97475d50c797a0228b79191e95b9474c42cdf7844de4533d955767d32167f4cc
+size 253
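The added log is a Git LFS pointer: the three `+` lines above are the pointer's entire contents (spec version, SHA-256 object id, and size in bytes), so `git lfs pull` is needed to fetch the actual tfevents data. Once downloaded, it could be inspected with TensorBoard's event reader, as in the sketch below; the scalar tag names depend on what Distily logged, so they are read from the file rather than assumed.

```python
# Sketch: read the fetched tfevents log with TensorBoard's event reader.
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

acc = EventAccumulator("logs/optim=paged_adamw_32bit")  # directory holding the tfevents file
acc.Reload()  # parse events from disk

scalar_tags = acc.Tags()["scalars"]  # tag names are whatever the trainer logged
print(scalar_tags)
for event in acc.Scalars(scalar_tags[0]):
    print(event.step, event.value)
```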