Loss scales: [0.0, 0.0, 1.0]
Noise std: 0.1
Use amp for speeding up training
Load teacher model: clip-ViT-B-32
Teacher model architecture: 
 Framework(
  (0): CLIPModel()
)
Create student model from output/2stages/1_b32_pt1_100
Training does not need the teacher model, set it to None
Freeze the multimodal encoder of the student model
Student model architecture: 
 Framework(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
  (2): Dense({'in_features': 768, 'out_features': 512, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (3): Projector({'in_features': 512, 'out_features': 768, 'bias': True, 'noise_std': 0.1, 'dropout': 0.1, 'noise_prob': 0})
  (4): Decoder({'max_seq_length': 128, 'do_lower_case': False, 'attend_to': ['student'], 'teacher_model_name': 'clip-ViT-B-32'}) with Transformer model: BertLMHeadModel 
)
Total Params: 164986107
Trainable Params: 29858811
Load data/corpus/multilingual_cc3m/cc3m_en.tsv
There are 1 languages: ['en']
There are 1111391 lines, one of which is ['woman selling flowers to decorate religious offerings at the market']
Load data/corpus/multilingual_cc3m/cc3m_en-zh.tsv
There are 2 languages: ['en', 'zh']
There are 1111391 lines, one of which is ['a very typical bus station', '一个非常典型的公交车站']
Load data/corpus/multilingual_cc3m/cc3m_en-de.tsv
There are 2 languages: ['en', 'de']
There are 1111391 lines, one of which is ['tourists take a photo in front of the entrance sign', 'Touristen machen ein Foto vor dem Eingangsschild']
Load data/corpus/multilingual_cc3m/cc3m_en-fr.tsv
There are 2 languages: ['en', 'fr']
There are 1111391 lines, one of which is ['farmer holding a box with grapes', 'agriculteur tenant une boîte avec des raisins']
Epoch: 0 [   500 / 111704]  loss: 8.7714  loss_at_student: 8.7714  max mem: 4592
Epoch: 0 [  1000 / 111704]  loss: 7.9345  loss_at_student: 7.9345  max mem: 4592
Epoch: 0 [  1500 / 111704]  loss: 7.4418  loss_at_student: 7.4418  max mem: 4592
Epoch: 0 [  2000 / 111704]  loss: 7.2385  loss_at_student: 7.2385  max mem: 5142
Epoch: 0 [  2500 / 111704]  loss: 6.8782  loss_at_student: 6.8782  max mem: 5142
Epoch: 0 [  3000 / 111704]  loss: 6.4817  loss_at_student: 6.4817  max mem: 5194
Epoch: 0 [  3500 / 111704]  loss: 6.3331  loss_at_student: 6.3331  max mem: 5194
Epoch: 0 [  4000 / 111704]  loss: 6.6354  loss_at_student: 6.6354  max mem: 5194
Epoch: 0 [  4500 / 111704]  loss: 6.2082  loss_at_student: 6.2082  max mem: 5194
Epoch: 0 [  5000 / 111704]  loss: 6.0398  loss_at_student: 6.0398  max mem: 5194
Epoch: 0 [  5500 / 111704]  loss: 5.9675  loss_at_student: 5.9675  max mem: 5194
Epoch: 0 [  6000 / 111704]  loss: 5.7734  loss_at_student: 5.7734  max mem: 5194
Epoch: 0 [  6500 / 111704]  loss: 5.8813  loss_at_student: 5.8813  max mem: 5194
Epoch: 0 [  7000 / 111704]  loss: 5.9705  loss_at_student: 5.9705  max mem: 5194
Epoch: 0 [  7500 / 111704]  loss: 5.6952  loss_at_student: 5.6952  max mem: 5194
Epoch: 0 [  8000 / 111704]  loss: 5.7671  loss_at_student: 5.7671  max mem: 5194
Epoch: 0 [  8500 / 111704]  loss: 5.6863  loss_at_student: 5.6863  max mem: 5194
Epoch: 0 [  9000 / 111704]  loss: 5.2731  loss_at_student: 5.2731  max mem: 5194
Epoch: 0 [  9500 / 111704]  loss: 5.6404  loss_at_student: 5.6404  max mem: 5194
Epoch: 0 [ 10000 / 111704]  loss: 5.5654  loss_at_student: 5.5654  max mem: 5194
Epoch: 0 [ 10500 / 111704]  loss: 5.1481  loss_at_student: 5.1481  max mem: 5194
Epoch: 0 [ 11000 / 111704]  loss: 5.3339  loss_at_student: 5.3339  max mem: 5194
Epoch: 0 [ 11500 / 111704]  loss: 5.5660  loss_at_student: 5.5660  max mem: 5194
Epoch: 0 [ 12000 / 111704]  loss: 5.0071  loss_at_student: 5.0071  max mem: 5194
Epoch: 0 [ 12500 / 111704]  loss: 5.0456  loss_at_student: 5.0456  max mem: 5194
Epoch: 0 [ 13000 / 111704]  loss: 5.2548  loss_at_student: 5.2548  max mem: 5194
Epoch: 0 [ 13500 / 111704]  loss: 4.7399  loss_at_student: 4.7399  max mem: 5194
Epoch: 0 [ 14000 / 111704]  loss: 4.9904  loss_at_student: 4.9904  max mem: 5194
Epoch: 0 [ 14500 / 111704]  loss: 4.8041  loss_at_student: 4.8041  max mem: 5194
Epoch: 0 [ 15000 / 111704]  loss: 5.0959  loss_at_student: 5.0959  max mem: 5194
Epoch: 0 [ 15500 / 111704]  loss: 4.6961  loss_at_student: 4.6961  max mem: 5194
Epoch: 0 [ 16000 / 111704]  loss: 5.0804  loss_at_student: 5.0804  max mem: 5194
Epoch: 0 [ 16500 / 111704]  loss: 4.8322  loss_at_student: 4.8322  max mem: 5242
Epoch: 0 [ 17000 / 111704]  loss: 5.1881  loss_at_student: 5.1881  max mem: 5242
Epoch: 0 [ 17500 / 111704]  loss: 5.2201  loss_at_student: 5.2201  max mem: 5242
Epoch: 0 [ 18000 / 111704]  loss: 4.6492  loss_at_student: 4.6492  max mem: 5242
Epoch: 0 [ 18500 / 111704]  loss: 5.0002  loss_at_student: 5.0002  max mem: 5242
Epoch: 0 [ 19000 / 111704]  loss: 4.5451  loss_at_student: 4.5451  max mem: 5242
Epoch: 0 [ 19500 / 111704]  loss: 4.7435  loss_at_student: 4.7435  max mem: 5242
Epoch: 0 [ 20000 / 111704]  loss: 4.4531  loss_at_student: 4.4531  max mem: 5242
Epoch: 0 [ 20500 / 111704]  loss: 4.4171  loss_at_student: 4.4171  max mem: 5242
Epoch: 0 [ 21000 / 111704]  loss: 4.8378  loss_at_student: 4.8378  max mem: 5242
Epoch: 0 [ 21500 / 111704]  loss: 4.5904  loss_at_student: 4.5904  max mem: 5242
Epoch: 0 [ 22000 / 111704]  loss: 4.5181  loss_at_student: 4.5181  max mem: 5242
Epoch: 0 [ 22500 / 111704]  loss: 4.8956  loss_at_student: 4.8956  max mem: 5242
Epoch: 0 [ 23000 / 111704]  loss: 4.7755  loss_at_student: 4.7755  max mem: 5754
Epoch: 0 [ 23500 / 111704]  loss: 4.4486  loss_at_student: 4.4486  max mem: 5754
Epoch: 0 [ 24000 / 111704]  loss: 4.4459  loss_at_student: 4.4459  max mem: 5754
Epoch: 0 [ 24500 / 111704]  loss: 4.5868  loss_at_student: 4.5868  max mem: 5754
Epoch: 0 [ 25000 / 111704]  loss: 4.3169  loss_at_student: 4.3169  max mem: 5754
Epoch: 0 [ 25500 / 111704]  loss: 4.5783  loss_at_student: 4.5783  max mem: 5754
Epoch: 0 [ 26000 / 111704]  loss: 4.2304  loss_at_student: 4.2304  max mem: 5754
Epoch: 0 [ 26500 / 111704]  loss: 4.6112  loss_at_student: 4.6112  max mem: 5754
Epoch: 0 [ 27000 / 111704]  loss: 4.2030  loss_at_student: 4.2030  max mem: 5754
Epoch: 0 [ 27500 / 111704]  loss: 4.5268  loss_at_student: 4.5268  max mem: 5754
Epoch: 0 [ 28000 / 111704]  loss: 4.3942  loss_at_student: 4.3942  max mem: 5754
Epoch: 0 [ 28500 / 111704]  loss: 4.3239  loss_at_student: 4.3239  max mem: 5754
Epoch: 0 [ 29000 / 111704]  loss: 4.3064  loss_at_student: 4.3064  max mem: 5754
Epoch: 0 [ 29500 / 111704]  loss: 4.4896  loss_at_student: 4.4896  max mem: 5754
Epoch: 0 [ 30000 / 111704]  loss: 4.2512  loss_at_student: 4.2512  max mem: 5754
Epoch: 0 [ 30500 / 111704]  loss: 4.4796  loss_at_student: 4.4796  max mem: 5754
Epoch: 0 [ 31000 / 111704]  loss: 4.0884  loss_at_student: 4.0884  max mem: 5754
Epoch: 0 [ 31500 / 111704]  loss: 4.0342  loss_at_student: 4.0342  max mem: 5754
Epoch: 0 [ 32000 / 111704]  loss: 4.4938  loss_at_student: 4.4938  max mem: 5754
Epoch: 0 [ 32500 / 111704]  loss: 4.4294  loss_at_student: 4.4294  max mem: 5754
Epoch: 0 [ 33000 / 111704]  loss: 4.3596  loss_at_student: 4.3596  max mem: 5754
Epoch: 0 [ 33500 / 111704]  loss: 3.9769  loss_at_student: 3.9769  max mem: 5754
Epoch: 0 [ 34000 / 111704]  loss: 4.1970  loss_at_student: 4.1970  max mem: 5754
Epoch: 0 [ 34500 / 111704]  loss: 3.9168  loss_at_student: 3.9168  max mem: 5754
Epoch: 0 [ 35000 / 111704]  loss: 4.0093  loss_at_student: 4.0093  max mem: 5754
Epoch: 0 [ 35500 / 111704]  loss: 4.2701  loss_at_student: 4.2701  max mem: 5754
Epoch: 0 [ 36000 / 111704]  loss: 4.0642  loss_at_student: 4.0642  max mem: 7597
Epoch: 0 [ 36500 / 111704]  loss: 4.1264  loss_at_student: 4.1264  max mem: 7597
Epoch: 0 [ 37000 / 111704]  loss: 4.1885  loss_at_student: 4.1885  max mem: 7597
Epoch: 0 [ 37500 / 111704]  loss: 4.2928  loss_at_student: 4.2928  max mem: 7597
Epoch: 0 [ 38000 / 111704]  loss: 4.1477  loss_at_student: 4.1477  max mem: 7597
Epoch: 0 [ 38500 / 111704]  loss: 4.1527  loss_at_student: 4.1527  max mem: 7597
Epoch: 0 [ 39000 / 111704]  loss: 3.8931  loss_at_student: 3.8931  max mem: 7597
Epoch: 0 [ 39500 / 111704]  loss: 4.4130  loss_at_student: 4.4130  max mem: 7597
Epoch: 0 [ 40000 / 111704]  loss: 4.0760  loss_at_student: 4.0760  max mem: 7597
Epoch: 0 [ 40500 / 111704]  loss: 4.0467  loss_at_student: 4.0467  max mem: 7597
Epoch: 0 [ 41000 / 111704]  loss: 3.8695  loss_at_student: 3.8695  max mem: 7597
Epoch: 0 [ 41500 / 111704]  loss: 4.0500  loss_at_student: 4.0500  max mem: 7597
Epoch: 0 [ 42000 / 111704]  loss: 4.4134  loss_at_student: 4.4134  max mem: 7597
Epoch: 0 [ 42500 / 111704]  loss: 3.6092  loss_at_student: 3.6092  max mem: 7597
Epoch: 0 [ 43000 / 111704]  loss: 3.8108  loss_at_student: 3.8108  max mem: 7597
Epoch: 0 [ 43500 / 111704]  loss: 3.7753  loss_at_student: 3.7753  max mem: 7597
Epoch: 0 [ 44000 / 111704]  loss: 4.2059  loss_at_student: 4.2059  max mem: 7597
Epoch: 0 [ 44500 / 111704]  loss: 3.8862  loss_at_student: 3.8862  max mem: 7597
Epoch: 0 [ 45000 / 111704]  loss: 4.1190  loss_at_student: 4.1190  max mem: 7597
Epoch: 0 [ 45500 / 111704]  loss: 3.8656  loss_at_student: 3.8656  max mem: 7597
Epoch: 0 [ 46000 / 111704]  loss: 3.9975  loss_at_student: 3.9975  max mem: 7597
Epoch: 0 [ 46500 / 111704]  loss: 4.2335  loss_at_student: 4.2335  max mem: 7597
Epoch: 0 [ 47000 / 111704]  loss: 4.3347  loss_at_student: 4.3347  max mem: 7597
Epoch: 0 [ 47500 / 111704]  loss: 3.5865  loss_at_student: 3.5865  max mem: 7597
Epoch: 0 [ 48000 / 111704]  loss: 3.6770  loss_at_student: 3.6770  max mem: 7597
Epoch: 0 [ 48500 / 111704]  loss: 3.8896  loss_at_student: 3.8896  max mem: 7597
Epoch: 0 [ 49000 / 111704]  loss: 3.9530  loss_at_student: 3.9530  max mem: 7597
Epoch: 0 [ 49500 / 111704]  loss: 4.0959  loss_at_student: 4.0959  max mem: 7597
Epoch: 0 [ 50000 / 111704]  loss: 4.1451  loss_at_student: 4.1451  max mem: 7597
Epoch: 0 [ 50500 / 111704]  loss: 3.9495  loss_at_student: 3.9495  max mem: 7597
Epoch: 0 [ 51000 / 111704]  loss: 3.9685  loss_at_student: 3.9685  max mem: 7597
Epoch: 0 [ 51500 / 111704]  loss: 3.6819  loss_at_student: 3.6819  max mem: 7597
Epoch: 0 [ 52000 / 111704]  loss: 4.3911  loss_at_student: 4.3911  max mem: 7597
Epoch: 0 [ 52500 / 111704]  loss: 3.8748  loss_at_student: 3.8748  max mem: 7597
Epoch: 0 [ 53000 / 111704]  loss: 3.8664  loss_at_student: 3.8664  max mem: 7597
Epoch: 0 [ 53500 / 111704]  loss: 3.8093  loss_at_student: 3.8093  max mem: 7597
Epoch: 0 [ 54000 / 111704]  loss: 3.7960  loss_at_student: 3.7960  max mem: 7597
Epoch: 0 [ 54500 / 111704]  loss: 3.9549  loss_at_student: 3.9549  max mem: 7597
Epoch: 0 [ 55000 / 111704]  loss: 4.1104  loss_at_student: 4.1104  max mem: 7597
Epoch: 0 [ 55500 / 111704]  loss: 4.0118  loss_at_student: 4.0118  max mem: 7597
Epoch: 0 [ 56000 / 111704]  loss: 3.6153  loss_at_student: 3.6153  max mem: 7597
Epoch: 0 [ 56500 / 111704]  loss: 3.9898  loss_at_student: 3.9898  max mem: 7597
Epoch: 0 [ 57000 / 111704]  loss: 3.7724  loss_at_student: 3.7724  max mem: 7597
Epoch: 0 [ 57500 / 111704]  loss: 3.8434  loss_at_student: 3.8434  max mem: 7597
Epoch: 0 [ 58000 / 111704]  loss: 4.0537  loss_at_student: 4.0537  max mem: 7597
Epoch: 0 [ 58500 / 111704]  loss: 3.7411  loss_at_student: 3.7411  max mem: 7597
Epoch: 0 [ 59000 / 111704]  loss: 3.7493  loss_at_student: 3.7493  max mem: 7597
Epoch: 0 [ 59500 / 111704]  loss: 4.3428  loss_at_student: 4.3428  max mem: 7597
Epoch: 0 [ 60000 / 111704]  loss: 3.8873  loss_at_student: 3.8873  max mem: 7597
Epoch: 0 [ 60500 / 111704]  loss: 4.1809  loss_at_student: 4.1809  max mem: 7597
Epoch: 0 [ 61000 / 111704]  loss: 4.1613  loss_at_student: 4.1613  max mem: 7597
Epoch: 0 [ 61500 / 111704]  loss: 3.4200  loss_at_student: 3.4200  max mem: 7597
Epoch: 0 [ 62000 / 111704]  loss: 3.9101  loss_at_student: 3.9101  max mem: 7597
Epoch: 0 [ 62500 / 111704]  loss: 3.8585  loss_at_student: 3.8585  max mem: 7597
Epoch: 0 [ 63000 / 111704]  loss: 3.7161  loss_at_student: 3.7161  max mem: 7597
Epoch: 0 [ 63500 / 111704]  loss: 3.8943  loss_at_student: 3.8943  max mem: 7597
Epoch: 0 [ 64000 / 111704]  loss: 3.7164  loss_at_student: 3.7164  max mem: 7597
Epoch: 0 [ 64500 / 111704]  loss: 3.7043  loss_at_student: 3.7043  max mem: 7597
Epoch: 0 [ 65000 / 111704]  loss: 3.4761  loss_at_student: 3.4761  max mem: 7597
Epoch: 0 [ 65500 / 111704]  loss: 4.0781  loss_at_student: 4.0781  max mem: 7597
Epoch: 0 [ 66000 / 111704]  loss: 3.7520  loss_at_student: 3.7520  max mem: 7597
Epoch: 0 [ 66500 / 111704]  loss: 3.5518  loss_at_student: 3.5518  max mem: 7597
Epoch: 0 [ 67000 / 111704]  loss: 3.9021  loss_at_student: 3.9021  max mem: 7597
Epoch: 0 [ 67500 / 111704]  loss: 3.8593  loss_at_student: 3.8593  max mem: 7597
Epoch: 0 [ 68000 / 111704]  loss: 3.9456  loss_at_student: 3.9456  max mem: 7597
Epoch: 0 [ 68500 / 111704]  loss: 3.6141  loss_at_student: 3.6141  max mem: 7597
Epoch: 0 [ 69000 / 111704]  loss: 4.2022  loss_at_student: 4.2022  max mem: 7597
Epoch: 0 [ 69500 / 111704]  loss: 3.5705  loss_at_student: 3.5705  max mem: 7597
Epoch: 0 [ 70000 / 111704]  loss: 3.8974  loss_at_student: 3.8974  max mem: 7597
Epoch: 0 [ 70500 / 111704]  loss: 3.4586  loss_at_student: 3.4586  max mem: 7597
Epoch: 0 [ 71000 / 111704]  loss: 4.2041  loss_at_student: 4.2041  max mem: 7597
Epoch: 0 [ 71500 / 111704]  loss: 3.6301  loss_at_student: 3.6301  max mem: 7597
Epoch: 0 [ 72000 / 111704]  loss: 3.8927  loss_at_student: 3.8927  max mem: 7597
Epoch: 0 [ 72500 / 111704]  loss: 3.7067  loss_at_student: 3.7067  max mem: 7597
Epoch: 0 [ 73000 / 111704]  loss: 3.5971  loss_at_student: 3.5971  max mem: 7597
Epoch: 0 [ 73500 / 111704]  loss: 3.9996  loss_at_student: 3.9996  max mem: 7597
Epoch: 0 [ 74000 / 111704]  loss: 3.8815  loss_at_student: 3.8815  max mem: 7597
Epoch: 0 [ 74500 / 111704]  loss: 3.9927  loss_at_student: 3.9927  max mem: 7597
Epoch: 0 [ 75000 / 111704]  loss: 3.4703  loss_at_student: 3.4703  max mem: 7597
Epoch: 0 [ 75500 / 111704]  loss: 3.5760  loss_at_student: 3.5760  max mem: 7597
Epoch: 0 [ 76000 / 111704]  loss: 3.7167  loss_at_student: 3.7167  max mem: 7597
Epoch: 0 [ 76500 / 111704]  loss: 3.8299  loss_at_student: 3.8299  max mem: 7597
Epoch: 0 [ 77000 / 111704]  loss: 3.5532  loss_at_student: 3.5532  max mem: 7597
Epoch: 0 [ 77500 / 111704]  loss: 3.6085  loss_at_student: 3.6085  max mem: 7597
Epoch: 0 [ 78000 / 111704]  loss: 3.5591  loss_at_student: 3.5591  max mem: 7597
Epoch: 0 [ 78500 / 111704]  loss: 3.4231  loss_at_student: 3.4231  max mem: 7597
Epoch: 0 [ 79000 / 111704]  loss: 3.6261  loss_at_student: 3.6261  max mem: 7597
Epoch: 0 [ 79500 / 111704]  loss: 3.4383  loss_at_student: 3.4383  max mem: 7597
Epoch: 0 [ 80000 / 111704]  loss: 3.3283  loss_at_student: 3.3283  max mem: 7597
Epoch: 0 [ 80500 / 111704]  loss: 3.4221  loss_at_student: 3.4221  max mem: 7597
Epoch: 0 [ 81000 / 111704]  loss: 3.9027  loss_at_student: 3.9027  max mem: 7597
Epoch: 0 [ 81500 / 111704]  loss: 3.8388  loss_at_student: 3.8388  max mem: 7597
Epoch: 0 [ 82000 / 111704]  loss: 4.1729  loss_at_student: 4.1729  max mem: 7597
Epoch: 0 [ 82500 / 111704]  loss: 3.8036  loss_at_student: 3.8036  max mem: 7597
Epoch: 0 [ 83000 / 111704]  loss: 3.5903  loss_at_student: 3.5903  max mem: 7597
Epoch: 0 [ 83500 / 111704]  loss: 3.4736  loss_at_student: 3.4736  max mem: 7597
Epoch: 0 [ 84000 / 111704]  loss: 3.4524  loss_at_student: 3.4524  max mem: 7597
Epoch: 0 [ 84500 / 111704]  loss: 3.7814  loss_at_student: 3.7814  max mem: 7597
Epoch: 0 [ 85000 / 111704]  loss: 3.5877  loss_at_student: 3.5877  max mem: 7597
Epoch: 0 [ 85500 / 111704]  loss: 3.2953  loss_at_student: 3.2953  max mem: 7597
Epoch: 0 [ 86000 / 111704]  loss: 3.6137  loss_at_student: 3.6137  max mem: 7597
Epoch: 0 [ 86500 / 111704]  loss: 3.7357  loss_at_student: 3.7357  max mem: 7597
Epoch: 0 [ 87000 / 111704]  loss: 3.7362  loss_at_student: 3.7362  max mem: 7597
Epoch: 0 [ 87500 / 111704]  loss: 3.6342  loss_at_student: 3.6342  max mem: 7597
Epoch: 0 [ 88000 / 111704]  loss: 3.6499  loss_at_student: 3.6499  max mem: 7597
Epoch: 0 [ 88500 / 111704]  loss: 3.8936  loss_at_student: 3.8936  max mem: 7597
Epoch: 0 [ 89000 / 111704]  loss: 3.9120  loss_at_student: 3.9120  max mem: 7597
Epoch: 0 [ 89500 / 111704]  loss: 3.6771  loss_at_student: 3.6771  max mem: 7597
Epoch: 0 [ 90000 / 111704]  loss: 3.8526  loss_at_student: 3.8526  max mem: 7597
Epoch: 0 [ 90500 / 111704]  loss: 3.9116  loss_at_student: 3.9116  max mem: 7597
Epoch: 0 [ 91000 / 111704]  loss: 3.3960  loss_at_student: 3.3960  max mem: 7597
Epoch: 0 [ 91500 / 111704]  loss: 3.9203  loss_at_student: 3.9203  max mem: 7597
Epoch: 0 [ 92000 / 111704]  loss: 3.5709  loss_at_student: 3.5709  max mem: 7597
Epoch: 0 [ 92500 / 111704]  loss: 3.6945  loss_at_student: 3.6945  max mem: 7597
Epoch: 0 [ 93000 / 111704]  loss: 4.0280  loss_at_student: 4.0280  max mem: 7597
Epoch: 0 [ 93500 / 111704]  loss: 3.3604  loss_at_student: 3.3604  max mem: 7597
Epoch: 0 [ 94000 / 111704]  loss: 3.4572  loss_at_student: 3.4572  max mem: 7597
Epoch: 0 [ 94500 / 111704]  loss: 3.9002  loss_at_student: 3.9002  max mem: 7597
Epoch: 0 [ 95000 / 111704]  loss: 3.6444  loss_at_student: 3.6444  max mem: 7597
Epoch: 0 [ 95500 / 111704]  loss: 3.2206  loss_at_student: 3.2206  max mem: 7597
Epoch: 0 [ 96000 / 111704]  loss: 3.1926  loss_at_student: 3.1926  max mem: 7597
Epoch: 0 [ 96500 / 111704]  loss: 3.5636  loss_at_student: 3.5636  max mem: 7597
Epoch: 0 [ 97000 / 111704]  loss: 4.0269  loss_at_student: 4.0269  max mem: 7597
Epoch: 0 [ 97500 / 111704]  loss: 3.6760  loss_at_student: 3.6760  max mem: 7597
Epoch: 0 [ 98000 / 111704]  loss: 3.3086  loss_at_student: 3.3086  max mem: 7597
Epoch: 0 [ 98500 / 111704]  loss: 3.6044  loss_at_student: 3.6044  max mem: 7597
Epoch: 0 [ 99000 / 111704]  loss: 3.9427  loss_at_student: 3.9427  max mem: 7597
Epoch: 0 [ 99500 / 111704]  loss: 3.8270  loss_at_student: 3.8270  max mem: 7597
Epoch: 0 [100000 / 111704]  loss: 3.4903  loss_at_student: 3.4903  max mem: 7597
Epoch: 0 [100500 / 111704]  loss: 3.6302  loss_at_student: 3.6302  max mem: 7597
Epoch: 0 [101000 / 111704]  loss: 3.7080  loss_at_student: 3.7080  max mem: 7597
Epoch: 0 [101500 / 111704]  loss: 3.4830  loss_at_student: 3.4830  max mem: 7597
Epoch: 0 [102000 / 111704]  loss: 3.6739  loss_at_student: 3.6739  max mem: 7597
Epoch: 0 [102500 / 111704]  loss: 3.3773  loss_at_student: 3.3773  max mem: 7597
Epoch: 0 [103000 / 111704]  loss: 3.4852  loss_at_student: 3.4852  max mem: 7597
Epoch: 0 [103500 / 111704]  loss: 3.5963  loss_at_student: 3.5963  max mem: 7597
Epoch: 0 [104000 / 111704]  loss: 3.6638  loss_at_student: 3.6638  max mem: 7597
Epoch: 0 [104500 / 111704]  loss: 3.6741  loss_at_student: 3.6741  max mem: 7597
Epoch: 0 [105000 / 111704]  loss: 3.9578  loss_at_student: 3.9578  max mem: 7597
Epoch: 0 [105500 / 111704]  loss: 3.5483  loss_at_student: 3.5483  max mem: 7597
Epoch: 0 [106000 / 111704]  loss: 3.9791  loss_at_student: 3.9791  max mem: 7597
Epoch: 0 [106500 / 111704]  loss: 3.2237  loss_at_student: 3.2237  max mem: 7597
Epoch: 0 [107000 / 111704]  loss: 3.3677  loss_at_student: 3.3677  max mem: 7597
Epoch: 0 [107500 / 111704]  loss: 3.9328  loss_at_student: 3.9328  max mem: 7597
Epoch: 0 [108000 / 111704]  loss: 3.5512  loss_at_student: 3.5512  max mem: 7597
Epoch: 0 [108500 / 111704]  loss: 3.4838  loss_at_student: 3.4838  max mem: 7597
Epoch: 0 [109000 / 111704]  loss: 3.4433  loss_at_student: 3.4433  max mem: 7597
Epoch: 0 [109500 / 111704]  loss: 3.3684  loss_at_student: 3.3684  max mem: 7597
Epoch: 0 [110000 / 111704]  loss: 3.5861  loss_at_student: 3.5861  max mem: 7597
Epoch: 0 [110500 / 111704]  loss: 3.5507  loss_at_student: 3.5507  max mem: 7597
Epoch: 0 [111000 / 111704]  loss: 3.3398  loss_at_student: 3.3398  max mem: 7597
Epoch: 0 [111500 / 111704]  loss: 3.6632  loss_at_student: 3.6632  max mem: 7597
Averaged stats: loss: 4.2150  loss_at_student: 4.2150
Train epoch time: 3:58:26
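The progress lines above follow a regular pattern, so they can be parsed to recompute summary figures such as the mean logged loss or the time per iteration. Below is a minimal Python sketch, assuming the line format shown in this log. The names `LINE_RE`, `parse_progress` and `hms_to_seconds` are hypothetical helpers for illustration and are not part of the training code. The "Averaged stats" value is presumably averaged over every iteration, so a mean over lines printed every 500 steps would only approximate it.

```python
import re
from statistics import mean

# Hypothetical pattern for progress lines of the form seen in this log, e.g.
# "Epoch: 0 [   500 / 111704]  loss: 8.7714  loss_at_student: 8.7714  max mem: 4592"
LINE_RE = re.compile(
    r"Epoch:\s*(\d+)\s*\[\s*(\d+)\s*/\s*(\d+)\]\s+loss:\s*([\d.]+)"
)


def parse_progress(lines):
    """Return (epoch, step, total_steps, loss) tuples for lines that match."""
    records = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            records.append(
                (int(m.group(1)), int(m.group(2)), int(m.group(3)), float(m.group(4)))
            )
    return records


def hms_to_seconds(text):
    """Convert an 'H:MM:SS' string such as '3:58:26' to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# A few lines copied from the log; the summary line is skipped by the parser.
sample = [
    "Epoch: 0 [   500 / 111704]  loss: 8.7714  loss_at_student: 8.7714  max mem: 4592",
    "Epoch: 0 [  1000 / 111704]  loss: 7.9345  loss_at_student: 7.9345  max mem: 4592",
    "Epoch: 0 [  1500 / 111704]  loss: 7.4418  loss_at_student: 7.4418  max mem: 4592",
    "Averaged stats: loss: 4.2150  loss_at_student: 4.2150",
]

records = parse_progress(sample)
logged_losses = [loss for _, _, _, loss in records]
print("parsed lines:", len(records))
print("mean of logged losses:", mean(logged_losses))

# Time per iteration for epoch 0, from the reported epoch time and step count.
seconds_per_iter = hms_to_seconds("3:58:26") / 111704
print("seconds per iteration:", seconds_per_iter)
```

With the epoch time and step count reported above, this works out to roughly 0.128 seconds per iteration.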
Epoch: 1 [   296 / 111704]  loss: 3.6029  loss_at_student: 3.6029  max mem: 7597
Epoch: 1 [   796 / 111704]  loss: 3.6623  loss_at_student: 3.6623  max mem: 7597
Epoch: 1 [  1296 / 111704]  loss: 3.4661  loss_at_student: 3.4661  max mem: 7597
Epoch: 1 [  1796 / 111704]  loss: 3.2433  loss_at_student: 3.2433  max mem: 7597
Epoch: 1 [  2296 / 111704]  loss: 3.4813  loss_at_student: 3.4813  max mem: 7597
Epoch: 1 [  2796 / 111704]  loss: 3.3808  loss_at_student: 3.3808  max mem: 7597
Epoch: 1 [  3296 / 111704]  loss: 3.4983  loss_at_student: 3.4983  max mem: 7597
Epoch: 1 [  3796 / 111704]  loss: 3.4964  loss_at_student: 3.4964  max mem: 7597
Epoch: 1 [  4296 / 111704]  loss: 3.4458  loss_at_student: 3.4458  max mem: 7597
Epoch: 1 [  4796 / 111704]  loss: 3.5752  loss_at_student: 3.5752  max mem: 7597
Epoch: 1 [  5296 / 111704]  loss: 3.7625  loss_at_student: 3.7625  max mem: 7597
Epoch: 1 [  5796 / 111704]  loss: 3.4297  loss_at_student: 3.4297  max mem: 7597
Epoch: 1 [  6296 / 111704]  loss: 3.7925  loss_at_student: 3.7925  max mem: 7597
Epoch: 1 [  6796 / 111704]  loss: 3.0521  loss_at_student: 3.0521  max mem: 7597
Epoch: 1 [  7296 / 111704]  loss: 3.3852  loss_at_student: 3.3852  max mem: 7597
Epoch: 1 [  7796 / 111704]  loss: 3.2090  loss_at_student: 3.2090  max mem: 7597
Epoch: 1 [  8296 / 111704]  loss: 3.9120  loss_at_student: 3.9120  max mem: 7597
Epoch: 1 [  8796 / 111704]  loss: 3.2972  loss_at_student: 3.2972  max mem: 7597
Epoch: 1 [  9296 / 111704]  loss: 3.7184  loss_at_student: 3.7184  max mem: 7597
Epoch: 1 [  9796 / 111704]  loss: 3.5720  loss_at_student: 3.5720  max mem: 7597
Epoch: 1 [ 10296 / 111704]  loss: 3.6307  loss_at_student: 3.6307  max mem: 7597
Epoch: 1 [ 10796 / 111704]  loss: 3.2653  loss_at_student: 3.2653  max mem: 7597
Epoch: 1 [ 11296 / 111704]  loss: 3.5389  loss_at_student: 3.5389  max mem: 7597
Epoch: 1 [ 11796 / 111704]  loss: 3.6222  loss_at_student: 3.6222  max mem: 7597
Epoch: 1 [ 12296 / 111704]  loss: 3.4383  loss_at_student: 3.4383  max mem: 7597
Epoch: 1 [ 12796 / 111704]  loss: 3.2861  loss_at_student: 3.2861  max mem: 7597
Epoch: 1 [ 13296 / 111704]  loss: 3.6515  loss_at_student: 3.6515  max mem: 7597
Epoch: 1 [ 13796 / 111704]  loss: 3.2430  loss_at_student: 3.2430  max mem: 7597
Epoch: 1 [ 14296 / 111704]  loss: 3.5435  loss_at_student: 3.5435  max mem: 7597
Epoch: 1 [ 14796 / 111704]  loss: 3.3641  loss_at_student: 3.3641  max mem: 7597
Epoch: 1 [ 15296 / 111704]  loss: 3.6065  loss_at_student: 3.6065  max mem: 7597
Epoch: 1 [ 15796 / 111704]  loss: 3.4092  loss_at_student: 3.4092  max mem: 7597
Epoch: 1 [ 16296 / 111704]  loss: 3.6313  loss_at_student: 3.6313  max mem: 7597
Epoch: 1 [ 16796 / 111704]  loss: 3.6361  loss_at_student: 3.6361  max mem: 7597
Epoch: 1 [ 17296 / 111704]  loss: 3.3100  loss_at_student: 3.3100  max mem: 7597
Epoch: 1 [ 17796 / 111704]  loss: 3.8539  loss_at_student: 3.8539  max mem: 7597
Epoch: 1 [ 18296 / 111704]  loss: 3.4563  loss_at_student: 3.4563  max mem: 7597
Epoch: 1 [ 18796 / 111704]  loss: 3.6452  loss_at_student: 3.6452  max mem: 7597
Epoch: 1 [ 19296 / 111704]  loss: 3.2030  loss_at_student: 3.2030  max mem: 7597
Epoch: 1 [ 19796 / 111704]  loss: 3.6025  loss_at_student: 3.6025  max mem: 7597
Epoch: 1 [ 20296 / 111704]  loss: 3.8071  loss_at_student: 3.8071  max mem: 7597
Epoch: 1 [ 20796 / 111704]  loss: 3.5293  loss_at_student: 3.5293  max mem: 7597
Epoch: 1 [ 21296 / 111704]  loss: 3.1455  loss_at_student: 3.1455  max mem: 7597
Epoch: 1 [ 21796 / 111704]  loss: 3.1347  loss_at_student: 3.1347  max mem: 7597
Epoch: 1 [ 22296 / 111704]  loss: 3.4667  loss_at_student: 3.4667  max mem: 7597
Epoch: 1 [ 22796 / 111704]  loss: 3.4165  loss_at_student: 3.4165  max mem: 7597
Epoch: 1 [ 23296 / 111704]  loss: 3.5240  loss_at_student: 3.5240  max mem: 7597
Epoch: 1 [ 23796 / 111704]  loss: 3.3696  loss_at_student: 3.3696  max mem: 7597
Epoch: 1 [ 24296 / 111704]  loss: 3.1481  loss_at_student: 3.1481  max mem: 7597
Epoch: 1 [ 24796 / 111704]  loss: 3.5372  loss_at_student: 3.5372  max mem: 7597
Epoch: 1 [ 25296 / 111704]  loss: 3.0937  loss_at_student: 3.0937  max mem: 7597
Epoch: 1 [ 25796 / 111704]  loss: 3.2243  loss_at_student: 3.2243  max mem: 7597
Epoch: 1 [ 26296 / 111704]  loss: 3.3260  loss_at_student: 3.3260  max mem: 7597
Epoch: 1 [ 26796 / 111704]  loss: 3.1824  loss_at_student: 3.1824  max mem: 7597
Epoch: 1 [ 27296 / 111704]  loss: 3.2693  loss_at_student: 3.2693  max mem: 7597
Epoch: 1 [ 27796 / 111704]  loss: 2.9478  loss_at_student: 2.9478  max mem: 7597
Epoch: 1 [ 28296 / 111704]  loss: 3.2822  loss_at_student: 3.2822  max mem: 7597
Epoch: 1 [ 28796 / 111704]  loss: 3.1710  loss_at_student: 3.1710  max mem: 7597
Epoch: 1 [ 29296 / 111704]  loss: 3.6465  loss_at_student: 3.6465  max mem: 7597
Epoch: 1 [ 29796 / 111704]  loss: 3.4467  loss_at_student: 3.4467  max mem: 7597
Epoch: 1 [ 30296 / 111704]  loss: 3.2328  loss_at_student: 3.2328  max mem: 7597
Epoch: 1 [ 30796 / 111704]  loss: 3.4318  loss_at_student: 3.4318  max mem: 7597
Epoch: 1 [ 31296 / 111704]  loss: 3.5629  loss_at_student: 3.5629  max mem: 7597
Epoch: 1 [ 31796 / 111704]  loss: 3.4550  loss_at_student: 3.4550  max mem: 7597
Epoch: 1 [ 32296 / 111704]  loss: 3.5785  loss_at_student: 3.5785  max mem: 7597
Epoch: 1 [ 32796 / 111704]  loss: 3.5753  loss_at_student: 3.5753  max mem: 7597
Epoch: 1 [ 33296 / 111704]  loss: 2.9137  loss_at_student: 2.9137  max mem: 7597
Epoch: 1 [ 33796 / 111704]  loss: 3.7341  loss_at_student: 3.7341  max mem: 7597
Epoch: 1 [ 34296 / 111704]  loss: 3.2240  loss_at_student: 3.2240  max mem: 7597
Epoch: 1 [ 34796 / 111704]  loss: 3.4021  loss_at_student: 3.4021  max mem: 7597
Epoch: 1 [ 35296 / 111704]  loss: 3.4050  loss_at_student: 3.4050  max mem: 7597
Epoch: 1 [ 35796 / 111704]  loss: 3.4401  loss_at_student: 3.4401  max mem: 7597
Epoch: 1 [ 36296 / 111704]  loss: 3.2925  loss_at_student: 3.2925  max mem: 7597
Epoch: 1 [ 36796 / 111704]  loss: 3.1435  loss_at_student: 3.1435  max mem: 7597
Epoch: 1 [ 37296 / 111704]  loss: 3.3391  loss_at_student: 3.3391  max mem: 7597
Epoch: 1 [ 37796 / 111704]  loss: 3.5469  loss_at_student: 3.5469  max mem: 7597
Epoch: 1 [ 38296 / 111704]  loss: 3.0550  loss_at_student: 3.0550  max mem: 7597
Epoch: 1 [ 38796 / 111704]  loss: 3.6440  loss_at_student: 3.6440  max mem: 7597
Epoch: 1 [ 39296 / 111704]  loss: 3.2004  loss_at_student: 3.2004  max mem: 7597
Epoch: 1 [ 39796 / 111704]  loss: 3.4220  loss_at_student: 3.4220  max mem: 7597
Epoch: 1 [ 40296 / 111704]  loss: 3.5742  loss_at_student: 3.5742  max mem: 7597
Epoch: 1 [ 40796 / 111704]  loss: 3.4066  loss_at_student: 3.4066  max mem: 7597
Epoch: 1 [ 41296 / 111704]  loss: 3.5858  loss_at_student: 3.5858  max mem: 7597
Epoch: 1 [ 41796 / 111704]  loss: 3.1169  loss_at_student: 3.1169  max mem: 7597
Epoch: 1 [ 42296 / 111704]  loss: 3.5306  loss_at_student: 3.5306  max mem: 7597
Epoch: 1 [ 42796 / 111704]  loss: 3.5323  loss_at_student: 3.5323  max mem: 7597
Epoch: 1 [ 43296 / 111704]  loss: 3.5636  loss_at_student: 3.5636  max mem: 7597
Epoch: 1 [ 43796 / 111704]  loss: 3.3182  loss_at_student: 3.3182  max mem: 7597
Epoch: 1 [ 44296 / 111704]  loss: 3.4409  loss_at_student: 3.4409  max mem: 7597
Epoch: 1 [ 44796 / 111704]  loss: 3.7748  loss_at_student: 3.7748  max mem: 7597
Epoch: 1 [ 45296 / 111704]  loss: 3.2716  loss_at_student: 3.2716  max mem: 7597
Epoch: 1 [ 45796 / 111704]  loss: 3.3531  loss_at_student: 3.3531  max mem: 7597
Epoch: 1 [ 46296 / 111704]  loss: 3.1153  loss_at_student: 3.1153  max mem: 7597
Epoch: 1 [ 46796 / 111704]  loss: 3.7651  loss_at_student: 3.7651  max mem: 7597
Epoch: 1 [ 47296 / 111704]  loss: 3.4135  loss_at_student: 3.4135  max mem: 7597
Epoch: 1 [ 47796 / 111704]  loss: 3.3709  loss_at_student: 3.3709  max mem: 7597
Epoch: 1 [ 48296 / 111704]  loss: 3.7346  loss_at_student: 3.7346  max mem: 7597
Epoch: 1 [ 48796 / 111704]  loss: 3.0866  loss_at_student: 3.0866  max mem: 7597
Epoch: 1 [ 49296 / 111704]  loss: 3.4034  loss_at_student: 3.4034  max mem: 7597
Epoch: 1 [ 49796 / 111704]  loss: 3.2171  loss_at_student: 3.2171  max mem: 7597
Epoch: 1 [ 50296 / 111704]  loss: 3.4626  loss_at_student: 3.4626  max mem: 7597
Epoch: 1 [ 50796 / 111704]  loss: 3.2732  loss_at_student: 3.2732  max mem: 7597
Epoch: 1 [ 51296 / 111704]  loss: 3.3169  loss_at_student: 3.3169  max mem: 7597
Epoch: 1 [ 51796 / 111704]  loss: 3.6335  loss_at_student: 3.6335  max mem: 7597
Epoch: 1 [ 52296 / 111704]  loss: 3.4199  loss_at_student: 3.4199  max mem: 7597
Epoch: 1 [ 52796 / 111704]  loss: 3.1910  loss_at_student: 3.1910  max mem: 7597
Epoch: 1 [ 53296 / 111704]  loss: 3.4056  loss_at_student: 3.4056  max mem: 7597
Epoch: 1 [ 53796 / 111704]  loss: 3.7073  loss_at_student: 3.7073  max mem: 7597
Epoch: 1 [ 54296 / 111704]  loss: 3.0123  loss_at_student: 3.0123  max mem: 7597
Epoch: 1 [ 54796 / 111704]  loss: 2.8909  loss_at_student: 2.8909  max mem: 7597
Epoch: 1 [ 55296 / 111704]  loss: 3.5244  loss_at_student: 3.5244  max mem: 7597
Epoch: 1 [ 55796 / 111704]  loss: 3.3107  loss_at_student: 3.3107  max mem: 7597
Epoch: 1 [ 56296 / 111704]  loss: 3.5457  loss_at_student: 3.5457  max mem: 7597
Epoch: 1 [ 56796 / 111704]  loss: 3.5005  loss_at_student: 3.5005  max mem: 7597
Epoch: 1 [ 57296 / 111704]  loss: 3.1823  loss_at_student: 3.1823  max mem: 7597
Epoch: 1 [ 57796 / 111704]  loss: 3.7455  loss_at_student: 3.7455  max mem: 7597
Epoch: 1 [ 58296 / 111704]  loss: 3.3206  loss_at_student: 3.3206  max mem: 7597
Epoch: 1 [ 58796 / 111704]  loss: 3.3536  loss_at_student: 3.3536  max mem: 7597
Epoch: 1 [ 59296 / 111704]  loss: 3.5023  loss_at_student: 3.5023  max mem: 7597
Epoch: 1 [ 59796 / 111704]  loss: 3.2701  loss_at_student: 3.2701  max mem: 7597
Epoch: 1 [ 60296 / 111704]  loss: 3.2055  loss_at_student: 3.2055  max mem: 7597
Epoch: 1 [ 60796 / 111704]  loss: 3.3352  loss_at_student: 3.3352  max mem: 7597
Epoch: 1 [ 61296 / 111704]  loss: 3.2006  loss_at_student: 3.2006  max mem: 7597
Epoch: 1 [ 61796 / 111704]  loss: 3.3647  loss_at_student: 3.3647  max mem: 7597
Epoch: 1 [ 62296 / 111704]  loss: 3.5265  loss_at_student: 3.5265  max mem: 7597
Epoch: 1 [ 62796 / 111704]  loss: 3.0733  loss_at_student: 3.0733  max mem: 7597
Epoch: 1 [ 63296 / 111704]  loss: 3.0901  loss_at_student: 3.0901  max mem: 7597
Epoch: 1 [ 63796 / 111704]  loss: 3.0577  loss_at_student: 3.0577  max mem: 7597
Epoch: 1 [ 64296 / 111704]  loss: 3.3136  loss_at_student: 3.3136  max mem: 7597
Epoch: 1 [ 64796 / 111704]  loss: 3.1727  loss_at_student: 3.1727  max mem: 7597
Epoch: 1 [ 65296 / 111704]  loss: 3.5392  loss_at_student: 3.5392  max mem: 7597
Epoch: 1 [ 65796 / 111704]  loss: 3.2230  loss_at_student: 3.2230  max mem: 7597
Epoch: 1 [ 66296 / 111704]  loss: 3.5260  loss_at_student: 3.5260  max mem: 7597
Epoch: 1 [ 66796 / 111704]  loss: 3.4806  loss_at_student: 3.4806  max mem: 7597
Epoch: 1 [ 67296 / 111704]  loss: 3.0358  loss_at_student: 3.0358  max mem: 7597
Epoch: 1 [ 67796 / 111704]  loss: 3.1595  loss_at_student: 3.1595  max mem: 7597
Epoch: 1 [ 68296 / 111704]  loss: 3.4882  loss_at_student: 3.4882  max mem: 7597
Epoch: 1 [ 68796 / 111704]  loss: 2.9856  loss_at_student: 2.9856  max mem: 7597
Epoch: 1 [ 69296 / 111704]  loss: 3.2345  loss_at_student: 3.2345  max mem: 7597
Epoch: 1 [ 69796 / 111704]  loss: 3.6079  loss_at_student: 3.6079  max mem: 7597
Epoch: 1 [ 70296 / 111704]  loss: 3.0650  loss_at_student: 3.0650  max mem: 7597
Epoch: 1 [ 70796 / 111704]  loss: 3.1669  loss_at_student: 3.1669  max mem: 7597
Epoch: 1 [ 71296 / 111704]  loss: 2.8514  loss_at_student: 2.8514  max mem: 7597
Epoch: 1 [ 71796 / 111704]  loss: 3.3876  loss_at_student: 3.3876  max mem: 7597
Epoch: 1 [ 72296 / 111704]  loss: 3.1704  loss_at_student: 3.1704  max mem: 7597
Epoch: 1 [ 72796 / 111704]  loss: 3.4274  loss_at_student: 3.4274  max mem: 7597
Epoch: 1 [ 73296 / 111704]  loss: 3.1756  loss_at_student: 3.1756  max mem: 7597
Epoch: 1 [ 73796 / 111704]  loss: 3.4278  loss_at_student: 3.4278  max mem: 7597
Epoch: 1 [ 74296 / 111704]  loss: 3.3285  loss_at_student: 3.3285  max mem: 7597
Epoch: 1 [ 74796 / 111704]  loss: 3.2738  loss_at_student: 3.2738  max mem: 7597
Epoch: 1 [ 75296 / 111704]  loss: 3.0635  loss_at_student: 3.0635  max mem: 7597
Epoch: 1 [ 75796 / 111704]  loss: 3.3299  loss_at_student: 3.3299  max mem: 7597
Epoch: 1 [ 76296 / 111704]  loss: 3.1587  loss_at_student: 3.1587  max mem: 7597
Epoch: 1 [ 76796 / 111704]  loss: 3.0882  loss_at_student: 3.0882  max mem: 7597
Epoch: 1 [ 77296 / 111704]  loss: 3.0397  loss_at_student: 3.0397  max mem: 7597
Epoch: 1 [ 77796 / 111704]  loss: 3.5432  loss_at_student: 3.5432  max mem: 7597
Epoch: 1 [ 78296 / 111704]  loss: 3.4269  loss_at_student: 3.4269  max mem: 7597
Epoch: 1 [ 78796 / 111704]  loss: 3.4881  loss_at_student: 3.4881  max mem: 7597
Epoch: 1 [ 79296 / 111704]  loss: 3.5102  loss_at_student: 3.5102  max mem: 7597
Epoch: 1 [ 79796 / 111704]  loss: 2.9870  loss_at_student: 2.9870  max mem: 7597
Epoch: 1 [ 80296 / 111704]  loss: 3.6219  loss_at_student: 3.6219  max mem: 7597
Epoch: 1 [ 80796 / 111704]  loss: 2.9447  loss_at_student: 2.9447  max mem: 7597
Epoch: 1 [ 81296 / 111704]  loss: 3.3926  loss_at_student: 3.3926  max mem: 7597
Epoch: 1 [ 81796 / 111704]  loss: 2.9767  loss_at_student: 2.9767  max mem: 7597
Epoch: 1 [ 82296 / 111704]  loss: 3.5474  loss_at_student: 3.5474  max mem: 7597
Epoch: 1 [ 82796 / 111704]  loss: 3.5902  loss_at_student: 3.5902  max mem: 7597
Epoch: 1 [ 83296 / 111704]  loss: 3.4155  loss_at_student: 3.4155  max mem: 7597
Epoch: 1 [ 83796 / 111704]  loss: 3.1603  loss_at_student: 3.1603  max mem: 7597
Epoch: 1 [ 84296 / 111704]  loss: 3.8424  loss_at_student: 3.8424  max mem: 7597
Epoch: 1 [ 84796 / 111704]  loss: 3.2034  loss_at_student: 3.2034  max mem: 7597
Epoch: 1 [ 85296 / 111704]  loss: 3.1573  loss_at_student: 3.1573  max mem: 7597
Epoch: 1 [ 85796 / 111704]  loss: 3.7017  loss_at_student: 3.7017  max mem: 7597
Epoch: 1 [ 86296 / 111704]  loss: 3.2270  loss_at_student: 3.2270  max mem: 7597
Epoch: 1 [ 86796 / 111704]  loss: 3.3402  loss_at_student: 3.3402  max mem: 7597
Epoch: 1 [ 87296 / 111704]  loss: 3.4993  loss_at_student: 3.4993  max mem: 7597
Epoch: 1 [ 87796 / 111704]  loss: 3.7399  loss_at_student: 3.7399  max mem: 7597
Epoch: 1 [ 88296 / 111704]  loss: 3.2117  loss_at_student: 3.2117  max mem: 7597
Epoch: 1 [ 88796 / 111704]  loss: 3.5974  loss_at_student: 3.5974  max mem: 7597
Epoch: 1 [ 89296 / 111704]  loss: 3.5153  loss_at_student: 3.5153  max mem: 7597
Epoch: 1 [ 89796 / 111704]  loss: 3.4865  loss_at_student: 3.4865  max mem: 7597
Epoch: 1 [ 90296 / 111704]  loss: 3.0485  loss_at_student: 3.0485  max mem: 7597
Epoch: 1 [ 90796 / 111704]  loss: 3.2208  loss_at_student: 3.2208  max mem: 7597
Epoch: 1 [ 91296 / 111704]  loss: 3.0650  loss_at_student: 3.0650  max mem: 7597
Epoch: 1 [ 91796 / 111704]  loss: 3.3943  loss_at_student: 3.3943  max mem: 7597
Epoch: 1 [ 92296 / 111704]  loss: 3.3520  loss_at_student: 3.3520  max mem: 7597
Epoch: 1 [ 92796 / 111704]  loss: 3.3314  loss_at_student: 3.3314  max mem: 7597
Epoch: 1 [ 93296 / 111704]  loss: 3.1173  loss_at_student: 3.1173  max mem: 7597
Epoch: 1 [ 93796 / 111704]  loss: 3.1904  loss_at_student: 3.1904  max mem: 7597
Epoch: 1 [ 94296 / 111704]  loss: 3.2286  loss_at_student: 3.2286  max mem: 7597
Epoch: 1 [ 94796 / 111704]  loss: 3.2978  loss_at_student: 3.2978  max mem: 7597
Epoch: 1 [ 95296 / 111704]  loss: 3.4678  loss_at_student: 3.4678  max mem: 7597
Epoch: 1 [ 95796 / 111704]  loss: 3.4887  loss_at_student: 3.4887  max mem: 7597
Epoch: 1 [ 96296 / 111704]  loss: 3.1410  loss_at_student: 3.1410  max mem: 7597
Epoch: 1 [ 96796 / 111704]  loss: 2.9872  loss_at_student: 2.9872  max mem: 7597
Epoch: 1 [ 97296 / 111704]  loss: 3.5573  loss_at_student: 3.5573  max mem: 7597
Epoch: 1 [ 97796 / 111704]  loss: 3.1718  loss_at_student: 3.1718  max mem: 7597
Epoch: 1 [ 98296 / 111704]  loss: 3.2211  loss_at_student: 3.2211  max mem: 7597
Epoch: 1 [ 98796 / 111704]  loss: 3.6510  loss_at_student: 3.6510  max mem: 7597
Epoch: 1 [ 99296 / 111704]  loss: 2.9727  loss_at_student: 2.9727  max mem: 7597
Epoch: 1 [ 99796 / 111704]  loss: 3.3128  loss_at_student: 3.3128  max mem: 7597
Epoch: 1 [100296 / 111704]  loss: 3.2027  loss_at_student: 3.2027  max mem: 7597
Epoch: 1 [100796 / 111704]  loss: 3.2118  loss_at_student: 3.2118  max mem: 7597
Epoch: 1 [101296 / 111704]  loss: 3.1509  loss_at_student: 3.1509  max mem: 7597
Epoch: 1 [101796 / 111704]  loss: 2.8168  loss_at_student: 2.8168  max mem: 7597
Epoch: 1 [102296 / 111704]  loss: 3.3901  loss_at_student: 3.3901  max mem: 7597
Epoch: 1 [102796 / 111704]  loss: 3.0754  loss_at_student: 3.0754  max mem: 7597
Epoch: 1 [103296 / 111704]  loss: 3.0242  loss_at_student: 3.0242  max mem: 7597
Epoch: 1 [103796 / 111704]  loss: 3.2743  loss_at_student: 3.2743  max mem: 7597
Epoch: 1 [104296 / 111704]  loss: 3.3502  loss_at_student: 3.3502  max mem: 7597
Epoch: 1 [104796 / 111704]  loss: 3.2919  loss_at_student: 3.2919  max mem: 7597
Epoch: 1 [105296 / 111704]  loss: 3.1074  loss_at_student: 3.1074  max mem: 7597
Epoch: 1 [105796 / 111704]  loss: 3.3843  loss_at_student: 3.3843  max mem: 7597
Epoch: 1 [106296 / 111704]  loss: 3.1101  loss_at_student: 3.1101  max mem: 7597
Epoch: 1 [106796 / 111704]  loss: 3.1543  loss_at_student: 3.1543  max mem: 7597
Epoch: 1 [107296 / 111704]  loss: 3.1192  loss_at_student: 3.1192  max mem: 7597
Epoch: 1 [107796 / 111704]  loss: 3.3150  loss_at_student: 3.3150  max mem: 7597
Epoch: 1 [108296 / 111704]  loss: 3.0263  loss_at_student: 3.0263  max mem: 7597
Epoch: 1 [108796 / 111704]  loss: 3.5272  loss_at_student: 3.5272  max mem: 7597
Epoch: 1 [109296 / 111704]  loss: 3.1728  loss_at_student: 3.1728  max mem: 7597
Epoch: 1 [109796 / 111704]  loss: 2.9165  loss_at_student: 2.9165  max mem: 7597
Epoch: 1 [110296 / 111704]  loss: 3.2870  loss_at_student: 3.2870  max mem: 7597
Epoch: 1 [110796 / 111704]  loss: 3.0450  loss_at_student: 3.0450  max mem: 7597
Epoch: 1 [111296 / 111704]  loss: 3.0864  loss_at_student: 3.0864  max mem: 7597
Averaged stats: loss: 3.3426  loss_at_student: 3.3426
Train epoch time: 3:58:27
Epoch: 2 [    92 / 111704]  loss: 3.1570  loss_at_student: 3.1570  max mem: 7597
Epoch: 2 [   592 / 111704]  loss: 3.1492  loss_at_student: 3.1492  max mem: 7597
Epoch: 2 [  1092 / 111704]  loss: 3.3456  loss_at_student: 3.3456  max mem: 7597
Epoch: 2 [  1592 / 111704]  loss: 3.1372  loss_at_student: 3.1372  max mem: 7597
Epoch: 2 [  2092 / 111704]  loss: 3.2438  loss_at_student: 3.2438  max mem: 7597
Epoch: 2 [  2592 / 111704]  loss: 3.6019  loss_at_student: 3.6019  max mem: 7597
Epoch: 2 [  3092 / 111704]  loss: 3.1922  loss_at_student: 3.1922  max mem: 7597
Epoch: 2 [  3592 / 111704]  loss: 3.3063  loss_at_student: 3.3063  max mem: 7597
Epoch: 2 [  4092 / 111704]  loss: 3.3038  loss_at_student: 3.3038  max mem: 7597
Epoch: 2 [  4592 / 111704]  loss: 3.2484  loss_at_student: 3.2484  max mem: 7597
Epoch: 2 [  5092 / 111704]  loss: 3.3163  loss_at_student: 3.3163  max mem: 7597
Epoch: 2 [  5592 / 111704]  loss: 3.4034  loss_at_student: 3.4034  max mem: 7597
Epoch: 2 [  6092 / 111704]  loss: 3.3259  loss_at_student: 3.3259  max mem: 7597
Epoch: 2 [  6592 / 111704]  loss: 3.5461  loss_at_student: 3.5461  max mem: 7597
Epoch: 2 [  7092 / 111704]  loss: 3.0349  loss_at_student: 3.0349  max mem: 7597
Epoch: 2 [  7592 / 111704]  loss: 3.4164  loss_at_student: 3.4164  max mem: 7597
Epoch: 2 [  8092 / 111704]  loss: 3.2787  loss_at_student: 3.2787  max mem: 7597
Epoch: 2 [  8592 / 111704]  loss: 3.4127  loss_at_student: 3.4127  max mem: 7597
Epoch: 2 [  9092 / 111704]  loss: 2.9611  loss_at_student: 2.9611  max mem: 7597
Epoch: 2 [  9592 / 111704]  loss: 3.1175  loss_at_student: 3.1175  max mem: 7597
Epoch: 2 [ 10092 / 111704]  loss: 3.1947  loss_at_student: 3.1947  max mem: 7597
Epoch: 2 [ 10592 / 111704]  loss: 3.1617  loss_at_student: 3.1617  max mem: 7597
Epoch: 2 [ 11092 / 111704]  loss: 3.2706  loss_at_student: 3.2706  max mem: 7597
Epoch: 2 [ 11592 / 111704]  loss: 3.2594  loss_at_student: 3.2594  max mem: 7597
Epoch: 2 [ 12092 / 111704]  loss: 2.9815  loss_at_student: 2.9815  max mem: 7597
Epoch: 2 [ 12592 / 111704]  loss: 3.2797  loss_at_student: 3.2797  max mem: 7597
Epoch: 2 [ 13092 / 111704]  loss: 2.8657  loss_at_student: 2.8657  max mem: 7597
Epoch: 2 [ 13592 / 111704]  loss: 2.9151  loss_at_student: 2.9151  max mem: 7597
Epoch: 2 [ 14092 / 111704]  loss: 3.1873  loss_at_student: 3.1873  max mem: 7597
Epoch: 2 [ 14592 / 111704]  loss: 3.1420  loss_at_student: 3.1420  max mem: 7597
Epoch: 2 [ 15092 / 111704]  loss: 3.3000  loss_at_student: 3.3000  max mem: 7597
Epoch: 2 [ 15592 / 111704]  loss: 3.1154  loss_at_student: 3.1154  max mem: 7597
Epoch: 2 [ 16092 / 111704]  loss: 3.6799  loss_at_student: 3.6799  max mem: 7597
Epoch: 2 [ 16592 / 111704]  loss: 2.9719  loss_at_student: 2.9719  max mem: 7597
Epoch: 2 [ 17092 / 111704]  loss: 3.3178  loss_at_student: 3.3178  max mem: 7597
Epoch: 2 [ 17592 / 111704]  loss: 3.0249  loss_at_student: 3.0249  max mem: 7597
Epoch: 2 [ 18092 / 111704]  loss: 3.1124  loss_at_student: 3.1124  max mem: 7597
Epoch: 2 [ 18592 / 111704]  loss: 3.0208  loss_at_student: 3.0208  max mem: 7597
Epoch: 2 [ 19092 / 111704]  loss: 3.4148  loss_at_student: 3.4148  max mem: 7597
Epoch: 2 [ 19592 / 111704]  loss: 2.7564  loss_at_student: 2.7564  max mem: 7597
Epoch: 2 [ 20092 / 111704]  loss: 3.0958  loss_at_student: 3.0958  max mem: 7597
Epoch: 2 [ 20592 / 111704]  loss: 2.9602  loss_at_student: 2.9602  max mem: 7597
Epoch: 2 [ 21092 / 111704]  loss: 3.1279  loss_at_student: 3.1279  max mem: 7597
Epoch: 2 [ 21592 / 111704]  loss: 2.5996  loss_at_student: 2.5996  max mem: 7597
Epoch: 2 [ 22092 / 111704]  loss: 2.9255  loss_at_student: 2.9255  max mem: 7597
Epoch: 2 [ 22592 / 111704]  loss: 2.8220  loss_at_student: 2.8220  max mem: 7597
Epoch: 2 [ 23092 / 111704]  loss: 3.5968  loss_at_student: 3.5968  max mem: 7597
Epoch: 2 [ 23592 / 111704]  loss: 3.1218  loss_at_student: 3.1218  max mem: 7597
Epoch: 2 [ 24092 / 111704]  loss: 3.0281  loss_at_student: 3.0281  max mem: 7597
Epoch: 2 [ 24592 / 111704]  loss: 2.9733  loss_at_student: 2.9733  max mem: 7597
Epoch: 2 [ 25092 / 111704]  loss: 2.9832  loss_at_student: 2.9832  max mem: 7597
Epoch: 2 [ 25592 / 111704]  loss: 3.1556  loss_at_student: 3.1556  max mem: 7597
Epoch: 2 [ 26092 / 111704]  loss: 3.5751  loss_at_student: 3.5751  max mem: 7597
Epoch: 2 [ 26592 / 111704]  loss: 3.0645  loss_at_student: 3.0645  max mem: 7597
Epoch: 2 [ 27092 / 111704]  loss: 3.2230  loss_at_student: 3.2230  max mem: 7597
Epoch: 2 [ 27592 / 111704]  loss: 3.1791  loss_at_student: 3.1791  max mem: 7597
Epoch: 2 [ 28092 / 111704]  loss: 3.1030  loss_at_student: 3.1030  max mem: 7597
Epoch: 2 [ 28592 / 111704]  loss: 2.9599  loss_at_student: 2.9599  max mem: 7597
Epoch: 2 [ 29092 / 111704]  loss: 3.2918  loss_at_student: 3.2918  max mem: 7597
Epoch: 2 [ 29592 / 111704]  loss: 3.5885  loss_at_student: 3.5885  max mem: 7597
Epoch: 2 [ 30092 / 111704]  loss: 3.0141  loss_at_student: 3.0141  max mem: 7597
Epoch: 2 [ 30592 / 111704]  loss: 2.9831  loss_at_student: 2.9831  max mem: 7597
Epoch: 2 [ 31092 / 111704]  loss: 2.7934  loss_at_student: 2.7934  max mem: 7597
Epoch: 2 [ 31592 / 111704]  loss: 2.9667  loss_at_student: 2.9667  max mem: 7597
Epoch: 2 [ 32092 / 111704]  loss: 3.1315  loss_at_student: 3.1315  max mem: 7597
Epoch: 2 [ 32592 / 111704]  loss: 3.2508  loss_at_student: 3.2508  max mem: 7597
Epoch: 2 [ 33092 / 111704]  loss: 3.2722  loss_at_student: 3.2722  max mem: 7597
Epoch: 2 [ 33592 / 111704]  loss: 2.7211  loss_at_student: 2.7211  max mem: 7597
Epoch: 2 [ 34092 / 111704]  loss: 2.8365  loss_at_student: 2.8365  max mem: 7597
Epoch: 2 [ 34592 / 111704]  loss: 3.3109  loss_at_student: 3.3109  max mem: 7597
Epoch: 2 [ 35092 / 111704]  loss: 3.0362  loss_at_student: 3.0362  max mem: 7597
Epoch: 2 [ 35592 / 111704]  loss: 2.9647  loss_at_student: 2.9647  max mem: 7597
Epoch: 2 [ 36092 / 111704]  loss: 3.1992  loss_at_student: 3.1992  max mem: 7597
Epoch: 2 [ 36592 / 111704]  loss: 3.1449  loss_at_student: 3.1449  max mem: 7597
Epoch: 2 [ 37092 / 111704]  loss: 3.2123  loss_at_student: 3.2123  max mem: 7597
Epoch: 2 [ 37592 / 111704]  loss: 2.9693  loss_at_student: 2.9693  max mem: 7597
Epoch: 2 [ 38092 / 111704]  loss: 3.0670  loss_at_student: 3.0670  max mem: 7597
Epoch: 2 [ 38592 / 111704]  loss: 3.1207  loss_at_student: 3.1207  max mem: 7597
Epoch: 2 [ 39092 / 111704]  loss: 3.1011  loss_at_student: 3.1011  max mem: 7597
Epoch: 2 [ 39592 / 111704]  loss: 3.2596  loss_at_student: 3.2596  max mem: 7597
Epoch: 2 [ 40092 / 111704]  loss: 2.8965  loss_at_student: 2.8965  max mem: 7597
Epoch: 2 [ 40592 / 111704]  loss: 3.0696  loss_at_student: 3.0696  max mem: 7597
Epoch: 2 [ 41092 / 111704]  loss: 3.3265  loss_at_student: 3.3265  max mem: 7597
Epoch: 2 [ 41592 / 111704]  loss: 3.4100  loss_at_student: 3.4100  max mem: 7597
Epoch: 2 [ 42092 / 111704]  loss: 2.9811  loss_at_student: 2.9811  max mem: 7597
Epoch: 2 [ 42592 / 111704]  loss: 3.0444  loss_at_student: 3.0444  max mem: 7597
Epoch: 2 [ 43092 / 111704]  loss: 2.9677  loss_at_student: 2.9677  max mem: 7597
Epoch: 2 [ 43592 / 111704]  loss: 3.1948  loss_at_student: 3.1948  max mem: 7597
Epoch: 2 [ 44092 / 111704]  loss: 3.0865  loss_at_student: 3.0865  max mem: 7597
Epoch: 2 [ 44592 / 111704]  loss: 2.9306  loss_at_student: 2.9306  max mem: 7597
Epoch: 2 [ 45092 / 111704]  loss: 3.2895  loss_at_student: 3.2895  max mem: 7597
Epoch: 2 [ 45592 / 111704]  loss: 2.9763  loss_at_student: 2.9763  max mem: 7597
Epoch: 2 [ 46092 / 111704]  loss: 3.0334  loss_at_student: 3.0334  max mem: 7597
Epoch: 2 [ 46592 / 111704]  loss: 3.1123  loss_at_student: 3.1123  max mem: 7597
Epoch: 2 [ 47092 / 111704]  loss: 3.0714  loss_at_student: 3.0714  max mem: 7597
Epoch: 2 [ 47592 / 111704]  loss: 2.9862  loss_at_student: 2.9862  max mem: 7597
Epoch: 2 [ 48092 / 111704]  loss: 2.9168  loss_at_student: 2.9168  max mem: 7597
Epoch: 2 [ 48592 / 111704]  loss: 3.0830  loss_at_student: 3.0830  max mem: 7597
Epoch: 2 [ 49092 / 111704]  loss: 3.1148  loss_at_student: 3.1148  max mem: 7597
Epoch: 2 [ 49592 / 111704]  loss: 3.3342  loss_at_student: 3.3342  max mem: 7597
Epoch: 2 [ 50092 / 111704]  loss: 3.1005  loss_at_student: 3.1005  max mem: 7597
Epoch: 2 [ 50592 / 111704]  loss: 3.2425  loss_at_student: 3.2425  max mem: 7597
Epoch: 2 [ 51092 / 111704]  loss: 3.3952  loss_at_student: 3.3952  max mem: 7597
Epoch: 2 [ 51592 / 111704]  loss: 3.2944  loss_at_student: 3.2944  max mem: 7597
Epoch: 2 [ 52092 / 111704]  loss: 3.1615  loss_at_student: 3.1615  max mem: 7597
Epoch: 2 [ 52592 / 111704]  loss: 3.5014  loss_at_student: 3.5014  max mem: 7597
Epoch: 2 [ 53092 / 111704]  loss: 3.0910  loss_at_student: 3.0910  max mem: 7597
Epoch: 2 [ 53592 / 111704]  loss: 3.0033  loss_at_student: 3.0033  max mem: 7597
Epoch: 2 [ 54092 / 111704]  loss: 3.3219  loss_at_student: 3.3219  max mem: 7597
Epoch: 2 [ 54592 / 111704]  loss: 2.9142  loss_at_student: 2.9142  max mem: 7597
Epoch: 2 [ 55092 / 111704]  loss: 3.4974  loss_at_student: 3.4974  max mem: 7597
Epoch: 2 [ 55592 / 111704]  loss: 3.0405  loss_at_student: 3.0405  max mem: 7597
Epoch: 2 [ 56092 / 111704]  loss: 3.0574  loss_at_student: 3.0574  max mem: 7597
Epoch: 2 [ 56592 / 111704]  loss: 3.1187  loss_at_student: 3.1187  max mem: 7597
Epoch: 2 [ 57092 / 111704]  loss: 3.0074  loss_at_student: 3.0074  max mem: 7597
Epoch: 2 [ 57592 / 111704]  loss: 3.1858  loss_at_student: 3.1858  max mem: 7597
Epoch: 2 [ 58092 / 111704]  loss: 3.1928  loss_at_student: 3.1928  max mem: 7597
Epoch: 2 [ 58592 / 111704]  loss: 3.0415  loss_at_student: 3.0415  max mem: 7597
Epoch: 2 [ 59092 / 111704]  loss: 3.1918  loss_at_student: 3.1918  max mem: 7597
Epoch: 2 [ 59592 / 111704]  loss: 3.3756  loss_at_student: 3.3756  max mem: 7597
Epoch: 2 [ 60092 / 111704]  loss: 3.3203  loss_at_student: 3.3203  max mem: 7597
Epoch: 2 [ 60592 / 111704]  loss: 3.1989  loss_at_student: 3.1989  max mem: 7597
Epoch: 2 [ 61092 / 111704]  loss: 3.0439  loss_at_student: 3.0439  max mem: 7597
Epoch: 2 [ 61592 / 111704]  loss: 3.0236  loss_at_student: 3.0236  max mem: 7597
Epoch: 2 [ 62092 / 111704]  loss: 3.0663  loss_at_student: 3.0663  max mem: 7597
Epoch: 2 [ 62592 / 111704]  loss: 2.8388  loss_at_student: 2.8388  max mem: 7597
Epoch: 2 [ 63092 / 111704]  loss: 3.3522  loss_at_student: 3.3522  max mem: 7597
Epoch: 2 [ 63592 / 111704]  loss: 3.0969  loss_at_student: 3.0969  max mem: 7597
Epoch: 2 [ 64092 / 111704]  loss: 3.2880  loss_at_student: 3.2880  max mem: 7597
Epoch: 2 [ 64592 / 111704]  loss: 3.1295  loss_at_student: 3.1295  max mem: 7597
Epoch: 2 [ 65092 / 111704]  loss: 3.1881  loss_at_student: 3.1881  max mem: 7597
Epoch: 2 [ 65592 / 111704]  loss: 3.1249  loss_at_student: 3.1249  max mem: 7597
Epoch: 2 [ 66092 / 111704]  loss: 2.9852  loss_at_student: 2.9852  max mem: 7597
Epoch: 2 [ 66592 / 111704]  loss: 3.1692  loss_at_student: 3.1692  max mem: 7597
Epoch: 2 [ 67092 / 111704]  loss: 2.7819  loss_at_student: 2.7819  max mem: 7597
Epoch: 2 [ 67592 / 111704]  loss: 3.1106  loss_at_student: 3.1106  max mem: 7597
Epoch: 2 [ 68092 / 111704]  loss: 3.5338  loss_at_student: 3.5338  max mem: 7597
Epoch: 2 [ 68592 / 111704]  loss: 3.0590  loss_at_student: 3.0590  max mem: 7597
Epoch: 2 [ 69092 / 111704]  loss: 3.3025  loss_at_student: 3.3025  max mem: 7597
Epoch: 2 [ 69592 / 111704]  loss: 3.4662  loss_at_student: 3.4662  max mem: 7597
Epoch: 2 [ 70092 / 111704]  loss: 2.8658  loss_at_student: 2.8658  max mem: 7597
Epoch: 2 [ 70592 / 111704]  loss: 3.2873  loss_at_student: 3.2873  max mem: 7597
Epoch: 2 [ 71092 / 111704]  loss: 3.4558  loss_at_student: 3.4558  max mem: 7597
Epoch: 2 [ 71592 / 111704]  loss: 2.7339  loss_at_student: 2.7339  max mem: 7597
Epoch: 2 [ 72092 / 111704]  loss: 3.0521  loss_at_student: 3.0521  max mem: 7597
Epoch: 2 [ 72592 / 111704]  loss: 3.2664  loss_at_student: 3.2664  max mem: 7597
Epoch: 2 [ 73092 / 111704]  loss: 2.7575  loss_at_student: 2.7575  max mem: 7597
Epoch: 2 [ 73592 / 111704]  loss: 3.0140  loss_at_student: 3.0140  max mem: 7597
Epoch: 2 [ 74092 / 111704]  loss: 2.6482  loss_at_student: 2.6482  max mem: 7597
Epoch: 2 [ 74592 / 111704]  loss: 3.2003  loss_at_student: 3.2003  max mem: 7597
Epoch: 2 [ 75092 / 111704]  loss: 3.6019  loss_at_student: 3.6019  max mem: 7597
Epoch: 2 [ 75592 / 111704]  loss: 3.3288  loss_at_student: 3.3288  max mem: 7597
Epoch: 2 [ 76092 / 111704]  loss: 2.8977  loss_at_student: 2.8977  max mem: 7597
Epoch: 2 [ 76592 / 111704]  loss: 3.5917  loss_at_student: 3.5917  max mem: 7597
Epoch: 2 [ 77092 / 111704]  loss: 2.9039  loss_at_student: 2.9039  max mem: 7597
Epoch: 2 [ 77592 / 111704]  loss: 3.4392  loss_at_student: 3.4392  max mem: 7597
Epoch: 2 [ 78092 / 111704]  loss: 2.8249  loss_at_student: 2.8249  max mem: 7597
Epoch: 2 [ 78592 / 111704]  loss: 3.1704  loss_at_student: 3.1704  max mem: 7597
Epoch: 2 [ 79092 / 111704]  loss: 2.9782  loss_at_student: 2.9782  max mem: 7597
Epoch: 2 [ 79592 / 111704]  loss: 2.8502  loss_at_student: 2.8502  max mem: 7597
Epoch: 2 [ 80092 / 111704]  loss: 2.9841  loss_at_student: 2.9841  max mem: 7597
Epoch: 2 [ 80592 / 111704]  loss: 3.1727  loss_at_student: 3.1727  max mem: 7597
Epoch: 2 [ 81092 / 111704]  loss: 3.4080  loss_at_student: 3.4080  max mem: 7597
Epoch: 2 [ 81592 / 111704]  loss: 3.2048  loss_at_student: 3.2048  max mem: 7597
Epoch: 2 [ 82092 / 111704]  loss: 2.9910  loss_at_student: 2.9910  max mem: 7597
Epoch: 2 [ 82592 / 111704]  loss: 3.1524  loss_at_student: 3.1524  max mem: 7597
Epoch: 2 [ 83092 / 111704]  loss: 3.1920  loss_at_student: 3.1920  max mem: 7597
Epoch: 2 [ 83592 / 111704]  loss: 3.0559  loss_at_student: 3.0559  max mem: 7597
Epoch: 2 [ 84092 / 111704]  loss: 2.8880  loss_at_student: 2.8880  max mem: 7597
Epoch: 2 [ 84592 / 111704]  loss: 3.2365  loss_at_student: 3.2365  max mem: 7597
Epoch: 2 [ 85092 / 111704]  loss: 2.8279  loss_at_student: 2.8279  max mem: 7597
Epoch: 2 [ 85592 / 111704]  loss: 2.9217  loss_at_student: 2.9217  max mem: 7597
Epoch: 2 [ 86092 / 111704]  loss: 3.3111  loss_at_student: 3.3111  max mem: 7597
Epoch: 2 [ 86592 / 111704]  loss: 3.2999  loss_at_student: 3.2999  max mem: 7597
Epoch: 2 [ 87092 / 111704]  loss: 3.1021  loss_at_student: 3.1021  max mem: 7597
Epoch: 2 [ 87592 / 111704]  loss: 2.7914  loss_at_student: 2.7914  max mem: 7597
Epoch: 2 [ 88092 / 111704]  loss: 3.0102  loss_at_student: 3.0102  max mem: 7597
Epoch: 2 [ 88592 / 111704]  loss: 3.3256  loss_at_student: 3.3256  max mem: 7597
Epoch: 2 [ 89092 / 111704]  loss: 3.1430  loss_at_student: 3.1430  max mem: 7597
Epoch: 2 [ 89592 / 111704]  loss: 3.1627  loss_at_student: 3.1627  max mem: 7597
Epoch: 2 [ 90092 / 111704]  loss: 2.8412  loss_at_student: 2.8412  max mem: 7597
Epoch: 2 [ 90592 / 111704]  loss: 2.9398  loss_at_student: 2.9398  max mem: 7597
Epoch: 2 [ 91092 / 111704]  loss: 2.9865  loss_at_student: 2.9865  max mem: 7597
Epoch: 2 [ 91592 / 111704]  loss: 3.0481  loss_at_student: 3.0481  max mem: 7597
Epoch: 2 [ 92092 / 111704]  loss: 2.6848  loss_at_student: 2.6848  max mem: 7597
Epoch: 2 [ 92592 / 111704]  loss: 3.2974  loss_at_student: 3.2974  max mem: 7597
Epoch: 2 [ 93092 / 111704]  loss: 2.9354  loss_at_student: 2.9354  max mem: 7597
Epoch: 2 [ 93592 / 111704]  loss: 3.2387  loss_at_student: 3.2387  max mem: 7597
Epoch: 2 [ 94092 / 111704]  loss: 2.5645  loss_at_student: 2.5645  max mem: 7597
Epoch: 2 [ 94592 / 111704]  loss: 2.8155  loss_at_student: 2.8155  max mem: 7597
Epoch: 2 [ 95092 / 111704]  loss: 3.1809  loss_at_student: 3.1809  max mem: 7597
Epoch: 2 [ 95592 / 111704]  loss: 3.0687  loss_at_student: 3.0687  max mem: 7597
Epoch: 2 [ 96092 / 111704]  loss: 3.0573  loss_at_student: 3.0573  max mem: 7597
Epoch: 2 [ 96592 / 111704]  loss: 3.3157  loss_at_student: 3.3157  max mem: 7597
Epoch: 2 [ 97092 / 111704]  loss: 2.8827  loss_at_student: 2.8827  max mem: 7597
Epoch: 2 [ 97592 / 111704]  loss: 2.9934  loss_at_student: 2.9934  max mem: 7597
Epoch: 2 [ 98092 / 111704]  loss: 3.0306  loss_at_student: 3.0306  max mem: 7597
Epoch: 2 [ 98592 / 111704]  loss: 3.0934  loss_at_student: 3.0934  max mem: 7597
Epoch: 2 [ 99092 / 111704]  loss: 2.9938  loss_at_student: 2.9938  max mem: 7597
Epoch: 2 [ 99592 / 111704]  loss: 3.3441  loss_at_student: 3.3441  max mem: 7597
Epoch: 2 [100092 / 111704]  loss: 3.0537  loss_at_student: 3.0537  max mem: 7597
Epoch: 2 [100592 / 111704]  loss: 3.3418  loss_at_student: 3.3418  max mem: 7597
Epoch: 2 [101092 / 111704]  loss: 3.0892  loss_at_student: 3.0892  max mem: 7597
Epoch: 2 [101592 / 111704]  loss: 3.2821  loss_at_student: 3.2821  max mem: 7597
Epoch: 2 [102092 / 111704]  loss: 3.0410  loss_at_student: 3.0410  max mem: 7597
Epoch: 2 [102592 / 111704]  loss: 2.9763  loss_at_student: 2.9763  max mem: 7597
Epoch: 2 [103092 / 111704]  loss: 3.2817  loss_at_student: 3.2817  max mem: 7597
Epoch: 2 [103592 / 111704]  loss: 2.9892  loss_at_student: 2.9892  max mem: 7597
Epoch: 2 [104092 / 111704]  loss: 3.0735  loss_at_student: 3.0735  max mem: 7597
Epoch: 2 [104592 / 111704]  loss: 3.1850  loss_at_student: 3.1850  max mem: 7597
Epoch: 2 [105092 / 111704]  loss: 3.0212  loss_at_student: 3.0212  max mem: 7597
Epoch: 2 [105592 / 111704]  loss: 3.0313  loss_at_student: 3.0313  max mem: 7597
Epoch: 2 [106092 / 111704]  loss: 3.0088  loss_at_student: 3.0088  max mem: 7597
Epoch: 2 [106592 / 111704]  loss: 3.3199  loss_at_student: 3.3199  max mem: 7597
Epoch: 2 [107092 / 111704]  loss: 2.8394  loss_at_student: 2.8394  max mem: 7597
Epoch: 2 [107592 / 111704]  loss: 3.1127  loss_at_student: 3.1127  max mem: 7597
Epoch: 2 [108092 / 111704]  loss: 3.1149  loss_at_student: 3.1149  max mem: 7597
Epoch: 2 [108592 / 111704]  loss: 3.1553  loss_at_student: 3.1553  max mem: 7597
Epoch: 2 [109092 / 111704]  loss: 3.0681  loss_at_student: 3.0681  max mem: 7597
Epoch: 2 [109592 / 111704]  loss: 2.8928  loss_at_student: 2.8928  max mem: 7597
Epoch: 2 [110092 / 111704]  loss: 2.5687  loss_at_student: 2.5687  max mem: 7597
Epoch: 2 [110592 / 111704]  loss: 3.2357  loss_at_student: 3.2357  max mem: 7597
Epoch: 2 [111092 / 111704]  loss: 3.1596  loss_at_student: 3.1596  max mem: 7597
Epoch: 2 [111592 / 111704]  loss: 2.8145  loss_at_student: 2.8145  max mem: 7597
Averaged stats: loss: 3.1486  loss_at_student: 3.1486
Train epoch time: 3:58:32
Train time: 11:55:48