Loss scales: [0.0, 0.0, 1.0]
Noise std: 0.1
Use amp for speeding up training
Load teacher model: clip-ViT-B-32
Teacher model architecture:
Framework(
  (0): CLIPModel()
)
Create student model from output/2stages/1_b32_pt1_100
Training does not need the teacher model, set it to None
Freeze the multimodal encoder of the student model
Student model architecture:
Framework(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
  (2): Dense({'in_features': 768, 'out_features': 512, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (3): Projector({'in_features': 512, 'out_features': 768, 'bias': True, 'noise_std': 0.1, 'dropout': 0.1, 'noise_prob': 0})
  (4): Decoder({'max_seq_length': 128, 'do_lower_case': False, 'attend_to': ['student'], 'teacher_model_name': 'clip-ViT-B-32'}) with Transformer model: BertLMHeadModel
)
Total Params: 164986107
Trainable Params: 29858811
Load data/corpus/multilingual_cc3m/cc3m_en.tsv
There are 1 langauges: ['en']
There are 1111391 lines, one of which is ['woman selling flowers to decorate religious offerings at the market']
Load data/corpus/multilingual_cc3m/cc3m_en-zh.tsv
There are 2 langauges: ['en', 'zh']
There are 1111391 lines, one of which is ['a very typical bus station', '一个非常典型的公交车站']
Load data/corpus/multilingual_cc3m/cc3m_en-de.tsv
There are 2 langauges: ['en', 'de']
There are 1111391 lines, one of which is ['tourists take a photo in front of the entrance sign', 'Touristen machen ein Foto vor dem Eingangsschild']
Load data/corpus/multilingual_cc3m/cc3m_en-fr.tsv
There are 2 langauges: ['en', 'fr']
There are 1111391 lines, one of which is ['farmer holding a box with grapes', 'agriculteur tenant une boîte avec des raisins']
Epoch: 0 [ 500 / 111704] loss: 8.7714
loss_at_student: 8.7714 max mem: 4592 Epoch: 0 [ 1000 / 111704] loss: 7.9345 loss_at_student: 7.9345 max mem: 4592 Epoch: 0 [ 1500 / 111704] loss: 7.4418 loss_at_student: 7.4418 max mem: 4592 Epoch: 0 [ 2000 / 111704] loss: 7.2385 loss_at_student: 7.2385 max mem: 5142 Epoch: 0 [ 2500 / 111704] loss: 6.8782 loss_at_student: 6.8782 max mem: 5142 Epoch: 0 [ 3000 / 111704] loss: 6.4817 loss_at_student: 6.4817 max mem: 5194 Epoch: 0 [ 3500 / 111704] loss: 6.3331 loss_at_student: 6.3331 max mem: 5194 Epoch: 0 [ 4000 / 111704] loss: 6.6354 loss_at_student: 6.6354 max mem: 5194 Epoch: 0 [ 4500 / 111704] loss: 6.2082 loss_at_student: 6.2082 max mem: 5194 Epoch: 0 [ 5000 / 111704] loss: 6.0398 loss_at_student: 6.0398 max mem: 5194 Epoch: 0 [ 5500 / 111704] loss: 5.9675 loss_at_student: 5.9675 max mem: 5194 Epoch: 0 [ 6000 / 111704] loss: 5.7734 loss_at_student: 5.7734 max mem: 5194 Epoch: 0 [ 6500 / 111704] loss: 5.8813 loss_at_student: 5.8813 max mem: 5194 Epoch: 0 [ 7000 / 111704] loss: 5.9705 loss_at_student: 5.9705 max mem: 5194 Epoch: 0 [ 7500 / 111704] loss: 5.6952 loss_at_student: 5.6952 max mem: 5194 Epoch: 0 [ 8000 / 111704] loss: 5.7671 loss_at_student: 5.7671 max mem: 5194 Epoch: 0 [ 8500 / 111704] loss: 5.6863 loss_at_student: 5.6863 max mem: 5194 Epoch: 0 [ 9000 / 111704] loss: 5.2731 loss_at_student: 5.2731 max mem: 5194 Epoch: 0 [ 9500 / 111704] loss: 5.6404 loss_at_student: 5.6404 max mem: 5194 Epoch: 0 [ 10000 / 111704] loss: 5.5654 loss_at_student: 5.5654 max mem: 5194 Epoch: 0 [ 10500 / 111704] loss: 5.1481 loss_at_student: 5.1481 max mem: 5194 Epoch: 0 [ 11000 / 111704] loss: 5.3339 loss_at_student: 5.3339 max mem: 5194 Epoch: 0 [ 11500 / 111704] loss: 5.5660 loss_at_student: 5.5660 max mem: 5194 Epoch: 0 [ 12000 / 111704] loss: 5.0071 loss_at_student: 5.0071 max mem: 5194 Epoch: 0 [ 12500 / 111704] loss: 5.0456 loss_at_student: 5.0456 max mem: 5194 Epoch: 0 [ 13000 / 111704] loss: 5.2548 loss_at_student: 5.2548 max mem: 5194 Epoch: 0 [ 13500 / 111704] 
loss: 4.7399 loss_at_student: 4.7399 max mem: 5194 Epoch: 0 [ 14000 / 111704] loss: 4.9904 loss_at_student: 4.9904 max mem: 5194 Epoch: 0 [ 14500 / 111704] loss: 4.8041 loss_at_student: 4.8041 max mem: 5194 Epoch: 0 [ 15000 / 111704] loss: 5.0959 loss_at_student: 5.0959 max mem: 5194 Epoch: 0 [ 15500 / 111704] loss: 4.6961 loss_at_student: 4.6961 max mem: 5194 Epoch: 0 [ 16000 / 111704] loss: 5.0804 loss_at_student: 5.0804 max mem: 5194 Epoch: 0 [ 16500 / 111704] loss: 4.8322 loss_at_student: 4.8322 max mem: 5242 Epoch: 0 [ 17000 / 111704] loss: 5.1881 loss_at_student: 5.1881 max mem: 5242 Epoch: 0 [ 17500 / 111704] loss: 5.2201 loss_at_student: 5.2201 max mem: 5242 Epoch: 0 [ 18000 / 111704] loss: 4.6492 loss_at_student: 4.6492 max mem: 5242 Epoch: 0 [ 18500 / 111704] loss: 5.0002 loss_at_student: 5.0002 max mem: 5242 Epoch: 0 [ 19000 / 111704] loss: 4.5451 loss_at_student: 4.5451 max mem: 5242 Epoch: 0 [ 19500 / 111704] loss: 4.7435 loss_at_student: 4.7435 max mem: 5242 Epoch: 0 [ 20000 / 111704] loss: 4.4531 loss_at_student: 4.4531 max mem: 5242 Epoch: 0 [ 20500 / 111704] loss: 4.4171 loss_at_student: 4.4171 max mem: 5242 Epoch: 0 [ 21000 / 111704] loss: 4.8378 loss_at_student: 4.8378 max mem: 5242 Epoch: 0 [ 21500 / 111704] loss: 4.5904 loss_at_student: 4.5904 max mem: 5242 Epoch: 0 [ 22000 / 111704] loss: 4.5181 loss_at_student: 4.5181 max mem: 5242 Epoch: 0 [ 22500 / 111704] loss: 4.8956 loss_at_student: 4.8956 max mem: 5242 Epoch: 0 [ 23000 / 111704] loss: 4.7755 loss_at_student: 4.7755 max mem: 5754 Epoch: 0 [ 23500 / 111704] loss: 4.4486 loss_at_student: 4.4486 max mem: 5754 Epoch: 0 [ 24000 / 111704] loss: 4.4459 loss_at_student: 4.4459 max mem: 5754 Epoch: 0 [ 24500 / 111704] loss: 4.5868 loss_at_student: 4.5868 max mem: 5754 Epoch: 0 [ 25000 / 111704] loss: 4.3169 loss_at_student: 4.3169 max mem: 5754 Epoch: 0 [ 25500 / 111704] loss: 4.5783 loss_at_student: 4.5783 max mem: 5754 Epoch: 0 [ 26000 / 111704] loss: 4.2304 loss_at_student: 4.2304 max mem: 
5754 Epoch: 0 [ 26500 / 111704] loss: 4.6112 loss_at_student: 4.6112 max mem: 5754 Epoch: 0 [ 27000 / 111704] loss: 4.2030 loss_at_student: 4.2030 max mem: 5754 Epoch: 0 [ 27500 / 111704] loss: 4.5268 loss_at_student: 4.5268 max mem: 5754 Epoch: 0 [ 28000 / 111704] loss: 4.3942 loss_at_student: 4.3942 max mem: 5754 Epoch: 0 [ 28500 / 111704] loss: 4.3239 loss_at_student: 4.3239 max mem: 5754 Epoch: 0 [ 29000 / 111704] loss: 4.3064 loss_at_student: 4.3064 max mem: 5754 Epoch: 0 [ 29500 / 111704] loss: 4.4896 loss_at_student: 4.4896 max mem: 5754 Epoch: 0 [ 30000 / 111704] loss: 4.2512 loss_at_student: 4.2512 max mem: 5754 Epoch: 0 [ 30500 / 111704] loss: 4.4796 loss_at_student: 4.4796 max mem: 5754 Epoch: 0 [ 31000 / 111704] loss: 4.0884 loss_at_student: 4.0884 max mem: 5754 Epoch: 0 [ 31500 / 111704] loss: 4.0342 loss_at_student: 4.0342 max mem: 5754 Epoch: 0 [ 32000 / 111704] loss: 4.4938 loss_at_student: 4.4938 max mem: 5754 Epoch: 0 [ 32500 / 111704] loss: 4.4294 loss_at_student: 4.4294 max mem: 5754 Epoch: 0 [ 33000 / 111704] loss: 4.3596 loss_at_student: 4.3596 max mem: 5754 Epoch: 0 [ 33500 / 111704] loss: 3.9769 loss_at_student: 3.9769 max mem: 5754 Epoch: 0 [ 34000 / 111704] loss: 4.1970 loss_at_student: 4.1970 max mem: 5754 Epoch: 0 [ 34500 / 111704] loss: 3.9168 loss_at_student: 3.9168 max mem: 5754 Epoch: 0 [ 35000 / 111704] loss: 4.0093 loss_at_student: 4.0093 max mem: 5754 Epoch: 0 [ 35500 / 111704] loss: 4.2701 loss_at_student: 4.2701 max mem: 5754 Epoch: 0 [ 36000 / 111704] loss: 4.0642 loss_at_student: 4.0642 max mem: 7597 Epoch: 0 [ 36500 / 111704] loss: 4.1264 loss_at_student: 4.1264 max mem: 7597 Epoch: 0 [ 37000 / 111704] loss: 4.1885 loss_at_student: 4.1885 max mem: 7597 Epoch: 0 [ 37500 / 111704] loss: 4.2928 loss_at_student: 4.2928 max mem: 7597 Epoch: 0 [ 38000 / 111704] loss: 4.1477 loss_at_student: 4.1477 max mem: 7597 Epoch: 0 [ 38500 / 111704] loss: 4.1527 loss_at_student: 4.1527 max mem: 7597 Epoch: 0 [ 39000 / 111704] loss: 3.8931 
loss_at_student: 3.8931 max mem: 7597 Epoch: 0 [ 39500 / 111704] loss: 4.4130 loss_at_student: 4.4130 max mem: 7597 Epoch: 0 [ 40000 / 111704] loss: 4.0760 loss_at_student: 4.0760 max mem: 7597 Epoch: 0 [ 40500 / 111704] loss: 4.0467 loss_at_student: 4.0467 max mem: 7597 Epoch: 0 [ 41000 / 111704] loss: 3.8695 loss_at_student: 3.8695 max mem: 7597 Epoch: 0 [ 41500 / 111704] loss: 4.0500 loss_at_student: 4.0500 max mem: 7597 Epoch: 0 [ 42000 / 111704] loss: 4.4134 loss_at_student: 4.4134 max mem: 7597 Epoch: 0 [ 42500 / 111704] loss: 3.6092 loss_at_student: 3.6092 max mem: 7597 Epoch: 0 [ 43000 / 111704] loss: 3.8108 loss_at_student: 3.8108 max mem: 7597 Epoch: 0 [ 43500 / 111704] loss: 3.7753 loss_at_student: 3.7753 max mem: 7597 Epoch: 0 [ 44000 / 111704] loss: 4.2059 loss_at_student: 4.2059 max mem: 7597 Epoch: 0 [ 44500 / 111704] loss: 3.8862 loss_at_student: 3.8862 max mem: 7597 Epoch: 0 [ 45000 / 111704] loss: 4.1190 loss_at_student: 4.1190 max mem: 7597 Epoch: 0 [ 45500 / 111704] loss: 3.8656 loss_at_student: 3.8656 max mem: 7597 Epoch: 0 [ 46000 / 111704] loss: 3.9975 loss_at_student: 3.9975 max mem: 7597 Epoch: 0 [ 46500 / 111704] loss: 4.2335 loss_at_student: 4.2335 max mem: 7597 Epoch: 0 [ 47000 / 111704] loss: 4.3347 loss_at_student: 4.3347 max mem: 7597 Epoch: 0 [ 47500 / 111704] loss: 3.5865 loss_at_student: 3.5865 max mem: 7597 Epoch: 0 [ 48000 / 111704] loss: 3.6770 loss_at_student: 3.6770 max mem: 7597 Epoch: 0 [ 48500 / 111704] loss: 3.8896 loss_at_student: 3.8896 max mem: 7597 Epoch: 0 [ 49000 / 111704] loss: 3.9530 loss_at_student: 3.9530 max mem: 7597 Epoch: 0 [ 49500 / 111704] loss: 4.0959 loss_at_student: 4.0959 max mem: 7597 Epoch: 0 [ 50000 / 111704] loss: 4.1451 loss_at_student: 4.1451 max mem: 7597 Epoch: 0 [ 50500 / 111704] loss: 3.9495 loss_at_student: 3.9495 max mem: 7597 Epoch: 0 [ 51000 / 111704] loss: 3.9685 loss_at_student: 3.9685 max mem: 7597 Epoch: 0 [ 51500 / 111704] loss: 3.6819 loss_at_student: 3.6819 max mem: 7597 Epoch: 0 [ 
52000 / 111704] loss: 4.3911 loss_at_student: 4.3911 max mem: 7597 Epoch: 0 [ 52500 / 111704] loss: 3.8748 loss_at_student: 3.8748 max mem: 7597 Epoch: 0 [ 53000 / 111704] loss: 3.8664 loss_at_student: 3.8664 max mem: 7597 Epoch: 0 [ 53500 / 111704] loss: 3.8093 loss_at_student: 3.8093 max mem: 7597 Epoch: 0 [ 54000 / 111704] loss: 3.7960 loss_at_student: 3.7960 max mem: 7597 Epoch: 0 [ 54500 / 111704] loss: 3.9549 loss_at_student: 3.9549 max mem: 7597 Epoch: 0 [ 55000 / 111704] loss: 4.1104 loss_at_student: 4.1104 max mem: 7597 Epoch: 0 [ 55500 / 111704] loss: 4.0118 loss_at_student: 4.0118 max mem: 7597 Epoch: 0 [ 56000 / 111704] loss: 3.6153 loss_at_student: 3.6153 max mem: 7597 Epoch: 0 [ 56500 / 111704] loss: 3.9898 loss_at_student: 3.9898 max mem: 7597 Epoch: 0 [ 57000 / 111704] loss: 3.7724 loss_at_student: 3.7724 max mem: 7597 Epoch: 0 [ 57500 / 111704] loss: 3.8434 loss_at_student: 3.8434 max mem: 7597 Epoch: 0 [ 58000 / 111704] loss: 4.0537 loss_at_student: 4.0537 max mem: 7597 Epoch: 0 [ 58500 / 111704] loss: 3.7411 loss_at_student: 3.7411 max mem: 7597 Epoch: 0 [ 59000 / 111704] loss: 3.7493 loss_at_student: 3.7493 max mem: 7597 Epoch: 0 [ 59500 / 111704] loss: 4.3428 loss_at_student: 4.3428 max mem: 7597 Epoch: 0 [ 60000 / 111704] loss: 3.8873 loss_at_student: 3.8873 max mem: 7597 Epoch: 0 [ 60500 / 111704] loss: 4.1809 loss_at_student: 4.1809 max mem: 7597 Epoch: 0 [ 61000 / 111704] loss: 4.1613 loss_at_student: 4.1613 max mem: 7597 Epoch: 0 [ 61500 / 111704] loss: 3.4200 loss_at_student: 3.4200 max mem: 7597 Epoch: 0 [ 62000 / 111704] loss: 3.9101 loss_at_student: 3.9101 max mem: 7597 Epoch: 0 [ 62500 / 111704] loss: 3.8585 loss_at_student: 3.8585 max mem: 7597 Epoch: 0 [ 63000 / 111704] loss: 3.7161 loss_at_student: 3.7161 max mem: 7597 Epoch: 0 [ 63500 / 111704] loss: 3.8943 loss_at_student: 3.8943 max mem: 7597 Epoch: 0 [ 64000 / 111704] loss: 3.7164 loss_at_student: 3.7164 max mem: 7597 Epoch: 0 [ 64500 / 111704] loss: 3.7043 loss_at_student: 
3.7043 max mem: 7597 Epoch: 0 [ 65000 / 111704] loss: 3.4761 loss_at_student: 3.4761 max mem: 7597 Epoch: 0 [ 65500 / 111704] loss: 4.0781 loss_at_student: 4.0781 max mem: 7597 Epoch: 0 [ 66000 / 111704] loss: 3.7520 loss_at_student: 3.7520 max mem: 7597 Epoch: 0 [ 66500 / 111704] loss: 3.5518 loss_at_student: 3.5518 max mem: 7597 Epoch: 0 [ 67000 / 111704] loss: 3.9021 loss_at_student: 3.9021 max mem: 7597 Epoch: 0 [ 67500 / 111704] loss: 3.8593 loss_at_student: 3.8593 max mem: 7597 Epoch: 0 [ 68000 / 111704] loss: 3.9456 loss_at_student: 3.9456 max mem: 7597 Epoch: 0 [ 68500 / 111704] loss: 3.6141 loss_at_student: 3.6141 max mem: 7597 Epoch: 0 [ 69000 / 111704] loss: 4.2022 loss_at_student: 4.2022 max mem: 7597 Epoch: 0 [ 69500 / 111704] loss: 3.5705 loss_at_student: 3.5705 max mem: 7597 Epoch: 0 [ 70000 / 111704] loss: 3.8974 loss_at_student: 3.8974 max mem: 7597 Epoch: 0 [ 70500 / 111704] loss: 3.4586 loss_at_student: 3.4586 max mem: 7597 Epoch: 0 [ 71000 / 111704] loss: 4.2041 loss_at_student: 4.2041 max mem: 7597 Epoch: 0 [ 71500 / 111704] loss: 3.6301 loss_at_student: 3.6301 max mem: 7597 Epoch: 0 [ 72000 / 111704] loss: 3.8927 loss_at_student: 3.8927 max mem: 7597 Epoch: 0 [ 72500 / 111704] loss: 3.7067 loss_at_student: 3.7067 max mem: 7597 Epoch: 0 [ 73000 / 111704] loss: 3.5971 loss_at_student: 3.5971 max mem: 7597 Epoch: 0 [ 73500 / 111704] loss: 3.9996 loss_at_student: 3.9996 max mem: 7597 Epoch: 0 [ 74000 / 111704] loss: 3.8815 loss_at_student: 3.8815 max mem: 7597 Epoch: 0 [ 74500 / 111704] loss: 3.9927 loss_at_student: 3.9927 max mem: 7597 Epoch: 0 [ 75000 / 111704] loss: 3.4703 loss_at_student: 3.4703 max mem: 7597 Epoch: 0 [ 75500 / 111704] loss: 3.5760 loss_at_student: 3.5760 max mem: 7597 Epoch: 0 [ 76000 / 111704] loss: 3.7167 loss_at_student: 3.7167 max mem: 7597 Epoch: 0 [ 76500 / 111704] loss: 3.8299 loss_at_student: 3.8299 max mem: 7597 Epoch: 0 [ 77000 / 111704] loss: 3.5532 loss_at_student: 3.5532 max mem: 7597 Epoch: 0 [ 77500 / 111704] 
loss: 3.6085 loss_at_student: 3.6085 max mem: 7597 Epoch: 0 [ 78000 / 111704] loss: 3.5591 loss_at_student: 3.5591 max mem: 7597 Epoch: 0 [ 78500 / 111704] loss: 3.4231 loss_at_student: 3.4231 max mem: 7597 Epoch: 0 [ 79000 / 111704] loss: 3.6261 loss_at_student: 3.6261 max mem: 7597 Epoch: 0 [ 79500 / 111704] loss: 3.4383 loss_at_student: 3.4383 max mem: 7597 Epoch: 0 [ 80000 / 111704] loss: 3.3283 loss_at_student: 3.3283 max mem: 7597 Epoch: 0 [ 80500 / 111704] loss: 3.4221 loss_at_student: 3.4221 max mem: 7597 Epoch: 0 [ 81000 / 111704] loss: 3.9027 loss_at_student: 3.9027 max mem: 7597 Epoch: 0 [ 81500 / 111704] loss: 3.8388 loss_at_student: 3.8388 max mem: 7597 Epoch: 0 [ 82000 / 111704] loss: 4.1729 loss_at_student: 4.1729 max mem: 7597 Epoch: 0 [ 82500 / 111704] loss: 3.8036 loss_at_student: 3.8036 max mem: 7597 Epoch: 0 [ 83000 / 111704] loss: 3.5903 loss_at_student: 3.5903 max mem: 7597 Epoch: 0 [ 83500 / 111704] loss: 3.4736 loss_at_student: 3.4736 max mem: 7597 Epoch: 0 [ 84000 / 111704] loss: 3.4524 loss_at_student: 3.4524 max mem: 7597 Epoch: 0 [ 84500 / 111704] loss: 3.7814 loss_at_student: 3.7814 max mem: 7597 Epoch: 0 [ 85000 / 111704] loss: 3.5877 loss_at_student: 3.5877 max mem: 7597 Epoch: 0 [ 85500 / 111704] loss: 3.2953 loss_at_student: 3.2953 max mem: 7597 Epoch: 0 [ 86000 / 111704] loss: 3.6137 loss_at_student: 3.6137 max mem: 7597 Epoch: 0 [ 86500 / 111704] loss: 3.7357 loss_at_student: 3.7357 max mem: 7597 Epoch: 0 [ 87000 / 111704] loss: 3.7362 loss_at_student: 3.7362 max mem: 7597 Epoch: 0 [ 87500 / 111704] loss: 3.6342 loss_at_student: 3.6342 max mem: 7597 Epoch: 0 [ 88000 / 111704] loss: 3.6499 loss_at_student: 3.6499 max mem: 7597 Epoch: 0 [ 88500 / 111704] loss: 3.8936 loss_at_student: 3.8936 max mem: 7597 Epoch: 0 [ 89000 / 111704] loss: 3.9120 loss_at_student: 3.9120 max mem: 7597 Epoch: 0 [ 89500 / 111704] loss: 3.6771 loss_at_student: 3.6771 max mem: 7597 Epoch: 0 [ 90000 / 111704] loss: 3.8526 loss_at_student: 3.8526 max mem: 
7597 Epoch: 0 [ 90500 / 111704] loss: 3.9116 loss_at_student: 3.9116 max mem: 7597 Epoch: 0 [ 91000 / 111704] loss: 3.3960 loss_at_student: 3.3960 max mem: 7597 Epoch: 0 [ 91500 / 111704] loss: 3.9203 loss_at_student: 3.9203 max mem: 7597 Epoch: 0 [ 92000 / 111704] loss: 3.5709 loss_at_student: 3.5709 max mem: 7597 Epoch: 0 [ 92500 / 111704] loss: 3.6945 loss_at_student: 3.6945 max mem: 7597 Epoch: 0 [ 93000 / 111704] loss: 4.0280 loss_at_student: 4.0280 max mem: 7597 Epoch: 0 [ 93500 / 111704] loss: 3.3604 loss_at_student: 3.3604 max mem: 7597 Epoch: 0 [ 94000 / 111704] loss: 3.4572 loss_at_student: 3.4572 max mem: 7597 Epoch: 0 [ 94500 / 111704] loss: 3.9002 loss_at_student: 3.9002 max mem: 7597 Epoch: 0 [ 95000 / 111704] loss: 3.6444 loss_at_student: 3.6444 max mem: 7597 Epoch: 0 [ 95500 / 111704] loss: 3.2206 loss_at_student: 3.2206 max mem: 7597 Epoch: 0 [ 96000 / 111704] loss: 3.1926 loss_at_student: 3.1926 max mem: 7597 Epoch: 0 [ 96500 / 111704] loss: 3.5636 loss_at_student: 3.5636 max mem: 7597 Epoch: 0 [ 97000 / 111704] loss: 4.0269 loss_at_student: 4.0269 max mem: 7597 Epoch: 0 [ 97500 / 111704] loss: 3.6760 loss_at_student: 3.6760 max mem: 7597 Epoch: 0 [ 98000 / 111704] loss: 3.3086 loss_at_student: 3.3086 max mem: 7597 Epoch: 0 [ 98500 / 111704] loss: 3.6044 loss_at_student: 3.6044 max mem: 7597 Epoch: 0 [ 99000 / 111704] loss: 3.9427 loss_at_student: 3.9427 max mem: 7597 Epoch: 0 [ 99500 / 111704] loss: 3.8270 loss_at_student: 3.8270 max mem: 7597 Epoch: 0 [100000 / 111704] loss: 3.4903 loss_at_student: 3.4903 max mem: 7597 Epoch: 0 [100500 / 111704] loss: 3.6302 loss_at_student: 3.6302 max mem: 7597 Epoch: 0 [101000 / 111704] loss: 3.7080 loss_at_student: 3.7080 max mem: 7597 Epoch: 0 [101500 / 111704] loss: 3.4830 loss_at_student: 3.4830 max mem: 7597 Epoch: 0 [102000 / 111704] loss: 3.6739 loss_at_student: 3.6739 max mem: 7597 Epoch: 0 [102500 / 111704] loss: 3.3773 loss_at_student: 3.3773 max mem: 7597 Epoch: 0 [103000 / 111704] loss: 3.4852 
loss_at_student: 3.4852 max mem: 7597 Epoch: 0 [103500 / 111704] loss: 3.5963 loss_at_student: 3.5963 max mem: 7597 Epoch: 0 [104000 / 111704] loss: 3.6638 loss_at_student: 3.6638 max mem: 7597 Epoch: 0 [104500 / 111704] loss: 3.6741 loss_at_student: 3.6741 max mem: 7597 Epoch: 0 [105000 / 111704] loss: 3.9578 loss_at_student: 3.9578 max mem: 7597 Epoch: 0 [105500 / 111704] loss: 3.5483 loss_at_student: 3.5483 max mem: 7597 Epoch: 0 [106000 / 111704] loss: 3.9791 loss_at_student: 3.9791 max mem: 7597 Epoch: 0 [106500 / 111704] loss: 3.2237 loss_at_student: 3.2237 max mem: 7597 Epoch: 0 [107000 / 111704] loss: 3.3677 loss_at_student: 3.3677 max mem: 7597 Epoch: 0 [107500 / 111704] loss: 3.9328 loss_at_student: 3.9328 max mem: 7597 Epoch: 0 [108000 / 111704] loss: 3.5512 loss_at_student: 3.5512 max mem: 7597 Epoch: 0 [108500 / 111704] loss: 3.4838 loss_at_student: 3.4838 max mem: 7597 Epoch: 0 [109000 / 111704] loss: 3.4433 loss_at_student: 3.4433 max mem: 7597 Epoch: 0 [109500 / 111704] loss: 3.3684 loss_at_student: 3.3684 max mem: 7597 Epoch: 0 [110000 / 111704] loss: 3.5861 loss_at_student: 3.5861 max mem: 7597 Epoch: 0 [110500 / 111704] loss: 3.5507 loss_at_student: 3.5507 max mem: 7597 Epoch: 0 [111000 / 111704] loss: 3.3398 loss_at_student: 3.3398 max mem: 7597 Epoch: 0 [111500 / 111704] loss: 3.6632 loss_at_student: 3.6632 max mem: 7597 Averaged stats: loss: 4.2150 loss_at_student: 4.2150 Train epoch time: 3:58:26 Epoch: 1 [ 296 / 111704] loss: 3.6029 loss_at_student: 3.6029 max mem: 7597 Epoch: 1 [ 796 / 111704] loss: 3.6623 loss_at_student: 3.6623 max mem: 7597 Epoch: 1 [ 1296 / 111704] loss: 3.4661 loss_at_student: 3.4661 max mem: 7597 Epoch: 1 [ 1796 / 111704] loss: 3.2433 loss_at_student: 3.2433 max mem: 7597 Epoch: 1 [ 2296 / 111704] loss: 3.4813 loss_at_student: 3.4813 max mem: 7597 Epoch: 1 [ 2796 / 111704] loss: 3.3808 loss_at_student: 3.3808 max mem: 7597 Epoch: 1 [ 3296 / 111704] loss: 3.4983 loss_at_student: 3.4983 max mem: 7597 Epoch: 1 [ 3796 / 
111704] loss: 3.4964 loss_at_student: 3.4964 max mem: 7597 Epoch: 1 [ 4296 / 111704] loss: 3.4458 loss_at_student: 3.4458 max mem: 7597 Epoch: 1 [ 4796 / 111704] loss: 3.5752 loss_at_student: 3.5752 max mem: 7597 Epoch: 1 [ 5296 / 111704] loss: 3.7625 loss_at_student: 3.7625 max mem: 7597 Epoch: 1 [ 5796 / 111704] loss: 3.4297 loss_at_student: 3.4297 max mem: 7597 Epoch: 1 [ 6296 / 111704] loss: 3.7925 loss_at_student: 3.7925 max mem: 7597 Epoch: 1 [ 6796 / 111704] loss: 3.0521 loss_at_student: 3.0521 max mem: 7597 Epoch: 1 [ 7296 / 111704] loss: 3.3852 loss_at_student: 3.3852 max mem: 7597 Epoch: 1 [ 7796 / 111704] loss: 3.2090 loss_at_student: 3.2090 max mem: 7597 Epoch: 1 [ 8296 / 111704] loss: 3.9120 loss_at_student: 3.9120 max mem: 7597 Epoch: 1 [ 8796 / 111704] loss: 3.2972 loss_at_student: 3.2972 max mem: 7597 Epoch: 1 [ 9296 / 111704] loss: 3.7184 loss_at_student: 3.7184 max mem: 7597 Epoch: 1 [ 9796 / 111704] loss: 3.5720 loss_at_student: 3.5720 max mem: 7597 Epoch: 1 [ 10296 / 111704] loss: 3.6307 loss_at_student: 3.6307 max mem: 7597 Epoch: 1 [ 10796 / 111704] loss: 3.2653 loss_at_student: 3.2653 max mem: 7597 Epoch: 1 [ 11296 / 111704] loss: 3.5389 loss_at_student: 3.5389 max mem: 7597 Epoch: 1 [ 11796 / 111704] loss: 3.6222 loss_at_student: 3.6222 max mem: 7597 Epoch: 1 [ 12296 / 111704] loss: 3.4383 loss_at_student: 3.4383 max mem: 7597 Epoch: 1 [ 12796 / 111704] loss: 3.2861 loss_at_student: 3.2861 max mem: 7597 Epoch: 1 [ 13296 / 111704] loss: 3.6515 loss_at_student: 3.6515 max mem: 7597 Epoch: 1 [ 13796 / 111704] loss: 3.2430 loss_at_student: 3.2430 max mem: 7597 Epoch: 1 [ 14296 / 111704] loss: 3.5435 loss_at_student: 3.5435 max mem: 7597 Epoch: 1 [ 14796 / 111704] loss: 3.3641 loss_at_student: 3.3641 max mem: 7597 Epoch: 1 [ 15296 / 111704] loss: 3.6065 loss_at_student: 3.6065 max mem: 7597 Epoch: 1 [ 15796 / 111704] loss: 3.4092 loss_at_student: 3.4092 max mem: 7597 Epoch: 1 [ 16296 / 111704] loss: 3.6313 loss_at_student: 3.6313 max mem: 7597 
Epoch: 1 [ 16796 / 111704] loss: 3.6361 loss_at_student: 3.6361 max mem: 7597 Epoch: 1 [ 17296 / 111704] loss: 3.3100 loss_at_student: 3.3100 max mem: 7597 Epoch: 1 [ 17796 / 111704] loss: 3.8539 loss_at_student: 3.8539 max mem: 7597 Epoch: 1 [ 18296 / 111704] loss: 3.4563 loss_at_student: 3.4563 max mem: 7597 Epoch: 1 [ 18796 / 111704] loss: 3.6452 loss_at_student: 3.6452 max mem: 7597 Epoch: 1 [ 19296 / 111704] loss: 3.2030 loss_at_student: 3.2030 max mem: 7597 Epoch: 1 [ 19796 / 111704] loss: 3.6025 loss_at_student: 3.6025 max mem: 7597 Epoch: 1 [ 20296 / 111704] loss: 3.8071 loss_at_student: 3.8071 max mem: 7597 Epoch: 1 [ 20796 / 111704] loss: 3.5293 loss_at_student: 3.5293 max mem: 7597 Epoch: 1 [ 21296 / 111704] loss: 3.1455 loss_at_student: 3.1455 max mem: 7597 Epoch: 1 [ 21796 / 111704] loss: 3.1347 loss_at_student: 3.1347 max mem: 7597 Epoch: 1 [ 22296 / 111704] loss: 3.4667 loss_at_student: 3.4667 max mem: 7597 Epoch: 1 [ 22796 / 111704] loss: 3.4165 loss_at_student: 3.4165 max mem: 7597 Epoch: 1 [ 23296 / 111704] loss: 3.5240 loss_at_student: 3.5240 max mem: 7597 Epoch: 1 [ 23796 / 111704] loss: 3.3696 loss_at_student: 3.3696 max mem: 7597 Epoch: 1 [ 24296 / 111704] loss: 3.1481 loss_at_student: 3.1481 max mem: 7597 Epoch: 1 [ 24796 / 111704] loss: 3.5372 loss_at_student: 3.5372 max mem: 7597 Epoch: 1 [ 25296 / 111704] loss: 3.0937 loss_at_student: 3.0937 max mem: 7597 Epoch: 1 [ 25796 / 111704] loss: 3.2243 loss_at_student: 3.2243 max mem: 7597 Epoch: 1 [ 26296 / 111704] loss: 3.3260 loss_at_student: 3.3260 max mem: 7597 Epoch: 1 [ 26796 / 111704] loss: 3.1824 loss_at_student: 3.1824 max mem: 7597 Epoch: 1 [ 27296 / 111704] loss: 3.2693 loss_at_student: 3.2693 max mem: 7597 Epoch: 1 [ 27796 / 111704] loss: 2.9478 loss_at_student: 2.9478 max mem: 7597 Epoch: 1 [ 28296 / 111704] loss: 3.2822 loss_at_student: 3.2822 max mem: 7597 Epoch: 1 [ 28796 / 111704] loss: 3.1710 loss_at_student: 3.1710 max mem: 7597 Epoch: 1 [ 29296 / 111704] loss: 3.6465 
loss_at_student: 3.6465 max mem: 7597 Epoch: 1 [ 29796 / 111704] loss: 3.4467 loss_at_student: 3.4467 max mem: 7597 Epoch: 1 [ 30296 / 111704] loss: 3.2328 loss_at_student: 3.2328 max mem: 7597 Epoch: 1 [ 30796 / 111704] loss: 3.4318 loss_at_student: 3.4318 max mem: 7597 Epoch: 1 [ 31296 / 111704] loss: 3.5629 loss_at_student: 3.5629 max mem: 7597 Epoch: 1 [ 31796 / 111704] loss: 3.4550 loss_at_student: 3.4550 max mem: 7597 Epoch: 1 [ 32296 / 111704] loss: 3.5785 loss_at_student: 3.5785 max mem: 7597 Epoch: 1 [ 32796 / 111704] loss: 3.5753 loss_at_student: 3.5753 max mem: 7597 Epoch: 1 [ 33296 / 111704] loss: 2.9137 loss_at_student: 2.9137 max mem: 7597 Epoch: 1 [ 33796 / 111704] loss: 3.7341 loss_at_student: 3.7341 max mem: 7597 Epoch: 1 [ 34296 / 111704] loss: 3.2240 loss_at_student: 3.2240 max mem: 7597 Epoch: 1 [ 34796 / 111704] loss: 3.4021 loss_at_student: 3.4021 max mem: 7597 Epoch: 1 [ 35296 / 111704] loss: 3.4050 loss_at_student: 3.4050 max mem: 7597 Epoch: 1 [ 35796 / 111704] loss: 3.4401 loss_at_student: 3.4401 max mem: 7597 Epoch: 1 [ 36296 / 111704] loss: 3.2925 loss_at_student: 3.2925 max mem: 7597 Epoch: 1 [ 36796 / 111704] loss: 3.1435 loss_at_student: 3.1435 max mem: 7597 Epoch: 1 [ 37296 / 111704] loss: 3.3391 loss_at_student: 3.3391 max mem: 7597 Epoch: 1 [ 37796 / 111704] loss: 3.5469 loss_at_student: 3.5469 max mem: 7597 Epoch: 1 [ 38296 / 111704] loss: 3.0550 loss_at_student: 3.0550 max mem: 7597 Epoch: 1 [ 38796 / 111704] loss: 3.6440 loss_at_student: 3.6440 max mem: 7597 Epoch: 1 [ 39296 / 111704] loss: 3.2004 loss_at_student: 3.2004 max mem: 7597 Epoch: 1 [ 39796 / 111704] loss: 3.4220 loss_at_student: 3.4220 max mem: 7597 Epoch: 1 [ 40296 / 111704] loss: 3.5742 loss_at_student: 3.5742 max mem: 7597 Epoch: 1 [ 40796 / 111704] loss: 3.4066 loss_at_student: 3.4066 max mem: 7597 Epoch: 1 [ 41296 / 111704] loss: 3.5858 loss_at_student: 3.5858 max mem: 7597 Epoch: 1 [ 41796 / 111704] loss: 3.1169 loss_at_student: 3.1169 max mem: 7597 Epoch: 1 [ 
42296 / 111704] loss: 3.5306 loss_at_student: 3.5306 max mem: 7597 Epoch: 1 [ 42796 / 111704] loss: 3.5323 loss_at_student: 3.5323 max mem: 7597 Epoch: 1 [ 43296 / 111704] loss: 3.5636 loss_at_student: 3.5636 max mem: 7597 Epoch: 1 [ 43796 / 111704] loss: 3.3182 loss_at_student: 3.3182 max mem: 7597 Epoch: 1 [ 44296 / 111704] loss: 3.4409 loss_at_student: 3.4409 max mem: 7597 Epoch: 1 [ 44796 / 111704] loss: 3.7748 loss_at_student: 3.7748 max mem: 7597 Epoch: 1 [ 45296 / 111704] loss: 3.2716 loss_at_student: 3.2716 max mem: 7597 Epoch: 1 [ 45796 / 111704] loss: 3.3531 loss_at_student: 3.3531 max mem: 7597 Epoch: 1 [ 46296 / 111704] loss: 3.1153 loss_at_student: 3.1153 max mem: 7597 Epoch: 1 [ 46796 / 111704] loss: 3.7651 loss_at_student: 3.7651 max mem: 7597 Epoch: 1 [ 47296 / 111704] loss: 3.4135 loss_at_student: 3.4135 max mem: 7597 Epoch: 1 [ 47796 / 111704] loss: 3.3709 loss_at_student: 3.3709 max mem: 7597 Epoch: 1 [ 48296 / 111704] loss: 3.7346 loss_at_student: 3.7346 max mem: 7597 Epoch: 1 [ 48796 / 111704] loss: 3.0866 loss_at_student: 3.0866 max mem: 7597 Epoch: 1 [ 49296 / 111704] loss: 3.4034 loss_at_student: 3.4034 max mem: 7597 Epoch: 1 [ 49796 / 111704] loss: 3.2171 loss_at_student: 3.2171 max mem: 7597 Epoch: 1 [ 50296 / 111704] loss: 3.4626 loss_at_student: 3.4626 max mem: 7597 Epoch: 1 [ 50796 / 111704] loss: 3.2732 loss_at_student: 3.2732 max mem: 7597 Epoch: 1 [ 51296 / 111704] loss: 3.3169 loss_at_student: 3.3169 max mem: 7597 Epoch: 1 [ 51796 / 111704] loss: 3.6335 loss_at_student: 3.6335 max mem: 7597 Epoch: 1 [ 52296 / 111704] loss: 3.4199 loss_at_student: 3.4199 max mem: 7597 Epoch: 1 [ 52796 / 111704] loss: 3.1910 loss_at_student: 3.1910 max mem: 7597 Epoch: 1 [ 53296 / 111704] loss: 3.4056 loss_at_student: 3.4056 max mem: 7597 Epoch: 1 [ 53796 / 111704] loss: 3.7073 loss_at_student: 3.7073 max mem: 7597 Epoch: 1 [ 54296 / 111704] loss: 3.0123 loss_at_student: 3.0123 max mem: 7597 Epoch: 1 [ 54796 / 111704] loss: 2.8909 loss_at_student: 
2.8909 max mem: 7597 Epoch: 1 [ 55296 / 111704] loss: 3.5244 loss_at_student: 3.5244 max mem: 7597 Epoch: 1 [ 55796 / 111704] loss: 3.3107 loss_at_student: 3.3107 max mem: 7597 Epoch: 1 [ 56296 / 111704] loss: 3.5457 loss_at_student: 3.5457 max mem: 7597 Epoch: 1 [ 56796 / 111704] loss: 3.5005 loss_at_student: 3.5005 max mem: 7597 Epoch: 1 [ 57296 / 111704] loss: 3.1823 loss_at_student: 3.1823 max mem: 7597 Epoch: 1 [ 57796 / 111704] loss: 3.7455 loss_at_student: 3.7455 max mem: 7597 Epoch: 1 [ 58296 / 111704] loss: 3.3206 loss_at_student: 3.3206 max mem: 7597 Epoch: 1 [ 58796 / 111704] loss: 3.3536 loss_at_student: 3.3536 max mem: 7597 Epoch: 1 [ 59296 / 111704] loss: 3.5023 loss_at_student: 3.5023 max mem: 7597 Epoch: 1 [ 59796 / 111704] loss: 3.2701 loss_at_student: 3.2701 max mem: 7597 Epoch: 1 [ 60296 / 111704] loss: 3.2055 loss_at_student: 3.2055 max mem: 7597 Epoch: 1 [ 60796 / 111704] loss: 3.3352 loss_at_student: 3.3352 max mem: 7597 Epoch: 1 [ 61296 / 111704] loss: 3.2006 loss_at_student: 3.2006 max mem: 7597 Epoch: 1 [ 61796 / 111704] loss: 3.3647 loss_at_student: 3.3647 max mem: 7597 Epoch: 1 [ 62296 / 111704] loss: 3.5265 loss_at_student: 3.5265 max mem: 7597 Epoch: 1 [ 62796 / 111704] loss: 3.0733 loss_at_student: 3.0733 max mem: 7597 Epoch: 1 [ 63296 / 111704] loss: 3.0901 loss_at_student: 3.0901 max mem: 7597 Epoch: 1 [ 63796 / 111704] loss: 3.0577 loss_at_student: 3.0577 max mem: 7597 Epoch: 1 [ 64296 / 111704] loss: 3.3136 loss_at_student: 3.3136 max mem: 7597 Epoch: 1 [ 64796 / 111704] loss: 3.1727 loss_at_student: 3.1727 max mem: 7597 Epoch: 1 [ 65296 / 111704] loss: 3.5392 loss_at_student: 3.5392 max mem: 7597 Epoch: 1 [ 65796 / 111704] loss: 3.2230 loss_at_student: 3.2230 max mem: 7597 Epoch: 1 [ 66296 / 111704] loss: 3.5260 loss_at_student: 3.5260 max mem: 7597 Epoch: 1 [ 66796 / 111704] loss: 3.4806 loss_at_student: 3.4806 max mem: 7597 Epoch: 1 [ 67296 / 111704] loss: 3.0358 loss_at_student: 3.0358 max mem: 7597 Epoch: 1 [ 67796 / 111704] 
loss: 3.1595 loss_at_student: 3.1595 max mem: 7597 Epoch: 1 [ 68296 / 111704] loss: 3.4882 loss_at_student: 3.4882 max mem: 7597 Epoch: 1 [ 68796 / 111704] loss: 2.9856 loss_at_student: 2.9856 max mem: 7597 Epoch: 1 [ 69296 / 111704] loss: 3.2345 loss_at_student: 3.2345 max mem: 7597 Epoch: 1 [ 69796 / 111704] loss: 3.6079 loss_at_student: 3.6079 max mem: 7597 Epoch: 1 [ 70296 / 111704] loss: 3.0650 loss_at_student: 3.0650 max mem: 7597 Epoch: 1 [ 70796 / 111704] loss: 3.1669 loss_at_student: 3.1669 max mem: 7597 Epoch: 1 [ 71296 / 111704] loss: 2.8514 loss_at_student: 2.8514 max mem: 7597 Epoch: 1 [ 71796 / 111704] loss: 3.3876 loss_at_student: 3.3876 max mem: 7597 Epoch: 1 [ 72296 / 111704] loss: 3.1704 loss_at_student: 3.1704 max mem: 7597 Epoch: 1 [ 72796 / 111704] loss: 3.4274 loss_at_student: 3.4274 max mem: 7597 Epoch: 1 [ 73296 / 111704] loss: 3.1756 loss_at_student: 3.1756 max mem: 7597 Epoch: 1 [ 73796 / 111704] loss: 3.4278 loss_at_student: 3.4278 max mem: 7597 Epoch: 1 [ 74296 / 111704] loss: 3.3285 loss_at_student: 3.3285 max mem: 7597 Epoch: 1 [ 74796 / 111704] loss: 3.2738 loss_at_student: 3.2738 max mem: 7597 Epoch: 1 [ 75296 / 111704] loss: 3.0635 loss_at_student: 3.0635 max mem: 7597 Epoch: 1 [ 75796 / 111704] loss: 3.3299 loss_at_student: 3.3299 max mem: 7597 Epoch: 1 [ 76296 / 111704] loss: 3.1587 loss_at_student: 3.1587 max mem: 7597 Epoch: 1 [ 76796 / 111704] loss: 3.0882 loss_at_student: 3.0882 max mem: 7597 Epoch: 1 [ 77296 / 111704] loss: 3.0397 loss_at_student: 3.0397 max mem: 7597 Epoch: 1 [ 77796 / 111704] loss: 3.5432 loss_at_student: 3.5432 max mem: 7597 Epoch: 1 [ 78296 / 111704] loss: 3.4269 loss_at_student: 3.4269 max mem: 7597 Epoch: 1 [ 78796 / 111704] loss: 3.4881 loss_at_student: 3.4881 max mem: 7597 Epoch: 1 [ 79296 / 111704] loss: 3.5102 loss_at_student: 3.5102 max mem: 7597 Epoch: 1 [ 79796 / 111704] loss: 2.9870 loss_at_student: 2.9870 max mem: 7597 Epoch: 1 [ 80296 / 111704] loss: 3.6219 loss_at_student: 3.6219 max mem: 
7597 Epoch: 1 [ 80796 / 111704] loss: 2.9447 loss_at_student: 2.9447 max mem: 7597 Epoch: 1 [ 81296 / 111704] loss: 3.3926 loss_at_student: 3.3926 max mem: 7597 Epoch: 1 [ 81796 / 111704] loss: 2.9767 loss_at_student: 2.9767 max mem: 7597 Epoch: 1 [ 82296 / 111704] loss: 3.5474 loss_at_student: 3.5474 max mem: 7597 Epoch: 1 [ 82796 / 111704] loss: 3.5902 loss_at_student: 3.5902 max mem: 7597 Epoch: 1 [ 83296 / 111704] loss: 3.4155 loss_at_student: 3.4155 max mem: 7597 Epoch: 1 [ 83796 / 111704] loss: 3.1603 loss_at_student: 3.1603 max mem: 7597 Epoch: 1 [ 84296 / 111704] loss: 3.8424 loss_at_student: 3.8424 max mem: 7597 Epoch: 1 [ 84796 / 111704] loss: 3.2034 loss_at_student: 3.2034 max mem: 7597 Epoch: 1 [ 85296 / 111704] loss: 3.1573 loss_at_student: 3.1573 max mem: 7597 Epoch: 1 [ 85796 / 111704] loss: 3.7017 loss_at_student: 3.7017 max mem: 7597 Epoch: 1 [ 86296 / 111704] loss: 3.2270 loss_at_student: 3.2270 max mem: 7597 Epoch: 1 [ 86796 / 111704] loss: 3.3402 loss_at_student: 3.3402 max mem: 7597 Epoch: 1 [ 87296 / 111704] loss: 3.4993 loss_at_student: 3.4993 max mem: 7597 Epoch: 1 [ 87796 / 111704] loss: 3.7399 loss_at_student: 3.7399 max mem: 7597 Epoch: 1 [ 88296 / 111704] loss: 3.2117 loss_at_student: 3.2117 max mem: 7597 Epoch: 1 [ 88796 / 111704] loss: 3.5974 loss_at_student: 3.5974 max mem: 7597 Epoch: 1 [ 89296 / 111704] loss: 3.5153 loss_at_student: 3.5153 max mem: 7597 Epoch: 1 [ 89796 / 111704] loss: 3.4865 loss_at_student: 3.4865 max mem: 7597 Epoch: 1 [ 90296 / 111704] loss: 3.0485 loss_at_student: 3.0485 max mem: 7597 Epoch: 1 [ 90796 / 111704] loss: 3.2208 loss_at_student: 3.2208 max mem: 7597 Epoch: 1 [ 91296 / 111704] loss: 3.0650 loss_at_student: 3.0650 max mem: 7597 Epoch: 1 [ 91796 / 111704] loss: 3.3943 loss_at_student: 3.3943 max mem: 7597 Epoch: 1 [ 92296 / 111704] loss: 3.3520 loss_at_student: 3.3520 max mem: 7597 Epoch: 1 [ 92796 / 111704] loss: 3.3314 loss_at_student: 3.3314 max mem: 7597 Epoch: 1 [ 93296 / 111704] loss: 3.1173 
loss_at_student: 3.1173 max mem: 7597 Epoch: 1 [ 93796 / 111704] loss: 3.1904 loss_at_student: 3.1904 max mem: 7597 Epoch: 1 [ 94296 / 111704] loss: 3.2286 loss_at_student: 3.2286 max mem: 7597 Epoch: 1 [ 94796 / 111704] loss: 3.2978 loss_at_student: 3.2978 max mem: 7597 Epoch: 1 [ 95296 / 111704] loss: 3.4678 loss_at_student: 3.4678 max mem: 7597 Epoch: 1 [ 95796 / 111704] loss: 3.4887 loss_at_student: 3.4887 max mem: 7597 Epoch: 1 [ 96296 / 111704] loss: 3.1410 loss_at_student: 3.1410 max mem: 7597 Epoch: 1 [ 96796 / 111704] loss: 2.9872 loss_at_student: 2.9872 max mem: 7597 Epoch: 1 [ 97296 / 111704] loss: 3.5573 loss_at_student: 3.5573 max mem: 7597 Epoch: 1 [ 97796 / 111704] loss: 3.1718 loss_at_student: 3.1718 max mem: 7597 Epoch: 1 [ 98296 / 111704] loss: 3.2211 loss_at_student: 3.2211 max mem: 7597 Epoch: 1 [ 98796 / 111704] loss: 3.6510 loss_at_student: 3.6510 max mem: 7597 Epoch: 1 [ 99296 / 111704] loss: 2.9727 loss_at_student: 2.9727 max mem: 7597 Epoch: 1 [ 99796 / 111704] loss: 3.3128 loss_at_student: 3.3128 max mem: 7597 Epoch: 1 [100296 / 111704] loss: 3.2027 loss_at_student: 3.2027 max mem: 7597 Epoch: 1 [100796 / 111704] loss: 3.2118 loss_at_student: 3.2118 max mem: 7597 Epoch: 1 [101296 / 111704] loss: 3.1509 loss_at_student: 3.1509 max mem: 7597 Epoch: 1 [101796 / 111704] loss: 2.8168 loss_at_student: 2.8168 max mem: 7597 Epoch: 1 [102296 / 111704] loss: 3.3901 loss_at_student: 3.3901 max mem: 7597 Epoch: 1 [102796 / 111704] loss: 3.0754 loss_at_student: 3.0754 max mem: 7597 Epoch: 1 [103296 / 111704] loss: 3.0242 loss_at_student: 3.0242 max mem: 7597 Epoch: 1 [103796 / 111704] loss: 3.2743 loss_at_student: 3.2743 max mem: 7597 Epoch: 1 [104296 / 111704] loss: 3.3502 loss_at_student: 3.3502 max mem: 7597 Epoch: 1 [104796 / 111704] loss: 3.2919 loss_at_student: 3.2919 max mem: 7597 Epoch: 1 [105296 / 111704] loss: 3.1074 loss_at_student: 3.1074 max mem: 7597 Epoch: 1 [105796 / 111704] loss: 3.3843 loss_at_student: 3.3843 max mem: 7597 Epoch: 1 
[106296 / 111704] loss: 3.1101 loss_at_student: 3.1101 max mem: 7597 Epoch: 1 [106796 / 111704] loss: 3.1543 loss_at_student: 3.1543 max mem: 7597 Epoch: 1 [107296 / 111704] loss: 3.1192 loss_at_student: 3.1192 max mem: 7597 Epoch: 1 [107796 / 111704] loss: 3.3150 loss_at_student: 3.3150 max mem: 7597 Epoch: 1 [108296 / 111704] loss: 3.0263 loss_at_student: 3.0263 max mem: 7597 Epoch: 1 [108796 / 111704] loss: 3.5272 loss_at_student: 3.5272 max mem: 7597 Epoch: 1 [109296 / 111704] loss: 3.1728 loss_at_student: 3.1728 max mem: 7597 Epoch: 1 [109796 / 111704] loss: 2.9165 loss_at_student: 2.9165 max mem: 7597 Epoch: 1 [110296 / 111704] loss: 3.2870 loss_at_student: 3.2870 max mem: 7597 Epoch: 1 [110796 / 111704] loss: 3.0450 loss_at_student: 3.0450 max mem: 7597 Epoch: 1 [111296 / 111704] loss: 3.0864 loss_at_student: 3.0864 max mem: 7597
Averaged stats: loss: 3.3426 loss_at_student: 3.3426
Train epoch time: 3:58:27
Epoch: 2 [ 92 / 111704] loss: 3.1570 loss_at_student: 3.1570 max mem: 7597 Epoch: 2 [ 592 / 111704] loss: 3.1492 loss_at_student: 3.1492 max mem: 7597 Epoch: 2 [ 1092 / 111704] loss: 3.3456 loss_at_student: 3.3456 max mem: 7597 Epoch: 2 [ 1592 / 111704] loss: 3.1372 loss_at_student: 3.1372 max mem: 7597 Epoch: 2 [ 2092 / 111704] loss: 3.2438 loss_at_student: 3.2438 max mem: 7597 Epoch: 2 [ 2592 / 111704] loss: 3.6019 loss_at_student: 3.6019 max mem: 7597 Epoch: 2 [ 3092 / 111704] loss: 3.1922 loss_at_student: 3.1922 max mem: 7597 Epoch: 2 [ 3592 / 111704] loss: 3.3063 loss_at_student: 3.3063 max mem: 7597 Epoch: 2 [ 4092 / 111704] loss: 3.3038 loss_at_student: 3.3038 max mem: 7597 Epoch: 2 [ 4592 / 111704] loss: 3.2484 loss_at_student: 3.2484 max mem: 7597 Epoch: 2 [ 5092 / 111704] loss: 3.3163 loss_at_student: 3.3163 max mem: 7597 Epoch: 2 [ 5592 / 111704] loss: 3.4034 loss_at_student: 3.4034 max mem: 7597 Epoch: 2 [ 6092 / 111704] loss: 3.3259 loss_at_student: 3.3259 max mem: 7597 Epoch: 2 [ 6592 / 111704] loss: 3.5461 loss_at_student: 3.5461 max mem: 
7597 Epoch: 2 [ 7092 / 111704] loss: 3.0349 loss_at_student: 3.0349 max mem: 7597 Epoch: 2 [ 7592 / 111704] loss: 3.4164 loss_at_student: 3.4164 max mem: 7597 Epoch: 2 [ 8092 / 111704] loss: 3.2787 loss_at_student: 3.2787 max mem: 7597 Epoch: 2 [ 8592 / 111704] loss: 3.4127 loss_at_student: 3.4127 max mem: 7597 Epoch: 2 [ 9092 / 111704] loss: 2.9611 loss_at_student: 2.9611 max mem: 7597 Epoch: 2 [ 9592 / 111704] loss: 3.1175 loss_at_student: 3.1175 max mem: 7597 Epoch: 2 [ 10092 / 111704] loss: 3.1947 loss_at_student: 3.1947 max mem: 7597 Epoch: 2 [ 10592 / 111704] loss: 3.1617 loss_at_student: 3.1617 max mem: 7597 Epoch: 2 [ 11092 / 111704] loss: 3.2706 loss_at_student: 3.2706 max mem: 7597 Epoch: 2 [ 11592 / 111704] loss: 3.2594 loss_at_student: 3.2594 max mem: 7597 Epoch: 2 [ 12092 / 111704] loss: 2.9815 loss_at_student: 2.9815 max mem: 7597 Epoch: 2 [ 12592 / 111704] loss: 3.2797 loss_at_student: 3.2797 max mem: 7597 Epoch: 2 [ 13092 / 111704] loss: 2.8657 loss_at_student: 2.8657 max mem: 7597 Epoch: 2 [ 13592 / 111704] loss: 2.9151 loss_at_student: 2.9151 max mem: 7597 Epoch: 2 [ 14092 / 111704] loss: 3.1873 loss_at_student: 3.1873 max mem: 7597 Epoch: 2 [ 14592 / 111704] loss: 3.1420 loss_at_student: 3.1420 max mem: 7597 Epoch: 2 [ 15092 / 111704] loss: 3.3000 loss_at_student: 3.3000 max mem: 7597 Epoch: 2 [ 15592 / 111704] loss: 3.1154 loss_at_student: 3.1154 max mem: 7597 Epoch: 2 [ 16092 / 111704] loss: 3.6799 loss_at_student: 3.6799 max mem: 7597 Epoch: 2 [ 16592 / 111704] loss: 2.9719 loss_at_student: 2.9719 max mem: 7597 Epoch: 2 [ 17092 / 111704] loss: 3.3178 loss_at_student: 3.3178 max mem: 7597 Epoch: 2 [ 17592 / 111704] loss: 3.0249 loss_at_student: 3.0249 max mem: 7597 Epoch: 2 [ 18092 / 111704] loss: 3.1124 loss_at_student: 3.1124 max mem: 7597 Epoch: 2 [ 18592 / 111704] loss: 3.0208 loss_at_student: 3.0208 max mem: 7597 Epoch: 2 [ 19092 / 111704] loss: 3.4148 loss_at_student: 3.4148 max mem: 7597 Epoch: 2 [ 19592 / 111704] loss: 2.7564 
loss_at_student: 2.7564 max mem: 7597 Epoch: 2 [ 20092 / 111704] loss: 3.0958 loss_at_student: 3.0958 max mem: 7597 Epoch: 2 [ 20592 / 111704] loss: 2.9602 loss_at_student: 2.9602 max mem: 7597 Epoch: 2 [ 21092 / 111704] loss: 3.1279 loss_at_student: 3.1279 max mem: 7597 Epoch: 2 [ 21592 / 111704] loss: 2.5996 loss_at_student: 2.5996 max mem: 7597 Epoch: 2 [ 22092 / 111704] loss: 2.9255 loss_at_student: 2.9255 max mem: 7597 Epoch: 2 [ 22592 / 111704] loss: 2.8220 loss_at_student: 2.8220 max mem: 7597 Epoch: 2 [ 23092 / 111704] loss: 3.5968 loss_at_student: 3.5968 max mem: 7597 Epoch: 2 [ 23592 / 111704] loss: 3.1218 loss_at_student: 3.1218 max mem: 7597 Epoch: 2 [ 24092 / 111704] loss: 3.0281 loss_at_student: 3.0281 max mem: 7597 Epoch: 2 [ 24592 / 111704] loss: 2.9733 loss_at_student: 2.9733 max mem: 7597 Epoch: 2 [ 25092 / 111704] loss: 2.9832 loss_at_student: 2.9832 max mem: 7597 Epoch: 2 [ 25592 / 111704] loss: 3.1556 loss_at_student: 3.1556 max mem: 7597 Epoch: 2 [ 26092 / 111704] loss: 3.5751 loss_at_student: 3.5751 max mem: 7597 Epoch: 2 [ 26592 / 111704] loss: 3.0645 loss_at_student: 3.0645 max mem: 7597 Epoch: 2 [ 27092 / 111704] loss: 3.2230 loss_at_student: 3.2230 max mem: 7597 Epoch: 2 [ 27592 / 111704] loss: 3.1791 loss_at_student: 3.1791 max mem: 7597 Epoch: 2 [ 28092 / 111704] loss: 3.1030 loss_at_student: 3.1030 max mem: 7597 Epoch: 2 [ 28592 / 111704] loss: 2.9599 loss_at_student: 2.9599 max mem: 7597 Epoch: 2 [ 29092 / 111704] loss: 3.2918 loss_at_student: 3.2918 max mem: 7597 Epoch: 2 [ 29592 / 111704] loss: 3.5885 loss_at_student: 3.5885 max mem: 7597 Epoch: 2 [ 30092 / 111704] loss: 3.0141 loss_at_student: 3.0141 max mem: 7597 Epoch: 2 [ 30592 / 111704] loss: 2.9831 loss_at_student: 2.9831 max mem: 7597 Epoch: 2 [ 31092 / 111704] loss: 2.7934 loss_at_student: 2.7934 max mem: 7597 Epoch: 2 [ 31592 / 111704] loss: 2.9667 loss_at_student: 2.9667 max mem: 7597 Epoch: 2 [ 32092 / 111704] loss: 3.1315 loss_at_student: 3.1315 max mem: 7597 Epoch: 2 [ 
32592 / 111704] loss: 3.2508 loss_at_student: 3.2508 max mem: 7597 Epoch: 2 [ 33092 / 111704] loss: 3.2722 loss_at_student: 3.2722 max mem: 7597 Epoch: 2 [ 33592 / 111704] loss: 2.7211 loss_at_student: 2.7211 max mem: 7597 Epoch: 2 [ 34092 / 111704] loss: 2.8365 loss_at_student: 2.8365 max mem: 7597 Epoch: 2 [ 34592 / 111704] loss: 3.3109 loss_at_student: 3.3109 max mem: 7597 Epoch: 2 [ 35092 / 111704] loss: 3.0362 loss_at_student: 3.0362 max mem: 7597 Epoch: 2 [ 35592 / 111704] loss: 2.9647 loss_at_student: 2.9647 max mem: 7597 Epoch: 2 [ 36092 / 111704] loss: 3.1992 loss_at_student: 3.1992 max mem: 7597 Epoch: 2 [ 36592 / 111704] loss: 3.1449 loss_at_student: 3.1449 max mem: 7597 Epoch: 2 [ 37092 / 111704] loss: 3.2123 loss_at_student: 3.2123 max mem: 7597 Epoch: 2 [ 37592 / 111704] loss: 2.9693 loss_at_student: 2.9693 max mem: 7597 Epoch: 2 [ 38092 / 111704] loss: 3.0670 loss_at_student: 3.0670 max mem: 7597 Epoch: 2 [ 38592 / 111704] loss: 3.1207 loss_at_student: 3.1207 max mem: 7597 Epoch: 2 [ 39092 / 111704] loss: 3.1011 loss_at_student: 3.1011 max mem: 7597 Epoch: 2 [ 39592 / 111704] loss: 3.2596 loss_at_student: 3.2596 max mem: 7597 Epoch: 2 [ 40092 / 111704] loss: 2.8965 loss_at_student: 2.8965 max mem: 7597 Epoch: 2 [ 40592 / 111704] loss: 3.0696 loss_at_student: 3.0696 max mem: 7597 Epoch: 2 [ 41092 / 111704] loss: 3.3265 loss_at_student: 3.3265 max mem: 7597 Epoch: 2 [ 41592 / 111704] loss: 3.4100 loss_at_student: 3.4100 max mem: 7597 Epoch: 2 [ 42092 / 111704] loss: 2.9811 loss_at_student: 2.9811 max mem: 7597 Epoch: 2 [ 42592 / 111704] loss: 3.0444 loss_at_student: 3.0444 max mem: 7597 Epoch: 2 [ 43092 / 111704] loss: 2.9677 loss_at_student: 2.9677 max mem: 7597 Epoch: 2 [ 43592 / 111704] loss: 3.1948 loss_at_student: 3.1948 max mem: 7597 Epoch: 2 [ 44092 / 111704] loss: 3.0865 loss_at_student: 3.0865 max mem: 7597 Epoch: 2 [ 44592 / 111704] loss: 2.9306 loss_at_student: 2.9306 max mem: 7597 Epoch: 2 [ 45092 / 111704] loss: 3.2895 loss_at_student: 
3.2895 max mem: 7597 Epoch: 2 [ 45592 / 111704] loss: 2.9763 loss_at_student: 2.9763 max mem: 7597 Epoch: 2 [ 46092 / 111704] loss: 3.0334 loss_at_student: 3.0334 max mem: 7597 Epoch: 2 [ 46592 / 111704] loss: 3.1123 loss_at_student: 3.1123 max mem: 7597 Epoch: 2 [ 47092 / 111704] loss: 3.0714 loss_at_student: 3.0714 max mem: 7597 Epoch: 2 [ 47592 / 111704] loss: 2.9862 loss_at_student: 2.9862 max mem: 7597 Epoch: 2 [ 48092 / 111704] loss: 2.9168 loss_at_student: 2.9168 max mem: 7597 Epoch: 2 [ 48592 / 111704] loss: 3.0830 loss_at_student: 3.0830 max mem: 7597 Epoch: 2 [ 49092 / 111704] loss: 3.1148 loss_at_student: 3.1148 max mem: 7597 Epoch: 2 [ 49592 / 111704] loss: 3.3342 loss_at_student: 3.3342 max mem: 7597 Epoch: 2 [ 50092 / 111704] loss: 3.1005 loss_at_student: 3.1005 max mem: 7597 Epoch: 2 [ 50592 / 111704] loss: 3.2425 loss_at_student: 3.2425 max mem: 7597 Epoch: 2 [ 51092 / 111704] loss: 3.3952 loss_at_student: 3.3952 max mem: 7597 Epoch: 2 [ 51592 / 111704] loss: 3.2944 loss_at_student: 3.2944 max mem: 7597 Epoch: 2 [ 52092 / 111704] loss: 3.1615 loss_at_student: 3.1615 max mem: 7597 Epoch: 2 [ 52592 / 111704] loss: 3.5014 loss_at_student: 3.5014 max mem: 7597 Epoch: 2 [ 53092 / 111704] loss: 3.0910 loss_at_student: 3.0910 max mem: 7597 Epoch: 2 [ 53592 / 111704] loss: 3.0033 loss_at_student: 3.0033 max mem: 7597 Epoch: 2 [ 54092 / 111704] loss: 3.3219 loss_at_student: 3.3219 max mem: 7597 Epoch: 2 [ 54592 / 111704] loss: 2.9142 loss_at_student: 2.9142 max mem: 7597 Epoch: 2 [ 55092 / 111704] loss: 3.4974 loss_at_student: 3.4974 max mem: 7597 Epoch: 2 [ 55592 / 111704] loss: 3.0405 loss_at_student: 3.0405 max mem: 7597 Epoch: 2 [ 56092 / 111704] loss: 3.0574 loss_at_student: 3.0574 max mem: 7597 Epoch: 2 [ 56592 / 111704] loss: 3.1187 loss_at_student: 3.1187 max mem: 7597 Epoch: 2 [ 57092 / 111704] loss: 3.0074 loss_at_student: 3.0074 max mem: 7597 Epoch: 2 [ 57592 / 111704] loss: 3.1858 loss_at_student: 3.1858 max mem: 7597 Epoch: 2 [ 58092 / 111704] 
loss: 3.1928 loss_at_student: 3.1928 max mem: 7597 Epoch: 2 [ 58592 / 111704] loss: 3.0415 loss_at_student: 3.0415 max mem: 7597 Epoch: 2 [ 59092 / 111704] loss: 3.1918 loss_at_student: 3.1918 max mem: 7597 Epoch: 2 [ 59592 / 111704] loss: 3.3756 loss_at_student: 3.3756 max mem: 7597 Epoch: 2 [ 60092 / 111704] loss: 3.3203 loss_at_student: 3.3203 max mem: 7597 Epoch: 2 [ 60592 / 111704] loss: 3.1989 loss_at_student: 3.1989 max mem: 7597 Epoch: 2 [ 61092 / 111704] loss: 3.0439 loss_at_student: 3.0439 max mem: 7597 Epoch: 2 [ 61592 / 111704] loss: 3.0236 loss_at_student: 3.0236 max mem: 7597 Epoch: 2 [ 62092 / 111704] loss: 3.0663 loss_at_student: 3.0663 max mem: 7597 Epoch: 2 [ 62592 / 111704] loss: 2.8388 loss_at_student: 2.8388 max mem: 7597 Epoch: 2 [ 63092 / 111704] loss: 3.3522 loss_at_student: 3.3522 max mem: 7597 Epoch: 2 [ 63592 / 111704] loss: 3.0969 loss_at_student: 3.0969 max mem: 7597 Epoch: 2 [ 64092 / 111704] loss: 3.2880 loss_at_student: 3.2880 max mem: 7597 Epoch: 2 [ 64592 / 111704] loss: 3.1295 loss_at_student: 3.1295 max mem: 7597 Epoch: 2 [ 65092 / 111704] loss: 3.1881 loss_at_student: 3.1881 max mem: 7597 Epoch: 2 [ 65592 / 111704] loss: 3.1249 loss_at_student: 3.1249 max mem: 7597 Epoch: 2 [ 66092 / 111704] loss: 2.9852 loss_at_student: 2.9852 max mem: 7597 Epoch: 2 [ 66592 / 111704] loss: 3.1692 loss_at_student: 3.1692 max mem: 7597 Epoch: 2 [ 67092 / 111704] loss: 2.7819 loss_at_student: 2.7819 max mem: 7597 Epoch: 2 [ 67592 / 111704] loss: 3.1106 loss_at_student: 3.1106 max mem: 7597 Epoch: 2 [ 68092 / 111704] loss: 3.5338 loss_at_student: 3.5338 max mem: 7597 Epoch: 2 [ 68592 / 111704] loss: 3.0590 loss_at_student: 3.0590 max mem: 7597 Epoch: 2 [ 69092 / 111704] loss: 3.3025 loss_at_student: 3.3025 max mem: 7597 Epoch: 2 [ 69592 / 111704] loss: 3.4662 loss_at_student: 3.4662 max mem: 7597 Epoch: 2 [ 70092 / 111704] loss: 2.8658 loss_at_student: 2.8658 max mem: 7597 Epoch: 2 [ 70592 / 111704] loss: 3.2873 loss_at_student: 3.2873 max mem: 
7597 Epoch: 2 [ 71092 / 111704] loss: 3.4558 loss_at_student: 3.4558 max mem: 7597 Epoch: 2 [ 71592 / 111704] loss: 2.7339 loss_at_student: 2.7339 max mem: 7597 Epoch: 2 [ 72092 / 111704] loss: 3.0521 loss_at_student: 3.0521 max mem: 7597 Epoch: 2 [ 72592 / 111704] loss: 3.2664 loss_at_student: 3.2664 max mem: 7597 Epoch: 2 [ 73092 / 111704] loss: 2.7575 loss_at_student: 2.7575 max mem: 7597 Epoch: 2 [ 73592 / 111704] loss: 3.0140 loss_at_student: 3.0140 max mem: 7597 Epoch: 2 [ 74092 / 111704] loss: 2.6482 loss_at_student: 2.6482 max mem: 7597 Epoch: 2 [ 74592 / 111704] loss: 3.2003 loss_at_student: 3.2003 max mem: 7597 Epoch: 2 [ 75092 / 111704] loss: 3.6019 loss_at_student: 3.6019 max mem: 7597 Epoch: 2 [ 75592 / 111704] loss: 3.3288 loss_at_student: 3.3288 max mem: 7597 Epoch: 2 [ 76092 / 111704] loss: 2.8977 loss_at_student: 2.8977 max mem: 7597 Epoch: 2 [ 76592 / 111704] loss: 3.5917 loss_at_student: 3.5917 max mem: 7597 Epoch: 2 [ 77092 / 111704] loss: 2.9039 loss_at_student: 2.9039 max mem: 7597 Epoch: 2 [ 77592 / 111704] loss: 3.4392 loss_at_student: 3.4392 max mem: 7597 Epoch: 2 [ 78092 / 111704] loss: 2.8249 loss_at_student: 2.8249 max mem: 7597 Epoch: 2 [ 78592 / 111704] loss: 3.1704 loss_at_student: 3.1704 max mem: 7597 Epoch: 2 [ 79092 / 111704] loss: 2.9782 loss_at_student: 2.9782 max mem: 7597 Epoch: 2 [ 79592 / 111704] loss: 2.8502 loss_at_student: 2.8502 max mem: 7597 Epoch: 2 [ 80092 / 111704] loss: 2.9841 loss_at_student: 2.9841 max mem: 7597 Epoch: 2 [ 80592 / 111704] loss: 3.1727 loss_at_student: 3.1727 max mem: 7597 Epoch: 2 [ 81092 / 111704] loss: 3.4080 loss_at_student: 3.4080 max mem: 7597 Epoch: 2 [ 81592 / 111704] loss: 3.2048 loss_at_student: 3.2048 max mem: 7597 Epoch: 2 [ 82092 / 111704] loss: 2.9910 loss_at_student: 2.9910 max mem: 7597 Epoch: 2 [ 82592 / 111704] loss: 3.1524 loss_at_student: 3.1524 max mem: 7597 Epoch: 2 [ 83092 / 111704] loss: 3.1920 loss_at_student: 3.1920 max mem: 7597 Epoch: 2 [ 83592 / 111704] loss: 3.0559 
loss_at_student: 3.0559 max mem: 7597 Epoch: 2 [ 84092 / 111704] loss: 2.8880 loss_at_student: 2.8880 max mem: 7597 Epoch: 2 [ 84592 / 111704] loss: 3.2365 loss_at_student: 3.2365 max mem: 7597 Epoch: 2 [ 85092 / 111704] loss: 2.8279 loss_at_student: 2.8279 max mem: 7597 Epoch: 2 [ 85592 / 111704] loss: 2.9217 loss_at_student: 2.9217 max mem: 7597 Epoch: 2 [ 86092 / 111704] loss: 3.3111 loss_at_student: 3.3111 max mem: 7597 Epoch: 2 [ 86592 / 111704] loss: 3.2999 loss_at_student: 3.2999 max mem: 7597 Epoch: 2 [ 87092 / 111704] loss: 3.1021 loss_at_student: 3.1021 max mem: 7597 Epoch: 2 [ 87592 / 111704] loss: 2.7914 loss_at_student: 2.7914 max mem: 7597 Epoch: 2 [ 88092 / 111704] loss: 3.0102 loss_at_student: 3.0102 max mem: 7597 Epoch: 2 [ 88592 / 111704] loss: 3.3256 loss_at_student: 3.3256 max mem: 7597 Epoch: 2 [ 89092 / 111704] loss: 3.1430 loss_at_student: 3.1430 max mem: 7597 Epoch: 2 [ 89592 / 111704] loss: 3.1627 loss_at_student: 3.1627 max mem: 7597 Epoch: 2 [ 90092 / 111704] loss: 2.8412 loss_at_student: 2.8412 max mem: 7597 Epoch: 2 [ 90592 / 111704] loss: 2.9398 loss_at_student: 2.9398 max mem: 7597 Epoch: 2 [ 91092 / 111704] loss: 2.9865 loss_at_student: 2.9865 max mem: 7597 Epoch: 2 [ 91592 / 111704] loss: 3.0481 loss_at_student: 3.0481 max mem: 7597 Epoch: 2 [ 92092 / 111704] loss: 2.6848 loss_at_student: 2.6848 max mem: 7597 Epoch: 2 [ 92592 / 111704] loss: 3.2974 loss_at_student: 3.2974 max mem: 7597 Epoch: 2 [ 93092 / 111704] loss: 2.9354 loss_at_student: 2.9354 max mem: 7597 Epoch: 2 [ 93592 / 111704] loss: 3.2387 loss_at_student: 3.2387 max mem: 7597 Epoch: 2 [ 94092 / 111704] loss: 2.5645 loss_at_student: 2.5645 max mem: 7597 Epoch: 2 [ 94592 / 111704] loss: 2.8155 loss_at_student: 2.8155 max mem: 7597 Epoch: 2 [ 95092 / 111704] loss: 3.1809 loss_at_student: 3.1809 max mem: 7597 Epoch: 2 [ 95592 / 111704] loss: 3.0687 loss_at_student: 3.0687 max mem: 7597 Epoch: 2 [ 96092 / 111704] loss: 3.0573 loss_at_student: 3.0573 max mem: 7597 Epoch: 2 [ 
96592 / 111704] loss: 3.3157 loss_at_student: 3.3157 max mem: 7597 Epoch: 2 [ 97092 / 111704] loss: 2.8827 loss_at_student: 2.8827 max mem: 7597 Epoch: 2 [ 97592 / 111704] loss: 2.9934 loss_at_student: 2.9934 max mem: 7597 Epoch: 2 [ 98092 / 111704] loss: 3.0306 loss_at_student: 3.0306 max mem: 7597 Epoch: 2 [ 98592 / 111704] loss: 3.0934 loss_at_student: 3.0934 max mem: 7597 Epoch: 2 [ 99092 / 111704] loss: 2.9938 loss_at_student: 2.9938 max mem: 7597 Epoch: 2 [ 99592 / 111704] loss: 3.3441 loss_at_student: 3.3441 max mem: 7597 Epoch: 2 [100092 / 111704] loss: 3.0537 loss_at_student: 3.0537 max mem: 7597 Epoch: 2 [100592 / 111704] loss: 3.3418 loss_at_student: 3.3418 max mem: 7597 Epoch: 2 [101092 / 111704] loss: 3.0892 loss_at_student: 3.0892 max mem: 7597 Epoch: 2 [101592 / 111704] loss: 3.2821 loss_at_student: 3.2821 max mem: 7597 Epoch: 2 [102092 / 111704] loss: 3.0410 loss_at_student: 3.0410 max mem: 7597 Epoch: 2 [102592 / 111704] loss: 2.9763 loss_at_student: 2.9763 max mem: 7597 Epoch: 2 [103092 / 111704] loss: 3.2817 loss_at_student: 3.2817 max mem: 7597 Epoch: 2 [103592 / 111704] loss: 2.9892 loss_at_student: 2.9892 max mem: 7597 Epoch: 2 [104092 / 111704] loss: 3.0735 loss_at_student: 3.0735 max mem: 7597 Epoch: 2 [104592 / 111704] loss: 3.1850 loss_at_student: 3.1850 max mem: 7597 Epoch: 2 [105092 / 111704] loss: 3.0212 loss_at_student: 3.0212 max mem: 7597 Epoch: 2 [105592 / 111704] loss: 3.0313 loss_at_student: 3.0313 max mem: 7597 Epoch: 2 [106092 / 111704] loss: 3.0088 loss_at_student: 3.0088 max mem: 7597 Epoch: 2 [106592 / 111704] loss: 3.3199 loss_at_student: 3.3199 max mem: 7597 Epoch: 2 [107092 / 111704] loss: 2.8394 loss_at_student: 2.8394 max mem: 7597 Epoch: 2 [107592 / 111704] loss: 3.1127 loss_at_student: 3.1127 max mem: 7597 Epoch: 2 [108092 / 111704] loss: 3.1149 loss_at_student: 3.1149 max mem: 7597 Epoch: 2 [108592 / 111704] loss: 3.1553 loss_at_student: 3.1553 max mem: 7597 Epoch: 2 [109092 / 111704] loss: 3.0681 loss_at_student: 
3.0681 max mem: 7597 Epoch: 2 [109592 / 111704] loss: 2.8928 loss_at_student: 2.8928 max mem: 7597 Epoch: 2 [110092 / 111704] loss: 2.5687 loss_at_student: 2.5687 max mem: 7597 Epoch: 2 [110592 / 111704] loss: 3.2357 loss_at_student: 3.2357 max mem: 7597 Epoch: 2 [111092 / 111704] loss: 3.1596 loss_at_student: 3.1596 max mem: 7597 Epoch: 2 [111592 / 111704] loss: 2.8145 loss_at_student: 2.8145 max mem: 7597
Averaged stats: loss: 3.1486 loss_at_student: 3.1486
Train epoch time: 3:58:32
Train time: 11:55:48