File size: 205,857 Bytes
[2024-06-21 17:33:50,150][fairseq_cli.train][INFO] - {'_name': None, 'common': {'_name': None, 'no_progress_bar': False, 'log_interval': 200, 'log_format': 'json', 'log_file': None, 'tensorboard_logdir': 'tblog', 'wandb_project': 'AVSP-LLM', 'azureml_logging': False, 'seed': 1337, 'cpu': False, 'tpu': False, 'bf16': False, 'memory_efficient_bf16': False, 'fp16': True, 'memory_efficient_fp16': False, 'fp16_no_flatten_grads': False, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'on_cpu_convert_precision': False, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'amp': False, 'amp_batch_retries': 2, 'amp_init_scale': 128, 'amp_scale_window': None, 'user_dir': '/home/theodore/Projects/VSP-LLM/src', 'empty_cache_freq': 0, 'all_gather_list_size': 16384, 'model_parallel_size': 1, 'quantization_config_path': None, 'profile': False, 'reset_logging': False, 'suppress_crashes': False, 'use_plasma_view': False, 'plasma_path': '/tmp/plasma'}, 'common_eval': {'_name': None, 'path': None, 'post_process': None, 'quiet': False, 'model_overrides': '{}', 'results_path': None}, 'distributed_training': {'_name': None, 'distributed_world_size': 1, 'distributed_num_procs': 1, 'distributed_rank': 0, 'distributed_backend': 'nccl', 'distributed_init_method': None, 'distributed_port': -1, 'device_id': 0, 'distributed_no_spawn': False, 'ddp_backend': 'no_c10d', 'ddp_comm_hook': 'none', 'bucket_cap_mb': 25, 'fix_batches_to_gpus': False, 'find_unused_parameters': True, 'fast_stat_sync': False, 'heartbeat_timeout': -1, 'broadcast_buffers': False, 'slowmo_momentum': None, 'slowmo_algorithm': 'LocalSGD', 'localsgd_frequency': 3, 'nprocs_per_node': 1, 'pipeline_model_parallel': False, 'pipeline_balance': None, 'pipeline_devices': None, 'pipeline_chunks': 0, 'pipeline_encoder_balance': None, 'pipeline_encoder_devices': None, 'pipeline_decoder_balance': None, 'pipeline_decoder_devices': None, 'pipeline_checkpoint': 'never', 'zero_sharding': 'none', 'fp16': 
True, 'memory_efficient_fp16': False, 'tpu': False, 'no_reshard_after_forward': False, 'fp32_reduce_scatter': False, 'cpu_offload': False, 'use_sharded_state': False}, 'dataset': {'_name': None, 'num_workers': 0, 'skip_invalid_size_inputs_valid_test': False, 'max_tokens': None, 'batch_size': 1, 'required_batch_size_multiple': 8, 'required_seq_len_multiple': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'train_subset': 'train', 'valid_subset': 'valid', 'combine_valid_subsets': None, 'ignore_unused_valid_subsets': False, 'validate_interval': 1, 'validate_interval_updates': 0, 'validate_after_updates': 0, 'fixed_validation_seed': None, 'disable_validation': False, 'max_tokens_valid': None, 'batch_size_valid': 1, 'max_valid_steps': None, 'curriculum': 0, 'gen_subset': 'test', 'num_shards': 1, 'shard_id': 0}, 'optimization': {'_name': None, 'max_epoch': 0, 'max_update': 30000, 'stop_time_hours': 0.0, 'clip_norm': 0.0, 'sentence_avg': True, 'update_freq': [8], 'lr': [0.0005], 'stop_min_lr': -1.0, 'use_bmuf': False}, 'checkpoint': {'_name': None, 'save_dir': 'checkpoints', 'restore_file': 'checkpoint_last.pt', 'finetune_from_model': None, 'reset_dataloader': False, 'reset_lr_scheduler': False, 'reset_meters': False, 'reset_optimizer': False, 'optimizer_overrides': '{}', 'save_interval': 1, 'save_interval_updates': 2500, 'keep_interval_updates': 1, 'keep_interval_updates_pattern': -1, 'keep_last_epochs': -1, 'keep_best_checkpoints': -1, 'no_save': False, 'no_epoch_checkpoints': True, 'no_last_checkpoints': False, 'no_save_optimizer_state': False, 'best_checkpoint_metric': 'accuracy', 'maximize_best_checkpoint_metric': True, 'patience': -1, 'checkpoint_suffix': '', 'checkpoint_shard_count': 1, 'load_checkpoint_on_all_dp_ranks': False, 'write_checkpoints_asynchronously': False, 'model_parallel_size': 1}, 'bmuf': {'_name': None, 'block_lr': 1.0, 'block_momentum': 0.875, 'global_sync_iter': 50, 'warmup_iterations': 500, 'use_nbm': False, 'average_sync': False, 
'distributed_world_size': 1}, 'generation': {'_name': None, 'beam': 5, 'nbest': 1, 'max_len_a': 0.0, 'max_len_b': 200, 'min_len': 1, 'match_source_len': False, 'unnormalized': False, 'no_early_stop': False, 'no_beamable_mm': False, 'lenpen': 1.0, 'unkpen': 0.0, 'replace_unk': None, 'sacrebleu': False, 'score_reference': False, 'prefix_size': 0, 'no_repeat_ngram_size': 0, 'sampling': False, 'sampling_topk': -1, 'sampling_topp': -1.0, 'constraints': None, 'temperature': 1.0, 'diverse_beam_groups': -1, 'diverse_beam_strength': 0.5, 'diversity_rate': -1.0, 'print_alignment': None, 'print_step': False, 'lm_path': None, 'lm_weight': 0.0, 'iter_decode_eos_penalty': 0.0, 'iter_decode_max_iter': 10, 'iter_decode_force_max_iter': False, 'iter_decode_with_beam': 1, 'iter_decode_with_external_reranker': False, 'retain_iter_history': False, 'retain_dropout': False, 'retain_dropout_modules': None, 'decoding_format': None, 'no_seed_provided': False}, 'eval_lm': {'_name': None, 'output_word_probs': False, 'output_word_stats': False, 'context_window': 0, 'softmax_batch': 9223372036854775807}, 'interactive': {'_name': None, 'buffer_size': 0, 'input': '-'}, 'model': {'_name': 'vsp_llm', 'w2v_path': '/home/theodore/Projects/VSP-LLM/checkpoints/large_vox_iter5.pt', 'llm_ckpt_path': 'vilm/vinallama-2.7b', 'apply_mask': False, 'mask_selection': 'static', 'mask_length': 10, 'mask_other': 0, 'mask_prob': 0.75, 'mask_channel_selection': 'static', 'mask_channel_length': 64, 'mask_channel_other': 0, 'mask_channel_prob': 0.5, 'layerdrop': 0.1, 'dropout': 0.0, 'activation_dropout': 0.1, 'attention_dropout': 0.0, 'feature_grad_mult': 1.0, 'encoder_embed_dim': 1024, 'decoder_embed_dim': 4096, 'freeze_finetune_updates': 18000}, 'task': {'_name': 'vsp_llm_training', 'is_s2s': True, 'data': '/home/theodore/Projects/VSP-LLM/data/processed/vasr/100h/1_2_2', 'label_dir': '/home/theodore/Projects/VSP-LLM/data/processed/vasr/100h/1_2_2', 'normalize': True, 'labels': ['wrd'], 'single_target': True, 
'fine_tuning': True, 'stack_order_audio': 4, 'max_sample_size': 500, 'modalities': ['video', 'audio'], 'image_aug': True, 'pad_audio': True, 'random_crop': False, 'llm_ckpt_path': 'vilm/vinallama-2.7b'}, 'criterion': {'_name': 'decoder_only_language_modeling_loss', 'report_accuracy': True, 'label_smoothing': 0.1}, 'optimizer': {'_name': 'adam', 'adam_betas': '(0.9,0.98)', 'adam_eps': 1e-08, 'weight_decay': 0.0, 'use_old_adam': False, 'tpu': False, 'lr': [0.0005]}, 'lr_scheduler': {'_name': 'tri_stage', 'warmup_steps': 10000, 'hold_steps': 0, 'decay_steps': 20000, 'phase_ratio': None, 'init_lr_scale': 0.01, 'final_lr_scale': 0.05, 'max_update': 30000, 'lr': [0.0005]}, 'scoring': None, 'bpe': None, 'tokenizer': None, 'job_logging_cfg': {'version': 1, 'formatters': {'simple': {'format': '[%(asctime)s][%(name)s][%(levelname)s] - %(message)s'}}, 'handlers': {'console': {'class': 'logging.StreamHandler', 'formatter': 'simple', 'stream': 'ext://sys.stdout'}, 'file': {'class': 'logging.FileHandler', 'formatter': 'simple', 'filename': 'hydra_train.log'}}, 'root': {'level': 'INFO', 'handlers': ['console', 'file']}, 'disable_existing_loggers': False}}
[2024-06-21 17:33:50,153][src.vsp_llm_training][INFO] - current directory is /home/theodore/Projects/VSP-LLM/experiments/ViAVSP-LLM_v1.2.2
[2024-06-21 17:33:50,153][src.vsp_llm_training][INFO] - AVHubertPretrainingTask Config {'_name': 'vsp_llm_training', 'data': '/home/theodore/Projects/VSP-LLM/data/processed/vasr/100h/1_2_2', 'labels': ['wrd'], 'label_dir': '/home/theodore/Projects/VSP-LLM/data/processed/vasr/100h/1_2_2', 'label_rate': -1, 'sample_rate': 16000, 'llm_ckpt_path': 'vilm/vinallama-2.7b', 'normalize': True, 'enable_padding': False, 'max_sample_size': 500, 'min_sample_size': None, 'max_trim_sample_size': '${task.max_sample_size}', 'single_target': True, 'random_crop': False, 'pad_audio': True, 'pdb': False, 'stack_order_audio': 4, 'skip_verify': False, 'image_aug': True, 'image_crop_size': 88, 'image_mean': 0.421, 'image_std': 0.165, 'modalities': ['video', 'audio'], 'is_s2s': True, 'tokenizer_bpe_name': None, 'tokenizer_bpe_model': None, 'noise_wav': None, 'noise_prob': 0.0, 'noise_snr': '0', 'noise_num': 1, 'fine_tuning': True}
[2024-06-21 17:33:52,075][src.hubert_pretraining][INFO] - current directory is /home/theodore/Projects/VSP-LLM/experiments/ViAVSP-LLM_v1.2.2
[2024-06-21 17:33:52,075][src.hubert_pretraining][INFO] - AVHubertPretrainingTask Config {'_name': 'av_hubert_pretraining', 'data': '/home/theodore/Projects/VSP-LLM/data/processed/vasr/100h/1_2_2', 'labels': ['km'], 'label_dir': '/checkpoint/bshi/data/lrs3//video/hubert/stitch-iters/envox-iter4-l12c2000/', 'label_rate': 25, 'sample_rate': 25, 'normalize': True, 'enable_padding': False, 'max_sample_size': 2000, 'min_sample_size': 5, 'max_trim_sample_size': 400, 'single_target': False, 'random_crop': True, 'pad_audio': False, 'pdb': False, 'stack_order_audio': 4, 'skip_verify': False, 'image_aug': True, 'image_crop_size': 88, 'image_mean': 0.421, 'image_std': 0.165, 'modalities': ['audio', 'video'], 'is_s2s': False, 'tokenizer_bpe_name': None, 'tokenizer_bpe_model': None, 'noise_wav': None, 'noise_prob': 0.0, 'noise_snr': '0', 'noise_num': 1, 'fine_tuning': False}
[2024-06-21 17:33:52,079][src.hubert][INFO] - HubertModel Config: {'_name': 'av_hubert', 'label_rate': 25, 'input_modality': '${task.input_modality}', 'extractor_mode': default, 'encoder_layers': 24, 'encoder_embed_dim': 1024, 'encoder_ffn_embed_dim': 4096, 'encoder_attention_heads': 16, 'activation_fn': gelu, 'dropout': 0.0, 'attention_dropout': 0.0, 'activation_dropout': 0.1, 'encoder_layerdrop': 0.1, 'dropout_input': 0.0, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': True, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 1.0, 'mask_length_audio': 10, 'mask_prob_audio': 0.8, 'mask_length_image': 5, 'mask_prob_image': 0.3, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'resnet_relu_type': 'prelu', 'resnet_weights': None, 'sim_type': 'cosine', 'sub_encoder_layers': 0, 'audio_feat_dim': 104, 'modality_dropout': 0.5, 'audio_dropout': 0.5, 'modality_fuse': 'concat', 'selection_type': 'same_seq', 'masking_type': 'input', 'decoder_embed_dim': 768, 'decoder_ffn_embed_dim': 3072, 'decoder_layers': 6, 'decoder_layerdrop': 0.0, 'decoder_attention_heads': 4, 'decoder_learned_pos': False, 'decoder_normalize_before': False, 'no_token_positional_embeddings': False, 'decoder_dropout': 0.1, 'decoder_attention_dropout': 0.1, 'decoder_activation_dropout': 0.0, 'max_target_positions': 2048, 'share_decoder_input_output_embed': False, 'no_scale_embedding': True}
[2024-06-21 17:33:58,960][fairseq_cli.train][INFO] - avhubert_llm_seq2seq_cluster_count(
  (encoder): HubertEncoderWrapper(
    (w2v_model): AVHubertModel(
      (feature_extractor_audio): SubModel(
        (proj): Linear(in_features=104, out_features=1024, bias=True)
      )
      (feature_extractor_video): SubModel(
        (resnet): ResEncoder(
          (frontend3D): Sequential(
            (0): Conv3d(1, 64, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3), bias=False)
            (1): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): PReLU(num_parameters=64)
            (3): MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1), dilation=1, ceil_mode=False)
          )
          (trunk): ResNet(
            (layer1): Sequential(
              (0): BasicBlock(
                (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (relu1): PReLU(num_parameters=64)
                (relu2): PReLU(num_parameters=64)
                (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              )
              (1): BasicBlock(
                (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (relu1): PReLU(num_parameters=64)
                (relu2): PReLU(num_parameters=64)
                (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              )
            )
            (layer2): Sequential(
              (0): BasicBlock(
                (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
                (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (relu1): PReLU(num_parameters=128)
                (relu2): PReLU(num_parameters=128)
                (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (downsample): Sequential(
                  (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
                  (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
              )
              (1): BasicBlock(
                (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (relu1): PReLU(num_parameters=128)
                (relu2): PReLU(num_parameters=128)
                (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              )
            )
            (layer3): Sequential(
              (0): BasicBlock(
                (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
                (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (relu1): PReLU(num_parameters=256)
                (relu2): PReLU(num_parameters=256)
                (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (downsample): Sequential(
                  (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
                  (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
              )
              (1): BasicBlock(
                (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (relu1): PReLU(num_parameters=256)
                (relu2): PReLU(num_parameters=256)
                (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              )
            )
            (layer4): Sequential(
              (0): BasicBlock(
                (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
                (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (relu1): PReLU(num_parameters=512)
                (relu2): PReLU(num_parameters=512)
                (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (downsample): Sequential(
                  (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
                  (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
              )
              (1): BasicBlock(
                (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (relu1): PReLU(num_parameters=512)
                (relu2): PReLU(num_parameters=512)
                (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              )
            )
            (avgpool): AdaptiveAvgPool2d(output_size=1)
          )
        )
        (proj): Linear(in_features=512, out_features=1024, bias=True)
      )
      (post_extract_proj): Linear(in_features=2048, out_features=1024, bias=True)
      (dropout_input): Dropout(p=0.0, inplace=False)
      (dropout_features): Dropout(p=0.1, inplace=False)
      (encoder): TransformerEncoder(
        (pos_conv): Sequential(
          (0): Conv1d(1024, 1024, kernel_size=(128,), stride=(1,), padding=(64,), groups=16)
          (1): SamePad()
          (2): GELU(approximate='none')
        )
        (layers): ModuleList(
          (0-23): 24 x TransformerSentenceEncoderLayer(
            (self_attn): MultiheadAttention(
              (dropout_module): FairseqDropout()
              (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
              (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
              (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
              (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
            )
            (dropout1): Dropout(p=0.0, inplace=False)
            (dropout2): Dropout(p=0.1, inplace=False)
            (dropout3): Dropout(p=0.0, inplace=False)
            (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
            (fc1): Linear(in_features=1024, out_features=4096, bias=True)
            (fc2): Linear(in_features=4096, out_features=1024, bias=True)
            (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          )
        )
        (layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (layer_norm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
      (final_proj): None
    )
  )
  (decoder): PeftModelForCausalLM(
    (base_model): LoraModel(
      (model): LlamaForCausalLM(
        (model): LlamaModel(
          (embed_tokens): Embedding(46304, 2560, padding_idx=0)
          (layers): ModuleList(
            (0-31): 32 x LlamaDecoderLayer(
              (self_attn): LlamaSdpaAttention(
                (q_proj): lora.Linear4bit(
                  (base_layer): Linear4bit(in_features=2560, out_features=2560, bias=False)
                  (lora_dropout): ModuleDict(
                    (default): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (default): Linear(in_features=2560, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (default): Linear(in_features=16, out_features=2560, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                )
                (k_proj): lora.Linear4bit(
                  (base_layer): Linear4bit(in_features=2560, out_features=2560, bias=False)
                  (lora_dropout): ModuleDict(
                    (default): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (default): Linear(in_features=2560, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (default): Linear(in_features=16, out_features=2560, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                )
                (v_proj): lora.Linear4bit(
                  (base_layer): Linear4bit(in_features=2560, out_features=2560, bias=False)
                  (lora_dropout): ModuleDict(
                    (default): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (default): Linear(in_features=2560, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (default): Linear(in_features=16, out_features=2560, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                )
                (o_proj): Linear4bit(in_features=2560, out_features=2560, bias=False)
                (rotary_emb): LlamaRotaryEmbedding()
              )
              (mlp): LlamaMLP(
                (gate_proj): Linear4bit(in_features=2560, out_features=6912, bias=False)
                (up_proj): Linear4bit(in_features=2560, out_features=6912, bias=False)
                (down_proj): Linear4bit(in_features=6912, out_features=2560, bias=False)
                (act_fn): SiLU()
              )
              (input_layernorm): LlamaRMSNorm()
              (post_attention_layernorm): LlamaRMSNorm()
            )
          )
          (norm): LlamaRMSNorm()
        )
        (lm_head): Linear(in_features=2560, out_features=46304, bias=False)
      )
    )
  )
  (avfeat_to_llm): Linear(in_features=1024, out_features=2560, bias=True)
)
[2024-06-21 17:33:58,966][fairseq_cli.train][INFO] - task: VSP_LLM_TrainingTask
[2024-06-21 17:33:58,966][fairseq_cli.train][INFO] - model: avhubert_llm_seq2seq_cluster_count
[2024-06-21 17:33:58,966][fairseq_cli.train][INFO] - criterion: decoder_only_language_modeling_loss
[2024-06-21 17:33:58,969][fairseq_cli.train][INFO] - num. shared model params: 1,841,644,264 (num. trained: 335,624,424)
[2024-06-21 17:33:58,971][fairseq_cli.train][INFO] - num. expert model params: 0 (num. trained: 0)
[2024-06-21 17:33:58,972][src.vsp_llm_training][INFO] - Using tokenizer
[2024-06-21 17:33:59,010][src.vsp_llm_dataset][INFO] - max_keep=500, min_keep=None, loaded 23990, skipped 0 short and 0 long and 0 unaligned, longest-loaded=76, shortest-loaded=76
[2024-06-21 17:33:59,347][src.vsp_llm_dataset][INFO] - /home/theodore/Projects/VSP-LLM/data/processed/vasr/100h/1_2_2/valid.wrd is sequence label. skipped
[2024-06-21 17:33:59,347][src.vsp_llm_dataset][INFO] - image transform: Compose(
    Normalize(mean=0.0, std=255.0)
    <src.utils_vsp_llm.CenterCrop object at 0x795ad4ac3190>
    Normalize(mean=0.421, std=0.165)
)
[2024-06-21 17:33:59,347][src.vsp_llm_dataset][INFO] - pad_audio=True, random_crop=False, normalize=True, max_sample_size=500, seqs2seq data=True,
[2024-06-21 17:33:59,347][src.vsp_llm_dataset][INFO] - Noise wav: None->0 wav, Prob: 0.0, SNR: 0, Number of mixture: 1
[2024-06-21 17:33:59,510][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- encoder.w2v_model.feature_extractor_video.resnet.trunk.layer1.0.conv1.bias
[2024-06-21 17:33:59,510][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- encoder.w2v_model.feature_extractor_video.resnet.trunk.layer1.0.conv2.bias
[2024-06-21 17:33:59,510][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- encoder.w2v_model.feature_extractor_video.resnet.trunk.layer1.1.conv1.bias
[2024-06-21 17:33:59,510][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- encoder.w2v_model.feature_extractor_video.resnet.trunk.layer1.1.conv2.bias
[2024-06-21 17:33:59,510][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- encoder.w2v_model.feature_extractor_video.resnet.trunk.layer2.0.conv1.bias
[2024-06-21 17:33:59,510][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- encoder.w2v_model.feature_extractor_video.resnet.trunk.layer2.0.conv2.bias
[2024-06-21 17:33:59,510][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- encoder.w2v_model.feature_extractor_video.resnet.trunk.layer2.0.downsample.0.bias
[2024-06-21 17:33:59,510][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- encoder.w2v_model.feature_extractor_video.resnet.trunk.layer2.1.conv1.bias
[2024-06-21 17:33:59,510][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- encoder.w2v_model.feature_extractor_video.resnet.trunk.layer2.1.conv2.bias
[2024-06-21 17:33:59,510][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- encoder.w2v_model.feature_extractor_video.resnet.trunk.layer3.0.conv1.bias
[2024-06-21 17:33:59,510][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- encoder.w2v_model.feature_extractor_video.resnet.trunk.layer3.0.conv2.bias
[2024-06-21 17:33:59,510][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- encoder.w2v_model.feature_extractor_video.resnet.trunk.layer3.0.downsample.0.bias
[2024-06-21 17:33:59,510][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- encoder.w2v_model.feature_extractor_video.resnet.trunk.layer3.1.conv1.bias
[2024-06-21 17:33:59,510][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- encoder.w2v_model.feature_extractor_video.resnet.trunk.layer3.1.conv2.bias
[2024-06-21 17:33:59,510][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- encoder.w2v_model.feature_extractor_video.resnet.trunk.layer4.0.conv1.bias
[2024-06-21 17:33:59,510][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- encoder.w2v_model.feature_extractor_video.resnet.trunk.layer4.0.conv2.bias
[2024-06-21 17:33:59,510][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- encoder.w2v_model.feature_extractor_video.resnet.trunk.layer4.0.downsample.0.bias
[2024-06-21 17:33:59,510][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- encoder.w2v_model.feature_extractor_video.resnet.trunk.layer4.1.conv1.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- encoder.w2v_model.feature_extractor_video.resnet.trunk.layer4.1.conv2.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.0.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.0.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.0.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.0.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.0.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.0.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.0.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.0.self_attn.o_proj.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.0.mlp.gate_proj.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.0.mlp.up_proj.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.0.mlp.down_proj.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.1.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.1.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.1.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.1.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.1.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.1.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.1.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.1.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.1.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.1.self_attn.o_proj.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.1.mlp.gate_proj.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.1.mlp.up_proj.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.1.mlp.down_proj.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.2.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.2.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.2.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.2.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.2.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.2.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.2.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.2.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.2.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.2.self_attn.o_proj.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.2.mlp.gate_proj.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.2.mlp.up_proj.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.2.mlp.down_proj.bias
[2024-06-21 17:33:59,511][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.3.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.3.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.3.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.3.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.3.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.3.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.3.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.3.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.3.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.3.self_attn.o_proj.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.3.mlp.gate_proj.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.3.mlp.up_proj.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.3.mlp.down_proj.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.4.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.4.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.4.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.4.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.4.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.4.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.4.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.4.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.4.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.4.self_attn.o_proj.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.4.mlp.gate_proj.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.4.mlp.up_proj.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.4.mlp.down_proj.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.5.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.5.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.5.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.5.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.5.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.5.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.5.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.5.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.5.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.5.self_attn.o_proj.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.5.mlp.gate_proj.bias
[2024-06-21 17:33:59,512][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.5.mlp.up_proj.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.5.mlp.down_proj.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.6.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.6.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.6.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.6.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.6.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.6.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.6.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.6.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.6.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.6.self_attn.o_proj.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.6.mlp.gate_proj.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.6.mlp.up_proj.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.6.mlp.down_proj.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.7.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.7.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.7.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.7.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.7.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.7.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.7.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.7.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.7.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.7.self_attn.o_proj.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.7.mlp.gate_proj.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.7.mlp.up_proj.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.7.mlp.down_proj.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.8.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.8.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.8.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.8.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.8.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.8.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.8.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.8.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.8.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.8.self_attn.o_proj.bias
[2024-06-21 17:33:59,513][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.8.mlp.gate_proj.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.8.mlp.up_proj.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.8.mlp.down_proj.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.9.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.9.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.9.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.9.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.9.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.9.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.9.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.9.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.9.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.9.self_attn.o_proj.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.9.mlp.gate_proj.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.9.mlp.up_proj.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.9.mlp.down_proj.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.10.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.10.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.10.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.10.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.10.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.10.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.10.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.10.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.10.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.10.self_attn.o_proj.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.10.mlp.gate_proj.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.10.mlp.up_proj.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.10.mlp.down_proj.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.11.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.11.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.11.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.11.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.11.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.11.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.11.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.11.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.11.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,514][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.11.self_attn.o_proj.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.11.mlp.gate_proj.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.11.mlp.up_proj.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.11.mlp.down_proj.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.12.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.12.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.12.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.12.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.12.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.12.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.12.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.12.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.12.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.12.self_attn.o_proj.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.12.mlp.gate_proj.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.12.mlp.up_proj.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.12.mlp.down_proj.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.13.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.13.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.13.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.13.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.13.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.13.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.13.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.13.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.13.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.13.self_attn.o_proj.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.13.mlp.gate_proj.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.13.mlp.up_proj.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.13.mlp.down_proj.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.14.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.14.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.14.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.14.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.14.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.14.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.14.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,515][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.14.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.14.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.14.self_attn.o_proj.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.14.mlp.gate_proj.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.14.mlp.up_proj.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.14.mlp.down_proj.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.15.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.15.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.15.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.15.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.15.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.15.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.15.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.15.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.15.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.15.self_attn.o_proj.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.15.mlp.gate_proj.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.15.mlp.up_proj.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.15.mlp.down_proj.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.16.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.16.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.16.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.16.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.16.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.16.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.16.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.16.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.16.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.16.self_attn.o_proj.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.16.mlp.gate_proj.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.16.mlp.up_proj.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.16.mlp.down_proj.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.17.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.17.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.17.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.17.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.17.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.17.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.17.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.17.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.17.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.17.self_attn.o_proj.bias
[2024-06-21 17:33:59,516][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.17.mlp.gate_proj.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.17.mlp.up_proj.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.17.mlp.down_proj.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.18.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.18.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.18.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.18.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.18.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.18.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.18.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.18.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.18.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.18.self_attn.o_proj.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.18.mlp.gate_proj.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.18.mlp.up_proj.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.18.mlp.down_proj.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.19.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.19.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.19.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.19.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.19.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.19.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.19.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.19.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.19.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.19.self_attn.o_proj.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.19.mlp.gate_proj.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.19.mlp.up_proj.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.19.mlp.down_proj.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.20.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.20.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.20.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.20.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.20.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.20.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.20.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.20.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.20.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.20.self_attn.o_proj.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.20.mlp.gate_proj.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.20.mlp.up_proj.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.20.mlp.down_proj.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.21.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,517][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.21.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.21.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.21.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.21.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.21.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.21.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.21.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.21.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.21.self_attn.o_proj.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.21.mlp.gate_proj.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.21.mlp.up_proj.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.21.mlp.down_proj.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.22.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.22.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.22.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.22.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.22.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.22.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.22.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.22.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.22.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.22.self_attn.o_proj.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.22.mlp.gate_proj.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.22.mlp.up_proj.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.22.mlp.down_proj.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.23.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.23.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.23.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.23.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.23.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.23.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.23.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.23.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.23.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.23.self_attn.o_proj.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.23.mlp.gate_proj.bias
[2024-06-21 17:33:59,518][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.23.mlp.up_proj.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.23.mlp.down_proj.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.24.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.24.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.24.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.24.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.24.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.24.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.24.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.24.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.24.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.24.self_attn.o_proj.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.24.mlp.gate_proj.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.24.mlp.up_proj.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.24.mlp.down_proj.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.25.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.25.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.25.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.25.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.25.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.25.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.25.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.25.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.25.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.25.self_attn.o_proj.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.25.mlp.gate_proj.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.25.mlp.up_proj.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.25.mlp.down_proj.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.26.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.26.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.26.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.26.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.26.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.26.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.26.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.26.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.26.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.26.self_attn.o_proj.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.26.mlp.gate_proj.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.26.mlp.up_proj.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.26.mlp.down_proj.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.27.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.27.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.27.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.27.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,519][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.27.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.27.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.27.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.27.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.27.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.27.self_attn.o_proj.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.27.mlp.gate_proj.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.27.mlp.up_proj.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.27.mlp.down_proj.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.28.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.28.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.28.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.28.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.28.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.28.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.28.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.28.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.28.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.28.self_attn.o_proj.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.28.mlp.gate_proj.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.28.mlp.up_proj.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.28.mlp.down_proj.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.29.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.29.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.29.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.29.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.29.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.29.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.29.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.29.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.29.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.29.self_attn.o_proj.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.29.mlp.gate_proj.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.29.mlp.up_proj.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.29.mlp.down_proj.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.30.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.30.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.30.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.30.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.30.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.30.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.30.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.30.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.30.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.30.self_attn.o_proj.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.30.mlp.gate_proj.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.30.mlp.up_proj.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.30.mlp.down_proj.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.31.self_attn.q_proj.base_layer.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.31.self_attn.q_proj.lora_A.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.31.self_attn.q_proj.lora_B.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.31.self_attn.k_proj.base_layer.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.31.self_attn.k_proj.lora_A.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.31.self_attn.k_proj.lora_B.default.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.31.self_attn.v_proj.base_layer.bias
[2024-06-21 17:33:59,520][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.31.self_attn.v_proj.lora_A.default.bias
[2024-06-21 17:33:59,521][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.31.self_attn.v_proj.lora_B.default.bias
[2024-06-21 17:33:59,521][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.31.self_attn.o_proj.bias
[2024-06-21 17:33:59,521][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.31.mlp.gate_proj.bias
[2024-06-21 17:33:59,521][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.31.mlp.up_proj.bias
[2024-06-21 17:33:59,521][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.model.layers.31.mlp.down_proj.bias
[2024-06-21 17:33:59,521][fairseq.trainer][INFO] - detected shared parameter: encoder.w2v_model.feature_extractor_video.resnet.frontend3D.0.bias <- decoder.base_model.model.lm_head.bias
[2024-06-21 17:33:59,521][fairseq.utils][INFO] - ***********************CUDA enviroments for all 1 workers***********************
[2024-06-21 17:33:59,521][fairseq.utils][INFO] - rank   0: capabilities =  8.6  ; total memory = 15.729 GB ; name = NVIDIA RTX A4000
[2024-06-21 17:33:59,521][fairseq.utils][INFO] - ***********************CUDA enviroments for all 1 workers***********************
[2024-06-21 17:33:59,521][fairseq_cli.train][INFO] - training on 1 devices (GPUs/TPUs)
[2024-06-21 17:33:59,521][fairseq_cli.train][INFO] - max tokens per device = None and max sentences per device = 1
[2024-06-21 17:33:59,521][fairseq.trainer][INFO] - Preparing to load checkpoint checkpoints/checkpoint_last.pt
[2024-06-21 17:33:59,521][fairseq.trainer][INFO] - No existing checkpoint found checkpoints/checkpoint_last.pt
[2024-06-21 17:33:59,522][fairseq.trainer][INFO] - loading train data for epoch 1
[2024-06-21 17:33:59,522][src.vsp_llm_training][INFO] - Using tokenizer
[2024-06-21 17:33:59,690][src.vsp_llm_dataset][INFO] - max_keep=500, min_keep=None, loaded 120686, skipped 0 short and 0 long and 0 unaligned, longest-loaded=76, shortest-loaded=73
[2024-06-21 17:34:00,047][src.vsp_llm_dataset][INFO] - /home/theodore/Projects/VSP-LLM/data/processed/vasr/100h/1_2_2/train.wrd is sequence label. skipped
[2024-06-21 17:34:00,048][src.vsp_llm_dataset][INFO] - image transform: Compose(
    Normalize(mean=0.0, std=255.0)
    RandomCrop(size=(88, 88))
    <src.utils_vsp_llm.HorizontalFlip object at 0x795ad9026e80>
    Normalize(mean=0.421, std=0.165)
)
[2024-06-21 17:34:00,048][src.vsp_llm_dataset][INFO] - pad_audio=True, random_crop=False, normalize=True, max_sample_size=500, seqs2seq data=True,
[2024-06-21 17:34:00,048][src.vsp_llm_dataset][INFO] - Noise wav: None->0 wav, Prob: 0.0, SNR: 0, Number of mixture: 1
[2024-06-21 17:34:04,336][fairseq.trainer][INFO] - begin training epoch 1
[2024-06-21 17:34:04,336][fairseq_cli.train][INFO] - Start iterating over samples
[2024-06-21 17:39:31,483][train_inner][INFO] - {"epoch": 1, "update": 0.013, "loss": "7.619", "ntokens": "126.725", "acc_total": "126.725", "n_correct": "18.215", "wer_total": "126.725", "n_error": "108.44", "ppl": "196.55", "accuracy": "14.374", "wer": "85.571", "wps": "77.6", "ups": "0.61", "wpb": "126.7", "bsz": "8", "num_updates": "200", "lr": "1.49e-05", "gnorm": "8.744", "loss_scale": "128", "train_wall": "326", "gb_free": "7.1", "wall": "332"}
[2024-06-21 17:45:00,052][train_inner][INFO] - {"epoch": 1, "update": 0.027, "loss": "6.195", "ntokens": "126.93", "acc_total": "126.93", "n_correct": "25.71", "wer_total": "126.93", "n_error": "101.035", "ppl": "73.28", "accuracy": "20.255", "wer": "79.599", "wps": "77.3", "ups": "0.61", "wpb": "126.9", "bsz": "8", "num_updates": "400", "lr": "2.48e-05", "gnorm": "3.706", "loss_scale": "128", "train_wall": "328", "gb_free": "7.1", "wall": "661"}
[2024-06-21 17:50:28,684][train_inner][INFO] - {"epoch": 1, "update": 0.04, "loss": "6.075", "ntokens": "127.015", "acc_total": "127.015", "n_correct": "28.715", "wer_total": "127.015", "n_error": "98.02", "ppl": "67.41", "accuracy": "22.608", "wer": "77.172", "wps": "77.3", "ups": "0.61", "wpb": "127", "bsz": "8", "num_updates": "600", "lr": "3.47e-05", "gnorm": "3.963", "loss_scale": "128", "train_wall": "328", "gb_free": "7.1", "wall": "989"}
[2024-06-21 17:55:57,374][train_inner][INFO] - {"epoch": 1, "update": 0.053, "loss": "5.868", "ntokens": "126.865", "acc_total": "126.865", "n_correct": "30.73", "wer_total": "126.865", "n_error": "95.905", "ppl": "58.4", "accuracy": "24.223", "wer": "75.596", "wps": "77.2", "ups": "0.61", "wpb": "126.9", "bsz": "8", "num_updates": "800", "lr": "4.46e-05", "gnorm": "4.109", "loss_scale": "128", "train_wall": "328", "gb_free": "7.1", "wall": "1318"}
[2024-06-21 18:01:26,143][train_inner][INFO] - {"epoch": 1, "update": 0.066, "loss": "5.932", "ntokens": "127.025", "acc_total": "127.025", "n_correct": "30.575", "wer_total": "127.025", "n_error": "96.215", "ppl": "61.07", "accuracy": "24.07", "wer": "75.745", "wps": "77.3", "ups": "0.61", "wpb": "127", "bsz": "8", "num_updates": "1000", "lr": "5.45e-05", "gnorm": "3.776", "loss_scale": "128", "train_wall": "328", "gb_free": "7.1", "wall": "1647"}
[2024-06-21 18:06:54,791][train_inner][INFO] - {"epoch": 1, "update": 0.08, "loss": "5.883", "ntokens": "127.095", "acc_total": "127.095", "n_correct": "31", "wer_total": "127.095", "n_error": "95.88", "ppl": "59.01", "accuracy": "24.391", "wer": "75.44", "wps": "77.3", "ups": "0.61", "wpb": "127.1", "bsz": "8", "num_updates": "1200", "lr": "6.44e-05", "gnorm": "3.558", "loss_scale": "128", "train_wall": "328", "gb_free": "7.1", "wall": "1975"}
[2024-06-21 18:12:23,585][train_inner][INFO] - {"epoch": 1, "update": 0.093, "loss": "5.724", "ntokens": "127.62", "acc_total": "127.62", "n_correct": "32.185", "wer_total": "127.62", "n_error": "95.215", "ppl": "52.86", "accuracy": "25.219", "wer": "74.608", "wps": "77.6", "ups": "0.61", "wpb": "127.6", "bsz": "8", "num_updates": "1400", "lr": "7.43e-05", "gnorm": "3.412", "loss_scale": "128", "train_wall": "328", "gb_free": "7.1", "wall": "2304"}
[2024-06-21 18:17:52,372][train_inner][INFO] - {"epoch": 1, "update": 0.106, "loss": "5.738", "ntokens": "127.41", "acc_total": "127.41", "n_correct": "32.74", "wer_total": "127.41", "n_error": "94.395", "ppl": "53.35", "accuracy": "25.697", "wer": "74.088", "wps": "77.5", "ups": "0.61", "wpb": "127.4", "bsz": "8", "num_updates": "1600", "lr": "8.42e-05", "gnorm": "3.166", "loss_scale": "128", "train_wall": "328", "gb_free": "7.1", "wall": "2633"}
[2024-06-21 18:23:21,235][train_inner][INFO] - {"epoch": 1, "update": 0.119, "loss": "5.779", "ntokens": "126.56", "acc_total": "126.56", "n_correct": "32.125", "wer_total": "126.56", "n_error": "94.245", "ppl": "54.91", "accuracy": "25.383", "wer": "74.467", "wps": "77", "ups": "0.61", "wpb": "126.6", "bsz": "8", "num_updates": "1800", "lr": "9.41e-05", "gnorm": "2.952", "loss_scale": "128", "train_wall": "328", "gb_free": "7.1", "wall": "2962"}
[2024-06-21 18:28:49,852][train_inner][INFO] - {"epoch": 1, "update": 0.133, "loss": "5.681", "ntokens": "126.875", "acc_total": "126.875", "n_correct": "33.3", "wer_total": "126.875", "n_error": "93.35", "ppl": "51.3", "accuracy": "26.246", "wer": "73.576", "wps": "77.2", "ups": "0.61", "wpb": "126.9", "bsz": "8", "num_updates": "2000", "lr": "0.000104", "gnorm": "2.791", "loss_scale": "128", "train_wall": "328", "gb_free": "7.1", "wall": "3290"}
[2024-06-21 18:34:18,693][train_inner][INFO] - {"epoch": 1, "update": 0.146, "loss": "5.601", "ntokens": "128.2", "acc_total": "128.2", "n_correct": "34.815", "wer_total": "128.2", "n_error": "93.055", "ppl": "48.53", "accuracy": "27.157", "wer": "72.586", "wps": "78", "ups": "0.61", "wpb": "128.2", "bsz": "8", "num_updates": "2200", "lr": "0.0001139", "gnorm": "2.766", "loss_scale": "256", "train_wall": "328", "gb_free": "7.1", "wall": "3619"}
[2024-06-21 18:39:46,853][train_inner][INFO] - {"epoch": 1, "update": 0.159, "loss": "5.494", "ntokens": "127.775", "acc_total": "127.775", "n_correct": "37", "wer_total": "127.775", "n_error": "90.575", "ppl": "45.07", "accuracy": "28.957", "wer": "70.886", "wps": "77.9", "ups": "0.61", "wpb": "127.8", "bsz": "8", "num_updates": "2400", "lr": "0.0001238", "gnorm": "2.902", "loss_scale": "256", "train_wall": "327", "gb_free": "7.1", "wall": "3947"}
[2024-06-21 18:42:31,072][fairseq_cli.train][INFO] - begin validation on "valid" subset
[2024-06-21 19:24:21,320][valid][INFO] - {"epoch": 1, "valid_loss": "5.259", "valid_ntokens": "18.1585", "valid_acc_total": "18.1585", "valid_n_correct": "5.697", "valid_wer_total": "18.1585", "valid_n_error": "12.4353", "valid_ppl": "38.29", "valid_accuracy": "31.374", "valid_wer": "68.482", "valid_wps": "173.5", "valid_wpb": "18.2", "valid_bsz": "1", "valid_num_updates": "2500"}
[2024-06-21 19:24:21,321][fairseq.checkpoint_utils][INFO] - Preparing to save checkpoint for epoch 1 @ 2500 updates
[2024-06-21 19:24:21,321][fairseq.trainer][INFO] - Saving checkpoint to checkpoints/checkpoint_1_2500.pt
[2024-06-21 19:24:24,486][fairseq.trainer][INFO] - Finished saving checkpoint to checkpoints/checkpoint_1_2500.pt
[2024-06-21 19:24:27,413][fairseq.checkpoint_utils][INFO] - Saved checkpoint checkpoints/checkpoint_1_2500.pt (epoch 1 @ 2500 updates, score 31.374) (writing took 6.092055908986367 seconds)
[2024-06-21 19:27:11,286][train_inner][INFO] - {"epoch": 1, "update": 0.172, "loss": "5.317", "ntokens": "126.92", "acc_total": "126.92", "n_correct": "39.34", "wer_total": "126.92", "n_error": "87.34", "ppl": "39.85", "accuracy": "30.996", "wer": "68.815", "wps": "8.9", "ups": "0.07", "wpb": "126.9", "bsz": "8", "num_updates": "2600", "lr": "0.0001337", "gnorm": "3.239", "loss_scale": "256", "train_wall": "327", "gb_free": "7.1", "wall": "6792"}
[2024-06-21 19:32:39,717][train_inner][INFO] - {"epoch": 1, "update": 0.186, "loss": "5.183", "ntokens": "125.685", "acc_total": "125.685", "n_correct": "41.55", "wer_total": "125.685", "n_error": "83.845", "ppl": "36.32", "accuracy": "33.059", "wer": "66.71", "wps": "76.5", "ups": "0.61", "wpb": "125.7", "bsz": "8", "num_updates": "2800", "lr": "0.0001436", "gnorm": "3.663", "loss_scale": "256", "train_wall": "328", "gb_free": "7.1", "wall": "7120"}
[2024-06-21 19:38:07,898][train_inner][INFO] - {"epoch": 1, "update": 0.199, "loss": "4.937", "ntokens": "127.19", "acc_total": "127.19", "n_correct": "46.045", "wer_total": "127.19", "n_error": "80.88", "ppl": "30.64", "accuracy": "36.202", "wer": "63.59", "wps": "77.5", "ups": "0.61", "wpb": "127.2", "bsz": "8", "num_updates": "3000", "lr": "0.0001535", "gnorm": "3.979", "loss_scale": "256", "train_wall": "328", "gb_free": "7.1", "wall": "7448"}
[2024-06-21 19:43:36,311][train_inner][INFO] - {"epoch": 1, "update": 0.212, "loss": "4.651", "ntokens": "126.535", "acc_total": "126.535", "n_correct": "49.785", "wer_total": "126.535", "n_error": "76.52", "ppl": "25.13", "accuracy": "39.345", "wer": "60.473", "wps": "77.1", "ups": "0.61", "wpb": "126.5", "bsz": "8", "num_updates": "3200", "lr": "0.0001634", "gnorm": "4.149", "loss_scale": "256", "train_wall": "328", "gb_free": "7.1", "wall": "7777"}
[2024-06-21 19:49:04,728][train_inner][INFO] - {"epoch": 1, "update": 0.225, "loss": "4.484", "ntokens": "126.53", "acc_total": "126.53", "n_correct": "52.275", "wer_total": "126.53", "n_error": "74.08", "ppl": "22.38", "accuracy": "41.314", "wer": "58.547", "wps": "77.1", "ups": "0.61", "wpb": "126.5", "bsz": "8", "num_updates": "3400", "lr": "0.0001733", "gnorm": "4.273", "loss_scale": "256", "train_wall": "328", "gb_free": "7.1", "wall": "8105"}
[2024-06-21 19:54:33,025][train_inner][INFO] - {"epoch": 1, "update": 0.239, "loss": "4.232", "ntokens": "127.025", "acc_total": "127.025", "n_correct": "55.67", "wer_total": "127.025", "n_error": "71.16", "ppl": "18.79", "accuracy": "43.826", "wer": "56.02", "wps": "77.4", "ups": "0.61", "wpb": "127", "bsz": "8", "num_updates": "3600", "lr": "0.0001832", "gnorm": "4.334", "loss_scale": "256", "train_wall": "328", "gb_free": "7.1", "wall": "8434"}
[2024-06-21 20:00:01,459][train_inner][INFO] - {"epoch": 1, "update": 0.252, "loss": "4.09", "ntokens": "127.35", "acc_total": "127.35", "n_correct": "57.36", "wer_total": "127.35", "n_error": "69.74", "ppl": "17.03", "accuracy": "45.041", "wer": "54.762", "wps": "77.6", "ups": "0.61", "wpb": "127.3", "bsz": "8", "num_updates": "3800", "lr": "0.0001931", "gnorm": "4.413", "loss_scale": "256", "train_wall": "328", "gb_free": "7.1", "wall": "8762"}
[2024-06-21 20:05:29,785][train_inner][INFO] - {"epoch": 1, "update": 0.265, "loss": "3.954", "ntokens": "127.785", "acc_total": "127.785", "n_correct": "59.17", "wer_total": "127.785", "n_error": "68.42", "ppl": "15.5", "accuracy": "46.304", "wer": "53.543", "wps": "77.8", "ups": "0.61", "wpb": "127.8", "bsz": "8", "num_updates": "4000", "lr": "0.000203", "gnorm": "4.424", "loss_scale": "256", "train_wall": "328", "gb_free": "7.1", "wall": "9090"}
[2024-06-21 20:10:58,125][train_inner][INFO] - {"epoch": 1, "update": 0.278, "loss": "3.796", "ntokens": "126.255", "acc_total": "126.255", "n_correct": "60.805", "wer_total": "126.255", "n_error": "65.29", "ppl": "13.89", "accuracy": "48.16", "wer": "51.713", "wps": "76.9", "ups": "0.61", "wpb": "126.3", "bsz": "8", "num_updates": "4200", "lr": "0.0002129", "gnorm": "4.482", "loss_scale": "512", "train_wall": "328", "gb_free": "7.1", "wall": "9419"}
[2024-06-21 20:16:26,364][train_inner][INFO] - {"epoch": 1, "update": 0.292, "loss": "3.673", "ntokens": "125.9", "acc_total": "125.9", "n_correct": "62.1", "wer_total": "125.9", "n_error": "63.65", "ppl": "12.75", "accuracy": "49.325", "wer": "50.556", "wps": "76.7", "ups": "0.61", "wpb": "125.9", "bsz": "8", "num_updates": "4400", "lr": "0.0002228", "gnorm": "4.443", "loss_scale": "512", "train_wall": "328", "gb_free": "7.1", "wall": "9747"}
[2024-06-21 20:21:54,589][train_inner][INFO] - {"epoch": 1, "update": 0.305, "loss": "3.529", "ntokens": "127.885", "acc_total": "127.885", "n_correct": "65.22", "wer_total": "127.885", "n_error": "62.53", "ppl": "11.55", "accuracy": "50.999", "wer": "48.895", "wps": "77.9", "ups": "0.61", "wpb": "127.9", "bsz": "8", "num_updates": "4600", "lr": "0.0002327", "gnorm": "4.434", "loss_scale": "512", "train_wall": "328", "gb_free": "7.1", "wall": "10075"}
[2024-06-21 20:27:23,038][train_inner][INFO] - {"epoch": 1, "update": 0.318, "loss": "3.45", "ntokens": "126.21", "acc_total": "126.21", "n_correct": "65.17", "wer_total": "126.21", "n_error": "60.93", "ppl": "10.93", "accuracy": "51.636", "wer": "48.277", "wps": "76.9", "ups": "0.61", "wpb": "126.2", "bsz": "8", "num_updates": "4800", "lr": "0.0002426", "gnorm": "4.346", "loss_scale": "512", "train_wall": "328", "gb_free": "7.1", "wall": "10404"}
[2024-06-21 20:32:51,483][train_inner][INFO] - {"epoch": 1, "update": 0.331, "loss": "3.36", "ntokens": "126.87", "acc_total": "126.87", "n_correct": "66.1", "wer_total": "126.87", "n_error": "60.655", "ppl": "10.27", "accuracy": "52.101", "wer": "47.809", "wps": "77.3", "ups": "0.61", "wpb": "126.9", "bsz": "8", "num_updates": "5000", "lr": "0.0002525", "gnorm": "4.327", "loss_scale": "512", "train_wall": "328", "gb_free": "7.1", "wall": "10732"}
[2024-06-21 20:32:51,483][fairseq_cli.train][INFO] - begin validation on "valid" subset
[2024-06-21 21:14:44,981][valid][INFO] - {"epoch": 1, "valid_loss": "3.067", "valid_ntokens": "18.1585", "valid_acc_total": "18.1585", "valid_n_correct": "10.1", "valid_wer_total": "18.1585", "valid_n_error": "8.03097", "valid_ppl": "8.38", "valid_accuracy": "55.621", "valid_wer": "44.227", "valid_wps": "173.3", "valid_wpb": "18.2", "valid_bsz": "1", "valid_num_updates": "5000", "valid_best_accuracy": "55.621"}
[2024-06-21 21:14:44,981][fairseq.checkpoint_utils][INFO] - Preparing to save checkpoint for epoch 1 @ 5000 updates
[2024-06-21 21:14:44,982][fairseq.trainer][INFO] - Saving checkpoint to checkpoints/checkpoint_1_5000.pt
[2024-06-21 21:14:48,263][fairseq.trainer][INFO] - Finished saving checkpoint to checkpoints/checkpoint_1_5000.pt
[2024-06-21 21:14:55,375][fairseq.checkpoint_utils][INFO] - Saved checkpoint checkpoints/checkpoint_1_5000.pt (epoch 1 @ 5000 updates, score 55.621) (writing took 10.393895157030784 seconds)
[2024-06-21 21:20:23,546][train_inner][INFO] - {"epoch": 1, "update": 0.345, "loss": "3.136", "ntokens": "126.51", "acc_total": "126.51", "n_correct": "69.045", "wer_total": "126.51", "n_error": "57.35", "ppl": "8.79", "accuracy": "54.577", "wer": "45.332", "wps": "8.9", "ups": "0.07", "wpb": "126.5", "bsz": "8", "num_updates": "5200", "lr": "0.0002624", "gnorm": "4.174", "loss_scale": "512", "train_wall": "327", "gb_free": "7.1", "wall": "13584"}
[2024-06-21 21:25:51,918][train_inner][INFO] - {"epoch": 1, "update": 0.358, "loss": "3.162", "ntokens": "127.425", "acc_total": "127.425", "n_correct": "69.31", "wer_total": "127.425", "n_error": "57.985", "ppl": "8.95", "accuracy": "54.393", "wer": "45.505", "wps": "77.6", "ups": "0.61", "wpb": "127.4", "bsz": "8", "num_updates": "5400", "lr": "0.0002723", "gnorm": "4.231", "loss_scale": "512", "train_wall": "328", "gb_free": "7.1", "wall": "13912"}
[2024-06-21 21:31:20,429][train_inner][INFO] - {"epoch": 1, "update": 0.371, "loss": "3.063", "ntokens": "127.52", "acc_total": "127.52", "n_correct": "70.21", "wer_total": "127.52", "n_error": "57.165", "ppl": "8.35", "accuracy": "55.058", "wer": "44.828", "wps": "77.6", "ups": "0.61", "wpb": "127.5", "bsz": "8", "num_updates": "5600", "lr": "0.0002822", "gnorm": "4.247", "loss_scale": "512", "train_wall": "328", "gb_free": "7.1", "wall": "14241"}
[2024-06-21 21:36:48,780][train_inner][INFO] - {"epoch": 1, "update": 0.384, "loss": "3.138", "ntokens": "126.59", "acc_total": "126.59", "n_correct": "69.41", "wer_total": "126.59", "n_error": "57.055", "ppl": "8.8", "accuracy": "54.831", "wer": "45.071", "wps": "77.1", "ups": "0.61", "wpb": "126.6", "bsz": "8", "num_updates": "5800", "lr": "0.0002921", "gnorm": "4.141", "loss_scale": "512", "train_wall": "328", "gb_free": "7.1", "wall": "14569"}
[2024-06-21 21:42:17,031][train_inner][INFO] - {"epoch": 1, "update": 0.398, "loss": "2.993", "ntokens": "127.395", "acc_total": "127.395", "n_correct": "71.025", "wer_total": "127.395", "n_error": "56.225", "ppl": "7.96", "accuracy": "55.752", "wer": "44.134", "wps": "77.6", "ups": "0.61", "wpb": "127.4", "bsz": "8", "num_updates": "6000", "lr": "0.000302", "gnorm": "4.099", "loss_scale": "512", "train_wall": "328", "gb_free": "7.1", "wall": "14898"}
[2024-06-21 21:47:45,344][train_inner][INFO] - {"epoch": 1, "update": 0.411, "loss": "2.896", "ntokens": "126.995", "acc_total": "126.995", "n_correct": "72.3", "wer_total": "126.995", "n_error": "54.58", "ppl": "7.44", "accuracy": "56.931", "wer": "42.978", "wps": "77.4", "ups": "0.61", "wpb": "127", "bsz": "8", "num_updates": "6200", "lr": "0.0003119", "gnorm": "4.04", "loss_scale": "1024", "train_wall": "328", "gb_free": "7.1", "wall": "15226"}
[2024-06-21 21:53:13,640][train_inner][INFO] - {"epoch": 1, "update": 0.424, "loss": "2.939", "ntokens": "126.01", "acc_total": "126.01", "n_correct": "71.065", "wer_total": "126.01", "n_error": "54.815", "ppl": "7.67", "accuracy": "56.396", "wer": "43.501", "wps": "76.8", "ups": "0.61", "wpb": "126", "bsz": "8", "num_updates": "6400", "lr": "0.0003218", "gnorm": "4.115", "loss_scale": "1024", "train_wall": "328", "gb_free": "7.1", "wall": "15554"}
[2024-06-21 21:58:41,927][train_inner][INFO] - {"epoch": 1, "update": 0.437, "loss": "2.866", "ntokens": "126.33", "acc_total": "126.33", "n_correct": "72.43", "wer_total": "126.33", "n_error": "53.805", "ppl": "7.29", "accuracy": "57.334", "wer": "42.591", "wps": "77", "ups": "0.61", "wpb": "126.3", "bsz": "8", "num_updates": "6600", "lr": "0.0003317", "gnorm": "4.199", "loss_scale": "1024", "train_wall": "328", "gb_free": "7.1", "wall": "15882"}
[2024-06-21 22:04:10,063][train_inner][INFO] - {"epoch": 1, "update": 0.451, "loss": "2.815", "ntokens": "126.735", "acc_total": "126.735", "n_correct": "72.89", "wer_total": "126.735", "n_error": "53.73", "ppl": "7.04", "accuracy": "57.514", "wer": "42.396", "wps": "77.2", "ups": "0.61", "wpb": "126.7", "bsz": "8", "num_updates": "6800", "lr": "0.0003416", "gnorm": "4.035", "loss_scale": "1024", "train_wall": "327", "gb_free": "7.1", "wall": "16211"}
[2024-06-21 22:09:38,192][train_inner][INFO] - {"epoch": 1, "update": 0.464, "loss": "2.712", "ntokens": "126.98", "acc_total": "126.98", "n_correct": "74.815", "wer_total": "126.98", "n_error": "52.035", "ppl": "6.55", "accuracy": "58.919", "wer": "40.979", "wps": "77.4", "ups": "0.61", "wpb": "127", "bsz": "8", "num_updates": "7000", "lr": "0.0003515", "gnorm": "4.027", "loss_scale": "1024", "train_wall": "327", "gb_free": "7.1", "wall": "16539"}
[2024-06-21 22:15:06,173][train_inner][INFO] - {"epoch": 1, "update": 0.477, "loss": "2.792", "ntokens": "126.97", "acc_total": "126.97", "n_correct": "73.59", "wer_total": "126.97", "n_error": "53.285", "ppl": "6.93", "accuracy": "57.959", "wer": "41.967", "wps": "77.4", "ups": "0.61", "wpb": "127", "bsz": "8", "num_updates": "7200", "lr": "0.0003614", "gnorm": "4.14", "loss_scale": "1024", "train_wall": "327", "gb_free": "7.1", "wall": "16867"}
[2024-06-21 22:20:34,182][train_inner][INFO] - {"epoch": 1, "update": 0.491, "loss": "2.622", "ntokens": "127.45", "acc_total": "127.45", "n_correct": "76.255", "wer_total": "127.45", "n_error": "51.11", "ppl": "6.15", "accuracy": "59.831", "wer": "40.102", "wps": "77.7", "ups": "0.61", "wpb": "127.5", "bsz": "8", "num_updates": "7400", "lr": "0.0003713", "gnorm": "4.033", "loss_scale": "1024", "train_wall": "327", "gb_free": "7.1", "wall": "17195"}
[2024-06-21 22:23:18,254][fairseq_cli.train][INFO] - begin validation on "valid" subset
[2024-06-21 23:05:10,482][valid][INFO] - {"epoch": 1, "valid_loss": "2.337", "valid_ntokens": "18.1585", "valid_acc_total": "18.1585", "valid_n_correct": "11.5596", "valid_wer_total": "18.1585", "valid_n_error": "6.5822", "valid_ppl": "5.05", "valid_accuracy": "63.66", "valid_wer": "36.249", "valid_wps": "173.4", "valid_wpb": "18.2", "valid_bsz": "1", "valid_num_updates": "7500", "valid_best_accuracy": "63.66"}
[2024-06-21 23:05:10,483][fairseq.checkpoint_utils][INFO] - Preparing to save checkpoint for epoch 1 @ 7500 updates
[2024-06-21 23:05:10,483][fairseq.trainer][INFO] - Saving checkpoint to checkpoints/checkpoint_1_7500.pt
[2024-06-21 23:05:13,776][fairseq.trainer][INFO] - Finished saving checkpoint to checkpoints/checkpoint_1_7500.pt
[2024-06-21 23:05:20,769][fairseq.checkpoint_utils][INFO] - Saved checkpoint checkpoints/checkpoint_1_7500.pt (epoch 1 @ 7500 updates, score 63.66) (writing took 10.28679346095305 seconds)
[2024-06-21 23:08:04,636][train_inner][INFO] - {"epoch": 1, "update": 0.504, "loss": "2.65", "ntokens": "127.32", "acc_total": "127.32", "n_correct": "76.04", "wer_total": "127.32", "n_error": "51.205", "ppl": "6.27", "accuracy": "59.724", "wer": "40.218", "wps": "8.9", "ups": "0.07", "wpb": "127.3", "bsz": "8", "num_updates": "7600", "lr": "0.0003812", "gnorm": "4.084", "loss_scale": "1024", "train_wall": "327", "gb_free": "7.1", "wall": "20045"}
[2024-06-21 23:13:33,004][train_inner][INFO] - {"epoch": 1, "update": 0.517, "loss": "2.645", "ntokens": "127.11", "acc_total": "127.11", "n_correct": "75.41", "wer_total": "127.11", "n_error": "51.6", "ppl": "6.25", "accuracy": "59.327", "wer": "40.595", "wps": "77.4", "ups": "0.61", "wpb": "127.1", "bsz": "8", "num_updates": "7800", "lr": "0.0003911", "gnorm": "3.93", "loss_scale": "1024", "train_wall": "328", "gb_free": "7.1", "wall": "20373"}
[2024-06-21 23:19:01,481][train_inner][INFO] - {"epoch": 1, "update": 0.53, "loss": "2.617", "ntokens": "126.875", "acc_total": "126.875", "n_correct": "75.71", "wer_total": "126.875", "n_error": "51.025", "ppl": "6.13", "accuracy": "59.673", "wer": "40.217", "wps": "77.3", "ups": "0.61", "wpb": "126.9", "bsz": "8", "num_updates": "8000", "lr": "0.000401", "gnorm": "4.007", "loss_scale": "1024", "train_wall": "328", "gb_free": "7.1", "wall": "20702"}
[2024-06-21 23:24:29,739][train_inner][INFO] - {"epoch": 1, "update": 0.544, "loss": "2.58", "ntokens": "126.05", "acc_total": "126.05", "n_correct": "76.015", "wer_total": "126.05", "n_error": "49.94", "ppl": "5.98", "accuracy": "60.305", "wer": "39.619", "wps": "76.8", "ups": "0.61", "wpb": "126", "bsz": "8", "num_updates": "8200", "lr": "0.0004109", "gnorm": "3.968", "loss_scale": "2048", "train_wall": "328", "gb_free": "7.1", "wall": "21030"}
[2024-06-21 23:29:57,801][train_inner][INFO] - {"epoch": 1, "update": 0.557, "loss": "2.52", "ntokens": "126.785", "acc_total": "126.785", "n_correct": "77.585", "wer_total": "126.785", "n_error": "49.1", "ppl": "5.73", "accuracy": "61.194", "wer": "38.727", "wps": "77.3", "ups": "0.61", "wpb": "126.8", "bsz": "8", "num_updates": "8400", "lr": "0.0004208", "gnorm": "4.072", "loss_scale": "2048", "train_wall": "327", "gb_free": "7.1", "wall": "21358"}
[2024-06-21 23:35:25,967][train_inner][INFO] - {"epoch": 1, "update": 0.57, "loss": "2.528", "ntokens": "126.675", "acc_total": "126.675", "n_correct": "76.55", "wer_total": "126.675", "n_error": "50.01", "ppl": "5.77", "accuracy": "60.43", "wer": "39.479", "wps": "77.2", "ups": "0.61", "wpb": "126.7", "bsz": "8", "num_updates": "8600", "lr": "0.0004307", "gnorm": "4.041", "loss_scale": "2048", "train_wall": "328", "gb_free": "7.1", "wall": "21686"}
[2024-06-21 23:40:54,629][train_inner][INFO] - {"epoch": 1, "update": 0.583, "loss": "2.516", "ntokens": "127.09", "acc_total": "127.09", "n_correct": "77.96", "wer_total": "127.09", "n_error": "49.055", "ppl": "5.72", "accuracy": "61.342", "wer": "38.599", "wps": "77.3", "ups": "0.61", "wpb": "127.1", "bsz": "8", "num_updates": "8800", "lr": "0.0004406", "gnorm": "4.073", "loss_scale": "2048", "train_wall": "328", "gb_free": "7.1", "wall": "22015"}
[2024-06-21 23:46:22,769][train_inner][INFO] - {"epoch": 1, "update": 0.597, "loss": "2.535", "ntokens": "127.43", "acc_total": "127.43", "n_correct": "77.245", "wer_total": "127.43", "n_error": "50.105", "ppl": "5.79", "accuracy": "60.618", "wer": "39.32", "wps": "77.7", "ups": "0.61", "wpb": "127.4", "bsz": "8", "num_updates": "9000", "lr": "0.0004505", "gnorm": "3.983", "loss_scale": "2048", "train_wall": "327", "gb_free": "7.1", "wall": "22343"}
[2024-06-21 23:51:50,857][train_inner][INFO] - {"epoch": 1, "update": 0.61, "loss": "2.494", "ntokens": "127.145", "acc_total": "127.145", "n_correct": "78.055", "wer_total": "127.145", "n_error": "48.95", "ppl": "5.63", "accuracy": "61.391", "wer": "38.499", "wps": "77.5", "ups": "0.61", "wpb": "127.1", "bsz": "8", "num_updates": "9200", "lr": "0.0004604", "gnorm": "3.926", "loss_scale": "2048", "train_wall": "327", "gb_free": "7.1", "wall": "22671"}
[2024-06-21 23:57:19,051][train_inner][INFO] - {"epoch": 1, "update": 0.623, "loss": "2.487", "ntokens": "125.995", "acc_total": "125.995", "n_correct": "77.29", "wer_total": "125.995", "n_error": "48.65", "ppl": "5.6", "accuracy": "61.344", "wer": "38.613", "wps": "76.8", "ups": "0.61", "wpb": "126", "bsz": "8", "num_updates": "9400", "lr": "0.0004703", "gnorm": "4.056", "loss_scale": "2048", "train_wall": "328", "gb_free": "7.1", "wall": "23000"}
[2024-06-22 00:02:47,080][train_inner][INFO] - {"epoch": 1, "update": 0.636, "loss": "2.453", "ntokens": "126.87", "acc_total": "126.87", "n_correct": "77.955", "wer_total": "126.87", "n_error": "48.8", "ppl": "5.47", "accuracy": "61.445", "wer": "38.465", "wps": "77.4", "ups": "0.61", "wpb": "126.9", "bsz": "8", "num_updates": "9600", "lr": "0.0004802", "gnorm": "4.009", "loss_scale": "2048", "train_wall": "327", "gb_free": "7.1", "wall": "23328"}
[2024-06-22 00:08:15,232][train_inner][INFO] - {"epoch": 1, "update": 0.65, "loss": "2.426", "ntokens": "125.68", "acc_total": "125.68", "n_correct": "78.215", "wer_total": "125.68", "n_error": "47.385", "ppl": "5.37", "accuracy": "62.233", "wer": "37.703", "wps": "76.6", "ups": "0.61", "wpb": "125.7", "bsz": "8", "num_updates": "9800", "lr": "0.0004901", "gnorm": "4.075", "loss_scale": "2048", "train_wall": "328", "gb_free": "7.1", "wall": "23656"}
[2024-06-22 00:13:43,347][train_inner][INFO] - {"epoch": 1, "update": 0.663, "loss": "2.436", "ntokens": "127.56", "acc_total": "127.56", "n_correct": "78.595", "wer_total": "127.56", "n_error": "48.905", "ppl": "5.41", "accuracy": "61.614", "wer": "38.339", "wps": "77.8", "ups": "0.61", "wpb": "127.6", "bsz": "8", "num_updates": "10000", "lr": "0.0005", "gnorm": "3.98", "loss_scale": "2048", "train_wall": "327", "gb_free": "7.1", "wall": "23984"}
[2024-06-22 00:13:43,348][fairseq_cli.train][INFO] - begin validation on "valid" subset
[2024-06-22 00:55:30,560][valid][INFO] - {"epoch": 1, "valid_loss": "2.182", "valid_ntokens": "18.1585", "valid_acc_total": "18.1585", "valid_n_correct": "11.8287", "valid_wer_total": "18.1585", "valid_n_error": "6.3158", "valid_ppl": "4.54", "valid_accuracy": "65.142", "valid_wer": "34.782", "valid_wps": "173.8", "valid_wpb": "18.2", "valid_bsz": "1", "valid_num_updates": "10000", "valid_best_accuracy": "65.142"}
[2024-06-22 00:55:30,561][fairseq.checkpoint_utils][INFO] - Preparing to save checkpoint for epoch 1 @ 10000 updates
[2024-06-22 00:55:30,561][fairseq.trainer][INFO] - Saving checkpoint to checkpoints/checkpoint_1_10000.pt
[2024-06-22 00:55:33,778][fairseq.trainer][INFO] - Finished saving checkpoint to checkpoints/checkpoint_1_10000.pt
[2024-06-22 00:55:38,060][fairseq.checkpoint_utils][INFO] - Saved checkpoint checkpoints/checkpoint_1_10000.pt (epoch 1 @ 10000 updates, score 65.142) (writing took 7.49860826600343 seconds)
[2024-06-22 01:01:05,952][train_inner][INFO] - {"epoch": 1, "update": 0.676, "loss": "2.416", "ntokens": "126.605", "acc_total": "126.605", "n_correct": "78.89", "wer_total": "126.605", "n_error": "47.62", "ppl": "5.34", "accuracy": "62.312", "wer": "37.613", "wps": "8.9", "ups": "0.07", "wpb": "126.6", "bsz": "8", "num_updates": "10200", "lr": "0.000485243", "gnorm": "4.171", "loss_scale": "2048", "train_wall": "327", "gb_free": "7.1", "wall": "26826"}
[2024-06-22 01:02:14,720][fairseq.trainer][INFO] - NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 2048.0
[2024-06-22 01:06:35,485][train_inner][INFO] - {"epoch": 1, "update": 0.689, "loss": "2.384", "ntokens": "127.2", "acc_total": "127.2", "n_correct": "79.135", "wer_total": "127.2", "n_error": "47.935", "ppl": "5.22", "accuracy": "62.213", "wer": "37.685", "wps": "77.2", "ups": "0.61", "wpb": "127.2", "bsz": "8", "num_updates": "10400", "lr": "0.000470922", "gnorm": "3.907", "loss_scale": "2048", "train_wall": "329", "gb_free": "7.1", "wall": "27156"}
[2024-06-22 01:08:13,848][fairseq.trainer][INFO] - NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 1024.0
[2024-06-22 01:12:05,074][train_inner][INFO] - {"epoch": 1, "update": 0.703, "loss": "2.366", "ntokens": "127.84", "acc_total": "127.84", "n_correct": "79.74", "wer_total": "127.84", "n_error": "48.005", "ppl": "5.16", "accuracy": "62.375", "wer": "37.551", "wps": "77.6", "ups": "0.61", "wpb": "127.8", "bsz": "8", "num_updates": "10600", "lr": "0.000457024", "gnorm": "3.971", "loss_scale": "1024", "train_wall": "329", "gb_free": "7.1", "wall": "27486"}
[2024-06-22 01:17:33,036][train_inner][INFO] - {"epoch": 1, "update": 0.716, "loss": "2.392", "ntokens": "127.35", "acc_total": "127.35", "n_correct": "80.175", "wer_total": "127.35", "n_error": "47.1", "ppl": "5.25", "accuracy": "62.956", "wer": "36.985", "wps": "77.7", "ups": "0.61", "wpb": "127.3", "bsz": "8", "num_updates": "10800", "lr": "0.000443536", "gnorm": "3.971", "loss_scale": "1024", "train_wall": "327", "gb_free": "7.1", "wall": "27814"}
[2024-06-22 01:23:01,051][train_inner][INFO] - {"epoch": 1, "update": 0.729, "loss": "2.297", "ntokens": "126.115", "acc_total": "126.115", "n_correct": "79.955", "wer_total": "126.115", "n_error": "46.06", "ppl": "4.91", "accuracy": "63.398", "wer": "36.522", "wps": "76.9", "ups": "0.61", "wpb": "126.1", "bsz": "8", "num_updates": "11000", "lr": "0.000430446", "gnorm": "3.915", "loss_scale": "1024", "train_wall": "327", "gb_free": "7.1", "wall": "28142"}
[2024-06-22 01:28:28,990][train_inner][INFO] - {"epoch": 1, "update": 0.743, "loss": "2.218", "ntokens": "126.545", "acc_total": "126.545", "n_correct": "82.445", "wer_total": "126.545", "n_error": "44.025", "ppl": "4.65", "accuracy": "65.151", "wer": "34.79", "wps": "77.2", "ups": "0.61", "wpb": "126.5", "bsz": "8", "num_updates": "11200", "lr": "0.000417742", "gnorm": "3.754", "loss_scale": "1024", "train_wall": "327", "gb_free": "7.1", "wall": "28469"}
[2024-06-22 01:33:57,072][train_inner][INFO] - {"epoch": 1, "update": 0.756, "loss": "2.195", "ntokens": "125.9", "acc_total": "125.9", "n_correct": "82.215", "wer_total": "125.9", "n_error": "43.63", "ppl": "4.58", "accuracy": "65.302", "wer": "34.654", "wps": "76.7", "ups": "0.61", "wpb": "125.9", "bsz": "8", "num_updates": "11400", "lr": "0.000405413", "gnorm": "3.829", "loss_scale": "1024", "train_wall": "327", "gb_free": "7.1", "wall": "28798"}
[2024-06-22 01:39:24,981][train_inner][INFO] - {"epoch": 1, "update": 0.769, "loss": "2.216", "ntokens": "126.905", "acc_total": "126.905", "n_correct": "83.2", "wer_total": "126.905", "n_error": "43.645", "ppl": "4.65", "accuracy": "65.561", "wer": "34.392", "wps": "77.4", "ups": "0.61", "wpb": "126.9", "bsz": "8", "num_updates": "11600", "lr": "0.000393448", "gnorm": "3.697", "loss_scale": "1024", "train_wall": "327", "gb_free": "7.1", "wall": "29125"}
[2024-06-22 01:44:53,088][train_inner][INFO] - {"epoch": 1, "update": 0.782, "loss": "2.179", "ntokens": "126.775", "acc_total": "126.775", "n_correct": "83.505", "wer_total": "126.775", "n_error": "43.165", "ppl": "4.53", "accuracy": "65.869", "wer": "34.049", "wps": "77.3", "ups": "0.61", "wpb": "126.8", "bsz": "8", "num_updates": "11800", "lr": "0.000381836", "gnorm": "3.912", "loss_scale": "1024", "train_wall": "327", "gb_free": "7.1", "wall": "29454"}
[2024-06-22 01:50:21,190][train_inner][INFO] - {"epoch": 1, "update": 0.796, "loss": "2.156", "ntokens": "126.605", "acc_total": "126.605", "n_correct": "84.815", "wer_total": "126.605", "n_error": "41.715", "ppl": "4.46", "accuracy": "66.992", "wer": "32.949", "wps": "77.2", "ups": "0.61", "wpb": "126.6", "bsz": "8", "num_updates": "12000", "lr": "0.000370567", "gnorm": "3.845", "loss_scale": "1024", "train_wall": "327", "gb_free": "7.1", "wall": "29782"}
[2024-06-22 01:55:49,215][train_inner][INFO] - {"epoch": 1, "update": 0.809, "loss": "2.16", "ntokens": "126.91", "acc_total": "126.91", "n_correct": "86.03", "wer_total": "126.91", "n_error": "40.83", "ppl": "4.47", "accuracy": "67.788", "wer": "32.172", "wps": "77.4", "ups": "0.61", "wpb": "126.9", "bsz": "8", "num_updates": "12200", "lr": "0.000359631", "gnorm": "3.917", "loss_scale": "1024", "train_wall": "327", "gb_free": "7.1", "wall": "30110"}
[2024-06-22 02:01:17,283][train_inner][INFO] - {"epoch": 1, "update": 0.822, "loss": "2.144", "ntokens": "126.95", "acc_total": "126.95", "n_correct": "83.42", "wer_total": "126.95", "n_error": "43.45", "ppl": "4.42", "accuracy": "65.711", "wer": "34.226", "wps": "77.4", "ups": "0.61", "wpb": "127", "bsz": "8", "num_updates": "12400", "lr": "0.000349017", "gnorm": "3.777", "loss_scale": "1024", "train_wall": "327", "gb_free": "7.1", "wall": "30438"}
[2024-06-22 02:04:01,214][fairseq_cli.train][INFO] - begin validation on "valid" subset
[2024-06-22 02:45:48,664][valid][INFO] - {"epoch": 1, "valid_loss": "1.835", "valid_ntokens": "18.1585", "valid_acc_total": "18.1585", "valid_n_correct": "12.5855", "valid_wer_total": "18.1585", "valid_n_error": "5.56561", "valid_ppl": "3.57", "valid_accuracy": "69.309", "valid_wer": "30.65", "valid_wps": "173.7", "valid_wpb": "18.2", "valid_bsz": "1", "valid_num_updates": "12500", "valid_best_accuracy": "69.309"}
[2024-06-22 02:45:48,665][fairseq.checkpoint_utils][INFO] - Preparing to save checkpoint for epoch 1 @ 12500 updates
[2024-06-22 02:45:48,665][fairseq.trainer][INFO] - Saving checkpoint to checkpoints/checkpoint_1_12500.pt
[2024-06-22 02:45:51,886][fairseq.trainer][INFO] - Finished saving checkpoint to checkpoints/checkpoint_1_12500.pt
[2024-06-22 02:45:56,564][fairseq.checkpoint_utils][INFO] - Saved checkpoint checkpoints/checkpoint_1_12500.pt (epoch 1 @ 12500 updates, score 69.309) (writing took 7.899154600105248 seconds)
[2024-06-22 02:48:40,389][train_inner][INFO] - {"epoch": 1, "update": 0.835, "loss": "2.023", "ntokens": "126.58", "acc_total": "126.58", "n_correct": "84.08", "wer_total": "126.58", "n_error": "42.42", "ppl": "4.06", "accuracy": "66.424", "wer": "33.512", "wps": "8.9", "ups": "0.07", "wpb": "126.6", "bsz": "8", "num_updates": "12600", "lr": "0.000338716", "gnorm": "3.541", "loss_scale": "2048", "train_wall": "327", "gb_free": "7.1", "wall": "33281"}
[2024-06-22 02:54:08,351][train_inner][INFO] - {"epoch": 1, "update": 0.849, "loss": "2.03", "ntokens": "127.27", "acc_total": "127.27", "n_correct": "83.98", "wer_total": "127.27", "n_error": "43.2", "ppl": "4.09", "accuracy": "65.986", "wer": "33.944", "wps": "77.6", "ups": "0.61", "wpb": "127.3", "bsz": "8", "num_updates": "12800", "lr": "0.00032872", "gnorm": "3.539", "loss_scale": "2048", "train_wall": "327", "gb_free": "7.1", "wall": "33609"}
[2024-06-22 02:59:36,205][train_inner][INFO] - {"epoch": 1, "update": 0.862, "loss": "2.04", "ntokens": "125.955", "acc_total": "125.955", "n_correct": "86.48", "wer_total": "125.955", "n_error": "39.41", "ppl": "4.11", "accuracy": "68.659", "wer": "31.289", "wps": "76.8", "ups": "0.61", "wpb": "126", "bsz": "8", "num_updates": "13000", "lr": "0.000319018", "gnorm": "3.566", "loss_scale": "2048", "train_wall": "327", "gb_free": "7.1", "wall": "33937"}
[2024-06-22 03:05:03,886][train_inner][INFO] - {"epoch": 1, "update": 0.875, "loss": "2.04", "ntokens": "126.89", "acc_total": "126.89", "n_correct": "85.16", "wer_total": "126.89", "n_error": "41.66", "ppl": "4.11", "accuracy": "67.113", "wer": "32.832", "wps": "77.4", "ups": "0.61", "wpb": "126.9", "bsz": "8", "num_updates": "13200", "lr": "0.000309603", "gnorm": "3.571", "loss_scale": "2048", "train_wall": "327", "gb_free": "7.1", "wall": "34264"}
[2024-06-22 03:10:31,576][train_inner][INFO] - {"epoch": 1, "update": 0.888, "loss": "2.048", "ntokens": "127.49", "acc_total": "127.49", "n_correct": "87.12", "wer_total": "127.49", "n_error": "40.32", "ppl": "4.14", "accuracy": "68.335", "wer": "31.626", "wps": "77.8", "ups": "0.61", "wpb": "127.5", "bsz": "8", "num_updates": "13400", "lr": "0.000300466", "gnorm": "3.716", "loss_scale": "2048", "train_wall": "327", "gb_free": "7.1", "wall": "34592"}
[2024-06-22 03:15:59,352][train_inner][INFO] - {"epoch": 1, "update": 0.902, "loss": "1.952", "ntokens": "126.87", "acc_total": "126.87", "n_correct": "87.13", "wer_total": "126.87", "n_error": "39.695", "ppl": "3.87", "accuracy": "68.677", "wer": "31.288", "wps": "77.4", "ups": "0.61", "wpb": "126.9", "bsz": "8", "num_updates": "13600", "lr": "0.000291598", "gnorm": "3.505", "loss_scale": "2048", "train_wall": "327", "gb_free": "7.1", "wall": "34920"}
[2024-06-22 03:21:27,307][train_inner][INFO] - {"epoch": 1, "update": 0.915, "loss": "1.918", "ntokens": "127.055", "acc_total": "127.055", "n_correct": "87.23", "wer_total": "127.055", "n_error": "39.72", "ppl": "3.78", "accuracy": "68.655", "wer": "31.262", "wps": "77.5", "ups": "0.61", "wpb": "127.1", "bsz": "8", "num_updates": "13800", "lr": "0.000282992", "gnorm": "3.477", "loss_scale": "2048", "train_wall": "327", "gb_free": "7.1", "wall": "35248"}
[2024-06-22 03:26:55,148][train_inner][INFO] - {"epoch": 1, "update": 0.928, "loss": "1.934", "ntokens": "127.135", "acc_total": "127.135", "n_correct": "87.785", "wer_total": "127.135", "n_error": "39.32", "ppl": "3.82", "accuracy": "69.049", "wer": "30.928", "wps": "77.6", "ups": "0.61", "wpb": "127.1", "bsz": "8", "num_updates": "14000", "lr": "0.00027464", "gnorm": "3.377", "loss_scale": "2048", "train_wall": "327", "gb_free": "7.1", "wall": "35576"}
[2024-06-22 03:32:23,194][train_inner][INFO] - {"epoch": 1, "update": 0.941, "loss": "1.896", "ntokens": "126.005", "acc_total": "126.005", "n_correct": "87.395", "wer_total": "126.005", "n_error": "38.545", "ppl": "3.72", "accuracy": "69.358", "wer": "30.59", "wps": "76.8", "ups": "0.61", "wpb": "126", "bsz": "8", "num_updates": "14200", "lr": "0.000266535", "gnorm": "3.489", "loss_scale": "2048", "train_wall": "327", "gb_free": "7.1", "wall": "35904"}
[2024-06-22 03:34:04,893][fairseq.trainer][INFO] - NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 1024.0
[2024-06-22 03:37:52,875][train_inner][INFO] - {"epoch": 1, "update": 0.955, "loss": "1.912", "ntokens": "126.9", "acc_total": "126.9", "n_correct": "86.96", "wer_total": "126.9", "n_error": "39.87", "ppl": "3.76", "accuracy": "68.526", "wer": "31.418", "wps": "77", "ups": "0.61", "wpb": "126.9", "bsz": "8", "num_updates": "14400", "lr": "0.000258668", "gnorm": "3.387", "loss_scale": "1024", "train_wall": "329", "gb_free": "7.1", "wall": "36233"}
[2024-06-22 03:43:20,996][train_inner][INFO] - {"epoch": 1, "update": 0.968, "loss": "1.853", "ntokens": "127.515", "acc_total": "127.515", "n_correct": "89.005", "wer_total": "127.515", "n_error": "38.475", "ppl": "3.61", "accuracy": "69.8", "wer": "30.173", "wps": "77.7", "ups": "0.61", "wpb": "127.5", "bsz": "8", "num_updates": "14600", "lr": "0.000251034", "gnorm": "3.531", "loss_scale": "1024", "train_wall": "327", "gb_free": "7.1", "wall": "36561"}
[2024-06-22 03:48:49,012][train_inner][INFO] - {"epoch": 1, "update": 0.981, "loss": "1.892", "ntokens": "127.235", "acc_total": "127.235", "n_correct": "88.56", "wer_total": "127.235", "n_error": "38.63", "ppl": "3.71", "accuracy": "69.603", "wer": "30.361", "wps": "77.6", "ups": "0.61", "wpb": "127.2", "bsz": "8", "num_updates": "14800", "lr": "0.000243626", "gnorm": "3.373", "loss_scale": "1024", "train_wall": "327", "gb_free": "7.1", "wall": "36889"}
[2024-06-22 03:54:16,974][train_inner][INFO] - {"epoch": 1, "update": 0.994, "loss": "1.87", "ntokens": "127.245", "acc_total": "127.245", "n_correct": "87.73", "wer_total": "127.245", "n_error": "39.435", "ppl": "3.66", "accuracy": "68.946", "wer": "30.991", "wps": "77.6", "ups": "0.61", "wpb": "127.2", "bsz": "8", "num_updates": "15000", "lr": "0.000236435", "gnorm": "3.381", "loss_scale": "1024", "train_wall": "327", "gb_free": "7.1", "wall": "37217"}
[2024-06-22 03:54:16,975][fairseq_cli.train][INFO] - begin validation on "valid" subset
[2024-06-22 04:36:05,061][valid][INFO] - {"epoch": 1, "valid_loss": "1.587", "valid_ntokens": "18.1585", "valid_acc_total": "18.1585", "valid_n_correct": "13.1247", "valid_wer_total": "18.1585", "valid_n_error": "5.02676", "valid_ppl": "3", "valid_accuracy": "72.279", "valid_wer": "27.683", "valid_wps": "173.7", "valid_wpb": "18.2", "valid_bsz": "1", "valid_num_updates": "15000", "valid_best_accuracy": "72.279"}
[2024-06-22 04:36:05,062][fairseq.checkpoint_utils][INFO] - Preparing to save checkpoint for epoch 1 @ 15000 updates
[2024-06-22 04:36:05,062][fairseq.trainer][INFO] - Saving checkpoint to checkpoints/checkpoint_1_15000.pt
[2024-06-22 04:36:08,248][fairseq.trainer][INFO] - Finished saving checkpoint to checkpoints/checkpoint_1_15000.pt
[2024-06-22 04:36:12,576][fairseq.checkpoint_utils][INFO] - Saved checkpoint checkpoints/checkpoint_1_15000.pt (epoch 1 @ 15000 updates, score 72.279) (writing took 7.514834467088804 seconds)
[2024-06-22 04:38:28,101][fairseq_cli.train][INFO] - begin validation on "valid" subset
[2024-06-22 05:20:14,465][valid][INFO] - {"epoch": 1, "valid_loss": "1.597", "valid_ntokens": "18.1585", "valid_acc_total": "18.1585", "valid_n_correct": "13.0925", "valid_wer_total": "18.1585", "valid_n_error": "5.06086", "valid_ppl": "3.02", "valid_accuracy": "72.101", "valid_wer": "27.87", "valid_wps": "173.8", "valid_wpb": "18.2", "valid_bsz": "1", "valid_num_updates": "15083", "valid_best_accuracy": "72.279"}
[2024-06-22 05:20:14,466][fairseq.checkpoint_utils][INFO] - Preparing to save checkpoint for epoch 1 @ 15083 updates
[2024-06-22 05:20:14,466][fairseq.trainer][INFO] - Saving checkpoint to checkpoints/checkpoint_last.pt
[2024-06-22 05:20:18,374][fairseq.trainer][INFO] - Finished saving checkpoint to checkpoints/checkpoint_last.pt
[2024-06-22 05:20:18,453][fairseq.checkpoint_utils][INFO] - Saved checkpoint checkpoints/checkpoint_last.pt (epoch 1 @ 15083 updates, score 72.101) (writing took 3.9872186930151656 seconds)
[2024-06-22 05:20:18,454][fairseq_cli.train][INFO] - end of epoch 1 (average epoch stats below)
[2024-06-22 05:20:18,471][train][INFO] - {"epoch": 1, "train_loss": "3.282", "train_ntokens": "126.896", "train_acc_total": "126.896", "train_n_correct": "67.6397", "train_wer_total": "126.896", "train_n_error": "59.1277", "train_ppl": "9.73", "train_accuracy": "53.303", "train_wer": "46.595", "train_wps": "45.2", "train_ups": "0.36", "train_wpb": "126.9", "train_bsz": "8", "train_num_updates": "15083", "train_lr": "0.000233514", "train_gnorm": "3.919", "train_loss_scale": "1024", "train_train_wall": "24705", "train_gb_free": "7.1", "train_wall": "42379"}
[2024-06-22 05:20:18,523][fairseq.trainer][INFO] - begin training epoch 2
[2024-06-22 05:20:18,523][fairseq_cli.train][INFO] - Start iterating over samples
[2024-06-22 05:23:30,176][train_inner][INFO] - {"epoch": 2, "update": 1.008, "loss": "1.759", "ntokens": "127.23", "acc_total": "127.23", "n_correct": "88.75", "wer_total": "127.23", "n_error": "38.405", "ppl": "3.39", "accuracy": "69.756", "wer": "30.185", "wps": "4.8", "ups": "0.04", "wpb": "127.2", "bsz": "8", "num_updates": "15200", "lr": "0.000229457", "gnorm": "3.271", "loss_scale": "1024", "train_wall": "326", "gb_free": "7.1", "wall": "42571"}
[2024-06-22 05:28:57,852][train_inner][INFO] - {"epoch": 2, "update": 1.021, "loss": "1.705", "ntokens": "126.325", "acc_total": "126.325", "n_correct": "88.735", "wer_total": "126.325", "n_error": "37.56", "ppl": "3.26", "accuracy": "70.243", "wer": "29.733", "wps": "77.1", "ups": "0.61", "wpb": "126.3", "bsz": "8", "num_updates": "15400", "lr": "0.000222685", "gnorm": "3.194", "loss_scale": "1024", "train_wall": "327", "gb_free": "7.1", "wall": "42898"}
[2024-06-22 05:34:25,597][train_inner][INFO] - {"epoch": 2, "update": 1.034, "loss": "1.695", "ntokens": "126.915", "acc_total": "126.915", "n_correct": "89.225", "wer_total": "126.915", "n_error": "37.64", "ppl": "3.24", "accuracy": "70.303", "wer": "29.658", "wps": "77.4", "ups": "0.61", "wpb": "126.9", "bsz": "8", "num_updates": "15600", "lr": "0.000216113", "gnorm": "3.243", "loss_scale": "1024", "train_wall": "327", "gb_free": "7.1", "wall": "43226"}
[2024-06-22 05:39:53,168][train_inner][INFO] - {"epoch": 2, "update": 1.048, "loss": "1.773", "ntokens": "125.545", "acc_total": "125.545", "n_correct": "87.865", "wer_total": "125.545", "n_error": "37.635", "ppl": "3.42", "accuracy": "69.987", "wer": "29.977", "wps": "76.7", "ups": "0.61", "wpb": "125.5", "bsz": "8", "num_updates": "15800", "lr": "0.000209735", "gnorm": "3.112", "loss_scale": "1024", "train_wall": "327", "gb_free": "7.1", "wall": "43554"}
[2024-06-22 05:45:21,103][train_inner][INFO] - {"epoch": 2, "update": 1.061, "loss": "1.723", "ntokens": "127.405", "acc_total": "127.405", "n_correct": "88.815", "wer_total": "127.405", "n_error": "38.525", "ppl": "3.3", "accuracy": "69.711", "wer": "30.238", "wps": "77.7", "ups": "0.61", "wpb": "127.4", "bsz": "8", "num_updates": "16000", "lr": "0.000203545", "gnorm": "3.33", "loss_scale": "1024", "train_wall": "327", "gb_free": "7.1", "wall": "43882"}
[2024-06-22 05:50:48,860][train_inner][INFO] - {"epoch": 2, "update": 1.074, "loss": "1.584", "ntokens": "126.915", "acc_total": "126.915", "n_correct": "90.805", "wer_total": "126.915", "n_error": "36.095", "ppl": "3", "accuracy": "71.548", "wer": "28.44", "wps": "77.4", "ups": "0.61", "wpb": "126.9", "bsz": "8", "num_updates": "16200", "lr": "0.000197538", "gnorm": "3.07", "loss_scale": "1024", "train_wall": "327", "gb_free": "7.1", "wall": "44209"}
[2024-06-22 05:56:16,705][train_inner][INFO] - {"epoch": 2, "update": 1.087, "loss": "1.632", "ntokens": "127.14", "acc_total": "127.14", "n_correct": "90.885", "wer_total": "127.14", "n_error": "36.23", "ppl": "3.1", "accuracy": "71.484", "wer": "28.496", "wps": "77.6", "ups": "0.61", "wpb": "127.1", "bsz": "8", "num_updates": "16400", "lr": "0.000191708", "gnorm": "3.145", "loss_scale": "2048", "train_wall": "327", "gb_free": "7.1", "wall": "44537"}
[2024-06-22 06:01:44,271][train_inner][INFO] - {"epoch": 2, "update": 1.101, "loss": "1.621", "ntokens": "127.51", "acc_total": "127.51", "n_correct": "90.89", "wer_total": "127.51", "n_error": "36.565", "ppl": "3.08", "accuracy": "71.281", "wer": "28.676", "wps": "77.9", "ups": "0.61", "wpb": "127.5", "bsz": "8", "num_updates": "16600", "lr": "0.00018605", "gnorm": "3.098", "loss_scale": "2048", "train_wall": "327", "gb_free": "7.1", "wall": "44865"}
[2024-06-22 06:07:11,863][train_inner][INFO] - {"epoch": 2, "update": 1.114, "loss": "1.661", "ntokens": "126.405", "acc_total": "126.405", "n_correct": "90.03", "wer_total": "126.405", "n_error": "36.34", "ppl": "3.16", "accuracy": "71.223", "wer": "28.749", "wps": "77.2", "ups": "0.61", "wpb": "126.4", "bsz": "8", "num_updates": "16800", "lr": "0.000180559", "gnorm": "3.2", "loss_scale": "2048", "train_wall": "327", "gb_free": "7.1", "wall": "45192"}
[2024-06-22 06:12:39,552][train_inner][INFO] - {"epoch": 2, "update": 1.127, "loss": "1.582", "ntokens": "127.925", "acc_total": "127.925", "n_correct": "91.925", "wer_total": "127.925", "n_error": "35.985", "ppl": "2.99", "accuracy": "71.859", "wer": "28.13", "wps": "78.1", "ups": "0.61", "wpb": "127.9", "bsz": "8", "num_updates": "17000", "lr": "0.00017523", "gnorm": "3.263", "loss_scale": "2048", "train_wall": "327", "gb_free": "7.1", "wall": "45520"}
[2024-06-22 06:18:07,127][train_inner][INFO] - {"epoch": 2, "update": 1.14, "loss": "1.621", "ntokens": "126.125", "acc_total": "126.125", "n_correct": "89.895", "wer_total": "126.125", "n_error": "36.175", "ppl": "3.08", "accuracy": "71.275", "wer": "28.682", "wps": "77", "ups": "0.61", "wpb": "126.1", "bsz": "8", "num_updates": "17200", "lr": "0.000170059", "gnorm": "3.004", "loss_scale": "2048", "train_wall": "327", "gb_free": "7.1", "wall": "45848"}
[2024-06-22 06:19:15,859][fairseq.trainer][INFO] - NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 1024.0
[2024-06-22 06:23:36,291][train_inner][INFO] - {"epoch": 2, "update": 1.154, "loss": "1.633", "ntokens": "127.305", "acc_total": "127.305", "n_correct": "90.505", "wer_total": "127.305", "n_error": "36.74", "ppl": "3.1", "accuracy": "71.093", "wer": "28.86", "wps": "77.4", "ups": "0.61", "wpb": "127.3", "bsz": "8", "num_updates": "17400", "lr": "0.00016504", "gnorm": "3.113", "loss_scale": "1024", "train_wall": "329", "gb_free": "7.1", "wall": "46177"}
[2024-06-22 06:26:20,107][fairseq_cli.train][INFO] - begin validation on "valid" subset
[2024-06-22 07:08:04,309][valid][INFO] - {"epoch": 2, "valid_loss": "1.481", "valid_ntokens": "18.1585", "valid_acc_total": "18.1585", "valid_n_correct": "13.401", "valid_wer_total": "18.1585", "valid_n_error": "4.75169", "valid_ppl": "2.79", "valid_accuracy": "73.8", "valid_wer": "26.168", "valid_wps": "174", "valid_wpb": "18.2", "valid_bsz": "1", "valid_num_updates": "17500", "valid_best_accuracy": "73.8"}
[2024-06-22 07:08:04,309][fairseq.checkpoint_utils][INFO] - Preparing to save checkpoint for epoch 2 @ 17500 updates
[2024-06-22 07:08:04,310][fairseq.trainer][INFO] - Saving checkpoint to checkpoints/checkpoint_2_17500.pt
[2024-06-22 07:08:07,532][fairseq.trainer][INFO] - Finished saving checkpoint to checkpoints/checkpoint_2_17500.pt
[2024-06-22 07:08:11,755][fairseq.checkpoint_utils][INFO] - Saved checkpoint checkpoints/checkpoint_2_17500.pt (epoch 2 @ 17500 updates, score 73.8) (writing took 7.4456839909544215 seconds)
[2024-06-22 07:10:55,278][train_inner][INFO] - {"epoch": 2, "update": 1.167, "loss": "1.636", "ntokens": "126.19", "acc_total": "126.19", "n_correct": "90.78", "wer_total": "126.19", "n_error": "35.375", "ppl": "3.11", "accuracy": "71.939", "wer": "28.033", "wps": "8.9", "ups": "0.07", "wpb": "126.2", "bsz": "8", "num_updates": "17600", "lr": "0.000160169", "gnorm": "3.191", "loss_scale": "1024", "train_wall": "327", "gb_free": "7.1", "wall": "49016"}
[2024-06-22 07:16:22,894][train_inner][INFO] - {"epoch": 2, "update": 1.18, "loss": "1.523", "ntokens": "128.19", "acc_total": "128.19", "n_correct": "93.315", "wer_total": "128.19", "n_error": "34.85", "ppl": "2.87", "accuracy": "72.794", "wer": "27.186", "wps": "78.3", "ups": "0.61", "wpb": "128.2", "bsz": "8", "num_updates": "17800", "lr": "0.000155442", "gnorm": "3.04", "loss_scale": "1024", "train_wall": "327", "gb_free": "7.1", "wall": "49343"}
[2024-06-22 07:21:50,716][train_inner][INFO] - {"epoch": 2, "update": 1.193, "loss": "1.603", "ntokens": "126.265", "acc_total": "126.265", "n_correct": "90.325", "wer_total": "126.265", "n_error": "35.93", "ppl": "3.04", "accuracy": "71.536", "wer": "28.456", "wps": "77", "ups": "0.61", "wpb": "126.3", "bsz": "8", "num_updates": "18000", "lr": "0.000150854", "gnorm": "3.062", "loss_scale": "1024", "train_wall": "327", "gb_free": "7.1", "wall": "49671"}
[2024-06-22 07:27:54,243][train_inner][INFO] - {"epoch": 2, "update": 1.207, "loss": "1.663", "ntokens": "127.715", "acc_total": "127.715", "n_correct": "90.985", "wer_total": "127.715", "n_error": "36.69", "ppl": "3.17", "accuracy": "71.241", "wer": "28.728", "wps": "70.3", "ups": "0.55", "wpb": "127.7", "bsz": "8", "num_updates": "18200", "lr": "0.000146402", "gnorm": "10.289", "loss_scale": "1024", "train_wall": "363", "gb_free": "6.5", "wall": "50035"}
[2024-06-22 07:33:57,846][train_inner][INFO] - {"epoch": 2, "update": 1.22, "loss": "1.584", "ntokens": "126.27", "acc_total": "126.27", "n_correct": "91.255", "wer_total": "126.27", "n_error": "34.965", "ppl": "3", "accuracy": "72.27", "wer": "27.691", "wps": "69.5", "ups": "0.55", "wpb": "126.3", "bsz": "8", "num_updates": "18400", "lr": "0.000142081", "gnorm": "9.427", "loss_scale": "1024", "train_wall": "363", "gb_free": "6.5", "wall": "50398"}
[2024-06-22 07:40:01,475][train_inner][INFO] - {"epoch": 2, "update": 1.233, "loss": "1.573", "ntokens": "126.87", "acc_total": "126.87", "n_correct": "92.17", "wer_total": "126.87", "n_error": "34.68", "ppl": "2.98", "accuracy": "72.649", "wer": "27.335", "wps": "69.8", "ups": "0.55", "wpb": "126.9", "bsz": "8", "num_updates": "18600", "lr": "0.000137888", "gnorm": "9.074", "loss_scale": "1024", "train_wall": "363", "gb_free": "6.5", "wall": "50762"}
[2024-06-22 07:46:04,978][train_inner][INFO] - {"epoch": 2, "update": 1.246, "loss": "1.525", "ntokens": "127.155", "acc_total": "127.155", "n_correct": "92.86", "wer_total": "127.155", "n_error": "34.265", "ppl": "2.88", "accuracy": "73.029", "wer": "26.947", "wps": "70", "ups": "0.55", "wpb": "127.2", "bsz": "8", "num_updates": "18800", "lr": "0.000133819", "gnorm": "8.605", "loss_scale": "1024", "train_wall": "363", "gb_free": "6.5", "wall": "51125"}
[2024-06-22 07:52:08,638][train_inner][INFO] - {"epoch": 2, "update": 1.26, "loss": "1.484", "ntokens": "127.695", "acc_total": "127.695", "n_correct": "94.72", "wer_total": "127.695", "n_error": "32.965", "ppl": "2.8", "accuracy": "74.177", "wer": "25.815", "wps": "70.2", "ups": "0.55", "wpb": "127.7", "bsz": "8", "num_updates": "19000", "lr": "0.000129869", "gnorm": "8.491", "loss_scale": "1024", "train_wall": "363", "gb_free": "6.5", "wall": "51489"}
[2024-06-22 07:58:12,502][train_inner][INFO] - {"epoch": 2, "update": 1.273, "loss": "1.525", "ntokens": "127.145", "acc_total": "127.145", "n_correct": "96.025", "wer_total": "127.145", "n_error": "31.095", "ppl": "2.88", "accuracy": "75.524", "wer": "24.456", "wps": "69.9", "ups": "0.55", "wpb": "127.1", "bsz": "8", "num_updates": "19200", "lr": "0.000126036", "gnorm": "8.291", "loss_scale": "1024", "train_wall": "363", "gb_free": "6.5", "wall": "51853"}
[2024-06-22 08:01:32,502][fairseq.trainer][INFO] - NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 1024.0
[2024-06-22 08:04:17,972][train_inner][INFO] - {"epoch": 2, "update": 1.286, "loss": "1.392", "ntokens": "125.845", "acc_total": "125.845", "n_correct": "96.29", "wer_total": "125.845", "n_error": "29.53", "ppl": "2.62", "accuracy": "76.515", "wer": "23.465", "wps": "68.9", "ups": "0.55", "wpb": "125.8", "bsz": "8", "num_updates": "19400", "lr": "0.000122317", "gnorm": "8.266", "loss_scale": "1024", "train_wall": "365", "gb_free": "6.5", "wall": "52218"}
[2024-06-22 08:10:21,596][train_inner][INFO] - {"epoch": 2, "update": 1.3, "loss": "1.421", "ntokens": "125.665", "acc_total": "125.665", "n_correct": "94.66", "wer_total": "125.665", "n_error": "30.975", "ppl": "2.68", "accuracy": "75.327", "wer": "24.649", "wps": "69.1", "ups": "0.55", "wpb": "125.7", "bsz": "8", "num_updates": "19600", "lr": "0.000118707", "gnorm": "8.422", "loss_scale": "1024", "train_wall": "363", "gb_free": "6.5", "wall": "52582"}
[2024-06-22 08:16:25,258][train_inner][INFO] - {"epoch": 2, "update": 1.313, "loss": "1.392", "ntokens": "127.615", "acc_total": "127.615", "n_correct": "98.975", "wer_total": "127.615", "n_error": "28.625", "ppl": "2.63", "accuracy": "77.557", "wer": "22.431", "wps": "70.2", "ups": "0.55", "wpb": "127.6", "bsz": "8", "num_updates": "19800", "lr": "0.000115203", "gnorm": "7.975", "loss_scale": "1024", "train_wall": "363", "gb_free": "6.5", "wall": "52946"}
[2024-06-22 08:22:28,926][train_inner][INFO] - {"epoch": 2, "update": 1.326, "loss": "1.443", "ntokens": "126.165", "acc_total": "126.165", "n_correct": "94.98", "wer_total": "126.165", "n_error": "31.135", "ppl": "2.72", "accuracy": "75.282", "wer": "24.678", "wps": "69.4", "ups": "0.55", "wpb": "126.2", "bsz": "8", "num_updates": "20000", "lr": "0.000111803", "gnorm": "8.19", "loss_scale": "1024", "train_wall": "363", "gb_free": "6.5", "wall": "53309"}
[2024-06-22 08:22:28,927][fairseq_cli.train][INFO] - begin validation on "valid" subset
[2024-06-22 09:04:18,672][valid][INFO] - {"epoch": 2, "valid_loss": "1.289", "valid_ntokens": "18.1585", "valid_acc_total": "18.1585", "valid_n_correct": "13.9614", "valid_wer_total": "18.1585", "valid_n_error": "4.19441", "valid_ppl": "2.44", "valid_accuracy": "76.886", "valid_wer": "23.099", "valid_wps": "173.6", "valid_wpb": "18.2", "valid_bsz": "1", "valid_num_updates": "20000", "valid_best_accuracy": "76.886"}
[2024-06-22 09:04:18,673][fairseq.checkpoint_utils][INFO] - Preparing to save checkpoint for epoch 2 @ 20000 updates
[2024-06-22 09:04:18,673][fairseq.trainer][INFO] - Saving checkpoint to checkpoints/checkpoint_2_20000.pt
[2024-06-22 09:04:21,912][fairseq.trainer][INFO] - Finished saving checkpoint to checkpoints/checkpoint_2_20000.pt
[2024-06-22 09:04:26,195][fairseq.checkpoint_utils][INFO] - Saved checkpoint checkpoints/checkpoint_2_20000.pt (epoch 2 @ 20000 updates, score 76.886) (writing took 7.522035895963199 seconds)
[2024-06-22 09:10:29,837][train_inner][INFO] - {"epoch": 2, "update": 1.339, "loss": "1.429", "ntokens": "126.37", "acc_total": "126.37", "n_correct": "94.455", "wer_total": "126.37", "n_error": "31.895", "ppl": "2.69", "accuracy": "74.745", "wer": "25.239", "wps": "8.8", "ups": "0.07", "wpb": "126.4", "bsz": "8", "num_updates": "20200", "lr": "0.000108504", "gnorm": "7.919", "loss_scale": "1024", "train_wall": "363", "gb_free": "6.5", "wall": "56190"}
[2024-06-22 09:16:33,699][train_inner][INFO] - {"epoch": 2, "update": 1.353, "loss": "1.417", "ntokens": "127.095", "acc_total": "127.095", "n_correct": "96.145", "wer_total": "127.095", "n_error": "30.94", "ppl": "2.67", "accuracy": "75.648", "wer": "24.344", "wps": "69.9", "ups": "0.55", "wpb": "127.1", "bsz": "8", "num_updates": "20400", "lr": "0.000105301", "gnorm": "8.054", "loss_scale": "1024", "train_wall": "363", "gb_free": "6.5", "wall": "56554"}
[2024-06-22 09:22:37,392][train_inner][INFO] - {"epoch": 2, "update": 1.366, "loss": "1.384", "ntokens": "126.5", "acc_total": "126.5", "n_correct": "96.95", "wer_total": "126.5", "n_error": "29.52", "ppl": "2.61", "accuracy": "76.64", "wer": "23.336", "wps": "69.6", "ups": "0.55", "wpb": "126.5", "bsz": "8", "num_updates": "20600", "lr": "0.000102194", "gnorm": "8.006", "loss_scale": "1024", "train_wall": "363", "gb_free": "6.5", "wall": "56918"}
[2024-06-22 09:28:41,260][train_inner][INFO] - {"epoch": 2, "update": 1.379, "loss": "1.354", "ntokens": "127.265", "acc_total": "127.265", "n_correct": "98.84", "wer_total": "127.265", "n_error": "28.4", "ppl": "2.56", "accuracy": "77.665", "wer": "22.316", "wps": "70", "ups": "0.55", "wpb": "127.3", "bsz": "8", "num_updates": "20800", "lr": "9.91776e-05", "gnorm": "7.772", "loss_scale": "1024", "train_wall": "363", "gb_free": "6.5", "wall": "57282"}
[2024-06-22 09:34:44,903][train_inner][INFO] - {"epoch": 2, "update": 1.392, "loss": "1.351", "ntokens": "127.405", "acc_total": "127.405", "n_correct": "99.89", "wer_total": "127.405", "n_error": "27.495", "ppl": "2.55", "accuracy": "78.404", "wer": "21.581", "wps": "70.1", "ups": "0.55", "wpb": "127.4", "bsz": "8", "num_updates": "21000", "lr": "9.62506e-05", "gnorm": "7.522", "loss_scale": "1024", "train_wall": "363", "gb_free": "6.5", "wall": "57645"}
[2024-06-22 09:40:48,531][train_inner][INFO] - {"epoch": 2, "update": 1.406, "loss": "1.318", "ntokens": "126.7", "acc_total": "126.7", "n_correct": "99.21", "wer_total": "126.7", "n_error": "27.485", "ppl": "2.49", "accuracy": "78.303", "wer": "21.693", "wps": "69.7", "ups": "0.55", "wpb": "126.7", "bsz": "8", "num_updates": "21200", "lr": "9.341e-05", "gnorm": "7.509", "loss_scale": "1024", "train_wall": "363", "gb_free": "6.5", "wall": "58009"}
[2024-06-22 09:46:52,232][train_inner][INFO] - {"epoch": 2, "update": 1.419, "loss": "1.307", "ntokens": "127.32", "acc_total": "127.32", "n_correct": "99.48", "wer_total": "127.32", "n_error": "27.835", "ppl": "2.47", "accuracy": "78.134", "wer": "21.862", "wps": "70", "ups": "0.55", "wpb": "127.3", "bsz": "8", "num_updates": "21400", "lr": "9.06532e-05", "gnorm": "7.52", "loss_scale": "2048", "train_wall": "363", "gb_free": "6.5", "wall": "58373"}
[2024-06-22 09:52:55,870][train_inner][INFO] - {"epoch": 2, "update": 1.432, "loss": "1.324", "ntokens": "126.805", "acc_total": "126.805", "n_correct": "97.015", "wer_total": "126.805", "n_error": "29.735", "ppl": "2.5", "accuracy": "76.507", "wer": "23.449", "wps": "69.7", "ups": "0.55", "wpb": "126.8", "bsz": "8", "num_updates": "21600", "lr": "8.79777e-05", "gnorm": "7.647", "loss_scale": "2048", "train_wall": "363", "gb_free": "6.5", "wall": "58736"}
[2024-06-22 09:58:59,664][train_inner][INFO] - {"epoch": 2, "update": 1.445, "loss": "1.272", "ntokens": "125.555", "acc_total": "125.555", "n_correct": "98.245", "wer_total": "125.555", "n_error": "27.295", "ppl": "2.42", "accuracy": "78.249", "wer": "21.739", "wps": "69", "ups": "0.55", "wpb": "125.6", "bsz": "8", "num_updates": "21800", "lr": "8.53812e-05", "gnorm": "7.393", "loss_scale": "2048", "train_wall": "363", "gb_free": "6.5", "wall": "59100"}
[2024-06-22 10:05:03,468][train_inner][INFO] - {"epoch": 2, "update": 1.459, "loss": "1.347", "ntokens": "127.64", "acc_total": "127.64", "n_correct": "99.155", "wer_total": "127.64", "n_error": "28.455", "ppl": "2.54", "accuracy": "77.683", "wer": "22.293", "wps": "70.2", "ups": "0.55", "wpb": "127.6", "bsz": "8", "num_updates": "22000", "lr": "8.28614e-05", "gnorm": "7.375", "loss_scale": "2048", "train_wall": "363", "gb_free": "6.5", "wall": "59464"}
[2024-06-22 10:11:07,500][train_inner][INFO] - {"epoch": 2, "update": 1.472, "loss": "1.24", "ntokens": "126.515", "acc_total": "126.515", "n_correct": "99.64", "wer_total": "126.515", "n_error": "26.86", "ppl": "2.36", "accuracy": "78.757", "wer": "21.231", "wps": "69.5", "ups": "0.55", "wpb": "126.5", "bsz": "8", "num_updates": "22200", "lr": "8.04159e-05", "gnorm": "7.259", "loss_scale": "2048", "train_wall": "363", "gb_free": "6.5", "wall": "59828"}
[2024-06-22 10:17:11,297][train_inner][INFO] - {"epoch": 2, "update": 1.485, "loss": "1.257", "ntokens": "127.11", "acc_total": "127.11", "n_correct": "99.46", "wer_total": "127.11", "n_error": "27.635", "ppl": "2.39", "accuracy": "78.247", "wer": "21.741", "wps": "69.9", "ups": "0.55", "wpb": "127.1", "bsz": "8", "num_updates": "22400", "lr": "7.80425e-05", "gnorm": "7.621", "loss_scale": "2048", "train_wall": "363", "gb_free": "6.5", "wall": "60192"}
[2024-06-22 10:20:13,255][fairseq_cli.train][INFO] - begin validation on "valid" subset
[2024-06-22 11:02:00,943][valid][INFO] - {"epoch": 2, "valid_loss": "1.162", "valid_ntokens": "18.1585", "valid_acc_total": "18.1585", "valid_n_correct": "14.6363", "valid_wer_total": "18.1585", "valid_n_error": "3.52043", "valid_ppl": "2.24", "valid_accuracy": "80.603", "valid_wer": "19.387", "valid_wps": "173.7", "valid_wpb": "18.2", "valid_bsz": "1", "valid_num_updates": "22500", "valid_best_accuracy": "80.603"}
[2024-06-22 11:02:00,944][fairseq.checkpoint_utils][INFO] - Preparing to save checkpoint for epoch 2 @ 22500 updates
[2024-06-22 11:02:00,944][fairseq.trainer][INFO] - Saving checkpoint to checkpoints/checkpoint_2_22500.pt
[2024-06-22 11:02:04,164][fairseq.trainer][INFO] - Finished saving checkpoint to checkpoints/checkpoint_2_22500.pt
[2024-06-22 11:02:08,389][fairseq.checkpoint_utils][INFO] - Saved checkpoint checkpoints/checkpoint_2_22500.pt (epoch 2 @ 22500 updates, score 80.603) (writing took 7.444795946008526 seconds)
[2024-06-22 11:05:10,093][train_inner][INFO] - {"epoch": 2, "update": 1.498, "loss": "1.291", "ntokens": "126.78", "acc_total": "126.78", "n_correct": "99.425", "wer_total": "126.78", "n_error": "27.345", "ppl": "2.45", "accuracy": "78.423", "wer": "21.569", "wps": "8.8", "ups": "0.07", "wpb": "126.8", "bsz": "8", "num_updates": "22600", "lr": "7.57393e-05", "gnorm": "7.319", "loss_scale": "2048", "train_wall": "363", "gb_free": "6.5", "wall": "63071"}
[2024-06-22 11:10:43,083][fairseq.trainer][INFO] - NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 1024.0
[2024-06-22 11:11:15,828][train_inner][INFO] - {"epoch": 2, "update": 1.512, "loss": "1.268", "ntokens": "126.57", "acc_total": "126.57", "n_correct": "98.27", "wer_total": "126.57", "n_error": "28.27", "ppl": "2.41", "accuracy": "77.641", "wer": "22.335", "wps": "69.2", "ups": "0.55", "wpb": "126.6", "bsz": "8", "num_updates": "22800", "lr": "7.3504e-05", "gnorm": "7.058", "loss_scale": "1024", "train_wall": "365", "gb_free": "6.5", "wall": "63436"}
[2024-06-22 11:17:19,756][train_inner][INFO] - {"epoch": 2, "update": 1.525, "loss": "1.294", "ntokens": "126.155", "acc_total": "126.155", "n_correct": "97.775", "wer_total": "126.155", "n_error": "28.375", "ppl": "2.45", "accuracy": "77.504", "wer": "22.492", "wps": "69.3", "ups": "0.55", "wpb": "126.2", "bsz": "8", "num_updates": "23000", "lr": "7.13346e-05", "gnorm": "7.446", "loss_scale": "1024", "train_wall": "363", "gb_free": "6.5", "wall": "63800"}
[2024-06-22 11:23:24,001][train_inner][INFO] - {"epoch": 2, "update": 1.538, "loss": "1.22", "ntokens": "126.97", "acc_total": "126.97", "n_correct": "99.385", "wer_total": "126.97", "n_error": "27.58", "ppl": "2.33", "accuracy": "78.274", "wer": "21.722", "wps": "69.7", "ups": "0.55", "wpb": "127", "bsz": "8", "num_updates": "23200", "lr": "6.92293e-05", "gnorm": "7.134", "loss_scale": "1024", "train_wall": "363", "gb_free": "6.5", "wall": "64164"}
[2024-06-22 11:29:28,097][train_inner][INFO] - {"epoch": 2, "update": 1.552, "loss": "1.174", "ntokens": "126.29", "acc_total": "126.29", "n_correct": "99.355", "wer_total": "126.29", "n_error": "26.915", "ppl": "2.26", "accuracy": "78.672", "wer": "21.312", "wps": "69.4", "ups": "0.55", "wpb": "126.3", "bsz": "8", "num_updates": "23400", "lr": "6.71862e-05", "gnorm": "6.989", "loss_scale": "1024", "train_wall": "363", "gb_free": "6.5", "wall": "64529"}
[2024-06-22 11:35:32,209][train_inner][INFO] - {"epoch": 2, "update": 1.565, "loss": "1.3", "ntokens": "126.595", "acc_total": "126.595", "n_correct": "97.23", "wer_total": "126.595", "n_error": "29.35", "ppl": "2.46", "accuracy": "76.804", "wer": "23.184", "wps": "69.5", "ups": "0.55", "wpb": "126.6", "bsz": "8", "num_updates": "23600", "lr": "6.52033e-05", "gnorm": "7.549", "loss_scale": "1024", "train_wall": "363", "gb_free": "6.5", "wall": "64893"}
[2024-06-22 11:41:36,256][train_inner][INFO] - {"epoch": 2, "update": 1.578, "loss": "1.231", "ntokens": "127.175", "acc_total": "127.175", "n_correct": "98.345", "wer_total": "127.175", "n_error": "28.81", "ppl": "2.35", "accuracy": "77.33", "wer": "22.654", "wps": "69.9", "ups": "0.55", "wpb": "127.2", "bsz": "8", "num_updates": "23800", "lr": "6.3279e-05", "gnorm": "6.956", "loss_scale": "1024", "train_wall": "363", "gb_free": "6.5", "wall": "65257"}
[2024-06-22 11:47:40,308][train_inner][INFO] - {"epoch": 2, "update": 1.591, "loss": "1.242", "ntokens": "126.775", "acc_total": "126.775", "n_correct": "98.035", "wer_total": "126.775", "n_error": "28.735", "ppl": "2.37", "accuracy": "77.33", "wer": "22.666", "wps": "69.6", "ups": "0.55", "wpb": "126.8", "bsz": "8", "num_updates": "24000", "lr": "6.14114e-05", "gnorm": "7.062", "loss_scale": "1024", "train_wall": "363", "gb_free": "6.5", "wall": "65621"}
[2024-06-22 11:53:44,629][train_inner][INFO] - {"epoch": 2, "update": 1.605, "loss": "1.213", "ntokens": "126.17", "acc_total": "126.17", "n_correct": "97.62", "wer_total": "126.17", "n_error": "28.55", "ppl": "2.32", "accuracy": "77.372", "wer": "22.628", "wps": "69.3", "ups": "0.55", "wpb": "126.2", "bsz": "8", "num_updates": "24200", "lr": "5.9599e-05", "gnorm": "7.387", "loss_scale": "1024", "train_wall": "364", "gb_free": "6.5", "wall": "65985"}
[2024-06-22 11:59:49,002][train_inner][INFO] - {"epoch": 2, "update": 1.618, "loss": "1.22", "ntokens": "127.2", "acc_total": "127.2", "n_correct": "98.415", "wer_total": "127.2", "n_error": "28.775", "ppl": "2.33", "accuracy": "77.37", "wer": "22.622", "wps": "69.8", "ups": "0.55", "wpb": "127.2", "bsz": "8", "num_updates": "24400", "lr": "5.784e-05", "gnorm": "7.167", "loss_scale": "1024", "train_wall": "364", "gb_free": "6.5", "wall": "66349"}
[2024-06-22 12:05:53,265][train_inner][INFO] - {"epoch": 2, "update": 1.631, "loss": "1.261", "ntokens": "127.38", "acc_total": "127.38", "n_correct": "98.09", "wer_total": "127.38", "n_error": "29.29", "ppl": "2.4", "accuracy": "77.006", "wer": "22.994", "wps": "69.9", "ups": "0.55", "wpb": "127.4", "bsz": "8", "num_updates": "24600", "lr": "5.6133e-05", "gnorm": "6.764", "loss_scale": "1024", "train_wall": "364", "gb_free": "6.5", "wall": "66714"}
[2024-06-22 12:11:57,498][train_inner][INFO] - {"epoch": 2, "update": 1.644, "loss": "1.187", "ntokens": "127.26", "acc_total": "127.26", "n_correct": "99.36", "wer_total": "127.26", "n_error": "27.88", "ppl": "2.28", "accuracy": "78.076", "wer": "21.908", "wps": "69.9", "ups": "0.55", "wpb": "127.3", "bsz": "8", "num_updates": "24800", "lr": "5.44763e-05", "gnorm": "6.797", "loss_scale": "1024", "train_wall": "364", "gb_free": "6.5", "wall": "67078"}
[2024-06-22 12:18:01,814][train_inner][INFO] - {"epoch": 2, "update": 1.658, "loss": "1.23", "ntokens": "125.92", "acc_total": "125.92", "n_correct": "97.5", "wer_total": "125.92", "n_error": "28.395", "ppl": "2.35", "accuracy": "77.43", "wer": "22.55", "wps": "69.1", "ups": "0.55", "wpb": "125.9", "bsz": "8", "num_updates": "25000", "lr": "5.28686e-05", "gnorm": "7.013", "loss_scale": "2048", "train_wall": "364", "gb_free": "6.5", "wall": "67442"}
[2024-06-22 12:18:01,814][fairseq_cli.train][INFO] - begin validation on "valid" subset
[2024-06-22 12:59:52,578][valid][INFO] - {"epoch": 2, "valid_loss": "1.061", "valid_ntokens": "18.1585", "valid_acc_total": "18.1585", "valid_n_correct": "14.4155", "valid_wer_total": "18.1585", "valid_n_error": "3.74089", "valid_ppl": "2.09", "valid_accuracy": "79.387", "valid_wer": "20.601", "valid_wps": "173.5", "valid_wpb": "18.2", "valid_bsz": "1", "valid_num_updates": "25000", "valid_best_accuracy": "80.603"}
[2024-06-22 12:59:52,578][fairseq.checkpoint_utils][INFO] - Preparing to save checkpoint for epoch 2 @ 25000 updates
[2024-06-22 12:59:52,579][fairseq.trainer][INFO] - Saving checkpoint to checkpoints/checkpoint_2_25000.pt
[2024-06-22 12:59:55,808][fairseq.trainer][INFO] - Finished saving checkpoint to checkpoints/checkpoint_2_25000.pt
[2024-06-22 12:59:58,115][fairseq.checkpoint_utils][INFO] - Saved checkpoint checkpoints/checkpoint_2_25000.pt (epoch 2 @ 25000 updates, score 79.387) (writing took 5.536739203962497 seconds)
[2024-06-22 13:06:02,428][train_inner][INFO] - {"epoch": 2, "update": 1.671, "loss": "1.136", "ntokens": "127.44", "acc_total": "127.44", "n_correct": "98.99", "wer_total": "127.44", "n_error": "28.43", "ppl": "2.2", "accuracy": "77.676", "wer": "22.309", "wps": "8.8", "ups": "0.07", "wpb": "127.4", "bsz": "8", "num_updates": "25200", "lr": "5.13083e-05", "gnorm": "6.919", "loss_scale": "2048", "train_wall": "364", "gb_free": "6.5", "wall": "70323"}
[2024-06-22 13:12:07,011][train_inner][INFO] - {"epoch": 2, "update": 1.684, "loss": "1.194", "ntokens": "125.36", "acc_total": "125.36", "n_correct": "96.655", "wer_total": "125.36", "n_error": "28.675", "ppl": "2.29", "accuracy": "77.102", "wer": "22.874", "wps": "68.8", "ups": "0.55", "wpb": "125.4", "bsz": "8", "num_updates": "25400", "lr": "4.9794e-05", "gnorm": "7.018", "loss_scale": "2048", "train_wall": "364", "gb_free": "6.5", "wall": "70687"}
[2024-06-22 13:18:11,577][train_inner][INFO] - {"epoch": 2, "update": 1.697, "loss": "1.132", "ntokens": "127.07", "acc_total": "127.07", "n_correct": "99.03", "wer_total": "127.07", "n_error": "28.03", "ppl": "2.19", "accuracy": "77.933", "wer": "22.059", "wps": "69.7", "ups": "0.55", "wpb": "127.1", "bsz": "8", "num_updates": "25600", "lr": "4.83244e-05", "gnorm": "6.827", "loss_scale": "2048", "train_wall": "364", "gb_free": "6.5", "wall": "71052"}
[2024-06-22 13:24:16,019][train_inner][INFO] - {"epoch": 2, "update": 1.711, "loss": "1.126", "ntokens": "127.055", "acc_total": "127.055", "n_correct": "99.275", "wer_total": "127.055", "n_error": "27.765", "ppl": "2.18", "accuracy": "78.135", "wer": "21.853", "wps": "69.7", "ups": "0.55", "wpb": "127.1", "bsz": "8", "num_updates": "25800", "lr": "4.68982e-05", "gnorm": "6.809", "loss_scale": "2048", "train_wall": "364", "gb_free": "6.5", "wall": "71416"}
[2024-06-22 13:30:20,647][train_inner][INFO] - {"epoch": 2, "update": 1.724, "loss": "1.191", "ntokens": "126.935", "acc_total": "126.935", "n_correct": "98.77", "wer_total": "126.935", "n_error": "28.135", "ppl": "2.28", "accuracy": "77.811", "wer": "22.165", "wps": "69.6", "ups": "0.55", "wpb": "126.9", "bsz": "8", "num_updates": "26000", "lr": "4.55141e-05", "gnorm": "7.044", "loss_scale": "2048", "train_wall": "364", "gb_free": "6.5", "wall": "71781"}
[2024-06-22 13:36:25,219][train_inner][INFO] - {"epoch": 2, "update": 1.737, "loss": "1.176", "ntokens": "126.305", "acc_total": "126.305", "n_correct": "98.015", "wer_total": "126.305", "n_error": "28.27", "ppl": "2.26", "accuracy": "77.602", "wer": "22.382", "wps": "69.3", "ups": "0.55", "wpb": "126.3", "bsz": "8", "num_updates": "26200", "lr": "4.41708e-05", "gnorm": "6.87", "loss_scale": "2048", "train_wall": "364", "gb_free": "6.5", "wall": "72146"}
[2024-06-22 13:42:30,028][train_inner][INFO] - {"epoch": 2, "update": 1.75, "loss": "1.153", "ntokens": "127.47", "acc_total": "127.47", "n_correct": "99.14", "wer_total": "127.47", "n_error": "28.325", "ppl": "2.22", "accuracy": "77.775", "wer": "22.221", "wps": "69.9", "ups": "0.55", "wpb": "127.5", "bsz": "8", "num_updates": "26400", "lr": "4.28672e-05", "gnorm": "6.681", "loss_scale": "2048", "train_wall": "364", "gb_free": "6.5", "wall": "72511"}
[2024-06-22 13:48:34,447][train_inner][INFO] - {"epoch": 2, "update": 1.764, "loss": "1.121", "ntokens": "126.6", "acc_total": "126.6", "n_correct": "99.325", "wer_total": "126.6", "n_error": "27.26", "ppl": "2.17", "accuracy": "78.456", "wer": "21.532", "wps": "69.5", "ups": "0.55", "wpb": "126.6", "bsz": "8", "num_updates": "26600", "lr": "4.16021e-05", "gnorm": "6.811", "loss_scale": "2048", "train_wall": "364", "gb_free": "6.5", "wall": "72875"}
[2024-06-22 13:54:38,934][train_inner][INFO] - {"epoch": 2, "update": 1.777, "loss": "1.118", "ntokens": "126.565", "acc_total": "126.565", "n_correct": "99.83", "wer_total": "126.565", "n_error": "26.725", "ppl": "2.17", "accuracy": "78.876", "wer": "21.116", "wps": "69.4", "ups": "0.55", "wpb": "126.6", "bsz": "8", "num_updates": "26800", "lr": "4.03743e-05", "gnorm": "6.558", "loss_scale": "2048", "train_wall": "364", "gb_free": "6.5", "wall": "73239"}
[2024-06-22 14:00:43,529][train_inner][INFO] - {"epoch": 2, "update": 1.79, "loss": "1.183", "ntokens": "127.69", "acc_total": "127.69", "n_correct": "99.81", "wer_total": "127.69", "n_error": "27.84", "ppl": "2.27", "accuracy": "78.166", "wer": "21.803", "wps": "70", "ups": "0.55", "wpb": "127.7", "bsz": "8", "num_updates": "27000", "lr": "3.91827e-05", "gnorm": "6.954", "loss_scale": "4096", "train_wall": "364", "gb_free": "6.5", "wall": "73604"}
[2024-06-22 14:00:56,216][fairseq.trainer][INFO] - NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 2048.0
[2024-06-22 14:06:49,676][train_inner][INFO] - {"epoch": 2, "update": 1.803, "loss": "1.166", "ntokens": "128.025", "acc_total": "128.025", "n_correct": "100.34", "wer_total": "128.025", "n_error": "27.665", "ppl": "2.24", "accuracy": "78.375", "wer": "21.609", "wps": "69.9", "ups": "0.55", "wpb": "128", "bsz": "8", "num_updates": "27200", "lr": "3.80263e-05", "gnorm": "6.957", "loss_scale": "2048", "train_wall": "365", "gb_free": "6.5", "wall": "73970"}
[2024-06-22 14:12:54,041][train_inner][INFO] - {"epoch": 2, "update": 1.817, "loss": "1.175", "ntokens": "126.68", "acc_total": "126.68", "n_correct": "99.31", "wer_total": "126.68", "n_error": "27.355", "ppl": "2.26", "accuracy": "78.394", "wer": "21.594", "wps": "69.5", "ups": "0.55", "wpb": "126.7", "bsz": "8", "num_updates": "27400", "lr": "3.6904e-05", "gnorm": "6.983", "loss_scale": "2048", "train_wall": "364", "gb_free": "6.5", "wall": "74335"}
[2024-06-22 14:15:56,290][fairseq_cli.train][INFO] - begin validation on "valid" subset
[2024-06-22 14:57:47,168][valid][INFO] - {"epoch": 2, "valid_loss": "1.003", "valid_ntokens": "18.1585", "valid_acc_total": "18.1585", "valid_n_correct": "14.6503", "valid_wer_total": "18.1585", "valid_n_error": "3.5065", "valid_ppl": "2", "valid_accuracy": "80.68", "valid_wer": "19.311", "valid_wps": "173.5", "valid_wpb": "18.2", "valid_bsz": "1", "valid_num_updates": "27500", "valid_best_accuracy": "80.68"}
[2024-06-22 14:57:47,168][fairseq.checkpoint_utils][INFO] - Preparing to save checkpoint for epoch 2 @ 27500 updates
[2024-06-22 14:57:47,169][fairseq.trainer][INFO] - Saving checkpoint to checkpoints/checkpoint_2_27500.pt
[2024-06-22 14:57:50,417][fairseq.trainer][INFO] - Finished saving checkpoint to checkpoints/checkpoint_2_27500.pt
[2024-06-22 14:57:54,768][fairseq.checkpoint_utils][INFO] - Saved checkpoint checkpoints/checkpoint_2_27500.pt (epoch 2 @ 27500 updates, score 80.68) (writing took 7.6001191979739815 seconds)
[2024-06-22 15:00:56,627][train_inner][INFO] - {"epoch": 2, "update": 1.83, "loss": "1.141", "ntokens": "126.45", "acc_total": "126.45", "n_correct": "99.56", "wer_total": "126.45", "n_error": "26.875", "ppl": "2.2", "accuracy": "78.735", "wer": "21.253", "wps": "8.8", "ups": "0.07", "wpb": "126.5", "bsz": "8", "num_updates": "27600", "lr": "3.58149e-05", "gnorm": "7.051", "loss_scale": "2048", "train_wall": "363", "gb_free": "6.5", "wall": "77217"}
[2024-06-22 15:07:00,843][train_inner][INFO] - {"epoch": 2, "update": 1.843, "loss": "1.142", "ntokens": "127.465", "acc_total": "127.465", "n_correct": "100.34", "wer_total": "127.465", "n_error": "27.115", "ppl": "2.21", "accuracy": "78.72", "wer": "21.273", "wps": "70", "ups": "0.55", "wpb": "127.5", "bsz": "8", "num_updates": "27800", "lr": "3.47579e-05", "gnorm": "6.627", "loss_scale": "2048", "train_wall": "363", "gb_free": "6.5", "wall": "77581"}
[2024-06-22 15:13:04,898][train_inner][INFO] - {"epoch": 2, "update": 1.856, "loss": "1.152", "ntokens": "127.49", "acc_total": "127.49", "n_correct": "99.875", "wer_total": "127.49", "n_error": "27.6", "ppl": "2.22", "accuracy": "78.339", "wer": "21.649", "wps": "70", "ups": "0.55", "wpb": "127.5", "bsz": "8", "num_updates": "28000", "lr": "3.37321e-05", "gnorm": "6.637", "loss_scale": "2048", "train_wall": "363", "gb_free": "6.5", "wall": "77945"}
[2024-06-22 15:19:09,320][train_inner][INFO] - {"epoch": 2, "update": 1.87, "loss": "1.117", "ntokens": "127.75", "acc_total": "127.75", "n_correct": "100.57", "wer_total": "127.75", "n_error": "27.16", "ppl": "2.17", "accuracy": "78.724", "wer": "21.26", "wps": "70.1", "ups": "0.55", "wpb": "127.8", "bsz": "8", "num_updates": "28200", "lr": "3.27365e-05", "gnorm": "6.8", "loss_scale": "2048", "train_wall": "364", "gb_free": "6.5", "wall": "78310"}
[2024-06-22 15:25:13,597][train_inner][INFO] - {"epoch": 2, "update": 1.883, "loss": "1.126", "ntokens": "127.125", "acc_total": "127.125", "n_correct": "100.14", "wer_total": "127.125", "n_error": "26.98", "ppl": "2.18", "accuracy": "78.773", "wer": "21.223", "wps": "69.8", "ups": "0.55", "wpb": "127.1", "bsz": "8", "num_updates": "28400", "lr": "3.17704e-05", "gnorm": "6.798", "loss_scale": "2048", "train_wall": "364", "gb_free": "6.5", "wall": "78674"}
[2024-06-22 15:31:17,645][train_inner][INFO] - {"epoch": 2, "update": 1.896, "loss": "1.139", "ntokens": "128.015", "acc_total": "128.015", "n_correct": "100.265", "wer_total": "128.015", "n_error": "27.72", "ppl": "2.2", "accuracy": "78.323", "wer": "21.654", "wps": "70.3", "ups": "0.55", "wpb": "128", "bsz": "8", "num_updates": "28600", "lr": "3.08327e-05", "gnorm": "6.679", "loss_scale": "2048", "train_wall": "363", "gb_free": "6.5", "wall": "79038"}
[2024-06-22 15:37:21,713][train_inner][INFO] - {"epoch": 2, "update": 1.91, "loss": "1.114", "ntokens": "126.755", "acc_total": "126.755", "n_correct": "100.065", "wer_total": "126.755", "n_error": "26.665", "ppl": "2.16", "accuracy": "78.944", "wer": "21.037", "wps": "69.6", "ups": "0.55", "wpb": "126.8", "bsz": "8", "num_updates": "28800", "lr": "2.99228e-05", "gnorm": "6.597", "loss_scale": "2048", "train_wall": "363", "gb_free": "6.5", "wall": "79402"}
[2024-06-22 15:43:25,851][train_inner][INFO] - {"epoch": 2, "update": 1.923, "loss": "1.08", "ntokens": "127.74", "acc_total": "127.74", "n_correct": "101.185", "wer_total": "127.74", "n_error": "26.545", "ppl": "2.11", "accuracy": "79.212", "wer": "20.78", "wps": "70.2", "ups": "0.55", "wpb": "127.7", "bsz": "8", "num_updates": "29000", "lr": "2.90397e-05", "gnorm": "6.564", "loss_scale": "2048", "train_wall": "363", "gb_free": "6.5", "wall": "79766"}
[2024-06-22 15:47:51,630][fairseq.trainer][INFO] - NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 2048.0
[2024-06-22 15:49:31,806][train_inner][INFO] - {"epoch": 2, "update": 1.936, "loss": "1.126", "ntokens": "126.745", "acc_total": "126.745", "n_correct": "100.205", "wer_total": "126.745", "n_error": "26.52", "ppl": "2.18", "accuracy": "79.06", "wer": "20.924", "wps": "69.3", "ups": "0.55", "wpb": "126.7", "bsz": "8", "num_updates": "29200", "lr": "2.81826e-05", "gnorm": "6.821", "loss_scale": "2048", "train_wall": "365", "gb_free": "6.5", "wall": "80132"}
[2024-06-22 15:55:36,119][train_inner][INFO] - {"epoch": 2, "update": 1.949, "loss": "1.12", "ntokens": "127.185", "acc_total": "127.185", "n_correct": "100.245", "wer_total": "127.185", "n_error": "26.925", "ppl": "2.17", "accuracy": "78.818", "wer": "21.17", "wps": "69.8", "ups": "0.55", "wpb": "127.2", "bsz": "8", "num_updates": "29400", "lr": "2.73509e-05", "gnorm": "6.799", "loss_scale": "2048", "train_wall": "364", "gb_free": "6.5", "wall": "80497"}
[2024-06-22 16:01:40,251][train_inner][INFO] - {"epoch": 2, "update": 1.963, "loss": "1.084", "ntokens": "127.1", "acc_total": "127.1", "n_correct": "100.795", "wer_total": "127.1", "n_error": "26.305", "ppl": "2.12", "accuracy": "79.304", "wer": "20.696", "wps": "69.8", "ups": "0.55", "wpb": "127.1", "bsz": "8", "num_updates": "29600", "lr": "2.65436e-05", "gnorm": "6.908", "loss_scale": "2048", "train_wall": "363", "gb_free": "6.5", "wall": "80861"}
[2024-06-22 16:07:44,245][train_inner][INFO] - {"epoch": 2, "update": 1.976, "loss": "1.12", "ntokens": "126.785", "acc_total": "126.785", "n_correct": "100.55", "wer_total": "126.785", "n_error": "26.235", "ppl": "2.17", "accuracy": "79.307", "wer": "20.693", "wps": "69.7", "ups": "0.55", "wpb": "126.8", "bsz": "8", "num_updates": "29800", "lr": "2.57603e-05", "gnorm": "6.843", "loss_scale": "2048", "train_wall": "363", "gb_free": "6.5", "wall": "81225"}
[2024-06-22 16:13:48,246][train_inner][INFO] - {"epoch": 2, "update": 1.989, "loss": "1.116", "ntokens": "126.96", "acc_total": "126.96", "n_correct": "100.825", "wer_total": "126.96", "n_error": "26.11", "ppl": "2.17", "accuracy": "79.415", "wer": "20.566", "wps": "69.8", "ups": "0.55", "wpb": "127", "bsz": "8", "num_updates": "30000", "lr": "2.5e-05", "gnorm": "6.792", "loss_scale": "2048", "train_wall": "363", "gb_free": "6.5", "wall": "81589"}
[2024-06-22 16:13:48,246][fairseq_cli.train][INFO] - Stopping training due to num_updates: 30000 >= max_update: 30000
[2024-06-22 16:13:48,246][fairseq_cli.train][INFO] - begin validation on "valid" subset
[2024-06-22 16:55:39,091][valid][INFO] - {"epoch": 2, "valid_loss": "0.968", "valid_ntokens": "18.1585", "valid_acc_total": "18.1585", "valid_n_correct": "14.7998", "valid_wer_total": "18.1585", "valid_n_error": "3.35723", "valid_ppl": "1.96", "valid_accuracy": "81.504", "valid_wer": "18.489", "valid_wps": "173.5", "valid_wpb": "18.2", "valid_bsz": "1", "valid_num_updates": "30000", "valid_best_accuracy": "81.504"}
[2024-06-22 16:55:39,092][fairseq.checkpoint_utils][INFO] - Preparing to save checkpoint for epoch 2 @ 30000 updates
[2024-06-22 16:55:39,092][fairseq.trainer][INFO] - Saving checkpoint to checkpoints/checkpoint_2_30000.pt
[2024-06-22 16:55:42,325][fairseq.trainer][INFO] - Finished saving checkpoint to checkpoints/checkpoint_2_30000.pt
[2024-06-22 16:55:46,706][fairseq.checkpoint_utils][INFO] - Saved checkpoint checkpoints/checkpoint_2_30000.pt (epoch 2 @ 30000 updates, score 81.504) (writing took 7.614066223963164 seconds)
[2024-06-22 16:55:46,738][fairseq_cli.train][INFO] - end of epoch 2 (average epoch stats below)
[2024-06-22 16:55:46,740][train][INFO] - {"epoch": 2, "train_loss": "1.335", "train_ntokens": "126.909", "train_acc_total": "126.909", "train_n_correct": "96.666", "train_wer_total": "126.909", "train_n_error": "30.221", "train_ppl": "2.52", "train_accuracy": "76.169", "train_wer": "23.813", "train_wps": "45.4", "train_ups": "0.36", "train_wpb": "126.9", "train_bsz": "8", "train_num_updates": "30000", "train_lr": "2.5e-05", "train_gnorm": "6.537", "train_loss_scale": "2048", "train_train_wall": "26578", "train_gb_free": "6.5", "train_wall": "84107"}
[2024-06-22 16:55:46,740][fairseq_cli.train][INFO] - done training in 84106.6 seconds