Model parameters: d_model 768 ffw_size 3072 kv_size 64 n_heads 12 n_layers 15
Megatron-DeepSpeed/pretrain_gpt.py
    --tensor-model-parallel-size 1
    --pipeline-model-parallel-size 1
    --num-layers 15
    --hidden-size 768
    --num-attention-heads 12
    --kv-channels 64
    --ffn-hidden-size 3072
    --seq-length 2048
    --max-position-embeddings 2048
    --micro-batch-size 4
    --global-batch-size 256
    --train-samples 29_492_188
    --vocab-file gpt2/vocab.json
    --merge-file gpt2/merges.txt
    --loss-scale 12
    --clip-grad 1.0
    --kill-switch-path kill-switch-146m60b400m
    --bf16
    --checkpoint-activations
    --optimizer adam
    --adam-beta1 0.9
    --adam-beta2 0.999
    --adam-eps 1e-8
    --lr 2e-4
    --min-lr 2e-5
    --lr-decay-style cosine
    --lr-decay-samples 29_492_188
    --lr-warmup-samples 294_922
    --clip-grad 1.0
    --weight-decay 1e-1
    --log-interval 100
    --save-interval 10000
    --eval-interval 10000
    --eval-iters 1
    --tensorboard-dir tensorboard_146m60b400m
    --tensorboard-queue-size 5
    --log-timers-to-tensorboard
    --log-batch-size-to-tensorboard
    --log-validation-ppl-to-tensorboard
    --save checkpoints_146m60b400m
    --load checkpoints_146m60b400m
    --train-weighted-split-paths-path train400m.txt
    --valid-weighted-split-paths-path val.txt
    --data-impl mmap
    --deepspeed
    --deepspeed_config ds_configs/3326734.json
    --zero-stage 0
START 3326734: Thu 16 Mar 2023 11:22:01 PM EET
0:
0:
0: ======================= ROCm System Management Interface =======================
0: ================================= Concise Info =================================
0: GPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
0: 0    40.0c  91.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
0: 1    48.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
0: 2    36.0c  80.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
0: 3    44.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
0: 4    42.0c  90.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
0: 5    45.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
0: 6    44.0c  78.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
0: 7    45.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
0: ================================================================================
0: ============================= End of ROCm SMI Log ==============================
2:
2:
2: ======================= ROCm System Management Interface =======================
2: ================================= Concise Info =================================
2: GPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
2: 0    51.0c  89.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
2: 1    44.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
2: 2    41.0c  88.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
2: 3    43.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
2: 4    40.0c  92.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
2: 5    44.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
2: 6    43.0c  82.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
2: 7    41.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
2: ================================================================================
2: ============================= End of ROCm SMI Log ==============================
4:
4:
4: ======================= ROCm System Management Interface =======================
4: ================================= Concise Info =================================
4: GPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
4: 0    43.0c  94.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
4: 1    45.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
4: 2    44.0c  105.0W  800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
4: 3    44.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
4: 4    45.0c  98.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
4: 5    44.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
4: 6    40.0c  95.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
4: 7    44.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
4: ================================================================================
4: ============================= End of ROCm SMI Log ==============================
1:
1:
1: ======================= ROCm System Management Interface =======================
1: ================================= Concise Info =================================
1: GPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
1: 0    50.0c  91.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
1: 1    47.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
1: 2    42.0c  89.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
1: 3    46.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
1: 4    42.0c  89.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
1: 5    46.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
1: 6    41.0c  93.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
1: 7    45.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
1: ================================================================================
1: ============================= End of ROCm SMI Log ==============================
7:
7:
7: ======================= ROCm System Management Interface =======================
7: ================================= Concise Info =================================
7: GPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
7: 0    47.0c  89.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
7: 1    47.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
7: 2    40.0c  92.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
7: 3    40.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
7: 4    43.0c  90.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
7: 5    41.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
7: 6    41.0c  104.0W  800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
7: 7    42.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
7: ================================================================================
7: ============================= End of ROCm SMI Log ==============================
5:
5:
5: ======================= ROCm System Management Interface =======================
5: ================================= Concise Info =================================
5: GPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
5: 0    43.0c  97.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
5: 1    51.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
5: 2    40.0c  83.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
5: 3    44.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
5: 4    44.0c  81.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
5: 5    43.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
5: 6    38.0c  89.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
5: 7    42.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
5: ================================================================================
5: ============================= End of ROCm SMI Log ==============================
6:
6:
6: ======================= ROCm System Management Interface =======================
6: ================================= Concise Info =================================
6: GPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
6: 0    41.0c  91.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
6: 1    51.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
6: 2    45.0c  95.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
6: 3    46.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
6: 4    49.0c  93.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
6: 5    49.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
6: 6    44.0c  89.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
6: 7    38.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
6: ================================================================================
6: ============================= End of ROCm SMI Log ==============================
3:
3:
3: ======================= ROCm System Management Interface =======================
3: ================================= Concise Info =================================
3: GPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
3: 0    43.0c  88.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
3: 1    48.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
3: 2    42.0c  79.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
3: 3    46.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
3: 4    43.0c  89.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
3: 5    49.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
3: 6    45.0c  92.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
3: 7    40.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
3: ================================================================================
3: ============================= End of ROCm SMI Log ==============================
7: Launching on nid005617 (7/8), master nid005610 port 9999, GPUs 8, CUDA: True
6: Launching on nid005616 (6/8), master nid005610 port 9999, GPUs 8, CUDA: True
3: Launching on nid005613 (3/8), master nid005610 port 9999, GPUs 8, CUDA: True
5: Launching on nid005615 (5/8), master nid005610 port 9999, GPUs 8, CUDA: True
0: Launching on nid005610 (0/8), master nid005610 port 9999, GPUs 8, CUDA: True
4: Launching on nid005614 (4/8), master nid005610 port 9999, GPUs 8, CUDA: True
2: Launching on nid005612 (2/8), master nid005610 port 9999, GPUs 8, CUDA: True
1: Launching on nid005611 (1/8), master nid005610 port 9999, GPUs 8, CUDA: True
0: using world size: 64, data-parallel-size: 64, tensor-model-parallel size: 1, pipeline-model-parallel size: 1
0: accumulate and all-reduce gradients in fp32 for bfloat16 data type.
0: using torch.bfloat16 for parameters ...
0: ------------------------ arguments ------------------------
0: abort_on_unmet_fused_kernel_constraints ......... False
0: accumulate_allreduce_grads_in_fp32 .............. True
0: adam_beta1 ...................................... 0.9
0: adam_beta2 ...................................... 0.999
0: adam_eps ........................................ 1e-08
0: adlr_autoresume ................................. False
0: adlr_autoresume_interval ........................ 1000
0: apply_query_key_layer_scaling ................... True
0: apply_residual_connection_post_layernorm ........ False
0: attention_dropout ............................... 0.1
0: attention_softmax_in_fp32 ....................... False
0: bert_binary_head ................................ True
0: bert_load ....................................... None
0: bf16 ............................................ True
0: bias_dropout_fusion ............................. True
0: bias_gelu_fusion ................................ True
0: biencoder_projection_dim ........................ 0
0: biencoder_shared_query_context_model ............ False
0: block_data_path ................................. None
0: checkpoint_activations .......................... True
0: checkpoint_in_cpu ............................... False
0: checkpoint_num_layers ........................... 1
0: clip_grad ....................................... 1.0
0: codecarbon_dir .................................. None
0: consumed_train_samples .......................... 0
0: consumed_train_tokens ........................... 0
0: consumed_valid_samples .......................... 0
0: contigious_checkpointing ........................ False
0: cpu_optimizer ................................... False
0: cpu_torch_adam .................................. False
0: curriculum_learning ............................. False
0: data_impl ....................................... mmap
0: data_parallel_size .............................. 64
0: data_path ....................................... None
0: dataloader_type ................................. single
0: DDP_impl ........................................ local
0: decoder_seq_length .............................. None
0: deepscale ....................................... False
0: deepscale_config ................................ None
0: deepspeed ....................................... True
0: deepspeed_activation_checkpointing .............. False
0: deepspeed_config ................................ ds_configs/3326734.json
0: deepspeed_mpi ................................... False
0: distribute_checkpointed_activations ............. False
0: distributed_backend ............................. nccl
0: embed_layernorm ................................. False
0: embedding_path .................................. None
0: encoder_seq_length .............................. 2048
0: eod_mask_loss ................................... False
0: eval_interval ................................... 10000
0: eval_iters ...................................... 1
0: eval_only ....................................... None
0: evidence_data_path .............................. None
0: exit_duration_in_mins ........................... None
0: exit_interval ................................... None
0: ffn_hidden_size ................................. 3072
0: finetune ........................................ False
0: fp16 ............................................ False
0: fp16_lm_cross_entropy ........................... False
0: fp32_residual_connection ........................ False
0: gigaflos_no_embeds .............................. 0
0: global_batch_size ............................... 256
0: glu_activation .................................. None
0: hidden_dropout .................................. 0.1
0: hidden_size ..................................... 768
0: hysteresis ...................................... 2
0: ict_head_size ................................... None
0: ict_load ........................................ None
0: img_dim ......................................... 224
0: indexer_batch_size .............................. 128
0: indexer_log_interval ............................ 1000
0: inference ....................................... False
0: init_method_std ................................. 0.02
0: init_method_xavier_uniform ...................... False
0: initial_loss_scale .............................. 4294967296
0: kill_switch_path ................................ kill-switch-146m60b400m
0: kv_channels ..................................... 64
0: layer_norm_fusion ............................... True
0: layernorm_epsilon ............................... 1e-05
0: lazy_mpu_init ................................... None
0: load ............................................ checkpoints_146m60b400m
0: local_rank ...................................... None
0: log_batch_size_to_tensorboard ................... True
0: log_interval .................................... 100
0: log_learning_rate_to_tensorboard ................ True
0: log_level ....................................... None
0: log_level_replica ............................... None
0: log_loss_scale_to_tensorboard ................... True
0: log_num_zeros_in_grad ........................... False
0: log_params_norm ................................. False
0: log_path ........................................ None
0: log_timers_to_tensorboard ....................... True
0: log_validation_ppl_to_tensorboard ............... True
0: loss_on_targets_only ............................ False
0: loss_scale ...................................... 12.0
0: loss_scale_window ............................... 1000
0: lr .............................................. 0.0002
0: lr_decay_iters .................................. None
0: lr_decay_samples ................................ 29492188
0: lr_decay_style .................................. cosine
0: lr_decay_tokens ................................. None
0: lr_warmup_fraction .............................. None
0: lr_warmup_iters ................................. 0
0: lr_warmup_samples ............................... 294922
0: make_vocab_size_divisible_by .................... 128
0: mask_prob ....................................... 0.15
0: masked_softmax_fusion ........................... True
0: max_position_embeddings ......................... 2048
0: mean_noise_span_length .......................... None
0: memory_centric_tiled_linear ..................... False
0: merge_file ...................................... gpt2/merges.txt
0: micro_batch_size ................................ 4
0: min_loss_scale .................................. 1.0
0: min_lr .......................................... 2e-05
0: mmap_warmup ..................................... False
0: no_load_optim ................................... None
0: no_load_rng ..................................... None
0: no_save_optim ................................... None
0: no_save_rng ..................................... None
0: noise_density ................................... None
0: num_attention_heads ............................. 12
0: num_channels .................................... 3
0: num_classes ..................................... 1000
0: num_layers ...................................... 15
0: num_layers_per_virtual_pipeline_stage ........... None
0: num_workers ..................................... 2
0: onnx_safe ....................................... None
0: openai_gelu ..................................... False
0: optimizer ....................................... adam
0: optimizer_fusion ................................ True
0: override_lr_scheduler ........................... False
0: pad_vocab_size_to ............................... None
0: params_dtype .................................... torch.bfloat16
0: partition_activations ........................... False
0: patch_dim ....................................... 16
0: pipeline_model_parallel_size .................... 1
0: position_embedding_type ......................... PositionEmbeddingType.absolute
0: pp_partition_method ............................. None
0: profile_backward ................................ False
0: query_in_block_prob ............................. 0.1
0: rampup_batch_size ............................... None
0: rank ............................................ 0
0: remote_device ................................... none
0: reset_attention_mask ............................ False
0: reset_position_ids .............................. False
0: reset_progress .................................. None
0: retriever_report_topk_accuracies ................ []
0: retriever_score_scaling ......................... False
0: retriever_seq_length ............................ 256
0: reweight_loss_based_on_position_frequency ....... False
0: sample_rate ..................................... 1.0
0: save ............................................ checkpoints_146m60b400m
0: save_interval ................................... 10000
0: scatter_gather_tensors_in_pipeline .............. True
0: scattered_embeddings ............................ False
0: seed ............................................ 1234
0: seq_length ...................................... 2048
0: sgd_momentum .................................... 0.9
0: short_seq_prob .................................. 0.1
0: skip_train_iteration_range ...................... None
0: split ........................................... None
0: split_transformers .............................. False
0: sync_tp_duplicated_parameters ................... False
0: synchronize_each_layer .......................... False
0: tensor_model_parallel_size ...................... 1
0: tensorboard_dir ................................. tensorboard_146m60b400m
0: tensorboard_log_interval ........................ 1
0: tensorboard_queue_size .......................... 5
0: test_weighted_split_paths ....................... None
0: test_weighted_split_paths_path .................. None
0: tile_factor ..................................... 1
0: titles_data_path ................................ None
0: tokenizer_name_or_path .......................... None
0: tokenizer_type .................................. GPT2BPETokenizer
0: train_iters ..................................... None
0: train_samples ................................... 29492188
0: train_tokens .................................... None
0: train_weighted_split_names ...................... ['train']
0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document']]
0: train_weighted_split_paths_path ................. None
0: train_weighted_split_splits ..................... [['0:1']]
0: train_weighted_split_weights .................... [['1.0']]
0: universal_checkpoint ............................ False
0: use_bnb_optimizer ............................... False
0: use_checkpoint_lr_scheduler ..................... False
0: use_contiguous_buffers_in_ddp ................... True
0: use_cpu_initialization .......................... None
0: use_one_sent_docs ............................... False
0: use_pin_memory .................................. False
0: valid_num_workers ............................... 2
0: valid_weighted_split_names ...................... ['validation']
0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']]
0: valid_weighted_split_paths_path ................. None
0: valid_weighted_split_splits ..................... [['0:1']]
0: valid_weighted_split_weights .................... [['1.0']]
0: virtual_pipeline_model_parallel_size ............ None
0: vocab_extra_ids ................................. 0
0: vocab_file ...................................... gpt2/vocab.json
0: weight_decay .................................... 0.1
0: world_size ...................................... 64
0: zero_allgather_bucket_size ...................... 0.0
0: zero_contigious_gradients ....................... False
0: zero_reduce_bucket_size ......................... 0.0
0: zero_reduce_scatter ............................. False
0: zero_stage ...................................... 0
0: -------------------- end of arguments ---------------------
0: setting number of micro-batches to constant 1
0: > building GPT2BPETokenizer tokenizer ...
0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304)
0: DeepSpeed general environment info:
0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch']
0: torch version .................... 1.13.0+rocm5.2
0: torch cuda version ............... None
0: torch hip version ................ 5.2.21151-afdc89f8
0: nvcc version ..................... None
0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed']
0: deepspeed info ................... 0.7.5, unknown, unknown
0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1
0: **** Git info for Megatron: git_hash=unknown git_branch=unknown ****
0: > initializing torch distributed ...
0: [2023-03-16 23:23:10,683] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
7: > setting tensorboard ...
0: > initializing tensor model parallel with size 1
0: > initializing pipeline model parallel with size 1
0: > setting random seeds to 1234 ...
0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
0: > compiling dataset index builder ...
0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data'
0: make: Nothing to be done for 'default'.
0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data'
0: >>> done with dataset index builder. Compilation time: 0.113 seconds
0: > compiling and loading fused kernels ...
0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.cpp [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.hip [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] 0: Total number of unsupported CUDA function calls: 0 0: 0: 0: Total number of replaced kernel launches: 87 0: [1/1] c++ scaled_upper_triang_masked_softmax_hip.cuda.o scaled_upper_triang_masked_softmax_hip.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_upper_triang_masked_softmax_cuda.so 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] 0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] 0: Total number of unsupported CUDA function calls: 0 0: 0: 0: Total number of replaced kernel launches: 63 0: ninja: no work to do. 
0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] 0: Total number of unsupported CUDA function calls: 0 0: 0: 0: Total number of replaced kernel launches: 67 0: [1/1] c++ layer_norm_hip_kernel.cuda.o layer_norm_cuda.o -shared 
-L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o fused_mix_prec_layer_norm_cuda.so 0: >>> done with compiling and loading fused kernels. Compilation time: 25.722 seconds 0: time to initialize megatron (seconds): 91.385 0: [after megatron is initialized] datetime: 2023-03-16 23:23:39 0: building GPT model ... 0: [2023-03-16 23:23:39,538] [INFO] [utils.py:827:see_memory_usage] Before Building Model 0: [2023-03-16 23:23:39,539] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB 0: [2023-03-16 23:23:39,539] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.59 GB, percent = 6.1% 0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None 0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi 0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, 
model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=46, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63} 0: [2023-03-16 23:23:41,532] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer 0: stage=0 layers=22 0: 0: _to_float16 0: 1: EmbeddingPipe 0: 2: 0: 3: ParallelTransformerLayerPipe 0: 4: ParallelTransformerLayerPipe 0:
5: ParallelTransformerLayerPipe 0: 6: ParallelTransformerLayerPipe 0: 7: ParallelTransformerLayerPipe 0: 8: ParallelTransformerLayerPipe 0: 9: ParallelTransformerLayerPipe 0: 10: ParallelTransformerLayerPipe 0: 11: ParallelTransformerLayerPipe 0: 12: ParallelTransformerLayerPipe 0: 13: ParallelTransformerLayerPipe 0: 14: ParallelTransformerLayerPipe 0: 15: ParallelTransformerLayerPipe 0: 16: ParallelTransformerLayerPipe 0: 17: ParallelTransformerLayerPipe 0: 18: undo 0: 19: MixedFusedLayerNorm 0: 20: EmbeddingPipe 0: 21: float16_to_fp32 0: loss: CrossEntropy 0: [2023-03-16 23:23:41,788] [INFO] [utils.py:827:see_memory_usage] After Building Model 0: [2023-03-16 23:23:41,788] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.29 GB Max_CA 0 GB 0: [2023-03-16 23:23:41,789] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.61 GB, percent = 6.1% 0: setting training iterations to 115203 0: > learning rate decay style: cosine 0: DeepSpeed is enabled. 0: [2023-03-16 23:23:41,790] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown 0: [2023-03-16 23:23:55,266] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False 0: [2023-03-16 23:23:55,267] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer 0: [2023-03-16 23:23:55,267] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer 0: [2023-03-16 23:23:55,271] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam 0: [2023-03-16 23:23:55,271] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer 0: [2023-03-16 23:23:55,387] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer 0: [2023-03-16 23:23:55,388] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.29 GB CA 0.31 GB Max_CA 0 GB 0: [2023-03-16 23:23:55,388] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 
31.29 GB, percent = 6.2% 0: ninja: no work to do. 0: Time to load utils op: 0.15475678443908691 seconds 0: Time to load utils op: 0.001177072525024414 seconds 4: ninja: no work to do. 4: Time to load utils op: 0.15273833274841309 seconds 4: Time to load utils op: 0.3043992519378662 seconds 0: Time to load utils op: 0.20263099670410156 seconds 7: Time to load utils op: 0.310671329498291 seconds 0: Time to load utils op: 0.2025458812713623 seconds 0: Time to load utils op: 0.2026665210723877 seconds 0: Time to load utils op: 0.2028193473815918 seconds 0: Time to load utils op: 0.2029268741607666 seconds 0: Time to load utils op: 0.20311784744262695 seconds 0: Time to load utils op: 0.2032308578491211 seconds 4: Time to load utils op: 0.2021956443786621 seconds 4: Time to load utils op: 0.20206928253173828 seconds 4: Time to load utils op: 0.20261311531066895 seconds 4: Time to load utils op: 0.20233440399169922 seconds 4: Time to load utils op: 0.20287322998046875 seconds 4: Time to load utils op: 0.20258474349975586 seconds 7: Time to load utils op: 0.20245814323425293 seconds 7: Time to load utils op: 0.20219039916992188 seconds 7: Time to load utils op: 0.2028672695159912 seconds 7: Time to load utils op: 0.2029857635498047 seconds 7: Time to load utils op: 0.20295119285583496 seconds 7: Time to load utils op: 0.20309114456176758 seconds 7: Time to load utils op: 0.2033100128173828 seconds 1: Time to load utils op: 0.21236515045166016 seconds 1: Time to load utils op: 0.21236896514892578 seconds 1: Time to load utils op: 0.21236729621887207 seconds 1: Time to load utils op: 0.2123713493347168 seconds 1: Time to load utils op: 0.21236777305603027 seconds 1: Time to load utils op: 0.21237897872924805 seconds 1: Time to load utils op: 0.21237587928771973 seconds 1: Time to load utils op: 0.21238446235656738 seconds 3: Time to load utils op: 0.21043133735656738 seconds 0: Time to load utils op: 0.00042700767517089844 seconds 3: Time to load utils op: 0.2067427635192871
seconds 3: Time to load utils op: 0.20752930641174316 seconds 3: Time to load utils op: 0.2070906162261963 seconds 3: Time to load utils op: 0.2068922519683838 seconds 3: Time to load utils op: 0.20636582374572754 seconds 3: Time to load utils op: 0.2067558765411377 seconds 3: Time to load utils op: 0.20638728141784668 seconds 5: Time to load utils op: 0.2120959758758545 seconds 5: Time to load utils op: 0.2130134105682373 seconds 5: Time to load utils op: 0.2128298282623291 seconds 5: Time to load utils op: 0.21332550048828125 seconds 5: Time to load utils op: 0.21227216720581055 seconds 5: Time to load utils op: 0.21324896812438965 seconds 0: Time to load utils op: 0.0003972053527832031 seconds 5: Time to load utils op: 0.2121415138244629 seconds 5: Time to load utils op: 0.2133176326751709 seconds 4: Time to load utils op: 0.0005273818969726562 seconds 0: Time to load utils op: 0.0003771781921386719 seconds 4: Time to load utils op: 0.0004954338073730469 seconds 0: Time to load utils op: 0.00038886070251464844 seconds 0: Time to load utils op: 0.0003712177276611328 seconds 0: Time to load utils op: 0.0004062652587890625 seconds 4: Time to load utils op: 0.0003769397735595703 seconds 4: Time to load utils op: 0.0003540515899658203 seconds 4: Time to load utils op: 0.0003867149353027344 seconds 4: Time to load utils op: 0.0004019737243652344 seconds 4: Time to load utils op: 0.0003681182861328125 seconds 4: Time to load utils op: 0.0004210472106933594 seconds 2: Time to load utils op: 0.22558069229125977 seconds 2: Time to load utils op: 0.22560501098632812 seconds 2: Time to load utils op: 0.22561192512512207 seconds 2: Time to load utils op: 0.22560358047485352 seconds 2: Time to load utils op: 0.2256169319152832 seconds 2: Time to load utils op: 0.2256162166595459 seconds 2: Time to load utils op: 0.2256166934967041 seconds 2: Time to load utils op: 0.2256317138671875 seconds 6: Time to load utils op: 0.2206261157989502 seconds 6: Time to load utils op:
0.2206556797027588 seconds 6: Time to load utils op: 0.22068285942077637 seconds 6: Time to load utils op: 0.22069096565246582 seconds 6: Time to load utils op: 0.2206883430480957 seconds 6: Time to load utils op: 0.22071146965026855 seconds 6: Time to load utils op: 0.22069239616394043 seconds 6: Time to load utils op: 0.2207179069519043 seconds 7: Time to load utils op: 0.0005655288696289062 seconds 7: Time to load utils op: 0.0005199909210205078 seconds 7: Time to load utils op: 0.0005748271942138672 seconds 7: Time to load utils op: 0.0005500316619873047 seconds 7: Time to load utils op: 0.0005567073822021484 seconds 7: Time to load utils op: 0.0005886554718017578 seconds 7: Time to load utils op: 0.0006072521209716797 seconds 7: Time to load utils op: 0.0005755424499511719 seconds 5: Time to load utils op: 0.0009953975677490234 seconds 5: Time to load utils op: 0.0011470317840576172 seconds 5: Time to load utils op: 0.001453399658203125 seconds 5: Time to load utils op: 0.0014345645904541016 seconds 5: Time to load utils op: 0.00139617919921875 seconds 5: Time to load utils op: 0.0014424324035644531 seconds 5: Time to load utils op: 0.0014302730560302734 seconds 5: Time to load utils op: 0.0014736652374267578 seconds 6: Time to load utils op: 0.0007827281951904297 seconds 1: Time to load utils op: 0.0007424354553222656 seconds 1: Time to load utils op: 0.0007781982421875 seconds 1: Time to load utils op: 0.0009431838989257812 seconds 2: Time to load utils op: 0.0009343624114990234 seconds 6: Time to load utils op: 0.0010597705841064453 seconds 6: Time to load utils op: 0.0011343955993652344 seconds 6: Time to load utils op: 0.0010919570922851562 seconds 6: Time to load utils op: 0.0010843276977539062 seconds 1: Time to load utils op: 0.001176595687866211 seconds 1: Time to load utils op: 0.0010976791381835938 seconds 6: Time to load utils op: 0.0011067390441894531 seconds 6: Time to load utils op: 0.0010530948638916016 seconds 6: Time to load utils op:
0.0011713504791259766 seconds 1: Time to load utils op: 0.0011072158813476562 seconds 1: Time to load utils op: 0.0011146068572998047 seconds 1: Time to load utils op: 0.0012285709381103516 seconds 2: Time to load utils op: 0.0011856555938720703 seconds 2: Time to load utils op: 0.0012767314910888672 seconds 2: Time to load utils op: 0.0013148784637451172 seconds 2: Time to load utils op: 0.0012919902801513672 seconds 2: Time to load utils op: 0.0012395381927490234 seconds 2: Time to load utils op: 0.0012927055358886719 seconds 2: Time to load utils op: 0.0013556480407714844 seconds 3: Time to load utils op: 0.0005304813385009766 seconds 3: Time to load utils op: 0.0005481243133544922 seconds 3: Time to load utils op: 0.00047898292541503906 seconds 3: Time to load utils op: 0.00046443939208984375 seconds 3: Time to load utils op: 0.0004799365997314453 seconds 3: Time to load utils op: 0.00047135353088378906 seconds 3: Time to load utils op: 0.0005133152008056641 seconds 3: Time to load utils op: 0.0005099773406982422 seconds 0: [2023-03-16 23:23:55,715] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 0: [2023-03-16 23:23:55,716] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.31 GB Max_CA 0 GB 0: [2023-03-16 23:23:55,716] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.44 GB, percent = 6.2% 0: [2023-03-16 23:23:55,835] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 0: [2023-03-16 23:23:55,835] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 GB 0: [2023-03-16 23:23:55,836] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.44 GB, percent = 6.2% 0: [2023-03-16 23:23:55,941] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 0: [2023-03-16 23:23:55,941] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 GB 0: [2023-03-16 23:23:55,942] [INFO] [utils.py:836:see_memory_usage] CPU Virtual
Memory: used = 31.44 GB, percent = 6.2% 0: [2023-03-16 23:23:56,045] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 0: [2023-03-16 23:23:56,046] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB 0: [2023-03-16 23:23:56,046] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.44 GB, percent = 6.2% 0: [2023-03-16 23:23:56,148] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 0: [2023-03-16 23:23:56,149] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB 0: [2023-03-16 23:23:56,149] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.44 GB, percent = 6.2% 0: [2023-03-16 23:23:56,253] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 0: [2023-03-16 23:23:56,254] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB 0: [2023-03-16 23:23:56,254] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.44 GB, percent = 6.2% 0: [2023-03-16 23:23:56,356] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer 0: [2023-03-16 23:23:56,357] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB 0: [2023-03-16 23:23:56,357] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.44 GB, percent = 6.2% 0: [2023-03-16 23:23:56,463] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer 0: [2023-03-16 23:23:56,464] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 1 GB 0: [2023-03-16 23:23:56,464] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.44 GB, percent = 6.2% 0: [2023-03-16 23:23:56,567] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer 0: [2023-03-16 23:23:56,567] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 1 GB 0: [2023-03-16 23:23:56,567] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 
31.44 GB, percent = 6.2% 0: [2023-03-16 23:23:56,568] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam 0: [2023-03-16 23:23:56,568] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler 0: [2023-03-16 23:23:56,568] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = 0: [2023-03-16 23:23:56,568] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 0: [2023-03-16 23:23:56,568] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] activation_checkpointing_config { 0: "partition_activations": false, 0: "contiguous_memory_optimization": false, 0: "cpu_checkpointing": false, 0: "number_checkpoints": null, 0: "synchronize_checkpoint_boundary": false, 0: "profile": false 0: } 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] amp_enabled .................. False 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] amp_params ................... False 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] autotuning_config ............ 
{ 0: "enabled": false, 0: "start_step": null, 0: "end_step": null, 0: "metric_path": null, 0: "arg_mappings": null, 0: "metric": "throughput", 0: "model_info": null, 0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", 0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", 0: "overwrite": true, 0: "fast": true, 0: "start_profile_step": 3, 0: "end_profile_step": 5, 0: "tuner_type": "gridsearch", 0: "tuner_early_stopping": 5, 0: "tuner_num_trials": 50, 0: "model_info_path": null, 0: "mp_size": 1, 0: "max_train_batch_size": null, 0: "min_train_batch_size": 1, 0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, 0: "min_train_micro_batch_size_per_gpu": 1, 0: "num_tuning_micro_batch_sizes": 3 0: } 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] bfloat16_enabled ............. True 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] comms_config ................. 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] communication_data_type ...... None 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] curriculum_enabled ........... False 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] curriculum_params ............ False 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] dataloader_drop_last ......... False 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] disable_allgather ............ False 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] dump_state ................... False 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] eigenvalue_layer_name ........
bert.encoder.layer 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] elasticity_enabled ........... False 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] flops_profiler_config ........ { 0: "enabled": false, 0: "profile_step": 1, 0: "module_depth": -1, 0: "top_modules": 1, 0: "detailed": true, 0: "output_file": null 0: } 0: [2023-03-16 23:23:56,569] [INFO] [config.py:1011:print] fp16_auto_cast ............... None 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] fp16_enabled ................. False 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] global_rank .................. 0 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] load_universal_checkpoint .... False 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] loss_scale ................... 1.0 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] memory_breakdown ............. False 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] monitor_config ............... 
0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] nebula_config ................ { 0: "enabled": false, 0: "persistent_storage_path": null, 0: "persistent_time_interval": 100, 0: "num_of_version_in_retention": 2, 0: "enable_nebula_load": true, 0: "load_path": null 0: } 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] optimizer_name ............... None 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] optimizer_params ............. None 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] pld_enabled .................. False 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] pld_params ................... False 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] prescale_gradients ........... False 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] scheduler_name ............... None 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] scheduler_params ............. None 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] sparse_attention ............. None 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] steps_per_print .............. 2000 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] train_batch_size ............. 256 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 4 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] use_node_local_storage ....... False 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] wall_clock_breakdown ......... False 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] world_size ................... 
64 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] zero_enabled ................. False 0: [2023-03-16 23:23:56,570] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 0: [2023-03-16 23:23:56,570] [INFO] [config.py:996:print_user_config] json = { 0: "train_micro_batch_size_per_gpu": 4, 0: "train_batch_size": 256, 0: "gradient_clipping": 1.0, 0: "zero_optimization": { 0: "stage": 0 0: }, 0: "bf16": { 0: "enabled": true 0: }, 0: "steps_per_print": 2.000000e+03, 0: "wall_clock_breakdown": false 0: } 0: Time to load utils op: 0.0004239082336425781 seconds 0: [2023-03-16 23:23:56,571] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=4 0: [2023-03-16 23:23:56,582] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=22 [0, 22) STAGE_PARAMS=146525952 (146.526M) TOTAL_PARAMS=146525952 (146.526M) UNIQUE_PARAMS=146525952 (146.526M) 0: [2023-03-16 23:23:56,590] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 0: WARNING: could not find the metadata file checkpoints_146m60b400m 0: will not load any checkpoints and will start from random 6: [2023-03-16 23:23:56,590] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 6: [2023-03-16 23:23:56,590] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 0: [2023-03-16 23:23:56,590] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
6: [2023-03-16 23:23:56,590] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 0: [2023-03-16 23:23:56,590] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 4: [2023-03-16 23:23:56,590] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 4: [2023-03-16 23:23:56,590] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 4: [2023-03-16 23:23:56,590] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 6: [2023-03-16 23:23:56,590] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 0: [2023-03-16 23:23:56,590] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
4: [2023-03-16 23:23:56,590] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 7: [2023-03-16 23:23:56,590] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 7: [2023-03-16 23:23:56,590] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 7: [2023-03-16 23:23:56,590] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 7: [2023-03-16 23:23:56,590] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 3: [2023-03-16 23:23:56,590] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 3: [2023-03-16 23:23:56,590] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
2: [2023-03-16 23:23:56,590] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 2: [2023-03-16 23:23:56,590] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 2: [2023-03-16 23:23:56,590] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 3: [2023-03-16 23:23:56,590] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 7: [2023-03-16 23:23:56,590] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 2: [2023-03-16 23:23:56,590] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 5: [2023-03-16 23:23:56,590] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b400m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
7: time (ms) | load-checkpoint: 8.22 0: estimated model parameters: 0.146525952 0: estimated model parameters without embeddings: 0.106319616 0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-16 23:23:57 0: > building train, validation, and test datasets ... 0: > datasets target sizes (minimum size): 0: train: 29492188 0: validation: 3072 0: test: 256 0: > building train, validation, and test datasets for GPT ... 0: > building dataset index ... 0: reading sizes... 0: reading pointers... 0: reading document index... 0: creating numpy buffer of mmap... 0: creating memory view of numpy buffer... 
0: > finished creating indexed dataset in 0.008479 seconds 0: number of documents: 835726 0: > dataset split: 0: train: 0: document indices in [0, 835726) total of 835726 documents 0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document_train_indexmap_29492188ns_2048sl_1234s_doc_idx.npy 0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document_train_indexmap_29492188ns_2048sl_1234s_sample_idx.npy 0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_400M_text_document_train_indexmap_29492188ns_2048sl_1234s_shuffle_idx.npy 0: loaded indexed file in 0.099 seconds 0: total number of samples: 29655283 0: total number of epochs: 152 0: > building dataset index ... 0: reading sizes... 0: reading pointers... 0: reading document index... 0: creating numpy buffer of mmap... 0: creating memory view of numpy buffer... 0: > finished creating indexed dataset in 0.040353 seconds 0: number of documents: 364608 0: > dataset split: 0: validation: 0: document indices in [0, 364608) total of 364608 documents 0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_3072ns_2048sl_1234s_doc_idx.npy 0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_3072ns_2048sl_1234s_sample_idx.npy 0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_3072ns_2048sl_1234s_shuffle_idx.npy 0: loaded indexed file in 0.105 seconds 0: total number of samples: 84978 0: total number of epochs: 1 0: > finished creating GPT datasets ... 0: [after dataloaders are built] datetime: 2023-03-16 23:24:10 0: done with setup ... 0: training ... 
0: Number of parameters: [tensor rank - pipeline rank] w/ and w/o embeddings: 7: time (ms) | model-and-optimizer-setup: 17554.60 | train/valid/test-data-iterators-setup: 12786.11 0: [000-000] 0.1465B / 0.1063B 0: [before the start of training step] datetime: 2023-03-16 23:24:10 0: [2023-03-16 23:24:10,940] [INFO] [checkpointing.py:553:forward] Activation Checkpointing Information 0: [2023-03-16 23:24:10,940] [INFO] [checkpointing.py:554:forward] ----Partition Activations False, CPU CHECKPOINTING False 0: [2023-03-16 23:24:10,940] [INFO] [checkpointing.py:557:forward] ----contiguous Memory Checkpointing False with None total layers 0: [2023-03-16 23:24:10,940] [INFO] [checkpointing.py:560:forward] ----Synchronization False 0: [2023-03-16 23:24:10,940] [INFO] [checkpointing.py:561:forward] ----Profiling time in checkpointing False 0: [Rank 0] (after 100 iterations) memory (MB) | allocated: 2728.54736328125 | max allocated: 5305.046875 | reserved: 6818.0 | max reserved: 6818.0 7: iteration 100/ 115203 | consumed samples: 25600 | consumed tokens: 52428800 | elapsed time per iteration (s): 0.49 | learning rate: 1.736E-05 | global batch size: 256 | lm loss: 9.484871E+00 | grad norm: 1.586 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 519.448 | TFLOPs: 24.25 | 7: iteration 200/ 115203 | consumed samples: 51200 | consumed tokens: 104857600 | elapsed time per iteration (s): 0.39 | learning rate: 3.472E-05 | global batch size: 256 | lm loss: 7.679164E+00 | grad norm: 0.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 662.541 | TFLOPs: 30.92 | 7: iteration 300/ 115203 | consumed samples: 76800 | consumed tokens: 157286400 | elapsed time per iteration (s): 0.38 | learning rate: 5.208E-05 | global batch size: 256 | lm loss: 6.815796E+00 | grad norm: 1.008 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 671.018 
| TFLOPs: 31.32 | 7: iteration 400/ 115203 | consumed samples: 102400 | consumed tokens: 209715200 | elapsed time per iteration (s): 0.39 | learning rate: 6.944E-05 | global batch size: 256 | lm loss: 6.469844E+00 | grad norm: 1.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 662.993 | TFLOPs: 30.95 | 7: iteration 500/ 115203 | consumed samples: 128000 | consumed tokens: 262144000 | elapsed time per iteration (s): 0.39 | learning rate: 8.680E-05 | global batch size: 256 | lm loss: 6.268621E+00 | grad norm: 1.179 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 662.286 | TFLOPs: 30.91 | 7: iteration 600/ 115203 | consumed samples: 153600 | consumed tokens: 314572800 | elapsed time per iteration (s): 0.39 | learning rate: 1.042E-04 | global batch size: 256 | lm loss: 6.087438E+00 | grad norm: 0.788 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 654.844 | TFLOPs: 30.57 | 7: iteration 700/ 115203 | consumed samples: 179200 | consumed tokens: 367001600 | elapsed time per iteration (s): 0.39 | learning rate: 1.215E-04 | global batch size: 256 | lm loss: 5.902330E+00 | grad norm: 1.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 664.774 | TFLOPs: 31.03 | 7: iteration 800/ 115203 | consumed samples: 204800 | consumed tokens: 419430400 | elapsed time per iteration (s): 0.38 | learning rate: 1.389E-04 | global batch size: 256 | lm loss: 5.717325E+00 | grad norm: 1.142 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.215 | TFLOPs: 31.28 | 7: iteration 900/ 115203 | consumed samples: 230400 | consumed tokens: 471859200 | elapsed time per iteration (s): 0.38 | learning rate: 1.562E-04 | global batch size: 256 | lm loss: 5.533638E+00 | grad norm: 0.983 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 671.033 | TFLOPs: 31.32 | 7: iteration 1000/ 115203 | consumed samples: 256000 | consumed tokens: 524288000 | elapsed time per iteration (s): 0.38 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 5.359065E+00 | grad norm: 0.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.798 | TFLOPs: 31.45 | 7: iteration 1100/ 115203 | consumed samples: 281600 | consumed tokens: 576716800 | elapsed time per iteration (s): 0.38 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 5.194250E+00 | grad norm: 0.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.111 | TFLOPs: 31.61 | 7: iteration 1200/ 115203 | consumed samples: 307200 | consumed tokens: 629145600 | elapsed time per iteration (s): 0.38 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.012748E+00 | grad norm: 0.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.258 | TFLOPs: 31.47 | 7: iteration 1300/ 115203 | consumed samples: 332800 | consumed tokens: 681574400 | elapsed time per iteration (s): 0.38 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.828039E+00 | grad norm: 0.943 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 671.273 | TFLOPs: 31.33 | 7: iteration 1400/ 115203 | consumed samples: 358400 | consumed tokens: 734003200 | elapsed time per iteration (s): 0.38 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.700465E+00 | grad norm: 0.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.115 | TFLOPs: 31.79 | 7: iteration 1500/ 115203 | consumed samples: 384000 | consumed tokens: 786432000 | elapsed time per iteration (s): 0.37 | learning rate: 2.000E-04 | global batch size: 256 | lm 
loss: 4.609189E+00 | grad norm: 0.597 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.943 | TFLOPs: 31.97 | 7: iteration 1600/ 115203 | consumed samples: 409600 | consumed tokens: 838860800 | elapsed time per iteration (s): 0.38 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.541578E+00 | grad norm: 0.605 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.117 | TFLOPs: 31.70 | 7: iteration 1700/ 115203 | consumed samples: 435200 | consumed tokens: 891289600 | elapsed time per iteration (s): 0.38 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.480572E+00 | grad norm: 0.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.447 | TFLOPs: 31.57 | 7: iteration 1800/ 115203 | consumed samples: 460800 | consumed tokens: 943718400 | elapsed time per iteration (s): 0.38 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.438905E+00 | grad norm: 0.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.471 | TFLOPs: 31.58 | 7: iteration 1900/ 115203 | consumed samples: 486400 | consumed tokens: 996147200 | elapsed time per iteration (s): 0.38 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.392836E+00 | grad norm: 0.675 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.033 | TFLOPs: 31.74 | 0: [2023-03-16 23:37:03,487] [INFO] [logging.py:68:log_dist] [Rank 0] step=2000, skipped=0, lr=[0.0001999754506631688, 0.0001999754506631688, 0.0001999754506631688], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 2000/ 115203 | consumed samples: 512000 | consumed tokens: 1048576000 | elapsed time per iteration (s): 0.38 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.346132E+00 | grad norm: 0.490 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.295 | TFLOPs: 31.61 | 0: steps: 2000 loss: 4.3318 iter time (s): 0.385 samples/sec: 665.734 7: iteration 2100/ 115203 | consumed samples: 537600 | consumed tokens: 1101004800 | elapsed time per iteration (s): 0.37 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.316991E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.071 | TFLOPs: 31.93 | 7: iteration 2200/ 115203 | consumed samples: 563200 | consumed tokens: 1153433600 | elapsed time per iteration (s): 0.37 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.287134E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.300 | TFLOPs: 31.94 | 7: iteration 2300/ 115203 | consumed samples: 588800 | consumed tokens: 1205862400 | elapsed time per iteration (s): 0.38 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.255029E+00 | grad norm: 0.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.073 | TFLOPs: 31.84 | 7: iteration 2400/ 115203 | consumed samples: 614400 | consumed tokens: 1258291200 | elapsed time per iteration (s): 0.38 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 4.228290E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.140 | TFLOPs: 31.79 | 7: iteration 2500/ 115203 | consumed samples: 640000 | consumed tokens: 1310720000 | elapsed time per iteration (s): 0.37 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 4.202252E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.505 | TFLOPs: 32.04 | 7: iteration 2600/ 115203 | consumed samples: 665600 | consumed tokens: 1363148800 | 
elapsed time per iteration (s): 0.37 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 4.179962E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.750 | TFLOPs: 32.05 | 7: iteration 2700/ 115203 | consumed samples: 691200 | consumed tokens: 1415577600 | elapsed time per iteration (s): 0.37 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 4.154895E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.119 | TFLOPs: 32.21 | 7: iteration 2800/ 115203 | consumed samples: 716800 | consumed tokens: 1468006400 | elapsed time per iteration (s): 0.37 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 4.138191E+00 | grad norm: 0.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.200 | TFLOPs: 31.98 | 7: iteration 2900/ 115203 | consumed samples: 742400 | consumed tokens: 1520435200 | elapsed time per iteration (s): 0.37 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 4.116041E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.920 | TFLOPs: 32.06 | 7: iteration 3000/ 115203 | consumed samples: 768000 | consumed tokens: 1572864000 | elapsed time per iteration (s): 0.37 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 4.097479E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.415 | TFLOPs: 32.18 | 7: iteration 3100/ 115203 | consumed samples: 793600 | consumed tokens: 1625292800 | elapsed time per iteration (s): 0.37 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 4.083486E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.331 | TFLOPs: 32.18 | 7: 
iteration 3200/ 115203 | consumed samples: 819200 | consumed tokens: 1677721600 | elapsed time per iteration (s): 0.37 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 4.068229E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.057 | TFLOPs: 32.16 | 7: iteration 3300/ 115203 | consumed samples: 844800 | consumed tokens: 1730150400 | elapsed time per iteration (s): 0.37 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.053554E+00 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.936 | TFLOPs: 32.20 | 7: iteration 3400/ 115203 | consumed samples: 870400 | consumed tokens: 1782579200 | elapsed time per iteration (s): 0.37 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.036441E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.171 | TFLOPs: 32.03 | 7: iteration 3500/ 115203 | consumed samples: 896000 | consumed tokens: 1835008000 | elapsed time per iteration (s): 0.37 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.018178E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.870 | TFLOPs: 32.15 | 7: iteration 3600/ 115203 | consumed samples: 921600 | consumed tokens: 1887436800 | elapsed time per iteration (s): 0.37 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.007709E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.052 | TFLOPs: 32.16 | 7: iteration 3700/ 115203 | consumed samples: 947200 | consumed tokens: 1939865600 | elapsed time per iteration (s): 0.37 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.996023E+00 | grad norm: 0.464 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 686.981 | TFLOPs: 32.07 | 7: iteration 3800/ 115203 | consumed samples: 972800 | consumed tokens: 1992294400 | elapsed time per iteration (s): 0.37 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.985421E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.107 | TFLOPs: 32.16 | 7: iteration 3900/ 115203 | consumed samples: 998400 | consumed tokens: 2044723200 | elapsed time per iteration (s): 0.37 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.970894E+00 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.855 | TFLOPs: 32.25 | 0: [2023-03-16 23:49:28,428] [INFO] [logging.py:68:log_dist] [Rank 0] step=4000, skipped=0, lr=[0.00019972320825211248, 0.00019972320825211248, 0.00019972320825211248], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 4000/ 115203 | consumed samples: 1024000 | consumed tokens: 2097152000 | elapsed time per iteration (s): 0.37 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.961776E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.433 | TFLOPs: 32.23 | 0: steps: 4000 loss: 3.9176 iter time (s): 0.370 samples/sec: 691.026 7: iteration 4100/ 115203 | consumed samples: 1049600 | consumed tokens: 2149580800 | elapsed time per iteration (s): 0.37 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.953803E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.180 | TFLOPs: 32.22 | 7: iteration 4200/ 115203 | consumed samples: 1075200 | consumed tokens: 2202009600 | elapsed time per iteration (s): 0.37 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.938759E+00 | grad norm: 0.426 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.716 | TFLOPs: 32.29 | 7: iteration 4300/ 115203 | consumed samples: 1100800 | consumed tokens: 2254438400 | elapsed time per iteration (s): 0.37 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.928066E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.795 | TFLOPs: 32.29 | 7: iteration 4400/ 115203 | consumed samples: 1126400 | consumed tokens: 2306867200 | elapsed time per iteration (s): 0.37 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.916086E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.834 | TFLOPs: 32.29 | 7: iteration 4500/ 115203 | consumed samples: 1152000 | consumed tokens: 2359296000 | elapsed time per iteration (s): 0.37 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.915997E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.695 | TFLOPs: 32.29 | 7: iteration 4600/ 115203 | consumed samples: 1177600 | consumed tokens: 2411724800 | elapsed time per iteration (s): 0.37 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.900874E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.385 | TFLOPs: 32.32 | 7: iteration 4700/ 115203 | consumed samples: 1203200 | consumed tokens: 2464153600 | elapsed time per iteration (s): 0.37 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.892840E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.907 | TFLOPs: 32.16 | 7: iteration 4800/ 115203 | consumed samples: 1228800 | consumed tokens: 2516582400 | elapsed time per iteration (s): 0.37 | learning rate: 1.995E-04 | global batch 
size: 256 | lm loss: 3.882703E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.280 | TFLOPs: 32.31 | 7: iteration 4900/ 115203 | consumed samples: 1254400 | consumed tokens: 2569011200 | elapsed time per iteration (s): 0.37 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.875167E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.661 | TFLOPs: 32.24 | 7: iteration 5000/ 115203 | consumed samples: 1280000 | consumed tokens: 2621440000 | elapsed time per iteration (s): 0.37 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.870123E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.219 | TFLOPs: 32.31 | 7: iteration 5100/ 115203 | consumed samples: 1305600 | consumed tokens: 2673868800 | elapsed time per iteration (s): 0.37 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.858556E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.862 | TFLOPs: 32.34 | 7: iteration 5200/ 115203 | consumed samples: 1331200 | consumed tokens: 2726297600 | elapsed time per iteration (s): 0.37 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.857785E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.836 | TFLOPs: 32.34 | 7: iteration 5300/ 115203 | consumed samples: 1356800 | consumed tokens: 2778726400 | elapsed time per iteration (s): 0.37 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.847967E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.542 | TFLOPs: 32.00 | 7: iteration 5400/ 115203 | consumed samples: 1382400 | consumed tokens: 
2831155200 | elapsed time per iteration (s): 0.37 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.842799E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.037 | TFLOPs: 32.26 | 7: iteration 5500/ 115203 | consumed samples: 1408000 | consumed tokens: 2883584000 | elapsed time per iteration (s): 0.37 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.830702E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.491 | TFLOPs: 32.32 | 7: iteration 5600/ 115203 | consumed samples: 1433600 | consumed tokens: 2936012800 | elapsed time per iteration (s): 0.37 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.824853E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.213 | TFLOPs: 32.17 | 7: iteration 5700/ 115203 | consumed samples: 1459200 | consumed tokens: 2988441600 | elapsed time per iteration (s): 0.37 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.816799E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.083 | TFLOPs: 32.21 | 7: iteration 5800/ 115203 | consumed samples: 1484800 | consumed tokens: 3040870400 | elapsed time per iteration (s): 0.37 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.812826E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.912 | TFLOPs: 32.34 | 7: iteration 5900/ 115203 | consumed samples: 1510400 | consumed tokens: 3093299200 | elapsed time per iteration (s): 0.37 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.801189E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.494 | 
TFLOPs: 32.28 | 0: [2023-03-17 00:01:49,257] [INFO] [logging.py:68:log_dist] [Rank 0] step=6000, skipped=0, lr=[0.00019919872690019844, 0.00019919872690019844, 0.00019919872690019844], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 6000/ 115203 | consumed samples: 1536000 | consumed tokens: 3145728000 | elapsed time per iteration (s): 0.37 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.796833E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.277 | TFLOPs: 32.22 | 0: steps: 6000 loss: 3.8422 iter time (s): 0.368 samples/sec: 695.084 7: iteration 6100/ 115203 | consumed samples: 1561600 | consumed tokens: 3198156800 | elapsed time per iteration (s): 0.37 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.792737E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.052 | TFLOPs: 32.35 | 7: iteration 6200/ 115203 | consumed samples: 1587200 | consumed tokens: 3250585600 | elapsed time per iteration (s): 0.37 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.785873E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.126 | TFLOPs: 32.35 | 7: iteration 6300/ 115203 | consumed samples: 1612800 | consumed tokens: 3303014400 | elapsed time per iteration (s): 0.37 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.782452E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.830 | TFLOPs: 32.29 | 7: iteration 6400/ 115203 | consumed samples: 1638400 | consumed tokens: 3355443200 | elapsed time per iteration (s): 0.37 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.775261E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 691.290 | TFLOPs: 32.27 | 7: iteration 6500/ 115203 | consumed samples: 1664000 | consumed tokens: 3407872000 | elapsed time per iteration (s): 0.37 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.772836E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.280 | TFLOPs: 32.08 | 7: iteration 6600/ 115203 | consumed samples: 1689600 | consumed tokens: 3460300800 | elapsed time per iteration (s): 0.37 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.768607E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.013 | TFLOPs: 32.21 | 7: iteration 6700/ 115203 | consumed samples: 1715200 | consumed tokens: 3512729600 | elapsed time per iteration (s): 0.37 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.757554E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.058 | TFLOPs: 32.35 | 7: iteration 6800/ 115203 | consumed samples: 1740800 | consumed tokens: 3565158400 | elapsed time per iteration (s): 0.37 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.756965E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.027 | TFLOPs: 32.30 | 7: iteration 6900/ 115203 | consumed samples: 1766400 | consumed tokens: 3617587200 | elapsed time per iteration (s): 0.37 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.751232E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.802 | TFLOPs: 32.34 | 7: iteration 7000/ 115203 | consumed samples: 1792000 | consumed tokens: 3670016000 | elapsed time per iteration (s): 0.37 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.752580E+00 | grad norm: 0.327 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.162 | TFLOPs: 32.21 | 7: iteration 7100/ 115203 | consumed samples: 1817600 | consumed tokens: 3722444800 | elapsed time per iteration (s): 0.37 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.743446E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.320 | TFLOPs: 32.36 | 7: iteration 7200/ 115203 | consumed samples: 1843200 | consumed tokens: 3774873600 | elapsed time per iteration (s): 0.37 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.743470E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.823 | TFLOPs: 32.34 | 7: iteration 7300/ 115203 | consumed samples: 1868800 | consumed tokens: 3827302400 | elapsed time per iteration (s): 0.37 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.731718E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.825 | TFLOPs: 32.34 | 7: iteration 7400/ 115203 | consumed samples: 1894400 | consumed tokens: 3879731200 | elapsed time per iteration (s): 0.37 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.729053E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.945 | TFLOPs: 32.34 | 7: iteration 7500/ 115203 | consumed samples: 1920000 | consumed tokens: 3932160000 | elapsed time per iteration (s): 0.37 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.725275E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.845 | TFLOPs: 32.34 | 7: iteration 7600/ 115203 | consumed samples: 1945600 | consumed tokens: 3984588800 | elapsed time per iteration (s): 0.37 | learning rate: 
1.986E-04 | global batch size: 256 | lm loss: 3.718953E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.930 | TFLOPs: 32.30 | 7: iteration 7700/ 115203 | consumed samples: 1971200 | consumed tokens: 4037017600 | elapsed time per iteration (s): 0.37 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.715694E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.880 | TFLOPs: 32.34 | 7: iteration 7800/ 115203 | consumed samples: 1996800 | consumed tokens: 4089446400 | elapsed time per iteration (s): 0.37 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.708808E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.736 | TFLOPs: 32.33 | 7: iteration 7900/ 115203 | consumed samples: 2022400 | consumed tokens: 4141875200 | elapsed time per iteration (s): 0.37 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.707235E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.378 | TFLOPs: 32.36 | 0: [2023-03-17 00:14:08,949] [INFO] [logging.py:68:log_dist] [Rank 0] step=8000, skipped=0, lr=[0.00019840359799331808, 0.00019840359799331808, 0.00019840359799331808], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 8000/ 115203 | consumed samples: 2048000 | consumed tokens: 4194304000 | elapsed time per iteration (s): 0.37 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.706508E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.374 | TFLOPs: 32.36 | 0: steps: 8000 loss: 3.7332 iter time (s): 0.368 samples/sec: 696.034 7: iteration 8100/ 115203 | consumed samples: 2073600 | consumed tokens: 4246732800 | elapsed time per iteration (s): 
0.37 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.699150E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.315 | TFLOPs: 32.36 | 7: iteration 8200/ 115203 | consumed samples: 2099200 | consumed tokens: 4299161600 | elapsed time per iteration (s): 0.37 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.695820E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.411 | TFLOPs: 32.18 | 7: iteration 8300/ 115203 | consumed samples: 2124800 | consumed tokens: 4351590400 | elapsed time per iteration (s): 0.37 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.688670E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.258 | TFLOPs: 32.27 | 7: iteration 8400/ 115203 | consumed samples: 2150400 | consumed tokens: 4404019200 | elapsed time per iteration (s): 0.37 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.683102E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.473 | TFLOPs: 32.37 | 7: iteration 8500/ 115203 | consumed samples: 2176000 | consumed tokens: 4456448000 | elapsed time per iteration (s): 0.37 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.684876E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.433 | TFLOPs: 32.32 | 7: iteration 8600/ 115203 | consumed samples: 2201600 | consumed tokens: 4508876800 | elapsed time per iteration (s): 0.37 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.680855E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.667 | TFLOPs: 32.33 | 7: iteration 8700/ 115203 | 
consumed samples: 2227200 | consumed tokens: 4561305600 | elapsed time per iteration (s): 0.37 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.679194E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.109 | TFLOPs: 32.35 | 7: iteration 8800/ 115203 | consumed samples: 2252800 | consumed tokens: 4613734400 | elapsed time per iteration (s): 0.37 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.671020E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.817 | TFLOPs: 32.34 | 7: iteration 8900/ 115203 | consumed samples: 2278400 | consumed tokens: 4666163200 | elapsed time per iteration (s): 0.37 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.670084E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.022 | TFLOPs: 32.35 | 7: iteration 9000/ 115203 | consumed samples: 2304000 | consumed tokens: 4718592000 | elapsed time per iteration (s): 0.37 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.670360E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.740 | TFLOPs: 32.33 | 7: iteration 9100/ 115203 | consumed samples: 2329600 | consumed tokens: 4771020800 | elapsed time per iteration (s): 0.37 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.662920E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.372 | TFLOPs: 32.36 | 7: iteration 9200/ 115203 | consumed samples: 2355200 | consumed tokens: 4823449600 | elapsed time per iteration (s): 0.37 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.658157E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 693.538 | TFLOPs: 32.37 | 7: iteration 9300/ 115203 | consumed samples: 2380800 | consumed tokens: 4875878400 | elapsed time per iteration (s): 0.37 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.658419E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.118 | TFLOPs: 32.35 | 7: iteration 9400/ 115203 | consumed samples: 2406400 | consumed tokens: 4928307200 | elapsed time per iteration (s): 0.37 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.657258E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.712 | TFLOPs: 32.33 | 7: iteration 9500/ 115203 | consumed samples: 2432000 | consumed tokens: 4980736000 | elapsed time per iteration (s): 0.37 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.655745E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.483 | TFLOPs: 32.32 | 7: iteration 9600/ 115203 | consumed samples: 2457600 | consumed tokens: 5033164800 | elapsed time per iteration (s): 0.37 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.644161E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.239 | TFLOPs: 32.31 | 7: iteration 9700/ 115203 | consumed samples: 2483200 | consumed tokens: 5085593600 | elapsed time per iteration (s): 0.37 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.645582E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.983 | TFLOPs: 32.35 | 7: iteration 9800/ 115203 | consumed samples: 2508800 | consumed tokens: 5138022400 | elapsed time per iteration (s): 0.37 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.637863E+00 | 
grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.622 | TFLOPs: 32.38 | 7: iteration 9900/ 115203 | consumed samples: 2534400 | consumed tokens: 5190451200 | elapsed time per iteration (s): 0.37 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.640139E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 694.017 | TFLOPs: 32.39 | 0: [2023-03-17 00:26:27,963] [INFO] [logging.py:68:log_dist] [Rank 0] step=10000, skipped=0, lr=[0.00019734023411853413, 0.00019734023411853413, 0.00019734023411853413], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 10000/ 115203 | consumed samples: 2560000 | consumed tokens: 5242880000 | elapsed time per iteration (s): 0.37 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.634106E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.995 | TFLOPs: 32.39 | 0: steps: 10000 loss: 3.6663 iter time (s): 0.367 samples/sec: 696.633 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 10000 | lm loss value: 3.672648E+00 | lm loss PPL: 3.935599E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 10000 to checkpoints_146m60b400m 0: [2023-03-17 00:26:28,085] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step10000 is begin to save! 0: [2023-03-17 00:26:28,106] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/layer_01-model_00-model_states.pt... 0: [2023-03-17 00:26:28,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/layer_01-model_00-model_states.pt. 
0: [2023-03-17 00:26:28,194] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/layer_03-model_00-model_states.pt... 0: [2023-03-17 00:26:28,210] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/layer_03-model_00-model_states.pt. 0: [2023-03-17 00:26:28,210] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/layer_04-model_00-model_states.pt... 0: [2023-03-17 00:26:28,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/layer_04-model_00-model_states.pt. 0: [2023-03-17 00:26:28,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/layer_05-model_00-model_states.pt... 0: [2023-03-17 00:26:28,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/layer_05-model_00-model_states.pt. 0: [2023-03-17 00:26:28,241] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/layer_06-model_00-model_states.pt... 0: [2023-03-17 00:26:28,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/layer_06-model_00-model_states.pt. 0: [2023-03-17 00:26:28,257] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/layer_07-model_00-model_states.pt... 0: [2023-03-17 00:26:28,272] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/layer_07-model_00-model_states.pt. 0: [2023-03-17 00:26:28,272] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/layer_08-model_00-model_states.pt... 0: [2023-03-17 00:26:28,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 00:26:28,287] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/layer_09-model_00-model_states.pt... 0: [2023-03-17 00:26:28,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/layer_09-model_00-model_states.pt. 0: [2023-03-17 00:26:28,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/layer_10-model_00-model_states.pt... 0: [2023-03-17 00:26:28,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/layer_10-model_00-model_states.pt. 0: [2023-03-17 00:26:28,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/layer_11-model_00-model_states.pt... 0: [2023-03-17 00:26:28,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/layer_11-model_00-model_states.pt. 0: [2023-03-17 00:26:28,334] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/layer_12-model_00-model_states.pt... 0: [2023-03-17 00:26:28,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/layer_12-model_00-model_states.pt. 0: [2023-03-17 00:26:28,349] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/layer_13-model_00-model_states.pt... 0: [2023-03-17 00:26:28,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/layer_13-model_00-model_states.pt. 0: [2023-03-17 00:26:28,364] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/layer_14-model_00-model_states.pt... 0: [2023-03-17 00:26:28,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/layer_14-model_00-model_states.pt. 
0: [2023-03-17 00:26:28,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/layer_15-model_00-model_states.pt... 0: [2023-03-17 00:26:28,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/layer_15-model_00-model_states.pt. 0: [2023-03-17 00:26:28,395] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/layer_16-model_00-model_states.pt... 0: [2023-03-17 00:26:28,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/layer_16-model_00-model_states.pt. 0: [2023-03-17 00:26:28,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/layer_17-model_00-model_states.pt... 0: [2023-03-17 00:26:28,426] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/layer_17-model_00-model_states.pt. 0: [2023-03-17 00:26:28,426] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/layer_19-model_00-model_states.pt... 0: [2023-03-17 00:26:28,427] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/layer_19-model_00-model_states.pt. 0: [2023-03-17 00:26:28,428] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_146m60b400m/global_step10000/mp_rank_00_model_states.pt 0: [2023-03-17 00:26:28,428] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/mp_rank_00_model_states.pt... 0: [2023-03-17 00:26:28,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/mp_rank_00_model_states.pt. 0: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 
0: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
2: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
4: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
6: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:26:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:26:28,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:26:28,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:26:28,485] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 00:26:28,485] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2023-03-17 00:26:28,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:26:28,488] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 00:26:28,488] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2023-03-17 00:26:28,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:26:28,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 00:26:28,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2023-03-17 00:26:28,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:26:28,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 00:26:28,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 0: [2023-03-17 00:26:28,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:26:28,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:26:28,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 00:26:28,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 00:26:28,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 0: [2023-03-17 00:26:28,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2023-03-17 00:26:28,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:26:28,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 00:26:28,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2023-03-17 00:26:28,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:26:28,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:26:28,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:26:28,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:26:28,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 00:26:28,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 00:26:28,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 00:26:28,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 00:26:28,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2023-03-17 00:26:28,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2023-03-17 00:26:28,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2023-03-17 00:26:28,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2023-03-17 00:26:28,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:26:28,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:26:28,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:26:28,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 00:26:28,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 00:26:28,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 00:26:28,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2023-03-17 00:26:28,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2023-03-17 00:26:28,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2023-03-17 00:26:28,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:26:28,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 00:26:28,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2023-03-17 00:26:28,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:26:28,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:26:28,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:26:28,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2023-03-17 00:26:28,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 00:26:28,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 0: [2023-03-17 00:26:28,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:26:28,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:26:28,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 00:26:28,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:26:28,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 0: [2023-03-17 00:26:28,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2023-03-17 00:26:28,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 0: [2023-03-17 00:26:28,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 00:26:28,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
0: [2023-03-17 00:26:28,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:26:28,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:26:28,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 00:26:28,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 00:26:28,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 0: [2023-03-17 00:26:28,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2023-03-17 00:26:28,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:26:28,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 00:26:28,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2023-03-17 00:26:28,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:26:28,496] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 00:26:28,496] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
2: [2023-03-17 00:26:28,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:26:28,496] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 00:26:28,496] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2023-03-17 00:26:28,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:26:28,496] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 00:26:28,496] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2023-03-17 00:26:28,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:26:28,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 00:26:28,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2023-03-17 00:26:28,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:26:28,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:26:28,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 00:26:28,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:26:28,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 00:26:28,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2023-03-17 00:26:28,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 00:26:28,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2023-03-17 00:26:28,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:26:28,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2023-03-17 00:26:28,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 00:26:28,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2023-03-17 00:26:28,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:26:28,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:26:28,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:26:28,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:26:28,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:26:28,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:26:28,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 00:26:28,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:26:28,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 00:26:28,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2023-03-17 00:26:28,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 00:26:28,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
6: [2023-03-17 00:26:28,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 00:26:28,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 00:26:28,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 00:26:28,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 00:26:28,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2023-03-17 00:26:28,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2023-03-17 00:26:28,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2023-03-17 00:26:28,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2023-03-17 00:26:28,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2023-03-17 00:26:28,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:26:28,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 00:26:28,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
7: [2023-03-17 00:26:28,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:26:28,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:26:28,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:26:28,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 00:26:28,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 00:26:28,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 00:26:28,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2023-03-17 00:26:28,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2023-03-17 00:26:28,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2023-03-17 00:26:28,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:26:28,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 00:26:28,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
7: [2023-03-17 00:26:28,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:26:28,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:26:28,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:26:28,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:26:28,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 00:26:28,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 00:26:28,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 00:26:28,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2023-03-17 00:26:28,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 00:26:28,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2023-03-17 00:26:28,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2023-03-17 00:26:28,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
0: [2023-03-17 00:26:28,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 00:26:28,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2023-03-17 00:26:28,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:26:28,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:26:28,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:26:28,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:26:28,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 00:26:28,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 00:26:28,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:26:28,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:26:28,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
3: [2023-03-17 00:26:28,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2023-03-17 00:26:28,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 00:26:28,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 00:26:28,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:26:28,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:26:28,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 00:26:28,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 00:26:28,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2023-03-17 00:26:28,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2023-03-17 00:26:28,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 00:26:28,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 00:26:28,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
3: [2023-03-17 00:26:28,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2023-03-17 00:26:28,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2023-03-17 00:26:28,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2023-03-17 00:26:28,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:26:28,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:26:28,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:26:28,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 00:26:28,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 00:26:28,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:26:28,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 00:26:28,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:26:28,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
1: [2023-03-17 00:26:28,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2023-03-17 00:26:28,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:26:28,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:26:28,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2023-03-17 00:26:28,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:26:28,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 00:26:28,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 00:26:28,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 00:26:28,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 00:26:28,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2023-03-17 00:26:28,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2023-03-17 00:26:28,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
1: [2023-03-17 00:26:28,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step10000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 00:26:28,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2023-03-17 00:26:28,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 0: successfully saved checkpoint at iteration 10000 to checkpoints_146m60b400m 7: time (ms) | save-checkpoint: 441.54 7: iteration 10100/ 115203 | consumed samples: 2585600 | consumed tokens: 5295308800 | elapsed time per iteration (s): 0.38 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.639563E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.679 | TFLOPs: 31.77 | 7: iteration 10200/ 115203 | consumed samples: 2611200 | consumed tokens: 5347737600 | elapsed time per iteration (s): 0.37 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.637530E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.703 | TFLOPs: 31.91 | 7: iteration 10300/ 115203 | consumed samples: 2636800 | consumed tokens: 5400166400 | elapsed time per iteration (s): 0.37 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.627655E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.484 | TFLOPs: 31.95 | 7: iteration 10400/ 115203 | consumed samples: 2662400 | consumed tokens: 5452595200 | elapsed time per iteration (s): 0.38 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 3.621785E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.312 | TFLOPs: 31.47 | 7: iteration 10500/ 115203 | consumed samples: 
2688000 | consumed tokens: 5505024000 | elapsed time per iteration (s): 0.39 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.619882E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 663.510 | TFLOPs: 30.97 | 7: iteration 10600/ 115203 | consumed samples: 2713600 | consumed tokens: 5557452800 | elapsed time per iteration (s): 0.38 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.620073E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.258 | TFLOPs: 31.75 | 7: iteration 10700/ 115203 | consumed samples: 2739200 | consumed tokens: 5609881600 | elapsed time per iteration (s): 0.38 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 3.619742E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.572 | TFLOPs: 31.81 | 7: iteration 10800/ 115203 | consumed samples: 2764800 | consumed tokens: 5662310400 | elapsed time per iteration (s): 0.38 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.615818E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.114 | TFLOPs: 31.61 | 7: iteration 10900/ 115203 | consumed samples: 2790400 | consumed tokens: 5714739200 | elapsed time per iteration (s): 0.38 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.614532E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.673 | TFLOPs: 31.77 | 7: iteration 11000/ 115203 | consumed samples: 2816000 | consumed tokens: 5767168000 | elapsed time per iteration (s): 0.38 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 3.607955E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 674.651 | TFLOPs: 31.49 | 7: iteration 11100/ 115203 | consumed samples: 2841600 | consumed tokens: 5819596800 | elapsed time per iteration (s): 0.38 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.608188E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.995 | TFLOPs: 31.83 | 7: iteration 11200/ 115203 | consumed samples: 2867200 | consumed tokens: 5872025600 | elapsed time per iteration (s): 0.38 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.606526E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.022 | TFLOPs: 31.74 | 7: iteration 11300/ 115203 | consumed samples: 2892800 | consumed tokens: 5924454400 | elapsed time per iteration (s): 0.38 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 3.600647E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.295 | TFLOPs: 31.57 | 7: iteration 11400/ 115203 | consumed samples: 2918400 | consumed tokens: 5976883200 | elapsed time per iteration (s): 0.37 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 3.598888E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.292 | TFLOPs: 31.89 | 7: iteration 11500/ 115203 | consumed samples: 2944000 | consumed tokens: 6029312000 | elapsed time per iteration (s): 0.38 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 3.598518E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.384 | TFLOPs: 31.57 | 7: iteration 11600/ 115203 | consumed samples: 2969600 | consumed tokens: 6081740800 | elapsed time per iteration (s): 0.38 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 3.593759E+00 | grad norm: 
0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.083 | TFLOPs: 31.79 | 7: iteration 11700/ 115203 | consumed samples: 2995200 | consumed tokens: 6134169600 | elapsed time per iteration (s): 0.37 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 3.594342E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.510 | TFLOPs: 31.95 | 7: iteration 11800/ 115203 | consumed samples: 3020800 | consumed tokens: 6186598400 | elapsed time per iteration (s): 0.37 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 3.591852E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.743 | TFLOPs: 31.91 | 7: iteration 11900/ 115203 | consumed samples: 3046400 | consumed tokens: 6239027200 | elapsed time per iteration (s): 0.37 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 3.586167E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.793 | TFLOPs: 31.96 | 0: [2023-03-17 00:39:01,320] [INFO] [logging.py:68:log_dist] [Rank 0] step=12000, skipped=0, lr=[0.0001960118617437879, 0.0001960118617437879, 0.0001960118617437879], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 12000/ 115203 | consumed samples: 3072000 | consumed tokens: 6291456000 | elapsed time per iteration (s): 0.38 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 3.581163E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.130 | TFLOPs: 31.75 | 0: steps: 12000 loss: 3.5684 iter time (s): 0.374 samples/sec: 684.065 7: iteration 12100/ 115203 | consumed samples: 3097600 | consumed tokens: 6343884800 | elapsed time per iteration (s): 0.37 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 
3.589141E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.474 | TFLOPs: 32.14 | 7: iteration 12200/ 115203 | consumed samples: 3123200 | consumed tokens: 6396313600 | elapsed time per iteration (s): 0.37 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 3.586015E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.443 | TFLOPs: 32.18 | 7: iteration 12300/ 115203 | consumed samples: 3148800 | consumed tokens: 6448742400 | elapsed time per iteration (s): 0.37 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 3.579088E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.353 | TFLOPs: 32.22 | 7: iteration 12400/ 115203 | consumed samples: 3174400 | consumed tokens: 6501171200 | elapsed time per iteration (s): 0.37 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 3.578688E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.513 | TFLOPs: 32.09 | 7: iteration 12500/ 115203 | consumed samples: 3200000 | consumed tokens: 6553600000 | elapsed time per iteration (s): 0.37 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 3.573822E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.047 | TFLOPs: 32.16 | 7: iteration 12600/ 115203 | consumed samples: 3225600 | consumed tokens: 6606028800 | elapsed time per iteration (s): 0.37 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 3.579553E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.271 | TFLOPs: 32.13 | 7: iteration 12700/ 115203 | consumed samples: 3251200 | consumed tokens: 6658457600 | elapsed 
time per iteration (s): 0.37 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 3.572371E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.846 | TFLOPs: 32.20 | 7: iteration 12800/ 115203 | consumed samples: 3276800 | consumed tokens: 6710886400 | elapsed time per iteration (s): 0.37 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 3.567444E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.596 | TFLOPs: 32.28 | 7: iteration 12900/ 115203 | consumed samples: 3302400 | consumed tokens: 6763315200 | elapsed time per iteration (s): 0.37 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 3.561262E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.415 | TFLOPs: 32.04 | 7: iteration 13000/ 115203 | consumed samples: 3328000 | consumed tokens: 6815744000 | elapsed time per iteration (s): 0.37 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 3.562345E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.827 | TFLOPs: 32.06 | 7: iteration 13100/ 115203 | consumed samples: 3353600 | consumed tokens: 6868172800 | elapsed time per iteration (s): 0.37 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 3.569301E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.263 | TFLOPs: 32.13 | 7: iteration 13200/ 115203 | consumed samples: 3379200 | consumed tokens: 6920601600 | elapsed time per iteration (s): 0.37 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 3.564954E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.515 | TFLOPs: 32.14 | 7: 
iteration 13300/ 115203 | consumed samples: 3404800 | consumed tokens: 6973030400 | elapsed time per iteration (s): 0.37 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 3.561546E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.609 | TFLOPs: 32.24 | 7: iteration 13400/ 115203 | consumed samples: 3430400 | consumed tokens: 7025459200 | elapsed time per iteration (s): 0.37 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 3.557144E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.667 | TFLOPs: 32.24 | 7: iteration 13500/ 115203 | consumed samples: 3456000 | consumed tokens: 7077888000 | elapsed time per iteration (s): 0.37 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 3.561782E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.668 | TFLOPs: 32.14 | 7: iteration 13600/ 115203 | consumed samples: 3481600 | consumed tokens: 7130316800 | elapsed time per iteration (s): 0.37 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 3.558216E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.958 | TFLOPs: 32.20 | 7: iteration 13700/ 115203 | consumed samples: 3507200 | consumed tokens: 7182745600 | elapsed time per iteration (s): 0.37 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 3.550358E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.003 | TFLOPs: 32.25 | 7: iteration 13800/ 115203 | consumed samples: 3532800 | consumed tokens: 7235174400 | elapsed time per iteration (s): 0.37 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 3.558510E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 691.037 | TFLOPs: 32.26 | 7: iteration 13900/ 115203 | consumed samples: 3558400 | consumed tokens: 7287603200 | elapsed time per iteration (s): 0.37 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 3.554420E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.767 | TFLOPs: 32.29 | 0: [2023-03-17 00:51:24,111] [INFO] [logging.py:68:log_dist] [Rank 0] step=14000, skipped=0, lr=[0.00019442251142812213, 0.00019442251142812213, 0.00019442251142812213], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 14000/ 115203 | consumed samples: 3584000 | consumed tokens: 7340032000 | elapsed time per iteration (s): 0.37 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 3.546312E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.655 | TFLOPs: 32.10 | 0: steps: 14000 loss: 3.4963 iter time (s): 0.369 samples/sec: 692.994 7: iteration 14100/ 115203 | consumed samples: 3609600 | consumed tokens: 7392460800 | elapsed time per iteration (s): 0.37 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 3.548483E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.750 | TFLOPs: 32.29 | 7: iteration 14200/ 115203 | consumed samples: 3635200 | consumed tokens: 7444889600 | elapsed time per iteration (s): 0.37 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 3.550305E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.339 | TFLOPs: 32.32 | 7: iteration 14300/ 115203 | consumed samples: 3660800 | consumed tokens: 7497318400 | elapsed time per iteration (s): 0.37 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 3.541406E+00 | grad norm: 0.240 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.674 | TFLOPs: 32.28 | 7: iteration 14400/ 115203 | consumed samples: 3686400 | consumed tokens: 7549747200 | elapsed time per iteration (s): 0.37 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 3.543459E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.640 | TFLOPs: 32.28 | 7: iteration 14500/ 115203 | consumed samples: 3712000 | consumed tokens: 7602176000 | elapsed time per iteration (s): 0.37 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 3.544835E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.795 | TFLOPs: 32.29 | 7: iteration 14600/ 115203 | consumed samples: 3737600 | consumed tokens: 7654604800 | elapsed time per iteration (s): 0.37 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 3.538712E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.473 | TFLOPs: 32.28 | 7: iteration 14700/ 115203 | consumed samples: 3763200 | consumed tokens: 7707033600 | elapsed time per iteration (s): 0.37 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 3.541636E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.680 | TFLOPs: 32.24 | 7: iteration 14800/ 115203 | consumed samples: 3788800 | consumed tokens: 7759462400 | elapsed time per iteration (s): 0.37 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 3.537430E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.317 | TFLOPs: 32.27 | 7: iteration 14900/ 115203 | consumed samples: 3814400 | consumed tokens: 7811891200 | elapsed time per iteration (s): 0.37 | learning 
rate: 1.936E-04 | global batch size: 256 | lm loss: 3.535343E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.650 | TFLOPs: 32.28 | 7: iteration 15000/ 115203 | consumed samples: 3840000 | consumed tokens: 7864320000 | elapsed time per iteration (s): 0.37 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 3.530674E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.421 | TFLOPs: 32.27 | 7: iteration 15100/ 115203 | consumed samples: 3865600 | consumed tokens: 7916748800 | elapsed time per iteration (s): 0.37 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 3.533319E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.011 | TFLOPs: 32.25 | 7: iteration 15200/ 115203 | consumed samples: 3891200 | consumed tokens: 7969177600 | elapsed time per iteration (s): 0.37 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 3.530378E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.573 | TFLOPs: 32.23 | 7: iteration 15300/ 115203 | consumed samples: 3916800 | consumed tokens: 8021606400 | elapsed time per iteration (s): 0.37 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 3.533315E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.283 | TFLOPs: 32.22 | 7: iteration 15400/ 115203 | consumed samples: 3942400 | consumed tokens: 8074035200 | elapsed time per iteration (s): 0.37 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 3.529052E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.281 | TFLOPs: 32.22 | 7: iteration 15500/ 115203 | consumed samples: 
3968000 | consumed tokens: 8126464000 | elapsed time per iteration (s): 0.37 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 3.526464E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.009 | TFLOPs: 32.30 | 7: iteration 15600/ 115203 | consumed samples: 3993600 | consumed tokens: 8178892800 | elapsed time per iteration (s): 0.37 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 3.519663E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.005 | TFLOPs: 32.25 | 7: iteration 15700/ 115203 | consumed samples: 4019200 | consumed tokens: 8231321600 | elapsed time per iteration (s): 0.37 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 3.524811E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.277 | TFLOPs: 32.27 | 7: iteration 15800/ 115203 | consumed samples: 4044800 | consumed tokens: 8283750400 | elapsed time per iteration (s): 0.37 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 3.523963E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.797 | TFLOPs: 32.24 | 7: iteration 15900/ 115203 | consumed samples: 4070400 | consumed tokens: 8336179200 | elapsed time per iteration (s): 0.37 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 3.517659E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.887 | TFLOPs: 32.25 | 0: [2023-03-17 01:03:44,768] [INFO] [logging.py:68:log_dist] [Rank 0] step=16000, skipped=0, lr=[0.00019257700559212364, 0.00019257700559212364, 0.00019257700559212364], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 16000/ 115203 | consumed samples: 4096000 | consumed tokens: 8388608000 | 
elapsed time per iteration (s): 0.37 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 3.518300E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.699 | TFLOPs: 32.29 | 0: steps: 16000 loss: 3.4983 iter time (s): 0.368 samples/sec: 695.022 7: iteration 16100/ 115203 | consumed samples: 4121600 | consumed tokens: 8441036800 | elapsed time per iteration (s): 0.37 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 3.518976E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.550 | TFLOPs: 32.28 | 7: iteration 16200/ 115203 | consumed samples: 4147200 | consumed tokens: 8493465600 | elapsed time per iteration (s): 0.37 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 3.517580E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.350 | TFLOPs: 32.27 | 7: iteration 16300/ 115203 | consumed samples: 4172800 | consumed tokens: 8545894400 | elapsed time per iteration (s): 0.37 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 3.513374E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.624 | TFLOPs: 32.28 | 7: iteration 16400/ 115203 | consumed samples: 4198400 | consumed tokens: 8598323200 | elapsed time per iteration (s): 0.37 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 3.510234E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.775 | TFLOPs: 32.24 | 7: iteration 16500/ 115203 | consumed samples: 4224000 | consumed tokens: 8650752000 | elapsed time per iteration (s): 0.37 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 3.510717E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 691.137 | TFLOPs: 32.26 | 7: iteration 16600/ 115203 | consumed samples: 4249600 | consumed tokens: 8703180800 | elapsed time per iteration (s): 0.37 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 3.507933E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.105 | TFLOPs: 32.16 | 7: iteration 16700/ 115203 | consumed samples: 4275200 | consumed tokens: 8755609600 | elapsed time per iteration (s): 0.37 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 3.513651E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.509 | TFLOPs: 32.23 | 7: iteration 16800/ 115203 | consumed samples: 4300800 | consumed tokens: 8808038400 | elapsed time per iteration (s): 0.37 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 3.509384E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.944 | TFLOPs: 32.25 | 7: iteration 16900/ 115203 | consumed samples: 4326400 | consumed tokens: 8860467200 | elapsed time per iteration (s): 0.37 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 3.509418E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.311 | TFLOPs: 32.22 | 7: iteration 17000/ 115203 | consumed samples: 4352000 | consumed tokens: 8912896000 | elapsed time per iteration (s): 0.37 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 3.503535E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.122 | TFLOPs: 32.26 | 7: iteration 17100/ 115203 | consumed samples: 4377600 | consumed tokens: 8965324800 | elapsed time per iteration (s): 0.37 | learning rate: 1.915E-04 | global batch size: 256 | lm 
loss: 3.502693E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.744 | TFLOPs: 32.24 | 7: iteration 17200/ 115203 | consumed samples: 4403200 | consumed tokens: 9017753600 | elapsed time per iteration (s): 0.37 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 3.503413E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.485 | TFLOPs: 32.18 | 7: iteration 17300/ 115203 | consumed samples: 4428800 | consumed tokens: 9070182400 | elapsed time per iteration (s): 0.37 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 3.502277E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.059 | TFLOPs: 32.26 | 7: iteration 17400/ 115203 | consumed samples: 4454400 | consumed tokens: 9122611200 | elapsed time per iteration (s): 0.37 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 3.499415E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.887 | TFLOPs: 32.20 | 7: iteration 17500/ 115203 | consumed samples: 4480000 | consumed tokens: 9175040000 | elapsed time per iteration (s): 0.37 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 3.497626E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.792 | TFLOPs: 32.20 | 7: iteration 17600/ 115203 | consumed samples: 4505600 | consumed tokens: 9227468800 | elapsed time per iteration (s): 0.37 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 3.497064E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.290 | TFLOPs: 32.31 | 7: iteration 17700/ 115203 | consumed samples: 4531200 | consumed tokens: 9279897600 | 
elapsed time per iteration (s): 0.37 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 3.495381E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.262 | TFLOPs: 32.31 | 7: iteration 17800/ 115203 | consumed samples: 4556800 | consumed tokens: 9332326400 | elapsed time per iteration (s): 0.37 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 3.499055E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.414 | TFLOPs: 32.32 | 7: iteration 17900/ 115203 | consumed samples: 4582400 | consumed tokens: 9384755200 | elapsed time per iteration (s): 0.37 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 3.492248E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.909 | TFLOPs: 32.34 | 0: [2023-03-17 01:16:05,617] [INFO] [logging.py:68:log_dist] [Rank 0] step=18000, skipped=0, lr=[0.00019048094388569267, 0.00019048094388569267, 0.00019048094388569267], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 18000/ 115203 | consumed samples: 4608000 | consumed tokens: 9437184000 | elapsed time per iteration (s): 0.37 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 3.497688E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.776 | TFLOPs: 32.34 | 0: steps: 18000 loss: 3.4826 iter time (s): 0.368 samples/sec: 695.233 7: iteration 18100/ 115203 | consumed samples: 4633600 | consumed tokens: 9489612800 | elapsed time per iteration (s): 0.37 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 3.494030E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.447 | TFLOPs: 32.32 | 7: iteration 18200/ 115203 | consumed samples: 4659200 | 
consumed tokens: 9542041600 | elapsed time per iteration (s): 0.37 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 3.490980E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.965 | TFLOPs: 32.35 | 7: iteration 18300/ 115203 | consumed samples: 4684800 | consumed tokens: 9594470400 | elapsed time per iteration (s): 0.37 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 3.492402E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.896 | TFLOPs: 32.30 | 7: iteration 18400/ 115203 | consumed samples: 4710400 | consumed tokens: 9646899200 | elapsed time per iteration (s): 0.37 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 3.491577E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.096 | TFLOPs: 32.35 | 7: iteration 18500/ 115203 | consumed samples: 4736000 | consumed tokens: 9699328000 | elapsed time per iteration (s): 0.37 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 3.486741E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.494 | TFLOPs: 32.32 | 7: iteration 18600/ 115203 | consumed samples: 4761600 | consumed tokens: 9751756800 | elapsed time per iteration (s): 0.37 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 3.488798E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.454 | TFLOPs: 32.32 | 7: iteration 18700/ 115203 | consumed samples: 4787200 | consumed tokens: 9804185600 | elapsed time per iteration (s): 0.37 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 3.485893E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 692.034 | TFLOPs: 32.30 | 7: iteration 18800/ 115203 | consumed samples: 4812800 | consumed tokens: 9856614400 | elapsed time per iteration (s): 0.37 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 3.486815E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.484 | TFLOPs: 32.32 | 7: iteration 18900/ 115203 | consumed samples: 4838400 | consumed tokens: 9909043200 | elapsed time per iteration (s): 0.37 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 3.487838E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.918 | TFLOPs: 32.34 | 7: iteration 19000/ 115203 | consumed samples: 4864000 | consumed tokens: 9961472000 | elapsed time per iteration (s): 0.37 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 3.486896E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.017 | TFLOPs: 32.35 | 7: iteration 19100/ 115203 | consumed samples: 4889600 | consumed tokens: 10013900800 | elapsed time per iteration (s): 0.37 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 3.482422E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.637 | TFLOPs: 32.33 | 7: iteration 19200/ 115203 | consumed samples: 4915200 | consumed tokens: 10066329600 | elapsed time per iteration (s): 0.37 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 3.481419E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.017 | TFLOPs: 32.35 | 7: iteration 19300/ 115203 | consumed samples: 4940800 | consumed tokens: 10118758400 | elapsed time per iteration (s): 0.37 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 3.479555E+00 | grad norm: 0.224 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.741 | TFLOPs: 32.33 |
7: iteration 19400/ 115203 | consumed samples: 4966400 | consumed tokens: 10171187200 | elapsed time per iteration (s): 0.37 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 3.476544E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.676 | TFLOPs: 32.33 |
7: iteration 19500/ 115203 | consumed samples: 4992000 | consumed tokens: 10223616000 | elapsed time per iteration (s): 0.37 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 3.476145E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.738 | TFLOPs: 32.33 |
7: iteration 19600/ 115203 | consumed samples: 5017600 | consumed tokens: 10276044800 | elapsed time per iteration (s): 0.37 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 3.474848E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.193 | TFLOPs: 32.36 |
7: iteration 19700/ 115203 | consumed samples: 5043200 | consumed tokens: 10328473600 | elapsed time per iteration (s): 0.37 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 3.475783E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.326 | TFLOPs: 32.32 |
7: iteration 19800/ 115203 | consumed samples: 5068800 | consumed tokens: 10380902400 | elapsed time per iteration (s): 0.37 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 3.474546E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.777 | TFLOPs: 32.34 |
7: iteration 19900/ 115203 | consumed samples: 5094400 | consumed tokens: 10433331200 | elapsed time per iteration (s): 0.37 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 3.474497E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.483 | TFLOPs: 32.28 |
0: [2023-03-17 01:28:24,920] [INFO] [logging.py:68:log_dist] [Rank 0] step=20000, skipped=0, lr=[0.00018814068619753637, 0.00018814068619753637, 0.00018814068619753637], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 20000/ 115203 | consumed samples: 5120000 | consumed tokens: 10485760000 | elapsed time per iteration (s): 0.37 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 3.476766E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.491 | TFLOPs: 32.28 |
0: steps: 20000 loss: 3.4869 iter time (s): 0.368 samples/sec: 696.474
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 20000 | lm loss value: 3.494229E+00 | lm loss PPL: 3.292490E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 20000 to checkpoints_146m60b400m
0: [2023-03-17 01:28:25,048] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step20000 is begin to save!
0: [2023-03-17 01:28:25,054] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/layer_01-model_00-model_states.pt...
0: [2023-03-17 01:28:25,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/layer_01-model_00-model_states.pt.
0: [2023-03-17 01:28:25,158] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/layer_03-model_00-model_states.pt...
0: [2023-03-17 01:28:25,175] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:28:25,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:28:25,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:28:25,190] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:28:25,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:28:25,206] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:28:25,221] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:28:25,221] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/layer_07-model_00-model_states.pt... 0: [2023-03-17 01:28:25,236] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/layer_07-model_00-model_states.pt. 0: [2023-03-17 01:28:25,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:28:25,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:28:25,252] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/layer_09-model_00-model_states.pt... 
0: [2023-03-17 01:28:25,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/layer_09-model_00-model_states.pt. 0: [2023-03-17 01:28:25,267] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/layer_10-model_00-model_states.pt... 0: [2023-03-17 01:28:25,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/layer_10-model_00-model_states.pt. 0: [2023-03-17 01:28:25,283] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/layer_11-model_00-model_states.pt... 0: [2023-03-17 01:28:25,298] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/layer_11-model_00-model_states.pt. 0: [2023-03-17 01:28:25,298] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/layer_12-model_00-model_states.pt... 0: [2023-03-17 01:28:25,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/layer_12-model_00-model_states.pt. 0: [2023-03-17 01:28:25,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/layer_13-model_00-model_states.pt... 0: [2023-03-17 01:28:25,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/layer_13-model_00-model_states.pt. 0: [2023-03-17 01:28:25,329] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/layer_14-model_00-model_states.pt... 0: [2023-03-17 01:28:25,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/layer_14-model_00-model_states.pt. 0: [2023-03-17 01:28:25,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/layer_15-model_00-model_states.pt... 
0: [2023-03-17 01:28:25,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/layer_15-model_00-model_states.pt. 0: [2023-03-17 01:28:25,359] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/layer_16-model_00-model_states.pt... 0: [2023-03-17 01:28:25,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/layer_16-model_00-model_states.pt. 0: [2023-03-17 01:28:25,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/layer_17-model_00-model_states.pt... 0: [2023-03-17 01:28:25,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/layer_17-model_00-model_states.pt. 0: [2023-03-17 01:28:25,390] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/layer_19-model_00-model_states.pt... 0: [2023-03-17 01:28:25,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/layer_19-model_00-model_states.pt. 0: [2023-03-17 01:28:25,392] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_146m60b400m/global_step20000/mp_rank_00_model_states.pt 0: [2023-03-17 01:28:25,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:28:25,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
0: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:28:25,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:28:25,447] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:28:25,447] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:28:25,447] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:28:25,447] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:28:25,447] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:28:25,447] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2023-03-17 01:28:25,447] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2023-03-17 01:28:25,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:28:25,448] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:28:25,448] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2023-03-17 01:28:25,449] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:28:25,449] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:28:25,449] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2023-03-17 01:28:25,451] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:28:25,451] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:28:25,451] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:28:25,451] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:28:25,451] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:28:25,451] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2023-03-17 01:28:25,451] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:28:25,451] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2023-03-17 01:28:25,451] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2023-03-17 01:28:25,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:28:25,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:28:25,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:28:25,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:28:25,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:28:25,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:28:25,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:28:25,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:28:25,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 01:28:25,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:28:25,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 01:28:25,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:28:25,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2023-03-17 01:28:25,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2023-03-17 01:28:25,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2023-03-17 01:28:25,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
2: [2023-03-17 01:28:25,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:28:25,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 01:28:25,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:28:25,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 01:28:25,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2023-03-17 01:28:25,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2023-03-17 01:28:25,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2023-03-17 01:28:25,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2023-03-17 01:28:25,479] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:28:25,479] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2023-03-17 01:28:25,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:28:25,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:28:25,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:28:25,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:28:25,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:28:25,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:28:25,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:28:25,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:28:25,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:28:25,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:28:25,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:28:25,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:28:25,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
5: [2023-03-17 01:28:25,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2023-03-17 01:28:25,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 01:28:25,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2023-03-17 01:28:25,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2023-03-17 01:28:25,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 01:28:25,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:28:25,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:28:25,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2023-03-17 01:28:25,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2023-03-17 01:28:25,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2023-03-17 01:28:25,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2023-03-17 01:28:25,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:28:25,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:28:25,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:28:25,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:28:25,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:28:25,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:28:25,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:28:25,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:28:25,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 01:28:25,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:28:25,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:28:25,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 01:28:25,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:28:25,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:28:25,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:28:25,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:28:25,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2023-03-17 01:28:25,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2023-03-17 01:28:25,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
4: [2023-03-17 01:28:25,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2023-03-17 01:28:25,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2023-03-17 01:28:25,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2023-03-17 01:28:25,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2023-03-17 01:28:25,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2023-03-17 01:28:25,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:28:25,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:28:25,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:28:25,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:28:25,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:28:25,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:28:25,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:28:25,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:28:25,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:28:25,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:28:25,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:28:25,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2023-03-17 01:28:25,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2023-03-17 01:28:25,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:28:25,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 01:28:25,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:28:25,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 01:28:25,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 01:28:25,486] [INFO] 
[torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2023-03-17 01:28:25,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2023-03-17 01:28:25,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2023-03-17 01:28:25,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2023-03-17 01:28:25,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2023-03-17 01:28:25,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2023-03-17 01:28:25,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:28:25,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:28:25,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:28:25,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:28:25,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:28:25,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:28:25,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:28:25,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:28:25,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:28:25,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 01:28:25,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 01:28:25,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 01:28:25,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:28:25,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:28:25,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2023-03-17 01:28:25,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2023-03-17 01:28:25,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2023-03-17 01:28:25,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
6: [2023-03-17 01:28:25,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2023-03-17 01:28:25,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2023-03-17 01:28:25,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:28:25,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 01:28:25,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2023-03-17 01:28:25,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:28:25,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:28:25,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 7: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:28:25,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
1: [2023-03-17 01:28:25,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:28:25,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 01:28:25,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 7: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:28:25,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:28:25,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 7: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:28:25,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:28:25,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 1: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
1: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2023-03-17 01:28:25,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 01:28:25,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 01:28:25,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:28:25,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:28:25,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:28:25,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step20000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
7: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2023-03-17 01:28:25,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: successfully saved checkpoint at iteration 20000 to checkpoints_146m60b400m 7: time (ms) | save-checkpoint: 452.63 7: iteration 20100/ 115203 | consumed samples: 5145600 | consumed tokens: 10538188800 | elapsed time per iteration (s): 0.38 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 3.472434E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.183 | TFLOPs: 31.80 | 7: iteration 20200/ 115203 | consumed samples: 5171200 | consumed tokens: 10590617600 | elapsed time per iteration (s): 0.37 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 3.467258E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.692 | TFLOPs: 32.24 | 7: iteration 20300/ 115203 | consumed samples: 5196800 | consumed tokens: 10643046400 | elapsed time per iteration (s): 0.37 | learning rate: 1.878E-04 | global batch size: 256 | lm 
loss: 3.467327E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.416 | TFLOPs: 32.27 | 7: iteration 20400/ 115203 | consumed samples: 5222400 | consumed tokens: 10695475200 | elapsed time per iteration (s): 0.37 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 3.464068E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.023 | TFLOPs: 32.25 | 7: iteration 20500/ 115203 | consumed samples: 5248000 | consumed tokens: 10747904000 | elapsed time per iteration (s): 0.37 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 3.470296E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.695 | TFLOPs: 32.24 | 7: iteration 20600/ 115203 | consumed samples: 5273600 | consumed tokens: 10800332800 | elapsed time per iteration (s): 0.37 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 3.470815E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.784 | TFLOPs: 32.01 | 7: iteration 20700/ 115203 | consumed samples: 5299200 | consumed tokens: 10852761600 | elapsed time per iteration (s): 0.37 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 3.467445E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.509 | TFLOPs: 32.09 | 7: iteration 20800/ 115203 | consumed samples: 5324800 | consumed tokens: 10905190400 | elapsed time per iteration (s): 0.38 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 3.463481E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.950 | TFLOPs: 31.32 | 7: iteration 20900/ 115203 | consumed samples: 5350400 | consumed tokens: 10957619200 | 
elapsed time per iteration (s): 0.38 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 3.465147E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 668.871 | TFLOPs: 31.22 | 7: iteration 21000/ 115203 | consumed samples: 5376000 | consumed tokens: 11010048000 | elapsed time per iteration (s): 0.39 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 3.463473E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 664.756 | TFLOPs: 31.03 | 7: iteration 21100/ 115203 | consumed samples: 5401600 | consumed tokens: 11062476800 | elapsed time per iteration (s): 0.39 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 3.462115E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 662.187 | TFLOPs: 30.91 | 7: iteration 21200/ 115203 | consumed samples: 5427200 | consumed tokens: 11114905600 | elapsed time per iteration (s): 0.39 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 3.466526E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 662.951 | TFLOPs: 30.94 | 7: iteration 21300/ 115203 | consumed samples: 5452800 | consumed tokens: 11167334400 | elapsed time per iteration (s): 0.38 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 3.458178E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 671.759 | TFLOPs: 31.36 | 7: iteration 21400/ 115203 | consumed samples: 5478400 | consumed tokens: 11219763200 | elapsed time per iteration (s): 0.39 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 3.457375E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 664.001 | TFLOPs: 
30.99 | 7: iteration 21500/ 115203 | consumed samples: 5504000 | consumed tokens: 11272192000 | elapsed time per iteration (s): 0.38 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 3.459525E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 665.660 | TFLOPs: 31.07 | 7: iteration 21600/ 115203 | consumed samples: 5529600 | consumed tokens: 11324620800 | elapsed time per iteration (s): 0.38 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 3.454467E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.789 | TFLOPs: 31.45 | 7: iteration 21700/ 115203 | consumed samples: 5555200 | consumed tokens: 11377049600 | elapsed time per iteration (s): 0.38 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 3.458861E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.224 | TFLOPs: 31.28 | 7: iteration 21800/ 115203 | consumed samples: 5580800 | consumed tokens: 11429478400 | elapsed time per iteration (s): 0.38 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 3.459573E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.570 | TFLOPs: 31.53 | 7: iteration 21900/ 115203 | consumed samples: 5606400 | consumed tokens: 11481907200 | elapsed time per iteration (s): 0.38 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 3.457815E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.101 | TFLOPs: 31.42 | 0: [2023-03-17 01:41:02,934] [INFO] [logging.py:68:log_dist] [Rank 0] step=22000, skipped=0, lr=[0.00018556333335793902, 0.00018556333335793902, 0.00018556333335793902], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 22000/ 115203 | 
consumed samples: 5632000 | consumed tokens: 11534336000 | elapsed time per iteration (s): 0.38 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 3.459019E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.916 | TFLOPs: 31.27 | 0: steps: 22000 loss: 3.4623 iter time (s): 0.377 samples/sec: 678.542 7: iteration 22100/ 115203 | consumed samples: 5657600 | consumed tokens: 11586764800 | elapsed time per iteration (s): 0.38 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 3.458525E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.009 | TFLOPs: 31.51 | 7: iteration 22200/ 115203 | consumed samples: 5683200 | consumed tokens: 11639193600 | elapsed time per iteration (s): 0.38 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 3.456712E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.799 | TFLOPs: 31.64 | 7: iteration 22300/ 115203 | consumed samples: 5708800 | consumed tokens: 11691622400 | elapsed time per iteration (s): 0.38 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 3.454270E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 671.349 | TFLOPs: 31.34 | 7: iteration 22400/ 115203 | consumed samples: 5734400 | consumed tokens: 11744051200 | elapsed time per iteration (s): 0.38 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 3.454387E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.444 | TFLOPs: 31.81 | 7: iteration 22500/ 115203 | consumed samples: 5760000 | consumed tokens: 11796480000 | elapsed time per iteration (s): 0.38 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 3.451008E+00 | grad 
norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.111 | TFLOPs: 31.65 | 7: iteration 22600/ 115203 | consumed samples: 5785600 | consumed tokens: 11848908800 | elapsed time per iteration (s): 0.38 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 3.453444E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.445 | TFLOPs: 31.85 | 7: iteration 22700/ 115203 | consumed samples: 5811200 | consumed tokens: 11901337600 | elapsed time per iteration (s): 0.37 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 3.450601E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.154 | TFLOPs: 31.89 | 7: iteration 22800/ 115203 | consumed samples: 5836800 | consumed tokens: 11953766400 | elapsed time per iteration (s): 0.38 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 3.447869E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.753 | TFLOPs: 31.49 | 7: iteration 22900/ 115203 | consumed samples: 5862400 | consumed tokens: 12006195200 | elapsed time per iteration (s): 0.38 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 3.449367E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.241 | TFLOPs: 31.70 | 7: iteration 23000/ 115203 | consumed samples: 5888000 | consumed tokens: 12058624000 | elapsed time per iteration (s): 0.38 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 3.449413E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.724 | TFLOPs: 31.68 | 7: iteration 23100/ 115203 | consumed samples: 5913600 | consumed tokens: 12111052800 | elapsed time per 
iteration (s): 0.38 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 3.443712E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.419 | TFLOPs: 31.71 | 7: iteration 23200/ 115203 | consumed samples: 5939200 | consumed tokens: 12163481600 | elapsed time per iteration (s): 0.38 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 3.443962E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.793 | TFLOPs: 31.73 | 7: iteration 23300/ 115203 | consumed samples: 5964800 | consumed tokens: 12215910400 | elapsed time per iteration (s): 0.38 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 3.446426E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.621 | TFLOPs: 31.72 | 7: iteration 23400/ 115203 | consumed samples: 5990400 | consumed tokens: 12268339200 | elapsed time per iteration (s): 0.38 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 3.443938E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.704 | TFLOPs: 31.40 | 7: iteration 23500/ 115203 | consumed samples: 6016000 | consumed tokens: 12320768000 | elapsed time per iteration (s): 0.38 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 3.442463E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 671.324 | TFLOPs: 31.33 | 7: iteration 23600/ 115203 | consumed samples: 6041600 | consumed tokens: 12373196800 | elapsed time per iteration (s): 0.38 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 3.441856E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.450 | TFLOPs: 31.57 | 7: 
iteration 23700/ 115203 | consumed samples: 6067200 | consumed tokens: 12425625600 | elapsed time per iteration (s): 0.38 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 3.444517E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.613 | TFLOPs: 31.77 | 7: iteration 23800/ 115203 | consumed samples: 6092800 | consumed tokens: 12478054400 | elapsed time per iteration (s): 0.38 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 3.443516E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.063 | TFLOPs: 31.65 | 7: iteration 23900/ 115203 | consumed samples: 6118400 | consumed tokens: 12530483200 | elapsed time per iteration (s): 0.38 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 3.439739E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.692 | TFLOPs: 31.59 | 0: [2023-03-17 01:53:38,361] [INFO] [logging.py:68:log_dist] [Rank 0] step=24000, skipped=0, lr=[0.00018275670559336077, 0.00018275670559336077, 0.00018275670559336077], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 24000/ 115203 | consumed samples: 6144000 | consumed tokens: 12582912000 | elapsed time per iteration (s): 0.38 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 3.443568E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.844 | TFLOPs: 31.69 | 0: steps: 24000 loss: 3.4431 iter time (s): 0.376 samples/sec: 681.333 7: iteration 24100/ 115203 | consumed samples: 6169600 | consumed tokens: 12635340800 | elapsed time per iteration (s): 0.38 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 3.440937E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 678.969 | TFLOPs: 31.69 | 7: iteration 24200/ 115203 | consumed samples: 6195200 | consumed tokens: 12687769600 | elapsed time per iteration (s): 0.38 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 3.441788E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.055 | TFLOPs: 31.65 | 7: iteration 24300/ 115203 | consumed samples: 6220800 | consumed tokens: 12740198400 | elapsed time per iteration (s): 0.38 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 3.440204E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.074 | TFLOPs: 31.84 | 7: iteration 24400/ 115203 | consumed samples: 6246400 | consumed tokens: 12792627200 | elapsed time per iteration (s): 0.38 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 3.443724E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.847 | TFLOPs: 31.64 | 7: iteration 24500/ 115203 | consumed samples: 6272000 | consumed tokens: 12845056000 | elapsed time per iteration (s): 0.38 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 3.437380E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.207 | TFLOPs: 31.28 | 7: iteration 24600/ 115203 | consumed samples: 6297600 | consumed tokens: 12897484800 | elapsed time per iteration (s): 0.37 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 3.431632E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.348 | TFLOPs: 31.94 | 7: iteration 24700/ 115203 | consumed samples: 6323200 | consumed tokens: 12949913600 | elapsed time per iteration (s): 0.38 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 3.435341E+00 | grad norm: 0.218 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.730 | TFLOPs: 31.77 | 7: iteration 24800/ 115203 | consumed samples: 6348800 | consumed tokens: 13002342400 | elapsed time per iteration (s): 0.37 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 3.439850E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.554 | TFLOPs: 32.00 | 7: iteration 24900/ 115203 | consumed samples: 6374400 | consumed tokens: 13054771200 | elapsed time per iteration (s): 0.37 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 3.439125E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.357 | TFLOPs: 32.18 | 7: iteration 25000/ 115203 | consumed samples: 6400000 | consumed tokens: 13107200000 | elapsed time per iteration (s): 0.37 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 3.435618E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.078 | TFLOPs: 31.98 | 7: iteration 25100/ 115203 | consumed samples: 6425600 | consumed tokens: 13159628800 | elapsed time per iteration (s): 0.37 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 3.430949E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.905 | TFLOPs: 32.06 | 7: iteration 25200/ 115203 | consumed samples: 6451200 | consumed tokens: 13212057600 | elapsed time per iteration (s): 0.37 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 3.434430E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.448 | TFLOPs: 32.18 | 7: iteration 25300/ 115203 | consumed samples: 6476800 | consumed tokens: 13264486400 | elapsed time per iteration (s): 0.37 
| learning rate: 1.808E-04 | global batch size: 256 | lm loss: 3.434229E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.600 | TFLOPs: 32.00 | 7: iteration 25400/ 115203 | consumed samples: 6502400 | consumed tokens: 13316915200 | elapsed time per iteration (s): 0.37 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 3.434759E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.717 | TFLOPs: 32.15 | 7: iteration 25500/ 115203 | consumed samples: 6528000 | consumed tokens: 13369344000 | elapsed time per iteration (s): 0.37 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 3.437114E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.008 | TFLOPs: 31.93 | 7: iteration 25600/ 115203 | consumed samples: 6553600 | consumed tokens: 13421772800 | elapsed time per iteration (s): 0.38 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 3.434751E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.370 | TFLOPs: 31.80 | 7: iteration 25700/ 115203 | consumed samples: 6579200 | consumed tokens: 13474201600 | elapsed time per iteration (s): 0.37 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 3.426992E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.496 | TFLOPs: 32.00 | 7: iteration 25800/ 115203 | consumed samples: 6604800 | consumed tokens: 13526630400 | elapsed time per iteration (s): 0.37 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 3.426432E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.368 | TFLOPs: 31.94 | 7: iteration 25900/ 115203 | 
consumed samples: 6630400 | consumed tokens: 13579059200 | elapsed time per iteration (s): 0.37 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 3.422080E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.530 | TFLOPs: 32.09 | 0: [2023-03-17 02:06:07,403] [INFO] [logging.py:68:log_dist] [Rank 0] step=26000, skipped=0, lr=[0.00017972931879823854, 0.00017972931879823854, 0.00017972931879823854], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 26000/ 115203 | consumed samples: 6656000 | consumed tokens: 13631488000 | elapsed time per iteration (s): 0.37 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 3.423220E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.756 | TFLOPs: 32.01 | 0: steps: 26000 loss: 3.4473 iter time (s): 0.373 samples/sec: 687.243 7: iteration 26100/ 115203 | consumed samples: 6681600 | consumed tokens: 13683916800 | elapsed time per iteration (s): 0.37 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 3.429502E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.033 | TFLOPs: 32.21 | 7: iteration 26200/ 115203 | consumed samples: 6707200 | consumed tokens: 13736345600 | elapsed time per iteration (s): 0.37 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 3.423713E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.474 | TFLOPs: 32.28 | 7: iteration 26300/ 115203 | consumed samples: 6732800 | consumed tokens: 13788774400 | elapsed time per iteration (s): 0.37 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 3.427071E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.670 | TFLOPs: 
32.14 | 7: iteration 26400/ 115203 | consumed samples: 6758400 | consumed tokens: 13841203200 | elapsed time per iteration (s): 0.37 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 3.424427E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.554 | TFLOPs: 32.23 | 7: iteration 26500/ 115203 | consumed samples: 6784000 | consumed tokens: 13893632000 | elapsed time per iteration (s): 0.37 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 3.425886E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.207 | TFLOPs: 32.17 | 7: iteration 26600/ 115203 | consumed samples: 6809600 | consumed tokens: 13946060800 | elapsed time per iteration (s): 0.37 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 3.427391E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.031 | TFLOPs: 32.21 | 7: iteration 26700/ 115203 | consumed samples: 6835200 | consumed tokens: 13998489600 | elapsed time per iteration (s): 0.37 | learning rate: 1.786E-04 | global batch size: 256 | lm loss: 3.424198E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.674 | TFLOPs: 32.19 | 7: iteration 26800/ 115203 | consumed samples: 6860800 | consumed tokens: 14050918400 | elapsed time per iteration (s): 0.37 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 3.425431E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.009 | TFLOPs: 32.25 | 7: iteration 26900/ 115203 | consumed samples: 6886400 | consumed tokens: 14103347200 | elapsed time per iteration (s): 0.37 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 3.421583E+00 | grad norm: 0.245 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.967 | TFLOPs: 32.25 | 7: iteration 27000/ 115203 | consumed samples: 6912000 | consumed tokens: 14155776000 | elapsed time per iteration (s): 0.37 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 3.423282E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.551 | TFLOPs: 32.19 | 7: iteration 27100/ 115203 | consumed samples: 6937600 | consumed tokens: 14208204800 | elapsed time per iteration (s): 0.37 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 3.419037E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.956 | TFLOPs: 32.25 | 7: iteration 27200/ 115203 | consumed samples: 6963200 | consumed tokens: 14260633600 | elapsed time per iteration (s): 0.37 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 3.418676E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.890 | TFLOPs: 32.20 | 7: iteration 27300/ 115203 | consumed samples: 6988800 | consumed tokens: 14313062400 | elapsed time per iteration (s): 0.37 | learning rate: 1.776E-04 | global batch size: 256 | lm loss: 3.424281E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.809 | TFLOPs: 32.24 | 7: iteration 27400/ 115203 | consumed samples: 7014400 | consumed tokens: 14365491200 | elapsed time per iteration (s): 0.37 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 3.414033E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.773 | TFLOPs: 32.24 | 7: iteration 27500/ 115203 | consumed samples: 7040000 | consumed tokens: 14417920000 | elapsed time per iteration (s): 0.37 | learning rate: 1.773E-04 
| global batch size: 256 | lm loss: 3.420112E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.475 | TFLOPs: 32.23 | 7: iteration 27600/ 115203 | consumed samples: 7065600 | consumed tokens: 14470348800 | elapsed time per iteration (s): 0.37 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 3.412974E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.183 | TFLOPs: 32.22 | 7: iteration 27700/ 115203 | consumed samples: 7091200 | consumed tokens: 14522777600 | elapsed time per iteration (s): 0.37 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 3.419088E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.565 | TFLOPs: 32.33 | 7: iteration 27800/ 115203 | consumed samples: 7116800 | consumed tokens: 14575206400 | elapsed time per iteration (s): 0.37 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 3.417410E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.559 | TFLOPs: 32.33 | 7: iteration 27900/ 115203 | consumed samples: 7142400 | consumed tokens: 14627635200 | elapsed time per iteration (s): 0.37 | learning rate: 1.767E-04 | global batch size: 256 | lm loss: 3.416945E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.197 | TFLOPs: 32.26 | 0: [2023-03-17 02:18:28,862] [INFO] [logging.py:68:log_dist] [Rank 0] step=28000, skipped=0, lr=[0.00017649035869598463, 0.00017649035869598463, 0.00017649035869598463], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 28000/ 115203 | consumed samples: 7168000 | consumed tokens: 14680064000 | elapsed time per iteration (s): 0.37 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 
3.413784E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.064 | TFLOPs: 32.21 | 0: steps: 28000 loss: 3.3916 iter time (s): 0.369 samples/sec: 694.296 7: iteration 28100/ 115203 | consumed samples: 7193600 | consumed tokens: 14732492800 | elapsed time per iteration (s): 0.37 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 3.416883E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.916 | TFLOPs: 32.20 | 7: iteration 28200/ 115203 | consumed samples: 7219200 | consumed tokens: 14784921600 | elapsed time per iteration (s): 0.37 | learning rate: 1.762E-04 | global batch size: 256 | lm loss: 3.418391E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.869 | TFLOPs: 32.25 | 7: iteration 28300/ 115203 | consumed samples: 7244800 | consumed tokens: 14837350400 | elapsed time per iteration (s): 0.37 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 3.411138E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.295 | TFLOPs: 32.31 | 7: iteration 28400/ 115203 | consumed samples: 7270400 | consumed tokens: 14889779200 | elapsed time per iteration (s): 0.37 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 3.415203E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.867 | TFLOPs: 32.34 | 7: iteration 28500/ 115203 | consumed samples: 7296000 | consumed tokens: 14942208000 | elapsed time per iteration (s): 0.37 | learning rate: 1.756E-04 | global batch size: 256 | lm loss: 3.412534E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.685 | TFLOPs: 32.33 | 7: iteration 28600/ 
115203 | consumed samples: 7321600 | consumed tokens: 14994636800 | elapsed time per iteration (s): 0.37 | learning rate: 1.755E-04 | global batch size: 256 | lm loss: 3.411783E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.840 | TFLOPs: 32.29 | 7: iteration 28700/ 115203 | consumed samples: 7347200 | consumed tokens: 15047065600 | elapsed time per iteration (s): 0.37 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 3.415094E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.852 | TFLOPs: 32.29 | 7: iteration 28800/ 115203 | consumed samples: 7372800 | consumed tokens: 15099494400 | elapsed time per iteration (s): 0.37 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 3.416324E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.595 | TFLOPs: 32.28 | 7: iteration 28900/ 115203 | consumed samples: 7398400 | consumed tokens: 15151923200 | elapsed time per iteration (s): 0.37 | learning rate: 1.750E-04 | global batch size: 256 | lm loss: 3.408772E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.703 | TFLOPs: 32.29 | 7: iteration 29000/ 115203 | consumed samples: 7424000 | consumed tokens: 15204352000 | elapsed time per iteration (s): 0.37 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 3.405961E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.008 | TFLOPs: 32.30 | 7: iteration 29100/ 115203 | consumed samples: 7449600 | consumed tokens: 15256780800 | elapsed time per iteration (s): 0.37 | learning rate: 1.746E-04 | global batch size: 256 | lm loss: 3.409908E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 693.139 | TFLOPs: 32.35 | 7: iteration 29200/ 115203 | consumed samples: 7475200 | consumed tokens: 15309209600 | elapsed time per iteration (s): 0.37 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 3.414403E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.910 | TFLOPs: 32.30 | 7: iteration 29300/ 115203 | consumed samples: 7500800 | consumed tokens: 15361638400 | elapsed time per iteration (s): 0.37 | learning rate: 1.743E-04 | global batch size: 256 | lm loss: 3.410713E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.992 | TFLOPs: 32.35 | 7: iteration 29400/ 115203 | consumed samples: 7526400 | consumed tokens: 15414067200 | elapsed time per iteration (s): 0.37 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 3.409621E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.001 | TFLOPs: 32.35 | 7: iteration 29500/ 115203 | consumed samples: 7552000 | consumed tokens: 15466496000 | elapsed time per iteration (s): 0.37 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 3.410232E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.928 | TFLOPs: 32.30 | 7: iteration 29600/ 115203 | consumed samples: 7577600 | consumed tokens: 15518924800 | elapsed time per iteration (s): 0.37 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 3.410431E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.635 | TFLOPs: 32.33 | 7: iteration 29700/ 115203 | consumed samples: 7603200 | consumed tokens: 15571353600 | elapsed time per iteration (s): 0.37 | learning rate: 1.736E-04 | global batch size: 256 | 
lm loss: 3.405126E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.783 | TFLOPs: 32.34 | 7: iteration 29800/ 115203 | consumed samples: 7628800 | consumed tokens: 15623782400 | elapsed time per iteration (s): 0.37 | learning rate: 1.734E-04 | global batch size: 256 | lm loss: 3.408869E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.143 | TFLOPs: 32.31 | 7: iteration 29900/ 115203 | consumed samples: 7654400 | consumed tokens: 15676211200 | elapsed time per iteration (s): 0.37 | learning rate: 1.732E-04 | global batch size: 256 | lm loss: 3.407487E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.675 | TFLOPs: 32.33 | 0: [2023-03-17 02:30:48,554] [INFO] [logging.py:68:log_dist] [Rank 0] step=30000, skipped=0, lr=[0.00017304965296758478, 0.00017304965296758478, 0.00017304965296758478], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 30000/ 115203 | consumed samples: 7680000 | consumed tokens: 15728640000 | elapsed time per iteration (s): 0.37 | learning rate: 1.730E-04 | global batch size: 256 | lm loss: 3.403141E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.799 | TFLOPs: 32.34 | 0: steps: 30000 loss: 3.4164 iter time (s): 0.368 samples/sec: 696.186 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 30000 | lm loss value: 3.441741E+00 | lm loss PPL: 3.124131E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 30000 to checkpoints_146m60b400m 0: [2023-03-17 02:30:48,680] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step30000 is begin to save! 
0: [2023-03-17 02:30:48,685] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:30:48,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:30:48,783] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:30:48,798] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:30:48,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:30:48,813] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:30:48,814] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:30:48,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:30:48,829] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:30:48,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:30:48,844] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/layer_07-model_00-model_states.pt... 0: [2023-03-17 02:30:48,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/layer_07-model_00-model_states.pt. 
0: [2023-03-17 02:30:48,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:30:48,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:30:48,874] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/layer_09-model_00-model_states.pt... 0: [2023-03-17 02:30:48,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/layer_09-model_00-model_states.pt. 0: [2023-03-17 02:30:48,889] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/layer_10-model_00-model_states.pt... 0: [2023-03-17 02:30:48,904] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/layer_10-model_00-model_states.pt. 0: [2023-03-17 02:30:48,904] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/layer_11-model_00-model_states.pt... 0: [2023-03-17 02:30:48,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/layer_11-model_00-model_states.pt. 0: [2023-03-17 02:30:48,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/layer_12-model_00-model_states.pt... 0: [2023-03-17 02:30:48,934] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/layer_12-model_00-model_states.pt. 0: [2023-03-17 02:30:48,934] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/layer_13-model_00-model_states.pt... 0: [2023-03-17 02:30:48,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/layer_13-model_00-model_states.pt. 
0: [2023-03-17 02:30:48,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/layer_14-model_00-model_states.pt... 0: [2023-03-17 02:30:48,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/layer_14-model_00-model_states.pt. 0: [2023-03-17 02:30:48,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/layer_15-model_00-model_states.pt... 0: [2023-03-17 02:30:48,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/layer_15-model_00-model_states.pt. 0: [2023-03-17 02:30:48,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/layer_16-model_00-model_states.pt... 0: [2023-03-17 02:30:48,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/layer_16-model_00-model_states.pt. 0: [2023-03-17 02:30:48,994] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/layer_17-model_00-model_states.pt... 0: [2023-03-17 02:30:49,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/layer_17-model_00-model_states.pt. 0: [2023-03-17 02:30:49,009] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/layer_19-model_00-model_states.pt... 0: [2023-03-17 02:30:49,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/layer_19-model_00-model_states.pt. 0: [2023-03-17 02:30:49,011] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_146m60b400m/global_step30000/mp_rank_00_model_states.pt 0: [2023-03-17 02:30:49,011] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/mp_rank_00_model_states.pt... 
0: [2023-03-17 02:30:49,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
0: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:30:49,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:30:49,064] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:30:49,070] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:30:49,070] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:30:49,070] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 02:30:49,070] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 02:30:49,070] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2023-03-17 02:30:49,070] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2023-03-17 02:30:49,070] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:30:49,070] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:30:49,070] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 02:30:49,070] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 02:30:49,070] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2023-03-17 02:30:49,070] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
3: [2023-03-17 02:30:49,071] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:30:49,072] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 02:30:49,072] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2023-03-17 02:30:49,072] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:30:49,072] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:30:49,072] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2023-03-17 02:30:49,072] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:30:49,072] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 02:30:49,072] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2023-03-17 02:30:49,073] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:30:49,073] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 02:30:49,073] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
3: [2023-03-17 02:30:49,073] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:30:49,073] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 02:30:49,073] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2023-03-17 02:30:49,074] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:30:49,074] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 02:30:49,074] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2023-03-17 02:30:49,075] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:30:49,075] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:30:49,075] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:30:49,075] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:30:49,076] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 3: [2023-03-17 02:30:49,075] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:30:49,076] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 0: [2023-03-17 02:30:49,076] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:30:49,076] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 02:30:49,076] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2023-03-17 02:30:49,076] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 0: [2023-03-17 02:30:49,076] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2023-03-17 02:30:49,076] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2023-03-17 02:30:49,076] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2023-03-17 02:30:49,076] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2023-03-17 02:30:49,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:30:49,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:30:49,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:30:49,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:30:49,076] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 02:30:49,076] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 02:30:49,076] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2023-03-17 02:30:49,076] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 02:30:49,076] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:30:49,077] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2023-03-17 02:30:49,077] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2023-03-17 02:30:49,077] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2023-03-17 02:30:49,077] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:30:49,077] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:30:49,077] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:30:49,077] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:30:49,077] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2023-03-17 02:30:49,077] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 02:30:49,077] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 02:30:49,077] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2023-03-17 02:30:49,077] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2023-03-17 02:30:49,078] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:30:49,078] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:30:49,078] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2023-03-17 02:30:49,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:30:49,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:30:49,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:30:49,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:30:49,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 02:30:49,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:30:49,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 02:30:49,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2023-03-17 02:30:49,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 02:30:49,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2023-03-17 02:30:49,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2023-03-17 02:30:49,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2023-03-17 02:30:49,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:30:49,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:30:49,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:30:49,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:30:49,095] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 02:30:49,095] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 02:30:49,095] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 02:30:49,095] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:30:49,095] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2023-03-17 02:30:49,095] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2023-03-17 02:30:49,095] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2023-03-17 02:30:49,095] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2023-03-17 02:30:49,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:30:49,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:30:49,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:30:49,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:30:49,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:30:49,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:30:49,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:30:49,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:30:49,096] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 02:30:49,096] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:30:49,096] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 02:30:49,096] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:30:49,096] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 02:30:49,096] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 02:30:49,096] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:30:49,096] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 02:30:49,096] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2023-03-17 02:30:49,096] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2023-03-17 02:30:49,096] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
1: [2023-03-17 02:30:49,096] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2023-03-17 02:30:49,096] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2023-03-17 02:30:49,096] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2023-03-17 02:30:49,096] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2023-03-17 02:30:49,096] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2023-03-17 02:30:49,098] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:30:49,098] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2023-03-17 02:30:49,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:30:49,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:30:49,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:30:49,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:30:49,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:30:49,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:30:49,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:30:49,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:30:49,102] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:30:49,102] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:30:49,102] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:30:49,102] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 02:30:49,102] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:30:49,102] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 02:30:49,102] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 02:30:49,102] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
5: [2023-03-17 02:30:49,102] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2023-03-17 02:30:49,102] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2023-03-17 02:30:49,102] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 02:30:49,102] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2023-03-17 02:30:49,102] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2023-03-17 02:30:49,102] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2023-03-17 02:30:49,102] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2023-03-17 02:30:49,102] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2023-03-17 02:30:49,104] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:30:49,104] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:30:49,104] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:30:49,104] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:30:49,104] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:30:49,104] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:30:49,104] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:30:49,104] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:30:49,104] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:30:49,104] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 02:30:49,104] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:30:49,104] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:30:49,104] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2023-03-17 02:30:49,104] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:30:49,104] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
6: [2023-03-17 02:30:49,104] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:30:49,104] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 02:30:49,104] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2023-03-17 02:30:49,104] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 02:30:49,104] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2023-03-17 02:30:49,104] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2023-03-17 02:30:49,104] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2023-03-17 02:30:49,104] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2023-03-17 02:30:49,104] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2023-03-17 02:30:49,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:30:49,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:30:49,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:30:49,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:30:49,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:30:49,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:30:49,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:30:49,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:30:49,108] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 02:30:49,108] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:30:49,108] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 02:30:49,108] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 02:30:49,108] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:30:49,108] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:30:49,108] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 02:30:49,108] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step30000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 02:30:49,108] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2023-03-17 02:30:49,108] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2023-03-17 02:30:49,108] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2023-03-17 02:30:49,108] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2023-03-17 02:30:49,108] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2023-03-17 02:30:49,108] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2023-03-17 02:30:49,108] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2023-03-17 02:30:49,108] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
0: successfully saved checkpoint at iteration 30000 to checkpoints_146m60b400m 7: time (ms) | save-checkpoint: 436.90 7: iteration 30100/ 115203 | consumed samples: 7705600 | consumed tokens: 15781068800 | elapsed time per iteration (s): 0.37 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 3.401728E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.782 | TFLOPs: 31.87 | 7: iteration 30200/ 115203 | consumed samples: 7731200 | consumed tokens: 15833497600 | elapsed time per iteration (s): 0.37 | learning rate: 1.727E-04 | global batch size: 256 | lm loss: 3.402149E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.647 | TFLOPs: 32.33 | 7: iteration 30300/ 115203 | consumed samples: 7756800 | consumed tokens: 15885926400 | elapsed time per iteration (s): 0.37 | learning rate: 1.725E-04 | global batch size: 256 | lm loss: 3.405547E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.728 | TFLOPs: 32.33 | 7: iteration 30400/ 115203 | consumed samples: 7782400 | consumed tokens: 15938355200 | elapsed time per iteration (s): 0.37 | learning rate: 1.723E-04 | global batch size: 256 | lm loss: 3.403597E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.486 | TFLOPs: 32.32 | 7: iteration 30500/ 115203 | consumed samples: 7808000 | consumed tokens: 15990784000 | elapsed time per iteration (s): 0.37 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 3.405003E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.791 | TFLOPs: 31.92 | 7: iteration 30600/ 115203 | consumed samples: 7833600 | consumed tokens: 16043212800 | elapsed time per iteration (s): 0.37 | learning 
rate: 1.720E-04 | global batch size: 256 | lm loss: 3.402183E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.871 | TFLOPs: 32.29 | 7: iteration 30700/ 115203 | consumed samples: 7859200 | consumed tokens: 16095641600 | elapsed time per iteration (s): 0.37 | learning rate: 1.718E-04 | global batch size: 256 | lm loss: 3.399170E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.116 | TFLOPs: 32.26 | 7: iteration 30800/ 115203 | consumed samples: 7884800 | consumed tokens: 16148070400 | elapsed time per iteration (s): 0.37 | learning rate: 1.716E-04 | global batch size: 256 | lm loss: 3.404072E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.859 | TFLOPs: 32.29 | 7: iteration 30900/ 115203 | consumed samples: 7910400 | consumed tokens: 16200499200 | elapsed time per iteration (s): 0.37 | learning rate: 1.714E-04 | global batch size: 256 | lm loss: 3.400811E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.959 | TFLOPs: 32.30 | 7: iteration 31000/ 115203 | consumed samples: 7936000 | consumed tokens: 16252928000 | elapsed time per iteration (s): 0.37 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 3.396577E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.975 | TFLOPs: 32.11 | 7: iteration 31100/ 115203 | consumed samples: 7961600 | consumed tokens: 16305356800 | elapsed time per iteration (s): 0.37 | learning rate: 1.711E-04 | global batch size: 256 | lm loss: 3.401385E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.384 | TFLOPs: 32.18 | 7: iteration 31200/ 115203 | consumed 
samples: 7987200 | consumed tokens: 16357785600 | elapsed time per iteration (s): 0.38 | learning rate: 1.709E-04 | global batch size: 256 | lm loss: 3.398258E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.124 | TFLOPs: 31.79 | 7: iteration 31300/ 115203 | consumed samples: 8012800 | consumed tokens: 16410214400 | elapsed time per iteration (s): 0.38 | learning rate: 1.707E-04 | global batch size: 256 | lm loss: 3.401761E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.382 | TFLOPs: 31.80 | 7: iteration 31400/ 115203 | consumed samples: 8038400 | consumed tokens: 16462643200 | elapsed time per iteration (s): 0.38 | learning rate: 1.705E-04 | global batch size: 256 | lm loss: 3.400627E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.864 | TFLOPs: 31.59 | 7: iteration 31500/ 115203 | consumed samples: 8064000 | consumed tokens: 16515072000 | elapsed time per iteration (s): 0.37 | learning rate: 1.703E-04 | global batch size: 256 | lm loss: 3.397463E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.419 | TFLOPs: 32.18 | 7: iteration 31600/ 115203 | consumed samples: 8089600 | consumed tokens: 16567500800 | elapsed time per iteration (s): 0.37 | learning rate: 1.702E-04 | global batch size: 256 | lm loss: 3.401570E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.248 | TFLOPs: 32.12 | 7: iteration 31700/ 115203 | consumed samples: 8115200 | consumed tokens: 16619929600 | elapsed time per iteration (s): 0.38 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 3.400518E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 680.018 | TFLOPs: 31.74 | 7: iteration 31800/ 115203 | consumed samples: 8140800 | consumed tokens: 16672358400 | elapsed time per iteration (s): 0.38 | learning rate: 1.698E-04 | global batch size: 256 | lm loss: 3.398281E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.812 | TFLOPs: 31.59 | 7: iteration 31900/ 115203 | consumed samples: 8166400 | consumed tokens: 16724787200 | elapsed time per iteration (s): 0.38 | learning rate: 1.696E-04 | global batch size: 256 | lm loss: 3.396044E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.067 | TFLOPs: 31.56 | 0: [2023-03-17 02:43:15,105] [INFO] [logging.py:68:log_dist] [Rank 0] step=32000, skipped=0, lr=[0.00016941764143236279, 0.00016941764143236279, 0.00016941764143236279], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 32000/ 115203 | consumed samples: 8192000 | consumed tokens: 16777216000 | elapsed time per iteration (s): 0.38 | learning rate: 1.694E-04 | global batch size: 256 | lm loss: 3.397801E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.926 | TFLOPs: 31.69 | 0: steps: 32000 loss: 3.3156 iter time (s): 0.372 samples/sec: 689.033 7: iteration 32100/ 115203 | consumed samples: 8217600 | consumed tokens: 16829644800 | elapsed time per iteration (s): 0.37 | learning rate: 1.692E-04 | global batch size: 256 | lm loss: 3.392514E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.546 | TFLOPs: 31.95 | 7: iteration 32200/ 115203 | consumed samples: 8243200 | consumed tokens: 16882073600 | elapsed time per iteration (s): 0.38 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 3.401075E+00 | grad norm: 0.209 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.372 | TFLOPs: 31.85 | 7: iteration 32300/ 115203 | consumed samples: 8268800 | consumed tokens: 16934502400 | elapsed time per iteration (s): 0.38 | learning rate: 1.689E-04 | global batch size: 256 | lm loss: 3.391964E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.271 | TFLOPs: 31.85 | 7: iteration 32400/ 115203 | consumed samples: 8294400 | consumed tokens: 16986931200 | elapsed time per iteration (s): 0.38 | learning rate: 1.687E-04 | global batch size: 256 | lm loss: 3.394292E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.048 | TFLOPs: 31.74 | 7: iteration 32500/ 115203 | consumed samples: 8320000 | consumed tokens: 17039360000 | elapsed time per iteration (s): 0.38 | learning rate: 1.685E-04 | global batch size: 256 | lm loss: 3.394536E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.979 | TFLOPs: 31.32 | 7: iteration 32600/ 115203 | consumed samples: 8345600 | consumed tokens: 17091788800 | elapsed time per iteration (s): 0.38 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 3.393370E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.476 | TFLOPs: 31.39 | 7: iteration 32700/ 115203 | consumed samples: 8371200 | consumed tokens: 17144217600 | elapsed time per iteration (s): 0.38 | learning rate: 1.681E-04 | global batch size: 256 | lm loss: 3.394616E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.135 | TFLOPs: 31.42 | 7: iteration 32800/ 115203 | consumed samples: 8396800 | consumed tokens: 17196646400 | elapsed time per iteration (s): 0.38 | learning rate: 1.679E-04 | 
global batch size: 256 | lm loss: 3.394094E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 667.117 | TFLOPs: 31.14 | 7: iteration 32900/ 115203 | consumed samples: 8422400 | consumed tokens: 17249075200 | elapsed time per iteration (s): 0.38 | learning rate: 1.677E-04 | global batch size: 256 | lm loss: 3.396031E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.298 | TFLOPs: 31.43 | 7: iteration 33000/ 115203 | consumed samples: 8448000 | consumed tokens: 17301504000 | elapsed time per iteration (s): 0.38 | learning rate: 1.675E-04 | global batch size: 256 | lm loss: 3.389556E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.562 | TFLOPs: 31.58 | 7: iteration 33100/ 115203 | consumed samples: 8473600 | consumed tokens: 17353932800 | elapsed time per iteration (s): 0.38 | learning rate: 1.673E-04 | global batch size: 256 | lm loss: 3.387409E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.972 | TFLOPs: 31.65 | 7: iteration 33200/ 115203 | consumed samples: 8499200 | consumed tokens: 17406361600 | elapsed time per iteration (s): 0.38 | learning rate: 1.672E-04 | global batch size: 256 | lm loss: 3.392635E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.587 | TFLOPs: 31.86 | 7: iteration 33300/ 115203 | consumed samples: 8524800 | consumed tokens: 17458790400 | elapsed time per iteration (s): 0.38 | learning rate: 1.670E-04 | global batch size: 256 | lm loss: 3.390621E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.289 | TFLOPs: 31.85 | 7: iteration 33400/ 115203 | consumed samples: 8550400 | 
consumed tokens: 17511219200 | elapsed time per iteration (s): 0.38 | learning rate: 1.668E-04 | global batch size: 256 | lm loss: 3.390562E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.504 | TFLOPs: 31.67 | 7: iteration 33500/ 115203 | consumed samples: 8576000 | consumed tokens: 17563648000 | elapsed time per iteration (s): 0.38 | learning rate: 1.666E-04 | global batch size: 256 | lm loss: 3.392759E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.878 | TFLOPs: 31.55 | 7: iteration 33600/ 115203 | consumed samples: 8601600 | consumed tokens: 17616076800 | elapsed time per iteration (s): 0.38 | learning rate: 1.664E-04 | global batch size: 256 | lm loss: 3.391866E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.449 | TFLOPs: 31.53 | 7: iteration 33700/ 115203 | consumed samples: 8627200 | consumed tokens: 17668505600 | elapsed time per iteration (s): 0.38 | learning rate: 1.662E-04 | global batch size: 256 | lm loss: 3.387135E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.115 | TFLOPs: 31.84 | 7: iteration 33800/ 115203 | consumed samples: 8652800 | consumed tokens: 17720934400 | elapsed time per iteration (s): 0.38 | learning rate: 1.660E-04 | global batch size: 256 | lm loss: 3.389640E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.417 | TFLOPs: 31.57 | 7: iteration 33900/ 115203 | consumed samples: 8678400 | consumed tokens: 17773363200 | elapsed time per iteration (s): 0.38 | learning rate: 1.658E-04 | global batch size: 256 | lm loss: 3.390421E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 677.132 | TFLOPs: 31.61 | 0: [2023-03-17 02:55:50,487] [INFO] [logging.py:68:log_dist] [Rank 0] step=34000, skipped=0, lr=[0.00016560534437138965, 0.00016560534437138965, 0.00016560534437138965], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 34000/ 115203 | consumed samples: 8704000 | consumed tokens: 17825792000 | elapsed time per iteration (s): 0.37 | learning rate: 1.656E-04 | global batch size: 256 | lm loss: 3.390866E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.576 | TFLOPs: 32.00 | 0: steps: 34000 loss: 3.3883 iter time (s): 0.376 samples/sec: 681.287 7: iteration 34100/ 115203 | consumed samples: 8729600 | consumed tokens: 17878220800 | elapsed time per iteration (s): 0.37 | learning rate: 1.654E-04 | global batch size: 256 | lm loss: 3.386884E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.281 | TFLOPs: 32.13 | 7: iteration 34200/ 115203 | consumed samples: 8755200 | consumed tokens: 17930649600 | elapsed time per iteration (s): 0.37 | learning rate: 1.652E-04 | global batch size: 256 | lm loss: 3.391151E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.086 | TFLOPs: 31.88 | 7: iteration 34300/ 115203 | consumed samples: 8780800 | consumed tokens: 17983078400 | elapsed time per iteration (s): 0.37 | learning rate: 1.650E-04 | global batch size: 256 | lm loss: 3.389807E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.126 | TFLOPs: 32.07 | 7: iteration 34400/ 115203 | consumed samples: 8806400 | consumed tokens: 18035507200 | elapsed time per iteration (s): 0.37 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 3.386932E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 687.348 | TFLOPs: 32.08 | 7: iteration 34500/ 115203 | consumed samples: 8832000 | consumed tokens: 18087936000 | elapsed time per iteration (s): 0.37 | learning rate: 1.646E-04 | global batch size: 256 | lm loss: 3.389663E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.994 | TFLOPs: 32.21 | 7: iteration 34600/ 115203 | consumed samples: 8857600 | consumed tokens: 18140364800 | elapsed time per iteration (s): 0.37 | learning rate: 1.644E-04 | global batch size: 256 | lm loss: 3.385942E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.804 | TFLOPs: 32.20 | 7: iteration 34700/ 115203 | consumed samples: 8883200 | consumed tokens: 18192793600 | elapsed time per iteration (s): 0.37 | learning rate: 1.642E-04 | global batch size: 256 | lm loss: 3.387265E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.052 | TFLOPs: 32.16 | 7: iteration 34800/ 115203 | consumed samples: 8908800 | consumed tokens: 18245222400 | elapsed time per iteration (s): 0.37 | learning rate: 1.640E-04 | global batch size: 256 | lm loss: 3.386834E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.260 | TFLOPs: 32.17 | 7: iteration 34900/ 115203 | consumed samples: 8934400 | consumed tokens: 18297651200 | elapsed time per iteration (s): 0.37 | learning rate: 1.638E-04 | global batch size: 256 | lm loss: 3.384660E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.947 | TFLOPs: 32.25 | 7: iteration 35000/ 115203 | consumed samples: 8960000 | consumed tokens: 18350080000 | elapsed time per iteration (s): 0.37 | learning rate: 1.636E-04 | global batch size: 256 | 
lm loss: 3.383178E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.694 | TFLOPs: 32.24 | 7: iteration 35100/ 115203 | consumed samples: 8985600 | consumed tokens: 18402508800 | elapsed time per iteration (s): 0.37 | learning rate: 1.634E-04 | global batch size: 256 | lm loss: 3.381220E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.954 | TFLOPs: 32.16 | 7: iteration 35200/ 115203 | consumed samples: 9011200 | consumed tokens: 18454937600 | elapsed time per iteration (s): 0.37 | learning rate: 1.632E-04 | global batch size: 256 | lm loss: 3.384016E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.153 | TFLOPs: 32.07 | 7: iteration 35300/ 115203 | consumed samples: 9036800 | consumed tokens: 18507366400 | elapsed time per iteration (s): 0.37 | learning rate: 1.630E-04 | global batch size: 256 | lm loss: 3.382898E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.573 | TFLOPs: 32.23 | 7: iteration 35400/ 115203 | consumed samples: 9062400 | consumed tokens: 18559795200 | elapsed time per iteration (s): 0.37 | learning rate: 1.628E-04 | global batch size: 256 | lm loss: 3.384358E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.704 | TFLOPs: 32.29 | 7: iteration 35500/ 115203 | consumed samples: 9088000 | consumed tokens: 18612224000 | elapsed time per iteration (s): 0.37 | learning rate: 1.626E-04 | global batch size: 256 | lm loss: 3.384501E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.141 | TFLOPs: 32.07 | 7: iteration 35600/ 115203 | consumed samples: 9113600 | consumed tokens: 
18664652800 | elapsed time per iteration (s): 0.37 | learning rate: 1.624E-04 | global batch size: 256 | lm loss: 3.378712E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.810 | TFLOPs: 32.24 | 7: iteration 35700/ 115203 | consumed samples: 9139200 | consumed tokens: 18717081600 | elapsed time per iteration (s): 0.37 | learning rate: 1.622E-04 | global batch size: 256 | lm loss: 3.380365E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.020 | TFLOPs: 32.02 | 7: iteration 35800/ 115203 | consumed samples: 9164800 | consumed tokens: 18769510400 | elapsed time per iteration (s): 0.37 | learning rate: 1.620E-04 | global batch size: 256 | lm loss: 3.380829E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.531 | TFLOPs: 32.04 | 7: iteration 35900/ 115203 | consumed samples: 9190400 | consumed tokens: 18821939200 | elapsed time per iteration (s): 0.37 | learning rate: 1.618E-04 | global batch size: 256 | lm loss: 3.376628E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.680 | TFLOPs: 32.05 | 0: [2023-03-17 03:08:14,298] [INFO] [logging.py:68:log_dist] [Rank 0] step=36000, skipped=0, lr=[0.00016162432908965068, 0.00016162432908965068, 0.00016162432908965068], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 36000/ 115203 | consumed samples: 9216000 | consumed tokens: 18874368000 | elapsed time per iteration (s): 0.37 | learning rate: 1.616E-04 | global batch size: 256 | lm loss: 3.379542E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.914 | TFLOPs: 32.02 | 0: steps: 36000 loss: 3.3729 iter time (s): 0.370 samples/sec: 692.233 7: iteration 36100/ 115203 | consumed 
samples: 9241600 | consumed tokens: 18926796800 | elapsed time per iteration (s): 0.38 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 3.379816E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.434 | TFLOPs: 31.76 | 7: iteration 36200/ 115203 | consumed samples: 9267200 | consumed tokens: 18979225600 | elapsed time per iteration (s): 0.38 | learning rate: 1.612E-04 | global batch size: 256 | lm loss: 3.381287E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.301 | TFLOPs: 31.71 | 7: iteration 36300/ 115203 | consumed samples: 9292800 | consumed tokens: 19031654400 | elapsed time per iteration (s): 0.38 | learning rate: 1.610E-04 | global batch size: 256 | lm loss: 3.380523E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.529 | TFLOPs: 31.86 | 7: iteration 36400/ 115203 | consumed samples: 9318400 | consumed tokens: 19084083200 | elapsed time per iteration (s): 0.38 | learning rate: 1.608E-04 | global batch size: 256 | lm loss: 3.373419E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.980 | TFLOPs: 31.69 | 7: iteration 36500/ 115203 | consumed samples: 9344000 | consumed tokens: 19136512000 | elapsed time per iteration (s): 0.37 | learning rate: 1.606E-04 | global batch size: 256 | lm loss: 3.376035E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.238 | TFLOPs: 31.89 | 7: iteration 36600/ 115203 | consumed samples: 9369600 | consumed tokens: 19188940800 | elapsed time per iteration (s): 0.38 | learning rate: 1.604E-04 | global batch size: 256 | lm loss: 3.375347E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 680.832 | TFLOPs: 31.78 | 7: iteration 36700/ 115203 | consumed samples: 9395200 | consumed tokens: 19241369600 | elapsed time per iteration (s): 0.38 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 3.377530E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.633 | TFLOPs: 31.77 | 7: iteration 36800/ 115203 | consumed samples: 9420800 | consumed tokens: 19293798400 | elapsed time per iteration (s): 0.37 | learning rate: 1.600E-04 | global batch size: 256 | lm loss: 3.382464E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.076 | TFLOPs: 32.02 | 7: iteration 36900/ 115203 | consumed samples: 9446400 | consumed tokens: 19346227200 | elapsed time per iteration (s): 0.38 | learning rate: 1.598E-04 | global batch size: 256 | lm loss: 3.380158E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.076 | TFLOPs: 31.79 | 7: iteration 37000/ 115203 | consumed samples: 9472000 | consumed tokens: 19398656000 | elapsed time per iteration (s): 0.38 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 3.378071E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.315 | TFLOPs: 31.80 | 7: iteration 37100/ 115203 | consumed samples: 9497600 | consumed tokens: 19451084800 | elapsed time per iteration (s): 0.37 | learning rate: 1.594E-04 | global batch size: 256 | lm loss: 3.376678E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.861 | TFLOPs: 31.97 | 7: iteration 37200/ 115203 | consumed samples: 9523200 | consumed tokens: 19503513600 | elapsed time per iteration (s): 0.37 | learning rate: 1.592E-04 | global batch size: 256 | lm loss: 
3.377141E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.251 | TFLOPs: 31.99 | 7: iteration 37300/ 115203 | consumed samples: 9548800 | consumed tokens: 19555942400 | elapsed time per iteration (s): 0.37 | learning rate: 1.590E-04 | global batch size: 256 | lm loss: 3.374605E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.904 | TFLOPs: 31.92 | 7: iteration 37400/ 115203 | consumed samples: 9574400 | consumed tokens: 19608371200 | elapsed time per iteration (s): 0.37 | learning rate: 1.587E-04 | global batch size: 256 | lm loss: 3.377912E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.690 | TFLOPs: 31.96 | 7: iteration 37500/ 115203 | consumed samples: 9600000 | consumed tokens: 19660800000 | elapsed time per iteration (s): 0.37 | learning rate: 1.585E-04 | global batch size: 256 | lm loss: 3.372838E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.317 | TFLOPs: 32.03 | 7: iteration 37600/ 115203 | consumed samples: 9625600 | consumed tokens: 19713228800 | elapsed time per iteration (s): 0.37 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 3.372453E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.912 | TFLOPs: 32.02 | 7: iteration 37700/ 115203 | consumed samples: 9651200 | consumed tokens: 19765657600 | elapsed time per iteration (s): 0.37 | learning rate: 1.581E-04 | global batch size: 256 | lm loss: 3.370115E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.579 | TFLOPs: 32.05 | 7: iteration 37800/ 115203 | consumed samples: 9676800 | consumed tokens: 19818086400 | 
elapsed time per iteration (s): 0.37 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 3.374871E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.394 | TFLOPs: 32.18 | 7: iteration 37900/ 115203 | consumed samples: 9702400 | consumed tokens: 19870515200 | elapsed time per iteration (s): 0.37 | learning rate: 1.577E-04 | global batch size: 256 | lm loss: 3.373426E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.727 | TFLOPs: 32.19 | 0: [2023-03-17 03:20:42,968] [INFO] [logging.py:68:log_dist] [Rank 0] step=38000, skipped=0, lr=[0.00015748667481842792, 0.00015748667481842792, 0.00015748667481842792], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 38000/ 115203 | consumed samples: 9728000 | consumed tokens: 19922944000 | elapsed time per iteration (s): 0.37 | learning rate: 1.575E-04 | global batch size: 256 | lm loss: 3.373882E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.821 | TFLOPs: 32.06 | 0: steps: 38000 loss: 3.3980 iter time (s): 0.372 samples/sec: 687.689 7: iteration 38100/ 115203 | consumed samples: 9753600 | consumed tokens: 19975372800 | elapsed time per iteration (s): 0.37 | learning rate: 1.573E-04 | global batch size: 256 | lm loss: 3.370765E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.127 | TFLOPs: 32.07 | 7: iteration 38200/ 115203 | consumed samples: 9779200 | consumed tokens: 20027801600 | elapsed time per iteration (s): 0.37 | learning rate: 1.571E-04 | global batch size: 256 | lm loss: 3.370795E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.083 | TFLOPs: 32.16 | 7: iteration 38300/ 115203 | consumed samples: 9804800 
| consumed tokens: 20080230400 | elapsed time per iteration (s): 0.37 | learning rate: 1.569E-04 | global batch size: 256 | lm loss: 3.373941E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.854 | TFLOPs: 32.11 | 7: iteration 38400/ 115203 | consumed samples: 9830400 | consumed tokens: 20132659200 | elapsed time per iteration (s): 0.37 | learning rate: 1.566E-04 | global batch size: 256 | lm loss: 3.371746E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.817 | TFLOPs: 32.20 | 7: iteration 38500/ 115203 | consumed samples: 9856000 | consumed tokens: 20185088000 | elapsed time per iteration (s): 0.37 | learning rate: 1.564E-04 | global batch size: 256 | lm loss: 3.372321E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.473 | TFLOPs: 32.14 | 7: iteration 38600/ 115203 | consumed samples: 9881600 | consumed tokens: 20237516800 | elapsed time per iteration (s): 0.37 | learning rate: 1.562E-04 | global batch size: 256 | lm loss: 3.371039E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.653 | TFLOPs: 32.14 | 7: iteration 38700/ 115203 | consumed samples: 9907200 | consumed tokens: 20289945600 | elapsed time per iteration (s): 0.37 | learning rate: 1.560E-04 | global batch size: 256 | lm loss: 3.365969E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.959 | TFLOPs: 32.25 | 7: iteration 38800/ 115203 | consumed samples: 9932800 | consumed tokens: 20342374400 | elapsed time per iteration (s): 0.37 | learning rate: 1.558E-04 | global batch size: 256 | lm loss: 3.371797E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 690.152 | TFLOPs: 32.21 | 7: iteration 38900/ 115203 | consumed samples: 9958400 | consumed tokens: 20394803200 | elapsed time per iteration (s): 0.37 | learning rate: 1.556E-04 | global batch size: 256 | lm loss: 3.369199E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.871 | TFLOPs: 32.25 | 7: iteration 39000/ 115203 | consumed samples: 9984000 | consumed tokens: 20447232000 | elapsed time per iteration (s): 0.37 | learning rate: 1.554E-04 | global batch size: 256 | lm loss: 3.366471E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.494 | TFLOPs: 32.28 | 7: iteration 39100/ 115203 | consumed samples: 10009600 | consumed tokens: 20499660800 | elapsed time per iteration (s): 0.37 | learning rate: 1.551E-04 | global batch size: 256 | lm loss: 3.365453E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.266 | TFLOPs: 32.27 | 7: iteration 39200/ 115203 | consumed samples: 10035200 | consumed tokens: 20552089600 | elapsed time per iteration (s): 0.37 | learning rate: 1.549E-04 | global batch size: 256 | lm loss: 3.368979E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.864 | TFLOPs: 32.25 | 7: iteration 39300/ 115203 | consumed samples: 10060800 | consumed tokens: 20604518400 | elapsed time per iteration (s): 0.37 | learning rate: 1.547E-04 | global batch size: 256 | lm loss: 3.367286E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.799 | TFLOPs: 32.29 | 7: iteration 39400/ 115203 | consumed samples: 10086400 | consumed tokens: 20656947200 | elapsed time per iteration (s): 0.37 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 3.361953E+00 | 
grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.946 | TFLOPs: 32.25 | 7: iteration 39500/ 115203 | consumed samples: 10112000 | consumed tokens: 20709376000 | elapsed time per iteration (s): 0.37 | learning rate: 1.543E-04 | global batch size: 256 | lm loss: 3.367920E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.423 | TFLOPs: 32.13 | 7: iteration 39600/ 115203 | consumed samples: 10137600 | consumed tokens: 20761804800 | elapsed time per iteration (s): 0.37 | learning rate: 1.541E-04 | global batch size: 256 | lm loss: 3.368110E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.801 | TFLOPs: 32.24 | 7: iteration 39700/ 115203 | consumed samples: 10163200 | consumed tokens: 20814233600 | elapsed time per iteration (s): 0.37 | learning rate: 1.539E-04 | global batch size: 256 | lm loss: 3.364244E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.672 | TFLOPs: 32.19 | 7: iteration 39800/ 115203 | consumed samples: 10188800 | consumed tokens: 20866662400 | elapsed time per iteration (s): 0.37 | learning rate: 1.536E-04 | global batch size: 256 | lm loss: 3.364358E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.125 | TFLOPs: 32.26 | 7: iteration 39900/ 115203 | consumed samples: 10214400 | consumed tokens: 20919091200 | elapsed time per iteration (s): 0.37 | learning rate: 1.534E-04 | global batch size: 256 | lm loss: 3.366282E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.903 | TFLOPs: 32.30 | 0: [2023-03-17 03:33:04,845] [INFO] [logging.py:68:log_dist] [Rank 0] step=40000, skipped=0, 
lr=[0.0001532049360643911, 0.0001532049360643911, 0.0001532049360643911], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 40000/ 115203 | consumed samples: 10240000 | consumed tokens: 20971520000 | elapsed time per iteration (s): 0.37 | learning rate: 1.532E-04 | global batch size: 256 | lm loss: 3.362990E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.623 | TFLOPs: 32.28 | 0: steps: 40000 loss: 3.3704 iter time (s): 0.369 samples/sec: 693.815 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 40000 | lm loss value: 3.424177E+00 | lm loss PPL: 3.069738E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 40000 to checkpoints_146m60b400m 0: [2023-03-17 03:33:04,970] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step40000 is begin to save! 0: [2023-03-17 03:33:04,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:33:05,082] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:33:05,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:33:05,098] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:33:05,098] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/layer_04-model_00-model_states.pt... 
0: [2023-03-17 03:33:05,113] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:33:05,114] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:33:05,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:33:05,129] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:33:05,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:33:05,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/layer_07-model_00-model_states.pt... 0: [2023-03-17 03:33:05,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/layer_07-model_00-model_states.pt. 0: [2023-03-17 03:33:05,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/layer_08-model_00-model_states.pt... 0: [2023-03-17 03:33:05,174] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:33:05,174] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/layer_09-model_00-model_states.pt... 0: [2023-03-17 03:33:05,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/layer_09-model_00-model_states.pt. 0: [2023-03-17 03:33:05,189] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/layer_10-model_00-model_states.pt... 
0: [2023-03-17 03:33:05,204] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/layer_10-model_00-model_states.pt. 0: [2023-03-17 03:33:05,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/layer_11-model_00-model_states.pt... 0: [2023-03-17 03:33:05,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/layer_11-model_00-model_states.pt. 0: [2023-03-17 03:33:05,220] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/layer_12-model_00-model_states.pt... 0: [2023-03-17 03:33:05,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/layer_12-model_00-model_states.pt. 0: [2023-03-17 03:33:05,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/layer_13-model_00-model_states.pt... 0: [2023-03-17 03:33:05,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/layer_13-model_00-model_states.pt. 0: [2023-03-17 03:33:05,250] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/layer_14-model_00-model_states.pt... 0: [2023-03-17 03:33:05,265] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/layer_14-model_00-model_states.pt. 0: [2023-03-17 03:33:05,265] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/layer_15-model_00-model_states.pt... 0: [2023-03-17 03:33:05,280] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/layer_15-model_00-model_states.pt. 0: [2023-03-17 03:33:05,280] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/layer_16-model_00-model_states.pt... 
0: [2023-03-17 03:33:05,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/layer_16-model_00-model_states.pt. 0: [2023-03-17 03:33:05,296] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/layer_17-model_00-model_states.pt... 0: [2023-03-17 03:33:05,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/layer_17-model_00-model_states.pt. 0: [2023-03-17 03:33:05,311] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/layer_19-model_00-model_states.pt... 0: [2023-03-17 03:33:05,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/layer_19-model_00-model_states.pt. 0: [2023-03-17 03:33:05,312] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_146m60b400m/global_step40000/mp_rank_00_model_states.pt 0: [2023-03-17 03:33:05,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:33:05,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
0: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
2: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
2: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:33:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:33:05,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:33:05,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:33:05,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 03:33:05,368] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2023-03-17 03:33:05,370] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:33:05,370] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 03:33:05,370] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2023-03-17 03:33:05,370] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:33:05,370] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 03:33:05,370] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2023-03-17 03:33:05,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:33:05,371] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:33:05,371] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2023-03-17 03:33:05,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:33:05,372] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:33:05,372] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2023-03-17 03:33:05,373] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:33:05,373] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:33:05,374] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:33:05,374] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:33:05,374] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2023-03-17 03:33:05,374] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2023-03-17 03:33:05,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:33:05,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:33:05,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:33:05,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:33:05,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:33:05,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:33:05,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:33:05,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:33:05,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 03:33:05,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 03:33:05,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:33:05,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:33:05,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2023-03-17 03:33:05,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 03:33:05,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 03:33:05,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2023-03-17 03:33:05,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
5: [2023-03-17 03:33:05,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2023-03-17 03:33:05,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2023-03-17 03:33:05,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2023-03-17 03:33:05,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2023-03-17 03:33:05,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:33:05,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:33:05,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2023-03-17 03:33:05,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:33:05,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:33:05,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:33:05,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2023-03-17 03:33:05,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 03:33:05,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
3: [2023-03-17 03:33:05,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:33:05,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 03:33:05,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2023-03-17 03:33:05,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:33:05,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 03:33:05,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2023-03-17 03:33:05,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:33:05,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:33:05,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:33:05,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:33:05,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 03:33:05,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 03:33:05,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2023-03-17 03:33:05,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2023-03-17 03:33:05,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2023-03-17 03:33:05,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:33:05,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:33:05,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2023-03-17 03:33:05,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:33:05,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:33:05,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:33:05,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:33:05,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:33:05,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:33:05,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:33:05,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:33:05,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:33:05,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:33:05,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:33:05,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 03:33:05,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 03:33:05,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
7: [2023-03-17 03:33:05,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2023-03-17 03:33:05,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2023-03-17 03:33:05,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2023-03-17 03:33:05,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2023-03-17 03:33:05,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2023-03-17 03:33:05,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 03:33:05,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2023-03-17 03:33:05,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:33:05,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:33:05,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:33:05,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:33:05,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 03:33:05,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:33:05,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:33:05,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 03:33:05,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:33:05,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 03:33:05,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2023-03-17 03:33:05,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
2: [2023-03-17 03:33:05,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 03:33:05,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 03:33:05,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 03:33:05,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:33:05,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2023-03-17 03:33:05,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2023-03-17 03:33:05,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2023-03-17 03:33:05,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2023-03-17 03:33:05,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2023-03-17 03:33:05,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:33:05,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:33:05,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
7: [2023-03-17 03:33:05,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:33:05,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:33:05,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2023-03-17 03:33:05,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:33:05,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2023-03-17 03:33:05,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:33:05,406] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 03:33:05,406] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:33:05,406] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:33:05,406] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 03:33:05,406] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 03:33:05,406] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:33:05,406] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 03:33:05,406] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:33:05,406] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 03:33:05,406] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:33:05,406] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:33:05,406] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:33:05,406] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 03:33:05,406] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 03:33:05,406] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:33:05,406] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2023-03-17 03:33:05,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2023-03-17 03:33:05,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:33:05,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:33:05,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:33:05,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 03:33:05,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:33:05,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 03:33:05,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2023-03-17 03:33:05,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:33:05,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2023-03-17 03:33:05,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2023-03-17 03:33:05,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2023-03-17 03:33:05,405] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:33:05,405] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 03:33:05,405] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2023-03-17 03:33:05,407] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:33:05,407] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:33:05,407] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:33:05,407] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:33:05,407] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:33:05,407] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step40000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:33:05,407] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2023-03-17 03:33:05,407] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2023-03-17 03:33:05,407] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
0: successfully saved checkpoint at iteration 40000 to checkpoints_146m60b400m 7: time (ms) | save-checkpoint: 445.90 7: iteration 40100/ 115203 | consumed samples: 10265600 | consumed tokens: 21023948800 | elapsed time per iteration (s): 0.38 | learning rate: 1.530E-04 | global batch size: 256 | lm loss: 3.364365E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.758 | TFLOPs: 31.82 | 7: iteration 40200/ 115203 | consumed samples: 10291200 | consumed tokens: 21076377600 | elapsed time per iteration (s): 0.37 | learning rate: 1.528E-04 | global batch size: 256 | lm loss: 3.363035E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.089 | TFLOPs: 32.16 | 7: iteration 40300/ 115203 | consumed samples: 10316800 | consumed tokens: 21128806400 | elapsed time per iteration (s): 0.37 | learning rate: 1.526E-04 | global batch size: 256 | lm loss: 3.364980E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.399 | TFLOPs: 32.23 | 7: iteration 40400/ 115203 | consumed samples: 10342400 | consumed tokens: 21181235200 | elapsed time per iteration (s): 0.37 | learning rate: 1.523E-04 | global batch size: 256 | lm loss: 3.364936E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.100 | TFLOPs: 32.26 | 7: iteration 40500/ 115203 | consumed samples: 10368000 | consumed tokens: 21233664000 | elapsed time per iteration (s): 0.37 | learning rate: 1.521E-04 | global batch size: 256 | lm loss: 3.362026E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.981 | TFLOPs: 32.21 | 7: iteration 40600/ 115203 | consumed samples: 10393600 | consumed tokens: 21286092800 | elapsed time per iteration (s): 0.37 | 
learning rate: 1.519E-04 | global batch size: 256 | lm loss: 3.362226E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.878 | TFLOPs: 32.01 | 7: iteration 40700/ 115203 | consumed samples: 10419200 | consumed tokens: 21338521600 | elapsed time per iteration (s): 0.37 | learning rate: 1.517E-04 | global batch size: 256 | lm loss: 3.360807E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.398 | TFLOPs: 32.13 | 7: iteration 40800/ 115203 | consumed samples: 10444800 | consumed tokens: 21390950400 | elapsed time per iteration (s): 0.37 | learning rate: 1.515E-04 | global batch size: 256 | lm loss: 3.360294E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.410 | TFLOPs: 32.23 | 7: iteration 40900/ 115203 | consumed samples: 10470400 | consumed tokens: 21443379200 | elapsed time per iteration (s): 0.37 | learning rate: 1.512E-04 | global batch size: 256 | lm loss: 3.357624E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.029 | TFLOPs: 32.16 | 7: iteration 41000/ 115203 | consumed samples: 10496000 | consumed tokens: 21495808000 | elapsed time per iteration (s): 0.37 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 3.360331E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.412 | TFLOPs: 32.27 | 7: iteration 41100/ 115203 | consumed samples: 10521600 | consumed tokens: 21548236800 | elapsed time per iteration (s): 0.37 | learning rate: 1.508E-04 | global batch size: 256 | lm loss: 3.359759E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.331 | TFLOPs: 32.04 | 7: iteration 41200/ 115203 
| consumed samples: 10547200 | consumed tokens: 21600665600 | elapsed time per iteration (s): 0.37 | learning rate: 1.506E-04 | global batch size: 256 | lm loss: 3.366075E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.910 | TFLOPs: 32.30 | 7: iteration 41300/ 115203 | consumed samples: 10572800 | consumed tokens: 21653094400 | elapsed time per iteration (s): 0.37 | learning rate: 1.504E-04 | global batch size: 256 | lm loss: 3.360952E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.641 | TFLOPs: 32.28 | 7: iteration 41400/ 115203 | consumed samples: 10598400 | consumed tokens: 21705523200 | elapsed time per iteration (s): 0.37 | learning rate: 1.501E-04 | global batch size: 256 | lm loss: 3.361412E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.603 | TFLOPs: 32.19 | 7: iteration 41500/ 115203 | consumed samples: 10624000 | consumed tokens: 21757952000 | elapsed time per iteration (s): 0.37 | learning rate: 1.499E-04 | global batch size: 256 | lm loss: 3.357589E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.567 | TFLOPs: 32.28 | 7: iteration 41600/ 115203 | consumed samples: 10649600 | consumed tokens: 21810380800 | elapsed time per iteration (s): 0.37 | learning rate: 1.497E-04 | global batch size: 256 | lm loss: 3.359184E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.732 | TFLOPs: 32.05 | 7: iteration 41700/ 115203 | consumed samples: 10675200 | consumed tokens: 21862809600 | elapsed time per iteration (s): 0.37 | learning rate: 1.495E-04 | global batch size: 256 | lm loss: 3.357574E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 686.154 | TFLOPs: 32.03 | 7: iteration 41800/ 115203 | consumed samples: 10700800 | consumed tokens: 21915238400 | elapsed time per iteration (s): 0.38 | learning rate: 1.492E-04 | global batch size: 256 | lm loss: 3.359041E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 665.494 | TFLOPs: 31.06 | 7: iteration 41900/ 115203 | consumed samples: 10726400 | consumed tokens: 21967667200 | elapsed time per iteration (s): 0.37 | learning rate: 1.490E-04 | global batch size: 256 | lm loss: 3.358612E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.056 | TFLOPs: 32.07 | 0: [2023-03-17 03:45:29,470] [INFO] [logging.py:68:log_dist] [Rank 0] step=42000, skipped=0, lr=[0.0001487921045166041, 0.0001487921045166041, 0.0001487921045166041], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 42000/ 115203 | consumed samples: 10752000 | consumed tokens: 22020096000 | elapsed time per iteration (s): 0.37 | learning rate: 1.488E-04 | global batch size: 256 | lm loss: 3.360680E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.895 | TFLOPs: 32.16 | 0: steps: 42000 loss: 3.3750 iter time (s): 0.370 samples/sec: 691.697 7: iteration 42100/ 115203 | consumed samples: 10777600 | consumed tokens: 22072524800 | elapsed time per iteration (s): 0.37 | learning rate: 1.486E-04 | global batch size: 256 | lm loss: 3.362739E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.167 | TFLOPs: 32.07 | 7: iteration 42200/ 115203 | consumed samples: 10803200 | consumed tokens: 22124953600 | elapsed time per iteration (s): 0.37 | learning rate: 1.483E-04 | global batch size: 256 | lm loss: 3.355613E+00 | grad norm: 0.213 | num zeros: 0.0 
| number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.729 | TFLOPs: 31.91 | 7: iteration 42300/ 115203 | consumed samples: 10828800 | consumed tokens: 22177382400 | elapsed time per iteration (s): 0.37 | learning rate: 1.481E-04 | global batch size: 256 | lm loss: 3.360240E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.410 | TFLOPs: 32.09 | 7: iteration 42400/ 115203 | consumed samples: 10854400 | consumed tokens: 22229811200 | elapsed time per iteration (s): 0.37 | learning rate: 1.479E-04 | global batch size: 256 | lm loss: 3.358939E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.039 | TFLOPs: 32.02 | 7: iteration 42500/ 115203 | consumed samples: 10880000 | consumed tokens: 22282240000 | elapsed time per iteration (s): 0.37 | learning rate: 1.477E-04 | global batch size: 256 | lm loss: 3.355084E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.073 | TFLOPs: 32.07 | 7: iteration 42600/ 115203 | consumed samples: 10905600 | consumed tokens: 22334668800 | elapsed time per iteration (s): 0.37 | learning rate: 1.474E-04 | global batch size: 256 | lm loss: 3.357825E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.815 | TFLOPs: 32.01 | 7: iteration 42700/ 115203 | consumed samples: 10931200 | consumed tokens: 22387097600 | elapsed time per iteration (s): 0.37 | learning rate: 1.472E-04 | global batch size: 256 | lm loss: 3.354011E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.883 | TFLOPs: 32.06 | 7: iteration 42800/ 115203 | consumed samples: 10956800 | consumed tokens: 22439526400 | elapsed time per iteration (s): 0.37 | learning 
rate: 1.470E-04 | global batch size: 256 | lm loss: 3.353325E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.916 | TFLOPs: 32.20 | 7: iteration 42900/ 115203 | consumed samples: 10982400 | consumed tokens: 22491955200 | elapsed time per iteration (s): 0.37 | learning rate: 1.468E-04 | global batch size: 256 | lm loss: 3.354430E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.472 | TFLOPs: 31.90 | 7: iteration 43000/ 115203 | consumed samples: 11008000 | consumed tokens: 22544384000 | elapsed time per iteration (s): 0.37 | learning rate: 1.465E-04 | global batch size: 256 | lm loss: 3.353923E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.671 | TFLOPs: 31.86 | 7: iteration 43100/ 115203 | consumed samples: 11033600 | consumed tokens: 22596812800 | elapsed time per iteration (s): 0.38 | learning rate: 1.463E-04 | global batch size: 256 | lm loss: 3.357391E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 667.667 | TFLOPs: 31.16 | 7: iteration 43200/ 115203 | consumed samples: 11059200 | consumed tokens: 22649241600 | elapsed time per iteration (s): 0.38 | learning rate: 1.461E-04 | global batch size: 256 | lm loss: 3.355401E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.132 | TFLOPs: 31.79 | 7: iteration 43300/ 115203 | consumed samples: 11084800 | consumed tokens: 22701670400 | elapsed time per iteration (s): 0.38 | learning rate: 1.459E-04 | global batch size: 256 | lm loss: 3.352767E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.481 | TFLOPs: 31.76 | 7: iteration 43400/ 115203 | 
consumed samples: 11110400 | consumed tokens: 22754099200 | elapsed time per iteration (s): 0.37 | learning rate: 1.456E-04 | global batch size: 256 | lm loss: 3.357594E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.590 | TFLOPs: 31.91 | 7: iteration 43500/ 115203 | consumed samples: 11136000 | consumed tokens: 22806528000 | elapsed time per iteration (s): 0.37 | learning rate: 1.454E-04 | global batch size: 256 | lm loss: 3.356591E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.099 | TFLOPs: 31.93 | 7: iteration 43600/ 115203 | consumed samples: 11161600 | consumed tokens: 22858956800 | elapsed time per iteration (s): 0.38 | learning rate: 1.452E-04 | global batch size: 256 | lm loss: 3.353429E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.079 | TFLOPs: 31.65 | 7: iteration 43700/ 115203 | consumed samples: 11187200 | consumed tokens: 22911385600 | elapsed time per iteration (s): 0.38 | learning rate: 1.449E-04 | global batch size: 256 | lm loss: 3.352071E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.443 | TFLOPs: 31.67 | 7: iteration 43800/ 115203 | consumed samples: 11212800 | consumed tokens: 22963814400 | elapsed time per iteration (s): 0.38 | learning rate: 1.447E-04 | global batch size: 256 | lm loss: 3.352666E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.347 | TFLOPs: 31.85 | 7: iteration 43900/ 115203 | consumed samples: 11238400 | consumed tokens: 23016243200 | elapsed time per iteration (s): 0.38 | learning rate: 1.445E-04 | global batch size: 256 | lm loss: 3.355602E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 680.876 | TFLOPs: 31.78 | 0: [2023-03-17 03:57:59,319] [INFO] [logging.py:68:log_dist] [Rank 0] step=44000, skipped=0, lr=[0.00014426156962702883, 0.00014426156962702883, 0.00014426156962702883], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 44000/ 115203 | consumed samples: 11264000 | consumed tokens: 23068672000 | elapsed time per iteration (s): 0.38 | learning rate: 1.443E-04 | global batch size: 256 | lm loss: 3.352586E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.719 | TFLOPs: 31.73 | 0: steps: 44000 loss: 3.3064 iter time (s): 0.373 samples/sec: 686.291 7: iteration 44100/ 115203 | consumed samples: 11289600 | consumed tokens: 23121100800 | elapsed time per iteration (s): 0.38 | learning rate: 1.440E-04 | global batch size: 256 | lm loss: 3.350448E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.729 | TFLOPs: 31.73 | 7: iteration 44200/ 115203 | consumed samples: 11315200 | consumed tokens: 23173529600 | elapsed time per iteration (s): 0.37 | learning rate: 1.438E-04 | global batch size: 256 | lm loss: 3.351569E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.422 | TFLOPs: 31.90 | 7: iteration 44300/ 115203 | consumed samples: 11340800 | consumed tokens: 23225958400 | elapsed time per iteration (s): 0.37 | learning rate: 1.436E-04 | global batch size: 256 | lm loss: 3.350062E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.553 | TFLOPs: 32.00 | 7: iteration 44400/ 115203 | consumed samples: 11366400 | consumed tokens: 23278387200 | elapsed time per iteration (s): 0.37 | learning rate: 1.433E-04 | global batch size: 256 | lm loss: 3.351046E+00 | grad norm: 0.206 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.879 | TFLOPs: 31.87 | 7: iteration 44500/ 115203 | consumed samples: 11392000 | consumed tokens: 23330816000 | elapsed time per iteration (s): 0.37 | learning rate: 1.431E-04 | global batch size: 256 | lm loss: 3.355590E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.298 | TFLOPs: 31.94 | 7: iteration 44600/ 115203 | consumed samples: 11417600 | consumed tokens: 23383244800 | elapsed time per iteration (s): 0.37 | learning rate: 1.429E-04 | global batch size: 256 | lm loss: 3.350562E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.045 | TFLOPs: 32.02 | 7: iteration 44700/ 115203 | consumed samples: 11443200 | consumed tokens: 23435673600 | elapsed time per iteration (s): 0.37 | learning rate: 1.427E-04 | global batch size: 256 | lm loss: 3.352770E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.377 | TFLOPs: 32.04 | 7: iteration 44800/ 115203 | consumed samples: 11468800 | consumed tokens: 23488102400 | elapsed time per iteration (s): 0.37 | learning rate: 1.424E-04 | global batch size: 256 | lm loss: 3.347050E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.364 | TFLOPs: 31.90 | 7: iteration 44900/ 115203 | consumed samples: 11494400 | consumed tokens: 23540531200 | elapsed time per iteration (s): 0.37 | learning rate: 1.422E-04 | global batch size: 256 | lm loss: 3.352751E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.410 | TFLOPs: 32.13 | 7: iteration 45000/ 115203 | consumed samples: 11520000 | consumed tokens: 23592960000 | elapsed time per iteration (s): 0.37 | 
learning rate: 1.420E-04 | global batch size: 256 | lm loss: 3.352029E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.424 | TFLOPs: 32.18 | 7: iteration 45100/ 115203 | consumed samples: 11545600 | consumed tokens: 23645388800 | elapsed time per iteration (s): 0.37 | learning rate: 1.417E-04 | global batch size: 256 | lm loss: 3.345623E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.133 | TFLOPs: 32.07 | 7: iteration 45200/ 115203 | consumed samples: 11571200 | consumed tokens: 23697817600 | elapsed time per iteration (s): 0.37 | learning rate: 1.415E-04 | global batch size: 256 | lm loss: 3.347935E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.839 | TFLOPs: 32.11 | 7: iteration 45300/ 115203 | consumed samples: 11596800 | consumed tokens: 23750246400 | elapsed time per iteration (s): 0.37 | learning rate: 1.413E-04 | global batch size: 256 | lm loss: 3.347918E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.705 | TFLOPs: 31.91 | 7: iteration 45400/ 115203 | consumed samples: 11622400 | consumed tokens: 23802675200 | elapsed time per iteration (s): 0.37 | learning rate: 1.410E-04 | global batch size: 256 | lm loss: 3.350308E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.537 | TFLOPs: 32.09 | 7: iteration 45500/ 115203 | consumed samples: 11648000 | consumed tokens: 23855104000 | elapsed time per iteration (s): 0.37 | learning rate: 1.408E-04 | global batch size: 256 | lm loss: 3.351165E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.735 | TFLOPs: 32.24 | 7: iteration 45600/ 115203 
| consumed samples: 11673600 | consumed tokens: 23907532800 | elapsed time per iteration (s): 0.37 | learning rate: 1.406E-04 | global batch size: 256 | lm loss: 3.349805E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.883 | TFLOPs: 32.11 | 7: iteration 45700/ 115203 | consumed samples: 11699200 | consumed tokens: 23959961600 | elapsed time per iteration (s): 0.37 | learning rate: 1.403E-04 | global batch size: 256 | lm loss: 3.352808E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.550 | TFLOPs: 32.14 | 7: iteration 45800/ 115203 | consumed samples: 11724800 | consumed tokens: 24012390400 | elapsed time per iteration (s): 0.37 | learning rate: 1.401E-04 | global batch size: 256 | lm loss: 3.348293E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.489 | TFLOPs: 32.14 | 7: iteration 45900/ 115203 | consumed samples: 11750400 | consumed tokens: 24064819200 | elapsed time per iteration (s): 0.37 | learning rate: 1.399E-04 | global batch size: 256 | lm loss: 3.347563E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.796 | TFLOPs: 32.15 | 0: [2023-03-17 04:10:25,139] [INFO] [logging.py:68:log_dist] [Rank 0] step=46000, skipped=0, lr=[0.0001396270779841331, 0.0001396270779841331, 0.0001396270779841331], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 46000/ 115203 | consumed samples: 11776000 | consumed tokens: 24117248000 | elapsed time per iteration (s): 0.37 | learning rate: 1.396E-04 | global batch size: 256 | lm loss: 3.344055E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.063 | TFLOPs: 32.21 | 0: steps: 46000 loss: 3.3779 iter time (s): 0.371 samples/sec: 
690.346 7: iteration 46100/ 115203 | consumed samples: 11801600 | consumed tokens: 24169676800 | elapsed time per iteration (s): 0.37 | learning rate: 1.394E-04 | global batch size: 256 | lm loss: 3.348799E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.164 | TFLOPs: 31.89 | 7: iteration 46200/ 115203 | consumed samples: 11827200 | consumed tokens: 24222105600 | elapsed time per iteration (s): 0.37 | learning rate: 1.392E-04 | global batch size: 256 | lm loss: 3.342650E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.843 | TFLOPs: 32.06 | 7: iteration 46300/ 115203 | consumed samples: 11852800 | consumed tokens: 24274534400 | elapsed time per iteration (s): 0.37 | learning rate: 1.389E-04 | global batch size: 256 | lm loss: 3.349612E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.818 | TFLOPs: 32.06 | 7: iteration 46400/ 115203 | consumed samples: 11878400 | consumed tokens: 24326963200 | elapsed time per iteration (s): 0.37 | learning rate: 1.387E-04 | global batch size: 256 | lm loss: 3.342498E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.986 | TFLOPs: 32.07 | 7: iteration 46500/ 115203 | consumed samples: 11904000 | consumed tokens: 24379392000 | elapsed time per iteration (s): 0.37 | learning rate: 1.385E-04 | global batch size: 256 | lm loss: 3.345339E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.930 | TFLOPs: 32.16 | 7: iteration 46600/ 115203 | consumed samples: 11929600 | consumed tokens: 24431820800 | elapsed time per iteration (s): 0.37 | learning rate: 1.382E-04 | global batch size: 256 | lm loss: 3.346948E+00 | grad norm: 0.210 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.510 | TFLOPs: 32.04 | 7: iteration 46700/ 115203 | consumed samples: 11955200 | consumed tokens: 24484249600 | elapsed time per iteration (s): 0.37 | learning rate: 1.380E-04 | global batch size: 256 | lm loss: 3.347158E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.830 | TFLOPs: 32.06 | 7: iteration 46800/ 115203 | consumed samples: 11980800 | consumed tokens: 24536678400 | elapsed time per iteration (s): 0.37 | learning rate: 1.377E-04 | global batch size: 256 | lm loss: 3.346983E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.571 | TFLOPs: 32.00 | 7: iteration 46900/ 115203 | consumed samples: 12006400 | consumed tokens: 24589107200 | elapsed time per iteration (s): 0.37 | learning rate: 1.375E-04 | global batch size: 256 | lm loss: 3.345577E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.293 | TFLOPs: 31.94 | 7: iteration 47000/ 115203 | consumed samples: 12032000 | consumed tokens: 24641536000 | elapsed time per iteration (s): 0.38 | learning rate: 1.373E-04 | global batch size: 256 | lm loss: 3.341481E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.043 | TFLOPs: 31.84 | 7: iteration 47100/ 115203 | consumed samples: 12057600 | consumed tokens: 24693964800 | elapsed time per iteration (s): 0.37 | learning rate: 1.370E-04 | global batch size: 256 | lm loss: 3.348954E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.540 | TFLOPs: 31.91 | 7: iteration 47200/ 115203 | consumed samples: 12083200 | consumed tokens: 24746393600 | elapsed time per iteration (s): 0.37 | learning 
rate: 1.368E-04 | global batch size: 256 | lm loss: 3.343245E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.427 | TFLOPs: 31.99 | 7: iteration 47300/ 115203 | consumed samples: 12108800 | consumed tokens: 24798822400 | elapsed time per iteration (s): 0.37 | learning rate: 1.366E-04 | global batch size: 256 | lm loss: 3.345574E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.236 | TFLOPs: 31.89 | 7: iteration 47400/ 115203 | consumed samples: 12134400 | consumed tokens: 24851251200 | elapsed time per iteration (s): 0.38 | learning rate: 1.363E-04 | global batch size: 256 | lm loss: 3.345370E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.933 | TFLOPs: 31.78 | 7: iteration 47500/ 115203 | consumed samples: 12160000 | consumed tokens: 24903680000 | elapsed time per iteration (s): 0.37 | learning rate: 1.361E-04 | global batch size: 256 | lm loss: 3.343854E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.196 | TFLOPs: 31.89 | 7: iteration 47600/ 115203 | consumed samples: 12185600 | consumed tokens: 24956108800 | elapsed time per iteration (s): 0.37 | learning rate: 1.359E-04 | global batch size: 256 | lm loss: 3.339538E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.260 | TFLOPs: 32.03 | 7: iteration 47700/ 115203 | consumed samples: 12211200 | consumed tokens: 25008537600 | elapsed time per iteration (s): 0.37 | learning rate: 1.356E-04 | global batch size: 256 | lm loss: 3.346030E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.543 | TFLOPs: 32.00 | 7: iteration 47800/ 115203 | 
consumed samples: 12236800 | consumed tokens: 25060966400 | elapsed time per iteration (s): 0.37 | learning rate: 1.354E-04 | global batch size: 256 | lm loss: 3.343392E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.629 | TFLOPs: 31.96 | 7: iteration 47900/ 115203 | consumed samples: 12262400 | consumed tokens: 25113395200 | elapsed time per iteration (s): 0.37 | learning rate: 1.351E-04 | global batch size: 256 | lm loss: 3.342840E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.122 | TFLOPs: 31.89 | 0: [2023-03-17 04:22:52,525] [INFO] [logging.py:68:log_dist] [Rank 0] step=48000, skipped=0, lr=[0.00013490269160287214, 0.00013490269160287214, 0.00013490269160287214], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 48000/ 115203 | consumed samples: 12288000 | consumed tokens: 25165824000 | elapsed time per iteration (s): 0.37 | learning rate: 1.349E-04 | global batch size: 256 | lm loss: 3.337744E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.338 | TFLOPs: 32.08 | 0: steps: 48000 loss: 3.3457 iter time (s): 0.372 samples/sec: 688.769 7: iteration 48100/ 115203 | consumed samples: 12313600 | consumed tokens: 25218252800 | elapsed time per iteration (s): 0.38 | learning rate: 1.347E-04 | global batch size: 256 | lm loss: 3.340476E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.352 | TFLOPs: 31.62 | 7: iteration 48200/ 115203 | consumed samples: 12339200 | consumed tokens: 25270681600 | elapsed time per iteration (s): 0.37 | learning rate: 1.344E-04 | global batch size: 256 | lm loss: 3.343268E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.581 | 
TFLOPs: 32.05 | 7: iteration 48300/ 115203 | consumed samples: 12364800 | consumed tokens: 25323110400 | elapsed time per iteration (s): 0.37 | learning rate: 1.342E-04 | global batch size: 256 | lm loss: 3.340456E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.408 | TFLOPs: 32.09 | 7: iteration 48400/ 115203 | consumed samples: 12390400 | consumed tokens: 25375539200 | elapsed time per iteration (s): 0.37 | learning rate: 1.339E-04 | global batch size: 256 | lm loss: 3.340262E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.975 | TFLOPs: 32.11 | 7: iteration 48500/ 115203 | consumed samples: 12416000 | consumed tokens: 25427968000 | elapsed time per iteration (s): 0.37 | learning rate: 1.337E-04 | global batch size: 256 | lm loss: 3.341749E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.992 | TFLOPs: 32.07 | 7: iteration 48600/ 115203 | consumed samples: 12441600 | consumed tokens: 25480396800 | elapsed time per iteration (s): 0.37 | learning rate: 1.335E-04 | global batch size: 256 | lm loss: 3.340056E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.919 | TFLOPs: 32.06 | 7: iteration 48700/ 115203 | consumed samples: 12467200 | consumed tokens: 25532825600 | elapsed time per iteration (s): 0.38 | learning rate: 1.332E-04 | global batch size: 256 | lm loss: 3.345265E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.251 | TFLOPs: 31.84 | 7: iteration 48800/ 115203 | consumed samples: 12492800 | consumed tokens: 25585254400 | elapsed time per iteration (s): 0.37 | learning rate: 1.330E-04 | global batch size: 256 | lm loss: 3.342435E+00 | grad norm: 0.213 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.800 | TFLOPs: 32.15 | 7: iteration 48900/ 115203 | consumed samples: 12518400 | consumed tokens: 25637683200 | elapsed time per iteration (s): 0.37 | learning rate: 1.328E-04 | global batch size: 256 | lm loss: 3.340870E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.817 | TFLOPs: 32.06 | 7: iteration 49000/ 115203 | consumed samples: 12544000 | consumed tokens: 25690112000 | elapsed time per iteration (s): 0.37 | learning rate: 1.325E-04 | global batch size: 256 | lm loss: 3.342971E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.036 | TFLOPs: 32.25 | 7: iteration 49100/ 115203 | consumed samples: 12569600 | consumed tokens: 25742540800 | elapsed time per iteration (s): 0.37 | learning rate: 1.323E-04 | global batch size: 256 | lm loss: 3.339811E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.370 | TFLOPs: 32.22 | 7: iteration 49200/ 115203 | consumed samples: 12595200 | consumed tokens: 25794969600 | elapsed time per iteration (s): 0.37 | learning rate: 1.320E-04 | global batch size: 256 | lm loss: 3.336713E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.348 | TFLOPs: 32.27 | 7: iteration 49300/ 115203 | consumed samples: 12620800 | consumed tokens: 25847398400 | elapsed time per iteration (s): 0.37 | learning rate: 1.318E-04 | global batch size: 256 | lm loss: 3.334235E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.688 | TFLOPs: 32.24 | 7: iteration 49400/ 115203 | consumed samples: 12646400 | consumed tokens: 25899827200 | elapsed time per iteration (s): 0.37 | 
learning rate: 1.315E-04 | global batch size: 256 | lm loss: 3.339199E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.415 | TFLOPs: 32.27 | 7: iteration 49500/ 115203 | consumed samples: 12672000 | consumed tokens: 25952256000 | elapsed time per iteration (s): 0.37 | learning rate: 1.313E-04 | global batch size: 256 | lm loss: 3.339090E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.381 | TFLOPs: 32.27 | 7: iteration 49600/ 115203 | consumed samples: 12697600 | consumed tokens: 26004684800 | elapsed time per iteration (s): 0.37 | learning rate: 1.311E-04 | global batch size: 256 | lm loss: 3.336392E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.081 | TFLOPs: 32.21 | 7: iteration 49700/ 115203 | consumed samples: 12723200 | consumed tokens: 26057113600 | elapsed time per iteration (s): 0.37 | learning rate: 1.308E-04 | global batch size: 256 | lm loss: 3.337758E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.003 | TFLOPs: 32.25 | 7: iteration 49800/ 115203 | consumed samples: 12748800 | consumed tokens: 26109542400 | elapsed time per iteration (s): 0.37 | learning rate: 1.306E-04 | global batch size: 256 | lm loss: 3.336416E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.278 | TFLOPs: 31.99 | 7: iteration 49900/ 115203 | consumed samples: 12774400 | consumed tokens: 26161971200 | elapsed time per iteration (s): 0.37 | learning rate: 1.303E-04 | global batch size: 256 | lm loss: 3.337776E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.390 | TFLOPs: 32.22 | 0: [2023-03-17 
04:35:16,423] [INFO] [logging.py:68:log_dist] [Rank 0] step=50000, skipped=0, lr=[0.00013010274525760026, 0.00013010274525760026, 0.00013010274525760026], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 50000/ 115203 | consumed samples: 12800000 | consumed tokens: 26214400000 | elapsed time per iteration (s): 0.37 | learning rate: 1.301E-04 | global batch size: 256 | lm loss: 3.337666E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.606 | TFLOPs: 32.28 |
0: steps: 50000 loss: 3.3716 iter time (s): 0.370 samples/sec: 692.077
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 50000 | lm loss value: 3.398521E+00 | lm loss PPL: 2.991981E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 50000 to checkpoints_146m60b400m
0: [2023-03-17 04:35:16,547] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step50000 is begin to save!
0: [2023-03-17 04:35:16,552] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/layer_01-model_00-model_states.pt...
0: [2023-03-17 04:35:16,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/layer_01-model_00-model_states.pt.
0: [2023-03-17 04:35:16,669] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/layer_03-model_00-model_states.pt...
0: [2023-03-17 04:35:16,684] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/layer_03-model_00-model_states.pt.
0: [2023-03-17 04:35:16,684] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/layer_04-model_00-model_states.pt...
0: [2023-03-17 04:35:16,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/layer_04-model_00-model_states.pt.
0: [2023-03-17 04:35:16,699] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/layer_05-model_00-model_states.pt...
0: [2023-03-17 04:35:16,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/layer_05-model_00-model_states.pt.
0: [2023-03-17 04:35:16,714] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/layer_06-model_00-model_states.pt...
0: [2023-03-17 04:35:16,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/layer_06-model_00-model_states.pt.
0: [2023-03-17 04:35:16,730] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/layer_07-model_00-model_states.pt...
0: [2023-03-17 04:35:16,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/layer_07-model_00-model_states.pt.
0: [2023-03-17 04:35:16,745] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/layer_08-model_00-model_states.pt...
0: [2023-03-17 04:35:16,759] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/layer_08-model_00-model_states.pt.
0: [2023-03-17 04:35:16,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/layer_09-model_00-model_states.pt...
0: [2023-03-17 04:35:16,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/layer_09-model_00-model_states.pt.
0: [2023-03-17 04:35:16,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/layer_10-model_00-model_states.pt...
0: [2023-03-17 04:35:16,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/layer_10-model_00-model_states.pt.
0: [2023-03-17 04:35:16,790] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/layer_11-model_00-model_states.pt...
0: [2023-03-17 04:35:16,804] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/layer_11-model_00-model_states.pt.
0: [2023-03-17 04:35:16,805] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/layer_12-model_00-model_states.pt...
0: [2023-03-17 04:35:16,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/layer_12-model_00-model_states.pt.
0: [2023-03-17 04:35:16,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/layer_13-model_00-model_states.pt...
0: [2023-03-17 04:35:16,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/layer_13-model_00-model_states.pt.
0: [2023-03-17 04:35:16,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/layer_14-model_00-model_states.pt...
0: [2023-03-17 04:35:16,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/layer_14-model_00-model_states.pt.
0: [2023-03-17 04:35:16,850] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/layer_15-model_00-model_states.pt...
0: [2023-03-17 04:35:16,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/layer_15-model_00-model_states.pt.
0: [2023-03-17 04:35:16,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/layer_16-model_00-model_states.pt...
0: [2023-03-17 04:35:16,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/layer_16-model_00-model_states.pt. 0: [2023-03-17 04:35:16,880] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/layer_17-model_00-model_states.pt... 0: [2023-03-17 04:35:16,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/layer_17-model_00-model_states.pt. 0: [2023-03-17 04:35:16,895] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/layer_19-model_00-model_states.pt... 0: [2023-03-17 04:35:16,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/layer_19-model_00-model_states.pt. 0: [2023-03-17 04:35:16,897] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_146m60b400m/global_step50000/mp_rank_00_model_states.pt 0: [2023-03-17 04:35:16,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/mp_rank_00_model_states.pt... 0: [2023-03-17 04:35:16,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/mp_rank_00_model_states.pt. 0: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
0: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
2: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
2: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
5: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:35:16,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:35:16,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:35:16,955] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:35:16,955] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 04:35:16,955] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2023-03-17 04:35:16,956] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:35:16,956] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:35:16,957] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:35:16,957] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 04:35:16,957] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2023-03-17 04:35:16,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:35:16,958] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 04:35:16,958] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2023-03-17 04:35:16,956] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 04:35:16,956] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 04:35:16,956] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2023-03-17 04:35:16,956] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2023-03-17 04:35:16,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:35:16,958] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 04:35:16,959] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2023-03-17 04:35:16,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:35:16,961] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 04:35:16,961] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2023-03-17 04:35:16,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:35:16,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:35:16,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:35:16,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:35:16,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 04:35:16,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 04:35:16,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2023-03-17 04:35:16,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2023-03-17 04:35:16,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 04:35:16,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 04:35:16,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2023-03-17 04:35:16,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2023-03-17 04:35:16,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:35:16,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 04:35:16,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2023-03-17 04:35:16,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:35:16,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 04:35:16,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:35:16,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2023-03-17 04:35:16,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 04:35:16,963] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2023-03-17 04:35:16,963] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:35:16,963] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 04:35:16,963] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2023-03-17 04:35:16,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:35:16,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:35:16,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:35:16,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:35:16,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:35:16,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:35:16,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:35:16,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:35:16,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 04:35:16,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 04:35:16,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 04:35:16,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 04:35:16,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 04:35:16,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 04:35:16,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 04:35:16,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 04:35:16,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2023-03-17 04:35:16,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2023-03-17 04:35:16,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2023-03-17 04:35:16,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2023-03-17 04:35:16,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2023-03-17 04:35:16,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2023-03-17 04:35:16,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2023-03-17 04:35:16,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2023-03-17 04:35:16,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:35:16,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:35:16,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:35:16,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:35:16,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:35:16,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:35:16,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:35:16,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 04:35:16,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:35:16,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 04:35:16,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 04:35:16,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 04:35:16,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 04:35:16,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 04:35:16,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 04:35:16,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2023-03-17 04:35:16,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 04:35:16,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2023-03-17 04:35:16,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2023-03-17 04:35:16,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2023-03-17 04:35:16,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
1: [2023-03-17 04:35:16,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2023-03-17 04:35:16,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2023-03-17 04:35:16,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2023-03-17 04:35:16,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:35:16,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:35:16,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:35:16,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:35:16,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:35:16,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:35:16,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:35:16,988] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 04:35:16,988] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 04:35:16,988] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 04:35:16,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:35:16,988] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 04:35:16,989] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2023-03-17 04:35:16,989] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2023-03-17 04:35:16,989] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2023-03-17 04:35:16,989] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 04:35:16,989] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
6: [2023-03-17 04:35:16,989] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 04:35:16,989] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 04:35:16,989] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 04:35:16,989] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2023-03-17 04:35:16,989] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2023-03-17 04:35:16,989] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2023-03-17 04:35:16,989] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2023-03-17 04:35:16,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:35:16,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:35:16,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:35:16,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:35:16,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:35:16,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:35:16,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:35:16,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:35:16,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 04:35:16,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 04:35:16,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 04:35:16,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 04:35:16,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 04:35:16,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2023-03-17 04:35:16,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 04:35:16,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
3: [2023-03-17 04:35:16,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2023-03-17 04:35:16,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2023-03-17 04:35:16,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 04:35:16,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 04:35:16,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2023-03-17 04:35:16,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2023-03-17 04:35:16,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2023-03-17 04:35:16,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2023-03-17 04:35:16,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:35:16,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:35:16,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:35:16,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:35:16,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:35:16,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:35:16,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:35:16,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:35:16,993] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 04:35:16,993] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 04:35:16,993] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 04:35:16,993] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 04:35:16,993] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2023-03-17 04:35:16,993] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
7: [2023-03-17 04:35:16,993] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 04:35:16,993] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 04:35:16,993] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 04:35:16,993] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 04:35:16,993] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2023-03-17 04:35:16,993] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2023-03-17 04:35:16,993] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2023-03-17 04:35:16,993] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2023-03-17 04:35:16,993] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2023-03-17 04:35:16,993] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2023-03-17 04:35:16,997] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 04:35:16,997] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
2: [2023-03-17 04:35:17,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:35:17,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:35:17,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:35:17,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:35:17,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:35:17,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:35:17,000] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 04:35:17,000] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 04:35:17,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:35:17,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:35:17,000] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 04:35:17,000] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 04:35:17,000] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 2: [2023-03-17 04:35:17,000] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 2: [2023-03-17 04:35:17,000] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 04:35:17,000] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 04:35:17,000] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 2: [2023-03-17 04:35:17,000] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 2: [2023-03-17 04:35:17,000] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 04:35:17,000] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step50000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 04:35:17,000] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 2: [2023-03-17 04:35:17,000] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
2: [2023-03-17 04:35:17,000] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 2: [2023-03-17 04:35:17,000] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: successfully saved checkpoint at iteration 50000 to checkpoints_146m60b400m 7: time (ms) | save-checkpoint: 460.13 7: iteration 50100/ 115203 | consumed samples: 12825600 | consumed tokens: 26266828800 | elapsed time per iteration (s): 0.38 | learning rate: 1.299E-04 | global batch size: 256 | lm loss: 3.336257E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.314 | TFLOPs: 31.80 | 7: iteration 50200/ 115203 | consumed samples: 12851200 | consumed tokens: 26319257600 | elapsed time per iteration (s): 0.37 | learning rate: 1.296E-04 | global batch size: 256 | lm loss: 3.333818E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.590 | TFLOPs: 32.28 | 7: iteration 50300/ 115203 | consumed samples: 12876800 | consumed tokens: 26371686400 | elapsed time per iteration (s): 0.37 | learning rate: 1.294E-04 | global batch size: 256 | lm loss: 3.333874E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.731 | TFLOPs: 32.29 | 7: iteration 50400/ 115203 | consumed samples: 12902400 | consumed tokens: 26424115200 | elapsed time per iteration (s): 0.37 | learning rate: 1.291E-04 | global batch size: 256 | lm loss: 3.333524E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.547 | TFLOPs: 32.23 | 7: iteration 50500/ 115203 | consumed samples: 12928000 | consumed tokens: 26476544000 | elapsed time per iteration (s): 0.37 | learning rate: 1.289E-04 | global batch size: 256 | lm loss: 3.338385E+00 | grad norm: 0.204 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.505 | TFLOPs: 32.18 | 7: iteration 50600/ 115203 | consumed samples: 12953600 | consumed tokens: 26528972800 | elapsed time per iteration (s): 0.37 | learning rate: 1.287E-04 | global batch size: 256 | lm loss: 3.336481E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.481 | TFLOPs: 31.95 | 7: iteration 50700/ 115203 | consumed samples: 12979200 | consumed tokens: 26581401600 | elapsed time per iteration (s): 0.37 | learning rate: 1.284E-04 | global batch size: 256 | lm loss: 3.333703E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.716 | TFLOPs: 32.05 | 7: iteration 50800/ 115203 | consumed samples: 13004800 | consumed tokens: 26633830400 | elapsed time per iteration (s): 0.37 | learning rate: 1.282E-04 | global batch size: 256 | lm loss: 3.336327E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.282 | TFLOPs: 32.17 | 7: iteration 50900/ 115203 | consumed samples: 13030400 | consumed tokens: 26686259200 | elapsed time per iteration (s): 0.37 | learning rate: 1.279E-04 | global batch size: 256 | lm loss: 3.331545E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.850 | TFLOPs: 32.20 | 7: iteration 51000/ 115203 | consumed samples: 13056000 | consumed tokens: 26738688000 | elapsed time per iteration (s): 0.37 | learning rate: 1.277E-04 | global batch size: 256 | lm loss: 3.332264E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.551 | TFLOPs: 32.05 | 7: iteration 51100/ 115203 | consumed samples: 13081600 | consumed tokens: 26791116800 | elapsed time per iteration (s): 0.37 
| learning rate: 1.274E-04 | global batch size: 256 | lm loss: 3.334635E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.154 | TFLOPs: 31.93 | 7: iteration 51200/ 115203 | consumed samples: 13107200 | consumed tokens: 26843545600 | elapsed time per iteration (s): 0.37 | learning rate: 1.272E-04 | global batch size: 256 | lm loss: 3.333123E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.832 | TFLOPs: 31.92 | 7: iteration 51300/ 115203 | consumed samples: 13132800 | consumed tokens: 26895974400 | elapsed time per iteration (s): 0.37 | learning rate: 1.269E-04 | global batch size: 256 | lm loss: 3.332661E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.851 | TFLOPs: 32.01 | 7: iteration 51400/ 115203 | consumed samples: 13158400 | consumed tokens: 26948403200 | elapsed time per iteration (s): 0.37 | learning rate: 1.267E-04 | global batch size: 256 | lm loss: 3.332115E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.593 | TFLOPs: 31.91 | 7: iteration 51500/ 115203 | consumed samples: 13184000 | consumed tokens: 27000832000 | elapsed time per iteration (s): 0.37 | learning rate: 1.265E-04 | global batch size: 256 | lm loss: 3.332328E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.379 | TFLOPs: 32.08 | 7: iteration 51600/ 115203 | consumed samples: 13209600 | consumed tokens: 27053260800 | elapsed time per iteration (s): 0.37 | learning rate: 1.262E-04 | global batch size: 256 | lm loss: 3.332306E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.848 | TFLOPs: 31.87 | 7: iteration 51700/ 
115203 | consumed samples: 13235200 | consumed tokens: 27105689600 | elapsed time per iteration (s): 0.37 | learning rate: 1.260E-04 | global batch size: 256 | lm loss: 3.327810E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.953 | TFLOPs: 32.06 | 7: iteration 51800/ 115203 | consumed samples: 13260800 | consumed tokens: 27158118400 | elapsed time per iteration (s): 0.37 | learning rate: 1.257E-04 | global batch size: 256 | lm loss: 3.339486E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.674 | TFLOPs: 32.05 | 7: iteration 51900/ 115203 | consumed samples: 13286400 | consumed tokens: 27210547200 | elapsed time per iteration (s): 0.37 | learning rate: 1.255E-04 | global batch size: 256 | lm loss: 3.334164E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.170 | TFLOPs: 32.21 | 0: [2023-03-17 04:47:42,302] [INFO] [logging.py:68:log_dist] [Rank 0] step=52000, skipped=0, lr=[0.00012524180298737348, 0.00012524180298737348, 0.00012524180298737348], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 52000/ 115203 | consumed samples: 13312000 | consumed tokens: 27262976000 | elapsed time per iteration (s): 0.38 | learning rate: 1.252E-04 | global batch size: 256 | lm loss: 3.332596E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.178 | TFLOPs: 31.56 | 0: steps: 52000 loss: 3.3138 iter time (s): 0.371 samples/sec: 689.608 7: iteration 52100/ 115203 | consumed samples: 13337600 | consumed tokens: 27315404800 | elapsed time per iteration (s): 0.37 | learning rate: 1.250E-04 | global batch size: 256 | lm loss: 3.329696E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.926 
| TFLOPs: 31.92 | 7: iteration 52200/ 115203 | consumed samples: 13363200 | consumed tokens: 27367833600 | elapsed time per iteration (s): 0.38 | learning rate: 1.248E-04 | global batch size: 256 | lm loss: 3.332341E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.889 | TFLOPs: 31.69 | 7: iteration 52300/ 115203 | consumed samples: 13388800 | consumed tokens: 27420262400 | elapsed time per iteration (s): 0.38 | learning rate: 1.245E-04 | global batch size: 256 | lm loss: 3.328040E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.897 | TFLOPs: 31.78 | 7: iteration 52400/ 115203 | consumed samples: 13414400 | consumed tokens: 27472691200 | elapsed time per iteration (s): 0.37 | learning rate: 1.243E-04 | global batch size: 256 | lm loss: 3.329022E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.335 | TFLOPs: 32.04 | 7: iteration 52500/ 115203 | consumed samples: 13440000 | consumed tokens: 27525120000 | elapsed time per iteration (s): 0.37 | learning rate: 1.240E-04 | global batch size: 256 | lm loss: 3.329213E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.306 | TFLOPs: 31.99 | 7: iteration 52600/ 115203 | consumed samples: 13465600 | consumed tokens: 27577548800 | elapsed time per iteration (s): 0.37 | learning rate: 1.238E-04 | global batch size: 256 | lm loss: 3.331642E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.673 | TFLOPs: 32.14 | 7: iteration 52700/ 115203 | consumed samples: 13491200 | consumed tokens: 27629977600 | elapsed time per iteration (s): 0.37 | learning rate: 1.235E-04 | global batch size: 256 | lm loss: 3.331575E+00 | grad norm: 0.215 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.294 | TFLOPs: 31.99 | 7: iteration 52800/ 115203 | consumed samples: 13516800 | consumed tokens: 27682406400 | elapsed time per iteration (s): 0.37 | learning rate: 1.233E-04 | global batch size: 256 | lm loss: 3.327137E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.845 | TFLOPs: 32.11 | 7: iteration 52900/ 115203 | consumed samples: 13542400 | consumed tokens: 27734835200 | elapsed time per iteration (s): 0.37 | learning rate: 1.230E-04 | global batch size: 256 | lm loss: 3.331573E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.614 | TFLOPs: 32.14 | 7: iteration 53000/ 115203 | consumed samples: 13568000 | consumed tokens: 27787264000 | elapsed time per iteration (s): 0.37 | learning rate: 1.228E-04 | global batch size: 256 | lm loss: 3.331785E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.268 | TFLOPs: 32.22 | 7: iteration 53100/ 115203 | consumed samples: 13593600 | consumed tokens: 27839692800 | elapsed time per iteration (s): 0.37 | learning rate: 1.225E-04 | global batch size: 256 | lm loss: 3.328425E+00 | grad norm: 0.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.799 | TFLOPs: 32.20 | 7: iteration 53200/ 115203 | consumed samples: 13619200 | consumed tokens: 27892121600 | elapsed time per iteration (s): 0.37 | learning rate: 1.223E-04 | global batch size: 256 | lm loss: 3.331290E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.815 | TFLOPs: 32.24 | 7: iteration 53300/ 115203 | consumed samples: 13644800 | consumed tokens: 27944550400 | elapsed time per iteration (s): 0.37 
| learning rate: 1.221E-04 | global batch size: 256 | lm loss: 3.331645E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.192 | TFLOPs: 32.22 | 7: iteration 53400/ 115203 | consumed samples: 13670400 | consumed tokens: 27996979200 | elapsed time per iteration (s): 0.37 | learning rate: 1.218E-04 | global batch size: 256 | lm loss: 3.330435E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.423 | TFLOPs: 32.09 | 7: iteration 53500/ 115203 | consumed samples: 13696000 | consumed tokens: 28049408000 | elapsed time per iteration (s): 0.37 | learning rate: 1.216E-04 | global batch size: 256 | lm loss: 3.323270E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.021 | TFLOPs: 32.07 | 7: iteration 53600/ 115203 | consumed samples: 13721600 | consumed tokens: 28101836800 | elapsed time per iteration (s): 0.37 | learning rate: 1.213E-04 | global batch size: 256 | lm loss: 3.325445E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.204 | TFLOPs: 32.17 | 7: iteration 53700/ 115203 | consumed samples: 13747200 | consumed tokens: 28154265600 | elapsed time per iteration (s): 0.37 | learning rate: 1.211E-04 | global batch size: 256 | lm loss: 3.327185E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.534 | TFLOPs: 32.28 | 7: iteration 53800/ 115203 | consumed samples: 13772800 | consumed tokens: 28206694400 | elapsed time per iteration (s): 0.37 | learning rate: 1.208E-04 | global batch size: 256 | lm loss: 3.331172E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.262 | TFLOPs: 32.31 | 7: iteration 53900/ 
115203 | consumed samples: 13798400 | consumed tokens: 28259123200 | elapsed time per iteration (s): 0.37 | learning rate: 1.206E-04 | global batch size: 256 | lm loss: 3.328842E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.225 | TFLOPs: 32.31 | 0: [2023-03-17 05:00:06,610] [INFO] [logging.py:68:log_dist] [Rank 0] step=54000, skipped=0, lr=[0.00012033461390561511, 0.00012033461390561511, 0.00012033461390561511], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 54000/ 115203 | consumed samples: 13824000 | consumed tokens: 28311552000 | elapsed time per iteration (s): 0.37 | learning rate: 1.203E-04 | global batch size: 256 | lm loss: 3.325135E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.617 | TFLOPs: 32.28 | 0: steps: 54000 loss: 3.3478 iter time (s): 0.370 samples/sec: 691.438 7: iteration 54100/ 115203 | consumed samples: 13849600 | consumed tokens: 28363980800 | elapsed time per iteration (s): 0.37 | learning rate: 1.201E-04 | global batch size: 256 | lm loss: 3.326263E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.802 | TFLOPs: 32.24 | 7: iteration 54200/ 115203 | consumed samples: 13875200 | consumed tokens: 28416409600 | elapsed time per iteration (s): 0.37 | learning rate: 1.198E-04 | global batch size: 256 | lm loss: 3.324172E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.380 | TFLOPs: 32.27 | 7: iteration 54300/ 115203 | consumed samples: 13900800 | consumed tokens: 28468838400 | elapsed time per iteration (s): 0.37 | learning rate: 1.196E-04 | global batch size: 256 | lm loss: 3.322839E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.697 
| TFLOPs: 32.29 | 7: iteration 54400/ 115203 | consumed samples: 13926400 | consumed tokens: 28521267200 | elapsed time per iteration (s): 0.37 | learning rate: 1.193E-04 | global batch size: 256 | lm loss: 3.322943E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.702 | TFLOPs: 32.29 | 7: iteration 54500/ 115203 | consumed samples: 13952000 | consumed tokens: 28573696000 | elapsed time per iteration (s): 0.37 | learning rate: 1.191E-04 | global batch size: 256 | lm loss: 3.328411E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.428 | TFLOPs: 32.27 | 7: iteration 54600/ 115203 | consumed samples: 13977600 | consumed tokens: 28626124800 | elapsed time per iteration (s): 0.37 | learning rate: 1.189E-04 | global batch size: 256 | lm loss: 3.326039E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.994 | TFLOPs: 32.30 | 7: iteration 54700/ 115203 | consumed samples: 14003200 | consumed tokens: 28678553600 | elapsed time per iteration (s): 0.37 | learning rate: 1.186E-04 | global batch size: 256 | lm loss: 3.324805E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.967 | TFLOPs: 32.30 | 7: iteration 54800/ 115203 | consumed samples: 14028800 | consumed tokens: 28730982400 | elapsed time per iteration (s): 0.37 | learning rate: 1.184E-04 | global batch size: 256 | lm loss: 3.324232E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.606 | TFLOPs: 32.28 | 7: iteration 54900/ 115203 | consumed samples: 14054400 | consumed tokens: 28783411200 | elapsed time per iteration (s): 0.37 | learning rate: 1.181E-04 | global batch size: 256 | lm loss: 3.321252E+00 | grad norm: 0.216 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.491 | TFLOPs: 32.28 | 7: iteration 55000/ 115203 | consumed samples: 14080000 | consumed tokens: 28835840000 | elapsed time per iteration (s): 0.37 | learning rate: 1.179E-04 | global batch size: 256 | lm loss: 3.320541E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.579 | TFLOPs: 32.28 | 7: iteration 55100/ 115203 | consumed samples: 14105600 | consumed tokens: 28888268800 | elapsed time per iteration (s): 0.37 | learning rate: 1.176E-04 | global batch size: 256 | lm loss: 3.323049E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.915 | TFLOPs: 32.30 | 7: iteration 55200/ 115203 | consumed samples: 14131200 | consumed tokens: 28940697600 | elapsed time per iteration (s): 0.57 | learning rate: 1.174E-04 | global batch size: 256 | lm loss: 3.324938E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 451.374 | TFLOPs: 21.07 | 7: iteration 55300/ 115203 | consumed samples: 14156800 | consumed tokens: 28993126400 | elapsed time per iteration (s): 0.37 | learning rate: 1.171E-04 | global batch size: 256 | lm loss: 3.322037E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.319 | TFLOPs: 32.31 | 7: iteration 55400/ 115203 | consumed samples: 14182400 | consumed tokens: 29045555200 | elapsed time per iteration (s): 0.37 | learning rate: 1.169E-04 | global batch size: 256 | lm loss: 3.320269E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.132 | TFLOPs: 32.31 | 7: iteration 55500/ 115203 | consumed samples: 14208000 | consumed tokens: 29097984000 | elapsed time per iteration (s): 0.37 
| learning rate: 1.166E-04 | global batch size: 256 | lm loss: 3.324608E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.026 | TFLOPs: 32.30 | 7: iteration 55600/ 115203 | consumed samples: 14233600 | consumed tokens: 29150412800 | elapsed time per iteration (s): 0.37 | learning rate: 1.164E-04 | global batch size: 256 | lm loss: 3.320932E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.626 | TFLOPs: 32.28 | 7: iteration 55700/ 115203 | consumed samples: 14259200 | consumed tokens: 29202841600 | elapsed time per iteration (s): 0.37 | learning rate: 1.161E-04 | global batch size: 256 | lm loss: 3.325618E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.565 | TFLOPs: 32.33 | 7: iteration 55800/ 115203 | consumed samples: 14284800 | consumed tokens: 29255270400 | elapsed time per iteration (s): 0.37 | learning rate: 1.159E-04 | global batch size: 256 | lm loss: 3.324146E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.642 | TFLOPs: 32.28 | 7: iteration 55900/ 115203 | consumed samples: 14310400 | consumed tokens: 29307699200 | elapsed time per iteration (s): 0.37 | learning rate: 1.156E-04 | global batch size: 256 | lm loss: 3.321333E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.904 | TFLOPs: 32.30 | 0: [2023-03-17 05:12:46,446] [INFO] [logging.py:68:log_dist] [Rank 0] step=56000, skipped=0, lr=[0.00011539606744822729, 0.00011539606744822729, 0.00011539606744822729], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 56000/ 115203 | consumed samples: 14336000 | consumed tokens: 29360128000 | elapsed time per iteration (s): 0.37 | learning rate: 1.154E-04 | 
global batch size: 256 | lm loss: 3.319990E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.913 | TFLOPs: 32.30 | 0: steps: 56000 loss: 3.3644 iter time (s): 0.378 samples/sec: 677.473 7: iteration 56100/ 115203 | consumed samples: 14361600 | consumed tokens: 29412556800 | elapsed time per iteration (s): 0.37 | learning rate: 1.151E-04 | global batch size: 256 | lm loss: 3.322210E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.931 | TFLOPs: 32.30 | 7: iteration 56200/ 115203 | consumed samples: 14387200 | consumed tokens: 29464985600 | elapsed time per iteration (s): 0.37 | learning rate: 1.149E-04 | global batch size: 256 | lm loss: 3.322021E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.961 | TFLOPs: 32.30 | 7: iteration 56300/ 115203 | consumed samples: 14412800 | consumed tokens: 29517414400 | elapsed time per iteration (s): 0.37 | learning rate: 1.147E-04 | global batch size: 256 | lm loss: 3.324124E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.418 | TFLOPs: 32.27 | 7: iteration 56400/ 115203 | consumed samples: 14438400 | consumed tokens: 29569843200 | elapsed time per iteration (s): 0.37 | learning rate: 1.144E-04 | global batch size: 256 | lm loss: 3.327791E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.729 | TFLOPs: 32.29 | 7: iteration 56500/ 115203 | consumed samples: 14464000 | consumed tokens: 29622272000 | elapsed time per iteration (s): 0.37 | learning rate: 1.142E-04 | global batch size: 256 | lm loss: 3.326833E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.142 | 
TFLOPs: 32.26 | 7: iteration 56600/ 115203 | consumed samples: 14489600 | consumed tokens: 29674700800 | elapsed time per iteration (s): 0.37 | learning rate: 1.139E-04 | global batch size: 256 | lm loss: 3.322397E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.530 | TFLOPs: 32.28 | 7: iteration 56700/ 115203 | consumed samples: 14515200 | consumed tokens: 29727129600 | elapsed time per iteration (s): 0.37 | learning rate: 1.137E-04 | global batch size: 256 | lm loss: 3.322480E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.047 | TFLOPs: 32.30 | 7: iteration 56800/ 115203 | consumed samples: 14540800 | consumed tokens: 29779558400 | elapsed time per iteration (s): 0.37 | learning rate: 1.134E-04 | global batch size: 256 | lm loss: 3.322668E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.190 | TFLOPs: 32.31 | 7: iteration 56900/ 115203 | consumed samples: 14566400 | consumed tokens: 29831987200 | elapsed time per iteration (s): 0.37 | learning rate: 1.132E-04 | global batch size: 256 | lm loss: 3.318668E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.751 | TFLOPs: 32.29 | 7: iteration 57000/ 115203 | consumed samples: 14592000 | consumed tokens: 29884416000 | elapsed time per iteration (s): 0.37 | learning rate: 1.129E-04 | global batch size: 256 | lm loss: 3.325048E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.212 | TFLOPs: 32.31 | 7: iteration 57100/ 115203 | consumed samples: 14617600 | consumed tokens: 29936844800 | elapsed time per iteration (s): 0.37 | learning rate: 1.127E-04 | global batch size: 256 | lm loss: 3.321803E+00 | grad norm: 0.193 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.284 | TFLOPs: 32.31 | 7: iteration 57200/ 115203 | consumed samples: 14643200 | consumed tokens: 29989273600 | elapsed time per iteration (s): 0.37 | learning rate: 1.124E-04 | global batch size: 256 | lm loss: 3.320332E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.789 | TFLOPs: 32.34 | 7: iteration 57300/ 115203 | consumed samples: 14668800 | consumed tokens: 30041702400 | elapsed time per iteration (s): 0.37 | learning rate: 1.122E-04 | global batch size: 256 | lm loss: 3.319529E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.177 | TFLOPs: 32.31 | 7: iteration 57400/ 115203 | consumed samples: 14694400 | consumed tokens: 30094131200 | elapsed time per iteration (s): 0.37 | learning rate: 1.119E-04 | global batch size: 256 | lm loss: 3.321275E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.905 | TFLOPs: 32.30 | 7: iteration 57500/ 115203 | consumed samples: 14720000 | consumed tokens: 30146560000 | elapsed time per iteration (s): 0.37 | learning rate: 1.117E-04 | global batch size: 256 | lm loss: 3.318838E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.008 | TFLOPs: 32.30 | 7: iteration 57600/ 115203 | consumed samples: 14745600 | consumed tokens: 30198988800 | elapsed time per iteration (s): 0.37 | learning rate: 1.114E-04 | global batch size: 256 | lm loss: 3.317020E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.652 | TFLOPs: 32.28 | 7: iteration 57700/ 115203 | consumed samples: 14771200 | consumed tokens: 30251417600 | elapsed time per iteration (s): 0.37 | 
learning rate: 1.112E-04 | global batch size: 256 | lm loss: 3.320238E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.776 | TFLOPs: 32.29 | 7: iteration 57800/ 115203 | consumed samples: 14796800 | consumed tokens: 30303846400 | elapsed time per iteration (s): 0.37 | learning rate: 1.109E-04 | global batch size: 256 | lm loss: 3.317049E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.679 | TFLOPs: 32.33 | 7: iteration 57900/ 115203 | consumed samples: 14822400 | consumed tokens: 30356275200 | elapsed time per iteration (s): 0.37 | learning rate: 1.107E-04 | global batch size: 256 | lm loss: 3.322231E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.097 | TFLOPs: 32.30 | 0: [2023-03-17 05:25:06,327] [INFO] [logging.py:68:log_dist] [Rank 0] step=58000, skipped=0, lr=[0.00011044114819593482, 0.00011044114819593482, 0.00011044114819593482], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 58000/ 115203 | consumed samples: 14848000 | consumed tokens: 30408704000 | elapsed time per iteration (s): 0.37 | learning rate: 1.104E-04 | global batch size: 256 | lm loss: 3.317813E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.814 | TFLOPs: 32.34 | 0: steps: 58000 loss: 3.3041 iter time (s): 0.368 samples/sec: 695.753 7: iteration 58100/ 115203 | consumed samples: 14873600 | consumed tokens: 30461132800 | elapsed time per iteration (s): 0.37 | learning rate: 1.102E-04 | global batch size: 256 | lm loss: 3.320113E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.714 | TFLOPs: 32.24 | 7: iteration 58200/ 115203 | consumed samples: 14899200 | consumed tokens: 30513561600 | 
elapsed time per iteration (s): 0.37 | learning rate: 1.099E-04 | global batch size: 256 | lm loss: 3.317721E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.904 | TFLOPs: 32.34 | 7: iteration 58300/ 115203 | consumed samples: 14924800 | consumed tokens: 30565990400 | elapsed time per iteration (s): 0.37 | learning rate: 1.097E-04 | global batch size: 256 | lm loss: 3.319861E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.368 | TFLOPs: 32.36 | 7: iteration 58400/ 115203 | consumed samples: 14950400 | consumed tokens: 30618419200 | elapsed time per iteration (s): 0.37 | learning rate: 1.094E-04 | global batch size: 256 | lm loss: 3.318450E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.845 | TFLOPs: 32.34 | 7: iteration 58500/ 115203 | consumed samples: 14976000 | consumed tokens: 30670848000 | elapsed time per iteration (s): 0.37 | learning rate: 1.092E-04 | global batch size: 256 | lm loss: 3.318581E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.013 | TFLOPs: 32.35 | 7: iteration 58600/ 115203 | consumed samples: 15001600 | consumed tokens: 30723276800 | elapsed time per iteration (s): 0.37 | learning rate: 1.090E-04 | global batch size: 256 | lm loss: 3.321956E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.668 | TFLOPs: 32.33 | 7: iteration 58700/ 115203 | consumed samples: 15027200 | consumed tokens: 30775705600 | elapsed time per iteration (s): 0.37 | learning rate: 1.087E-04 | global batch size: 256 | lm loss: 3.320153E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.921 | 
TFLOPs: 32.34 | 7: iteration 58800/ 115203 | consumed samples: 15052800 | consumed tokens: 30828134400 | elapsed time per iteration (s): 0.37 | learning rate: 1.085E-04 | global batch size: 256 | lm loss: 3.318310E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.792 | TFLOPs: 32.34 | 7: iteration 58900/ 115203 | consumed samples: 15078400 | consumed tokens: 30880563200 | elapsed time per iteration (s): 0.37 | learning rate: 1.082E-04 | global batch size: 256 | lm loss: 3.318772E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.451 | TFLOPs: 32.37 | 7: iteration 59000/ 115203 | consumed samples: 15104000 | consumed tokens: 30932992000 | elapsed time per iteration (s): 0.37 | learning rate: 1.080E-04 | global batch size: 256 | lm loss: 3.315235E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.666 | TFLOPs: 32.33 | 7: iteration 59100/ 115203 | consumed samples: 15129600 | consumed tokens: 30985420800 | elapsed time per iteration (s): 0.37 | learning rate: 1.077E-04 | global batch size: 256 | lm loss: 3.313359E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.107 | TFLOPs: 32.21 | 7: iteration 59200/ 115203 | consumed samples: 15155200 | consumed tokens: 31037849600 | elapsed time per iteration (s): 0.38 | learning rate: 1.075E-04 | global batch size: 256 | lm loss: 3.319599E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.447 | TFLOPs: 31.71 | 7: iteration 59300/ 115203 | consumed samples: 15180800 | consumed tokens: 31090278400 | elapsed time per iteration (s): 0.37 | learning rate: 1.072E-04 | global batch size: 256 | lm loss: 3.314102E+00 | grad norm: 0.203 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.556 | TFLOPs: 32.00 | 7: iteration 59400/ 115203 | consumed samples: 15206400 | consumed tokens: 31142707200 | elapsed time per iteration (s): 0.37 | learning rate: 1.070E-04 | global batch size: 256 | lm loss: 3.311355E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.545 | TFLOPs: 31.91 | 7: iteration 59500/ 115203 | consumed samples: 15232000 | consumed tokens: 31195136000 | elapsed time per iteration (s): 0.37 | learning rate: 1.067E-04 | global batch size: 256 | lm loss: 3.313082E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.934 | TFLOPs: 32.16 | 7: iteration 59600/ 115203 | consumed samples: 15257600 | consumed tokens: 31247564800 | elapsed time per iteration (s): 0.37 | learning rate: 1.065E-04 | global batch size: 256 | lm loss: 3.317202E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.994 | TFLOPs: 32.25 | 7: iteration 59700/ 115203 | consumed samples: 15283200 | consumed tokens: 31299993600 | elapsed time per iteration (s): 0.37 | learning rate: 1.062E-04 | global batch size: 256 | lm loss: 3.316643E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.113 | TFLOPs: 32.12 | 7: iteration 59800/ 115203 | consumed samples: 15308800 | consumed tokens: 31352422400 | elapsed time per iteration (s): 0.37 | learning rate: 1.060E-04 | global batch size: 256 | lm loss: 3.315558E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.434 | TFLOPs: 32.13 | 7: iteration 59900/ 115203 | consumed samples: 15334400 | consumed tokens: 31404851200 | elapsed time per iteration (s): 0.37 | 
learning rate: 1.057E-04 | global batch size: 256 | lm loss: 3.313790E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.876 | TFLOPs: 32.01 | 0: [2023-03-17 05:37:28,507] [INFO] [logging.py:68:log_dist] [Rank 0] step=60000, skipped=0, lr=[0.00010548489040793946, 0.00010548489040793946, 0.00010548489040793946], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 60000/ 115203 | consumed samples: 15360000 | consumed tokens: 31457280000 | elapsed time per iteration (s): 0.37 | learning rate: 1.055E-04 | global batch size: 256 | lm loss: 3.317002E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.233 | TFLOPs: 32.17 | 0: steps: 60000 loss: 3.2742 iter time (s): 0.369 samples/sec: 693.753 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 60000 | lm loss value: 3.396961E+00 | lm loss PPL: 2.987319E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 60000 to checkpoints_146m60b400m 0: [2023-03-17 05:37:28,635] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step60000 is begin to save! 0: [2023-03-17 05:37:28,638] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/layer_01-model_00-model_states.pt... 0: [2023-03-17 05:37:28,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/layer_01-model_00-model_states.pt. 0: [2023-03-17 05:37:28,742] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/layer_03-model_00-model_states.pt... 
0: [2023-03-17 05:37:28,758] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/layer_03-model_00-model_states.pt. 0: [2023-03-17 05:37:28,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/layer_04-model_00-model_states.pt... 0: [2023-03-17 05:37:28,776] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/layer_04-model_00-model_states.pt. 0: [2023-03-17 05:37:28,776] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/layer_05-model_00-model_states.pt... 0: [2023-03-17 05:37:28,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/layer_05-model_00-model_states.pt. 0: [2023-03-17 05:37:28,792] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/layer_06-model_00-model_states.pt... 0: [2023-03-17 05:37:28,807] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/layer_06-model_00-model_states.pt. 0: [2023-03-17 05:37:28,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/layer_07-model_00-model_states.pt... 0: [2023-03-17 05:37:28,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/layer_07-model_00-model_states.pt. 0: [2023-03-17 05:37:28,824] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/layer_08-model_00-model_states.pt... 0: [2023-03-17 05:37:28,839] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/layer_08-model_00-model_states.pt. 0: [2023-03-17 05:37:28,839] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/layer_09-model_00-model_states.pt... 
0: [2023-03-17 05:37:28,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/layer_09-model_00-model_states.pt. 0: [2023-03-17 05:37:28,855] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/layer_10-model_00-model_states.pt... 0: [2023-03-17 05:37:28,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/layer_10-model_00-model_states.pt. 0: [2023-03-17 05:37:28,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/layer_11-model_00-model_states.pt... 0: [2023-03-17 05:37:28,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/layer_11-model_00-model_states.pt. 0: [2023-03-17 05:37:28,887] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/layer_12-model_00-model_states.pt... 0: [2023-03-17 05:37:28,902] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/layer_12-model_00-model_states.pt. 0: [2023-03-17 05:37:28,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/layer_13-model_00-model_states.pt... 0: [2023-03-17 05:37:28,918] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/layer_13-model_00-model_states.pt. 0: [2023-03-17 05:37:28,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/layer_14-model_00-model_states.pt... 0: [2023-03-17 05:37:28,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/layer_14-model_00-model_states.pt. 0: [2023-03-17 05:37:28,934] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/layer_15-model_00-model_states.pt... 
0: [2023-03-17 05:37:28,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/layer_15-model_00-model_states.pt. 0: [2023-03-17 05:37:28,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/layer_16-model_00-model_states.pt... 0: [2023-03-17 05:37:28,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/layer_16-model_00-model_states.pt. 0: [2023-03-17 05:37:28,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/layer_17-model_00-model_states.pt... 0: [2023-03-17 05:37:28,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/layer_17-model_00-model_states.pt. 0: [2023-03-17 05:37:28,980] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/layer_19-model_00-model_states.pt... 0: [2023-03-17 05:37:28,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/layer_19-model_00-model_states.pt. 0: [2023-03-17 05:37:28,981] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_146m60b400m/global_step60000/mp_rank_00_model_states.pt 0: [2023-03-17 05:37:28,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/mp_rank_00_model_states.pt... 0: [2023-03-17 05:37:28,986] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/mp_rank_00_model_states.pt. 0: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
0: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 2: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 4: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 6: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 1: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 3: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 0: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
2: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 4: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 6: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 1: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
5: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 3: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 2: [2023-03-17 05:37:29,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 0: [2023-03-17 05:37:29,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-03-17 05:37:29,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 05:37:29,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 05:37:29,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
6: [2023-03-17 05:37:29,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 05:37:29,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 05:37:29,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 05:37:29,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 05:37:29,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2023-03-17 05:37:29,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2023-03-17 05:37:29,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 05:37:29,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 05:37:29,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2023-03-17 05:37:29,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 05:37:29,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 05:37:29,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
0: [2023-03-17 05:37:29,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 05:37:29,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 6: [2023-03-17 05:37:29,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 05:37:29,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 05:37:29,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 0: [2023-03-17 05:37:29,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 0: [2023-03-17 05:37:29,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 05:37:29,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 05:37:29,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2023-03-17 05:37:29,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 4: [2023-03-17 05:37:29,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 05:37:29,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 6: [2023-03-17 05:37:29,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 05:37:29,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 0: [2023-03-17 05:37:29,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 05:37:29,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 05:37:29,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2023-03-17 05:37:29,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 05:37:29,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 05:37:29,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 05:37:29,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2023-03-17 05:37:29,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 05:37:29,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
0: [2023-03-17 05:37:29,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 05:37:29,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 05:37:29,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 0: [2023-03-17 05:37:29,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 05:37:29,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 05:37:29,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 05:37:29,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 0: [2023-03-17 05:37:29,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 05:37:29,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2023-03-17 05:37:29,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 05:37:29,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 05:37:29,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
4: [2023-03-17 05:37:29,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2023-03-17 05:37:29,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 05:37:29,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 05:37:29,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2023-03-17 05:37:29,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 05:37:29,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 05:37:29,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 05:37:29,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2023-03-17 05:37:29,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 05:37:29,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2023-03-17 05:37:29,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 05:37:29,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 05:37:29,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2023-03-17 05:37:29,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 05:37:29,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 05:37:29,053] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 05:37:29,053] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 05:37:29,053] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2023-03-17 05:37:29,053] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2023-03-17 05:37:29,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 05:37:29,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 05:37:29,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 05:37:29,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 05:37:29,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 05:37:29,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 05:37:29,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 05:37:29,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 05:37:29,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 05:37:29,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 05:37:29,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2023-03-17 05:37:29,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 05:37:29,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
2: [2023-03-17 05:37:29,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 05:37:29,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 05:37:29,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2023-03-17 05:37:29,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 05:37:29,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2023-03-17 05:37:29,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2023-03-17 05:37:29,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2023-03-17 05:37:29,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2023-03-17 05:37:29,057] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 05:37:29,058] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 05:37:29,058] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2023-03-17 05:37:29,074] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-03-17 05:37:29,074] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 05:37:29,074] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 05:37:29,074] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 05:37:29,074] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 05:37:29,074] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 05:37:29,074] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 05:37:29,074] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 05:37:29,074] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 05:37:29,074] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 05:37:29,074] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 05:37:29,074] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 05:37:29,074] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 05:37:29,074] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 05:37:29,074] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2023-03-17 05:37:29,074] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2023-03-17 05:37:29,074] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2023-03-17 05:37:29,074] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 05:37:29,074] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
3: [2023-03-17 05:37:29,074] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 05:37:29,074] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2023-03-17 05:37:29,074] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2023-03-17 05:37:29,074] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2023-03-17 05:37:29,074] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 5: [2023-03-17 05:37:29,075] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 05:37:29,075] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 05:37:29,075] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 05:37:29,075] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 05:37:29,075] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 05:37:29,075] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 05:37:29,075] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2023-03-17 05:37:29,075] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 05:37:29,075] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 05:37:29,075] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 05:37:29,075] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 05:37:29,075] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 05:37:29,075] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 05:37:29,075] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 05:37:29,075] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 5: [2023-03-17 05:37:29,075] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 0: [2023-03-17 05:37:29,075] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 05:37:29,075] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
5: [2023-03-17 05:37:29,075] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 5: [2023-03-17 05:37:29,075] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 05:37:29,075] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 5: [2023-03-17 05:37:29,075] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 5: [2023-03-17 05:37:29,075] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 5: [2023-03-17 05:37:29,075] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 5: [2023-03-17 05:37:29,075] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 5: [2023-03-17 05:37:29,075] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2023-03-17 05:37:29,082] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 05:37:29,082] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 05:37:29,082] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 05:37:29,082] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 05:37:29,082] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2023-03-17 05:37:29,082] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 05:37:29,082] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 05:37:29,082] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 05:37:29,082] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 05:37:29,082] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 05:37:29,082] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 05:37:29,082] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2023-03-17 05:37:29,082] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 05:37:29,082] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 05:37:29,082] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 05:37:29,082] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
7: [2023-03-17 05:37:29,082] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2023-03-17 05:37:29,082] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 05:37:29,082] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2023-03-17 05:37:29,082] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 05:37:29,082] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2023-03-17 05:37:29,082] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2023-03-17 05:37:29,082] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2023-03-17 05:37:29,082] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2023-03-17 05:37:29,084] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 05:37:29,084] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 05:37:29,084] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 05:37:29,084] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 05:37:29,084] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 05:37:29,084] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 05:37:29,084] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 05:37:29,084] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 05:37:29,084] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 05:37:29,084] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 05:37:29,084] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 05:37:29,084] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 05:37:29,084] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 05:37:29,084] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 05:37:29,084] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 05:37:29,084] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step60000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 05:37:29,084] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2023-03-17 05:37:29,084] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2023-03-17 05:37:29,084] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2023-03-17 05:37:29,084] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2023-03-17 05:37:29,084] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2023-03-17 05:37:29,084] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2023-03-17 05:37:29,084] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2023-03-17 05:37:29,084] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
0: successfully saved checkpoint at iteration 60000 to checkpoints_146m60b400m
7: time (ms) | save-checkpoint: 459.85
7: iteration 60100/ 115203 | consumed samples: 15385600 | consumed tokens: 31509708800 | elapsed time per iteration (s): 0.38 | learning rate: 1.052E-04 | global batch size: 256 | lm loss: 3.311485E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.653 | TFLOPs: 31.63 |
7: iteration 60200/ 115203 | consumed samples: 15411200 | consumed tokens: 31562137600 | elapsed time per iteration (s): 0.37 | learning rate: 1.050E-04 | global batch size: 256 | lm loss: 3.314914E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.667 | TFLOPs: 31.96 |
7: iteration 60300/ 115203 | consumed samples: 15436800 | consumed tokens: 31614566400 | elapsed time per iteration (s): 0.37 | learning rate: 1.047E-04 | global batch size: 256 | lm loss: 3.315870E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.798 | TFLOPs: 32.06 |
7: iteration 60400/ 115203 | consumed samples: 15462400 | consumed tokens: 31666995200 | elapsed time per iteration (s): 0.37 | learning rate: 1.045E-04 | global batch size: 256 | lm loss: 3.315694E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.883 | TFLOPs: 32.06 |
7: iteration 60500/ 115203 | consumed samples: 15488000 | consumed tokens: 31719424000 | elapsed time per iteration (s): 0.37 | learning rate: 1.042E-04 | global batch size: 256 | lm loss: 3.315447E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.410 | TFLOPs: 32.09 |
7: iteration 60600/ 115203 | consumed samples: 15513600 | consumed tokens: 31771852800 | elapsed time per iteration (s): 0.37 | learning rate: 1.040E-04 | global batch size: 256 | lm loss: 3.316740E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.962 | TFLOPs: 31.88 |
7: iteration 60700/ 115203 | consumed samples: 15539200 | consumed tokens: 31824281600 | elapsed time per iteration (s): 0.38 | learning rate: 1.038E-04 | global batch size: 256 | lm loss: 3.314930E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.898 | TFLOPs: 31.69 |
7: iteration 60800/ 115203 | consumed samples: 15564800 | consumed tokens: 31876710400 | elapsed time per iteration (s): 0.37 | learning rate: 1.035E-04 | global batch size: 256 | lm loss: 3.315073E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.570 | TFLOPs: 31.95 |
7: iteration 60900/ 115203 | consumed samples: 15590400 | consumed tokens: 31929139200 | elapsed time per iteration (s): 0.37 | learning rate: 1.033E-04 | global batch size: 256 | lm loss: 3.308767E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.952 | TFLOPs: 32.02 |
7: iteration 61000/ 115203 | consumed samples: 15616000 | consumed tokens: 31981568000 | elapsed time per iteration (s): 0.38 | learning rate: 1.030E-04 | global batch size: 256 | lm loss: 3.314614E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.778 | TFLOPs: 31.82 |
7: iteration 61100/ 115203 | consumed samples: 15641600 | consumed tokens: 32033996800 | elapsed time per iteration (s): 0.37 | learning rate: 1.028E-04 | global batch size: 256 | lm loss: 3.315111E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.520 | TFLOPs: 32.00 |
7: iteration 61200/ 115203 
| consumed samples: 15667200 | consumed tokens: 32086425600 | elapsed time per iteration (s): 0.37 | learning rate: 1.025E-04 | global batch size: 256 | lm loss: 3.309240E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.196 | TFLOPs: 31.89 | 7: iteration 61300/ 115203 | consumed samples: 15692800 | consumed tokens: 32138854400 | elapsed time per iteration (s): 0.37 | learning rate: 1.023E-04 | global batch size: 256 | lm loss: 3.308825E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.148 | TFLOPs: 32.03 | 7: iteration 61400/ 115203 | consumed samples: 15718400 | consumed tokens: 32191283200 | elapsed time per iteration (s): 0.38 | learning rate: 1.020E-04 | global batch size: 256 | lm loss: 3.310417E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.854 | TFLOPs: 31.59 | 7: iteration 61500/ 115203 | consumed samples: 15744000 | consumed tokens: 32243712000 | elapsed time per iteration (s): 0.37 | learning rate: 1.018E-04 | global batch size: 256 | lm loss: 3.312066E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.544 | TFLOPs: 31.91 | 7: iteration 61600/ 115203 | consumed samples: 15769600 | consumed tokens: 32296140800 | elapsed time per iteration (s): 0.38 | learning rate: 1.015E-04 | global batch size: 256 | lm loss: 3.311797E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.942 | TFLOPs: 31.74 | 7: iteration 61700/ 115203 | consumed samples: 15795200 | consumed tokens: 32348569600 | elapsed time per iteration (s): 0.38 | learning rate: 1.013E-04 | global batch size: 256 | lm loss: 3.307858E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 681.101 | TFLOPs: 31.79 | 7: iteration 61800/ 115203 | consumed samples: 15820800 | consumed tokens: 32400998400 | elapsed time per iteration (s): 0.38 | learning rate: 1.010E-04 | global batch size: 256 | lm loss: 3.316063E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.477 | TFLOPs: 31.62 | 7: iteration 61900/ 115203 | consumed samples: 15846400 | consumed tokens: 32453427200 | elapsed time per iteration (s): 0.37 | learning rate: 1.008E-04 | global batch size: 256 | lm loss: 3.312456E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.415 | TFLOPs: 31.95 | 0: [2023-03-17 05:49:58,149] [INFO] [logging.py:68:log_dist] [Rank 0] step=62000, skipped=0, lr=[0.0001005423324048397, 0.0001005423324048397, 0.0001005423324048397], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 62000/ 115203 | consumed samples: 15872000 | consumed tokens: 32505856000 | elapsed time per iteration (s): 0.37 | learning rate: 1.005E-04 | global batch size: 256 | lm loss: 3.312693E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.204 | TFLOPs: 31.94 | 0: steps: 62000 loss: 3.3449 iter time (s): 0.373 samples/sec: 686.164 7: iteration 62100/ 115203 | consumed samples: 15897600 | consumed tokens: 32558284800 | elapsed time per iteration (s): 0.38 | learning rate: 1.003E-04 | global batch size: 256 | lm loss: 3.312429E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.874 | TFLOPs: 31.73 | 7: iteration 62200/ 115203 | consumed samples: 15923200 | consumed tokens: 32610713600 | elapsed time per iteration (s): 0.38 | learning rate: 1.000E-04 | global batch size: 256 | lm loss: 3.310504E+00 | grad norm: 0.204 | num zeros: 0.0 
| number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.744 | TFLOPs: 31.82 | 7: iteration 62300/ 115203 | consumed samples: 15948800 | consumed tokens: 32663142400 | elapsed time per iteration (s): 0.38 | learning rate: 9.980E-05 | global batch size: 256 | lm loss: 3.313377E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.063 | TFLOPs: 31.84 | 7: iteration 62400/ 115203 | consumed samples: 15974400 | consumed tokens: 32715571200 | elapsed time per iteration (s): 0.37 | learning rate: 9.956E-05 | global batch size: 256 | lm loss: 3.314687E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.099 | TFLOPs: 31.98 | 7: iteration 62500/ 115203 | consumed samples: 16000000 | consumed tokens: 32768000000 | elapsed time per iteration (s): 0.38 | learning rate: 9.931E-05 | global batch size: 256 | lm loss: 3.310538E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.335 | TFLOPs: 31.24 | 7: iteration 62600/ 115203 | consumed samples: 16025600 | consumed tokens: 32820428800 | elapsed time per iteration (s): 0.38 | learning rate: 9.906E-05 | global batch size: 256 | lm loss: 3.308269E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.426 | TFLOPs: 31.48 | 7: iteration 62700/ 115203 | consumed samples: 16051200 | consumed tokens: 32872857600 | elapsed time per iteration (s): 0.38 | learning rate: 9.882E-05 | global batch size: 256 | lm loss: 3.310247E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.176 | TFLOPs: 31.65 | 7: iteration 62800/ 115203 | consumed samples: 16076800 | consumed tokens: 32925286400 | elapsed time per iteration (s): 0.37 | learning 
rate: 9.857E-05 | global batch size: 256 | lm loss: 3.310360E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.804 | TFLOPs: 32.10 | 7: iteration 62900/ 115203 | consumed samples: 16102400 | consumed tokens: 32977715200 | elapsed time per iteration (s): 0.37 | learning rate: 9.833E-05 | global batch size: 256 | lm loss: 3.311101E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.044 | TFLOPs: 32.12 | 7: iteration 63000/ 115203 | consumed samples: 16128000 | consumed tokens: 33030144000 | elapsed time per iteration (s): 0.37 | learning rate: 9.808E-05 | global batch size: 256 | lm loss: 3.308380E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.714 | TFLOPs: 32.01 | 7: iteration 63100/ 115203 | consumed samples: 16153600 | consumed tokens: 33082572800 | elapsed time per iteration (s): 0.37 | learning rate: 9.784E-05 | global batch size: 256 | lm loss: 3.309421E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.278 | TFLOPs: 32.03 | 7: iteration 63200/ 115203 | consumed samples: 16179200 | consumed tokens: 33135001600 | elapsed time per iteration (s): 0.37 | learning rate: 9.759E-05 | global batch size: 256 | lm loss: 3.305376E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.030 | TFLOPs: 32.02 | 7: iteration 63300/ 115203 | consumed samples: 16204800 | consumed tokens: 33187430400 | elapsed time per iteration (s): 0.37 | learning rate: 9.734E-05 | global batch size: 256 | lm loss: 3.307427E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.907 | TFLOPs: 32.11 | 7: iteration 63400/ 115203 | 
consumed samples: 16230400 | consumed tokens: 33239859200 | elapsed time per iteration (s): 0.37 | learning rate: 9.710E-05 | global batch size: 256 | lm loss: 3.311384E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.708 | TFLOPs: 31.96 | 7: iteration 63500/ 115203 | consumed samples: 16256000 | consumed tokens: 33292288000 | elapsed time per iteration (s): 0.37 | learning rate: 9.685E-05 | global batch size: 256 | lm loss: 3.308073E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.719 | TFLOPs: 32.10 | 7: iteration 63600/ 115203 | consumed samples: 16281600 | consumed tokens: 33344716800 | elapsed time per iteration (s): 0.37 | learning rate: 9.661E-05 | global batch size: 256 | lm loss: 3.311278E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.601 | TFLOPs: 32.09 | 7: iteration 63700/ 115203 | consumed samples: 16307200 | consumed tokens: 33397145600 | elapsed time per iteration (s): 0.37 | learning rate: 9.636E-05 | global batch size: 256 | lm loss: 3.311397E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.033 | TFLOPs: 32.11 | 7: iteration 63800/ 115203 | consumed samples: 16332800 | consumed tokens: 33449574400 | elapsed time per iteration (s): 0.37 | learning rate: 9.612E-05 | global batch size: 256 | lm loss: 3.309792E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.341 | TFLOPs: 32.22 | 7: iteration 63900/ 115203 | consumed samples: 16358400 | consumed tokens: 33502003200 | elapsed time per iteration (s): 0.37 | learning rate: 9.587E-05 | global batch size: 256 | lm loss: 3.310659E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 689.065 | TFLOPs: 32.16 | 0: [2023-03-17 06:02:26,230] [INFO] [logging.py:68:log_dist] [Rank 0] step=64000, skipped=0, lr=[9.56284709392273e-05, 9.56284709392273e-05, 9.56284709392273e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 64000/ 115203 | consumed samples: 16384000 | consumed tokens: 33554432000 | elapsed time per iteration (s): 0.37 | learning rate: 9.563E-05 | global batch size: 256 | lm loss: 3.306992E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.432 | TFLOPs: 32.18 | 0: steps: 64000 loss: 3.2900 iter time (s): 0.373 samples/sec: 687.173 7: iteration 64100/ 115203 | consumed samples: 16409600 | consumed tokens: 33606860800 | elapsed time per iteration (s): 0.37 | learning rate: 9.538E-05 | global batch size: 256 | lm loss: 3.307485E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.505 | TFLOPs: 32.00 | 7: iteration 64200/ 115203 | consumed samples: 16435200 | consumed tokens: 33659289600 | elapsed time per iteration (s): 0.37 | learning rate: 9.514E-05 | global batch size: 256 | lm loss: 3.307108E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.253 | TFLOPs: 32.22 | 7: iteration 64300/ 115203 | consumed samples: 16460800 | consumed tokens: 33711718400 | elapsed time per iteration (s): 0.37 | learning rate: 9.489E-05 | global batch size: 256 | lm loss: 3.307513E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.316 | TFLOPs: 32.17 | 7: iteration 64400/ 115203 | consumed samples: 16486400 | consumed tokens: 33764147200 | elapsed time per iteration (s): 0.37 | learning rate: 9.465E-05 | global batch size: 256 | lm loss: 3.307881E+00 | grad norm: 0.222 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.083 | TFLOPs: 32.16 | 7: iteration 64500/ 115203 | consumed samples: 16512000 | consumed tokens: 33816576000 | elapsed time per iteration (s): 0.37 | learning rate: 9.441E-05 | global batch size: 256 | lm loss: 3.305110E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.197 | TFLOPs: 32.26 | 7: iteration 64600/ 115203 | consumed samples: 16537600 | consumed tokens: 33869004800 | elapsed time per iteration (s): 0.37 | learning rate: 9.416E-05 | global batch size: 256 | lm loss: 3.307708E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.457 | TFLOPs: 32.18 | 7: iteration 64700/ 115203 | consumed samples: 16563200 | consumed tokens: 33921433600 | elapsed time per iteration (s): 0.37 | learning rate: 9.392E-05 | global batch size: 256 | lm loss: 3.302239E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.975 | TFLOPs: 32.21 | 7: iteration 64800/ 115203 | consumed samples: 16588800 | consumed tokens: 33973862400 | elapsed time per iteration (s): 0.37 | learning rate: 9.367E-05 | global batch size: 256 | lm loss: 3.309278E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.041 | TFLOPs: 32.07 | 7: iteration 64900/ 115203 | consumed samples: 16614400 | consumed tokens: 34026291200 | elapsed time per iteration (s): 0.37 | learning rate: 9.343E-05 | global batch size: 256 | lm loss: 3.306219E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.162 | TFLOPs: 32.31 | 7: iteration 65000/ 115203 | consumed samples: 16640000 | consumed tokens: 34078720000 | elapsed time per iteration (s): 0.37 | learning 
rate: 9.319E-05 | global batch size: 256 | lm loss: 3.304778E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.738 | TFLOPs: 32.29 | 7: iteration 65100/ 115203 | consumed samples: 16665600 | consumed tokens: 34131148800 | elapsed time per iteration (s): 0.37 | learning rate: 9.294E-05 | global batch size: 256 | lm loss: 3.308089E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.989 | TFLOPs: 32.25 | 7: iteration 65200/ 115203 | consumed samples: 16691200 | consumed tokens: 34183577600 | elapsed time per iteration (s): 0.37 | learning rate: 9.270E-05 | global batch size: 256 | lm loss: 3.308034E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.436 | TFLOPs: 32.27 | 7: iteration 65300/ 115203 | consumed samples: 16716800 | consumed tokens: 34236006400 | elapsed time per iteration (s): 0.37 | learning rate: 9.246E-05 | global batch size: 256 | lm loss: 3.300324E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.269 | TFLOPs: 32.31 | 7: iteration 65400/ 115203 | consumed samples: 16742400 | consumed tokens: 34288435200 | elapsed time per iteration (s): 0.37 | learning rate: 9.221E-05 | global batch size: 256 | lm loss: 3.307447E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.211 | TFLOPs: 32.31 | 7: iteration 65500/ 115203 | consumed samples: 16768000 | consumed tokens: 34340864000 | elapsed time per iteration (s): 0.37 | learning rate: 9.197E-05 | global batch size: 256 | lm loss: 3.305567E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.959 | TFLOPs: 32.30 | 7: iteration 65600/ 115203 | 
consumed samples: 16793600 | consumed tokens: 34393292800 | elapsed time per iteration (s): 0.37 | learning rate: 9.173E-05 | global batch size: 256 | lm loss: 3.300979E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.488 | TFLOPs: 32.32 | 7: iteration 65700/ 115203 | consumed samples: 16819200 | consumed tokens: 34445721600 | elapsed time per iteration (s): 0.37 | learning rate: 9.149E-05 | global batch size: 256 | lm loss: 3.304576E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.726 | TFLOPs: 32.33 | 7: iteration 65800/ 115203 | consumed samples: 16844800 | consumed tokens: 34498150400 | elapsed time per iteration (s): 0.37 | learning rate: 9.124E-05 | global batch size: 256 | lm loss: 3.300150E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.627 | TFLOPs: 32.33 | 7: iteration 65900/ 115203 | consumed samples: 16870400 | consumed tokens: 34550579200 | elapsed time per iteration (s): 0.37 | learning rate: 9.100E-05 | global batch size: 256 | lm loss: 3.302105E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.419 | TFLOPs: 32.18 | 0: [2023-03-17 06:14:47,546] [INFO] [logging.py:68:log_dist] [Rank 0] step=66000, skipped=0, lr=[9.075821569240965e-05, 9.075821569240965e-05, 9.075821569240965e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 66000/ 115203 | consumed samples: 16896000 | consumed tokens: 34603008000 | elapsed time per iteration (s): 0.37 | learning rate: 9.076E-05 | global batch size: 256 | lm loss: 3.305724E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.523 | TFLOPs: 32.28 | 0: steps: 66000 loss: 3.2653 iter time (s): 0.369 samples/sec: 
693.372 7: iteration 66100/ 115203 | consumed samples: 16921600 | consumed tokens: 34655436800 | elapsed time per iteration (s): 0.37 | learning rate: 9.052E-05 | global batch size: 256 | lm loss: 3.301473E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.725 | TFLOPs: 32.29 | 7: iteration 66200/ 115203 | consumed samples: 16947200 | consumed tokens: 34707865600 | elapsed time per iteration (s): 0.37 | learning rate: 9.027E-05 | global batch size: 256 | lm loss: 3.304799E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.555 | TFLOPs: 32.33 | 7: iteration 66300/ 115203 | consumed samples: 16972800 | consumed tokens: 34760294400 | elapsed time per iteration (s): 0.37 | learning rate: 9.003E-05 | global batch size: 256 | lm loss: 3.304798E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.615 | TFLOPs: 32.33 | 7: iteration 66400/ 115203 | consumed samples: 16998400 | consumed tokens: 34812723200 | elapsed time per iteration (s): 0.37 | learning rate: 8.979E-05 | global batch size: 256 | lm loss: 3.302678E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.684 | TFLOPs: 32.33 | 7: iteration 66500/ 115203 | consumed samples: 17024000 | consumed tokens: 34865152000 | elapsed time per iteration (s): 0.37 | learning rate: 8.955E-05 | global batch size: 256 | lm loss: 3.300524E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.603 | TFLOPs: 32.33 | 7: iteration 66600/ 115203 | consumed samples: 17049600 | consumed tokens: 34917580800 | elapsed time per iteration (s): 0.37 | learning rate: 8.931E-05 | global batch size: 256 | lm loss: 3.307196E+00 | grad norm: 0.209 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.495 | TFLOPs: 32.32 | 7: iteration 66700/ 115203 | consumed samples: 17075200 | consumed tokens: 34970009600 | elapsed time per iteration (s): 0.37 | learning rate: 8.907E-05 | global batch size: 256 | lm loss: 3.307567E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.635 | TFLOPs: 32.33 | 7: iteration 66800/ 115203 | consumed samples: 17100800 | consumed tokens: 35022438400 | elapsed time per iteration (s): 0.37 | learning rate: 8.883E-05 | global batch size: 256 | lm loss: 3.306215E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.445 | TFLOPs: 32.32 | 7: iteration 66900/ 115203 | consumed samples: 17126400 | consumed tokens: 35074867200 | elapsed time per iteration (s): 0.37 | learning rate: 8.858E-05 | global batch size: 256 | lm loss: 3.301021E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.382 | TFLOPs: 32.32 | 7: iteration 67000/ 115203 | consumed samples: 17152000 | consumed tokens: 35127296000 | elapsed time per iteration (s): 0.37 | learning rate: 8.834E-05 | global batch size: 256 | lm loss: 3.304190E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.055 | TFLOPs: 32.30 | 7: iteration 67100/ 115203 | consumed samples: 17177600 | consumed tokens: 35179724800 | elapsed time per iteration (s): 0.37 | learning rate: 8.810E-05 | global batch size: 256 | lm loss: 3.300943E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.009 | TFLOPs: 32.30 | 7: iteration 67200/ 115203 | consumed samples: 17203200 | consumed tokens: 35232153600 | elapsed time per iteration (s): 0.37 | learning 
rate: 8.786E-05 | global batch size: 256 | lm loss: 3.303647E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.721 | TFLOPs: 32.29 | 7: iteration 67300/ 115203 | consumed samples: 17228800 | consumed tokens: 35284582400 | elapsed time per iteration (s): 0.37 | learning rate: 8.762E-05 | global batch size: 256 | lm loss: 3.300211E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.233 | TFLOPs: 32.31 | 7: iteration 67400/ 115203 | consumed samples: 17254400 | consumed tokens: 35337011200 | elapsed time per iteration (s): 0.37 | learning rate: 8.738E-05 | global batch size: 256 | lm loss: 3.297923E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.315 | TFLOPs: 32.27 | 7: iteration 67500/ 115203 | consumed samples: 17280000 | consumed tokens: 35389440000 | elapsed time per iteration (s): 0.37 | learning rate: 8.714E-05 | global batch size: 256 | lm loss: 3.297503E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.041 | TFLOPs: 32.30 | 7: iteration 67600/ 115203 | consumed samples: 17305600 | consumed tokens: 35441868800 | elapsed time per iteration (s): 0.37 | learning rate: 8.690E-05 | global batch size: 256 | lm loss: 3.303371E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.562 | TFLOPs: 32.37 | 7: iteration 67700/ 115203 | consumed samples: 17331200 | consumed tokens: 35494297600 | elapsed time per iteration (s): 0.37 | learning rate: 8.666E-05 | global batch size: 256 | lm loss: 3.304814E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.805 | TFLOPs: 32.38 | 7: iteration 67800/ 115203 | 
consumed samples: 17356800 | consumed tokens: 35546726400 | elapsed time per iteration (s): 0.37 | learning rate: 8.642E-05 | global batch size: 256 | lm loss: 3.297902E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.604 | TFLOPs: 32.37 | 7: iteration 67900/ 115203 | consumed samples: 17382400 | consumed tokens: 35599155200 | elapsed time per iteration (s): 0.37 | learning rate: 8.619E-05 | global batch size: 256 | lm loss: 3.300228E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.419 | TFLOPs: 32.37 | 0: [2023-03-17 06:27:06,863] [INFO] [logging.py:68:log_dist] [Rank 0] step=68000, skipped=0, lr=[8.594634403532495e-05, 8.594634403532495e-05, 8.594634403532495e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 68000/ 115203 | consumed samples: 17408000 | consumed tokens: 35651584000 | elapsed time per iteration (s): 0.37 | learning rate: 8.595E-05 | global batch size: 256 | lm loss: 3.297016E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.754 | TFLOPs: 32.34 | 0: steps: 68000 loss: 3.2788 iter time (s): 0.368 samples/sec: 695.166 7: iteration 68100/ 115203 | consumed samples: 17433600 | consumed tokens: 35704012800 | elapsed time per iteration (s): 0.37 | learning rate: 8.571E-05 | global batch size: 256 | lm loss: 3.300591E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.381 | TFLOPs: 32.27 | 7: iteration 68200/ 115203 | consumed samples: 17459200 | consumed tokens: 35756441600 | elapsed time per iteration (s): 0.37 | learning rate: 8.547E-05 | global batch size: 256 | lm loss: 3.307255E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.162 | TFLOPs: 
32.31 | 7: iteration 68300/ 115203 | consumed samples: 17484800 | consumed tokens: 35808870400 | elapsed time per iteration (s): 0.37 | learning rate: 8.523E-05 | global batch size: 256 | lm loss: 3.298655E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.589 | TFLOPs: 32.33 | 7: iteration 68400/ 115203 | consumed samples: 17510400 | consumed tokens: 35861299200 | elapsed time per iteration (s): 0.37 | learning rate: 8.499E-05 | global batch size: 256 | lm loss: 3.303450E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.804 | TFLOPs: 32.29 | 7: iteration 68500/ 115203 | consumed samples: 17536000 | consumed tokens: 35913728000 | elapsed time per iteration (s): 0.37 | learning rate: 8.475E-05 | global batch size: 256 | lm loss: 3.304822E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.089 | TFLOPs: 32.30 | 7: iteration 68600/ 115203 | consumed samples: 17561600 | consumed tokens: 35966156800 | elapsed time per iteration (s): 0.37 | learning rate: 8.452E-05 | global batch size: 256 | lm loss: 3.298113E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.330 | TFLOPs: 32.32 | 7: iteration 68700/ 115203 | consumed samples: 17587200 | consumed tokens: 36018585600 | elapsed time per iteration (s): 0.37 | learning rate: 8.428E-05 | global batch size: 256 | lm loss: 3.299864E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.820 | TFLOPs: 32.34 | 7: iteration 68800/ 115203 | consumed samples: 17612800 | consumed tokens: 36071014400 | elapsed time per iteration (s): 0.37 | learning rate: 8.404E-05 | global batch size: 256 | lm loss: 3.297259E+00 | grad norm: 0.201 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.375 | TFLOPs: 32.36 | 7: iteration 68900/ 115203 | consumed samples: 17638400 | consumed tokens: 36123443200 | elapsed time per iteration (s): 0.37 | learning rate: 8.380E-05 | global batch size: 256 | lm loss: 3.300506E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.693 | TFLOPs: 32.38 | 7: iteration 69000/ 115203 | consumed samples: 17664000 | consumed tokens: 36175872000 | elapsed time per iteration (s): 0.37 | learning rate: 8.357E-05 | global batch size: 256 | lm loss: 3.296601E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.532 | TFLOPs: 32.37 | 7: iteration 69100/ 115203 | consumed samples: 17689600 | consumed tokens: 36228300800 | elapsed time per iteration (s): 0.37 | learning rate: 8.333E-05 | global batch size: 256 | lm loss: 3.295871E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.416 | TFLOPs: 32.37 | 7: iteration 69200/ 115203 | consumed samples: 17715200 | consumed tokens: 36280729600 | elapsed time per iteration (s): 0.37 | learning rate: 8.309E-05 | global batch size: 256 | lm loss: 3.299691E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.838 | TFLOPs: 32.34 | 7: iteration 69300/ 115203 | consumed samples: 17740800 | consumed tokens: 36333158400 | elapsed time per iteration (s): 0.37 | learning rate: 8.286E-05 | global batch size: 256 | lm loss: 3.299237E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.362 | TFLOPs: 32.36 | 7: iteration 69400/ 115203 | consumed samples: 17766400 | consumed tokens: 36385587200 | elapsed time per iteration (s): 0.37 | learning 
rate: 8.262E-05 | global batch size: 256 | lm loss: 3.299748E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.427 | TFLOPs: 32.37 | 7: iteration 69500/ 115203 | consumed samples: 17792000 | consumed tokens: 36438016000 | elapsed time per iteration (s): 0.37 | learning rate: 8.238E-05 | global batch size: 256 | lm loss: 3.298146E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.498 | TFLOPs: 32.32 | 7: iteration 69600/ 115203 | consumed samples: 17817600 | consumed tokens: 36490444800 | elapsed time per iteration (s): 0.37 | learning rate: 8.215E-05 | global batch size: 256 | lm loss: 3.298929E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.474 | TFLOPs: 32.32 | 7: iteration 69700/ 115203 | consumed samples: 17843200 | consumed tokens: 36542873600 | elapsed time per iteration (s): 0.37 | learning rate: 8.191E-05 | global batch size: 256 | lm loss: 3.297062E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.617 | TFLOPs: 32.00 | 7: iteration 69800/ 115203 | consumed samples: 17868800 | consumed tokens: 36595302400 | elapsed time per iteration (s): 0.37 | learning rate: 8.168E-05 | global batch size: 256 | lm loss: 3.294957E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.755 | TFLOPs: 32.29 | 7: iteration 69900/ 115203 | consumed samples: 17894400 | consumed tokens: 36647731200 | elapsed time per iteration (s): 0.37 | learning rate: 8.144E-05 | global batch size: 256 | lm loss: 3.302117E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.371 | TFLOPs: 32.36 | 0: [2023-03-17 06:39:26,324] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=70000, skipped=0, lr=[8.120745619091417e-05, 8.120745619091417e-05, 8.120745619091417e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 70000/ 115203 | consumed samples: 17920000 | consumed tokens: 36700160000 | elapsed time per iteration (s): 0.37 | learning rate: 8.121E-05 | global batch size: 256 | lm loss: 3.296920E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.482 | TFLOPs: 32.37 | 0: steps: 70000 loss: 3.3020 iter time (s): 0.368 samples/sec: 695.019 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 70000 | lm loss value: 3.406045E+00 | lm loss PPL: 3.014577E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 70000 to checkpoints_146m60b400m 0: [2023-03-17 06:39:26,448] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step70000 is begin to save! 0: [2023-03-17 06:39:26,452] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/layer_01-model_00-model_states.pt... 0: [2023-03-17 06:39:26,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/layer_01-model_00-model_states.pt. 0: [2023-03-17 06:39:26,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/layer_03-model_00-model_states.pt... 0: [2023-03-17 06:39:26,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/layer_03-model_00-model_states.pt. 0: [2023-03-17 06:39:26,576] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/layer_04-model_00-model_states.pt... 
0: [2023-03-17 06:39:26,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/layer_04-model_00-model_states.pt. 0: [2023-03-17 06:39:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/layer_05-model_00-model_states.pt... 0: [2023-03-17 06:39:26,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/layer_05-model_00-model_states.pt. 0: [2023-03-17 06:39:26,607] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/layer_06-model_00-model_states.pt... 0: [2023-03-17 06:39:26,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/layer_06-model_00-model_states.pt. 0: [2023-03-17 06:39:26,623] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/layer_07-model_00-model_states.pt... 0: [2023-03-17 06:39:26,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/layer_07-model_00-model_states.pt. 0: [2023-03-17 06:39:26,639] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/layer_08-model_00-model_states.pt... 0: [2023-03-17 06:39:26,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/layer_08-model_00-model_states.pt. 0: [2023-03-17 06:39:26,655] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/layer_09-model_00-model_states.pt... 0: [2023-03-17 06:39:26,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/layer_09-model_00-model_states.pt. 0: [2023-03-17 06:39:26,670] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/layer_10-model_00-model_states.pt... 
0: [2023-03-17 06:39:26,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/layer_10-model_00-model_states.pt. 0: [2023-03-17 06:39:26,686] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/layer_11-model_00-model_states.pt... 0: [2023-03-17 06:39:26,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/layer_11-model_00-model_states.pt. 0: [2023-03-17 06:39:26,701] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/layer_12-model_00-model_states.pt... 0: [2023-03-17 06:39:26,717] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/layer_12-model_00-model_states.pt. 0: [2023-03-17 06:39:26,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/layer_13-model_00-model_states.pt... 0: [2023-03-17 06:39:26,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/layer_13-model_00-model_states.pt. 0: [2023-03-17 06:39:26,733] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/layer_14-model_00-model_states.pt... 0: [2023-03-17 06:39:26,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/layer_14-model_00-model_states.pt. 0: [2023-03-17 06:39:26,749] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/layer_15-model_00-model_states.pt... 0: [2023-03-17 06:39:26,764] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/layer_15-model_00-model_states.pt. 0: [2023-03-17 06:39:26,764] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/layer_16-model_00-model_states.pt... 
0: [2023-03-17 06:39:26,780] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/layer_16-model_00-model_states.pt. 0: [2023-03-17 06:39:26,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/layer_17-model_00-model_states.pt... 0: [2023-03-17 06:39:26,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/layer_17-model_00-model_states.pt. 0: [2023-03-17 06:39:26,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/layer_19-model_00-model_states.pt... 0: [2023-03-17 06:39:26,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/layer_19-model_00-model_states.pt. 0: [2023-03-17 06:39:26,797] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_146m60b400m/global_step70000/mp_rank_00_model_states.pt 0: [2023-03-17 06:39:26,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/mp_rank_00_model_states.pt... 0: [2023-03-17 06:39:26,799] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/mp_rank_00_model_states.pt. 0: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
0: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 2: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 4: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 6: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
6: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 1: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
5: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 3: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 0: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 2: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 4: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
4: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 6: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 1: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 3: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 7: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
2: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 4: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 6: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 1: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 06:39:26,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 06:39:26,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 06:39:26,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 06:39:26,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2023-03-17 06:39:26,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2023-03-17 06:39:26,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 06:39:26,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2023-03-17 06:39:26,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 06:39:26,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 06:39:26,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2023-03-17 06:39:26,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 06:39:26,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 06:39:26,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2023-03-17 06:39:26,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 06:39:26,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 06:39:26,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 06:39:26,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
1: [2023-03-17 06:39:26,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 06:39:26,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2023-03-17 06:39:26,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 06:39:26,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 06:39:26,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2023-03-17 06:39:26,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 06:39:26,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 06:39:26,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2023-03-17 06:39:26,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 06:39:26,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 06:39:26,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2023-03-17 06:39:26,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
6: [2023-03-17 06:39:26,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 7: [2023-03-17 06:39:26,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 6: [2023-03-17 06:39:26,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 06:39:26,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2023-03-17 06:39:26,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2023-03-17 06:39:26,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 06:39:26,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 06:39:26,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2023-03-17 06:39:26,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 06:39:26,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 06:39:26,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2023-03-17 06:39:26,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2023-03-17 06:39:26,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 06:39:26,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2023-03-17 06:39:26,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 06:39:26,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 06:39:26,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2023-03-17 06:39:26,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 06:39:26,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 06:39:26,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2023-03-17 06:39:26,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-03-17 06:39:26,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 06:39:26,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 06:39:26,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
0: [2023-03-17 06:39:26,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 06:39:26,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 06:39:26,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2023-03-17 06:39:26,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 1: [2023-03-17 06:39:26,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 0: [2023-03-17 06:39:26,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 1: [2023-03-17 06:39:26,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 0: [2023-03-17 06:39:26,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2023-03-17 06:39:26,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2023-03-17 06:39:26,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 06:39:26,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 06:39:26,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
7: [2023-03-17 06:39:26,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 06:39:26,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 06:39:26,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2023-03-17 06:39:26,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 06:39:26,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 06:39:26,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 06:39:26,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2023-03-17 06:39:26,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 06:39:26,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2023-03-17 06:39:26,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 06:39:26,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 06:39:26,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
0: [2023-03-17 06:39:26,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 06:39:26,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 06:39:26,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 06:39:26,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 06:39:26,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 06:39:26,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 06:39:26,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2023-03-17 06:39:26,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2023-03-17 06:39:26,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2023-03-17 06:39:26,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 06:39:26,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 06:39:26,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
6: [2023-03-17 06:39:26,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 06:39:26,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 06:39:26,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2023-03-17 06:39:26,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 06:39:26,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 06:39:26,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2023-03-17 06:39:26,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 06:39:26,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 2: [2023-03-17 06:39:26,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 06:39:26,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 06:39:26,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 06:39:26,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 06:39:26,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 3: [2023-03-17 06:39:26,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 06:39:26,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 06:39:26,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 06:39:26,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 06:39:26,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 06:39:26,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 06:39:26,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 06:39:26,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 2: [2023-03-17 06:39:26,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2023-03-17 06:39:26,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 06:39:26,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 3: [2023-03-17 06:39:26,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2023-03-17 06:39:26,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 06:39:26,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 2: [2023-03-17 06:39:26,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 06:39:26,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 06:39:26,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 06:39:26,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 06:39:26,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 3: [2023-03-17 06:39:26,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 06:39:26,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 2: [2023-03-17 06:39:26,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 06:39:26,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 06:39:26,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 06:39:26,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2023-03-17 06:39:26,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2023-03-17 06:39:26,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2023-03-17 06:39:26,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2023-03-17 06:39:26,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2023-03-17 06:39:26,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2023-03-17 06:39:26,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2023-03-17 06:39:26,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2023-03-17 06:39:26,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
3: [2023-03-17 06:39:26,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2023-03-17 06:39:26,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2023-03-17 06:39:26,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2023-03-17 06:39:26,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2023-03-17 06:39:26,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2023-03-17 06:39:26,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 06:39:26,879] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 06:39:26,879] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2023-03-17 06:39:26,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 06:39:26,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 06:39:26,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 06:39:26,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 06:39:26,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 06:39:26,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 06:39:26,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 06:39:26,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 06:39:26,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 06:39:26,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 06:39:26,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 06:39:26,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 06:39:26,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 06:39:26,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 06:39:26,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 06:39:26,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 06:39:26,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2023-03-17 06:39:26,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2023-03-17 06:39:26,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2023-03-17 06:39:26,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2023-03-17 06:39:26,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2023-03-17 06:39:26,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2023-03-17 06:39:26,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2023-03-17 06:39:26,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2023-03-17 06:39:26,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 06:39:26,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2023-03-17 06:39:26,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 06:39:26,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2023-03-17 06:39:26,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 06:39:26,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 06:39:26,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 06:39:26,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 06:39:26,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 06:39:26,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-17 06:39:26,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 06:39:26,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 06:39:26,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 06:39:26,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 06:39:26,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 06:39:26,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 06:39:26,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 06:39:26,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step70000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 06:39:26,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2023-03-17 06:39:26,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2023-03-17 06:39:26,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
5: [2023-03-17 06:39:26,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2023-03-17 06:39:26,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2023-03-17 06:39:26,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2023-03-17 06:39:26,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2023-03-17 06:39:26,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: successfully saved checkpoint at iteration 70000 to checkpoints_146m60b400m 7: time (ms) | save-checkpoint: 447.59 7: iteration 70100/ 115203 | consumed samples: 17945600 | consumed tokens: 36752588800 | elapsed time per iteration (s): 0.38 | learning rate: 8.097E-05 | global batch size: 256 | lm loss: 3.295732E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.948 | TFLOPs: 31.83 | 7: iteration 70200/ 115203 | consumed samples: 17971200 | consumed tokens: 36805017600 | elapsed time per iteration (s): 0.37 | learning rate: 8.074E-05 | global batch size: 256 | lm loss: 3.298371E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.475 | TFLOPs: 32.37 | 7: iteration 70300/ 115203 | consumed samples: 17996800 | consumed tokens: 36857446400 | elapsed time per iteration (s): 0.37 | learning rate: 8.050E-05 | global batch size: 256 | lm loss: 3.296129E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.174 | TFLOPs: 32.35 | 7: iteration 70400/ 115203 | consumed samples: 18022400 | consumed tokens: 36909875200 | elapsed time per iteration (s): 0.37 | learning rate: 8.027E-05 | global batch size: 256 | lm loss: 3.299754E+00 | 
grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.053 | TFLOPs: 32.35 | 7: iteration 70500/ 115203 | consumed samples: 18048000 | consumed tokens: 36962304000 | elapsed time per iteration (s): 0.37 | learning rate: 8.004E-05 | global batch size: 256 | lm loss: 3.293254E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.816 | TFLOPs: 32.34 | 7: iteration 70600/ 115203 | consumed samples: 18073600 | consumed tokens: 37014732800 | elapsed time per iteration (s): 0.38 | learning rate: 7.980E-05 | global batch size: 256 | lm loss: 3.296987E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.507 | TFLOPs: 31.81 | 7: iteration 70700/ 115203 | consumed samples: 18099200 | consumed tokens: 37067161600 | elapsed time per iteration (s): 0.38 | learning rate: 7.957E-05 | global batch size: 256 | lm loss: 3.295877E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.829 | TFLOPs: 31.64 | 7: iteration 70800/ 115203 | consumed samples: 18124800 | consumed tokens: 37119590400 | elapsed time per iteration (s): 0.38 | learning rate: 7.934E-05 | global batch size: 256 | lm loss: 3.292406E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.878 | TFLOPs: 31.41 | 7: iteration 70900/ 115203 | consumed samples: 18150400 | consumed tokens: 37172019200 | elapsed time per iteration (s): 0.38 | learning rate: 7.910E-05 | global batch size: 256 | lm loss: 3.294398E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.660 | TFLOPs: 31.54 | 7: iteration 71000/ 115203 | consumed samples: 18176000 | consumed tokens: 37224448000 | elapsed time 
per iteration (s): 0.38 | learning rate: 7.887E-05 | global batch size: 256 | lm loss: 3.292472E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.784 | TFLOPs: 31.50 | 7: iteration 71100/ 115203 | consumed samples: 18201600 | consumed tokens: 37276876800 | elapsed time per iteration (s): 0.38 | learning rate: 7.864E-05 | global batch size: 256 | lm loss: 3.294762E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.329 | TFLOPs: 31.71 | 7: iteration 71200/ 115203 | consumed samples: 18227200 | consumed tokens: 37329305600 | elapsed time per iteration (s): 0.38 | learning rate: 7.841E-05 | global batch size: 256 | lm loss: 3.294382E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.812 | TFLOPs: 31.64 | 7: iteration 71300/ 115203 | consumed samples: 18252800 | consumed tokens: 37381734400 | elapsed time per iteration (s): 0.38 | learning rate: 7.817E-05 | global batch size: 256 | lm loss: 3.298961E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.442 | TFLOPs: 31.48 | 7: iteration 71400/ 115203 | consumed samples: 18278400 | consumed tokens: 37434163200 | elapsed time per iteration (s): 0.38 | learning rate: 7.794E-05 | global batch size: 256 | lm loss: 3.290491E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.393 | TFLOPs: 31.38 | 7: iteration 71500/ 115203 | consumed samples: 18304000 | consumed tokens: 37486592000 | elapsed time per iteration (s): 0.38 | learning rate: 7.771E-05 | global batch size: 256 | lm loss: 3.294526E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.289 | TFLOPs: 31.85 | 
7: iteration 71600/ 115203 | consumed samples: 18329600 | consumed tokens: 37539020800 | elapsed time per iteration (s): 0.38 | learning rate: 7.748E-05 | global batch size: 256 | lm loss: 3.295201E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.805 | TFLOPs: 31.64 | 7: iteration 71700/ 115203 | consumed samples: 18355200 | consumed tokens: 37591449600 | elapsed time per iteration (s): 0.38 | learning rate: 7.725E-05 | global batch size: 256 | lm loss: 3.298756E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.910 | TFLOPs: 31.83 | 7: iteration 71800/ 115203 | consumed samples: 18380800 | consumed tokens: 37643878400 | elapsed time per iteration (s): 0.38 | learning rate: 7.702E-05 | global batch size: 256 | lm loss: 3.299259E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.795 | TFLOPs: 31.82 | 7: iteration 71900/ 115203 | consumed samples: 18406400 | consumed tokens: 37696307200 | elapsed time per iteration (s): 0.38 | learning rate: 7.679E-05 | global batch size: 256 | lm loss: 3.292580E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.673 | TFLOPs: 31.82 | 0: [2023-03-17 06:51:58,086] [INFO] [logging.py:68:log_dist] [Rank 0] step=72000, skipped=0, lr=[7.655593093399763e-05, 7.655593093399763e-05, 7.655593093399763e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 72000/ 115203 | consumed samples: 18432000 | consumed tokens: 37748736000 | elapsed time per iteration (s): 0.38 | learning rate: 7.656E-05 | global batch size: 256 | lm loss: 3.293961E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.057 | TFLOPs: 31.56 | 0: steps: 72000 loss: 3.2848 iter 
time (s): 0.374 samples/sec: 684.931 7: iteration 72100/ 115203 | consumed samples: 18457600 | consumed tokens: 37801164800 | elapsed time per iteration (s): 0.37 | learning rate: 7.633E-05 | global batch size: 256 | lm loss: 3.293619E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.984 | TFLOPs: 31.93 | 7: iteration 72200/ 115203 | consumed samples: 18483200 | consumed tokens: 37853593600 | elapsed time per iteration (s): 0.38 | learning rate: 7.610E-05 | global batch size: 256 | lm loss: 3.298297E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.127 | TFLOPs: 31.79 | 7: iteration 72300/ 115203 | consumed samples: 18508800 | consumed tokens: 37906022400 | elapsed time per iteration (s): 0.37 | learning rate: 7.587E-05 | global batch size: 256 | lm loss: 3.291091E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.068 | TFLOPs: 31.98 | 7: iteration 72400/ 115203 | consumed samples: 18534400 | consumed tokens: 37958451200 | elapsed time per iteration (s): 0.37 | learning rate: 7.564E-05 | global batch size: 256 | lm loss: 3.294988E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.349 | TFLOPs: 31.90 | 7: iteration 72500/ 115203 | consumed samples: 18560000 | consumed tokens: 38010880000 | elapsed time per iteration (s): 0.37 | learning rate: 7.541E-05 | global batch size: 256 | lm loss: 3.294960E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.699 | TFLOPs: 31.91 | 7: iteration 72600/ 115203 | consumed samples: 18585600 | consumed tokens: 38063308800 | elapsed time per iteration (s): 0.37 | learning rate: 7.518E-05 | global batch size: 256 | lm loss: 3.295647E+00 | grad 
norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.522 | TFLOPs: 32.00 | 7: iteration 72700/ 115203 | consumed samples: 18611200 | consumed tokens: 38115737600 | elapsed time per iteration (s): 0.37 | learning rate: 7.495E-05 | global batch size: 256 | lm loss: 3.290345E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.572 | TFLOPs: 32.05 | 7: iteration 72800/ 115203 | consumed samples: 18636800 | consumed tokens: 38168166400 | elapsed time per iteration (s): 0.37 | learning rate: 7.472E-05 | global batch size: 256 | lm loss: 3.290414E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.815 | TFLOPs: 31.92 | 7: iteration 72900/ 115203 | consumed samples: 18662400 | consumed tokens: 38220595200 | elapsed time per iteration (s): 0.37 | learning rate: 7.450E-05 | global batch size: 256 | lm loss: 3.294265E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.077 | TFLOPs: 31.88 | 7: iteration 73000/ 115203 | consumed samples: 18688000 | consumed tokens: 38273024000 | elapsed time per iteration (s): 0.38 | learning rate: 7.427E-05 | global batch size: 256 | lm loss: 3.292849E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.337 | TFLOPs: 31.62 | 7: iteration 73100/ 115203 | consumed samples: 18713600 | consumed tokens: 38325452800 | elapsed time per iteration (s): 0.37 | learning rate: 7.404E-05 | global batch size: 256 | lm loss: 3.294001E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.996 | TFLOPs: 31.88 | 7: iteration 73200/ 115203 | consumed samples: 18739200 | consumed tokens: 38377881600 | elapsed time per 
iteration (s): 0.38 | learning rate: 7.381E-05 | global batch size: 256 | lm loss: 3.295185E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.243 | TFLOPs: 31.80 | 7: iteration 73300/ 115203 | consumed samples: 18764800 | consumed tokens: 38430310400 | elapsed time per iteration (s): 0.37 | learning rate: 7.359E-05 | global batch size: 256 | lm loss: 3.294347E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.789 | TFLOPs: 31.96 | 7: iteration 73400/ 115203 | consumed samples: 18790400 | consumed tokens: 38482739200 | elapsed time per iteration (s): 0.37 | learning rate: 7.336E-05 | global batch size: 256 | lm loss: 3.292531E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.055 | TFLOPs: 31.98 | 7: iteration 73500/ 115203 | consumed samples: 18816000 | consumed tokens: 38535168000 | elapsed time per iteration (s): 0.37 | learning rate: 7.313E-05 | global batch size: 256 | lm loss: 3.289242E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.816 | TFLOPs: 32.06 | 7: iteration 73600/ 115203 | consumed samples: 18841600 | consumed tokens: 38587596800 | elapsed time per iteration (s): 0.37 | learning rate: 7.291E-05 | global batch size: 256 | lm loss: 3.295910E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.877 | TFLOPs: 32.11 | 7: iteration 73700/ 115203 | consumed samples: 18867200 | consumed tokens: 38640025600 | elapsed time per iteration (s): 0.37 | learning rate: 7.268E-05 | global batch size: 256 | lm loss: 3.289602E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.007 | TFLOPs: 31.88 | 7: 
iteration 73800/ 115203 | consumed samples: 18892800 | consumed tokens: 38692454400 | elapsed time per iteration (s): 0.37 | learning rate: 7.246E-05 | global batch size: 256 | lm loss: 3.290467E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.985 | TFLOPs: 31.97 | 7: iteration 73900/ 115203 | consumed samples: 18918400 | consumed tokens: 38744883200 | elapsed time per iteration (s): 0.37 | learning rate: 7.223E-05 | global batch size: 256 | lm loss: 3.288858E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.745 | TFLOPs: 31.87 | 0: [2023-03-17 07:04:26,686] [INFO] [logging.py:68:log_dist] [Rank 0] step=74000, skipped=0, lr=[7.20058819630707e-05, 7.20058819630707e-05, 7.20058819630707e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 74000/ 115203 | consumed samples: 18944000 | consumed tokens: 38797312000 | elapsed time per iteration (s): 0.37 | learning rate: 7.201E-05 | global batch size: 256 | lm loss: 3.292284E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.973 | TFLOPs: 32.02 | 0: steps: 74000 loss: 3.2722 iter time (s): 0.372 samples/sec: 687.463 7: iteration 74100/ 115203 | consumed samples: 18969600 | consumed tokens: 38849740800 | elapsed time per iteration (s): 0.37 | learning rate: 7.178E-05 | global batch size: 256 | lm loss: 3.287816E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.653 | TFLOPs: 32.00 | 7: iteration 74200/ 115203 | consumed samples: 18995200 | consumed tokens: 38902169600 | elapsed time per iteration (s): 0.37 | learning rate: 7.156E-05 | global batch size: 256 | lm loss: 3.292397E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 686.396 | TFLOPs: 32.04 | 7: iteration 74300/ 115203 | consumed samples: 19020800 | consumed tokens: 38954598400 | elapsed time per iteration (s): 0.37 | learning rate: 7.133E-05 | global batch size: 256 | lm loss: 3.284949E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.053 | TFLOPs: 31.98 | 7: iteration 74400/ 115203 | consumed samples: 19046400 | consumed tokens: 39007027200 | elapsed time per iteration (s): 0.37 | learning rate: 7.111E-05 | global batch size: 256 | lm loss: 3.292219E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.467 | TFLOPs: 32.04 | 7: iteration 74500/ 115203 | consumed samples: 19072000 | consumed tokens: 39059456000 | elapsed time per iteration (s): 0.37 | learning rate: 7.089E-05 | global batch size: 256 | lm loss: 3.293717E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.790 | TFLOPs: 32.01 | 7: iteration 74600/ 115203 | consumed samples: 19097600 | consumed tokens: 39111884800 | elapsed time per iteration (s): 0.37 | learning rate: 7.066E-05 | global batch size: 256 | lm loss: 3.282631E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.680 | TFLOPs: 32.10 | 7: iteration 74700/ 115203 | consumed samples: 19123200 | consumed tokens: 39164313600 | elapsed time per iteration (s): 0.37 | learning rate: 7.044E-05 | global batch size: 256 | lm loss: 3.290201E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.049 | TFLOPs: 32.12 | 7: iteration 74800/ 115203 | consumed samples: 19148800 | consumed tokens: 39216742400 | elapsed time per iteration (s): 0.37 | learning rate: 7.022E-05 | global batch size: 256 | lm loss: 3.292664E+00 | grad norm: 
0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.308 | TFLOPs: 32.08 | 7: iteration 74900/ 115203 | consumed samples: 19174400 | consumed tokens: 39269171200 | elapsed time per iteration (s): 0.37 | learning rate: 7.000E-05 | global batch size: 256 | lm loss: 3.288365E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.902 | TFLOPs: 31.92 | 7: iteration 75000/ 115203 | consumed samples: 19200000 | consumed tokens: 39321600000 | elapsed time per iteration (s): 0.37 | learning rate: 6.977E-05 | global batch size: 256 | lm loss: 3.289924E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.309 | TFLOPs: 31.89 | 7: iteration 75100/ 115203 | consumed samples: 19225600 | consumed tokens: 39374028800 | elapsed time per iteration (s): 0.37 | learning rate: 6.955E-05 | global batch size: 256 | lm loss: 3.290401E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.763 | TFLOPs: 31.96 | 7: iteration 75200/ 115203 | consumed samples: 19251200 | consumed tokens: 39426457600 | elapsed time per iteration (s): 0.37 | learning rate: 6.933E-05 | global batch size: 256 | lm loss: 3.288574E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.386 | TFLOPs: 32.13 | 7: iteration 75300/ 115203 | consumed samples: 19276800 | consumed tokens: 39478886400 | elapsed time per iteration (s): 0.37 | learning rate: 6.911E-05 | global batch size: 256 | lm loss: 3.290075E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.414 | TFLOPs: 32.13 | 7: iteration 75400/ 115203 | consumed samples: 19302400 | consumed tokens: 39531315200 | elapsed time per 
iteration (s): 0.37 | learning rate: 6.889E-05 | global batch size: 256 | lm loss: 3.289817E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.029 | TFLOPs: 32.11 | 7: iteration 75500/ 115203 | consumed samples: 19328000 | consumed tokens: 39583744000 | elapsed time per iteration (s): 0.37 | learning rate: 6.867E-05 | global batch size: 256 | lm loss: 3.289924E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.208 | TFLOPs: 31.98 | 7: iteration 75600/ 115203 | consumed samples: 19353600 | consumed tokens: 39636172800 | elapsed time per iteration (s): 0.37 | learning rate: 6.845E-05 | global batch size: 256 | lm loss: 3.290027E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.987 | TFLOPs: 32.07 | 7: iteration 75700/ 115203 | consumed samples: 19379200 | consumed tokens: 39688601600 | elapsed time per iteration (s): 0.37 | learning rate: 6.823E-05 | global batch size: 256 | lm loss: 3.289471E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.037 | TFLOPs: 32.16 | 7: iteration 75800/ 115203 | consumed samples: 19404800 | consumed tokens: 39741030400 | elapsed time per iteration (s): 0.37 | learning rate: 6.801E-05 | global batch size: 256 | lm loss: 3.286186E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.772 | TFLOPs: 32.29 | 7: iteration 75900/ 115203 | consumed samples: 19430400 | consumed tokens: 39793459200 | elapsed time per iteration (s): 0.37 | learning rate: 6.779E-05 | global batch size: 256 | lm loss: 3.288966E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.865 | TFLOPs: 32.29 | 0: 
[2023-03-17 07:16:51,743] [INFO] [logging.py:68:log_dist] [Rank 0] step=76000, skipped=0, lr=[6.757111507639708e-05, 6.757111507639708e-05, 6.757111507639708e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 76000/ 115203 | consumed samples: 19456000 | consumed tokens: 39845888000 | elapsed time per iteration (s): 0.37 | learning rate: 6.757E-05 | global batch size: 256 | lm loss: 3.284984E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.002 | TFLOPs: 32.21 | 0: steps: 76000 loss: 3.2970 iter time (s): 0.371 samples/sec: 690.375 7: iteration 76100/ 115203 | consumed samples: 19481600 | consumed tokens: 39898316800 | elapsed time per iteration (s): 0.37 | learning rate: 6.735E-05 | global batch size: 256 | lm loss: 3.285045E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.728 | TFLOPs: 32.29 | 7: iteration 76200/ 115203 | consumed samples: 19507200 | consumed tokens: 39950745600 | elapsed time per iteration (s): 0.37 | learning rate: 6.713E-05 | global batch size: 256 | lm loss: 3.286656E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.667 | TFLOPs: 32.24 | 7: iteration 76300/ 115203 | consumed samples: 19532800 | consumed tokens: 40003174400 | elapsed time per iteration (s): 0.37 | learning rate: 6.692E-05 | global batch size: 256 | lm loss: 3.291569E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.612 | TFLOPs: 32.05 | 7: iteration 76400/ 115203 | consumed samples: 19558400 | consumed tokens: 40055603200 | elapsed time per iteration (s): 0.37 | learning rate: 6.670E-05 | global batch size: 256 | lm loss: 3.284526E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 684.785 | TFLOPs: 31.96 | 7: iteration 76500/ 115203 | consumed samples: 19584000 | consumed tokens: 40108032000 | elapsed time per iteration (s): 0.37 | learning rate: 6.648E-05 | global batch size: 256 | lm loss: 3.289949E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.230 | TFLOPs: 32.22 | 7: iteration 76600/ 115203 | consumed samples: 19609600 | consumed tokens: 40160460800 | elapsed time per iteration (s): 0.37 | learning rate: 6.627E-05 | global batch size: 256 | lm loss: 3.289725E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.671 | TFLOPs: 32.28 | 7: iteration 76700/ 115203 | consumed samples: 19635200 | consumed tokens: 40212889600 | elapsed time per iteration (s): 0.37 | learning rate: 6.605E-05 | global batch size: 256 | lm loss: 3.289471E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.383 | TFLOPs: 32.22 | 7: iteration 76800/ 115203 | consumed samples: 19660800 | consumed tokens: 40265318400 | elapsed time per iteration (s): 0.37 | learning rate: 6.583E-05 | global batch size: 256 | lm loss: 3.285323E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.225 | TFLOPs: 32.26 | 7: iteration 76900/ 115203 | consumed samples: 19686400 | consumed tokens: 40317747200 | elapsed time per iteration (s): 0.37 | learning rate: 6.562E-05 | global batch size: 256 | lm loss: 3.291577E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.451 | TFLOPs: 32.04 | 7: iteration 77000/ 115203 | consumed samples: 19712000 | consumed tokens: 40370176000 | elapsed time per iteration (s): 0.37 | learning rate: 6.540E-05 | global batch size: 256 | lm loss: 3.288383E+00 | grad norm: 
0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.731 | TFLOPs: 32.29 | 7: iteration 77100/ 115203 | consumed samples: 19737600 | consumed tokens: 40422604800 | elapsed time per iteration (s): 0.37 | learning rate: 6.519E-05 | global batch size: 256 | lm loss: 3.292609E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.683 | TFLOPs: 32.29 | 7: iteration 77200/ 115203 | consumed samples: 19763200 | consumed tokens: 40475033600 | elapsed time per iteration (s): 0.37 | learning rate: 6.497E-05 | global batch size: 256 | lm loss: 3.286579E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.935 | TFLOPs: 32.30 | 7: iteration 77300/ 115203 | consumed samples: 19788800 | consumed tokens: 40527462400 | elapsed time per iteration (s): 0.37 | learning rate: 6.476E-05 | global batch size: 256 | lm loss: 3.284080E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.274 | TFLOPs: 32.22 | 7: iteration 77400/ 115203 | consumed samples: 19814400 | consumed tokens: 40579891200 | elapsed time per iteration (s): 0.37 | learning rate: 6.454E-05 | global batch size: 256 | lm loss: 3.287910E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.565 | TFLOPs: 32.19 | 7: iteration 77500/ 115203 | consumed samples: 19840000 | consumed tokens: 40632320000 | elapsed time per iteration (s): 0.37 | learning rate: 6.433E-05 | global batch size: 256 | lm loss: 3.282907E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.715 | TFLOPs: 32.24 | 7: iteration 77600/ 115203 | consumed samples: 19865600 | consumed tokens: 40684748800 | elapsed time per 
iteration (s): 0.37 | learning rate: 6.412E-05 | global batch size: 256 | lm loss: 3.289288E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.967 | TFLOPs: 32.16 | 7: iteration 77700/ 115203 | consumed samples: 19891200 | consumed tokens: 40737177600 | elapsed time per iteration (s): 0.37 | learning rate: 6.390E-05 | global batch size: 256 | lm loss: 3.283051E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.961 | TFLOPs: 32.20 | 7: iteration 77800/ 115203 | consumed samples: 19916800 | consumed tokens: 40789606400 | elapsed time per iteration (s): 0.37 | learning rate: 6.369E-05 | global batch size: 256 | lm loss: 3.291657E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.364 | TFLOPs: 32.27 | 7: iteration 77900/ 115203 | consumed samples: 19942400 | consumed tokens: 40842035200 | elapsed time per iteration (s): 0.37 | learning rate: 6.348E-05 | global batch size: 256 | lm loss: 3.286572E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.273 | TFLOPs: 32.27 | 0: [2023-03-17 07:29:13,598] [INFO] [logging.py:68:log_dist] [Rank 0] step=78000, skipped=0, lr=[6.326508628233516e-05, 6.326508628233516e-05, 6.326508628233516e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 78000/ 115203 | consumed samples: 19968000 | consumed tokens: 40894464000 | elapsed time per iteration (s): 0.37 | learning rate: 6.327E-05 | global batch size: 256 | lm loss: 3.283056E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.142 | TFLOPs: 32.31 | 0: steps: 78000 loss: 3.2419 iter time (s): 0.369 samples/sec: 693.822 7: iteration 78100/ 115203 | consumed samples: 19993600 | consumed 
tokens: 40946892800 | elapsed time per iteration (s): 0.37 | learning rate: 6.305E-05 | global batch size: 256 | lm loss: 3.286205E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.466 | TFLOPs: 32.32 | 7: iteration 78200/ 115203 | consumed samples: 20019200 | consumed tokens: 40999321600 | elapsed time per iteration (s): 0.37 | learning rate: 6.284E-05 | global batch size: 256 | lm loss: 3.286519E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.411 | TFLOPs: 32.32 | 7: iteration 78300/ 115203 | consumed samples: 20044800 | consumed tokens: 41051750400 | elapsed time per iteration (s): 0.37 | learning rate: 6.263E-05 | global batch size: 256 | lm loss: 3.285725E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.936 | TFLOPs: 32.25 | 7: iteration 78400/ 115203 | consumed samples: 20070400 | consumed tokens: 41104179200 | elapsed time per iteration (s): 0.37 | learning rate: 6.242E-05 | global batch size: 256 | lm loss: 3.288805E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.414 | TFLOPs: 32.23 | 7: iteration 78500/ 115203 | consumed samples: 20096000 | consumed tokens: 41156608000 | elapsed time per iteration (s): 0.37 | learning rate: 6.221E-05 | global batch size: 256 | lm loss: 3.283210E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.998 | TFLOPs: 32.30 | 7: iteration 78600/ 115203 | consumed samples: 20121600 | consumed tokens: 41209036800 | elapsed time per iteration (s): 0.37 | learning rate: 6.200E-05 | global batch size: 256 | lm loss: 3.287499E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 691.810 | TFLOPs: 32.29 | 7: iteration 78700/ 115203 | consumed samples: 20147200 | consumed tokens: 41261465600 | elapsed time per iteration (s): 0.37 | learning rate: 6.179E-05 | global batch size: 256 | lm loss: 3.282019E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.863 | TFLOPs: 32.29 | 7: iteration 78800/ 115203 | consumed samples: 20172800 | consumed tokens: 41313894400 | elapsed time per iteration (s): 0.37 | learning rate: 6.158E-05 | global batch size: 256 | lm loss: 3.284706E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.137 | TFLOPs: 32.26 | 7: iteration 78900/ 115203 | consumed samples: 20198400 | consumed tokens: 41366323200 | elapsed time per iteration (s): 0.37 | learning rate: 6.137E-05 | global batch size: 256 | lm loss: 3.286779E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.446 | TFLOPs: 32.27 | 7: iteration 79000/ 115203 | consumed samples: 20224000 | consumed tokens: 41418752000 | elapsed time per iteration (s): 0.37 | learning rate: 6.116E-05 | global batch size: 256 | lm loss: 3.282501E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.722 | TFLOPs: 32.33 | 7: iteration 79100/ 115203 | consumed samples: 20249600 | consumed tokens: 41471180800 | elapsed time per iteration (s): 0.37 | learning rate: 6.096E-05 | global batch size: 256 | lm loss: 3.285329E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.640 | TFLOPs: 32.28 | 7: iteration 79200/ 115203 | consumed samples: 20275200 | consumed tokens: 41523609600 | elapsed time per iteration (s): 0.37 | learning rate: 6.075E-05 | global batch size: 256 | lm loss: 3.287371E+00 | grad norm: 
0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.233 | TFLOPs: 32.26 | 7: iteration 79300/ 115203 | consumed samples: 20300800 | consumed tokens: 41576038400 | elapsed time per iteration (s): 0.37 | learning rate: 6.054E-05 | global batch size: 256 | lm loss: 3.284047E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.209 | TFLOPs: 32.26 | 7: iteration 79400/ 115203 | consumed samples: 20326400 | consumed tokens: 41628467200 | elapsed time per iteration (s): 0.37 | learning rate: 6.033E-05 | global batch size: 256 | lm loss: 3.287639E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.922 | TFLOPs: 32.30 | 7: iteration 79500/ 115203 | consumed samples: 20352000 | consumed tokens: 41680896000 | elapsed time per iteration (s): 0.37 | learning rate: 6.013E-05 | global batch size: 256 | lm loss: 3.280777E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.352 | TFLOPs: 32.32 | 7: iteration 79600/ 115203 | consumed samples: 20377600 | consumed tokens: 41733324800 | elapsed time per iteration (s): 0.37 | learning rate: 5.992E-05 | global batch size: 256 | lm loss: 3.284085E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.336 | TFLOPs: 32.32 | 7: iteration 79700/ 115203 | consumed samples: 20403200 | consumed tokens: 41785753600 | elapsed time per iteration (s): 0.37 | learning rate: 5.972E-05 | global batch size: 256 | lm loss: 3.284882E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.754 | TFLOPs: 32.29 | 7: iteration 79800/ 115203 | consumed samples: 20428800 | consumed tokens: 41838182400 | elapsed time per 
iteration (s): 0.37 | learning rate: 5.951E-05 | global batch size: 256 | lm loss: 3.286173E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.945 | TFLOPs: 32.25 |
7: iteration 79900/ 115203 | consumed samples: 20454400 | consumed tokens: 41890611200 | elapsed time per iteration (s): 0.37 | learning rate: 5.931E-05 | global batch size: 256 | lm loss: 3.282792E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.134 | TFLOPs: 32.31 |
0: [2023-03-17 07:41:33,770] [INFO] [logging.py:68:log_dist] [Rank 0] step=80000, skipped=0, lr=[5.910086097100006e-05, 5.910086097100006e-05, 5.910086097100006e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 80000/ 115203 | consumed samples: 20480000 | consumed tokens: 41943040000 | elapsed time per iteration (s): 0.37 | learning rate: 5.910E-05 | global batch size: 256 | lm loss: 3.284028E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.907 | TFLOPs: 32.30 |
0: steps: 80000 loss: 3.2755 iter time (s): 0.368 samples/sec: 695.141
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 80000 | lm loss value: 3.399223E+00 | lm loss PPL: 2.994083E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 80000 to checkpoints_146m60b400m
0: [2023-03-17 07:41:33,896] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step80000 is begin to save!
0: [2023-03-17 07:41:33,899] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/layer_01-model_00-model_states.pt...
0: [2023-03-17 07:41:33,989] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/layer_01-model_00-model_states.pt. 0: [2023-03-17 07:41:33,990] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/layer_03-model_00-model_states.pt... 0: [2023-03-17 07:41:34,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/layer_03-model_00-model_states.pt. 0: [2023-03-17 07:41:34,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/layer_04-model_00-model_states.pt... 0: [2023-03-17 07:41:34,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/layer_04-model_00-model_states.pt. 0: [2023-03-17 07:41:34,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/layer_05-model_00-model_states.pt... 0: [2023-03-17 07:41:34,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/layer_05-model_00-model_states.pt. 0: [2023-03-17 07:41:34,036] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/layer_06-model_00-model_states.pt... 0: [2023-03-17 07:41:34,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/layer_06-model_00-model_states.pt. 0: [2023-03-17 07:41:34,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/layer_07-model_00-model_states.pt... 0: [2023-03-17 07:41:34,066] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/layer_07-model_00-model_states.pt. 0: [2023-03-17 07:41:34,067] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 07:41:34,082] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/layer_08-model_00-model_states.pt. 0: [2023-03-17 07:41:34,082] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/layer_09-model_00-model_states.pt... 0: [2023-03-17 07:41:34,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/layer_09-model_00-model_states.pt. 0: [2023-03-17 07:41:34,098] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/layer_10-model_00-model_states.pt... 0: [2023-03-17 07:41:34,113] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/layer_10-model_00-model_states.pt. 0: [2023-03-17 07:41:34,113] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/layer_11-model_00-model_states.pt... 0: [2023-03-17 07:41:34,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/layer_11-model_00-model_states.pt. 0: [2023-03-17 07:41:34,128] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/layer_12-model_00-model_states.pt... 0: [2023-03-17 07:41:34,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/layer_12-model_00-model_states.pt. 0: [2023-03-17 07:41:34,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/layer_13-model_00-model_states.pt... 0: [2023-03-17 07:41:34,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/layer_13-model_00-model_states.pt. 0: [2023-03-17 07:41:34,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/layer_14-model_00-model_states.pt... 
0: [2023-03-17 07:41:34,174] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/layer_14-model_00-model_states.pt. 0: [2023-03-17 07:41:34,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/layer_15-model_00-model_states.pt... 0: [2023-03-17 07:41:34,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/layer_15-model_00-model_states.pt. 0: [2023-03-17 07:41:34,190] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/layer_16-model_00-model_states.pt... 0: [2023-03-17 07:41:34,205] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/layer_16-model_00-model_states.pt. 0: [2023-03-17 07:41:34,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/layer_17-model_00-model_states.pt... 0: [2023-03-17 07:41:34,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/layer_17-model_00-model_states.pt. 0: [2023-03-17 07:41:34,220] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/layer_19-model_00-model_states.pt... 0: [2023-03-17 07:41:34,221] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/layer_19-model_00-model_states.pt. 0: [2023-03-17 07:41:34,222] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_146m60b400m/global_step80000/mp_rank_00_model_states.pt 0: [2023-03-17 07:41:34,222] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/mp_rank_00_model_states.pt... 0: [2023-03-17 07:41:34,226] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/mp_rank_00_model_states.pt. 
0: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 2: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
2: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 4: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 6: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
6: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 1: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
5: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 3: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 0: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
2: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 4: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 6: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 1: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 5: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 3: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 0: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 1: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 0: [2023-03-17 07:41:34,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 07:41:34,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-03-17 07:41:34,280] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 07:41:34,280] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 07:41:34,280] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 07:41:34,280] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2023-03-17 07:41:34,280] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 07:41:34,280] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2023-03-17 07:41:34,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 07:41:34,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 07:41:34,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 07:41:34,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2023-03-17 07:41:34,287] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 07:41:34,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2023-03-17 07:41:34,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-17 07:41:34,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 07:41:34,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 07:41:34,287] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 07:41:34,287] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 07:41:34,287] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 07:41:34,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2023-03-17 07:41:34,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2023-03-17 07:41:34,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2023-03-17 07:41:34,301] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 07:41:34,301] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 07:41:34,301] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 07:41:34,301] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 07:41:34,301] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 07:41:34,301] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 07:41:34,301] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2023-03-17 07:41:34,301] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 07:41:34,301] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2023-03-17 07:41:34,301] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2023-03-17 07:41:34,301] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 07:41:34,301] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2023-03-17 07:41:34,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 07:41:34,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 07:41:34,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2023-03-17 07:41:34,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 07:41:34,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 07:41:34,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 07:41:34,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 07:41:34,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 07:41:34,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 07:41:34,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 07:41:34,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 07:41:34,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 07:41:34,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 07:41:34,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 07:41:34,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 07:41:34,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 07:41:34,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2023-03-17 07:41:34,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2023-03-17 07:41:34,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2023-03-17 07:41:34,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2023-03-17 07:41:34,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2023-03-17 07:41:34,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2023-03-17 07:41:34,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2023-03-17 07:41:34,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2023-03-17 07:41:34,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 07:41:34,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 07:41:34,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 07:41:34,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 07:41:34,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 07:41:34,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 07:41:34,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2023-03-17 07:41:34,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2023-03-17 07:41:34,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2023-03-17 07:41:34,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 07:41:34,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 07:41:34,304] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2023-03-17 07:41:34,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2023-03-17 07:41:34,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 07:41:34,305] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2023-03-17 07:41:34,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 07:41:34,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 07:41:34,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 07:41:34,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 07:41:34,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 07:41:34,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2023-03-17 07:41:34,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 07:41:34,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 07:41:34,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 07:41:34,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 07:41:34,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 07:41:34,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 07:41:34,305] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2023-03-17 07:41:34,305] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2023-03-17 07:41:34,305] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2023-03-17 07:41:34,305] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2023-03-17 07:41:34,305] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2023-03-17 07:41:34,305] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
1: [2023-03-17 07:41:34,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 07:41:34,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 07:41:34,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 07:41:34,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 3: [2023-03-17 07:41:34,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 07:41:34,308] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 07:41:34,308] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2023-03-17 07:41:34,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 07:41:34,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 07:41:34,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 07:41:34,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 07:41:34,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 07:41:34,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 07:41:34,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 07:41:34,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 07:41:34,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 07:41:34,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2023-03-17 07:41:34,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 07:41:34,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 07:41:34,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2023-03-17 07:41:34,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
1: [2023-03-17 07:41:34,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 07:41:34,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2023-03-17 07:41:34,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2023-03-17 07:41:34,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2023-03-17 07:41:34,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2023-03-17 07:41:34,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2023-03-17 07:41:34,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 07:41:34,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 07:41:34,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 07:41:34,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 07:41:34,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 07:41:34,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 07:41:34,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 07:41:34,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 07:41:34,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 07:41:34,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 07:41:34,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2023-03-17 07:41:34,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 07:41:34,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 07:41:34,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 07:41:34,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 07:41:34,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 07:41:34,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 07:41:34,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2023-03-17 07:41:34,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2023-03-17 07:41:34,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2023-03-17 07:41:34,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2023-03-17 07:41:34,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2023-03-17 07:41:34,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2023-03-17 07:41:34,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2023-03-17 07:41:34,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 07:41:34,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2023-03-17 07:41:34,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 07:41:34,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 07:41:34,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 07:41:34,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-17 07:41:34,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 07:41:34,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 07:41:34,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 07:41:34,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 07:41:34,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 07:41:34,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 07:41:34,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 07:41:34,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 07:41:34,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 07:41:34,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 07:41:34,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 07:41:34,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 07:41:34,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2023-03-17 07:41:34,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2023-03-17 07:41:34,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2023-03-17 07:41:34,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2023-03-17 07:41:34,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2023-03-17 07:41:34,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2023-03-17 07:41:34,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2023-03-17 07:41:34,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2023-03-17 07:41:34,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 07:41:34,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 07:41:34,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 07:41:34,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-03-17 07:41:34,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 07:41:34,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 07:41:34,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 07:41:34,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 07:41:34,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 07:41:34,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 07:41:34,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 07:41:34,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 07:41:34,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 07:41:34,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 07:41:34,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 07:41:34,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step80000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 07:41:34,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2023-03-17 07:41:34,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2023-03-17 07:41:34,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2023-03-17 07:41:34,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2023-03-17 07:41:34,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2023-03-17 07:41:34,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2023-03-17 07:41:34,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2023-03-17 07:41:34,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
0: successfully saved checkpoint at iteration 80000 to checkpoints_146m60b400m 7: time (ms) | save-checkpoint: 433.27 7: iteration 80100/ 115203 | consumed samples: 20505600 | consumed tokens: 41995468800 | elapsed time per iteration (s): 0.38 | learning rate: 5.890E-05 | global batch size: 256 | lm loss: 3.281224E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.812 | TFLOPs: 31.82 | 7: iteration 80200/ 115203 | consumed samples: 20531200 | consumed tokens: 42047897600 | elapsed time per iteration (s): 0.37 | learning rate: 5.869E-05 | global batch size: 256 | lm loss: 3.285089E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.111 | TFLOPs: 32.17 | 7: iteration 80300/ 115203 | consumed samples: 20556800 | consumed tokens: 42100326400 | elapsed time per iteration (s): 0.37 | learning rate: 5.849E-05 | global batch size: 256 | lm loss: 3.284117E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.162 | TFLOPs: 32.31 | 7: iteration 80400/ 115203 | consumed samples: 20582400 | consumed tokens: 42152755200 | elapsed time per iteration (s): 0.37 | learning rate: 5.829E-05 | global batch size: 256 | lm loss: 3.279944E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.592 | TFLOPs: 32.23 | 7: iteration 80500/ 115203 | consumed samples: 20608000 | consumed tokens: 42205184000 | elapsed time per iteration (s): 0.38 | learning rate: 5.808E-05 | global batch size: 256 | lm loss: 3.283707E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.417 | TFLOPs: 31.81 | 7: iteration 80600/ 115203 | consumed samples: 20633600 | consumed tokens: 42257612800 | elapsed time per iteration (s): 0.37 | 
learning rate: 5.788E-05 | global batch size: 256 | lm loss: 3.281590E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.576 | TFLOPs: 32.09 | 7: iteration 80700/ 115203 | consumed samples: 20659200 | consumed tokens: 42310041600 | elapsed time per iteration (s): 0.37 | learning rate: 5.768E-05 | global batch size: 256 | lm loss: 3.285792E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.781 | TFLOPs: 32.15 | 7: iteration 80800/ 115203 | consumed samples: 20684800 | consumed tokens: 42362470400 | elapsed time per iteration (s): 0.37 | learning rate: 5.748E-05 | global batch size: 256 | lm loss: 3.281270E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.833 | TFLOPs: 31.92 | 7: iteration 80900/ 115203 | consumed samples: 20710400 | consumed tokens: 42414899200 | elapsed time per iteration (s): 0.37 | learning rate: 5.728E-05 | global batch size: 256 | lm loss: 3.284761E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.354 | TFLOPs: 32.18 | 7: iteration 81000/ 115203 | consumed samples: 20736000 | consumed tokens: 42467328000 | elapsed time per iteration (s): 0.37 | learning rate: 5.708E-05 | global batch size: 256 | lm loss: 3.283272E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.324 | TFLOPs: 32.08 | 7: iteration 81100/ 115203 | consumed samples: 20761600 | consumed tokens: 42519756800 | elapsed time per iteration (s): 0.37 | learning rate: 5.688E-05 | global batch size: 256 | lm loss: 3.282517E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.858 | TFLOPs: 32.29 | 7: iteration 81200/ 115203 
| consumed samples: 20787200 | consumed tokens: 42572185600 | elapsed time per iteration (s): 0.37 | learning rate: 5.668E-05 | global batch size: 256 | lm loss: 3.277809E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.249 | TFLOPs: 32.26 | 7: iteration 81300/ 115203 | consumed samples: 20812800 | consumed tokens: 42624614400 | elapsed time per iteration (s): 0.37 | learning rate: 5.648E-05 | global batch size: 256 | lm loss: 3.280147E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.718 | TFLOPs: 32.19 | 7: iteration 81400/ 115203 | consumed samples: 20838400 | consumed tokens: 42677043200 | elapsed time per iteration (s): 0.37 | learning rate: 5.628E-05 | global batch size: 256 | lm loss: 3.278325E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.245 | TFLOPs: 32.26 | 7: iteration 81500/ 115203 | consumed samples: 20864000 | consumed tokens: 42729472000 | elapsed time per iteration (s): 0.37 | learning rate: 5.608E-05 | global batch size: 256 | lm loss: 3.281913E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.610 | TFLOPs: 32.19 | 7: iteration 81600/ 115203 | consumed samples: 20889600 | consumed tokens: 42781900800 | elapsed time per iteration (s): 0.37 | learning rate: 5.588E-05 | global batch size: 256 | lm loss: 3.278883E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.996 | TFLOPs: 32.25 | 7: iteration 81700/ 115203 | consumed samples: 20915200 | consumed tokens: 42834329600 | elapsed time per iteration (s): 0.37 | learning rate: 5.568E-05 | global batch size: 256 | lm loss: 3.283543E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 692.124 | TFLOPs: 32.31 | 7: iteration 81800/ 115203 | consumed samples: 20940800 | consumed tokens: 42886758400 | elapsed time per iteration (s): 0.37 | learning rate: 5.548E-05 | global batch size: 256 | lm loss: 3.279847E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.586 | TFLOPs: 32.14 | 7: iteration 81900/ 115203 | consumed samples: 20966400 | consumed tokens: 42939187200 | elapsed time per iteration (s): 0.37 | learning rate: 5.529E-05 | global batch size: 256 | lm loss: 3.280660E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.160 | TFLOPs: 32.31 | 0: [2023-03-17 07:53:56,860] [INFO] [logging.py:68:log_dist] [Rank 0] step=82000, skipped=0, lr=[5.5091074271143155e-05, 5.5091074271143155e-05, 5.5091074271143155e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 82000/ 115203 | consumed samples: 20992000 | consumed tokens: 42991616000 | elapsed time per iteration (s): 0.37 | learning rate: 5.509E-05 | global batch size: 256 | lm loss: 3.282192E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.084 | TFLOPs: 32.26 | 0: steps: 82000 loss: 3.2892 iter time (s): 0.370 samples/sec: 692.506 7: iteration 82100/ 115203 | consumed samples: 21017600 | consumed tokens: 43044044800 | elapsed time per iteration (s): 0.37 | learning rate: 5.489E-05 | global batch size: 256 | lm loss: 3.281803E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.200 | TFLOPs: 32.26 | 7: iteration 82200/ 115203 | consumed samples: 21043200 | consumed tokens: 43096473600 | elapsed time per iteration (s): 0.37 | learning rate: 5.470E-05 | global batch size: 256 | lm loss: 3.278499E+00 | grad norm: 0.205 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.528 | TFLOPs: 32.32 | 7: iteration 82300/ 115203 | consumed samples: 21068800 | consumed tokens: 43148902400 | elapsed time per iteration (s): 0.38 | learning rate: 5.450E-05 | global batch size: 256 | lm loss: 3.282873E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.538 | TFLOPs: 31.81 | 7: iteration 82400/ 115203 | consumed samples: 21094400 | consumed tokens: 43201331200 | elapsed time per iteration (s): 0.37 | learning rate: 5.431E-05 | global batch size: 256 | lm loss: 3.282068E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.881 | TFLOPs: 32.29 | 7: iteration 82500/ 115203 | consumed samples: 21120000 | consumed tokens: 43253760000 | elapsed time per iteration (s): 0.37 | learning rate: 5.411E-05 | global batch size: 256 | lm loss: 3.274529E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.179 | TFLOPs: 32.31 | 7: iteration 82600/ 115203 | consumed samples: 21145600 | consumed tokens: 43306188800 | elapsed time per iteration (s): 0.37 | learning rate: 5.392E-05 | global batch size: 256 | lm loss: 3.283621E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.628 | TFLOPs: 32.33 | 7: iteration 82700/ 115203 | consumed samples: 21171200 | consumed tokens: 43358617600 | elapsed time per iteration (s): 0.37 | learning rate: 5.373E-05 | global batch size: 256 | lm loss: 3.284028E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.079 | TFLOPs: 32.30 | 7: iteration 82800/ 115203 | consumed samples: 21196800 | consumed tokens: 43411046400 | elapsed time per iteration (s): 0.37 | 
learning rate: 5.353E-05 | global batch size: 256 | lm loss: 3.276927E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.594 | TFLOPs: 32.33 | 7: iteration 82900/ 115203 | consumed samples: 21222400 | consumed tokens: 43463475200 | elapsed time per iteration (s): 0.37 | learning rate: 5.334E-05 | global batch size: 256 | lm loss: 3.279845E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.765 | TFLOPs: 32.34 | 7: iteration 83000/ 115203 | consumed samples: 21248000 | consumed tokens: 43515904000 | elapsed time per iteration (s): 0.37 | learning rate: 5.315E-05 | global batch size: 256 | lm loss: 3.286049E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.498 | TFLOPs: 32.32 | 7: iteration 83100/ 115203 | consumed samples: 21273600 | consumed tokens: 43568332800 | elapsed time per iteration (s): 0.37 | learning rate: 5.296E-05 | global batch size: 256 | lm loss: 3.281243E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.800 | TFLOPs: 32.34 | 7: iteration 83200/ 115203 | consumed samples: 21299200 | consumed tokens: 43620761600 | elapsed time per iteration (s): 0.37 | learning rate: 5.276E-05 | global batch size: 256 | lm loss: 3.282958E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.536 | TFLOPs: 32.28 | 7: iteration 83300/ 115203 | consumed samples: 21324800 | consumed tokens: 43673190400 | elapsed time per iteration (s): 0.37 | learning rate: 5.257E-05 | global batch size: 256 | lm loss: 3.279920E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.757 | TFLOPs: 32.29 | 7: iteration 83400/ 115203 
| consumed samples: 21350400 | consumed tokens: 43725619200 | elapsed time per iteration (s): 0.37 | learning rate: 5.238E-05 | global batch size: 256 | lm loss: 3.280926E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.151 | TFLOPs: 32.26 | 7: iteration 83500/ 115203 | consumed samples: 21376000 | consumed tokens: 43778048000 | elapsed time per iteration (s): 0.37 | learning rate: 5.219E-05 | global batch size: 256 | lm loss: 3.277865E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.661 | TFLOPs: 32.05 | 7: iteration 83600/ 115203 | consumed samples: 21401600 | consumed tokens: 43830476800 | elapsed time per iteration (s): 0.37 | learning rate: 5.200E-05 | global batch size: 256 | lm loss: 3.279561E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.537 | TFLOPs: 32.19 | 7: iteration 83700/ 115203 | consumed samples: 21427200 | consumed tokens: 43882905600 | elapsed time per iteration (s): 0.37 | learning rate: 5.181E-05 | global batch size: 256 | lm loss: 3.277372E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.815 | TFLOPs: 32.06 | 7: iteration 83800/ 115203 | consumed samples: 21452800 | consumed tokens: 43935334400 | elapsed time per iteration (s): 0.37 | learning rate: 5.162E-05 | global batch size: 256 | lm loss: 3.281695E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.356 | TFLOPs: 32.08 | 7: iteration 83900/ 115203 | consumed samples: 21478400 | consumed tokens: 43987763200 | elapsed time per iteration (s): 0.37 | learning rate: 5.144E-05 | global batch size: 256 | lm loss: 3.275649E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 690.814 | TFLOPs: 32.24 | 0: [2023-03-17 08:06:18,224] [INFO] [logging.py:68:log_dist] [Rank 0] step=84000, skipped=0, lr=[5.124789271253415e-05, 5.124789271253415e-05, 5.124789271253415e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 84000/ 115203 | consumed samples: 21504000 | consumed tokens: 44040192000 | elapsed time per iteration (s): 0.37 | learning rate: 5.125E-05 | global batch size: 256 | lm loss: 3.275042E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.290 | TFLOPs: 32.31 | 0: steps: 84000 loss: 3.2543 iter time (s): 0.369 samples/sec: 694.474 7: iteration 84100/ 115203 | consumed samples: 21529600 | consumed tokens: 44092620800 | elapsed time per iteration (s): 0.37 | learning rate: 5.106E-05 | global batch size: 256 | lm loss: 3.274781E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.357 | TFLOPs: 32.27 | 7: iteration 84200/ 115203 | consumed samples: 21555200 | consumed tokens: 44145049600 | elapsed time per iteration (s): 0.37 | learning rate: 5.087E-05 | global batch size: 256 | lm loss: 3.275647E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.588 | TFLOPs: 32.33 | 7: iteration 84300/ 115203 | consumed samples: 21580800 | consumed tokens: 44197478400 | elapsed time per iteration (s): 0.37 | learning rate: 5.069E-05 | global batch size: 256 | lm loss: 3.281107E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.569 | TFLOPs: 32.33 | 7: iteration 84400/ 115203 | consumed samples: 21606400 | consumed tokens: 44249907200 | elapsed time per iteration (s): 0.37 | learning rate: 5.050E-05 | global batch size: 256 | lm loss: 3.278146E+00 | grad norm: 0.204 | num zeros: 0.0 
| number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.903 | TFLOPs: 32.34 | 7: iteration 84500/ 115203 | consumed samples: 21632000 | consumed tokens: 44302336000 | elapsed time per iteration (s): 0.37 | learning rate: 5.031E-05 | global batch size: 256 | lm loss: 3.275379E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.495 | TFLOPs: 32.37 | 7: iteration 84600/ 115203 | consumed samples: 21657600 | consumed tokens: 44354764800 | elapsed time per iteration (s): 0.37 | learning rate: 5.013E-05 | global batch size: 256 | lm loss: 3.283363E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.609 | TFLOPs: 32.38 | 7: iteration 84700/ 115203 | consumed samples: 21683200 | consumed tokens: 44407193600 | elapsed time per iteration (s): 0.37 | learning rate: 4.994E-05 | global batch size: 256 | lm loss: 3.277858E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.505 | TFLOPs: 32.37 | 7: iteration 84800/ 115203 | consumed samples: 21708800 | consumed tokens: 44459622400 | elapsed time per iteration (s): 0.37 | learning rate: 4.976E-05 | global batch size: 256 | lm loss: 3.278573E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.118 | TFLOPs: 32.31 | 7: iteration 84900/ 115203 | consumed samples: 21734400 | consumed tokens: 44512051200 | elapsed time per iteration (s): 0.37 | learning rate: 4.958E-05 | global batch size: 256 | lm loss: 3.277742E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.371 | TFLOPs: 32.36 | 7: iteration 85000/ 115203 | consumed samples: 21760000 | consumed tokens: 44564480000 | elapsed time per iteration (s): 0.37 | learning 
rate: 4.939E-05 | global batch size: 256 | lm loss: 3.276126E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.500 | TFLOPs: 32.37 | 7: iteration 85100/ 115203 | consumed samples: 21785600 | consumed tokens: 44616908800 | elapsed time per iteration (s): 0.37 | learning rate: 4.921E-05 | global batch size: 256 | lm loss: 3.275939E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.348 | TFLOPs: 32.36 | 7: iteration 85200/ 115203 | consumed samples: 21811200 | consumed tokens: 44669337600 | elapsed time per iteration (s): 0.37 | learning rate: 4.903E-05 | global batch size: 256 | lm loss: 3.278838E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.986 | TFLOPs: 32.35 | 7: iteration 85300/ 115203 | consumed samples: 21836800 | consumed tokens: 44721766400 | elapsed time per iteration (s): 0.37 | learning rate: 4.884E-05 | global batch size: 256 | lm loss: 3.273960E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.192 | TFLOPs: 32.36 | 7: iteration 85400/ 115203 | consumed samples: 21862400 | consumed tokens: 44774195200 | elapsed time per iteration (s): 0.37 | learning rate: 4.866E-05 | global batch size: 256 | lm loss: 3.273882E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.457 | TFLOPs: 32.37 | 7: iteration 85500/ 115203 | consumed samples: 21888000 | consumed tokens: 44826624000 | elapsed time per iteration (s): 0.37 | learning rate: 4.848E-05 | global batch size: 256 | lm loss: 3.281582E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.730 | TFLOPs: 32.38 | 7: iteration 85600/ 115203 | 
consumed samples: 21913600 | consumed tokens: 44879052800 | elapsed time per iteration (s): 0.37 | learning rate: 4.830E-05 | global batch size: 256 | lm loss: 3.277771E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.702 | TFLOPs: 32.38 | 7: iteration 85700/ 115203 | consumed samples: 21939200 | consumed tokens: 44931481600 | elapsed time per iteration (s): 0.37 | learning rate: 4.812E-05 | global batch size: 256 | lm loss: 3.277065E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.654 | TFLOPs: 32.38 | 7: iteration 85800/ 115203 | consumed samples: 21964800 | consumed tokens: 44983910400 | elapsed time per iteration (s): 0.37 | learning rate: 4.794E-05 | global batch size: 256 | lm loss: 3.280822E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.683 | TFLOPs: 32.38 | 7: iteration 85900/ 115203 | consumed samples: 21990400 | consumed tokens: 45036339200 | elapsed time per iteration (s): 0.37 | learning rate: 4.776E-05 | global batch size: 256 | lm loss: 3.276694E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.702 | TFLOPs: 32.38 | 0: [2023-03-17 08:18:36,820] [INFO] [logging.py:68:log_dist] [Rank 0] step=86000, skipped=0, lr=[4.7582977310170454e-05, 4.7582977310170454e-05, 4.7582977310170454e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 86000/ 115203 | consumed samples: 22016000 | consumed tokens: 45088768000 | elapsed time per iteration (s): 0.37 | learning rate: 4.758E-05 | global batch size: 256 | lm loss: 3.275878E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.705 | TFLOPs: 32.38 | 0: steps: 86000 loss: 3.2544 iter time (s): 0.367 
samples/sec: 697.142 7: iteration 86100/ 115203 | consumed samples: 22041600 | consumed tokens: 45141196800 | elapsed time per iteration (s): 0.37 | learning rate: 4.740E-05 | global batch size: 256 | lm loss: 3.272046E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.262 | TFLOPs: 32.36 | 7: iteration 86200/ 115203 | consumed samples: 22067200 | consumed tokens: 45193625600 | elapsed time per iteration (s): 0.37 | learning rate: 4.723E-05 | global batch size: 256 | lm loss: 3.281470E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.662 | TFLOPs: 32.38 | 7: iteration 86300/ 115203 | consumed samples: 22092800 | consumed tokens: 45246054400 | elapsed time per iteration (s): 0.37 | learning rate: 4.705E-05 | global batch size: 256 | lm loss: 3.273960E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.811 | TFLOPs: 32.38 | 7: iteration 86400/ 115203 | consumed samples: 22118400 | consumed tokens: 45298483200 | elapsed time per iteration (s): 0.37 | learning rate: 4.687E-05 | global batch size: 256 | lm loss: 3.271680E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.950 | TFLOPs: 32.34 | 7: iteration 86500/ 115203 | consumed samples: 22144000 | consumed tokens: 45350912000 | elapsed time per iteration (s): 0.37 | learning rate: 4.670E-05 | global batch size: 256 | lm loss: 3.271402E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.637 | TFLOPs: 32.33 | 7: iteration 86600/ 115203 | consumed samples: 22169600 | consumed tokens: 45403340800 | elapsed time per iteration (s): 0.37 | learning rate: 4.652E-05 | global batch size: 256 | lm loss: 3.273362E+00 | grad norm: 0.203 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.818 | TFLOPs: 32.38 | 7: iteration 86700/ 115203 | consumed samples: 22195200 | consumed tokens: 45455769600 | elapsed time per iteration (s): 0.37 | learning rate: 4.634E-05 | global batch size: 256 | lm loss: 3.275040E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.654 | TFLOPs: 32.38 | 7: iteration 86800/ 115203 | consumed samples: 22220800 | consumed tokens: 45508198400 | elapsed time per iteration (s): 0.37 | learning rate: 4.617E-05 | global batch size: 256 | lm loss: 3.277583E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.776 | TFLOPs: 32.38 | 7: iteration 86900/ 115203 | consumed samples: 22246400 | consumed tokens: 45560627200 | elapsed time per iteration (s): 0.37 | learning rate: 4.599E-05 | global batch size: 256 | lm loss: 3.273469E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.525 | TFLOPs: 32.37 | 7: iteration 87000/ 115203 | consumed samples: 22272000 | consumed tokens: 45613056000 | elapsed time per iteration (s): 0.37 | learning rate: 4.582E-05 | global batch size: 256 | lm loss: 3.273796E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.720 | TFLOPs: 32.38 | 7: iteration 87100/ 115203 | consumed samples: 22297600 | consumed tokens: 45665484800 | elapsed time per iteration (s): 0.37 | learning rate: 4.565E-05 | global batch size: 256 | lm loss: 3.277024E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.160 | TFLOPs: 32.31 | 7: iteration 87200/ 115203 | consumed samples: 22323200 | consumed tokens: 45717913600 | elapsed time per iteration (s): 0.37 
| learning rate: 4.547E-05 | global batch size: 256 | lm loss: 3.273201E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.942 | TFLOPs: 32.30 | 7: iteration 87300/ 115203 | consumed samples: 22348800 | consumed tokens: 45770342400 | elapsed time per iteration (s): 0.37 | learning rate: 4.530E-05 | global batch size: 256 | lm loss: 3.277118E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.628 | TFLOPs: 32.33 | 7: iteration 87400/ 115203 | consumed samples: 22374400 | consumed tokens: 45822771200 | elapsed time per iteration (s): 0.37 | learning rate: 4.513E-05 | global batch size: 256 | lm loss: 3.273674E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.125 | TFLOPs: 32.31 | 7: iteration 87500/ 115203 | consumed samples: 22400000 | consumed tokens: 45875200000 | elapsed time per iteration (s): 0.37 | learning rate: 4.496E-05 | global batch size: 256 | lm loss: 3.272727E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.272 | TFLOPs: 32.31 | 7: iteration 87600/ 115203 | consumed samples: 22425600 | consumed tokens: 45927628800 | elapsed time per iteration (s): 0.37 | learning rate: 4.479E-05 | global batch size: 256 | lm loss: 3.275771E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.779 | TFLOPs: 32.29 | 7: iteration 87700/ 115203 | consumed samples: 22451200 | consumed tokens: 45980057600 | elapsed time per iteration (s): 0.37 | learning rate: 4.462E-05 | global batch size: 256 | lm loss: 3.277753E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.935 | TFLOPs: 32.25 | 7: iteration 87800/ 
115203 | consumed samples: 22476800 | consumed tokens: 46032486400 | elapsed time per iteration (s): 0.37 | learning rate: 4.445E-05 | global batch size: 256 | lm loss: 3.275055E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.138 | TFLOPs: 32.31 | 7: iteration 87900/ 115203 | consumed samples: 22502400 | consumed tokens: 46084915200 | elapsed time per iteration (s): 0.37 | learning rate: 4.428E-05 | global batch size: 256 | lm loss: 3.278275E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.125 | TFLOPs: 32.26 | 0: [2023-03-17 08:30:55,925] [INFO] [logging.py:68:log_dist] [Rank 0] step=88000, skipped=0, lr=[4.410744818232367e-05, 4.410744818232367e-05, 4.410744818232367e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 88000/ 115203 | consumed samples: 22528000 | consumed tokens: 46137344000 | elapsed time per iteration (s): 0.37 | learning rate: 4.411E-05 | global batch size: 256 | lm loss: 3.269291E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.708 | TFLOPs: 32.33 | 0: steps: 88000 loss: 3.2950 iter time (s): 0.367 samples/sec: 696.783 7: iteration 88100/ 115203 | consumed samples: 22553600 | consumed tokens: 46189772800 | elapsed time per iteration (s): 0.37 | learning rate: 4.394E-05 | global batch size: 256 | lm loss: 3.278976E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.299 | TFLOPs: 32.31 | 7: iteration 88200/ 115203 | consumed samples: 22579200 | consumed tokens: 46242201600 | elapsed time per iteration (s): 0.37 | learning rate: 4.377E-05 | global batch size: 256 | lm loss: 3.277333E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.316 | 
TFLOPs: 32.31 | 7: iteration 88300/ 115203 | consumed samples: 22604800 | consumed tokens: 46294630400 | elapsed time per iteration (s): 0.37 | learning rate: 4.360E-05 | global batch size: 256 | lm loss: 3.272373E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.167 | TFLOPs: 32.31 | 7: iteration 88400/ 115203 | consumed samples: 22630400 | consumed tokens: 46347059200 | elapsed time per iteration (s): 0.37 | learning rate: 4.344E-05 | global batch size: 256 | lm loss: 3.278094E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.005 | TFLOPs: 32.30 | 7: iteration 88500/ 115203 | consumed samples: 22656000 | consumed tokens: 46399488000 | elapsed time per iteration (s): 0.37 | learning rate: 4.327E-05 | global batch size: 256 | lm loss: 3.273058E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.408 | TFLOPs: 32.32 | 7: iteration 88600/ 115203 | consumed samples: 22681600 | consumed tokens: 46451916800 | elapsed time per iteration (s): 0.37 | learning rate: 4.310E-05 | global batch size: 256 | lm loss: 3.276505E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.288 | TFLOPs: 32.31 | 7: iteration 88700/ 115203 | consumed samples: 22707200 | consumed tokens: 46504345600 | elapsed time per iteration (s): 0.37 | learning rate: 4.294E-05 | global batch size: 256 | lm loss: 3.276615E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.319 | TFLOPs: 32.31 | 7: iteration 88800/ 115203 | consumed samples: 22732800 | consumed tokens: 46556774400 | elapsed time per iteration (s): 0.37 | learning rate: 4.277E-05 | global batch size: 256 | lm loss: 3.271468E+00 | grad norm: 0.195 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.195 | TFLOPs: 32.31 | 7: iteration 88900/ 115203 | consumed samples: 22758400 | consumed tokens: 46609203200 | elapsed time per iteration (s): 0.37 | learning rate: 4.261E-05 | global batch size: 256 | lm loss: 3.272907E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.575 | TFLOPs: 32.28 | 7: iteration 89000/ 115203 | consumed samples: 22784000 | consumed tokens: 46661632000 | elapsed time per iteration (s): 0.37 | learning rate: 4.244E-05 | global batch size: 256 | lm loss: 3.273910E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.147 | TFLOPs: 32.31 | 7: iteration 89100/ 115203 | consumed samples: 22809600 | consumed tokens: 46714060800 | elapsed time per iteration (s): 0.37 | learning rate: 4.228E-05 | global batch size: 256 | lm loss: 3.275222E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.009 | TFLOPs: 32.30 | 7: iteration 89200/ 115203 | consumed samples: 22835200 | consumed tokens: 46766489600 | elapsed time per iteration (s): 0.37 | learning rate: 4.212E-05 | global batch size: 256 | lm loss: 3.272130E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.579 | TFLOPs: 32.33 | 7: iteration 89300/ 115203 | consumed samples: 22860800 | consumed tokens: 46818918400 | elapsed time per iteration (s): 0.37 | learning rate: 4.195E-05 | global batch size: 256 | lm loss: 3.275504E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.830 | TFLOPs: 32.34 | 7: iteration 89400/ 115203 | consumed samples: 22886400 | consumed tokens: 46871347200 | elapsed time per iteration (s): 0.37 | 
learning rate: 4.179E-05 | global batch size: 256 | lm loss: 3.276441E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.863 | TFLOPs: 32.34 | 7: iteration 89500/ 115203 | consumed samples: 22912000 | consumed tokens: 46923776000 | elapsed time per iteration (s): 0.37 | learning rate: 4.163E-05 | global batch size: 256 | lm loss: 3.269824E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.286 | TFLOPs: 32.36 | 7: iteration 89600/ 115203 | consumed samples: 22937600 | consumed tokens: 46976204800 | elapsed time per iteration (s): 0.37 | learning rate: 4.147E-05 | global batch size: 256 | lm loss: 3.277184E+00 | grad norm: 0.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.990 | TFLOPs: 32.35 | 7: iteration 89700/ 115203 | consumed samples: 22963200 | consumed tokens: 47028633600 | elapsed time per iteration (s): 0.37 | learning rate: 4.131E-05 | global batch size: 256 | lm loss: 3.270610E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.122 | TFLOPs: 32.35 | 7: iteration 89800/ 115203 | consumed samples: 22988800 | consumed tokens: 47081062400 | elapsed time per iteration (s): 0.37 | learning rate: 4.115E-05 | global batch size: 256 | lm loss: 3.273926E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.155 | TFLOPs: 32.31 | 7: iteration 89900/ 115203 | consumed samples: 23014400 | consumed tokens: 47133491200 | elapsed time per iteration (s): 0.37 | learning rate: 4.099E-05 | global batch size: 256 | lm loss: 3.269955E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.377 | TFLOPs: 32.32 | 0: [2023-03-17 
08:43:15,387] [INFO] [logging.py:68:log_dist] [Rank 0] step=90000, skipped=0, lr=[4.083185080977982e-05, 4.083185080977982e-05, 4.083185080977982e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 90000/ 115203 | consumed samples: 23040000 | consumed tokens: 47185920000 | elapsed time per iteration (s): 0.37 | learning rate: 4.083E-05 | global batch size: 256 | lm loss: 3.272336E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.990 | TFLOPs: 32.30 | 0: steps: 90000 loss: 3.2650 iter time (s): 0.368 samples/sec: 696.353 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 90000 | lm loss value: 3.334121E+00 | lm loss PPL: 2.805373E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 90000 to checkpoints_146m60b400m 0: [2023-03-17 08:43:15,518] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step90000 is begin to save! 0: [2023-03-17 08:43:15,529] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/layer_01-model_00-model_states.pt... 0: [2023-03-17 08:43:15,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/layer_01-model_00-model_states.pt. 0: [2023-03-17 08:43:15,629] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/layer_03-model_00-model_states.pt... 0: [2023-03-17 08:43:15,645] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/layer_03-model_00-model_states.pt. 0: [2023-03-17 08:43:15,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/layer_04-model_00-model_states.pt... 
0: [2023-03-17 08:43:15,660] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/layer_04-model_00-model_states.pt. 0: [2023-03-17 08:43:15,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/layer_05-model_00-model_states.pt... 0: [2023-03-17 08:43:15,675] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/layer_05-model_00-model_states.pt. 0: [2023-03-17 08:43:15,676] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/layer_06-model_00-model_states.pt... 0: [2023-03-17 08:43:15,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/layer_06-model_00-model_states.pt. 0: [2023-03-17 08:43:15,690] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/layer_07-model_00-model_states.pt... 0: [2023-03-17 08:43:15,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/layer_07-model_00-model_states.pt. 0: [2023-03-17 08:43:15,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/layer_08-model_00-model_states.pt... 0: [2023-03-17 08:43:15,720] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/layer_08-model_00-model_states.pt. 0: [2023-03-17 08:43:15,721] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/layer_09-model_00-model_states.pt... 0: [2023-03-17 08:43:15,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/layer_09-model_00-model_states.pt. 0: [2023-03-17 08:43:15,736] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/layer_10-model_00-model_states.pt... 
0: [2023-03-17 08:43:15,750] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/layer_10-model_00-model_states.pt. 0: [2023-03-17 08:43:15,751] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/layer_11-model_00-model_states.pt... 0: [2023-03-17 08:43:15,765] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/layer_11-model_00-model_states.pt. 0: [2023-03-17 08:43:15,766] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/layer_12-model_00-model_states.pt... 0: [2023-03-17 08:43:15,780] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/layer_12-model_00-model_states.pt. 0: [2023-03-17 08:43:15,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/layer_13-model_00-model_states.pt... 0: [2023-03-17 08:43:15,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/layer_13-model_00-model_states.pt. 0: [2023-03-17 08:43:15,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/layer_14-model_00-model_states.pt... 0: [2023-03-17 08:43:15,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/layer_14-model_00-model_states.pt. 0: [2023-03-17 08:43:15,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/layer_15-model_00-model_states.pt... 0: [2023-03-17 08:43:15,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/layer_15-model_00-model_states.pt. 0: [2023-03-17 08:43:15,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/layer_16-model_00-model_states.pt... 
0: [2023-03-17 08:43:15,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/layer_16-model_00-model_states.pt. 0: [2023-03-17 08:43:15,840] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/layer_17-model_00-model_states.pt... 0: [2023-03-17 08:43:15,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/layer_17-model_00-model_states.pt. 0: [2023-03-17 08:43:15,855] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/layer_19-model_00-model_states.pt... 0: [2023-03-17 08:43:15,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/layer_19-model_00-model_states.pt. 0: [2023-03-17 08:43:15,857] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_146m60b400m/global_step90000/mp_rank_00_model_states.pt 0: [2023-03-17 08:43:15,857] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/mp_rank_00_model_states.pt... 0: [2023-03-17 08:43:15,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/mp_rank_00_model_states.pt. 0: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
0: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 2: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
4: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 6: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 1: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
1: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 5: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 3: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
3: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 0: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 2: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 4: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
4: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 6: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 1: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
5: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 3: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 0: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 3: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 0: [2023-03-17 08:43:15,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 08:43:15,910] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-03-17 08:43:15,913] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 08:43:15,913] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 08:43:15,913] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2023-03-17 08:43:15,914] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 08:43:15,914] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 08:43:15,914] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2023-03-17 08:43:15,915] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 08:43:15,915] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 08:43:15,915] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2023-03-17 08:43:15,915] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 08:43:15,915] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 08:43:15,915] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2023-03-17 08:43:15,915] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 08:43:15,915] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-17 08:43:15,916] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 08:43:15,916] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 08:43:15,916] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2023-03-17 08:43:15,916] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2023-03-17 08:43:15,916] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 08:43:15,916] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 08:43:15,916] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2023-03-17 08:43:15,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 08:43:15,917] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 08:43:15,917] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2023-03-17 08:43:15,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2023-03-17 08:43:15,917] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 08:43:15,917] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2023-03-17 08:43:15,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 08:43:15,917] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 08:43:15,917] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2023-03-17 08:43:15,923] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 08:43:15,923] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 08:43:15,923] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2023-03-17 08:43:15,925] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 08:43:15,925] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 08:43:15,925] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 08:43:15,925] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
6: [2023-03-17 08:43:15,925] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 08:43:15,925] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2023-03-17 08:43:15,925] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 08:43:15,925] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 08:43:15,926] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2023-03-17 08:43:15,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 08:43:15,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 08:43:15,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2023-03-17 08:43:15,930] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 08:43:15,930] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 08:43:15,930] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 08:43:15,930] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2023-03-17 08:43:15,930] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2023-03-17 08:43:15,930] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2023-03-17 08:43:15,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 08:43:15,931] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 08:43:15,931] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2023-03-17 08:43:15,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 08:43:15,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 08:43:15,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-17 08:43:15,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 08:43:15,931] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 08:43:15,931] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 08:43:15,931] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 08:43:15,931] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 08:43:15,931] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2023-03-17 08:43:15,931] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2023-03-17 08:43:15,931] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2023-03-17 08:43:15,931] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2023-03-17 08:43:15,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 08:43:15,931] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 08:43:15,931] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
7: [2023-03-17 08:43:15,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 08:43:15,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 08:43:15,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 08:43:15,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 08:43:15,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 08:43:15,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 08:43:15,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 08:43:15,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 08:43:15,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 08:43:15,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 08:43:15,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 08:43:15,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2023-03-17 08:43:15,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 08:43:15,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2023-03-17 08:43:15,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 08:43:15,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 08:43:15,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 08:43:15,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 08:43:15,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
7: [2023-03-17 08:43:15,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2023-03-17 08:43:15,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2023-03-17 08:43:15,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2023-03-17 08:43:15,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2023-03-17 08:43:15,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2023-03-17 08:43:15,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 08:43:15,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 08:43:15,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 08:43:15,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 08:43:15,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 08:43:15,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 08:43:15,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 08:43:15,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 08:43:15,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 08:43:15,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 08:43:15,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 08:43:15,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 08:43:15,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 08:43:15,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 08:43:15,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2023-03-17 08:43:15,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 08:43:15,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
4: [2023-03-17 08:43:15,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 08:43:15,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2023-03-17 08:43:15,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2023-03-17 08:43:15,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2023-03-17 08:43:15,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2023-03-17 08:43:15,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2023-03-17 08:43:15,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2023-03-17 08:43:15,945] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 08:43:15,945] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 3: [2023-03-17 08:43:15,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 08:43:15,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 08:43:15,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 08:43:15,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 08:43:15,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 08:43:15,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 08:43:15,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 08:43:15,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 08:43:15,946] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 08:43:15,946] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 08:43:15,946] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 08:43:15,946] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 08:43:15,946] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
3: [2023-03-17 08:43:15,946] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 08:43:15,946] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 08:43:15,946] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 08:43:15,946] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 3: [2023-03-17 08:43:15,946] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 08:43:15,946] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 3: [2023-03-17 08:43:15,946] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 3: [2023-03-17 08:43:15,946] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 3: [2023-03-17 08:43:15,946] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 3: [2023-03-17 08:43:15,946] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 3: [2023-03-17 08:43:15,946] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2023-03-17 08:43:15,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 08:43:15,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2023-03-17 08:43:15,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 08:43:15,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 08:43:15,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 08:43:15,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 08:43:15,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 08:43:15,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 08:43:15,947] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 08:43:15,947] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 08:43:15,947] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
2: [2023-03-17 08:43:15,947] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 08:43:15,947] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 08:43:15,947] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 08:43:15,947] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2023-03-17 08:43:15,947] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 08:43:15,947] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 08:43:15,947] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 08:43:15,947] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2023-03-17 08:43:15,947] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2023-03-17 08:43:15,947] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2023-03-17 08:43:15,947] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2023-03-17 08:43:15,947] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
2: [2023-03-17 08:43:15,947] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2023-03-17 08:43:15,955] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 08:43:15,955] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 08:43:15,955] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 08:43:15,955] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 08:43:15,955] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 08:43:15,955] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 08:43:15,955] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 08:43:15,955] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-17 08:43:15,955] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 08:43:15,955] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 08:43:15,955] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 08:43:15,955] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 08:43:15,955] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 08:43:15,955] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 08:43:15,955] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 08:43:15,955] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step90000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 08:43:15,955] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2023-03-17 08:43:15,955] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2023-03-17 08:43:15,955] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
1: [2023-03-17 08:43:15,955] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2023-03-17 08:43:15,955] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2023-03-17 08:43:15,955] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2023-03-17 08:43:15,955] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2023-03-17 08:43:15,955] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: successfully saved checkpoint at iteration 90000 to checkpoints_146m60b400m 7: time (ms) | save-checkpoint: 444.41 7: iteration 90100/ 115203 | consumed samples: 23065600 | consumed tokens: 47238348800 | elapsed time per iteration (s): 0.38 | learning rate: 4.067E-05 | global batch size: 256 | lm loss: 3.270669E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.981 | TFLOPs: 31.83 | 7: iteration 90200/ 115203 | consumed samples: 23091200 | consumed tokens: 47290777600 | elapsed time per iteration (s): 0.37 | learning rate: 4.052E-05 | global batch size: 256 | lm loss: 3.273802E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.420 | TFLOPs: 32.32 | 7: iteration 90300/ 115203 | consumed samples: 23116800 | consumed tokens: 47343206400 | elapsed time per iteration (s): 0.37 | learning rate: 4.036E-05 | global batch size: 256 | lm loss: 3.267701E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.375 | TFLOPs: 32.32 | 7: iteration 90400/ 115203 | consumed samples: 23142400 | consumed tokens: 47395635200 | elapsed time per iteration (s): 0.37 | learning rate: 4.020E-05 | global batch size: 256 | lm loss: 3.271848E+00 | 
grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.124 | TFLOPs: 32.31 | 7: iteration 90500/ 115203 | consumed samples: 23168000 | consumed tokens: 47448064000 | elapsed time per iteration (s): 0.37 | learning rate: 4.005E-05 | global batch size: 256 | lm loss: 3.269261E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.494 | TFLOPs: 32.32 | 7: iteration 90600/ 115203 | consumed samples: 23193600 | consumed tokens: 47500492800 | elapsed time per iteration (s): 0.37 | learning rate: 3.989E-05 | global batch size: 256 | lm loss: 3.272436E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.624 | TFLOPs: 32.05 | 7: iteration 90700/ 115203 | consumed samples: 23219200 | consumed tokens: 47552921600 | elapsed time per iteration (s): 0.38 | learning rate: 3.973E-05 | global batch size: 256 | lm loss: 3.271733E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.892 | TFLOPs: 31.78 | 7: iteration 90800/ 115203 | consumed samples: 23244800 | consumed tokens: 47605350400 | elapsed time per iteration (s): 0.37 | learning rate: 3.958E-05 | global batch size: 256 | lm loss: 3.268527E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.495 | TFLOPs: 31.95 | 7: iteration 90900/ 115203 | consumed samples: 23270400 | consumed tokens: 47657779200 | elapsed time per iteration (s): 0.37 | learning rate: 3.943E-05 | global batch size: 256 | lm loss: 3.273649E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.486 | TFLOPs: 31.90 | 7: iteration 91000/ 115203 | consumed samples: 23296000 | consumed tokens: 47710208000 | elapsed time 
per iteration (s): 0.38 | learning rate: 3.927E-05 | global batch size: 256 | lm loss: 3.270411E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.979 | TFLOPs: 31.79 | 7: iteration 91100/ 115203 | consumed samples: 23321600 | consumed tokens: 47762636800 | elapsed time per iteration (s): 0.37 | learning rate: 3.912E-05 | global batch size: 256 | lm loss: 3.273889E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.140 | TFLOPs: 32.07 | 7: iteration 91200/ 115203 | consumed samples: 23347200 | consumed tokens: 47815065600 | elapsed time per iteration (s): 0.37 | learning rate: 3.897E-05 | global batch size: 256 | lm loss: 3.273925E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.409 | TFLOPs: 31.90 | 7: iteration 91300/ 115203 | consumed samples: 23372800 | consumed tokens: 47867494400 | elapsed time per iteration (s): 0.38 | learning rate: 3.881E-05 | global batch size: 256 | lm loss: 3.272837E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.331 | TFLOPs: 31.62 | 7: iteration 91400/ 115203 | consumed samples: 23398400 | consumed tokens: 47919923200 | elapsed time per iteration (s): 0.38 | learning rate: 3.866E-05 | global batch size: 256 | lm loss: 3.270315E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.722 | TFLOPs: 31.59 | 7: iteration 91500/ 115203 | consumed samples: 23424000 | consumed tokens: 47972352000 | elapsed time per iteration (s): 0.37 | learning rate: 3.851E-05 | global batch size: 256 | lm loss: 3.267922E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.458 | TFLOPs: 31.95 | 
7: iteration 91600/ 115203 | consumed samples: 23449600 | consumed tokens: 48024780800 | elapsed time per iteration (s): 0.38 | learning rate: 3.836E-05 | global batch size: 256 | lm loss: 3.268555E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.343 | TFLOPs: 31.85 | 7: iteration 91700/ 115203 | consumed samples: 23475200 | consumed tokens: 48077209600 | elapsed time per iteration (s): 0.38 | learning rate: 3.821E-05 | global batch size: 256 | lm loss: 3.267251E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.578 | TFLOPs: 31.81 | 7: iteration 91800/ 115203 | consumed samples: 23500800 | consumed tokens: 48129638400 | elapsed time per iteration (s): 0.37 | learning rate: 3.806E-05 | global batch size: 256 | lm loss: 3.270149E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.949 | TFLOPs: 31.92 | 7: iteration 91900/ 115203 | consumed samples: 23526400 | consumed tokens: 48182067200 | elapsed time per iteration (s): 0.37 | learning rate: 3.791E-05 | global batch size: 256 | lm loss: 3.274757E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.512 | TFLOPs: 31.95 | 0: [2023-03-17 08:55:43,087] [INFO] [logging.py:68:log_dist] [Rank 0] step=92000, skipped=0, lr=[3.776612403864962e-05, 3.776612403864962e-05, 3.776612403864962e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 92000/ 115203 | consumed samples: 23552000 | consumed tokens: 48234496000 | elapsed time per iteration (s): 0.37 | learning rate: 3.777E-05 | global batch size: 256 | lm loss: 3.271174E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.636 | TFLOPs: 32.05 | 0: steps: 92000 loss: 3.2750 iter 
time (s): 0.372 samples/sec: 687.968 7: iteration 92100/ 115203 | consumed samples: 23577600 | consumed tokens: 48286924800 | elapsed time per iteration (s): 0.37 | learning rate: 3.762E-05 | global batch size: 256 | lm loss: 3.275237E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.099 | TFLOPs: 31.88 | 7: iteration 92200/ 115203 | consumed samples: 23603200 | consumed tokens: 48339353600 | elapsed time per iteration (s): 0.38 | learning rate: 3.747E-05 | global batch size: 256 | lm loss: 3.270608E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.809 | TFLOPs: 31.73 | 7: iteration 92300/ 115203 | consumed samples: 23628800 | consumed tokens: 48391782400 | elapsed time per iteration (s): 0.37 | learning rate: 3.732E-05 | global batch size: 256 | lm loss: 3.267275E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.816 | TFLOPs: 31.87 | 7: iteration 92400/ 115203 | consumed samples: 23654400 | consumed tokens: 48444211200 | elapsed time per iteration (s): 0.37 | learning rate: 3.718E-05 | global batch size: 256 | lm loss: 3.274781E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.737 | TFLOPs: 32.01 | 7: iteration 92500/ 115203 | consumed samples: 23680000 | consumed tokens: 48496640000 | elapsed time per iteration (s): 0.37 | learning rate: 3.703E-05 | global batch size: 256 | lm loss: 3.266625E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.833 | TFLOPs: 32.06 | 7: iteration 92600/ 115203 | consumed samples: 23705600 | consumed tokens: 48549068800 | elapsed time per iteration (s): 0.37 | learning rate: 3.689E-05 | global batch size: 256 | lm loss: 3.272368E+00 | grad 
norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.061 | TFLOPs: 32.02 | 7: iteration 92700/ 115203 | consumed samples: 23731200 | consumed tokens: 48601497600 | elapsed time per iteration (s): 0.37 | learning rate: 3.674E-05 | global batch size: 256 | lm loss: 3.271238E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.507 | TFLOPs: 32.14 | 7: iteration 92800/ 115203 | consumed samples: 23756800 | consumed tokens: 48653926400 | elapsed time per iteration (s): 0.38 | learning rate: 3.660E-05 | global batch size: 256 | lm loss: 3.271104E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 671.673 | TFLOPs: 31.35 | 7: iteration 92900/ 115203 | consumed samples: 23782400 | consumed tokens: 48706355200 | elapsed time per iteration (s): 0.38 | learning rate: 3.646E-05 | global batch size: 256 | lm loss: 3.270986E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.561 | TFLOPs: 31.53 | 7: iteration 93000/ 115203 | consumed samples: 23808000 | consumed tokens: 48758784000 | elapsed time per iteration (s): 0.37 | learning rate: 3.631E-05 | global batch size: 256 | lm loss: 3.272124E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.117 | TFLOPs: 31.89 | 7: iteration 93100/ 115203 | consumed samples: 23833600 | consumed tokens: 48811212800 | elapsed time per iteration (s): 0.38 | learning rate: 3.617E-05 | global batch size: 256 | lm loss: 3.270413E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.668 | TFLOPs: 31.63 | 7: iteration 93200/ 115203 | consumed samples: 23859200 | consumed tokens: 48863641600 | elapsed time per 
iteration (s): 0.38 | learning rate: 3.603E-05 | global batch size: 256 | lm loss: 3.269010E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.131 | TFLOPs: 31.70 | 7: iteration 93300/ 115203 | consumed samples: 23884800 | consumed tokens: 48916070400 | elapsed time per iteration (s): 0.38 | learning rate: 3.589E-05 | global batch size: 256 | lm loss: 3.270165E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.041 | TFLOPs: 31.28 | 7: iteration 93400/ 115203 | consumed samples: 23910400 | consumed tokens: 48968499200 | elapsed time per iteration (s): 0.38 | learning rate: 3.575E-05 | global batch size: 256 | lm loss: 3.271046E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.960 | TFLOPs: 31.55 | 7: iteration 93500/ 115203 | consumed samples: 23936000 | consumed tokens: 49020928000 | elapsed time per iteration (s): 0.38 | learning rate: 3.561E-05 | global batch size: 256 | lm loss: 3.275811E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.595 | TFLOPs: 31.44 | 7: iteration 93600/ 115203 | consumed samples: 23961600 | consumed tokens: 49073356800 | elapsed time per iteration (s): 0.38 | learning rate: 3.547E-05 | global batch size: 256 | lm loss: 3.268617E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.204 | TFLOPs: 31.38 | 7: iteration 93700/ 115203 | consumed samples: 23987200 | consumed tokens: 49125785600 | elapsed time per iteration (s): 0.38 | learning rate: 3.533E-05 | global batch size: 256 | lm loss: 3.271323E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.861 | TFLOPs: 31.41 | 7: 
iteration 93800/ 115203 | consumed samples: 24012800 | consumed tokens: 49178214400 | elapsed time per iteration (s): 0.38 | learning rate: 3.519E-05 | global batch size: 256 | lm loss: 3.272781E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.000 | TFLOPs: 31.69 | 7: iteration 93900/ 115203 | consumed samples: 24038400 | consumed tokens: 49230643200 | elapsed time per iteration (s): 0.38 | learning rate: 3.506E-05 | global batch size: 256 | lm loss: 3.269478E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.887 | TFLOPs: 31.64 | 0: [2023-03-17 09:08:17,290] [INFO] [logging.py:68:log_dist] [Rank 0] step=94000, skipped=0, lr=[3.4919569923835e-05, 3.4919569923835e-05, 3.4919569923835e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 94000/ 115203 | consumed samples: 24064000 | consumed tokens: 49283072000 | elapsed time per iteration (s): 0.38 | learning rate: 3.492E-05 | global batch size: 256 | lm loss: 3.270840E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.549 | TFLOPs: 31.58 | 0: steps: 94000 loss: 3.2626 iter time (s): 0.375 samples/sec: 681.871 7: iteration 94100/ 115203 | consumed samples: 24089600 | consumed tokens: 49335500800 | elapsed time per iteration (s): 0.38 | learning rate: 3.478E-05 | global batch size: 256 | lm loss: 3.270222E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.364 | TFLOPs: 31.66 | 7: iteration 94200/ 115203 | consumed samples: 24115200 | consumed tokens: 49387929600 | elapsed time per iteration (s): 0.38 | learning rate: 3.465E-05 | global batch size: 256 | lm loss: 3.269049E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
679.955 | TFLOPs: 31.74 | 7: iteration 94300/ 115203 | consumed samples: 24140800 | consumed tokens: 49440358400 | elapsed time per iteration (s): 0.38 | learning rate: 3.451E-05 | global batch size: 256 | lm loss: 3.265353E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.831 | TFLOPs: 31.78 | 7: iteration 94400/ 115203 | consumed samples: 24166400 | consumed tokens: 49492787200 | elapsed time per iteration (s): 0.37 | learning rate: 3.438E-05 | global batch size: 256 | lm loss: 3.266218E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.995 | TFLOPs: 31.88 | 7: iteration 94500/ 115203 | consumed samples: 24192000 | consumed tokens: 49545216000 | elapsed time per iteration (s): 0.38 | learning rate: 3.424E-05 | global batch size: 256 | lm loss: 3.269472E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.974 | TFLOPs: 31.83 | 7: iteration 94600/ 115203 | consumed samples: 24217600 | consumed tokens: 49597644800 | elapsed time per iteration (s): 0.37 | learning rate: 3.411E-05 | global batch size: 256 | lm loss: 3.266819E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.732 | TFLOPs: 32.24 | 7: iteration 94700/ 115203 | consumed samples: 24243200 | consumed tokens: 49650073600 | elapsed time per iteration (s): 0.37 | learning rate: 3.398E-05 | global batch size: 256 | lm loss: 3.272218E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.247 | TFLOPs: 31.98 | 7: iteration 94800/ 115203 | consumed samples: 24268800 | consumed tokens: 49702502400 | elapsed time per iteration (s): 0.37 | learning rate: 3.384E-05 | global batch size: 256 | lm loss: 3.270432E+00 | grad norm: 0.207 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.167 | TFLOPs: 32.07 | 7: iteration 94900/ 115203 | consumed samples: 24294400 | consumed tokens: 49754931200 | elapsed time per iteration (s): 0.37 | learning rate: 3.371E-05 | global batch size: 256 | lm loss: 3.269497E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.762 | TFLOPs: 32.15 | 7: iteration 95000/ 115203 | consumed samples: 24320000 | consumed tokens: 49807360000 | elapsed time per iteration (s): 0.37 | learning rate: 3.358E-05 | global batch size: 256 | lm loss: 3.268628E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.633 | TFLOPs: 32.05 | 7: iteration 95100/ 115203 | consumed samples: 24345600 | consumed tokens: 49859788800 | elapsed time per iteration (s): 0.37 | learning rate: 3.345E-05 | global batch size: 256 | lm loss: 3.269436E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.689 | TFLOPs: 32.01 | 7: iteration 95200/ 115203 | consumed samples: 24371200 | consumed tokens: 49912217600 | elapsed time per iteration (s): 0.37 | learning rate: 3.332E-05 | global batch size: 256 | lm loss: 3.270073E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.659 | TFLOPs: 32.05 | 7: iteration 95300/ 115203 | consumed samples: 24396800 | consumed tokens: 49964646400 | elapsed time per iteration (s): 0.37 | learning rate: 3.319E-05 | global batch size: 256 | lm loss: 3.269778E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.139 | TFLOPs: 32.26 | 7: iteration 95400/ 115203 | consumed samples: 24422400 | consumed tokens: 50017075200 | elapsed time per iteration (s): 
0.37 | learning rate: 3.306E-05 | global batch size: 256 | lm loss: 3.268199E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.858 | TFLOPs: 31.97 | 7: iteration 95500/ 115203 | consumed samples: 24448000 | consumed tokens: 50069504000 | elapsed time per iteration (s): 0.37 | learning rate: 3.293E-05 | global batch size: 256 | lm loss: 3.272469E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.460 | TFLOPs: 32.09 | 7: iteration 95600/ 115203 | consumed samples: 24473600 | consumed tokens: 50121932800 | elapsed time per iteration (s): 0.37 | learning rate: 3.281E-05 | global batch size: 256 | lm loss: 3.265235E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.853 | TFLOPs: 32.20 | 7: iteration 95700/ 115203 | consumed samples: 24499200 | consumed tokens: 50174361600 | elapsed time per iteration (s): 0.37 | learning rate: 3.268E-05 | global batch size: 256 | lm loss: 3.272460E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.100 | TFLOPs: 32.07 | 7: iteration 95800/ 115203 | consumed samples: 24524800 | consumed tokens: 50226790400 | elapsed time per iteration (s): 0.37 | learning rate: 3.255E-05 | global batch size: 256 | lm loss: 3.263514E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.872 | TFLOPs: 32.11 | 7: iteration 95900/ 115203 | consumed samples: 24550400 | consumed tokens: 50279219200 | elapsed time per iteration (s): 0.37 | learning rate: 3.243E-05 | global batch size: 256 | lm loss: 3.266529E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.775 | TFLOPs: 32.15 | 0: [2023-03-17 
09:20:43,486] [INFO] [logging.py:68:log_dist] [Rank 0] step=96000, skipped=0, lr=[3.230082550465275e-05, 3.230082550465275e-05, 3.230082550465275e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 96000/ 115203 | consumed samples: 24576000 | consumed tokens: 50331648000 | elapsed time per iteration (s): 0.37 | learning rate: 3.230E-05 | global batch size: 256 | lm loss: 3.264462E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.245 | TFLOPs: 32.26 | 0: steps: 96000 loss: 3.2818 iter time (s): 0.371 samples/sec: 689.908 7: iteration 96100/ 115203 | consumed samples: 24601600 | consumed tokens: 50384076800 | elapsed time per iteration (s): 0.37 | learning rate: 3.218E-05 | global batch size: 256 | lm loss: 3.268656E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.120 | TFLOPs: 32.26 | 7: iteration 96200/ 115203 | consumed samples: 24627200 | consumed tokens: 50436505600 | elapsed time per iteration (s): 0.37 | learning rate: 3.205E-05 | global batch size: 256 | lm loss: 3.269136E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.058 | TFLOPs: 32.21 | 7: iteration 96300/ 115203 | consumed samples: 24652800 | consumed tokens: 50488934400 | elapsed time per iteration (s): 0.37 | learning rate: 3.193E-05 | global batch size: 256 | lm loss: 3.268699E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.486 | TFLOPs: 32.18 | 7: iteration 96400/ 115203 | consumed samples: 24678400 | consumed tokens: 50541363200 | elapsed time per iteration (s): 0.37 | learning rate: 3.181E-05 | global batch size: 256 | lm loss: 3.268477E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
689.865 | TFLOPs: 32.20 | 7: iteration 96500/ 115203 | consumed samples: 24704000 | consumed tokens: 50593792000 | elapsed time per iteration (s): 0.37 | learning rate: 3.168E-05 | global batch size: 256 | lm loss: 3.266351E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.049 | TFLOPs: 32.26 | 7: iteration 96600/ 115203 | consumed samples: 24729600 | consumed tokens: 50646220800 | elapsed time per iteration (s): 0.37 | learning rate: 3.156E-05 | global batch size: 256 | lm loss: 3.264113E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.016 | TFLOPs: 32.30 | 7: iteration 96700/ 115203 | consumed samples: 24755200 | consumed tokens: 50698649600 | elapsed time per iteration (s): 0.37 | learning rate: 3.144E-05 | global batch size: 256 | lm loss: 3.268403E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.660 | TFLOPs: 32.33 | 7: iteration 96800/ 115203 | consumed samples: 24780800 | consumed tokens: 50751078400 | elapsed time per iteration (s): 0.37 | learning rate: 3.132E-05 | global batch size: 256 | lm loss: 3.269384E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.446 | TFLOPs: 32.27 | 7: iteration 96900/ 115203 | consumed samples: 24806400 | consumed tokens: 50803507200 | elapsed time per iteration (s): 0.37 | learning rate: 3.120E-05 | global batch size: 256 | lm loss: 3.269709E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.845 | TFLOPs: 32.25 | 7: iteration 97000/ 115203 | consumed samples: 24832000 | consumed tokens: 50855936000 | elapsed time per iteration (s): 0.37 | learning rate: 3.108E-05 | global batch size: 256 | lm loss: 3.269133E+00 | grad norm: 0.196 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.261 | TFLOPs: 32.22 | 7: iteration 97100/ 115203 | consumed samples: 24857600 | consumed tokens: 50908364800 | elapsed time per iteration (s): 0.37 | learning rate: 3.096E-05 | global batch size: 256 | lm loss: 3.266325E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.832 | TFLOPs: 32.25 | 7: iteration 97200/ 115203 | consumed samples: 24883200 | consumed tokens: 50960793600 | elapsed time per iteration (s): 0.37 | learning rate: 3.084E-05 | global batch size: 256 | lm loss: 3.269073E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.531 | TFLOPs: 32.23 | 7: iteration 97300/ 115203 | consumed samples: 24908800 | consumed tokens: 51013222400 | elapsed time per iteration (s): 0.37 | learning rate: 3.072E-05 | global batch size: 256 | lm loss: 3.270528E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.724 | TFLOPs: 32.29 | 7: iteration 97400/ 115203 | consumed samples: 24934400 | consumed tokens: 51065651200 | elapsed time per iteration (s): 0.37 | learning rate: 3.061E-05 | global batch size: 256 | lm loss: 3.264284E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.106 | TFLOPs: 32.30 | 7: iteration 97500/ 115203 | consumed samples: 24960000 | consumed tokens: 51118080000 | elapsed time per iteration (s): 0.37 | learning rate: 3.049E-05 | global batch size: 256 | lm loss: 3.262379E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.827 | TFLOPs: 32.29 | 7: iteration 97600/ 115203 | consumed samples: 24985600 | consumed tokens: 51170508800 | elapsed time per iteration (s): 
0.37 | learning rate: 3.038E-05 | global batch size: 256 | lm loss: 3.267758E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.333 | TFLOPs: 32.27 | 7: iteration 97700/ 115203 | consumed samples: 25011200 | consumed tokens: 51222937600 | elapsed time per iteration (s): 0.37 | learning rate: 3.026E-05 | global batch size: 256 | lm loss: 3.270950E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.446 | TFLOPs: 32.27 | 7: iteration 97800/ 115203 | consumed samples: 25036800 | consumed tokens: 51275366400 | elapsed time per iteration (s): 0.37 | learning rate: 3.015E-05 | global batch size: 256 | lm loss: 3.267437E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.040 | TFLOPs: 32.26 | 7: iteration 97900/ 115203 | consumed samples: 25062400 | consumed tokens: 51327795200 | elapsed time per iteration (s): 0.37 | learning rate: 3.003E-05 | global batch size: 256 | lm loss: 3.272906E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.464 | TFLOPs: 32.28 | 0: [2023-03-17 09:33:04,266] [INFO] [logging.py:68:log_dist] [Rank 0] step=98000, skipped=0, lr=[2.9917836598254863e-05, 2.9917836598254863e-05, 2.9917836598254863e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 98000/ 115203 | consumed samples: 25088000 | consumed tokens: 51380224000 | elapsed time per iteration (s): 0.37 | learning rate: 2.992E-05 | global batch size: 256 | lm loss: 3.264392E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.198 | TFLOPs: 32.31 | 0: steps: 98000 loss: 3.3218 iter time (s): 0.368 samples/sec: 694.956 7: iteration 98100/ 115203 | consumed samples: 25113600 | consumed tokens: 
51432652800 | elapsed time per iteration (s): 0.37 | learning rate: 2.981E-05 | global batch size: 256 | lm loss: 3.265590E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.807 | TFLOPs: 32.29 | 7: iteration 98200/ 115203 | consumed samples: 25139200 | consumed tokens: 51485081600 | elapsed time per iteration (s): 0.37 | learning rate: 2.969E-05 | global batch size: 256 | lm loss: 3.271105E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.165 | TFLOPs: 32.31 | 7: iteration 98300/ 115203 | consumed samples: 25164800 | consumed tokens: 51537510400 | elapsed time per iteration (s): 0.37 | learning rate: 2.958E-05 | global batch size: 256 | lm loss: 3.269978E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.556 | TFLOPs: 32.28 | 7: iteration 98400/ 115203 | consumed samples: 25190400 | consumed tokens: 51589939200 | elapsed time per iteration (s): 0.37 | learning rate: 2.947E-05 | global batch size: 256 | lm loss: 3.265805E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.023 | TFLOPs: 32.30 | 7: iteration 98500/ 115203 | consumed samples: 25216000 | consumed tokens: 51642368000 | elapsed time per iteration (s): 0.37 | learning rate: 2.936E-05 | global batch size: 256 | lm loss: 3.270173E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.828 | TFLOPs: 32.25 | 7: iteration 98600/ 115203 | consumed samples: 25241600 | consumed tokens: 51694796800 | elapsed time per iteration (s): 0.37 | learning rate: 2.925E-05 | global batch size: 256 | lm loss: 3.267348E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
691.761 | TFLOPs: 32.29 | 7: iteration 98700/ 115203 | consumed samples: 25267200 | consumed tokens: 51747225600 | elapsed time per iteration (s): 0.37 | learning rate: 2.914E-05 | global batch size: 256 | lm loss: 3.262898E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.732 | TFLOPs: 32.29 | 7: iteration 98800/ 115203 | consumed samples: 25292800 | consumed tokens: 51799654400 | elapsed time per iteration (s): 0.37 | learning rate: 2.903E-05 | global batch size: 256 | lm loss: 3.267419E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.639 | TFLOPs: 32.28 | 7: iteration 98900/ 115203 | consumed samples: 25318400 | consumed tokens: 51852083200 | elapsed time per iteration (s): 0.37 | learning rate: 2.892E-05 | global batch size: 256 | lm loss: 3.265435E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.909 | TFLOPs: 32.25 | 7: iteration 99000/ 115203 | consumed samples: 25344000 | consumed tokens: 51904512000 | elapsed time per iteration (s): 0.37 | learning rate: 2.882E-05 | global batch size: 256 | lm loss: 3.264549E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.954 | TFLOPs: 32.30 | 7: iteration 99100/ 115203 | consumed samples: 25369600 | consumed tokens: 51956940800 | elapsed time per iteration (s): 0.37 | learning rate: 2.871E-05 | global batch size: 256 | lm loss: 3.265759E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.264 | TFLOPs: 32.31 | 7: iteration 99200/ 115203 | consumed samples: 25395200 | consumed tokens: 52009369600 | elapsed time per iteration (s): 0.37 | learning rate: 2.860E-05 | global batch size: 256 | lm loss: 3.263822E+00 | grad norm: 0.200 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.495 | TFLOPs: 32.32 | 7: iteration 99300/ 115203 | consumed samples: 25420800 | consumed tokens: 52061798400 | elapsed time per iteration (s): 0.37 | learning rate: 2.850E-05 | global batch size: 256 | lm loss: 3.268742E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.534 | TFLOPs: 32.32 | 7: iteration 99400/ 115203 | consumed samples: 25446400 | consumed tokens: 52114227200 | elapsed time per iteration (s): 0.37 | learning rate: 2.839E-05 | global batch size: 256 | lm loss: 3.270327E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.027 | TFLOPs: 32.30 | 7: iteration 99500/ 115203 | consumed samples: 25472000 | consumed tokens: 52166656000 | elapsed time per iteration (s): 0.37 | learning rate: 2.829E-05 | global batch size: 256 | lm loss: 3.265218E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.301 | TFLOPs: 32.27 | 7: iteration 99600/ 115203 | consumed samples: 25497600 | consumed tokens: 52219084800 | elapsed time per iteration (s): 0.37 | learning rate: 2.819E-05 | global batch size: 256 | lm loss: 3.265867E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.088 | TFLOPs: 32.26 | 7: iteration 99700/ 115203 | consumed samples: 25523200 | consumed tokens: 52271513600 | elapsed time per iteration (s): 0.37 | learning rate: 2.808E-05 | global batch size: 256 | lm loss: 3.264974E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.043 | TFLOPs: 32.21 | 7: iteration 99800/ 115203 | consumed samples: 25548800 | consumed tokens: 52323942400 | elapsed time per iteration (s): 
0.37 | learning rate: 2.798E-05 | global batch size: 256 | lm loss: 3.264912E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.925 | TFLOPs: 32.30 | 7: iteration 99900/ 115203 | consumed samples: 25574400 | consumed tokens: 52376371200 | elapsed time per iteration (s): 0.37 | learning rate: 2.788E-05 | global batch size: 256 | lm loss: 3.269108E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.952 | TFLOPs: 32.30 | 0: [2023-03-17 09:45:24,504] [INFO] [logging.py:68:log_dist] [Rank 0] step=100000, skipped=0, lr=[2.777783369036059e-05, 2.777783369036059e-05, 2.777783369036059e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 100000/ 115203 | consumed samples: 25600000 | consumed tokens: 52428800000 | elapsed time per iteration (s): 0.37 | learning rate: 2.778E-05 | global batch size: 256 | lm loss: 3.265310E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.395 | TFLOPs: 32.27 | 0: steps: 100000 loss: 3.2739 iter time (s): 0.368 samples/sec: 695.648 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 100000 | lm loss value: 3.289312E+00 | lm loss PPL: 2.682441E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 100000 to checkpoints_146m60b400m 0: [2023-03-17 09:45:24,633] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step100000 is begin to save! 0: [2023-03-17 09:45:24,638] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/layer_01-model_00-model_states.pt... 
0: [2023-03-17 09:45:24,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/layer_01-model_00-model_states.pt.
0: [2023-03-17 09:45:24,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/layer_03-model_00-model_states.pt...
0: [2023-03-17 09:45:24,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/layer_03-model_00-model_states.pt.
0: [2023-03-17 09:45:24,748] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/layer_04-model_00-model_states.pt...
0: [2023-03-17 09:45:24,763] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/layer_04-model_00-model_states.pt.
0: [2023-03-17 09:45:24,763] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/layer_05-model_00-model_states.pt...
0: [2023-03-17 09:45:24,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/layer_05-model_00-model_states.pt.
0: [2023-03-17 09:45:24,778] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/layer_06-model_00-model_states.pt...
0: [2023-03-17 09:45:24,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/layer_06-model_00-model_states.pt.
0: [2023-03-17 09:45:24,793] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/layer_07-model_00-model_states.pt...
0: [2023-03-17 09:45:24,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/layer_07-model_00-model_states.pt.
0: [2023-03-17 09:45:24,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/layer_08-model_00-model_states.pt...
0: [2023-03-17 09:45:24,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/layer_08-model_00-model_states.pt.
0: [2023-03-17 09:45:24,823] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/layer_09-model_00-model_states.pt...
0: [2023-03-17 09:45:24,838] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/layer_09-model_00-model_states.pt.
0: [2023-03-17 09:45:24,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/layer_10-model_00-model_states.pt...
0: [2023-03-17 09:45:24,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/layer_10-model_00-model_states.pt.
0: [2023-03-17 09:45:24,853] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/layer_11-model_00-model_states.pt...
0: [2023-03-17 09:45:24,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/layer_11-model_00-model_states.pt.
0: [2023-03-17 09:45:24,868] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/layer_12-model_00-model_states.pt...
0: [2023-03-17 09:45:24,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/layer_12-model_00-model_states.pt.
0: [2023-03-17 09:45:24,883] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/layer_13-model_00-model_states.pt...
0: [2023-03-17 09:45:24,898] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/layer_13-model_00-model_states.pt.
0: [2023-03-17 09:45:24,898] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/layer_14-model_00-model_states.pt...
0: [2023-03-17 09:45:24,913] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/layer_14-model_00-model_states.pt.
0: [2023-03-17 09:45:24,913] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/layer_15-model_00-model_states.pt...
0: [2023-03-17 09:45:24,928] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/layer_15-model_00-model_states.pt.
0: [2023-03-17 09:45:24,928] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/layer_16-model_00-model_states.pt...
0: [2023-03-17 09:45:24,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/layer_16-model_00-model_states.pt.
0: [2023-03-17 09:45:24,943] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/layer_17-model_00-model_states.pt...
0: [2023-03-17 09:45:24,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/layer_17-model_00-model_states.pt.
0: [2023-03-17 09:45:24,958] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/layer_19-model_00-model_states.pt...
0: [2023-03-17 09:45:24,959] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/layer_19-model_00-model_states.pt.
0: [2023-03-17 09:45:24,960] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_146m60b400m/global_step100000/mp_rank_00_model_states.pt
0: [2023-03-17 09:45:24,960] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/mp_rank_00_model_states.pt...
0: [2023-03-17 09:45:24,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/mp_rank_00_model_states.pt.
0: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
7: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 2: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 4: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
5: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 3: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 0: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
7: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 2: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 4: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 6: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
6: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 1: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 3: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
3: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 4: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 6: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 1: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 09:45:24,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 0: [2023-03-17 09:45:25,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-03-17 09:45:25,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 09:45:25,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 09:45:25,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2023-03-17 09:45:25,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 09:45:25,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 09:45:25,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2023-03-17 09:45:25,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 09:45:25,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 09:45:25,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2023-03-17 09:45:25,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 09:45:25,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 09:45:25,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2023-03-17 09:45:25,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-17 09:45:25,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 09:45:25,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 09:45:25,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 09:45:25,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2023-03-17 09:45:25,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 09:45:25,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 09:45:25,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2023-03-17 09:45:25,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2023-03-17 09:45:25,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 09:45:25,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 09:45:25,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2023-03-17 09:45:25,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 09:45:25,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 09:45:25,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 09:45:25,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 09:45:25,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 09:45:25,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 09:45:25,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 09:45:25,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 09:45:25,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 09:45:25,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 09:45:25,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 09:45:25,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2023-03-17 09:45:25,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2023-03-17 09:45:25,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2023-03-17 09:45:25,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2023-03-17 09:45:25,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2023-03-17 09:45:25,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2023-03-17 09:45:25,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2023-03-17 09:45:25,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 09:45:25,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 09:45:25,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2023-03-17 09:45:25,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 09:45:25,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2023-03-17 09:45:25,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2023-03-17 09:45:25,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 09:45:25,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 09:45:25,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 09:45:25,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 09:45:25,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 09:45:25,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 09:45:25,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 09:45:25,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 09:45:25,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 09:45:25,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 09:45:25,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 09:45:25,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2023-03-17 09:45:25,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2023-03-17 09:45:25,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2023-03-17 09:45:25,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 09:45:25,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2023-03-17 09:45:25,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 09:45:25,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
4: [2023-03-17 09:45:25,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2023-03-17 09:45:25,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2023-03-17 09:45:25,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 09:45:25,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 09:45:25,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 09:45:25,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 09:45:25,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 09:45:25,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 09:45:25,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 09:45:25,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 09:45:25,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 09:45:25,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2023-03-17 09:45:25,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 09:45:25,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 09:45:25,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 09:45:25,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 09:45:25,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 09:45:25,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2023-03-17 09:45:25,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2023-03-17 09:45:25,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2023-03-17 09:45:25,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2023-03-17 09:45:25,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
1: [2023-03-17 09:45:25,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2023-03-17 09:45:25,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 09:45:25,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 09:45:25,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2023-03-17 09:45:25,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 09:45:25,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 09:45:25,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2023-03-17 09:45:25,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 09:45:25,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 09:45:25,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 09:45:25,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2023-03-17 09:45:25,051] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 09:45:25,051] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 09:45:25,051] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 09:45:25,051] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2023-03-17 09:45:25,051] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2023-03-17 09:45:25,051] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 09:45:25,051] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2023-03-17 09:45:25,051] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2023-03-17 09:45:25,052] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 09:45:25,052] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 09:45:25,052] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 09:45:25,052] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-03-17 09:45:25,052] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 09:45:25,052] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 09:45:25,052] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 09:45:25,052] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 09:45:25,052] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 09:45:25,052] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 09:45:25,052] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 09:45:25,052] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 09:45:25,052] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 09:45:25,052] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 09:45:25,052] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 09:45:25,052] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 09:45:25,052] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2023-03-17 09:45:25,052] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2023-03-17 09:45:25,052] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2023-03-17 09:45:25,052] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2023-03-17 09:45:25,052] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2023-03-17 09:45:25,052] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2023-03-17 09:45:25,052] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2023-03-17 09:45:25,052] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2023-03-17 09:45:25,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 09:45:25,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 09:45:25,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2023-03-17 09:45:25,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 09:45:25,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 09:45:25,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 09:45:25,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 09:45:25,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2023-03-17 09:45:25,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2023-03-17 09:45:25,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2023-03-17 09:45:25,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 09:45:25,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 09:45:25,055] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 09:45:25,055] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 09:45:25,055] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 2: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
7: [2023-03-17 09:45:25,055] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 09:45:25,055] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 09:45:25,055] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 09:45:25,055] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 09:45:25,055] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 7: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 09:45:25,055] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 7: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
2: [2023-03-17 09:45:25,055] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 7: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 09:45:25,055] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 09:45:25,055] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 09:45:25,055] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 09:45:25,055] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 09:45:25,055] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 09:45:25,055] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step100000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 09:45:25,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
0: successfully saved checkpoint at iteration 100000 to checkpoints_146m60b400m 7: time (ms) | save-checkpoint: 435.37 7: iteration 100100/ 115203 | consumed samples: 25625600 | consumed tokens: 52481228800 | elapsed time per iteration (s): 0.38 | learning rate: 2.768E-05 | global batch size: 256 | lm loss: 3.263742E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.307 | TFLOPs: 31.80 | 7: iteration 100200/ 115203 | consumed samples: 25651200 | consumed tokens: 52533657600 | elapsed time per iteration (s): 0.37 | learning rate: 2.758E-05 | global batch size: 256 | lm loss: 3.266755E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.191 | TFLOPs: 32.31 | 7: iteration 100300/ 115203 | consumed samples: 25676800 | consumed tokens: 52586086400 | elapsed time per iteration (s): 0.37 | learning rate: 2.748E-05 | global batch size: 256 | lm loss: 3.265172E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.666 | TFLOPs: 32.28 | 7: iteration 100400/ 115203 | consumed samples: 25702400 | consumed tokens: 52638515200 | elapsed time per iteration (s): 0.37 | learning rate: 2.738E-05 | global batch size: 256 | lm loss: 3.261620E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.484 | TFLOPs: 32.32 | 7: iteration 100500/ 115203 | consumed samples: 25728000 | consumed tokens: 52690944000 | elapsed time per iteration (s): 0.37 | learning rate: 2.728E-05 | global batch size: 256 | lm loss: 3.266691E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.029 | TFLOPs: 32.25 | 7: iteration 100600/ 115203 | consumed samples: 25753600 | consumed tokens: 52743372800 | elapsed time per iteration (s): 
0.37 | learning rate: 2.718E-05 | global batch size: 256 | lm loss: 3.258689E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.482 | TFLOPs: 32.18 | 7: iteration 100700/ 115203 | consumed samples: 25779200 | consumed tokens: 52795801600 | elapsed time per iteration (s): 0.37 | learning rate: 2.709E-05 | global batch size: 256 | lm loss: 3.267918E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.182 | TFLOPs: 32.31 | 7: iteration 100800/ 115203 | consumed samples: 25804800 | consumed tokens: 52848230400 | elapsed time per iteration (s): 0.37 | learning rate: 2.699E-05 | global batch size: 256 | lm loss: 3.264210E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.363 | TFLOPs: 32.32 | 7: iteration 100900/ 115203 | consumed samples: 25830400 | consumed tokens: 52900659200 | elapsed time per iteration (s): 0.37 | learning rate: 2.690E-05 | global batch size: 256 | lm loss: 3.263231E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.591 | TFLOPs: 32.33 | 7: iteration 101000/ 115203 | consumed samples: 25856000 | consumed tokens: 52953088000 | elapsed time per iteration (s): 0.37 | learning rate: 2.680E-05 | global batch size: 256 | lm loss: 3.263130E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.598 | TFLOPs: 32.33 | 7: iteration 101100/ 115203 | consumed samples: 25881600 | consumed tokens: 53005516800 | elapsed time per iteration (s): 0.37 | learning rate: 2.671E-05 | global batch size: 256 | lm loss: 3.263433E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.694 | TFLOPs: 32.19 | 7: iteration 
101200/ 115203 | consumed samples: 25907200 | consumed tokens: 53057945600 | elapsed time per iteration (s): 0.37 | learning rate: 2.661E-05 | global batch size: 256 | lm loss: 3.265814E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.977 | TFLOPs: 32.11 | 7: iteration 101300/ 115203 | consumed samples: 25932800 | consumed tokens: 53110374400 | elapsed time per iteration (s): 0.37 | learning rate: 2.652E-05 | global batch size: 256 | lm loss: 3.263789E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.736 | TFLOPs: 32.29 | 7: iteration 101400/ 115203 | consumed samples: 25958400 | consumed tokens: 53162803200 | elapsed time per iteration (s): 0.37 | learning rate: 2.643E-05 | global batch size: 256 | lm loss: 3.267837E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.253 | TFLOPs: 32.27 | 7: iteration 101500/ 115203 | consumed samples: 25984000 | consumed tokens: 53215232000 | elapsed time per iteration (s): 0.37 | learning rate: 2.634E-05 | global batch size: 256 | lm loss: 3.264481E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.331 | TFLOPs: 32.32 | 7: iteration 101600/ 115203 | consumed samples: 26009600 | consumed tokens: 53267660800 | elapsed time per iteration (s): 0.37 | learning rate: 2.625E-05 | global batch size: 256 | lm loss: 3.264725E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.863 | TFLOPs: 32.25 | 7: iteration 101700/ 115203 | consumed samples: 26035200 | consumed tokens: 53320089600 | elapsed time per iteration (s): 0.37 | learning rate: 2.615E-05 | global batch size: 256 | lm loss: 3.264341E+00 | grad norm: 0.198 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.821 | TFLOPs: 31.92 | 7: iteration 101800/ 115203 | consumed samples: 26060800 | consumed tokens: 53372518400 | elapsed time per iteration (s): 0.37 | learning rate: 2.606E-05 | global batch size: 256 | lm loss: 3.264125E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.930 | TFLOPs: 32.02 | 7: iteration 101900/ 115203 | consumed samples: 26086400 | consumed tokens: 53424947200 | elapsed time per iteration (s): 0.37 | learning rate: 2.598E-05 | global batch size: 256 | lm loss: 3.263801E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.142 | TFLOPs: 32.03 | 0: [2023-03-17 09:57:46,576] [INFO] [logging.py:68:log_dist] [Rank 0] step=102000, skipped=0, lr=[2.5887309996453706e-05, 2.5887309996453706e-05, 2.5887309996453706e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 102000/ 115203 | consumed samples: 26112000 | consumed tokens: 53477376000 | elapsed time per iteration (s): 0.37 | learning rate: 2.589E-05 | global batch size: 256 | lm loss: 3.257553E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.855 | TFLOPs: 32.29 | 0: steps: 102000 loss: 3.2526 iter time (s): 0.369 samples/sec: 694.484 7: iteration 102100/ 115203 | consumed samples: 26137600 | consumed tokens: 53529804800 | elapsed time per iteration (s): 0.38 | learning rate: 2.580E-05 | global batch size: 256 | lm loss: 3.263480E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.950 | TFLOPs: 31.78 | 7: iteration 102200/ 115203 | consumed samples: 26163200 | consumed tokens: 53582233600 | elapsed time per iteration (s): 0.37 | learning rate: 2.571E-05 | global batch size: 256 | lm loss: 3.267635E+00 | 
grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.282 | TFLOPs: 31.94 | 7: iteration 102300/ 115203 | consumed samples: 26188800 | consumed tokens: 53634662400 | elapsed time per iteration (s): 0.38 | learning rate: 2.563E-05 | global batch size: 256 | lm loss: 3.263841E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.723 | TFLOPs: 31.82 | 7: iteration 102400/ 115203 | consumed samples: 26214400 | consumed tokens: 53687091200 | elapsed time per iteration (s): 0.38 | learning rate: 2.554E-05 | global batch size: 256 | lm loss: 3.260755E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.659 | TFLOPs: 31.82 | 7: iteration 102500/ 115203 | consumed samples: 26240000 | consumed tokens: 53739520000 | elapsed time per iteration (s): 0.37 | learning rate: 2.545E-05 | global batch size: 256 | lm loss: 3.265418E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.813 | TFLOPs: 32.01 | 7: iteration 102600/ 115203 | consumed samples: 26265600 | consumed tokens: 53791948800 | elapsed time per iteration (s): 0.37 | learning rate: 2.537E-05 | global batch size: 256 | lm loss: 3.264455E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.163 | TFLOPs: 32.07 | 7: iteration 102700/ 115203 | consumed samples: 26291200 | consumed tokens: 53844377600 | elapsed time per iteration (s): 0.37 | learning rate: 2.529E-05 | global batch size: 256 | lm loss: 3.263239E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.833 | TFLOPs: 31.87 | 7: iteration 102800/ 115203 | consumed samples: 26316800 | consumed tokens: 53896806400 | elapsed 
time per iteration (s): 0.37 | learning rate: 2.520E-05 | global batch size: 256 | lm loss: 3.263556E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.720 | TFLOPs: 31.91 | 7: iteration 102900/ 115203 | consumed samples: 26342400 | consumed tokens: 53949235200 | elapsed time per iteration (s): 0.37 | learning rate: 2.512E-05 | global batch size: 256 | lm loss: 3.265542E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.208 | TFLOPs: 31.98 | 7: iteration 103000/ 115203 | consumed samples: 26368000 | consumed tokens: 54001664000 | elapsed time per iteration (s): 0.38 | learning rate: 2.504E-05 | global batch size: 256 | lm loss: 3.260927E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.671 | TFLOPs: 31.72 | 7: iteration 103100/ 115203 | consumed samples: 26393600 | consumed tokens: 54054092800 | elapsed time per iteration (s): 0.37 | learning rate: 2.496E-05 | global batch size: 256 | lm loss: 3.262350E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.182 | TFLOPs: 32.03 | 7: iteration 103200/ 115203 | consumed samples: 26419200 | consumed tokens: 54106521600 | elapsed time per iteration (s): 0.37 | learning rate: 2.488E-05 | global batch size: 256 | lm loss: 3.267043E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.757 | TFLOPs: 32.06 | 7: iteration 103300/ 115203 | consumed samples: 26444800 | consumed tokens: 54158950400 | elapsed time per iteration (s): 0.38 | learning rate: 2.480E-05 | global batch size: 256 | lm loss: 3.266506E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.478 | TFLOPs: 
31.67 | 7: iteration 103400/ 115203 | consumed samples: 26470400 | consumed tokens: 54211379200 | elapsed time per iteration (s): 0.38 | learning rate: 2.472E-05 | global batch size: 256 | lm loss: 3.261607E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.823 | TFLOPs: 31.68 | 7: iteration 103500/ 115203 | consumed samples: 26496000 | consumed tokens: 54263808000 | elapsed time per iteration (s): 0.38 | learning rate: 2.464E-05 | global batch size: 256 | lm loss: 3.266759E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.466 | TFLOPs: 31.67 | 7: iteration 103600/ 115203 | consumed samples: 26521600 | consumed tokens: 54316236800 | elapsed time per iteration (s): 0.37 | learning rate: 2.456E-05 | global batch size: 256 | lm loss: 3.263209E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.739 | TFLOPs: 31.96 | 7: iteration 103700/ 115203 | consumed samples: 26547200 | consumed tokens: 54368665600 | elapsed time per iteration (s): 0.37 | learning rate: 2.448E-05 | global batch size: 256 | lm loss: 3.263328E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.083 | TFLOPs: 31.88 | 7: iteration 103800/ 115203 | consumed samples: 26572800 | consumed tokens: 54421094400 | elapsed time per iteration (s): 0.38 | learning rate: 2.440E-05 | global batch size: 256 | lm loss: 3.261308E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.991 | TFLOPs: 31.74 | 7: iteration 103900/ 115203 | consumed samples: 26598400 | consumed tokens: 54473523200 | elapsed time per iteration (s): 0.38 | learning rate: 2.433E-05 | global batch size: 256 | lm loss: 3.263028E+00 | grad norm: 0.206 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.131 | TFLOPs: 31.84 | 0: [2023-03-17 10:10:16,531] [INFO] [logging.py:68:log_dist] [Rank 0] step=104000, skipped=0, lr=[2.4252001760011466e-05, 2.4252001760011466e-05, 2.4252001760011466e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 104000/ 115203 | consumed samples: 26624000 | consumed tokens: 54525952000 | elapsed time per iteration (s): 0.37 | learning rate: 2.425E-05 | global batch size: 256 | lm loss: 3.261915E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.694 | TFLOPs: 31.87 | 0: steps: 104000 loss: 3.2826 iter time (s): 0.373 samples/sec: 686.474 7: iteration 104100/ 115203 | consumed samples: 26649600 | consumed tokens: 54578380800 | elapsed time per iteration (s): 0.37 | learning rate: 2.418E-05 | global batch size: 256 | lm loss: 3.264478E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.871 | TFLOPs: 32.01 | 7: iteration 104200/ 115203 | consumed samples: 26675200 | consumed tokens: 54630809600 | elapsed time per iteration (s): 0.38 | learning rate: 2.410E-05 | global batch size: 256 | lm loss: 3.264619E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.802 | TFLOPs: 31.82 | 7: iteration 104300/ 115203 | consumed samples: 26700800 | consumed tokens: 54683238400 | elapsed time per iteration (s): 0.37 | learning rate: 2.403E-05 | global batch size: 256 | lm loss: 3.256637E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.975 | TFLOPs: 32.07 | 7: iteration 104400/ 115203 | consumed samples: 26726400 | consumed tokens: 54735667200 | elapsed time per iteration (s): 0.37 | learning rate: 2.396E-05 | global batch size: 256 | lm loss: 
3.263799E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.561 | TFLOPs: 32.19 | 7: iteration 104500/ 115203 | consumed samples: 26752000 | consumed tokens: 54788096000 | elapsed time per iteration (s): 0.37 | learning rate: 2.388E-05 | global batch size: 256 | lm loss: 3.260132E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.288 | TFLOPs: 31.89 | 7: iteration 104600/ 115203 | consumed samples: 26777600 | consumed tokens: 54840524800 | elapsed time per iteration (s): 0.37 | learning rate: 2.381E-05 | global batch size: 256 | lm loss: 3.260612E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.232 | TFLOPs: 32.08 | 7: iteration 104700/ 115203 | consumed samples: 26803200 | consumed tokens: 54892953600 | elapsed time per iteration (s): 0.37 | learning rate: 2.374E-05 | global batch size: 256 | lm loss: 3.265874E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.323 | TFLOPs: 32.08 | 7: iteration 104800/ 115203 | consumed samples: 26828800 | consumed tokens: 54945382400 | elapsed time per iteration (s): 0.37 | learning rate: 2.367E-05 | global batch size: 256 | lm loss: 3.263626E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.478 | TFLOPs: 32.32 | 7: iteration 104900/ 115203 | consumed samples: 26854400 | consumed tokens: 54997811200 | elapsed time per iteration (s): 0.37 | learning rate: 2.360E-05 | global batch size: 256 | lm loss: 3.261180E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.341 | TFLOPs: 32.32 | 7: iteration 105000/ 115203 | consumed samples: 26880000 | consumed tokens: 
55050240000 | elapsed time per iteration (s): 0.37 | learning rate: 2.353E-05 | global batch size: 256 | lm loss: 3.265053E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.760 | TFLOPs: 32.29 | 7: iteration 105100/ 115203 | consumed samples: 26905600 | consumed tokens: 55102668800 | elapsed time per iteration (s): 0.37 | learning rate: 2.346E-05 | global batch size: 256 | lm loss: 3.264037E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.749 | TFLOPs: 32.29 | 7: iteration 105200/ 115203 | consumed samples: 26931200 | consumed tokens: 55155097600 | elapsed time per iteration (s): 0.37 | learning rate: 2.340E-05 | global batch size: 256 | lm loss: 3.265132E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.200 | TFLOPs: 32.31 | 7: iteration 105300/ 115203 | consumed samples: 26956800 | consumed tokens: 55207526400 | elapsed time per iteration (s): 0.37 | learning rate: 2.333E-05 | global batch size: 256 | lm loss: 3.263697E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.857 | TFLOPs: 32.29 | 7: iteration 105400/ 115203 | consumed samples: 26982400 | consumed tokens: 55259955200 | elapsed time per iteration (s): 0.37 | learning rate: 2.326E-05 | global batch size: 256 | lm loss: 3.260626E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.613 | TFLOPs: 32.24 | 7: iteration 105500/ 115203 | consumed samples: 27008000 | consumed tokens: 55312384000 | elapsed time per iteration (s): 0.37 | learning rate: 2.320E-05 | global batch size: 256 | lm loss: 3.263477E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 690.434 | TFLOPs: 32.23 | 7: iteration 105600/ 115203 | consumed samples: 27033600 | consumed tokens: 55364812800 | elapsed time per iteration (s): 0.37 | learning rate: 2.313E-05 | global batch size: 256 | lm loss: 3.261003E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.482 | TFLOPs: 32.28 | 7: iteration 105700/ 115203 | consumed samples: 27059200 | consumed tokens: 55417241600 | elapsed time per iteration (s): 0.37 | learning rate: 2.307E-05 | global batch size: 256 | lm loss: 3.262827E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.057 | TFLOPs: 32.30 | 7: iteration 105800/ 115203 | consumed samples: 27084800 | consumed tokens: 55469670400 | elapsed time per iteration (s): 0.37 | learning rate: 2.300E-05 | global batch size: 256 | lm loss: 3.263013E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.998 | TFLOPs: 32.35 | 7: iteration 105900/ 115203 | consumed samples: 27110400 | consumed tokens: 55522099200 | elapsed time per iteration (s): 0.37 | learning rate: 2.294E-05 | global batch size: 256 | lm loss: 3.258717E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.316 | TFLOPs: 32.27 | 0: [2023-03-17 10:22:38,774] [INFO] [logging.py:68:log_dist] [Rank 0] step=106000, skipped=0, lr=[2.2876870847544666e-05, 2.2876870847544666e-05, 2.2876870847544666e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 106000/ 115203 | consumed samples: 27136000 | consumed tokens: 55574528000 | elapsed time per iteration (s): 0.37 | learning rate: 2.288E-05 | global batch size: 256 | lm loss: 3.258559E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.002 | TFLOPs: 
32.35 | 0: steps: 106000 loss: 3.3010 iter time (s): 0.369 samples/sec: 693.560 7: iteration 106100/ 115203 | consumed samples: 27161600 | consumed tokens: 55626956800 | elapsed time per iteration (s): 0.37 | learning rate: 2.282E-05 | global batch size: 256 | lm loss: 3.260901E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.220 | TFLOPs: 32.31 | 7: iteration 106200/ 115203 | consumed samples: 27187200 | consumed tokens: 55679385600 | elapsed time per iteration (s): 0.37 | learning rate: 2.275E-05 | global batch size: 256 | lm loss: 3.268472E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.518 | TFLOPs: 32.23 | 7: iteration 106300/ 115203 | consumed samples: 27212800 | consumed tokens: 55731814400 | elapsed time per iteration (s): 0.37 | learning rate: 2.269E-05 | global batch size: 256 | lm loss: 3.259597E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.668 | TFLOPs: 32.14 | 7: iteration 106400/ 115203 | consumed samples: 27238400 | consumed tokens: 55784243200 | elapsed time per iteration (s): 0.37 | learning rate: 2.263E-05 | global batch size: 256 | lm loss: 3.261964E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.998 | TFLOPs: 32.02 | 7: iteration 106500/ 115203 | consumed samples: 27264000 | consumed tokens: 55836672000 | elapsed time per iteration (s): 0.37 | learning rate: 2.257E-05 | global batch size: 256 | lm loss: 3.267054E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.134 | TFLOPs: 32.31 | 7: iteration 106600/ 115203 | consumed samples: 27289600 | consumed tokens: 55889100800 | elapsed time per iteration (s): 0.37 | learning rate: 2.252E-05 | global 
batch size: 256 | lm loss: 3.258536E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.991 | TFLOPs: 32.35 | 7: iteration 106700/ 115203 | consumed samples: 27315200 | consumed tokens: 55941529600 | elapsed time per iteration (s): 0.37 | learning rate: 2.246E-05 | global batch size: 256 | lm loss: 3.261928E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.250 | TFLOPs: 32.31 | 7: iteration 106800/ 115203 | consumed samples: 27340800 | consumed tokens: 55993958400 | elapsed time per iteration (s): 0.37 | learning rate: 2.240E-05 | global batch size: 256 | lm loss: 3.262657E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.072 | TFLOPs: 32.30 | 7: iteration 106900/ 115203 | consumed samples: 27366400 | consumed tokens: 56046387200 | elapsed time per iteration (s): 0.37 | learning rate: 2.234E-05 | global batch size: 256 | lm loss: 3.262403E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.562 | TFLOPs: 32.33 | 7: iteration 107000/ 115203 | consumed samples: 27392000 | consumed tokens: 56098816000 | elapsed time per iteration (s): 0.37 | learning rate: 2.229E-05 | global batch size: 256 | lm loss: 3.260955E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.385 | TFLOPs: 32.32 | 7: iteration 107100/ 115203 | consumed samples: 27417600 | consumed tokens: 56151244800 | elapsed time per iteration (s): 0.37 | learning rate: 2.223E-05 | global batch size: 256 | lm loss: 3.264105E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.416 | TFLOPs: 32.32 | 7: iteration 107200/ 115203 | consumed samples: 27443200 
| consumed tokens: 56203673600 | elapsed time per iteration (s): 0.37 | learning rate: 2.218E-05 | global batch size: 256 | lm loss: 3.262127E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.496 | TFLOPs: 32.32 | 7: iteration 107300/ 115203 | consumed samples: 27468800 | consumed tokens: 56256102400 | elapsed time per iteration (s): 0.37 | learning rate: 2.212E-05 | global batch size: 256 | lm loss: 3.259118E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.711 | TFLOPs: 32.24 | 7: iteration 107400/ 115203 | consumed samples: 27494400 | consumed tokens: 56308531200 | elapsed time per iteration (s): 0.37 | learning rate: 2.207E-05 | global batch size: 256 | lm loss: 3.260641E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.895 | TFLOPs: 32.16 | 7: iteration 107500/ 115203 | consumed samples: 27520000 | consumed tokens: 56360960000 | elapsed time per iteration (s): 0.37 | learning rate: 2.202E-05 | global batch size: 256 | lm loss: 3.261773E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.376 | TFLOPs: 32.27 | 7: iteration 107600/ 115203 | consumed samples: 27545600 | consumed tokens: 56413388800 | elapsed time per iteration (s): 0.37 | learning rate: 2.197E-05 | global batch size: 256 | lm loss: 3.262619E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.155 | TFLOPs: 32.31 | 7: iteration 107700/ 115203 | consumed samples: 27571200 | consumed tokens: 56465817600 | elapsed time per iteration (s): 0.37 | learning rate: 2.192E-05 | global batch size: 256 | lm loss: 3.259665E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 687.759 | TFLOPs: 32.10 | 7: iteration 107800/ 115203 | consumed samples: 27596800 | consumed tokens: 56518246400 | elapsed time per iteration (s): 0.38 | learning rate: 2.187E-05 | global batch size: 256 | lm loss: 3.255318E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.128 | TFLOPs: 31.79 | 7: iteration 107900/ 115203 | consumed samples: 27622400 | consumed tokens: 56570675200 | elapsed time per iteration (s): 0.37 | learning rate: 2.182E-05 | global batch size: 256 | lm loss: 3.258187E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.339 | TFLOPs: 32.18 | 0: [2023-03-17 10:35:00,389] [INFO] [logging.py:68:log_dist] [Rank 0] step=108000, skipped=0, lr=[2.176608969325893e-05, 2.176608969325893e-05, 2.176608969325893e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 108000/ 115203 | consumed samples: 27648000 | consumed tokens: 56623104000 | elapsed time per iteration (s): 0.37 | learning rate: 2.177E-05 | global batch size: 256 | lm loss: 3.261756E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.860 | TFLOPs: 32.20 | 0: steps: 108000 loss: 3.2603 iter time (s): 0.369 samples/sec: 694.124 7: iteration 108100/ 115203 | consumed samples: 27673600 | consumed tokens: 56675532800 | elapsed time per iteration (s): 0.37 | learning rate: 2.172E-05 | global batch size: 256 | lm loss: 3.261184E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.373 | TFLOPs: 32.32 | 7: iteration 108200/ 115203 | consumed samples: 27699200 | consumed tokens: 56727961600 | elapsed time per iteration (s): 0.37 | learning rate: 2.167E-05 | global batch size: 256 | lm loss: 3.261007E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 692.276 | TFLOPs: 32.31 | 7: iteration 108300/ 115203 | consumed samples: 27724800 | consumed tokens: 56780390400 | elapsed time per iteration (s): 0.37 | learning rate: 2.162E-05 | global batch size: 256 | lm loss: 3.262835E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.538 | TFLOPs: 32.33 | 7: iteration 108400/ 115203 | consumed samples: 27750400 | consumed tokens: 56832819200 | elapsed time per iteration (s): 0.37 | learning rate: 2.158E-05 | global batch size: 256 | lm loss: 3.261227E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.474 | TFLOPs: 32.32 | 7: iteration 108500/ 115203 | consumed samples: 27776000 | consumed tokens: 56885248000 | elapsed time per iteration (s): 0.37 | learning rate: 2.153E-05 | global batch size: 256 | lm loss: 3.260512E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.834 | TFLOPs: 32.29 | 7: iteration 108600/ 115203 | consumed samples: 27801600 | consumed tokens: 56937676800 | elapsed time per iteration (s): 0.37 | learning rate: 2.148E-05 | global batch size: 256 | lm loss: 3.258994E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.151 | TFLOPs: 32.31 | 7: iteration 108700/ 115203 | consumed samples: 27827200 | consumed tokens: 56990105600 | elapsed time per iteration (s): 0.43 | learning rate: 2.144E-05 | global batch size: 256 | lm loss: 3.263311E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.159 | TFLOPs: 28.11 | 7: iteration 108800/ 115203 | consumed samples: 27852800 | consumed tokens: 57042534400 | elapsed time per iteration (s): 0.37 | learning rate: 
2.140E-05 | global batch size: 256 | lm loss: 3.257080E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.782 | TFLOPs: 32.29 | 7: iteration 108900/ 115203 | consumed samples: 27878400 | consumed tokens: 57094963200 | elapsed time per iteration (s): 0.37 | learning rate: 2.135E-05 | global batch size: 256 | lm loss: 3.261346E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.689 | TFLOPs: 32.33 | 7: iteration 109000/ 115203 | consumed samples: 27904000 | consumed tokens: 57147392000 | elapsed time per iteration (s): 0.37 | learning rate: 2.131E-05 | global batch size: 256 | lm loss: 3.264169E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.351 | TFLOPs: 32.32 | 7: iteration 109100/ 115203 | consumed samples: 27929600 | consumed tokens: 57199820800 | elapsed time per iteration (s): 0.37 | learning rate: 2.127E-05 | global batch size: 256 | lm loss: 3.259404E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 689.801 | TFLOPs: 32.20 | 7: iteration 109200/ 115203 | consumed samples: 27955200 | consumed tokens: 57252249600 | elapsed time per iteration (s): 0.37 | learning rate: 2.123E-05 | global batch size: 256 | lm loss: 3.258012E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.541 | TFLOPs: 32.33 | 7: iteration 109300/ 115203 | consumed samples: 27980800 | consumed tokens: 57304678400 | elapsed time per iteration (s): 0.37 | learning rate: 2.119E-05 | global batch size: 256 | lm loss: 3.260306E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.818 | TFLOPs: 32.29 | 7: iteration 109400/ 115203 | 
consumed samples: 28006400 | consumed tokens: 57357107200 | elapsed time per iteration (s): 0.37 | learning rate: 2.115E-05 | global batch size: 256 | lm loss: 3.262026E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.019 | TFLOPs: 32.30 | 7: iteration 109500/ 115203 | consumed samples: 28032000 | consumed tokens: 57409536000 | elapsed time per iteration (s): 0.37 | learning rate: 2.111E-05 | global batch size: 256 | lm loss: 3.262962E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.804 | TFLOPs: 32.29 | 7: iteration 109600/ 115203 | consumed samples: 28057600 | consumed tokens: 57461964800 | elapsed time per iteration (s): 0.37 | learning rate: 2.107E-05 | global batch size: 256 | lm loss: 3.260493E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.548 | TFLOPs: 32.28 | 7: iteration 109700/ 115203 | consumed samples: 28083200 | consumed tokens: 57514393600 | elapsed time per iteration (s): 0.37 | learning rate: 2.103E-05 | global batch size: 256 | lm loss: 3.259239E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.358 | TFLOPs: 32.32 | 7: iteration 109800/ 115203 | consumed samples: 28108800 | consumed tokens: 57566822400 | elapsed time per iteration (s): 0.37 | learning rate: 2.100E-05 | global batch size: 256 | lm loss: 3.259791E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.802 | TFLOPs: 32.24 | 7: iteration 109900/ 115203 | consumed samples: 28134400 | consumed tokens: 57619251200 | elapsed time per iteration (s): 0.37 | learning rate: 2.096E-05 | global batch size: 256 | lm loss: 3.262592E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 691.370 | TFLOPs: 32.27 | 0: [2023-03-17 10:47:25,847] [INFO] [logging.py:68:log_dist] [Rank 0] step=110000, skipped=0, lr=[2.092302863901853e-05, 2.092302863901853e-05, 2.092302863901853e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 110000/ 115203 | consumed samples: 28160000 | consumed tokens: 57671680000 | elapsed time per iteration (s): 0.37 | learning rate: 2.092E-05 | global batch size: 256 | lm loss: 3.260002E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.481 | TFLOPs: 32.32 | 0: steps: 110000 loss: 3.2487 iter time (s): 0.371 samples/sec: 690.603 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 110000 | lm loss value: 3.394175E+00 | lm loss PPL: 2.979008E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 110000 to checkpoints_146m60b400m 0: [2023-03-17 10:47:25,971] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step110000 is begin to save! 0: [2023-03-17 10:47:25,975] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/layer_01-model_00-model_states.pt... 0: [2023-03-17 10:47:26,066] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/layer_01-model_00-model_states.pt. 0: [2023-03-17 10:47:26,066] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/layer_03-model_00-model_states.pt... 0: [2023-03-17 10:47:26,082] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/layer_03-model_00-model_states.pt. 
0: [2023-03-17 10:47:26,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/layer_04-model_00-model_states.pt... 0: [2023-03-17 10:47:26,098] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/layer_04-model_00-model_states.pt. 0: [2023-03-17 10:47:26,098] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/layer_05-model_00-model_states.pt... 0: [2023-03-17 10:47:26,113] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/layer_05-model_00-model_states.pt. 0: [2023-03-17 10:47:26,113] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/layer_06-model_00-model_states.pt... 0: [2023-03-17 10:47:26,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/layer_06-model_00-model_states.pt. 0: [2023-03-17 10:47:26,128] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/layer_07-model_00-model_states.pt... 0: [2023-03-17 10:47:26,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/layer_07-model_00-model_states.pt. 0: [2023-03-17 10:47:26,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/layer_08-model_00-model_states.pt... 0: [2023-03-17 10:47:26,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/layer_08-model_00-model_states.pt. 0: [2023-03-17 10:47:26,158] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/layer_09-model_00-model_states.pt... 0: [2023-03-17 10:47:26,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/layer_09-model_00-model_states.pt. 
0: [2023-03-17 10:47:26,172] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/layer_10-model_00-model_states.pt... 0: [2023-03-17 10:47:26,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/layer_10-model_00-model_states.pt. 0: [2023-03-17 10:47:26,188] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/layer_11-model_00-model_states.pt... 0: [2023-03-17 10:47:26,202] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/layer_11-model_00-model_states.pt. 0: [2023-03-17 10:47:26,202] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/layer_12-model_00-model_states.pt... 0: [2023-03-17 10:47:26,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/layer_12-model_00-model_states.pt. 0: [2023-03-17 10:47:26,217] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/layer_13-model_00-model_states.pt... 0: [2023-03-17 10:47:26,232] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/layer_13-model_00-model_states.pt. 0: [2023-03-17 10:47:26,232] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/layer_14-model_00-model_states.pt... 0: [2023-03-17 10:47:26,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/layer_14-model_00-model_states.pt. 0: [2023-03-17 10:47:26,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/layer_15-model_00-model_states.pt... 0: [2023-03-17 10:47:26,261] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/layer_15-model_00-model_states.pt. 
0: [2023-03-17 10:47:26,261] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/layer_16-model_00-model_states.pt... 0: [2023-03-17 10:47:26,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/layer_16-model_00-model_states.pt. 0: [2023-03-17 10:47:26,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/layer_17-model_00-model_states.pt... 0: [2023-03-17 10:47:26,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/layer_17-model_00-model_states.pt. 0: [2023-03-17 10:47:26,291] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/layer_19-model_00-model_states.pt... 0: [2023-03-17 10:47:26,292] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/layer_19-model_00-model_states.pt. 0: [2023-03-17 10:47:26,292] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_146m60b400m/global_step110000/mp_rank_00_model_states.pt 0: [2023-03-17 10:47:26,292] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/mp_rank_00_model_states.pt... 0: [2023-03-17 10:47:26,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/mp_rank_00_model_states.pt. 0: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
0: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
2: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 4: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 6: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 1: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 5: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 3: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 7: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 2: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
2: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 4: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 6: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 1: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 5: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
5: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 3: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 7: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 5: [2023-03-17 10:47:26,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 0: [2023-03-17 10:47:26,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-03-17 10:47:26,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 10:47:26,348] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 10:47:26,348] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 
0: [2023-03-17 10:47:26,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 10:47:26,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 10:47:26,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2023-03-17 10:47:26,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 6: [2023-03-17 10:47:26,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 10:47:26,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 10:47:26,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 10:47:26,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 10:47:26,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2023-03-17 10:47:26,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2023-03-17 10:47:26,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-17 10:47:26,351] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 10:47:26,351] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2023-03-17 10:47:26,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 10:47:26,352] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 10:47:26,352] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 10:47:26,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 10:47:26,352] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 10:47:26,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 10:47:26,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 10:47:26,353] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 10:47:26,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2023-03-17 10:47:26,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-03-17 10:47:26,353] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 10:47:26,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 10:47:26,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 10:47:26,353] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 10:47:26,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 10:47:26,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 10:47:26,355] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 10:47:26,355] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2023-03-17 10:47:26,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 10:47:26,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 10:47:26,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-03-17 10:47:26,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 10:47:26,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 10:47:26,355] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 10:47:26,355] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 10:47:26,355] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 10:47:26,355] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 10:47:26,355] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 10:47:26,355] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2023-03-17 10:47:26,355] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2023-03-17 10:47:26,355] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2023-03-17 10:47:26,355] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2023-03-17 10:47:26,355] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 
6: [2023-03-17 10:47:26,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 10:47:26,355] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 7: [2023-03-17 10:47:26,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 10:47:26,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 6: [2023-03-17 10:47:26,355] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 10:47:26,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 10:47:26,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2023-03-17 10:47:26,356] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 10:47:26,356] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 10:47:26,356] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 10:47:26,356] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 10:47:26,356] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 10:47:26,356] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 10:47:26,356] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 10:47:26,356] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2023-03-17 10:47:26,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 10:47:26,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 10:47:26,356] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 10:47:26,356] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 10:47:26,356] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2023-03-17 10:47:26,356] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2023-03-17 10:47:26,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 10:47:26,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2023-03-17 10:47:26,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 10:47:26,351] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 10:47:26,351] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2023-03-17 10:47:26,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 10:47:26,353] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 10:47:26,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 
1: [2023-03-17 10:47:26,354] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 0: [2023-03-17 10:47:26,364] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 1: [2023-03-17 10:47:26,354] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 10:47:26,354] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2023-03-17 10:47:26,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 10:47:26,355] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 0: [2023-03-17 10:47:26,364] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2023-03-17 10:47:26,355] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2023-03-17 10:47:26,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 10:47:26,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-17 10:47:26,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 10:47:26,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 10:47:26,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2023-03-17 10:47:26,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2023-03-17 10:47:26,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 10:47:26,364] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 10:47:26,364] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2023-03-17 10:47:26,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 10:47:26,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 10:47:26,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 10:47:26,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 10:47:26,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 10:47:26,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 10:47:26,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 10:47:26,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 10:47:26,374] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 10:47:26,374] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 10:47:26,374] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 10:47:26,374] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 10:47:26,374] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 10:47:26,374] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 10:47:26,374] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 10:47:26,374] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 10:47:26,374] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2023-03-17 10:47:26,374] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2023-03-17 10:47:26,374] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2023-03-17 10:47:26,374] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2023-03-17 10:47:26,374] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2023-03-17 10:47:26,374] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2023-03-17 10:47:26,374] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2023-03-17 10:47:26,374] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2023-03-17 10:47:26,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 10:47:26,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 10:47:26,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2023-03-17 10:47:26,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 10:47:26,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 10:47:26,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 10:47:26,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 10:47:26,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 10:47:26,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 10:47:26,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 10:47:26,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 10:47:26,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 10:47:26,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 10:47:26,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 10:47:26,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 10:47:26,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 10:47:26,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2023-03-17 10:47:26,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2023-03-17 10:47:26,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2023-03-17 10:47:26,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2023-03-17 10:47:26,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2023-03-17 10:47:26,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2023-03-17 10:47:26,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2023-03-17 10:47:26,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2023-03-17 10:47:26,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 10:47:26,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 10:47:26,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 10:47:26,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 10:47:26,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 10:47:26,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 10:47:26,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 10:47:26,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 10:47:26,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 10:47:26,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 10:47:26,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 10:47:26,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2023-03-17 10:47:26,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 
4: [2023-03-17 10:47:26,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2023-03-17 10:47:26,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2023-03-17 10:47:26,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 10:47:26,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 10:47:26,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 10:47:26,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2023-03-17 10:47:26,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2023-03-17 10:47:26,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2023-03-17 10:47:26,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 10:47:26,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 10:47:26,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 3: [2023-03-17 10:47:26,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 10:47:26,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 10:47:26,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 10:47:26,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 10:47:26,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 10:47:26,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 10:47:26,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 10:47:26,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2023-03-17 10:47:26,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 10:47:26,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 10:47:26,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 10:47:26,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 10:47:26,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 10:47:26,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 10:47:26,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 10:47:26,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step110000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 10:47:26,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 3: [2023-03-17 10:47:26,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 3: [2023-03-17 10:47:26,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 
3: [2023-03-17 10:47:26,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now!
3: [2023-03-17 10:47:26,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now!
3: [2023-03-17 10:47:26,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now!
3: [2023-03-17 10:47:26,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now!
3: [2023-03-17 10:47:26,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now!
0: successfully saved checkpoint at iteration 110000 to checkpoints_146m60b400m
7: time (ms) | save-checkpoint: 429.06
7: iteration 110100/ 115203 | consumed samples: 28185600 | consumed tokens: 57724108800 | elapsed time per iteration (s): 0.38 | learning rate: 2.089E-05 | global batch size: 256 | lm loss: 3.262252E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.015 | TFLOPs: 31.23 |
7: iteration 110200/ 115203 | consumed samples: 28211200 | consumed tokens: 57776537600 | elapsed time per iteration (s): 0.37 | learning rate: 2.085E-05 | global batch size: 256 | lm loss: 3.260299E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.220 | TFLOPs: 32.31 |
7: iteration 110300/ 115203 | consumed samples: 28236800 | consumed tokens: 57828966400 | elapsed time per iteration (s): 0.37 | learning rate: 2.082E-05 | global batch size: 256 | lm loss: 3.263617E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.515 | TFLOPs: 32.32 |
7: iteration 110400/ 115203 | consumed samples: 28262400 | consumed tokens: 57881395200 | elapsed time per iteration (s): 0.37 | learning rate: 2.079E-05 | global batch size: 256 | lm loss: 3.258792E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.748 | TFLOPs: 32.33 |
7: iteration 110500/ 115203 | consumed samples: 28288000 | consumed tokens: 57933824000 | elapsed time per iteration (s): 0.37 | learning rate: 2.075E-05 | global batch size: 256 | lm loss: 3.256809E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.957 | TFLOPs: 31.97 |
7: iteration 110600/ 115203 | consumed samples: 28313600 | consumed tokens: 57986252800 | elapsed time per iteration (s): 0.38 | learning rate: 2.072E-05 | global batch size: 256 | lm loss: 3.260181E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.153 | TFLOPs: 31.65 |
7: iteration 110700/ 115203 | consumed samples: 28339200 | consumed tokens: 58038681600 | elapsed time per iteration (s): 0.37 | learning rate: 2.069E-05 | global batch size: 256 | lm loss: 3.259878E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.324 | TFLOPs: 32.32 |
7: iteration 110800/ 115203 | consumed samples: 28364800 | consumed tokens: 58091110400 | elapsed time per iteration (s): 0.37 | learning rate: 2.066E-05 | global batch size: 256 | lm loss: 3.259263E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.490 | TFLOPs: 32.32 |
7: iteration 110900/ 115203 | consumed samples: 28390400 | consumed tokens: 58143539200 | elapsed time per iteration (s): 0.37 | learning rate: 2.063E-05 | global batch size: 256 | lm loss: 3.258869E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.142 | TFLOPs: 32.31 |
7: iteration 111000/ 115203 | consumed samples: 28416000 | consumed tokens: 58195968000 | elapsed time per iteration (s): 0.37 | learning rate: 2.060E-05 | global batch size: 256 | lm loss: 3.262869E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 691.445 | TFLOPs: 32.27 |
7: iteration 111100/ 115203 | consumed samples: 28441600 | consumed tokens: 58248396800 | elapsed time per iteration (s): 0.37 | learning rate: 2.057E-05 | global batch size: 256 | lm loss: 3.256908E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.707 | TFLOPs: 32.15 |
7: iteration 111200/ 115203 | consumed samples: 28467200 | consumed tokens: 58300825600 | elapsed time per iteration (s): 0.37 | learning rate: 2.055E-05 | global batch size: 256 | lm loss: 3.262903E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 690.737 | TFLOPs: 32.24 |
7: iteration 111300/ 115203 | consumed samples: 28492800 | consumed tokens: 58353254400 | elapsed time per iteration (s): 0.37 | learning rate: 2.052E-05 | global batch size: 256 | lm loss: 3.257304E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.475 | TFLOPs: 32.37 |
7: iteration 111400/ 115203 | consumed samples: 28518400 | consumed tokens: 58405683200 | elapsed time per iteration (s): 0.37 | learning rate: 2.049E-05 | global batch size: 256 | lm loss: 3.257198E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.545 | TFLOPs: 32.37 |
7: iteration 111500/ 115203 | consumed samples: 28544000 | consumed tokens: 58458112000 | elapsed time per iteration (s): 0.37 | learning rate: 2.047E-05 | global batch size: 256 | lm loss: 3.259069E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per
second: 693.526 | TFLOPs: 32.37 | 7: iteration 111600/ 115203 | consumed samples: 28569600 | consumed tokens: 58510540800 | elapsed time per iteration (s): 0.64 | learning rate: 2.044E-05 | global batch size: 256 | lm loss: 3.260231E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 400.367 | TFLOPs: 18.69 | 7: iteration 111700/ 115203 | consumed samples: 28595200 | consumed tokens: 58562969600 | elapsed time per iteration (s): 0.37 | learning rate: 2.042E-05 | global batch size: 256 | lm loss: 3.263009E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.755 | TFLOPs: 32.38 | 7: iteration 111800/ 115203 | consumed samples: 28620800 | consumed tokens: 58615398400 | elapsed time per iteration (s): 0.37 | learning rate: 2.040E-05 | global batch size: 256 | lm loss: 3.263038E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 694.284 | TFLOPs: 32.41 | 7: iteration 111900/ 115203 | consumed samples: 28646400 | consumed tokens: 58667827200 | elapsed time per iteration (s): 0.37 | learning rate: 2.037E-05 | global batch size: 256 | lm loss: 3.265057E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.724 | TFLOPs: 32.38 | 0: [2023-03-17 11:00:14,584] [INFO] [logging.py:68:log_dist] [Rank 0] step=112000, skipped=0, lr=[2.0350245708025642e-05, 2.0350245708025642e-05, 2.0350245708025642e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 112000/ 115203 | consumed samples: 28672000 | consumed tokens: 58720256000 | elapsed time per iteration (s): 0.37 | learning rate: 2.035E-05 | global batch size: 256 | lm loss: 3.260204E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.820 | TFLOPs: 
32.38 | 0: steps: 112000 loss: 3.2287 iter time (s): 0.382 samples/sec: 670.238 7: iteration 112100/ 115203 | consumed samples: 28697600 | consumed tokens: 58772684800 | elapsed time per iteration (s): 0.37 | learning rate: 2.033E-05 | global batch size: 256 | lm loss: 3.259479E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.700 | TFLOPs: 32.33 | 7: iteration 112200/ 115203 | consumed samples: 28723200 | consumed tokens: 58825113600 | elapsed time per iteration (s): 0.37 | learning rate: 2.031E-05 | global batch size: 256 | lm loss: 3.256598E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.106 | TFLOPs: 32.30 | 7: iteration 112300/ 115203 | consumed samples: 28748800 | consumed tokens: 58877542400 | elapsed time per iteration (s): 0.37 | learning rate: 2.029E-05 | global batch size: 256 | lm loss: 3.256645E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.966 | TFLOPs: 32.35 | 7: iteration 112400/ 115203 | consumed samples: 28774400 | consumed tokens: 58929971200 | elapsed time per iteration (s): 0.37 | learning rate: 2.027E-05 | global batch size: 256 | lm loss: 3.261909E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 694.073 | TFLOPs: 32.40 | 7: iteration 112500/ 115203 | consumed samples: 28800000 | consumed tokens: 58982400000 | elapsed time per iteration (s): 0.37 | learning rate: 2.025E-05 | global batch size: 256 | lm loss: 3.256765E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 694.091 | TFLOPs: 32.40 | 7: iteration 112600/ 115203 | consumed samples: 28825600 | consumed tokens: 59034828800 | elapsed time per iteration (s): 0.37 | learning rate: 2.023E-05 | global 
batch size: 256 | lm loss: 3.256867E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.618 | TFLOPs: 32.38 | 7: iteration 112700/ 115203 | consumed samples: 28851200 | consumed tokens: 59087257600 | elapsed time per iteration (s): 0.37 | learning rate: 2.021E-05 | global batch size: 256 | lm loss: 3.260610E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.916 | TFLOPs: 32.34 | 7: iteration 112800/ 115203 | consumed samples: 28876800 | consumed tokens: 59139686400 | elapsed time per iteration (s): 0.37 | learning rate: 2.020E-05 | global batch size: 256 | lm loss: 3.256365E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.744 | TFLOPs: 32.38 | 7: iteration 112900/ 115203 | consumed samples: 28902400 | consumed tokens: 59192115200 | elapsed time per iteration (s): 0.37 | learning rate: 2.018E-05 | global batch size: 256 | lm loss: 3.261324E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.930 | TFLOPs: 32.39 | 7: iteration 113000/ 115203 | consumed samples: 28928000 | consumed tokens: 59244544000 | elapsed time per iteration (s): 0.37 | learning rate: 2.017E-05 | global batch size: 256 | lm loss: 3.260137E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.994 | TFLOPs: 32.35 | 7: iteration 113100/ 115203 | consumed samples: 28953600 | consumed tokens: 59296972800 | elapsed time per iteration (s): 0.37 | learning rate: 2.015E-05 | global batch size: 256 | lm loss: 3.258194E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.507 | TFLOPs: 32.32 | 7: iteration 113200/ 115203 | consumed samples: 28979200 
| consumed tokens: 59349401600 | elapsed time per iteration (s): 0.37 | learning rate: 2.014E-05 | global batch size: 256 | lm loss: 3.264286E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.178 | TFLOPs: 32.31 | 7: iteration 113300/ 115203 | consumed samples: 29004800 | consumed tokens: 59401830400 | elapsed time per iteration (s): 0.37 | learning rate: 2.012E-05 | global batch size: 256 | lm loss: 3.260460E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.426 | TFLOPs: 32.32 | 7: iteration 113400/ 115203 | consumed samples: 29030400 | consumed tokens: 59454259200 | elapsed time per iteration (s): 0.37 | learning rate: 2.011E-05 | global batch size: 256 | lm loss: 3.258915E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 693.039 | TFLOPs: 32.35 | 7: iteration 113500/ 115203 | consumed samples: 29056000 | consumed tokens: 59506688000 | elapsed time per iteration (s): 0.37 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 3.257757E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.204 | TFLOPs: 32.31 | 7: iteration 113600/ 115203 | consumed samples: 29081600 | consumed tokens: 59559116800 | elapsed time per iteration (s): 0.37 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 3.260330E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 692.387 | TFLOPs: 32.32 | 7: iteration 113700/ 115203 | consumed samples: 29107200 | consumed tokens: 59611545600 | elapsed time per iteration (s): 0.37 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 3.260601E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 690.192 | TFLOPs: 32.22 | 7: iteration 113800/ 115203 | consumed samples: 29132800 | consumed tokens: 59663974400 | elapsed time per iteration (s): 0.38 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 3.253901E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 666.802 | TFLOPs: 31.12 | 7: iteration 113900/ 115203 | consumed samples: 29158400 | consumed tokens: 59716403200 | elapsed time per iteration (s): 0.37 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 3.260705E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.518 | TFLOPs: 32.00 | 0: [2023-03-17 11:12:35,909] [INFO] [logging.py:68:log_dist] [Rank 0] step=114000, skipped=0, lr=[2.004947884324412e-05, 2.004947884324412e-05, 2.004947884324412e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 114000/ 115203 | consumed samples: 29184000 | consumed tokens: 59768832000 | elapsed time per iteration (s): 0.37 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 3.256629E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.807 | TFLOPs: 31.92 | 0: steps: 114000 loss: 3.2687 iter time (s): 0.369 samples/sec: 694.525 7: iteration 114100/ 115203 | consumed samples: 29209600 | consumed tokens: 59821260800 | elapsed time per iteration (s): 0.37 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 3.257898E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.139 | TFLOPs: 31.98 | 7: iteration 114200/ 115203 | consumed samples: 29235200 | consumed tokens: 59873689600 | elapsed time per iteration (s): 0.37 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 3.256015E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 687.073 | TFLOPs: 32.07 | 7: iteration 114300/ 115203 | consumed samples: 29260800 | consumed tokens: 59926118400 | elapsed time per iteration (s): 0.37 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 3.260459E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.695 | TFLOPs: 32.15 | 7: iteration 114400/ 115203 | consumed samples: 29286400 | consumed tokens: 59978547200 | elapsed time per iteration (s): 0.37 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 3.257061E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 688.038 | TFLOPs: 32.12 | 7: iteration 114500/ 115203 | consumed samples: 29312000 | consumed tokens: 60030976000 | elapsed time per iteration (s): 0.37 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 3.264987E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.987 | TFLOPs: 32.07 | 7: iteration 114600/ 115203 | consumed samples: 29337600 | consumed tokens: 60083404800 | elapsed time per iteration (s): 0.37 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 3.261461E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.273 | TFLOPs: 32.03 | 7: iteration 114700/ 115203 | consumed samples: 29363200 | consumed tokens: 60135833600 | elapsed time per iteration (s): 0.38 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 3.261295E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.463 | TFLOPs: 31.81 | 7: iteration 114800/ 115203 | consumed samples: 29388800 | consumed tokens: 60188262400 | elapsed time per iteration (s): 0.37 | learning rate: 
2.001E-05 | global batch size: 256 | lm loss: 3.262063E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.251 | TFLOPs: 32.03 | 7: iteration 114900/ 115203 | consumed samples: 29414400 | consumed tokens: 60240691200 | elapsed time per iteration (s): 0.37 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 3.262746E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.754 | TFLOPs: 32.10 | 7: iteration 115000/ 115203 | consumed samples: 29440000 | consumed tokens: 60293120000 | elapsed time per iteration (s): 0.38 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 3.262730E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.544 | TFLOPs: 31.67 | 7: iteration 115100/ 115203 | consumed samples: 29465600 | consumed tokens: 60345548800 | elapsed time per iteration (s): 0.37 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 3.260055E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 686.266 | TFLOPs: 32.03 | 7: iteration 115200/ 115203 | consumed samples: 29491200 | consumed tokens: 60397977600 | elapsed time per iteration (s): 0.37 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 3.257144E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 687.664 | TFLOPs: 32.10 | 0: [after training is done] datetime: 2023-03-17 11:20:04 0: saving checkpoint at iteration 115203 to checkpoints_146m60b400m 7: ----------------------------------------------------------------------------------------------------------------- 7: validation loss at the end of training for val data | lm loss value: 3.279045E+00 | lm loss PPL: 2.655041E+01 | 7: 
----------------------------------------------------------------------------------------------------------------- 0: [2023-03-17 11:20:05,070] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step115203 is begin to save! 0: [2023-03-17 11:20:05,094] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/layer_01-model_00-model_states.pt... 0: [2023-03-17 11:20:05,210] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/layer_01-model_00-model_states.pt. 0: [2023-03-17 11:20:05,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/layer_03-model_00-model_states.pt... 0: [2023-03-17 11:20:05,226] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/layer_03-model_00-model_states.pt. 0: [2023-03-17 11:20:05,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/layer_04-model_00-model_states.pt... 0: [2023-03-17 11:20:05,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/layer_04-model_00-model_states.pt. 0: [2023-03-17 11:20:05,243] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/layer_05-model_00-model_states.pt... 0: [2023-03-17 11:20:05,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/layer_05-model_00-model_states.pt. 0: [2023-03-17 11:20:05,258] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/layer_06-model_00-model_states.pt... 0: [2023-03-17 11:20:05,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/layer_06-model_00-model_states.pt. 
0: [2023-03-17 11:20:05,273] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/layer_07-model_00-model_states.pt... 0: [2023-03-17 11:20:05,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/layer_07-model_00-model_states.pt. 0: [2023-03-17 11:20:05,288] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/layer_08-model_00-model_states.pt... 0: [2023-03-17 11:20:05,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/layer_08-model_00-model_states.pt. 0: [2023-03-17 11:20:05,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/layer_09-model_00-model_states.pt... 0: [2023-03-17 11:20:05,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/layer_09-model_00-model_states.pt. 0: [2023-03-17 11:20:05,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/layer_10-model_00-model_states.pt... 0: [2023-03-17 11:20:05,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/layer_10-model_00-model_states.pt. 0: [2023-03-17 11:20:05,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/layer_11-model_00-model_states.pt... 0: [2023-03-17 11:20:05,347] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/layer_11-model_00-model_states.pt. 0: [2023-03-17 11:20:05,348] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/layer_12-model_00-model_states.pt... 0: [2023-03-17 11:20:05,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/layer_12-model_00-model_states.pt. 
0: [2023-03-17 11:20:05,363] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/layer_13-model_00-model_states.pt... 0: [2023-03-17 11:20:05,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/layer_13-model_00-model_states.pt. 0: [2023-03-17 11:20:05,378] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/layer_14-model_00-model_states.pt... 0: [2023-03-17 11:20:05,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/layer_14-model_00-model_states.pt. 0: [2023-03-17 11:20:05,393] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/layer_15-model_00-model_states.pt... 0: [2023-03-17 11:20:05,407] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/layer_15-model_00-model_states.pt. 0: [2023-03-17 11:20:05,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/layer_16-model_00-model_states.pt... 0: [2023-03-17 11:20:05,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/layer_16-model_00-model_states.pt. 0: [2023-03-17 11:20:05,423] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/layer_17-model_00-model_states.pt... 0: [2023-03-17 11:20:05,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/layer_17-model_00-model_states.pt. 0: [2023-03-17 11:20:05,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/layer_19-model_00-model_states.pt... 0: [2023-03-17 11:20:05,438] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/layer_19-model_00-model_states.pt. 
0: [2023-03-17 11:20:05,439] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_146m60b400m/global_step115203/mp_rank_00_model_states.pt 0: [2023-03-17 11:20:05,439] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/mp_rank_00_model_states.pt... 0: [2023-03-17 11:20:05,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/mp_rank_00_model_states.pt. 0: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
7: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 4: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
4: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 6: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 1: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 5: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 3: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
3: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 7: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 2: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 4: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 
6: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 1: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 5: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 3: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 6: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
5: [2023-03-17 11:20:05,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 0: [2023-03-17 11:20:05,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-03-17 11:20:05,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 11:20:05,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 11:20:05,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 0: [2023-03-17 11:20:05,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 11:20:05,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 11:20:05,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 0: [2023-03-17 11:20:05,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 11:20:05,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 11:20:05,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 
0: [2023-03-17 11:20:05,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 11:20:05,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 11:20:05,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 0: [2023-03-17 11:20:05,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 11:20:05,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 11:20:05,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 0: [2023-03-17 11:20:05,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 11:20:05,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 11:20:05,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 0: [2023-03-17 11:20:05,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 11:20:05,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 11:20:05,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 
2: [2023-03-17 11:20:05,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 11:20:05,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 11:20:05,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 11:20:05,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 11:20:05,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 11:20:05,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 11:20:05,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 11:20:05,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 11:20:05,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 11:20:05,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 11:20:05,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 11:20:05,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 11:20:05,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 11:20:05,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 11:20:05,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 2: [2023-03-17 11:20:05,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 2: [2023-03-17 11:20:05,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 2: [2023-03-17 11:20:05,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 2: [2023-03-17 11:20:05,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 
2: [2023-03-17 11:20:05,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 2: [2023-03-17 11:20:05,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 2: [2023-03-17 11:20:05,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 11:20:05,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 11:20:05,521] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 3: [2023-03-17 11:20:05,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 11:20:05,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 11:20:05,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 3: [2023-03-17 11:20:05,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 11:20:05,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 11:20:05,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 3: [2023-03-17 11:20:05,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2023-03-17 11:20:05,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 11:20:05,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 3: [2023-03-17 11:20:05,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 11:20:05,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 11:20:05,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 0: [2023-03-17 11:20:05,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 11:20:05,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 3: [2023-03-17 11:20:05,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 11:20:05,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 11:20:05,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 3: [2023-03-17 11:20:05,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 11:20:05,535] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 11:20:05,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 3: [2023-03-17 11:20:05,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 11:20:05,535] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 11:20:05,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 3: [2023-03-17 11:20:05,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 11:20:05,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 11:20:05,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 1: [2023-03-17 11:20:05,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 11:20:05,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 11:20:05,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 11:20:05,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 11:20:05,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 11:20:05,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 11:20:05,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 11:20:05,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 11:20:05,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 11:20:05,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 11:20:05,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 11:20:05,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 11:20:05,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 11:20:05,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 11:20:05,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 11:20:05,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 11:20:05,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 1: [2023-03-17 11:20:05,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 1: [2023-03-17 11:20:05,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 1: [2023-03-17 11:20:05,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 1: [2023-03-17 11:20:05,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 1: [2023-03-17 11:20:05,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 1: [2023-03-17 11:20:05,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 1: [2023-03-17 11:20:05,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 5: [2023-03-17 11:20:05,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 11:20:05,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2023-03-17 11:20:05,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 11:20:05,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 11:20:05,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 11:20:05,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 11:20:05,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 11:20:05,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2023-03-17 11:20:05,559] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 11:20:05,559] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 11:20:05,559] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 11:20:05,559] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 11:20:05,559] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 11:20:05,559] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 11:20:05,559] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 11:20:05,559] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 11:20:05,559] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 5: [2023-03-17 11:20:05,559] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 5: [2023-03-17 11:20:05,559] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 
5: [2023-03-17 11:20:05,559] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 5: [2023-03-17 11:20:05,559] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 5: [2023-03-17 11:20:05,559] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 5: [2023-03-17 11:20:05,559] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 5: [2023-03-17 11:20:05,559] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 4: [2023-03-17 11:20:05,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 11:20:05,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 11:20:05,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 11:20:05,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 11:20:05,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 11:20:05,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 11:20:05,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 11:20:05,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 11:20:05,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 11:20:05,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 11:20:05,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 11:20:05,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 11:20:05,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 11:20:05,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 4: [2023-03-17 11:20:05,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 4: [2023-03-17 11:20:05,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 11:20:05,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 4: [2023-03-17 11:20:05,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 
4: [2023-03-17 11:20:05,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 11:20:05,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 11:20:05,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 4: [2023-03-17 11:20:05,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 4: [2023-03-17 11:20:05,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 4: [2023-03-17 11:20:05,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 7: [2023-03-17 11:20:05,673] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 11:20:05,673] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 11:20:05,673] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 11:20:05,673] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 11:20:05,673] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 11:20:05,673] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2023-03-17 11:20:05,673] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 11:20:05,673] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 11:20:05,673] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 11:20:05,673] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 11:20:05,673] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 11:20:05,673] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 11:20:05,673] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 7: [2023-03-17 11:20:05,673] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 7: [2023-03-17 11:20:05,673] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 11:20:05,673] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 11:20:05,673] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 
7: [2023-03-17 11:20:05,673] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 7: [2023-03-17 11:20:05,673] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 7: [2023-03-17 11:20:05,673] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 7: [2023-03-17 11:20:05,673] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 7: [2023-03-17 11:20:05,674] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 11:20:05,674] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 11:20:05,674] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 6: [2023-03-17 11:20:05,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 11:20:05,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 11:20:05,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 11:20:05,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 11:20:05,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 11:20:05,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 11:20:05,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 11:20:05,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 11:20:05,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 11:20:05,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 11:20:05,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 11:20:05,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 
6: [2023-03-17 11:20:05,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 11:20:05,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 11:20:05,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 11:20:05,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 11:20:05,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b400m/global_step115203/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 11:20:05,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 6: [2023-03-17 11:20:05,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 6: [2023-03-17 11:20:05,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 6: [2023-03-17 11:20:05,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 6: [2023-03-17 11:20:05,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 6: [2023-03-17 11:20:05,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 6: [2023-03-17 11:20:05,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 0: successfully saved checkpoint at iteration 115203 to checkpoints_146m60b400m END 3326734: Fri 17 Mar 2023 11:20:36 AM EET